Google admits 'garbage in, garbage out' translation problem


Jeff Whittaker
USA
Spanish => English
Wow! I predicted this on ProZ.com three years ago Feb 8, 2014

I even used the same phrase: Garbage In, Garbage Out.
See this post:
http://www.proz.com/forum/machine_translation_mt/186784-the_future_of_google_translate.html

[Edited at 2014-02-08 00:30 GMT]


 
Orrin Cummins
Japan
Japanese => English
As the old saying goes Feb 8, 2014

You get what you pay for, I guess.

 
Claudia Cherici
Italy
ProZ.com member since 2010
English => Italian
well spotted Feb 8, 2014

Well done, Jeff. You spotted the exact problem with the Google Translate system, and using even the exact wording is rather impressive, I must say.

 
Samuel Murray
Netherlands
ProZ.com member since 2006
English => Afrikaans
The comment about watermarking is more interesting than the so-called admission Feb 8, 2014

The original video

The exact words that were spoken, and the question that prompted it, can be heard here:
http://www.livestream.com/niac2014/video?clipId=pla_d5da38fb-0dfb-4dab-8f03-57e6de1ef672
(at minute 51 to minute 53)

There was no admission, however. The man from Google simply "said it" -- he did not "admit to it". I can understand that a news editor might use "admit" in a heading because it is shorter than "acknowledge", but for the news writer to persist in referring to the statement as an "admission" throughout the news report is bad journalism, in my opinion.

The question was not about garbage in general but about a specific type of garbage, namely content that was translated by Google itself and left unedited. The question was about the danger of Google using content that it itself had translated, to improve its machine translation system. The Google man's answer is that they are aware of that danger but don't think that it is a threat at this time. In other words, while we can speculate about a worst case scenario, the engineers at Google Translate are not blind to this issue and do actually keep an eye on it. This does not make me trust Google Translate any less.

On watermarking

The Google man described one experimental method that they used in order to recognise translations produced by Google. They don't use that method any more, but may use it again later. It involves classifying each word in a language as "even" or "odd". When a translation is about to be generated and multiple valid word sequences are available for that text, Google would favour a sequence that produces "all even words" or "all odd words" in a phrase. The human reader won't notice the difference, but Google will be able to spot large chunks of all even-classified or all odd-classified words in web sites that they scrape, and know that the text is therefore more likely a machine translation. Very clever, IMO.
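To make the even/odd idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the talk did not say how words are classified or how candidates are scored, so the `parity` dictionary, the function names and the "longest run" score are all hypothetical, not Google's actual method.

```python
from typing import Dict, List


def parity_runs(words: List[str], parity: Dict[str, int]) -> List[int]:
    """Return the lengths of consecutive runs of words in the same class.

    `parity` maps a lowercase word to 0 ("even") or 1 ("odd"). This mapping
    is hypothetical; words missing from it are skipped and break the run.
    """
    runs: List[int] = []
    prev = None
    for word in words:
        p = parity.get(word.lower())
        if p is None:
            prev = None  # unclassified word: the current run ends here
            continue
        if p == prev:
            runs[-1] += 1
        else:
            runs.append(1)
        prev = p
    return runs


def uniformity_score(words: List[str], parity: Dict[str, int]) -> float:
    """Share of classified words that fall inside the longest same-class run."""
    runs = parity_runs(words, parity)
    total = sum(runs)
    return max(runs) / total if total else 0.0


def pick_watermarked(candidates: List[List[str]], parity: Dict[str, int]) -> List[str]:
    """Among equally valid candidate sequences, favour the most uniform one."""
    return max(candidates, key=lambda c: uniformity_score(c, parity))


if __name__ == "__main__":
    # Toy classification, invented for this example only.
    toy_parity = {"the": 1, "big": 0, "large": 1, "house": 1, "home": 0}
    options = [["the", "big", "house"], ["the", "large", "house"]]
    print(pick_watermarked(options, toy_parity))
```

The same `uniformity_score` could be run over scraped text: a page whose phrases score unusually high would be flagged as more likely machine output, which is all the method claims to offer.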

The fact that Google Translate inserts non-printing control characters into its translations may also be a form of watermarking. If you do a translation in Google, copy/paste it into MS Word and enable display of non-printing characters, you will sometimes see those characters show up as grey blocks. They are not printed or visible under normal circumstances (e.g. on web sites or PDFs or other files translated with Google Translate) but they are there and can be detected. In fact, you can search for them in MS Word: their code is ChrW(8203), i.e. the zero-width space, U+200B.
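Outside MS Word, the same character can be found with a few lines of Python. This is only a sketch: the function names are made up for the example, and finding U+200B in a text does not prove that it came from Google Translate, since other tools insert zero-width spaces too.

```python
from typing import List

# Code point 8203 (hex 200B) is the zero-width space, i.e. ChrW(8203) in Word.
ZERO_WIDTH_SPACE = "\u200b"


def find_zero_width_spaces(text: str) -> List[int]:
    """Return the character offsets at which a zero-width space occurs."""
    return [i for i, ch in enumerate(text) if ch == ZERO_WIDTH_SPACE]


def strip_zero_width_spaces(text: str) -> str:
    """Return the text with every zero-width space removed."""
    return text.replace(ZERO_WIDTH_SPACE, "")


if __name__ == "__main__":
    # Sample string invented for this example.
    sample = "machine\u200btranslated text\u200b"
    print(find_zero_width_spaces(sample))
    print(strip_zero_width_spaces(sample))
```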

With regard to what the Google man said about evaluating the quality of the content, I did notice that about a year or two ago Google Translate changed its output so that it is deliberately poor, from a typesetting point of view. Many translated phrases now start with a lowercase letter even if the source text started with an uppercase letter, or vice versa, and the translated text contains spacing errors next to certain types of punctuation that "good quality" authors would never permit or commit.


[Edited at 2014-02-08 10:30 GMT]


 
LilianNekipelov
USA
Russian => English
All their translations are odd, anyhow, Feb 8, 2014

so why do they even bother? The spacing problem--yes, no surprise. The spacing problem becomes more and more annoying even when you personally (not a machine) are typing. Also, some letters are often skipped or reversed. It is a real pain when you try to type directly on the internet these days.




[Edited at 2014-02-08 11:58 GMT]


 
DLyons
Ireland
Spanish => English
Some sites need to be filtered. Feb 8, 2014

Of course sites such as Alibaba should be ignored (or better, filtered out from Google hits) by translators. But that's a different problem from Google self-training: watermarking may help Google to recognize and eliminate its own translations from its training material.

 
Maxime Bujakov
France
French => English
Machine translation Feb 10, 2014

Funny! I love that saying too: junk in, junk out. (Even shorter and more rhythmic.)

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 
Jeff Whittaker
USA
Spanish => English
Yes, but... Feb 13, 2014

Did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document and no one can be sure that the "translation" that now sounds all good and grammatical, is in fact a translation at all.


Maxime Bujakov wrote:

Funny! I love that saying too: junk in, junk out. (Even shorter and more rhythmic.)

Now, seriously, I recently researched Machine translation by participating in several projects:
1 - I post-edited machine translation to train the machine,
2 - got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT involved, and approved my output in 99% of the word count.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).


 
Maxime Bujakov
France
French => English
20% increase in my productivity due to the M Feb 18, 2014

Jeff Whittaker wrote:

Did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?

I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT, and therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.

What you end up getting is a text where no one has read the majority of the original source document and no one can be sure that the "translation" that now sounds all good and grammatical, is in fact a translation at all.

The verdict: 20% increase in my productivity due to the MT, no loss of quality (1% of corrections would still appear if I worked from scratch).

Jeff, of course, as the translator in charge I read it all; that's why I still spent 80% of my regular typing time.

The editor must have tracked the source as well, and also served as a good check that my writing style had not deteriorated.

MT is also surprisingly good at suggesting very appropriate words in some of the most difficult cases - like when you sit and think for minutes over one single word.

Finally, when it comes to one's personal business operations in an unfamiliar language environment, MT has been revolutionary. I can practically read and write in Lithuanian, the oldest European language, with just a basic idea of the language's structure.


 

