Google admits 'garbage in, garbage out' translation problem
[Post removed: This post was hidden by a moderator or staff member for the following reason: Empty, duplicate post]

Orrin Cummins | Japan | Local time: 07:58 | Japanese => English + ... | As the old saying goes | Feb 8, 2014
You get what you pay for, I guess.

Claudia Cherici | Italy | Local time: 23:58 | ProZ.com member since 2010 | English => Italian + ...
Well done, Jeff: you spotted the exact problem with the Google translation system, and using even the exact wording is rather impressive, I must say.
Samuel Murray | Netherlands | Local time: 23:58 | ProZ.com member since 2006 | English => Afrikaans + ... | The comment about watermarking is more interesting than the so-called admission | Feb 8, 2014
The original video
The exact words that were spoken, and the question that prompted it, can be heard here: http://www.livestream.com/niac2014/video?clipId=pla_d5da38fb-0dfb-4dab-8f03-57e6de1ef672 (at minute 51 to minute 53)
There was no admission, however. The man from Google simply "said it" -- he did not "admit to it". I can understand that a news editor might use "admit" in a heading because it is shorter than "acknowledge", but a news writer who persists in referring to the statement as an "admission" throughout the news report is practising bad journalism, in my opinion.
The question was not about garbage in general but about a specific type of garbage, namely content that was translated by Google itself and left unedited. The question was about the danger of Google using content that it itself had translated, to improve its machine translation system. The Google man's answer is that they are aware of that danger but don't think that it is a threat at this time. In other words, while we can speculate about a worst case scenario, the engineers at Google Translate are not blind to this issue and do actually keep an eye on it. This does not make me trust Google Translate any less.
On watermarking
The Google man described one experimental method that they used to be able to recognise translations produced by Google itself. They don't use that method any more, but may use it again later. It involves classifying each word in a language as "even" or "odd"; when a translation is about to be generated and multiple valid word sequences are available for that text, Google would favour a sequence that produces all "even" words or all "odd" words in a phrase. The human reader won't notice the difference, but Google will be able to spot large chunks of all even-classified or all odd-classified words in the web sites that they scrape, and will know that such text is therefore more likely a machine translation. Very clever, IMO.
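Purely as an illustration of the parity idea Samuel describes (the talk gives no implementation details, so the hash-based word classification below is my own assumption, not Google's), a detector for suspiciously long runs of same-parity words might look like this:

```python
# Illustration of the even/odd watermarking idea described above -- not
# Google's actual code. Each word gets a stable pseudo-random parity via a
# hash; long runs of same-parity words are then statistically suspicious.
import hashlib

def word_parity(word: str) -> int:
    """Assign a stable 0/1 class to a word by hashing it."""
    return hashlib.sha256(word.lower().encode("utf-8")).digest()[0] & 1

def longest_same_parity_run(text: str) -> int:
    """Length of the longest run of consecutive words sharing one parity."""
    parities = [word_parity(w) for w in text.split()]
    if not parities:
        return 0
    best = run = 1
    for prev, cur in zip(parities, parities[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(longest_same_parity_run(sample))
```

In ordinary human text the parities would be essentially random, so under a scheme like this a long unbroken run of same-parity words would be strong evidence that the generator deliberately favoured one class.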
The fact that Google Translate includes non-printing control characters in its translations may also be a form of watermarking. If you do a translation in Google, copy/paste it into MS Word and enable the display of non-printing characters, you will sometimes see those characters show up as grey blocks. They are not printed or visible under normal circumstances (e.g. on web sites or in PDFs or other files translated with Google Translate), but they are there and can be detected. In fact, you can search for them in MS Word... their code is ChrW(8203), i.e. Unicode U+200B, the zero-width space.
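For anyone who wants to check files outside MS Word: since ChrW(8203) corresponds to Unicode code point U+200B, a few lines of script are enough to count or strip it. This is just a convenience sketch, not anything Google documents:

```python
# ChrW(8203) in Word/VBA terms is Unicode U+200B, the zero-width space.
# This sketch counts and strips it from pasted text.
ZWSP = "\u200b"  # chr(8203)

def count_zero_width_spaces(text: str) -> int:
    """Count zero-width spaces: invisible on screen, but they survive copy/paste."""
    return text.count(ZWSP)

def strip_zero_width_spaces(text: str) -> str:
    """Remove them, e.g. before adding a segment to a translation memory."""
    return text.replace(ZWSP, "")

if __name__ == "__main__":
    pasted = "machine\u200btranslated\u200bsegment"
    print(count_zero_width_spaces(pasted))  # 2
    print(strip_zero_width_spaces(pasted))  # machinetranslatedsegment
```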
With regard to what the Google man said about evaluating the quality of the content, I did notice that about a year or two ago Google Translate changed its output so that it is deliberately poor, from a typesetting point of view. Many translated phrases now start with a lowercase letter even if the source text started with an uppercase letter, or vice versa, and the translated text contains spacing errors next to certain types of punctuation that "good quality" authors would never permit or commit.
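If you wanted to spot those two typesetting quirks automatically, a rough heuristic (my own sketch, not any published detection method) could be as simple as a couple of regular expressions:

```python
# Rough heuristic for the two typesetting quirks mentioned above: sentences
# that start with a lowercase letter, and stray spaces before punctuation.
# My own sketch -- not a published detection method.
import re

LOWERCASE_START = re.compile(r"(?:^|[.!?]\s+)[a-z]")
SPACE_BEFORE_PUNCT = re.compile(r"\s+[,.;:!?]")

def typesetting_flags(text: str) -> dict:
    """Count occurrences of each quirk in a block of text."""
    return {
        "lowercase_sentence_starts": len(LOWERCASE_START.findall(text)),
        "spaces_before_punctuation": len(SPACE_BEFORE_PUNCT.findall(text)),
    }

if __name__ == "__main__":
    print(typesetting_flags("this sentence starts oddly . and this one too , yes ?"))
```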
[Edited at 2014-02-08 10:30 GMT]

All their translations are odd, anyhow, | Feb 8, 2014
so why do they even bother? The spacing problem: yes, no surprise. Spacing becomes more and more annoying even when you, personally (not a machine), are typing. Also, some letters often get skipped or reversed. It is a real pain when you try to type directly on the internet these days.
[Edited at 2014-02-08 11:58 GMT]

DLyons | Ireland | Local time: 22:58 | Spanish => English + ... | Some sites need to be filtered | Feb 8, 2014
Of course sites such as Alibaba should be ignored (or, better, filtered out of Google hits) by translators. But that's a different problem from Google self-training: watermarking may help Google to recognize and eliminate its own translations from its training material.

Maxime Bujakov | France | Local time: 23:58 | French => English + ... | Machine translation | Feb 10, 2014
Funny! I love that saying too: junk in, junk out. (Even shorter and more rhythmic.)
Now, seriously, I recently researched machine translation by participating in several projects: 1) I post-edited machine translation to train the machine; 2) I got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT being involved, and approved my output for 99% of the word count.
The verdict: a 20% increase in my productivity thanks to the MT, with no loss of quality (the 1% of corrections would still have appeared if I had worked from scratch).
Jeff Whittaker
Did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?
I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT; therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" an MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.
What you end up getting is a text where no one has read the majority of the original source document, and no one can be sure that the "translation" that now sounds all good and grammatical is in fact a translation at all.
Maxime Bujakov wrote:
Funny! I love that saying too: junk in, junk out. (Even shorter and more rhythmic.)
Now, seriously, I recently researched machine translation by participating in several projects:
1) I post-edited machine translation to train the machine;
2) I got a machine translation, post-edited it while clocking my time, then another editor (very picky) was given my text without any notion of MT being involved, and approved my output for 99% of the word count.
The verdict: a 20% increase in my productivity thanks to the MT, with no loss of quality (the 1% of corrections would still have appeared if I had worked from scratch).

Maxime Bujakov | France | Local time: 23:58 | French => English + ... | 20% increase in my productivity due to the MT | Feb 18, 2014
Jeff Whittaker wrote:
Did the editor actually read each line of the source text and compare it to the translation? Or did the "editor" just read the translation until they found something that didn't look right?
I find that in most cases, the MT editor does not really read the source text at all, but just accepts everything until something looks strange. While this may work with human translation, assuming that the human translator is a good one and has necessarily read the source text, you cannot make this assumption with MT; therefore, each and every line of the source text must be read and compared with the translation. In other words, even if it could be shown that it takes less time to "post-edit" an MT translation, the editing process should take twice as long as a standard editing job, thus nullifying most of the time savings.
What you end up getting is a text where no one has read the majority of the original source document, and no one can be sure that the "translation" that now sounds all good and grammatical is in fact a translation at all.
Jeff, of course, as the translator in charge I read it all; that's why I still spent 80% of my regular typing time.
The editor must have tracked the source as well, and was also a good reference for confirming that my writing style did not deteriorate.
MT is also surprisingly good at suggesting very appropriate words in some of the most difficult cases - like when you sit and think for minutes over one single word.
Finally, when it comes to someone's personal business operations in an unknown language environment, MT has revolutionized life. I can practically read and write in Lithuanian, the oldest European language, having just a basic idea of the language's structure.