Systran software has ruled computer translation for years. It has been the technology behind both AltaVista’s Babelfish (now owned by Yahoo) and Google’s translation service, Google Translate. But now Google has replaced Systran technology with its own translation software.
Google says its approach was to “feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We’ve achieved very good results in research evaluations.”
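To make that description a little more concrete, here is a minimal sketch of what learning from aligned text can look like. Google's statement doesn't say which statistical techniques it uses, so this is my own stand-in: a bare-bones version of IBM Model 1, a textbook word-alignment method, not Google's system. The three-sentence English-German corpus is invented for illustration, the names `train_ibm_model1` and `best_translation` are my own, and the sketch ignores the monolingual text entirely (a real system would use that to judge how fluent the output is).

```python
from collections import defaultdict

# Toy aligned corpus (English -> German), invented for illustration only.
PAIRS = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target_word | source_word)."""
    # Start every co-occurring word pair at the same value. The E-step
    # normalises, so the exact starting constant does not matter.
    t = {(tw, sw): 1.0 for src, tgt in pairs for sw in src for tw in tgt}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: split each target word's "credit" among the source words
        # in the same sentence, in proportion to the current estimates.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: turn the expected counts back into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(PAIRS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```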
This approach sounds a bit naive on the face of it. Could it work? Let’s try a sample translation on both Babelfish and Google Translate. To keep things fair, I consulted my Yi jing page, which randomly produced hexagram 39, “Stumbling” (hmmm). The lines go like this (those after the asterisk are the commentary portion of the text):
Stumbling forth and strutting back
Porters stumbling under loads
Stumbling and turning about
Turning back to join with friends
Friends appear for welcoming
Stumbling forth and riding back
*
Water over mountain. Hard to get a foothold.
Choose the easier path.
Okay. We’ll translate into French and back into English, then into German and back into English. We’re using two languages that contributed heavily to the development of English rather than languages that are unrelated to it, so this should be a piece of cake, right?
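For anyone who wants to repeat the experiment, the round trip described above can be sketched in a few lines. I did mine by hand through the two web pages, so everything here is an assumption: `translate` is a hypothetical callable that you would have to wire up to whatever service you are testing, and `echo_translate` is only a stand-in that returns its input unchanged so the sketch runs on its own.

```python
from typing import Callable, Iterable

# Hypothetical signature: translate(text, source_language, target_language).
Translator = Callable[[str, str, str], str]


def round_trip(text: str, pivots: Iterable[str], translate: Translator,
               home: str = "en") -> str:
    """Send text out to each pivot language and back home again, in order."""
    current = text
    for pivot in pivots:
        current = translate(current, home, pivot)  # e.g. English -> French
        current = translate(current, pivot, home)  # e.g. French -> English
    return current


def echo_translate(text: str, source: str, target: str) -> str:
    """Stand-in translator: logs the hop and returns the text unchanged."""
    print(f"{source} -> {target}")
    return text


result = round_trip("Choose the easier path.", ["fr", "de"], echo_translate)
print(result)
```

Passing the translator in as an argument means the same loop can be pointed at either service, which keeps the comparison fair.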
Babelfish results first:
Stolpern in front and pavanement the back luggage cart-loads, those under the loads stolpern and turns around revolution again with stolpern, connects to the friends to assemble those the friends for the Stolpern of the admission in front and after looks * finished Montagne of the water. A balance strongly reach. Select the simpler way.
Gibberish, although I do like the way a Chinese flavor is creatively introduced by rendering “hard to get a foothold” as “a balance strongly reach.” Now let’s try Google Translate:
Stumbling block strutting back and forth
Owners stumbling block under strain
Stumbling block and
To return to connect with friends
Friends at the reception
Stumbling block fourth and riding back
The water on the mountain. Hard to get a foot.
Select the way.
Somewhat better: at least all of the words are English, although most of the sense is still wrong (how in the world did “forth” become “fourth”?). Still, while I’m not eager to add to Google’s monopoly on the world’s information, it looks to me like the Google engineers have indeed surpassed Systran. The Google translation is not only a bit more intelligible and closer to the original, it also retains the format of the original. And the web interface was cleaner and easier besides. It’s not the result I was expecting, but I have to say, comparatively good job, Google.
With the caveat, of course, that both results are nearly useless. Bottom line: if you really need something translated correctly, hire a human.