Online translation tools, thoughts and challenges
June 1, 2012
Posted by Mike Gulliver in Technology.
Tags: Banquets, grammar, machine translation, semantics, translate
I’m aware that posting up the French Banquet transcriptions isn’t enormously helpful without the English – unless, of course, you can read French (in which case, let me know and I’ll post you photocopies and you can help me!).
So what do you do with them?
Well, one thing you can do is to get there before I do, and translate them yourself, by using one of the dozens of online translation tools that are available on the Web.
This, however, is not as straightforward as it sounds – in fact, it’s rather like walking into a minefield without a map – particularly if you want to do any more critical reading of the text.
Here, for example, is the opening phrase of the Banquet introduction with a couple of human translations:
Original – Quel pays n’a pas eu sa caste persécutrice et sa caste persécutée ?
Human translation (close) – What country has not had its persecuting caste and its persecuted caste?
Human translation (loose) – Can anyone show me a country that doesn’t have a history in which one group of people has persecuted another?
And… some machine translations.
Google Translate – Which country has not had his caste and his caste persecuted persecutor?
Babelfish (Microsoft) – What country did not persecuting caste and caste persecution?
http://www.freetranslation.com/ – Which country did not have his class persecutor and his persecuted class?
http://translation.babylon.com/french/to-english/ – What country has not had its persecutory approach toward caste and its caste persecuted?
http://translation2.paralink.com/ – What country did not have its clique persecutor and its persecuted clique?
http://www.worldlingo.com/en/products_services/worldlingo_translator.html – Which country didn’t have its persecuting caste and her persecuted caste?
http://www.systranet.com/translate – Which country didn’t have its persecuting caste and her persecuted caste?
As you can see, although some of them are close, none of them get it completely right – and none are even close to the ‘looser’ rendering.
From this little exercise, three things are particularly interesting for me:
The first is the simple variety of the translations:
There are some real clangers:
- Where Babelfish got the noun “persecution” from, when the original is the adjective “persécutée”, I have no idea.
- Similarly, the Babylon translation switches the problem from the persecution of one caste by another, to the persecution of the caste system itself.
But there are some nice touches:
- Google has managed to sort out the perfect tense (the French passé composé), rendering “n’a pas eu” as “has not had”.
- Google has also tried to manage the repetition of “caste”. They’ve got it wrong and moved “persecuted” and “persecutor” together… but it’s a nice try.
- The last two (really one, since their output is identical) get the “persecuting caste and persecuted caste” section right, apart from a stray “her”, which then means that the rest of the phrase also largely makes sense.
All of this variation comes from one simple phrase that has no embedded clauses or anything else that’s too complicated. Rather sadly, that suggests that there’s a tremendous amount of investment going into generating lots of versions of a system that still doesn’t work terribly well.
The second thing that I notice is how fragile most of the translations are when mapping between the semantic and grammatical levels. This rather suggests that, instead of looking for the relationship between grammar and meaning, the tools are still focusing on parsing grammar and vocabulary, and then attempting to source statistical equivalents. This is a difference that I’ll try to unpack in a future post, but it does back up what a friend of mine said recently: that “machine translation has been about for 50 years or so, but all people seem to be doing is throwing more and more computational power at it, rather than think about how it actually works.”
However, the final thing, and this is altogether more positive, is that, despite the problems, if you use not one or two but a number of different translations, you can probably compile a translation that is meaningful enough to work with. At least, until a human translation becomes available.
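For anyone who wants to try that comparison a little more systematically, here is a minimal Python sketch. It takes the machine translations quoted above and ranks each one by how much it agrees, word for word, with the others. The function names (tokens, similarity, rank_by_agreement) are my own inventions for illustration, and the comparison uses SequenceMatcher from Python's standard difflib module. Bear in mind that agreement is not the same as correctness: a version that scores well is simply the one the other tools disagree with least, so it is a starting point for compiling your own translation rather than an answer.

```python
from difflib import SequenceMatcher

# Machine translations quoted earlier in this post. The identical
# WorldLingo and SYSTRAN output is listed once.
candidates = {
    "Google Translate": "Which country has not had his caste and his caste persecuted persecutor?",
    "Babelfish": "What country did not persecuting caste and caste persecution?",
    "FreeTranslation": "Which country did not have his class persecutor and his persecuted class?",
    "Babylon": "What country has not had its persecutory approach toward caste and its caste persecuted?",
    "Paralink": "What country did not have its clique persecutor and its persecuted clique?",
    "WorldLingo/SYSTRAN": "Which country didn't have its persecuting caste and her persecuted caste?",
}


def tokens(sentence):
    """Lowercase the sentence, drop question marks, split on whitespace."""
    return sentence.lower().replace("?", "").split()


def similarity(a, b):
    """Word-level similarity ratio between two sentences (0.0 to 1.0)."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def rank_by_agreement(translations):
    """Rank translations by mean similarity to all the others.

    Assumes at least two translations are supplied.
    """
    scores = {}
    for name, text in translations.items():
        others = [t for n, t in translations.items() if n != name]
        scores[name] = sum(similarity(text, o) for o in others) / len(others)
    # Highest mean agreement first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank_by_agreement(candidates):
    print(f"{score:.2f}  {name}")
```

The same approach should work on longer passages if you split them into sentences first and compare one sentence at a time.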
I’ve provided the links above so that you can explore the French Banquet text yourself – I’d be interested to hear how you get on.