The Speculist: Transference Is the Challenge

Here's the opening from Stephen's recent entry on a bad shopping experience with Target:

As the amount of data and intelligence available to merchants increases, so should our expectations as customers. Some stores seem to get this, others don't.

Now here's the "same" phrase translated from English to Japanese, then back to English, to Chinese, back to English, to French, back to English, to German, back to English, to Italian, back to English, to Portuguese, back to English, to Spanish, and then finally once more back to English:

Whereas they probably use it, our switches of the emergency of the hope of the commerce of the commerce therefore magnify the data of the client and the intelligent amount. Together if the memory, that one he to take with this, other subjects if memorizzato like.
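The path described above is a series of round trips through English, and each round trip's output feeds the next one. Here is a minimal Python sketch of that chaining. It is not Carl Tashian's actual program. `round_trip_path`, `run_chain`, and the `translate(text, src, dst)` callback are hypothetical names, and the callback stands in for whatever machine translation service is used.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (source language, target language)

def round_trip_path(pivot: str, languages: List[str]) -> List[Step]:
    """For each language, go pivot -> language and then language -> pivot."""
    steps: List[Step] = []
    for lang in languages:
        steps.append((pivot, lang))
        steps.append((lang, pivot))
    return steps

def run_chain(text: str, steps: List[Step],
              translate: Callable[[str, str, str], str]) -> str:
    """Feed the text through every step; each output becomes the next input."""
    for src, dst in steps:
        text = translate(text, src, dst)  # hypothetical translation callback
    return text

LANGS = ["Japanese", "Chinese", "French", "German",
         "Italian", "Portuguese", "Spanish"]
path = round_trip_path("English", LANGS)
print(len(path))  # 14
```

With a do-nothing translator (`lambda t, src, dst: t`), the text comes back unchanged. A real translator gets 14 chances for its errors to pile up.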

One Carl Tashian wrote a program to abuse phrases by passing them iteratively through computer translation programs. Of course, even if translation programs were 99.9% accurate and reliable, not just for vocabulary but for idiomatic phrases, using them this way would still be an abuse. If you make a photocopy of a photograph, then photocopy the copy, then copy that copy, and so on, you will see a lot of degradation of the picture even with a really good copier.
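The photocopy analogy can be put into numbers if we accept a deliberately simple assumption, which is not the post's own: each pass independently keeps a given detail intact with probability p, so the chance that the detail survives n passes is p^n. The 99.9% figure is the post's hypothetical and was not measured. Below, 14 is the number of translations in the post's experiment, and 1000 stands in for "on and on."

```python
def survival(p: float, n: int) -> float:
    """Chance a detail survives n passes if each pass independently keeps it with probability p."""
    return p ** n

for n in (14, 100, 1000):
    print(n, round(survival(0.999, n), 3))
# 14 0.986
# 100 0.905
# 1000 0.368
```

On this toy model, a 99.9% copier keeps almost everything over 14 passes and loses most details only after hundreds of passes. The mangled output above suggests the translators in this experiment were well below that figure. Their errors are also unlikely to be independent.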

Still, the results are interesting, to say the least. Even tiny, simple phrases such as "I love you" get pretty mangled. In fact, the odd title of this entry is really just the intended title, "Communication Is Challenging," sent through the wringer. At least that one is in the ballpark, I guess.

Important to note: this represents the state of the art as of 2003. Maybe the technology has improved since then?

Comments

Is it funny that technology is mangling those sentences, or that translation itself is mangling them? I doubt the results would be vastly better if you passed that paragraph through as many human interpreters, and I suspect the meaning would change significantly after just a few retellings in English. It might come down to the (unfortunate) fact that human communication is a lossy process.

Posted by: MikeD | May 25, 2007 09:04 AM

Hmmm, sometime I'll need to do some asymptotic analysis on this stuff. I think perfect reproduction of the original message is impossible, though it should get good enough to preserve meaning (for phrases that don't depend on context). But another problem is what happens to the message over many translations. In the future, I can see the possibility of messages that have legitimately passed through translators several times, drifting in meaning with each step and slowly growing larger. While drift in meaning would be very hard to measure, growth of the message over time is easy to measure.

Let's say there are N languages to choose from. Then one can construct arbitrarily long loops and measure message growth in each loop. A reasonable hypothesis is that the size efficiency of each translator is independent of the other translators (i.e., that the efficiency of translating from English through Spanish to Polish and back to English is completely described by the efficiencies of the translators taken in isolation).
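The independence hypothesis has a simple arithmetic form. Suppose each translator has its own size ratio (output length divided by input length). A loop's predicted growth is then the product of the ratios along it. Dividing the measured growth by that prediction shows how far a loop departs from independence. What follows is a minimal Python sketch (3.8+ for `math.prod`). The function names are hypothetical, and the ratios are made-up numbers, not measurements.

```python
import math
from typing import Dict, List, Tuple

Ratios = Dict[Tuple[str, str], float]  # (from, to) -> output length / input length

def predicted_growth(loop: List[str], ratios: Ratios) -> float:
    """Growth predicted by independence: the product of each step's size ratio."""
    return math.prod(ratios[(a, b)] for a, b in zip(loop, loop[1:]))

def deviation(measured: float, loop: List[str], ratios: Ratios) -> float:
    """Measured / predicted growth; values far from 1.0 suggest translators interact."""
    return measured / predicted_growth(loop, ratios)

ratios = {("en", "es"): 1.2, ("es", "pl"): 0.9, ("pl", "en"): 1.1}  # made-up numbers
loop = ["en", "es", "pl", "en"]
print(round(predicted_growth(loop, ratios), 3))  # 1.188
```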

Certainly, I think going through loops of translators is the way to measure efficiency.

Posted by: Karl Hallowell | May 25, 2007 09:20 AM

Do you have a metric for the efficiency of the language itself? Dutch has single words that (while exceedingly long in characters) carry as much context or meaning as an entire sentence in English. Would a translator that converts a 300-character English sentence into a 75-character Dutch word count as "efficient"? Would there be a penalty for converting those 75 characters back into 300, even if 300 characters is the absolute minimum required to express the original idea? Or are you only measuring that the Hamiltonian circuit started with 300 characters and ends with 300 characters despite having passed through an arbitrary number of language nodes?

I believe my agent/interface should convert written ideas into a neutral format (like XML*) that includes contextual clues to the meaning, the ones I assume and am not interested in spelling out. Of course, this makes the neutral medium orders of magnitude larger than the text I write. The receiving agent (on your end) converts the neutral content back into the language you speak, including the contextual assumptions you have. Our agents are trained on our communication, so they grow increasingly optimized for their task. Maybe they could even engage in supplemental exchanges to establish proper knowledge bases between sender and receiver. This would be analogous to the "mental model" we build through interaction with friends or coworkers with whom we are in frequent contact. Obviously that takes too long to do with "strangers" on the internet, but that responsibility can (and should) be offloaded to digital agents (which may be the best use of near-human-level AI I can think of).

*XML may not be the ideal solution for any actual task, but the concept of a well supported neutral encoding schema is a good start.
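The neutral encoding idea can be sketched as a hub. Each language needs only a mapping to and from shared concept IDs, rather than a separate translator for every language pair. This toy Python version works word for word and leaves out both grammar and the contextual clues described above. The lexicons, concept IDs, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicons: word -> language-neutral concept ID.
LEXICONS: Dict[str, Dict[str, str]] = {
    "en": {"the": "DEF", "cat": "CAT", "sleeps": "SLEEP"},
    "es": {"el": "DEF", "gato": "CAT", "duerme": "SLEEP"},
}

def encode(words: List[str], lang: str) -> List[str]:
    """Sender's agent: words in the sender's language -> concept IDs."""
    return [LEXICONS[lang][w] for w in words]

def decode(concepts: List[str], lang: str) -> List[str]:
    """Receiver's agent: concept IDs -> words in the receiver's language."""
    reverse = {concept: word for word, concept in LEXICONS[lang].items()}
    return [reverse[c] for c in concepts]

def direct_translators(n: int) -> int:
    return n * (n - 1)  # one translator per ordered pair of languages

def hub_converters(n: int) -> int:
    return 2 * n        # one encoder and one decoder per language

print(decode(encode(["the", "cat", "sleeps"], "en"), "es"))  # ['el', 'gato', 'duerme']
print(direct_translators(8), hub_converters(8))              # 56 16
```

The savings grow with the number of languages. The hard part is the context this toy leaves out.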

Posted by: MikeD | May 25, 2007 11:10 AM

Hamiltonian circuits come from graph theory. They are paths that visit every vertex of the graph exactly once and return to the starting vertex. In this case, Hamiltonian circuits would be translation paths that pass through every translatable language once and end in the language the translation started with. I don't intend to be comprehensive, though there might be something to learn from an elaborate loop of translation.
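If every language can be translated directly into every other, the translation graph is complete, and each ordering of the non-starting languages gives one Hamiltonian circuit. Here is a short Python sketch. The language codes and the function name are only illustrative. Circuits are treated as directed, because English to Spanish and Spanish to English are different translators.

```python
from itertools import permutations
from math import factorial
from typing import Iterator, List

def translation_circuits(start: str, others: List[str]) -> Iterator[List[str]]:
    """Yield every directed circuit that visits each language once and returns to start."""
    for order in permutations(others):
        yield [start, *order, start]

others = ["es", "ru", "zh"]
circuits = list(translation_circuits("en", others))
print(len(circuits), factorial(len(others)))  # 6 6
print(circuits[0])  # ['en', 'es', 'ru', 'zh', 'en']
```

The count grows as (N-1)!. With English plus the seven languages from the post, that is already 5040 circuits, so in practice one would sample loops rather than try them all.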

Here, my thinking is that certain translation loops may have hidden inefficiencies. Spanish to Russian and Russian to Chinese may be decently efficient (let's ignore for the moment what that would mean), but the combined step of Spanish to Russian to Chinese may have additional inefficiency due to how the translators interact or couple.

I don't know how to measure the efficiency of individual translation steps, but one does get a lot of information from studying these translation loops. If loops involving the Swahili-to-Spanish translator are unusually bad, then you know there's something wrong with that program.
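One crude way to act on this idea is sketched in Python below. First, give each loop a quality score, for example how closely the final text matches the original on a scale from 0 to 1. Then average the scores of all loops that use a given translator. Translators with an unusually low average are the suspects. The scoring scale, the function name, and the numbers are hypothetical. A heuristic like this can also mislead: a good translator that keeps appearing in loops with a bad one will look worse than it is.

```python
from collections import defaultdict
from typing import Dict, Tuple

def edge_averages(loop_scores: Dict[Tuple[str, ...], float]) -> Dict[Tuple[str, str], float]:
    """Average the scores of all loops that pass through each (from, to) translator."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for loop, score in loop_scores.items():
        for edge in zip(loop, loop[1:]):
            totals[edge] += score
            counts[edge] += 1
    return {edge: totals[edge] / counts[edge] for edge in totals}

# Made-up scores: the loop through Swahili -> Spanish does badly.
scores = {
    ("en", "sw", "es", "en"): 0.2,
    ("en", "fr", "es", "en"): 0.8,
    ("en", "sw", "fr", "en"): 0.7,
}
averages = edge_averages(scores)
worst = min(averages, key=averages.get)
print(worst, averages[worst])  # ('sw', 'es') 0.2
```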

P.S. An occasional bit of speculation is whether we could make some more powerful language far superior to the ones currently in use. For example, mathematics is chock-full of obscure and opaque terms (like "Hamiltonian circuit" and "graph theory"). And language restricts us in ways that are hard to perceive (if you don't have a term for something, you might not see it).

It would be interesting if there were a language that in some way had a great deal more functionality (or some other -ality) than the popular languages like English. It would need this in order to sway people enough to adopt it on a large scale. I see no reason to ignore languages used in highly specialized situations, say for air traffic control or soldiers on a battlefield (real or in a massive multiplayer game).

But there's a certain lure in becoming considerably smarter merely because you speak and think in a better language.

Posted by: Karl Hallowell | May 25, 2007 12:18 PM

All your base are belong to us!

Posted by: Stephen Gordon | May 25, 2007 01:15 PM

No two translations are bilaterally symmetrical. I would be curious to see how much less mangled the text would have been had the path been English to Japanese, to Chinese, to French, to German, to Italian, to Portuguese, to Spanish, and then finally back to English. In particular I would expect the two Romance to Romance steps at the end to have relatively little impact.
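The alternative path is easy to compare with the post's route by counting steps. In this Python sketch (the helper names and the set of Romance languages were chosen for illustration), the post's round-trip route takes 14 translations and the direct chain takes 8. Exactly two of the chain's steps go from one Romance language to another.

```python
from typing import List, Tuple

LANGS = ["Japanese", "Chinese", "French", "German",
         "Italian", "Portuguese", "Spanish"]
ROMANCE = {"French", "Italian", "Portuguese", "Spanish"}

def round_trips(pivot: str, languages: List[str]) -> List[Tuple[str, str]]:
    """The post's route: pivot -> language -> pivot for each language."""
    return [step for lang in languages for step in ((pivot, lang), (lang, pivot))]

def direct_chain(pivot: str, languages: List[str]) -> List[Tuple[str, str]]:
    """The proposed route: pivot -> each language in turn -> back to pivot."""
    path = [pivot, *languages, pivot]
    return list(zip(path, path[1:]))

direct = direct_chain("English", LANGS)
print(len(round_trips("English", LANGS)), len(direct))  # 14 8
print([(a, b) for a, b in direct if a in ROMANCE and b in ROMANCE])
# [('Italian', 'Portuguese'), ('Portuguese', 'Spanish')]
```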

Posted by: triticale | May 25, 2007 02:44 PM

Would you be _smarter_ if you could think in a language that was perfectly succinct but unintelligible to anyone but yourself? Isn't that what we do now? The problem comes from trying to use a less-than-ideal communication channel to express ourselves. You called graph theory obscure, yet you knew what it was, and the mental model you constructed from that term was possibly more information-rich than the rest of my last post.

Do you compare the effectiveness of converting A to B then B to C versus A directly to C?

Would you take the time to learn Lojban if it meant you would be one of a few dozen people who could speak with the first emerging AI? What if that language turns out to be the lingua franca of our future? I haven't decided. I'm inclined to maintain my American attitude of entitlement: if the machine wants to speak with ME, it should learn MY language. I know that's ignorant, but that's my story (for now) :)

Posted by: MikeD | May 26, 2007 11:15 AM

I'm sorry MikeD, I'm afraid I can't do that.

Posted by: triticale | May 27, 2007 09:55 PM

I have no "cultural" attachment to or preference for English; it's just that I already know it, more or less. And I feel too busy to learn to think in another language, unless or until the compensation (or potential compensation) is big enough.

Phil's image iteration example makes me think that there ought to be a way to "digitize" the meaning of language. If the idea itself could be distilled (say, "transference is a challenge"), then the machine could render it into any language directly from the idea.

I have some experience in the field, and I know that the standard practice for translating text from one language to another, and then another, and so on, is to reference the original. I.e., English to Japanese to Chinese would actually be English to Japanese plus English (the original) to Chinese. It would not be English to Japanese, back to English from Japanese, and then to Chinese. Nor would it be Japanese straight to Chinese. Only in academic settings where the original is not available will professional translators even attempt that.

It's a positive and redundant feedback technique and it's the only way.
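The difference between chaining and referencing the original can be seen as distance from the source text. In this Python sketch (the plan and function names are hypothetical), the chained plan puts Chinese two translations away from the English original. The reference-the-original plan keeps every target one translation away, so no translator inherits another translator's mistakes.

```python
from typing import Dict, List, Tuple

Plan = List[Tuple[str, str]]  # (translate from, translate into)

def chained_plan(original: str, targets: List[str]) -> Plan:
    """Each translation is made from the previous translation."""
    sources = [original, *targets[:-1]]
    return list(zip(sources, targets))

def from_original_plan(original: str, targets: List[str]) -> Plan:
    """Every translation is made directly from the original text."""
    return [(original, t) for t in targets]

def hops_from_original(plan: Plan, original: str) -> Dict[str, int]:
    """How many translation steps separate each version from the original."""
    hops = {original: 0}
    for src, dst in plan:
        hops[dst] = hops[src] + 1
    return hops

targets = ["Japanese", "Chinese"]
print(hops_from_original(chained_plan("English", targets), "English"))
# {'English': 0, 'Japanese': 1, 'Chinese': 2}
print(hops_from_original(from_original_plan("English", targets), "English"))
# {'English': 0, 'Japanese': 1, 'Chinese': 1}
```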

And the word on the street is that Dragon has learned to speak our language.

Posted by: MDarling | June 1, 2007 10:09 AM
