Death by Machine Translation

23/09/2022

Photo: suanmoo/Unsplash

Misplaced trust in machine translation tools’ abilities is already leading to their misuse by authorities in high-stakes situations, according to experts.
In Israel, a man captioned a photo of himself leaning on a bulldozer with the Arabic “يصبحهم,” or “good morning,” but a social media platform rendered it as “hurt them”.
“Your child is having a seizure” could come up in your mother tongue as “your child is dead”, according to a 2014 study about the limited usefulness of machine translation in clinical settings.
Machine translation tools like Google Translate tackle some of the most challenging tasks in data processing. Training a big model can produce as much CO2 as a trans-Atlantic flight.
One way to take advantage of increasingly sophisticated technology while guarding against errors is something called machine translation followed by post-editing.

Imagine you are in a foreign country where you don’t speak the language and your small child unexpectedly starts to have a fever seizure. You take them to the hospital, and the doctors use an online translator to let you know that your kid is going to be okay. But “your child is having a seizure” accidentally comes up in your mother tongue as “your child is dead”.

This specific example is a very real possibility, according to a 2014 study published in BMJ about the limited usefulness of machine-learning-powered translation in communications between patients and doctors. (Because it’s a British publication, the actual hypothetical quote was “your child is fitting.” Sometimes we need American-British translation, too.)

Machine translation tools like Google Translate can be super handy, and Big Tech often promotes them as accurate and accessible tools that’ll break down many interlinguistic barriers in the modern world.

But the truth is that things can go awfully wrong. Misplaced trust in these machine translation tools’ abilities is already leading to their misuse by authorities in high-stakes situations, according to experts – ordering a coffee in a foreign country or translating lyrics can only do so much harm, but think about emergency situations involving firefighters, police, border patrol or immigration. And without proper regulation and clear guidelines, it could get worse.

Machine translation systems such as Google Translate, Microsoft Translator and those embedded in platforms like Skype and Twitter tackle some of the most challenging tasks in data processing. Training a big model can produce as much carbon dioxide as a trans-Atlantic flight.

For the training, an algorithm or a combination of algorithms is fed a specific dataset of translations. The algorithms save words and their relative positions as probabilities that they may occur together, creating a statistical estimate as to what other translations of similar sentences might be. The algorithmic system, therefore, doesn’t interpret the meaning, context, and intention of words, like a human translator would. It takes an educated guess – one that isn’t necessarily accurate.
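To make that "educated guess" concrete, here is a minimal sketch of the idea in Python. It is a toy, not how Google Translate or any real product works: modern systems use neural networks trained on millions of sentence pairs, and the word pairs and function names below are invented for illustration. It only shows how counting which words appear together turns into probabilities, and how the most probable option wins regardless of context.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (English word, Spanish word) pairs that
# were seen aligned in example translations. "fit" mostly appeared in the
# clothing/size sense ("ajuste") and only once in the medical sense
# ("convulsión"). These counts are invented for illustration.
ALIGNED_PAIRS = [
    ("fit", "ajuste"), ("fit", "ajuste"), ("fit", "ajuste"),
    ("fit", "convulsión"),
    ("child", "niño"), ("child", "niño"),
]


def train(pairs):
    """Estimate P(target word | source word) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    model = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        model[source] = {t: n / total for t, n in targets.items()}
    return model


def translate_word(model, word):
    """Return the most probable target word and its estimated probability."""
    options = model[word]
    best = max(options, key=options.get)
    return best, options[best]


model = train(ALIGNED_PAIRS)
print(translate_word(model, "fit"))  # ('ajuste', 0.75)
```

In this toy, "fit" comes out as "ajuste" three times out of four in the training data, so that is what the model picks, even in a hospital where the medical sense is the one that matters. Nothing in the counts tells it which situation it is in.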

In South Korea, a young man used a Chinese-to-Korean translation app to tell his female co-worker’s Korean husband they should all hang out together again soon. A mistranslation led him to erroneously refer to the woman as a nightlife establishment worker, sparking a violent fistfight between the two men in which the husband was killed, the Korea Herald reported in May.

In Israel, a young man captioned a photo of himself leaning on a bulldozer with the Arabic caption “يصبحهم,” or “good morning,” but the social media platform’s AI translation rendered it as “hurt them” in English or “attack them” in Hebrew. This led to the man, a construction worker, being arrested and questioned by police, according to The Guardian in October 2017.

Something similar happened in Denmark, where, the Copenhagen Post Online reported in September 2012, police erroneously confronted a Kurdish man about financing terrorism because of a mistranslated text message.

In 2017, a cop in Kansas used Google Translate to ask a Spanish-speaking driver for permission to search the car for drugs. But the translation was inaccurate, and the driver did not fully understand what he had agreed to. The case was thrown out of court, according to state legal documents.

These examples are no surprise. Accuracy of translation can vary widely within a single language – according to language complexity factors such as syntax, sentence length or the technical domain – as well as between languages and language pairs, depending on how well the models have been developed and trained.

A 2019 study showed that, in medical settings, hospital discharge instructions translated with Google Translate into Spanish and Chinese have been getting better over the years, with between 81% and 92% overall accuracy. But the study also found that up to 8% of mistranslations actually have potential for significant harm.

A pragmatic assessment of Google Translate for emergency department instructions from 2021 showed that the overall meaning was retained for 82.5% of 400 translations using Spanish, Armenian, Chinese, Tagalog, Korean and Farsi. But while translations into Spanish and Tagalog were accurate more than 90% of the time, there’s a 45% chance that a translation will be wrong when it comes to languages like Armenian. Not all errors in machine translation are of the same severity, but quality evaluations always find some critical accuracy errors, according to a June paper.

The good news is that Big Tech companies are fully aware of this, and their algorithms are constantly improving. Year after year, their BLEU scores – which measure how similar machine-translated text is to a bunch of high-quality human translations – get consistently better.
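For readers curious about what a BLEU-style score actually counts, here is a simplified sketch in Python. It is an assumption-laden illustration, not the official metric: real BLEU is usually computed over a whole test set with word sequences of up to four words, while this version scores a single sentence using only single words and word pairs. The function names are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive words in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Share of candidate n-grams that also appear in a reference.

    Each n-gram is credited at most as often as it occurs in any single
    reference, so repeating a correct word does not inflate the score.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n=2):
    """Simplified sentence-level BLEU (1- and 2-grams by default)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the closest reference.
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    if len(candidate) > ref_len:
        penalty = 1.0
    else:
        penalty = math.exp(1 - ref_len / len(candidate))
    return penalty * geo_mean


reference = "your child is having a seizure".split()
good = "your child is having a seizure".split()
bad = "your child is dead".split()

print(simple_bleu(good, [reference]))
print(round(simple_bleu(bad, [reference]), 2))
```

Under these simplifications, the perfect translation scores 1.0, while "your child is dead" still scores roughly 0.43, because most of its words overlap with the reference. Word overlap alone does not tell you how harmful a mistake is.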

Just recently, Microsoft replaced some of its translation systems with a more efficient class of AI model. Software programs are also updated to include more languages, even those often described as “low-resource languages”, because they are less common or harder to work with. That includes most non-European languages, from widely used ones like Chinese, Japanese and Arabic to small community languages like Sardinian and Pitkern.

For example, Google has been building a practical machine translation system for more than 1,000 languages. Meta has just released the ‘No Language Left Behind’ project, which attempts to deploy high-quality translations directly between 200 languages, including languages like Asturian, Luganda and Urdu, accompanied by data about how much the translations improved overall.

However, the errors that lead to consequential mistakes – like the one the construction worker experienced – tend to be random, subjective and different for each platform and each language. So cataloguing them is only superficially helpful in figuring out how to improve MT, says Félix Do Carmo, a senior lecturer at the Centre for Translation Studies at the University of Surrey.

What we need to talk about instead, he says, is “how are these tools integrated into society?” Most critically, we have to be realistic about what machine translation can and can’t do for people right now. This involves understanding the role machine translation can have in everyday life, when and where it can be used, and how it is perceived by the people using it.

“We have seen discussions about errors in every generation of machine translation. There is always this expectation that it will get better,” says Do Carmo. “We have to find human-scale solutions for human problems.”

And that means understanding the role human translators still need to play. Even as medications have become massively better over the decades, there still is a need for a doctor to prescribe them. Similarly, in many translation use cases, there is no need to totally cut out the human mediator, says Sabine Braun, director of the Centre for Translation Studies at the University of Surrey. One way to take advantage of increasingly sophisticated technology while guarding against errors is something called machine translation followed by post-editing, or MT+PE, in which a human reviews and refines the translation.

One of the oldest examples of a company using MT+PE successfully is detailed in a 2012 study about Autodesk, a software company that provides imaging services for architects and engineers, which used machine translation followed by post-editing to translate its user interface into 12 languages.

Other similar solutions have been reported by a branch of the consulting company Ernst & Young, for example, and by the Swiss bank Migros Bank, which found that post-editing boosted translation productivity by up to 60%, according to Slator.

Already, some machine translation companies have stopped selling their technologies for direct use by clients and now always require some sort of post-editing, Do Carmo says. For example, Unbabel and Kantan are platform plugins that businesses add into their customer support and marketing workflows to reach clients all over the world. When they detect poor quality in the translated texts, the texts are automatically routed to professional editors. Although these systems aren’t perfect, learning from them could be a start.
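The routing step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Unbabel’s or Kantan’s actual API: the `Translation` record, the `quality` score between 0 and 1, the 0.8 threshold and the `high_stakes` flag are all assumptions made for the example. How such a quality score is produced is the hard part, and is not shown here.

```python
from dataclasses import dataclass

# Assumed cut-off for this sketch; a real deployment would tune it per
# language pair and use case.
QUALITY_THRESHOLD = 0.8


@dataclass
class Translation:
    source: str     # original text
    output: str     # machine-translated text
    quality: float  # hypothetical automatic quality estimate in [0, 1]


def route(translation, threshold=QUALITY_THRESHOLD, high_stakes=False):
    """Decide whether a machine translation needs a human post-editor.

    High-stakes content always goes to a human, whatever the score says;
    everything else is sent on only if the estimate clears the threshold.
    """
    if high_stakes or translation.quality < threshold:
        return "human_post_edit"
    return "publish"


menu = Translation("Coffee, please", "Un café, por favor", quality=0.95)
discharge = Translation("Your child is having a seizure", "...", quality=0.95)
shaky = Translation("Terms and conditions apply", "...", quality=0.55)

print(route(menu))                         # publish
print(route(discharge, high_stakes=True))  # human_post_edit
print(route(shaky))                        # human_post_edit
```

The design choice worth noticing is the `high_stakes` override: even a translation the system is confident about gets a human check when the consequences of an error are serious.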

Ultimately, Braun and Do Carmo think that it’s necessary to develop holistic frameworks that go far beyond the metrics used at the moment to assess or evaluate quality of translation, like BLEU. They would like to see the field working on an evaluation system which encompasses the “why” behind the use of translation, too.

One approach might be an independent, international regulatory body to oversee the use and development of machine translation in the real world – with plenty of social scientists on board. Already, there are many standards in the translation industry as well as technological standardisation bodies, like the World Wide Web Consortium (W3C) – so experts believe it can be done, as long as there is some more organisation in the industry.

Governments and private companies alike also need clear policies about exactly when officials should and should not use machine translation tools, either free consumer ones or others.

Neil Coulson is the founder of CommSOFT, a communication and language software technology company trying to make machine translation safer. “Police forces, border-control agencies, and many other official organisations aren’t being told that machine translation isn’t real translation and so they give these consumer gadgets a go,” he says.

In March 2020, his organisation sent out a Freedom of Information request to 68 different large UK public-sector organisations asking for their policies on the use of consumer gadget translation technologies. The result: none of these organisations had any policy for their use of machine translation, and they didn’t monitor any ad hoc use of machine translation by the organisation or its staff.

This can lead to an unregulated, free-for-all landscape in which anyone can publish a translation app and claim that it works, says Coulson. “It’s a ‘let a thousand flowers bloom’ approach … but eventually someone eats a flower that turns out to be poisonous and dies.”

Education about the pros and cons of machine translation, of course, is paramount – among researchers, companies and organisations that want to actually start using the tool, but most importantly, among everyday users.

That’s why Lynne Bowker, a professor of translation and interpretation at the University of Ottawa, started the ‘Machine Translation Literacy’ project. Its goal is to spread awareness of how machine translation systems process information and teach researchers and scholars how to actually use them more effectively.

Including information about machine translation as part of the broader digital literacy and information literacy training given to school kids would also be welcome.

“Being machine translation literate means understanding the essentials of how this technology works in order to be able to evaluate its strengths and weaknesses for a particular task or use,” says Bowker. Language, in a social context, is communication. “One of the real challenges we are facing is how to reach the wider public with this message.”

Being able to differentiate between low-stakes tasks and high-stakes tasks remains one of the key points, Bowker says. Thankfully, in the meantime, most mistranslations still just lead to a laugh: according to a 2016 study in the International Journal of Communication, there’s a Chinese restaurant called Translate Server Error. The machine translation system mistranslated the original language, but the restaurant owners didn’t know English well enough to realise something was off.

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.