How Does It Work? A Guide to Machine Translation & Related Tools

There are a lot of online tools out there that language learners and instructors might encounter. In this guide, we break down how some of these tools actually work and highlight implications for language learning and teaching. 

Machine Translation 

The earliest machine translation systems (1960s-70s) used an approach called Rule-Based Machine Translation, which relied on explicit grammar rules and comprehensive dictionaries, all of which had to be written into the program by hand. These systems were difficult and expensive to build, could not handle nuance or idioms, and required constant manual updates to account for changes in language.
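
To make the rule-based approach concrete, here is a minimal Python sketch, assuming a hand-written three-word lexicon and a single invented reordering rule; real systems used far larger dictionaries and much richer rule formalisms.

```python
# A toy rule-based translator: every dictionary entry and grammar rule
# is written by hand, which is why these systems were so expensive to
# build and maintain. The lexicon and rule below are invented examples.

lexicon = {"the": "el", "red": "rojo", "car": "coche"}

def translate_rule_based(sentence):
    # Word-for-word dictionary lookup.
    words = [lexicon[w] for w in sentence.lower().split()]
    # Hand-coded reordering rule: English adjective + noun becomes
    # Spanish noun + adjective ("red car" -> "coche rojo").
    if len(words) == 3:  # assumed pattern: determiner, adjective, noun
        words = [words[0], words[2], words[1]]
    return " ".join(words)

print(translate_rule_based("the red car"))  # -> "el coche rojo"
```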

Next came Statistical Machine Translation (SMT) (1980s-90s), which generated translations through pattern matching over pre-translated bilingual text corpora, such as EU documents translated into multiple languages. This approach was also time-consuming and expensive to develop, and it only worked for language pairs with enough bilingual text available. SMT models still had to be trained before they could translate properly, and they could only handle language covered by their text-based references.
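
The statistical idea can be sketched in a few lines of Python: count how often phrase pairs co-occur in an aligned corpus, then pick the most probable translation. The three-sentence "corpus" below is invented, and real SMT systems combined many such probability models.

```python
from collections import Counter

# Invented sentence-aligned corpus of (English, Spanish) phrase pairs.
aligned_pairs = [
    ("red car", "coche rojo"),
    ("red car", "coche rojo"),
    ("red car", "auto rojo"),
]

pair_counts = Counter(aligned_pairs)
source_counts = Counter(en for en, _ in aligned_pairs)

def best_translation(phrase):
    # Probability of each candidate = co-occurrence count / total count.
    candidates = {es: n / source_counts[phrase]
                  for (en, es), n in pair_counts.items() if en == phrase}
    return max(candidates, key=candidates.get)

print(best_translation("red car"))  # -> "coche rojo" (seen 2 of 3 times)
```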

In 2016, Google Translate switched from SMT to Neural Machine Translation (NMT), which is now typically what we mean when we say “machine translation,” as previous models have been phased out. In a nutshell, NMT uses Neural Networks to predict the likelihood of words in a sequence. Neural Networks are a method used in Artificial Intelligence (AI) that is loosely based on neuronal activity in the human brain and allows computers to process data in a similar way.

NMT is a type of machine learning process called deep learning, in which computers process and thus “learn” from data without explicit instructions, by recognizing patterns and making inferences from those patterns. For example, to translate English to Spanish, the model is presented with millions of examples of English-to-Spanish translations until it “learns” to conjugate verbs correctly in the target language. Neural Networks require massive amounts of training data in order to function properly. The training data, however, is subject to biases, such as standard language and gender bias.
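
Here is a minimal Python sketch of what “learning a pattern from examples rather than rules” can look like: given a few invented (infinitive, first-person form) pairs, the ending change emerges from simple counting, with no conjugation rule written anywhere. Real deep learning models learn far subtler patterns from vastly more data.

```python
from collections import Counter

# Invented training examples: (infinitive, first-person present form).
examples = [("hablar", "hablo"), ("cantar", "canto"), ("bailar", "bailo")]

pattern_counts = Counter()
for infinitive, conjugated in examples:
    # Record the suffix change implied by each example pair.
    pattern_counts[(infinitive[-2:], conjugated[-1:])] += 1

pattern, count = pattern_counts.most_common(1)[0]
print(pattern, count)  # -> ('ar', 'o') 3 -- the pattern came from the data
```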

More technically, NMT works by turning words and sentences into a series of numbers (called vectors) and then doing math with these vectors. The math is done by putting the vectors into what is called a transformer model; a transformer encodes and decodes the vectors based on the billions of data points it was trained on. The encoder turns the words into vectors, and the decoder turns them back into words. The vectors can also capture the relationships of words to each other.
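
The following Python sketch shows how vectors can capture word relationships; the three-dimensional vectors and their values are invented for illustration, while real models learn vectors with hundreds of dimensions.

```python
import numpy as np

# Invented 3-dimensional word vectors (real models use hundreds of dimensions).
vectors = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "car":   np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    # Standard measure of how closely two vectors point in the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words end up with similar vectors, so their similarity is higher.
print(cosine_similarity(vectors["king"], vectors["queen"]))  # ~0.99 (related)
print(cosine_similarity(vectors["king"], vectors["car"]))    # ~0.30 (unrelated)
```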

For example, this model reduces the likelihood of using the wrong preposition in a translated sentence, not because it was programmed with the rule, but because the computer does not recognize the wrong preposition as a likely combination or established pattern.
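
A toy Python illustration of this point: the model avoids an unlikely preposition simply because it has seen the right one far more often in its training data. The counts below are invented.

```python
# Invented counts of how often each preposition follows "interested".
bigram_counts = {
    ("interested", "in"): 9800,
    ("interested", "on"): 12,
    ("interested", "at"): 5,
}
total = sum(bigram_counts.values())

for (first, second), n in bigram_counts.items():
    print(f"P({second!r} after {first!r}) = {n / total:.4f}")
# "in" dominates, so the model predicts it without knowing any grammar rule.
```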

If you would like to learn more about vectors and transformer models, see the video and article listed under Sources & Further Reading below.

It is important to note that the quality of the translation output depends on whether a language is a High Resource Language or a Low Resource Language. High Resource Languages are languages with a strong internet presence, abundant resources, and plentiful training data, such as English, German, Chinese, and Spanish. Low Resource Languages are languages that do not have much internet presence or training data, such as Galician, Kurdish, and Somali. These labels do not always correlate with a language's number of speakers or prevalence. It is also worth noting that when translating between two Low Resource Languages, the computer may use English as an intermediary (or “pivot”) language.
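
A minimal Python sketch of pivoting, assuming a hypothetical translate function that stands in for any MT system; note that chaining two translations means errors can compound.

```python
def translate(text, source, target):
    # Hypothetical placeholder for a real machine translation call.
    return f"[{source}->{target}: {text}]"

def pivot_translate(text, source, target, pivot="en"):
    # Low-resource pair: go through English instead of translating directly.
    intermediate = translate(text, source, pivot)   # e.g. Galician -> English
    return translate(intermediate, pivot, target)   # e.g. English -> Somali

print(pivot_translate("Bos días", source="gl", target="so"))
```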

Key Points

  • Machine translation has a long history, with several previous approaches. 
  • The most current approach, neural machine translation, is more accurate and faster than previous versions. 
  • MT tools are prediction tools: they are trained to predict the most probable character, word, or set of words given the context. As such, they do not understand meaning in language, nor do they follow grammatical rules; moreover, they are subject to inaccuracies and bias as a result of their training data.

Generative AI

AI has been a part of language technologies for a while (think autocorrect!) beyond just its applications in machine translation. But in 2022, OpenAI released ChatGPT, the first AI chatbot to reach a broad public audience, introducing another layer into the fold. While these chatbots are not trained or designed as translators, users can prompt ChatGPT and other AI chatbots to translate single words, sentences, and paragraphs. ChatGPT’s technology is similar to Neural Machine Translation in that it relies on Large Language Models (LLMs): models trained on vast amounts of text that answer prompts by predicting the likelihood of a sequence of words given the words that come before it. As with the training data used for neural machine translation tools, the data behind LLMs encodes bias, such as standard language and gender bias.
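
The prediction loop behind such chatbots can be sketched in Python: repeatedly choose a likely next word given the words so far. The probability table below is invented and tiny; real LLMs compute these probabilities over vocabularies of tens of thousands of tokens.

```python
# Invented next-word probability table (real models learn these values).
next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.4, "sat": 0.1},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def generate(start, steps=3):
    words = [start]
    for _ in range(steps):
        options = next_word_probs.get(words[-1])
        if not options:
            break  # no known continuation for the last word
        words.append(max(options, key=options.get))  # pick the likeliest word
    return " ".join(words)

print(generate("the"))  # -> "the cat sat down"
```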

AI chatbots differ from NMT in that NMT systems are pre-trained and fine-tuned specifically for translation tasks, whereas the LLMs behind chatbots, while often capable of translating, are not fine-tuned for that purpose. NMT and LLMs are powered by similar transformer model technology, but they ultimately take different approaches.

Key Points

  • Many generative AI chatbots can complete translation tasks; however, they are not always trained to do so, which might make them less accurate when used as translation tools.
  • In contrast to machine translation tools, generative AI chatbots can produce novel text in response to a prompt.
  • Generative AI chatbots are prediction-based: their results are based on the highest statistical probability. As such, they do not understand meaning in language, nor do they follow grammatical rules; moreover, they are subject to inaccuracies and bias as a result of their training data.

Related Tools

Online Dictionaries

Online Dictionaries are online databases that use a search system to provide translations for words and phrases, either bilingually or monolingually, along with information about each word such as part of speech, declensions, and gender.

  • WordReference is an example of an online dictionary that offers a large number of language pairs and a search feature for looking up words. In addition to definitions, it provides further information about each word, example sentences, and a forum where users can ask questions.
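
The kind of record an online dictionary stores and returns can be sketched in Python as follows; the entry below is an invented example.

```python
# A toy dictionary "database": each entry pairs a headword and language
# pair with grammatical information and an example sentence.
dictionary = {
    ("casa", "es-en"): {
        "translation": "house",
        "part_of_speech": "noun",
        "gender": "feminine",
        "example": "La casa es grande. / The house is big.",
    },
}

def look_up(word, language_pair):
    return dictionary.get((word, language_pair), "No entry found.")

print(look_up("casa", "es-en"))
```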

Parallel Corpora

A Parallel Corpus is a large collection of texts together with their translations into one or more languages, placed alongside each other. In the context of online language tools, an Online Parallel Corpus is similar to an Online Dictionary in many of its features, but it includes side-by-side bilingual examples drawn from the corpus, aligned at the sentence level.

  • Linguee: Owned by DeepL, Linguee is an Online Parallel Corpus that also serves as an Online Dictionary. Bilingual definitions are given alongside side-by-side, naturally occurring examples of a given word or phrase in translation, taken from bilingual texts. The examples are gathered by bots that search the internet for bilingual texts, which are then aligned sentence by sentence and evaluated for accuracy by AI and by users. In addition to its online dictionary, Linguee also offers a machine translation feature.
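
Here is a minimal Python sketch of how a sentence-aligned parallel corpus can be stored and searched: a query returns every pair whose source sentence contains the word. The sentence pairs are invented.

```python
# A toy parallel corpus: each entry is an aligned (English, Spanish) pair.
parallel_corpus = [
    ("The agreement enters into force today.",
     "El acuerdo entra en vigor hoy."),
    ("The committee approved the agreement.",
     "El comité aprobó el acuerdo."),
]

def find_examples(word):
    # Return side-by-side examples whose English sentence contains the word.
    return [(en, es) for en, es in parallel_corpus
            if word.lower() in en.lower()]

for en, es in find_examples("agreement"):
    print(f"{en}  |  {es}")
```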

Online Verb Conjugators

Similar to Online Dictionaries, Online Verb Conjugators are online databases that use a search system to show all the possible conjugations of a verb, typically listed in table format and organized by tense, mood, and other grammatical features. They can be either bilingual or monolingual.
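
A toy Python conjugator for regular Spanish -ar verbs in the present indicative shows how such a table can be generated from a stem plus endings; real conjugators also cover other tenses, moods, and irregular verbs.

```python
# Present indicative endings for regular Spanish -ar verbs.
ENDINGS = {"yo": "o", "tú": "as", "él/ella": "a",
           "nosotros": "amos", "vosotros": "áis", "ellos/ellas": "an"}

def conjugate_ar(infinitive):
    assert infinitive.endswith("ar"), "regular -ar verbs only"
    stem = infinitive[:-2]  # drop the "-ar" ending to get the stem
    return {person: stem + ending for person, ending in ENDINGS.items()}

for person, form in conjugate_ar("hablar").items():
    print(f"{person}: {form}")
# -> yo: hablo, tú: hablas, él/ella: habla, nosotros: hablamos, ...
```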

Glossary

Neural Machine Translation (NMT): An approach to machine translation that uses neural networks to predict the statistical likelihood of a sequence of words.

Artificial Intelligence (AI): Technology that allows computers to simulate human intelligence by performing tasks that would typically require it.

Neural Networks: A machine learning model, inspired by the neurons in a human brain, that uses a system of interconnected nodes.

Large Language Models (LLMs): A deep learning model used in AI systems that is capable of understanding and generating human language by processing vast amounts of text. 

High Resource Language: A language with a strong internet presence and a large amount of resources.

Low Resource Language: A language with a smaller internet presence and fewer resources.

Chatbot: Software that is programmed to mimic human conversation.

Machine Learning: A subset of AI that involves the development of algorithms that can "learn" to perform tasks from data, either with explicit labels (supervised learning) or without (unsupervised learning).

Deep Learning: A subset of machine learning based on neural networks with many layers, capable of finding patterns in large amounts of labeled or unlabeled data.

Transformer Model: A neural network architecture that turns an input sequence into an output sequence by learning context and meaning through the relationships between the sequence's components, in this case, the words in a sentence.

Vectors: In machine learning, a vector is a list of numbers that represents meaning.

Sources & Further Reading

IBM. (2023). What are neural networks? https://www.ibm.com/topics/neural-networks

IBM. (2023). What are large language models?  https://www.ibm.com/topics/large-language-models 

IBM. (n.d.). What is a transformer model? https://www.ibm.com/topics/transformer-model

IBM. (2023). What is machine learning? https://www.ibm.com/topics/machine-learning 

IBM. (2024, June 17). What is deep learning? https://www.ibm.com/topics/deep-learning

Lee, T. B., & Trott, S. (2023, July 31). A jargon-free explanation of how AI large language models work. Ars Technica. https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how...

Mirela. (2024, June 27). NMT vs LLM: Discussing the differences. POEditor. https://poeditor.com/blog/nmt-vs-llm/

Mirela. (2024, January 1). Low-resource languages: A localization challenge. POEditor. https://poeditor.com/blog/low-resource-languages/

Mirela. (2024, January 4). The role of high-resource languages in NLP and localization. POEditor. https://poeditor.com/blog/high-resource-languages/

The Wall Street Journal. (2024, May 25). How Google Translate uses math to understand 134 languages [Video]. https://www.wsj.com/video/series/tech-behind/how-google-translate-uses-m...