Machine Translation: State of the Art

The concept of machine translation was first proposed by Warren Weaver in 1947, just one year after the development of the first electronic computer. In 1954, Georgetown University, in collaboration with IBM, completed the first experiment in Russian-English machine translation using the IBM 701 computer: the dream of machine translation had become reality. Research continued and several machine translation companies were launched, including Trados in 1984, the first to develop translation memory technology. Meanwhile, with the rise of the Internet, the need for international communication grew at an unprecedented rate. Two main paradigms emerged in the industry: rule-based machine translation (RBMT) and, dominant until recently, statistical phrase-based machine translation (SMT). Recent years have seen significant advances, particularly through Google's research into neural machine translation (NMT), which teaches software to translate using machine learning based on artificial neural networks.

In 2020, neural machine translation could translate texts instantly with 60-90% accuracy, which means that considerable effort is still required at the editing and quality-control level before such systems could pass the classic Turing test. Software cannot yet handle the nuances of human speech, such as words with multiple meanings or sentences with multiple grammatical structures, and it lacks the tools to process the context in which a language is used. This is because emotions, nonverbal communication, and culture all shape language. The programs have also shown only modest effectiveness when used to translate health, legal, and business documents, and in other cases they have exhibited gender bias.

Machine translation models such as NMT also need large amounts of parallel data to learn to translate. Most of the world's languages lack sufficient data, which is why they are referred to as resource-poor languages. According to Internet World Stats, users of the world's top ten languages (English, Chinese, Spanish, Arabic, Portuguese, Indonesian/Malay, French, Japanese, Russian, and German) account for about 77% of all Internet users. Of these, English and Chinese account for 25.9% and 19.4%, respectively, while users of all other languages together account for only 23.1%. For resource-rich language pairs such as Chinese-English, it is possible to collect billions of sentence pairs to train an NMT model; for resource-poor pairs such as Chinese-Hindi or Chinese-Kiswahili, this is much more difficult.
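To make the notion of parallel data concrete, the sketch below shows the kind of aligned sentence-pair corpus an NMT model learns from and how a pretrained neural model is invoked in practice. It is a minimal illustration, assuming the open-source Hugging Face transformers library and the publicly available Helsinki-NLP/opus-mt-en-fr checkpoint; the sentence pairs themselves are hypothetical examples.

```python
# Minimal sketch: what "parallel data" looks like, and how a pretrained
# NMT model is used for inference. Assumes the Hugging Face `transformers`
# library is installed and the Helsinki-NLP/opus-mt-en-fr checkpoint can
# be downloaded; the sentence pairs below are illustrative only.
from transformers import pipeline

# A parallel corpus is simply a collection of aligned source/target pairs.
# Resource-rich pairs (e.g. English-French) have many millions of these;
# resource-poor pairs may have only a few thousand.
parallel_pairs = [
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
]

# Inference with a pretrained English-to-French NMT model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
for source, reference in parallel_pairs:
    hypothesis = translator(source)[0]["translation_text"]
    print(f"source:    {source}")
    print(f"machine:   {hypothesis}")
    print(f"reference: {reference}\n")
```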
Researchers acknowledge the advances but also the limitations of modern MT systems and question which paradigm will prevail in the future. Opinions range from those who believe that technology will take over completely, making the human translator unnecessary, to those who believe that translations will always require the intervention of a professional. Increasingly sophisticated neural engines that can adapt to content and metadata will usher in an era of responsive machine translation; this advance will require fundamental changes both to the underlying MT technology and to the way developers interact with it.

At the moment, the human-in-the-loop model, which combines human expertise with digital technologies, remains the most advantageous. The reason is simple: high-quality results require an expert to train a machine over time so that it learns the writing style, tone, nuance, and jargon of a specific industry. It stands to reason, then, that translation technologies will not replace human translators and interpreters. As increasingly powerful machines reduce translators' workloads, translators will be able to focus more on proofreading and post-editing, concentrating on the creative touch that only people can provide. There is only one profitable and sustainable way to grow along this path, and that is through the symbiosis of human and machine: the former instructs and corrects the latter, which then handles the routine sentences it has already learned to translate, leaving the professional with the fine work that is possible only for those who can grasp the nuances of each language and the context in which certain terms are used.
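As an illustration of this symbiosis, the sketch below models a simple human-in-the-loop post-editing cycle: the machine proposes a draft translation, a professional corrects it, and the corrected pair is stored so that it can later feed back into the engine (for example, as translation-memory entries or fine-tuning data). All function and variable names here are hypothetical; only the workflow is drawn from the text.

```python
# Hypothetical sketch of a human-in-the-loop post-editing workflow.
# `machine_translate` stands in for any MT engine (e.g. the pipeline above);
# `ask_reviewer` stands in for a professional translator's review step.
from typing import Callable, List, Tuple

def post_edit_loop(
    source_segments: List[str],
    machine_translate: Callable[[str], str],
    ask_reviewer: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Return corrected (source, target) pairs produced by machine + human."""
    corrected_pairs: List[Tuple[str, str]] = []
    for source in source_segments:
        draft = machine_translate(source)        # machine proposes a draft
        final = ask_reviewer(source, draft)      # human post-edits the draft
        corrected_pairs.append((source, final))  # store for future training
    return corrected_pairs

# Example usage with stand-in functions (no real MT engine or reviewer here).
if __name__ == "__main__":
    pairs = post_edit_loop(
        ["The contract is legally binding."],
        machine_translate=lambda s: f"[machine draft of: {s}]",
        ask_reviewer=lambda s, d: d.replace("[machine draft of: ", "").rstrip("]"),
    )
    print(pairs)  # corrected pairs can seed a translation memory or fine-tuning set
```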