NEF - Le Livre 010101 by Marie Lebert - Multilingualism on the Web
4.1. Translation Services
4.2. Machine Translation
4.3. Computer-Assisted Translation
Maintained by Vorontsoff, Wesseling & Partners in Amsterdam, the Netherlands, Aquarius is a directory of translators and interpreters that lists 6,100 translators, 800 translation companies, 91 specialized areas of expertise and 369 language combinations. This non-commercial project helps users locate and contact the best translators in the world directly, without intermediaries or agencies. The Aquarius database can be searched by location, language combination and specialization.
Founded by Bill Dunlap, Euro-Marketing Associates offers Global Reach, a methodology for companies to expand their Internet presence into a more international framework. This includes translating a website into other languages, actively promoting it, and using local banner advertising to increase website traffic in every country with Internet users. Bill Dunlap explains:
"Promoting your website is at least as important as creating it, if not more important. You should be prepared to spend at least as much time and money in promoting your website as you did in creating it in the first place. With the 'Global Reach' program, you can have it promoted in countries where English is not spoken, and achieve a wider audience... and more sales. There are many good reasons for taking the on-line international market seriously. 'Global Reach' is a means for you to extend your website to many countries, speak to on-line visitors in their own language and reach on-line markets there."
In his e-mail of December 11, 1998, he also explains how the Internet has changed his professional life:
"Since 1981, when my professional life started, I've been involved with bringing American companies into Europe. This is very much an issue of language, since the products and their marketing have to be in the languages of Europe in order for them to be visible here. Since the Web became popular in 1995 or so, I've turned these activities to their on-line dimension, and have come to champion European e-commerce among my fellow American compatriots. Most recently, at Internet World in New York, I spoke about European e-commerce and how to use a website to address the various markets in Europe."
Machine translation (MT) is the automated process of translating from one natural language to another. An MT system analyzes text in the source language and automatically generates corresponding text in the target language.
Characterized by the absence of any human intervention during the translation process, machine translation (MT) is also called "fully automatic machine translation (FAMT)". It differs from "machine-aided human translation (MAHT)" and "computer-assisted translation (CAT)", both of which involve some interaction between the translator and the computer.
As SYSTRAN, a company specializing in translation software, explains on its website:
"Machine translation software translates one natural language into another natural language. MT takes into account the grammatical structure of each language and uses rules to transfer the grammatical structure of the source language (text to be translated) into the target language (translated text). MT cannot replace a human translator, nor is it intended to."
The European Association for Machine Translation (EAMT) gives the following definition:
"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful for certain specific applications, usually in the domain of technical documentation. In addition, translation software packages which are designed primarily to assist the human translator in the production of translations are enjoying increasing popularity within professional translation organizations."
Machine translation is the earliest type of natural language processing. Globalink describes this history as follows:
"From the very beginning, machine translation (MT) and natural language processing (NLP) have gone hand-in-hand with the evolution of modern computational technology. The development of the first general-purpose programmable computers during World War II was driven and accelerated by Allied cryptographic efforts to crack the German Enigma machine and other wartime codes. Following the war, the translation and analysis of natural language text provided a testbed for the newly emerging field of Information Theory.
During the 1950s, research on Automatic Translation (known today as Machine Translation, or 'MT') took form in the sense of literal translation, more commonly known as word-for-word translations, without the use of any linguistic rules.
The Russian project initiated at Georgetown University in the early 1950s represented the first systematic attempt to create a demonstrable machine translation system. Throughout the decade and into the 1960s, a number of similar university and government-funded research efforts took place in the United States and Europe. At the same time, rapid developments in the field of Theoretical Linguistics, culminating in the publication of Noam Chomsky's Aspects of the Theory of Syntax (1965), revolutionized the framework for the discussion and understanding of the phonology, morphology, syntax and semantics of human language.
In 1966, the U.S. government-issued ALPAC report offered a prematurely negative assessment of the value and prospects of practical machine translation systems, effectively putting an end to funding and experimentation in the field for the next decade. It was not until the late 1970s, with the growth of computing and language technology, that serious efforts began once again. This period of renewed interest also saw the development of the Transfer model of machine translation and the emergence of the first commercial MT systems.
While commercial ventures such as SYSTRAN and METAL began to demonstrate the viability, utility and demand for machine translation, these mainframe-bound systems also illustrated many of the problems in bringing MT products and services to market. High development cost, labor-intensive lexicography and linguistic implementation, slow progress in developing new language pairs, inaccessibility to the average user, and inability to scale easily to new platforms are all characteristics of these second-generation systems."
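The word-for-word approach of the 1950s described in the quotation above can be illustrated in a few lines of Python. This is a minimal sketch, not a reconstruction of any historical system: the tiny English-French lexicon and the function name `word_for_word` are hypothetical. It shows why translation without linguistic rules falls short: each word is looked up in isolation, so the result keeps English word order.

```python
# Minimal sketch of 1950s-style word-for-word translation:
# each word is replaced by its dictionary equivalent, with no
# grammatical rules at all. The lexicon is a hypothetical toy example.
LEXICON = {
    "the": "la",
    "blue": "bleue",
    "house": "maison",
    "is": "est",
    "big": "grande",
}


def word_for_word(sentence: str) -> str:
    """Translate by dictionary lookup alone; unknown words are bracketed."""
    words = sentence.lower().rstrip(".").split()
    return " ".join(LEXICON.get(word, f"[{word}]") for word in words)


if __name__ == "__main__":
    # Correct French is "la maison bleue"; lookup alone keeps English order.
    print(word_for_word("The blue house"))     # la bleue maison
    print(word_for_word("The house is big."))  # la maison est grande
    print(word_for_word("The garden"))         # la [garden]
```

The second sentence comes out right only because French and English happen to share its word order; the first does not, which is the kind of gap that rule-based approaches such as the Transfer model were meant to address.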
A number of companies specialize in machine translation development, such as Lernout & Hauspie, Globalink, Logos and SYSTRAN.
Based in Ieper (Belgium) and Burlington (Massachusetts, USA), Lernout & Hauspie (L&H) is an international leader in the development of advanced speech technology for various commercial applications and products. The company offers four core technologies: automatic speech recognition (ASR), text-to-speech (TTS), text-to-text and digital speech compression. Its ASR, TTS and digital speech compression technologies are licensed to major companies in the telecommunications, computer and multimedia, consumer electronics and automotive electronics industries. Its text-to-text (translation) services are provided to information technology (IT) companies and to vertical and automation markets.
The Machine Translation Group of Lernout & Hauspie comprises enterprises that develop, produce, and market highly sophisticated machine translation systems: L&H Language Technology, AppTek, AILogic, NeocorTech and Globalink. Each is an international leader in its particular segment.
Founded in 1990, Globalink is a major U.S. company in language translation software and services, offering customized translation solutions built around a range of software products, on-line options and professional translation services. The company publishes language translation software products in Spanish, French, Portuguese, German, Italian and English, and provides solutions to the translation problems of clients ranging from individuals and small businesses to multinational corporations and governments, from a stand-alone product that gives a fast draft translation to a full system for managing professional document translations. On its website, Globalink describes how its translation applications work:
"With Globalink's translation applications, the computer uses three sets of data: the input text, the translation program and permanent knowledge sources (containing a dictionary of words and phrases of the source language), and information about the concepts evoked by the dictionary and rules for sentence development. These rules are in the form of linguistic rules for syntax and grammar, and some are algorithms governing verb conjugation, syntax adjustment, gender and number agreement and word re-ordering.
Once the user has selected the text and set the machine translation process in motion the program begins to match words of the input text with those stored in its dictionary. Once a match is found, the application brings up a complete record that includes information on possible meanings of the word and its contextual relationship to other words that occur in the same sentence. The time required for the translation depends on the length of the text. A three-page, 750-word document takes about three minutes to render a first draft translation."
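The dictionary step Globalink describes, matching each input word and retrieving a record of its possible meanings and of its relationship to other words in the sentence, can be sketched as follows. This is a hypothetical simplification, not Globalink's code: the `Sense` record, the context-cue sets and the rule of counting overlapping words are assumptions made for illustration. The example uses the homonym "light", which the ZDNN article quoted in this chapter cites as a weakness of dictionary-driven systems.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Sense:
    """One possible meaning of a source word (hypothetical record layout)."""
    translation: str
    cues: FrozenSet[str]  # context words that suggest this meaning


# Hypothetical dictionary: English "light" has two French translations.
DICTIONARY: Dict[str, List[Sense]] = {
    "light": [
        Sense("lumière", frozenset({"sun", "bulb", "lamp", "bright"})),
        Sense("léger", frozenset({"heavy", "weight", "bag", "carry"})),
    ],
}


def translate_word(word: str, sentence: str) -> Optional[str]:
    """Pick the sense whose cues overlap most with the other words of the sentence."""
    senses = DICTIONARY.get(word)
    if not senses:
        return None  # no dictionary match for this word
    context = set(sentence.lower().split()) - {word}
    # max() returns the first sense on a tie, so the first entry is the default.
    best = max(senses, key=lambda sense: len(sense.cues & context))
    return best.translation


if __name__ == "__main__":
    print(translate_word("light", "the light of the sun"))         # lumière
    print(translate_word("light", "this bag is light not heavy"))  # léger
```

A sentence with no cue words, such as "turn on the light", simply falls back to the first sense; this is the kind of blind guess that produces the homonym errors users of early MT systems complained about.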
Randy Hobler is a Marketing Consultant for Globalink. He is currently acting as Product Marketing Manager for Globalink's suite of Internet-based products and services. In his e-mail of September 3, 1998, he wrote:
"85% of the content of the Web in 1998 is in English and going down. This trend is driven not only by more websites and users in non-English-speaking countries, but by increasing localization of company and organization sites, and increasing use of machine translation to/from various languages to translate websites.
Because the Internet has no national boundaries, the organization of users is bounded by other criteria driven by the medium itself. In terms of multilingualism, you have virtual communities, for example, of what I call 'Language Nations'... all those people on the Internet wherever they may be, for whom a given language is their native language. Thus, the Spanish Language nation includes not only Spanish and Latin American users, but millions of Hispanic users in the US, as well as odd places like Spanish-speaking Morocco.
Language Transparency: We are rapidly reaching the point where highly accurate machine translation of text and speech will be so common as to be embedded in computer platforms, and even in chips in various ways. At that point, and as the growth of the Web slows, the accuracy of language translation hits 98% plus, and the saturation of language pairs has covered the vast majority of the market, language transparency (any-language-to-any-language communication) will be too limiting a vision for those selling this technology. The next development will be 'transcultural, transnational transparency', in which other aspects of human communication, commerce and transactions beyond language alone will come into play. For example, gesture has meaning, facial movement has meaning and this varies among societies. The thumb-index finger circle means 'OK' in the United States. In Argentina, it is an obscene gesture.
When the inevitable growth of multi-media, multi-lingual videoconferencing comes about, it will be necessary to 'visually edit' gestures on the fly. The MIT Media Lab [MIT: Massachusetts Institute of Technology], Microsoft and many others are working on computer recognition of facial expressions, biometric access identification via the face, etc. It won't be any good for a U.S. business person to be making a great point in a Web-based multi-lingual video conference to an Argentinian, having his words translated into perfect Argentinian Spanish if he makes the 'O' gesture at the same time. Computers can intercept this kind of thing and edit them on the fly.
There are thousands of ways in which cultures and countries differ, and most of these are computerizable to change as one goes from one culture to the other. They include laws, customs, business practices, ethics, currency conversions, clothing size differences, metric versus English system differences, etc., etc. Enterprising companies will be capturing and programming these differences and selling products and services to help the peoples of the world communicate better. Once this kind of thing is widespread, it will truly contribute to international understanding."
Logos, an international company operating in the US, Canada and Europe, has specialized in machine translation for 25 years and provides various translation tools, machine translation systems and support services.
SYSTRAN (an acronym for System Translation) is a company specializing in machine translation software. SYSTRAN's headquarters are located in Soisy-sous-Montmorency, France. Sales and marketing, along with most development, operate out of its subsidiary in La Jolla, California. The SYSTRAN site gives an interesting overview of the company's history. One of the company's products is AltaVista Translation, an automatic service that translates English Web pages into French, German, Italian, Portuguese or Spanish, and vice versa. It is available on the site of AltaVista, the most frequently used search engine on the Web.
Based in Montreal, Canada, Alis Technologies is an international company specializing in the development and marketing of language handling solutions and services, particularly for language implementation in the IT industry. Alis Translation Solutions (ATS) offers a wide selection of applications and languages, and multiple tools and services for the best possible translation quality. Language Technology Solutions (LTS) is devoted to commercializing advanced tools and services in the field of language engineering and information technology. Unilingual information systems are thus transformed into software that users can put to work in their own language (90 languages are covered).
Other machine translation developments are SPANAM and ENGSPAN, fully automatic machine translation systems developed and maintained by the computational linguists, translators and systems programmers of the Pan American Health Organization (PAHO) in Washington, D.C. Since 1980, the PAHO Translation Unit has used SPANAM (Spanish to English) and ENGSPAN (English to Spanish) to process over 25 million words. Staff and freelance translators post-edit the raw output to produce high-quality translations with a 30-50% gain in productivity. The system is installed on a local area network at PAHO Headquarters and is used regularly by staff in the technical and administrative units. The software is also installed in a number of PAHO field offices and has been licensed to public and non-profit institutions in the US, Latin America and Spain.
Some associations also contribute to machine translation development.
The Association for Computational Linguistics (ACL) is the main international scientific and professional society for people working on problems involving natural language and computation. Published by MIT Press, the ACL quarterly journal, Computational Linguistics (ISSN 0891-2017), continues to be the primary forum for research on computational linguistics and natural language processing. The Finite String is its newsletter supplement. The European branch of ACL is the European Chapter of the Association for Computational Linguistics (EACL), which provides a regional focus for its members.
The International Association for Machine Translation (IAMT) heads a worldwide network with three regional components: the Association for Machine Translation in the Americas (AMTA), the European Association for Machine Translation (EAMT) and the Asia-Pacific Association for Machine Translation (AAMT).
The Association for Machine Translation in the Americas (AMTA) presents itself as an association dedicated to anyone interested in the translation of languages using computers in some way. It has members in Canada, Latin America, and the United States. This includes people with translation needs, commercial system developers, researchers, sponsors, and people studying, evaluating, and understanding the science of machine translation and educating the public on important scientific techniques and principles involved.
The European Association for Machine Translation (EAMT) is based in Geneva, Switzerland. This organization serves the growing community of people interested in MT (machine translation) and translation tools, including users, developers, and researchers of this increasingly viable technology.
The Asia-Pacific Association for Machine Translation (AAMT), formerly called the Japan Association for Machine Translation (created in 1991), comprises three groups: researchers, manufacturers and users of machine translation systems. The association works to develop machine translation technologies that expand the scope of effective global communication and, to this end, is engaged in machine translation system development, improvement, education and publicity.
In Web embraces language translation, an article published by ZDNN (ZD Network News) on July 21, 1998, Martha L. Stone explains:
"Among the new products in the $10 billion language translation business are instant translators for websites, chat rooms, e-mail and corporate intranets.
The leading translation firms are mobilizing to seize the opportunities. Such as:
SYSTRAN has partnered with AltaVista and reports between 500,000 and 600,000 visitors a day on babelfish.altavista.digital.com, and about 1 million translations per day -- ranging from recipes to complete Web pages.
About 15,000 sites link to babelfish, which can translate to and from French, Italian, German, Spanish and Portuguese. The site plans to add Japanese soon.
'The popularity is simple. With the Internet, now there is a way to use US content. All of these contribute to this increasing demand,' said Dimitros Sabatakakis, group CEO of SYSTRAN, speaking from his Paris home.
Alis technology powers the Los Angeles Times' soon-to-be launched language translation feature on its site. Translations will be available in Spanish and French, and eventually, Japanese. At the click of a mouse, an entire web page can be translated into the desired language.
Globalink offers a variety of software and Web translation possibilities, including a free e-mail service and software to enable text in chat rooms to be translated.
But while these so-called 'machine' translations are gaining worldwide popularity, company execs admit they're not for every situation.
Representatives from Globalink, Alis and SYSTRAN use such phrases as 'not perfect' and 'approximate' when describing the quality of translations, with the caveat that sentences submitted for translation should be simple, grammatically accurate and idiom-free.
'The progress on machine translation is moving at Moore's Law -- every 18 months it's twice as good,' said Vin Crosbie, a Web industry analyst in Greenwich, Conn. 'It's not perfect, but some [non-English speaking] people don't realize I'm using translation software.'
With these translations, syntax and word usage suffer, because dictionary-driven databases can't decipher between homonyms -- for example, 'light' (as in the sun or light bulb) and 'light' (the opposite of heavy).
Still, human translation would cost between $50 and $60 per Web page, or about 20 cents per word, SYSTRAN's Sabatakakis said.
While this may be appropriate for static 'corporate information' pages, the machine translations are free on the Web, and often less than $100 for software, depending on the number of translated languages and special features."
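The two human-translation figures quoted by Sabatakakis can be reconciled with a little arithmetic: at 20 cents per word, $50 to $60 per page implies pages of roughly 250 to 300 words. The sketch below works in integer cents to avoid floating-point rounding; the function names and the 40-page site are hypothetical illustrations, not figures from the article.

```python
# Figures quoted in the ZDNN article (1998), expressed in US cents.
COST_PER_WORD_CENTS = 20
COST_PER_PAGE_CENTS = (5_000, 6_000)  # $50 to $60 per Web page


def implied_words_per_page(page_cost_cents: int) -> int:
    """Words per page implied by a per-page price at the per-word rate."""
    return page_cost_cents // COST_PER_WORD_CENTS


def human_cost_dollars(pages: int, page_cost_cents: int) -> float:
    """Cost in dollars of human translation for a given number of pages."""
    return pages * page_cost_cents / 100


if __name__ == "__main__":
    for cents in COST_PER_PAGE_CENTS:
        print(f"${cents // 100}/page -> {implied_words_per_page(cents)} words/page")
    # Hypothetical 40-page site, for comparison with sub-$100 MT software.
    print([human_cost_dollars(40, cents) for cents in COST_PER_PAGE_CENTS])
```

At those rates, human translation of a 40-page site into a single language would cost $2,000 to $2,400, against free Web services or software costing less than $100.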
Within the World Health Organization (WHO) in Geneva, Switzerland, the Computer-assisted Translation and Terminology Unit (CTT) is assessing technical options for using computer-assisted translation (CAT) systems based on "translation memory". With such systems, translators have immediate access to previous translations of portions of the text before them. These reminders of previous translations can be accepted, rejected or modified, and the final choice is added to the memory, thus enriching it for future reference. By archiving daily output, the translator would soon have access to an enormous "memory" of ready-made solutions for a considerable number of translation problems. Several projects are currently under way in such areas as electronic document archiving and retrieval, bilingual/multilingual text alignment, computer-assisted translation, translation memory and terminology database management, and speech recognition.
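The translation-memory cycle described above (look up earlier translations of a segment, let the translator accept, reject or modify the suggestion, then store the final choice) can be sketched as below. This is a minimal illustration, not the CTT Unit's system: the `TranslationMemory` class is hypothetical, and fuzzy matching with Python's standard `difflib.SequenceMatcher` at a 0.7 similarity threshold is an assumption chosen for simplicity.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

Match = Tuple[str, str, float]  # (stored source, stored translation, similarity)


class TranslationMemory:
    """Hypothetical minimal translation memory keyed by source sentence."""

    def __init__(self) -> None:
        self.segments: Dict[str, str] = {}

    def store(self, source: str, translation: str) -> None:
        """Record the translator's final choice for future reference."""
        self.segments[source] = translation

    def lookup(self, source: str, threshold: float = 0.7) -> Optional[Match]:
        """Return an exact match, else the most similar stored segment above threshold."""
        if source in self.segments:
            return source, self.segments[source], 1.0
        best: Optional[Match] = None
        for stored, translation in self.segments.items():
            score = SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (stored, translation, score)
        return best


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.store("Wash your hands before eating.", "Lavez-vous les mains avant de manger.")
    # A fuzzy match is offered as a starting point for the translator...
    print(tm.lookup("Wash your hands after eating."))
    # ...whose corrected version is then stored, enriching the memory.
    tm.store("Wash your hands after eating.", "Lavez-vous les mains après avoir mangé.")
    print(tm.lookup("Wash your hands after eating."))
```

Real systems index segments rather than scanning them all, but the linear scan keeps the accept-modify-store cycle easy to follow.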
Despite the predictions made some 50 years ago that a universal translation machine was imminent, machine translation systems still do not produce good-quality translations. Why not? Pierre Isabelle and Patrick Andries, from the Laboratoire de recherche appliquée en linguistique informatique (RALI) (Laboratory for Applied Research in Computational Linguistics) in Montreal, Quebec, explain this failure in La traduction automatique, 50 ans après (Machine translation, 50 years later), an article published in the Dossiers of the daily cybermagazine Multimédium:
"The ultimate goal of building a machine capable of competing with a human translator remains elusive due to the slow progress of the research. [...] Recent research based on large collections of texts called corpora, using either statistical or analogical methods, promises to reduce the quantity of manual work required to build an MT [machine translation] system, but it is less certain that it can deliver a substantial improvement in the quality of machine translation. [...] the use of MT will be more or less restricted to information assimilation tasks or to the distribution of texts belonging to restricted sub-languages."
Drawing on the ideas expressed by Yehoshua Bar-Hillel in The State of Machine Translation, an article published in 1951, Pierre Isabelle and Patrick Andries define three MT implementation strategies: 1) a tool of information assimilation that scans multilingual information and supplies rough translations; 2) situations of "restricted language", such as the METEO system, which since 1977 has been translating the weather forecasts of the Canadian Ministry of Environment; 3) human/machine coupling before, during and after the MT process, which is not necessarily economical compared to traditional translation.
The authors favor "a workstation for the human translator" rather than a "robot translator":
"Recent research on probabilistic methods has in fact demonstrated that some simple aspects of the translation relationship between two texts can be modeled very efficiently. For example, methods were developed to calculate the correct alignment between the sentences of a text and those of its translation, that is, to identify the sentence(s) of the source text which correspond(s) to each sentence of the translation. Applied on a large scale, these techniques make it possible to use the archives of a translation service to build a translation memory, which will often allow fragments of previous translations to be recycled. Such systems are already available on the translation market (IBM Translation Manager II, Trados Translator's Workbench by Trados, RALI TransSearch, etc.)
The most recent research focuses on models able to automatically set up correspondences at a finer level than the sentence level: syntagms and words. These results point to a whole family of new tools for the human translator, including aids for terminology research, aids for dictation and translation typing, and detectors of translation errors."
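The sentence alignment described by Isabelle and Andries can be illustrated with a dynamic-programming sketch in the spirit of length-based aligners such as Gale and Church's: a sentence and its translation tend to have proportional lengths, so the best alignment is the one that minimizes length mismatches. This is a simplified version under stated assumptions, not RALI's method: the mismatch cost and the penalties for 1-0, 0-1, 2-1 and 1-2 pairings are hypothetical values, whereas Gale and Church use a probabilistic cost.

```python
from typing import List, Tuple

# Hypothetical penalties (illustrative values, not from any published aligner).
SKIP_PENALTY = 10.0   # a sentence with no counterpart (1-0 or 0-1)
MERGE_PENALTY = 2.0   # two sentences translated as one (2-1 or 1-2)
MOVES = [(1, 1, 0.0), (1, 0, SKIP_PENALTY), (0, 1, SKIP_PENALTY),
         (2, 1, MERGE_PENALTY), (1, 2, MERGE_PENALTY)]

Bead = Tuple[List[str], List[str]]  # source sentences paired with target sentences


def mismatch(src_chars: int, tgt_chars: int) -> float:
    """Relative length difference: 0 for equal lengths, 1 when one side is empty."""
    total = src_chars + tgt_chars
    return abs(src_chars - tgt_chars) / total if total else 0.0


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Align two sentence lists by minimizing total length mismatch plus penalties."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Cells are visited in row-major order, so each cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                candidate = cost[i][j] + penalty + mismatch(s_len, t_len)
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)
    # Walk back from the final cell to recover the chosen pairings.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    english = ["The house is blue.", "It is big.", "The garden is small and quiet."]
    french = ["La maison est bleue.", "Elle est grande.", "Le jardin est petit et calme."]
    for en, fr in align(english, french):
        print(en, "<->", fr)
```

Applied to the archives of a translation service, the aligned pairs produced this way are exactly the material a translation memory is built from.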
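For the finer, word-level correspondences mentioned at the end of the quotation, a classic probabilistic starting point is IBM Model 1, which estimates word-translation probabilities from sentence-aligned text with the expectation-maximization (EM) algorithm. The sketch below is a swapped-in textbook illustration, not the model used by the RALI researchers; it omits the "empty word" that full implementations add, and the three-sentence corpus and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (English words, French words)


def train_model1(pairs: List[Pair], iterations: int = 20) -> Dict[Tuple[str, str], float]:
    """Estimate t[(f, e)] = P(French word f | English word e) with EM (no empty word)."""
    french_vocab = {f for _, fr in pairs for f in fr}
    uniform = 1.0 / len(french_vocab)  # every French word starts equally likely
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each French word among the English words of its sentence.
        for en, fr in pairs:
            for f in fr:
                norm = sum(t[(f, e)] for e in en)
                for e in en:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts for each English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_translation(t: Dict[Tuple[str, str], float], english_word: str) -> str:
    """Most probable French word for an English word under the trained table."""
    candidates = {f: p for (f, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    corpus = [
        ("the house".split(), "la maison".split()),
        ("the blue house".split(), "la maison bleue".split()),
        ("the flower".split(), "la fleur".split()),
    ]
    t = train_model1(corpus)
    for word in ["the", "house", "blue", "flower"]:
        print(word, "->", best_translation(t, word))
```

No word pairs are given in advance: because "the" always co-occurs with "la", EM attributes "la" to "the", which in turn frees "maison" to be explained by "house".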
© 1999 Marie Lebert