NEF - Le Livre 010101 de Marie Lebert - Multilingualism on the Web

Multilingualism on the Web (1999)
3. Language Resources

3.1. Sites Indexing Language Resources
3.2. Language Directories
3.3. Dictionaries and Glossaries
3.4. Textual Databases
3.5. Terminological Databases

3.1. Sites Indexing Language Resources

Prepared by the Telematics for Libraries Programme of the European Union, Multilingual Tools and Services gives a series of links to dictionaries, multilingual support, projects, search engines by language, terminology data banks, thesauri, and translation systems.

Created by Tyler Chambers in May 1994, The Human-Languages Page is a comprehensive catalog of 1,800 language-related Internet resources in more than 100 different languages. The subject listings are: languages and literature; schools and institutions; linguistics resources; products and services; organizations; jobs and internships. The category listings are: dictionaries and language lessons.

Tyler Chambers' other main language-related project is the Internet Dictionary Project. As explained on the website:

"The Internet Dictionary Project's goal is to create royalty-free translating dictionaries through the help of the Internet's citizens. This site allows individuals from all over the world to visit and assist in the translation of English words into other languages. The resulting lists of English words and their translated counterparts are then made available through this site to anyone, with no restrictions on their use. [...]

The Internet Dictionary Project began in 1995 in an effort to provide a noticeably lacking resource to the Internet community and to computing in general -- free translating dictionaries. Not only is it helpful to the on-line community to have access to dictionary searches at their fingertips via the World Wide Web, it also sponsors the growth of computer software which can benefit from such dictionaries -- from translating programs to spelling-checkers to language-education guides and more. By facilitating the creation of these dictionaries on-line by thousands of anonymous volunteers all over the Internet, and by providing the results free-of-charge to anyone, the Internet Dictionary Project hopes to leave its mark on the Internet and to inspire others to create projects which will benefit more than a corporation's gross income."

Tyler Chambers answered my questions in his e-mail of September 14, 1998.

ML: "How do you see multilingualism on the Web?"

TC: "Multilingualism on the Web was inevitable even before the medium 'took off', so to speak. 1994 was the year I was really introduced to the Web, which was a little while after its christening but long before it was mainstream. That was also the year I began my first multilingual Web project, and there was already a significant number of language-related resources on-line. This was back before Netscape even existed -- Mosaic was almost the only Web browser, and web pages were little more than hyperlinked text documents. As browsers and users mature, I don't think there will be any currently spoken language that won't have a niche on the Web, from Native American languages to Middle Eastern dialects, as well as a plethora of 'dead' languages that will have a chance to find a new audience with scholars and others alike on-line. To my knowledge, there are very few language types which are not currently on-line: browsers currently have the capability to display Roman characters, Asian languages, the Cyrillic alphabet, Greek, Turkish, and more. Accent Software has a product called 'Internet with an Accent' which claims to be able to display over 30 different language encodings. If there are currently any barriers to any particular language being on the Web, they won't last long."

ML: "What did the use of the Internet bring to your professional life?"

TC: "My professional life is currently completely separate from my Internet life. Professionally, I'm a computer programmer/techie -- I find it challenging and it pays the bills. On-line, my work has been with making language information available to more people through a couple of my Web-based projects. While I'm not multilingual, nor even bilingual, myself, I see an importance to language and multilingualism that I see in very few other areas. The Internet has allowed me to reach millions of people and help them find what they're looking for, something I'm glad to do. It has also made me somewhat of a celebrity, or at least a familiar name in certain circles -- I just found out that one of my Web projects had a short mention in Time Magazine's Asia and International issues. Overall, I think that the Web has been great for language awareness and cultural issues -- where else can you randomly browse for 20 minutes and run across three or more different languages with information you might potentially want to know? Communications mediums make the world smaller by bringing people closer together; I think that the Web is the first (of mail, telegraph, telephone, radio, TV) to really cross national and cultural borders for the average person. Israel isn't thousands of miles away anymore, it's a few clicks away -- our world may now be small enough to fit inside a computer screen."

ML: "How do you see the future of Internet-related activities as regards languages?"

TC: "As I've said before, I think that the future of the Internet is even more multilingualism and cross-cultural exploration and understanding than we've already seen. But the Internet will only be the medium by which this information is carried; like the paper on which a book is written, the Internet itself adds very little to the content of information, but adds tremendously to its value in its ability to communicate that information. To say that the Internet is spurring multilingualism is a bit of a misconception, in my opinion -- it is communication that is spurring multilingualism and cross-cultural exchange, the Internet is only the latest mode of communication which has made its way down to the (more-or-less) common person. The Internet has a long way to go before being ubiquitous around the world, but it, or some related progeny, likely will. Language will become even more important than it already is when the entire planet can communicate with everyone else (via the Web, chat, games, e-mail, and whatever future applications haven't even been invented yet), but I don't know if this will lead to stronger language ties, or a consolidation of languages until only a few, or even just one remain. One thing I think is certain is that the Internet will forever be a record of our diversity, including language diversity, even if that diversity fades away. And that's one of the things I love about the Internet -- it's a global model of the saying 'it's not really gone as long as someone remembers it'. And people do remember."

Since its inception in 1989, the CTI (Computers in Teaching Initiative) Centre for Modern Languages has been based in the Language Institute at the University of Hull, United Kingdom, and aims to promote and encourage the use of computers in language learning and teaching. The Centre provides information on how computer assisted language learning (CALL) can be effectively integrated into existing courses and offers support for language lecturers who are using, or who wish to use, computers in their teaching.

June Thompson, Manager of the Centre, answered my questions in her e-mail of December 14, 1998.

ML: "How do you see multilingualism on the Internet?"

JT: "The Internet has the potential to increase the use of foreign languages, and our organisation certainly opposes any trend towards the dominance of English as the language of the Internet. An interesting paper on this topic was delivered by Madanmohan Rao at the WorldCALL conference in Melbourne, July 1998." [See details of the forthcoming conference book]

ML: "What did the use of the Internet bring to the life of your organization?"

JT: "The use of the Internet has brought an enormous new dimension to our work of supporting language teachers in their use of technology in teaching."

ML: "How do you see the future of Internet-related activities as regards languages?"

JT: "I suspect that for some time to come, the use of Internet-related activities for languages will continue to develop alongside other technology-related activities (e.g. use of CD-ROMs - not all institutions have enough networked hardware). In the future I can envisage use of Internet playing a much larger part, but only if such activities are pedagogy-driven. Our organisation is closely associated with the WELL project [Web Enhanced Language Learning] which devotes itself to these issues."

Hosted by the CTI Centre for Modern Languages and the University of Hull (United Kingdom), EUROCALL is the European Association for Computer Assisted Language Learning. This association of language teaching professionals from Europe and worldwide aims to: promote the use of foreign languages within Europe; provide a European focus for all aspects of the use of technology for language learning; enhance the quality, dissemination and efficiency of CALL (computer assisted language learning) materials; and support two Special Interest Groups (SIGs). The first is CAPITAL (Computer Assisted Pronunciation Investigation Teaching and Learning), a group of researchers and practitioners interested in using computers in the domain of pronunciation in the widest sense of the word. The second is WELL (Web Enhanced Language Learning), which will provide access to high-quality Web resources in 12 languages, selected and described by subject experts, plus information and examples on how to use them for teaching and learning.

Internet Resources for Language Teachers and Learners offers several categories of links: general language resources (centres and departments; dictionaries and grammars; discussion lists; distance language learning; fonts; journals; linguistics; lists and indexes; miscellaneous; newspapers and periodicals; organizations; resource sites; software; translation and interpreting); language-specific resources; multilingual language sites; search engines and indexes; and commercial language sites (audiovisual, language schools, resources and directories, software).

Maintained by the Institute of Phonetic Sciences, Amsterdam, the Netherlands, Speech on the Web is an extensive list of links organized in various sections: congresses, meetings, and workshops; links and lists; phonetics and speech; natural language processing, cognitive science, and AI (artificial intelligence); computational linguistics; dictionaries; electronic newsletters, journals and publications.

Travlang is a site dedicated to both travel and languages. Michael C. Martin created Foreign Languages for Travelers in 1994 on his university's website, when he was a physics student. Included in Travlang in 1995, it makes it possible to learn 60 different languages on the Web. Translating Dictionaries gives access to free dictionaries in various languages (Afrikaans, Czech, Danish, Dutch, Esperanto, Finnish, French, Frisian, German, Hungarian, Italian, Latin, Norwegian, Portuguese, and Spanish). Maintained by its founder, who is now a researcher in experimental physics at the Lawrence Berkeley National Laboratory, California, the site offers numerous links to language dictionaries, translation services, language schools, multilingual bookstores, etc.

Michael C. Martin answered my questions in his e-mail of August 25, 1998.

ML: "How do you see multilingualism on the Web?"

MCM: "I think the Web is an ideal place to bring different cultures and people together, and that includes being multilingual. Our Travlang site is so popular because of this, and people desire to feel in touch with other parts of the world."

ML: "What did the use of the Internet bring to your professional life?"

MCM: "Well, certainly we've made a little business of it! The Internet is really a great tool for communicating with people you wouldn't have the opportunity to interact with otherwise. I truly enjoy the global collaboration that has made our Foreign Languages for Travelers pages possible."

ML: "How do you see the future of Internet-related activities as regards languages?"

MCM: "I think computerized full-text translations will become more common, enabling a lot of basic communications with even more people. This will also help bring the Internet more completely to the non-English speaking world."

The LINGUIST List is the component of the WWW Virtual Library for linguistics. It gives an extensive series of links on linguistic resources: the profession (conferences, linguistic associations, programs, etc.); research and research support (papers, dissertation abstracts, projects, bibliographies, topics, texts); publications; pedagogy; language resources (languages, language families, dictionaries, regional information); and computer support (fonts and software).

Helen Dry, moderator of the LINGUIST List, explained in her e-mail of August 18, 1998:

"The LINGUIST List, which I moderate, has a policy of posting in any language, since it's a list for linguists. However, we discourage posting the same message in several languages, simply because of the burden extra messages put on our editorial staff. (We are not a bounce-back list, but a moderated one. So each message is organized into an issue with like messages by our student editors before it is posted.) Our experience has been that almost everyone chooses to post in English. But we do link to a translation facility that will present our pages in any of 5 languages; so a subscriber need not read LINGUIST in English unless s/he wishes to. We also try to have at least one student editor who is genuinely multilingual, so that readers can correspond with us in languages other than English."

Maintained by the Yamada Language Center of the University of Oregon, the Yamada WWW Language Guides is a directory of language resources by geographic family and alphabetic family. It covers organizations, teaching institutes, curriculum materials, cultural references, and WWW links.

Language Today is a new magazine for people working in applied languages: translators, interpreters, terminologists, lexicographers and technical writers. It is a collaborative project between Logos, which provides the website, and Praetorius, the UK language consultancy which keeps itself constantly informed about developments in applied languages. The site gives links to translators' associations, language schools, and dictionaries.

Geoffrey Kingscott, managing director of Praetorius, answered my questions in his e-mail of September 4, 1998.

ML: "How do you see multilingualism on the Web?"

GK: "Because the salient characteristics of the Web are the multiplicity of site generators and the cheapness of message generation, as the Web matures it will in fact promote multilingualism. The fact that the Web originated in the USA means that it is still predominantly in English but this is only a temporary phenomenon. If I may explain this further, when we relied on the print and audiovisual (film, television, radio, video, cassettes) media, we had to depend on the information or entertainment we wanted to receive being brought to us by agents (publishers, television and radio stations, cassette and video producers) who have to subsist in a commercial world or -- as in the case of public service broadcasting -- under severe budgetary restraints. That means that the size of the customer-base is all-important, and determines the degree to which languages other than the ubiquitous English can be accommodated. These constraints disappear with the Web. To give only a minor example from our own experience, we publish the print version of Language Today only in English, the common denominator of our readers. When we use an article which was originally in a language other than English, or report an interview which was conducted in a language other than English, we translate into English and publish only the English version. This is because the number of pages we can print is constrained, governed by our customer-base (advertisers and subscribers). But for our Web edition we also give the original version."

ML: "What did the use of the Internet bring to your company?"

GK: "The Internet has made comparatively little difference to our company. It is an additional medium rather than one which will replace all others."

ML: "How do you see the future with the Internet?"

GK: "We will continue to have a company website, and to publish a version of the magazine on the Web, but it will remain only one factor in our work. We do use the Internet as a source of information which we then distill for our readers, who would otherwise be faced with the biggest problem of the Web -- undiscriminating floods of information."

3.2. Language Directories

The Ethnologue is the electronic version of The Ethnologue, 13th ed., (editor: Barbara F. Grimes, consulting editors: Richard S. Pittman and Joseph E. Grimes), published in 1996 by the Summer Institute of Linguistics, Dallas, Texas. This catalogue of more than 6,700 languages spoken in 228 countries is accessible through two search tools: The Ethnologue Name Index, which lists language names, dialect names, and alternate names, and The Ethnologue Language Family Index, which organizes languages according to language families.

Barbara F. Grimes, editor of The Ethnologue, wrote in her e-mail of August 18, 1998:

"Multilingual web pages are more widely useful, but much more costly to maintain. We have had requests for The Ethnologue in a few other languages, but we do not have the personnel or funds to do the translation or maintenance, since it is constantly being updated.

We have found the Internet to be useful, convenient, and supplementary to our work. Our main use of it is for e-mail.

It is a convenient means of making information more widely available to a wider audience than the printed Ethnologue provides.

On the other hand, many people in the audience we wish to reach do not have access to computers, so in some ways the Ethnologue on Internet reaches a limited audience who own computers. I am particularly thinking of people in the so-called 'third world'."

Created in December 1995 by Yoshi Mikami of Asia Info Network, The Languages of the World by Computers and the Internet (commonly called Logos Home Page or Kotoba Home Page) gives, for each language, a brief history, its features, its writing system, and its character set and keyboard for computer and Internet processing. In his e-mail of December 17, 1998, Yoshi Mikami wrote:

"My native tongue is Japanese. Because I had my graduate education in the US and worked in the computer business, I became bilingual Japanese/American English. I was always interested in different languages and cultures, so I learned some Russian, French and Chinese along the way. In late 1995, I created on the Web The Languages of the World by Computers and the Internet and tried to summarize there the brief history, linguistic and phonetic features, writing system and computer processing for each of the six major languages of the world, in English and Japanese. As I gained more experience, I invited my two associates to write a book on viewing, understanding and creating the multilingual web pages, which was published in August, 1997, as 'The Multilingual Web Guide' (see its support page) in the Japanese edition, the world's first book on such a subject.

Thousands of years ago, in Egypt, China and elsewhere, people were more conscious about communicating their laws and thoughts not in just one language, but in different languages. In our modern world, each nation state has adopted more or less one language for its own use. I see in the future of the Internet a greater use of different languages and multilingual pages, not a simple gravitation to American English, and a more creative use of multilingual computer translation. Ninety nine percent of the Webs created in Japan are written in Japanese!"

Maintained on the website of the college Sabhal Mòr Ostaig, Isle of Skye, Scotland, by Caoimhín P. Ó Donnaíle, European Minority Languages is a list of minority languages in alphabetical order and by language family. The site also gives links to other sites dealing with the same subject worldwide.

Caoimhín P. Ó Donnaíle wrote in his e-mail of August 18, 1998:

"-- The Internet has contributed and will contribute to the wildfire spread of English as a world language.

-- The Internet can greatly help minority languages, but this will not happen by itself. It will only happen if people want to maintain the language as an aim in itself.

-- The Web is very useful for delivering language lessons, and there is a big demand for this.

-- The Unicode (ISO 10646) character set standard is very important and will greatly assist in making the Internet more multilingual."

3.3. Dictionaries and Glossaries

There are more and more on-line dictionaries. Let us give three examples (English, French and multilingual).

In Merriam-Webster Online: the Language Center, a major publisher of English dictionaries gives free access to a collection of on-line resources. The goal is to help track down definitions, spellings, pronunciations, synonyms, vocabulary exercises, and other key facts about words and language. The main on-line resources are: WWWebster Dictionary, WWWebster Thesaurus, Webster's Third (a lexical landmark), Guide to International Business Communications, Vocabulary Builder (with interactive vocabulary quizzes), and the Barnhart Dictionary Companion (hot new words).

The Dictionnaire francophone en ligne is the web version of the Dictionnaire universel francophone, published by Hachette, a major French publisher, and the Agence universitaire de la Francophonie (AUPELF-UREF) (University Agency for Francophony). The dictionary presents standard French together with the French words and expressions used on the five continents.

The Logos Dictionary is a multilingual dictionary with 8 million entry words in all languages. Logos, an international translation company based in Modena, Italy, gives free access to the linguistic tools used by its translators: 200 translators in its headquarters and 2,500 translators on-line all over the world, who process around 200 texts per day. Apart from the Logos Dictionary, these tools include: the Wordtheque, a word-by-word multilingual library with a massive database (325 million words) containing multilingual novels, technical literature and translated texts; Linguistic Resources, a database of 536 glossaries; and the Universal Conjugator, a database for conjugation of verbs in 17 languages.

In Les mots pour le dire, an article of the French daily newspaper Le Monde of December 7, 1997, Annie Kahn wrote:

"The Logos site is much more than a mere dictionary or a collection of links to other on-line dictionaries. A cornerstone of the system is the document search software, which processes a corpus of literary texts available free of charge on the Web. If you search for the definition or the translation of a word ('didactique', for example), you get not only the answer sought, but also a quote from one of the literary works containing the word (in our case, an essay by Voltaire). All it takes is a click on the mouse to access the whole text or even to order the book, thanks to a partnership agreement with, the well-known on-line book shop. Foreign translations are also available. If however no text containing the required word is found, the system acts as a search engine, sending the user to other websites concerning the term in question. In the case of certain words, you can even hear the pronunciation. If there is no translation currently available, the system calls on the public to contribute. Everyone can make their own suggestion, after which Logos translators and the company verify the translations forwarded."

In the same article, Rodrigo Vergara, the Head of Logos, explained:

"We wanted all our translators to have access to the same translation tools. So we made them available on the Internet, and while we were at it we decided to make the site open to the public. This made us extremely popular, and also gave us a lot of exposure. In fact the operation attracted a great number of customers, and also allowed us to widen our network of translators, thanks to the contacts made in the wake of this initiative."

Dictionary directories such as Dictionnaires électroniques (Electronic Dictionaries), OneLook Dictionaries and A Web of Online Dictionaries are invaluable tools for linguists.

Dictionnaires électroniques (Electronic Dictionaries) is an extensive list of electronic dictionaries prepared by the Section française des Services linguistiques centraux (SLC-f) (French Section of the Central Linguistic Services) of the Swiss Federal Administration, and classified into five main sections: abbreviations and acronyms; monolingual dictionaries; bilingual dictionaries; multilingual dictionaries; and geographical information. Dictionaries can also be searched by keyword.

Marcel Grangier, head of this section, answered my questions in his e-mail of January 14, 1999.

ML: "How do you see multilingualism on the Internet?"

MG: "Multilingualism on the Internet can be seen as a happy and above all irreversible inevitability. In this perspective we have to make fun of the wet blankets who only speak to complain about the supremacy of English. This supremacy is not wrong in itself, inasmuch as it is the result of mainly statistical facts (more PCs per inhabitant, more English-speaking people, etc.). The counter-attack is not to 'fight against English' and even less to whine about it, but to increase sites in other languages. As a translation service, we also recommend the multilingualism of websites."

ML: "What did the use of the Internet bring to your professional life?"

MG: "To work without the Internet is simply impossible now -- as well as all the tools used (e-mail, electronic press, services for translators), Internet is for us an essential and inexhaustible source of information in what I would call the 'non-structured sector' of the Web. For example, when the answer to a translation problem can't be found in websites presenting information in an organized way, in most cases search engines allow us to find the missing link somewhere on the network."

ML: "How do you see the future of Internet-related activities as regards languages?"

MG: "The increase in the number of languages on the Internet is inevitable, and can only be a benefit for multicultural exchanges. For the exchanges to happen in an optimal environment, it is still necessary to develop tools which will improve compatibility -- the complete management of diacritics is only one example of what can be done."

Provided as a free service since April 1996 by Study Technologies, Englewood, Colorado, OneLook Dictionaries, by Robert Ware, is the fastest finder for more than 2 million words in 425 dictionaries in various fields: business, computer/Internet, medical, miscellaneous, religion, science, sports, technology, general, and slang.

In his e-mail of September 2, 1998, Robert Ware explained:

"On the personal side, I was almost entirely in contact with people who spoke one language and did not have much incentive to expand language abilities. Being in contact with the entire world has a way of changing that. And changing it for the better! [...] I have been slow to start including non-English dictionaries (partly because I am monolingual). But you will now find a few included."

A Web of Online Dictionaries, by Robert Beard, is an index of more than 800 on-line dictionaries in 150 languages, and other tools: multilingual dictionaries; specialized English dictionaries; thesauri and other vocabulary aids; language identifiers and guessers; an index of dictionary indices; a Web of on-line grammars; and a Web of linguistic fun (materials about linguistics for non-specialists).

Robert Beard answered my questions in his e-mail of September 1, 1998.

ML: "How do you see multilingualism on the Web?"

RB: "There was an initial fear that the Web posed a threat to multilingualism on the Web, since HTML and other programming languages are based on English and since there are simply more websites in English than any other language. However, my websites indicate that multilingualism is very much alive and the Web may, in fact, serve as a vehicle for preserving many endangered languages. I now have links to dictionaries in 150 languages and grammars of 65 languages. Moreover, the new attention paid by browser developers to the different languages of the world will encourage even more websites in different languages."

ML: "What did the use of the Internet bring to your professional life?"

RB: "As a language teacher, the Web represents a plethora of new resources produced by the target culture, new tools for delivering lessons (interactive Java and Shockwave exercises) and testing, which are available to students any time they have the time or interest -- 24 hours a day, 7 days a week. It is also an almost limitless publication outlet for my colleagues and I, not to mention my institution."

ML: "How do you see the future of Internet-related activities as regards languages?"

RB: "Ultimately all course materials, including lecture notes, exercises, moot and credit testing, grading, and interactive exercises far more effective in conveying concepts that we have not even dreamed of yet. The Web will be an encyclopedia of the world by the world for the world. There will be no information or knowledge that anyone needs that will not be available. The major hindrance to international and interpersonal understanding, personal and institutional enhancement, will be removed. It would take a wilder imagination than mine to predict the effect of this development on the nature of humankind."

Initiated by the WorldWide Language Institute, NetGlos (The Multilingual Glossary of Internet Terminology) has been compiled since 1995 as a voluntary, collaborative project by a number of translators and other professionals. Versions for the following languages are being prepared: Chinese, Croatian, English, Dutch/Flemish, French, German, Greek, Hebrew, Italian, Maori, Norwegian, Portuguese, and Spanish.

Brian King, director of the WorldWide Language Institute, answered my questions in his e-mail of September 15, 1998.

ML: "How do you see multilingualism on the Web?"

BK: "Although English is still the most important language used on the Web, and the Internet in general, I believe that multilingualism is an inevitable part of the future direction of cyberspace.

Here are some of the important developments that I see as making a multilingual Web become a reality:

1. Popularization of information technology

Computer technology has traditionally been the sole domain of a 'techie' elite, fluent in both complex programming languages and in English -- the universal language of science and technology. Computers were never designed to handle writing systems that couldn't be translated into ASCII. There wasn't much room for anything other than the 26 letters of the English alphabet in a coding system that originally couldn't even recognize acute accents and umlauts -- not to mention nonalphabetic systems like Chinese.

But tradition has been turned upside down. Technology has been popularized. GUIs (graphical user interfaces) like Windows and Macintosh have hastened the process (and indeed it's no secret that it was Microsoft's marketing strategy to use their operating system to make computers easy to use for the average person). These days this ease of use has spread beyond the PC to the virtual, networked space of the Internet, so that now nonprogrammers can even insert Java applets into their webpages without understanding a single line of code.

2. Competition for a chunk of the 'global market' by major industry players

An extension of (local) popularization is the export of information technology around the world. Popularization has now occurred on a global scale and English is no longer necessarily the lingua franca of the user. Perhaps there is no true lingua franca, but only the individual languages of the users. One thing is certain -- it is no longer necessary to understand English to use a computer, nor is it necessary to have a degree in computer science.

A pull from non-English-speaking computer users and a push from technology companies competing for global markets has made localization a fast growing area in software and hardware development. This development has not been as fast as it could have been. The first step was for ASCII to become Extended ASCII. This meant that computers could begin to start recognizing the accents and symbols used in variants of the English alphabet -- mostly used by European languages. But only one language could be displayed on a page at a time.

3. Technological developments

The most recent development is Unicode. Although still evolving and only just being incorporated into the latest software, this new coding system translates each character into 16 [bits]. Whereas 8-[bit] Extended ASCII could only handle a maximum of 256 characters, Unicode can handle over 65,000 unique characters and therefore potentially accommodate all of the world's writing systems on the computer.

So now the tools are more or less in place. They are still not perfect, but at last we can at least surf the Web in Chinese, Japanese, Korean, and numerous other languages that don't use the Western alphabet. As the Internet spreads to parts of the world where English is rarely used -- such as China, for example, it is natural that Chinese, and not English, will be the preferred choice for interacting with it. For the majority of the users in China, their mother tongue will be the only choice.

There is a change-over period, of course. Much of the technical terminology on the Web is still not translated into other languages. And as we found with our Multilingual Glossary of Internet Terminology -- known as NetGlos -- the translation of these terms is not always a simple process. Before a new term becomes accepted as the 'correct' one, there is a period of instability where a number of competing candidates are used. Often an English loanword becomes the starting point -- and in many cases the endpoint. But eventually a winner emerges that becomes codified into published technical dictionaries as well as the everyday interactions of the nontechnical user. The latest version of NetGlos is the Russian one and it should be available in a couple of weeks or so [end of September 1998]. It will no doubt be an excellent example of the ongoing, dynamic process of 'Russification' of Web terminology.

4. Linguistic democracy

Whereas 'mother-tongue education' was deemed a human right for every child in the world by a UNESCO report in the early '50s, 'mother-tongue surfing' may very well be the Information Age equivalent. If the Internet is to truly become the Global Network that it is promoted as being, then all users, regardless of language background, should have access to it. To keep the Internet as the preserve of those who, by historical accident, practical necessity, or political privilege, happen to know English, is unfair to those who don't.

5. Electronic commerce

Although a multilingual Web may be desirable on moral and ethical grounds, such high ideals are not enough to make it other than a reality on a small scale. As well as the appropriate technology being available so that the non-English speaker can go, there is the impact of 'electronic commerce' as a major force that may make multilingualism the most natural path for cyberspace.

Sellers of products and services in the virtual global marketplace into which the Internet is developing must be prepared to deal with a virtual world that is just as multilingual as the physical world. If they want to be successful, they had better make sure they are speaking the languages of their customers!"
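
The character counts cited in the answer above (256 for Extended ASCII, over 65,000 for Unicode) follow from the width of each code: 8 bits and 16 bits respectively. The short Python sketch below only illustrates that arithmetic. The helper names `code_space` and `fits_in` are hypothetical, and Latin-1 is used as one assumed example of an 8-bit "Extended ASCII" set; the 16-bit figure reflects Unicode as it was described in 1998.

```python
def code_space(bits: int) -> int:
    """Return how many distinct values a fixed-width code of `bits` bits can number."""
    return 2 ** bits


def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# 7-bit ASCII, 8-bit extended sets, and a 16-bit code.
sizes = {bits: code_space(bits) for bits in (7, 8, 16)}
print(sizes)  # {7: 128, 8: 256, 16: 65536}

# Plain ASCII has no accented letters.
print(fits_in("e", "ascii"), fits_in("é", "ascii"))  # True False

# Latin-1 (assumed here as a stand-in for "Extended ASCII") covers
# accented Western European letters but not Chinese characters.
print(fits_in("é", "latin-1"), fits_in("中", "latin-1"))  # True False

# A 16-bit encoding stores this Chinese character in two bytes.
print(len("中".encode("utf-16-be")))  # 2
```

The jump from 256 to 65,536 possible values is what makes room for nonalphabetic writing systems alongside accented Latin letters in a single code.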

ML: "What did the Internet bring to the life of your organization?"

BK: "Our main service is providing language instruction via the Web. Our company is in the unique position of having come into existence BECAUSE of the Internet!"

ML: "How do you see the future of Internet-related activities as regards languages?"

BK: "As a company that derives its very existence from the importance attached to languages, I believe the future will be an exciting and challenging one. But it will be impossible to be complacent about our successes and accomplishments. Technology is already changing at a frenetic pace. Life-long learning is a strategy that we all must use if we are to stay ahead and be competitive. This is a difficult enough task in an English-speaking environment. If we add in the complexities of interacting in a multilingual/multicultural cyberspace, then the task becomes even more demanding. As well as competition, there is also the necessity for cooperation -- perhaps more so than ever before.

The seeds of cooperation across the Internet have certainly already been sown. Our NetGlos Project has depended on the goodwill of volunteer translators from Canada, U.S., Austria, Norway, Belgium, Israel, Portugal, Russia, Greece, Brazil, New Zealand and other countries. I think the hundreds of visitors we get coming to the NetGlos pages every day are an excellent testimony to the success of these types of working relationships. I see the future depending even more on cooperative relationships -- although not necessarily on a volunteer basis."

3.4. Textual Databases

Let us take the example of two textual databases relating to the French language -- the French FRANTEXT and the US-French ARTFL Project.

The FRANTEXT textual database has been available on the Web through subscription since the beginning of 1995. It is prepared in France by the Institut national de la langue française (INaLF) (National Institute of the French Language), a section of the Centre national de la recherche scientifique (CNRS) (National Center for Scientific Research). This interactive database includes 180 million words resulting from the automatic processing of a collection of 3,500 texts in arts, techniques and sciences, representing five centuries of literature (16th-20th centuries).

At the beginning of 1998, 82 research centers and university libraries in Europe, Australia, Canada and Japan were subscribing to FRANTEXT, with 1,250 work stations connected to the database, and about 50 query sessions per day. The detailed results of the survey sent to FRANTEXT users in January 1998 are presented on the website by Arlette Attali.

In the future, Arlette Attali is thinking about "contributing to the development of the linguistic tools associated with the FRANTEXT database and getting teachers, researchers and students to know them." In her e-mail of June 11, 1998, she also explained the changes brought by the Internet in her professional life:

"As I was more specially assigned to the development of textual databases at the INaLF, I had to explore the websites giving access to electronic texts and test them. I became a 'textual tourist' with the good and bad sides of this activity. The tendency to go quickly from one link to another, and to skip through the information, was a permanent danger -- it is necessary to target what you are looking for if you don't want to lose your time. The use of the Web totally changed my working methods -- my investigations are not only bookish and within a narrow circle anymore, on the contrary they are expanding thanks to the electronic texts available on the Internet."

The ARTFL Project (ARTFL: American and French Research on the Treasury of the French Language) is a cooperative project established in 1981 by the Institut national de la langue française (INaLF) (National Institute of the French Language, based in France) and the Division of the Humanities of the University of Chicago. Its purpose is to be a research tool for scholars and students in all areas of French studies.

The origin of the project is a 1957 initiative of the French government to create a new dictionary of the French language, the Trésor de la Langue Française (Treasury of the French Language). In order to provide access to a large body of word samples, it was decided to transcribe an extensive selection of French texts for use with a computer. Twenty years later, a corpus totaling some 150 million words had been created, representing a broad range of written French -- from novels and poetry to biology and mathematics -- stretching from the 17th to the 20th centuries.

This corpus of French texts was an important resource not only for lexicographers, but also for many other types of humanists and social scientists engaged in French studies -- on both sides of the Atlantic. The result of this realization was the ARTFL Project, as explained on its website:

"At present the corpus consists of nearly 2,000 texts, ranging from classic works of French literature to various kinds of non-fiction prose and technical writing. The eighteenth, nineteenth and twentieth centuries are about equally represented, with a smaller selection of seventeenth century texts as well as some medieval and Renaissance texts. We have also recently added a Provençal database that includes 38 texts in their original spellings. Genres include novels, verse, theater, journalism, essays, correspondence, and treatises. Subjects include literary criticism, biology, history, economics, and philosophy. In most cases standard scholarly editions were used in converting the text into machine-readable form, and the data contain page references to these editions."

One of the largest of its kind in the world, the ARTFL database permits both the rapid exploration of single texts and inter-textual research. ARTFL is now on the Web, and the system is available through the Internet to its subscribers. Access to the database is organized through a consortium of user institutions, in most cases universities and colleges which pay an annual subscription fee.

The ARTFL Encyclopédie Project is currently developing an on-line version of Diderot and d'Alembert's Encyclopédie, ou Dictionnaire raisonné des sciences, des arts et des métiers, including all 17 volumes of text and 11 volumes of plates from the first edition, that is to say about 18,000 pages of text and exactly 20,736,912 words.

Published under the direction of Diderot between 1751 and 1772, the Encyclopédie counted as contributors the most prominent philosophers of the time: Voltaire, Rousseau, d'Alembert, Marmontel, d'Holbach, Turgot, etc.

"These great minds (and some lesser ones) collaborated in the goal of assembling and disseminating in clear, accessible prose the fruits of accumulated knowledge and learning. Containing 72,000 articles written by more than 140 contributors, the Encyclopédie was a massive reference work for the arts and sciences, as well as a machine de guerre which served to propagate Enlightened ideas [...] The impact of the Encyclopédie was enormous, not only in its original edition, but also in multiple reprintings in smaller formats and in later adaptations. It was hailed, and also persecuted, as the sum of modern knowledge, as the monument to the progress of reason in the eighteenth century. Through its attempt to classify learning and to open all domains of human activity to its readers, the Encyclopédie gave expression to many of the most important intellectual and social developments of its time."

At present, while work continues on the fully navigational, full-text version, ARTFL is providing public access on its website to the Prototype Demonstration of Volume One. Since autumn 1998, a preliminary version has been available for consultation by all ARTFL subscribers.

Mentioned on the ARTFL home page in the Reference Collection, other ARTFL projects are: the 1st (1694) and 5th (1798) editions of the Dictionnaire de L'Académie française; Jean Nicot's Trésor de la langue française (1606) Dictionary; Pierre Bayle's Dictionnaire historique et critique (1740 edition) (text of an image-only version); The Wordsmyth English Dictionary-Thesaurus; Roget's Thesaurus, 1911 edition; Webster's Revised Unabridged Dictionary; the French Bible by Louis Segond and parallel Bibles in German, Latin, and English, etc.

Created by Michael S. Hart in 1971, Project Gutenberg was the first information provider on the Internet. It is now the oldest digital library on the Web, and the biggest in terms of the number of works (1,500) digitized for it, with 45 new titles per month. Michael Hart's purpose is to put as many literary texts as possible on the Web for free.

In his e-mail of August 23, 1998, Michael S. Hart explained:

"We consider e-text to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to e-texts, especially in schools. [...] My own personal goal is to put 10,000 e-texts on the Net, and if I can get some major support, I would like to expand that to 1,000,000 and to also expand our potential audience for the average e-text from 1.x% of the world population to over 10%... thus changing our goal from giving away 1,000,000,000,000 e-texts to 1,000 times as many... a trillion and a quadrillion in US terminology."

Project Gutenberg is now developing its foreign collections, as announced in the Newsletter of October 1997. In the Newsletter of March 1998, Michael S. Hart mentioned that Project Gutenberg's volunteers were now working on e-texts in French, German, Portuguese and Spanish, and he was also hoping to get some e-texts in the following languages: Arabic, Chinese, Danish, Dutch, Esperanto, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latin, Lithuanian, Polish, Romanian, Russian, Slovak, Slovene, and Valencian (Catalan).

3.5. Terminological Databases

The free consultation of terminological databases on the Web is much appreciated by language specialists. There are some terminological databases maintained by international organizations, such as Eurodicautom, maintained by the Translation Service of the European Commission; ILOTERM, maintained by the International Labour Organization (ILO); the ITU Telecommunication Terminology Database (TERMITE), maintained by the International Telecommunication Union (ITU); and the WHO Terminology Information System (WHOTERM), maintained by the World Health Organization (WHO).

Eurodicautom is the multilingual terminological database of the Translation Service of the European Commission. Initially developed to assist in-house translators, it is consulted today by an increasing number of European Union officials other than translators, as well as by language professionals throughout the world. Its huge, constantly updated content is drafted in twelve languages (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Latin, Portuguese, Spanish, Swedish), and covers a broad spectrum of human knowledge, while the main core relates to European Union topics.

ILOTERM is the quadrilingual (English, French, German, Spanish) terminology database maintained by the Terminology and Reference Unit of the Official Documentation Branch (OFFDOC) of the International Labour Office (ILO), Geneva, Switzerland. Its primary purpose is to provide solutions, reflecting current usage, to terminological problems in the social and labor fields. Terms are entered in English with their French, Spanish and/or German equivalents. The database also includes records (in up to four languages) concerning the structure and programmes of the ILO, official names of international institutions, national bodies and employers' and workers' organizations, as well as titles of international meetings and instruments.

The ITU Telecommunication Terminology Database (TERMITE) is maintained by the Terminology, References and Computer Aids to Translation Section of the Conference Department of the International Telecommunication Union (ITU), Geneva, Switzerland. TERMITE (59,000 entries) is a quadrilingual (English, French, Spanish, Russian) terminological database which contains all the terms which appeared in ITU printed glossaries since 1980, as well as more recent entries relating to the different activities of the Union.

Maintained by the World Health Organization (WHO), Geneva, Switzerland, the WHO Terminology Information System (WHOTERM) includes: the WHO General Dictionary Index, giving access to an English glossary of terms, with the French and Spanish equivalents for each term; three glossaries in English: Health for All, Programme Development and Management, and Health Promotion; the WHO TermWatch, an awareness service of the Technical Terminology, reflecting current WHO usage (but not necessarily terms officially approved by WHO); and a series of links to health-related terminology.

© 1999 Marie Lebert