NEF - Le Livre 010101 de Marie Lebert - From the Print Media to the Internet

From the Print Media to the Internet (1999)
7. Digital Libraries

7.1. The Digital Library: A Definition
7.2. Digital Libraries: Some Examples
7.3. Digital Image Collections
7.4. Future Trends for Digital Libraries


7.1. The Digital Library: A Definition

Digital libraries may be the major contribution from the print media to the Internet, and vice versa.

Thanks to the Internet, hundreds of public works, literary and scientific documents, articles, academic and research works, pictures and sound tracks are available on the screen for free. The collections of existing digital libraries increase regularly, and new digital libraries come up constantly.

Some digital libraries are created by "traditional" libraries that want to put their documents at the disposal of Internet users. Other digital libraries are "only" digital - their life is 100% on the Web.

Hosted by Carnegie Mellon University in Pittsburgh, Pennsylvania, the Universal Library defines the digital library as "a digital library of digital documents, artifacts, and records. The advantage of having library material available in digital form is threefold: (1) the content occupies less space and can be replicated and made secure electronically, (2) the content can be made immediately available over the Internet to anyone, anywhere, and (3) search for content can be automated. The promise of the digital library is the promise of great cost reductions while providing great increases in archive availability and accessibility. [...]

There are literally thousands of digital library initiatives of a great many varieties going on in the world today. Digital libraries are being formed of scholarly works, archives of historical figures and events, corporate and governmental records, museum collections and religious collections. Some take the form of scanning and putting documents to the World Wide Web. Still other digital libraries are formed of digitizing paintings, films and music. Work even exists in 3D reconstructive digitization that permits a digital deconstruction, storage, transmission, and reconstruction of solid object."

The British Library is a pioneer in Europe for research relating to digital libraries. Some treasures of the library are already on-line: Beowulf, the first great English masterpiece, dated to the 11th century; Magna Carta, one example from 1215 issued over the Great Seal of King John; the Lindisfarne Gospels, dated 698; the Diamond Sutra, dated 868, which is the world's earliest printed book; the Sforza Hours, dated 1490-1520, which is an outstanding Renaissance treasure; the Codex Arundel, a notebook of Leonardo da Vinci (1452-1519); and the Tyndale New Testament, which was the first printed New Testament in English, from the press of Peter Schoeffer in Worms.

Brian Lang, Chief Executive of the British Library, states on the British Library website:

"We do not envisage an exclusively digital library. We are aware that some people feel that digital materials will predominate in libraries of the future. Others anticipate that the impact will be slight. In the context of the British Library, printed books, manuscripts, maps, music, sound recordings and all the other existing materials in the collection will always retain their central importance, and we are committed to continuing to provide, and to improve, access to these in our reading rooms. The importance of digital materials will, however, increase. We recognize that network infrastructure is at present most strongly developed in the higher education sector, but there are signs that similar facilities will also be available elsewhere, particularly in the industrial and commercial sector, and for public libraries. Our vision of network access encompasses all these."

The Digital Library Programme will begin in February 1999. The two potential partners are: Dawson-IBM-The Stationery Office Consortium, and the Digital Library Consortium (Blackwell, Chadwyck-Healey, MicroPatent, Unisys). The confirmation of the preferred bidder is planned for February 1999, and the contract will be awarded in Spring 1999.

"The development of the Digital Library will enable the British Library to embrace the digital information age. Digital technology will be used to preserve and extend the Library's unparalleled collection. Access to the collection will become boundless with users from all over the world, at any time, having simple, fast access to digitized materials using computer networks, particularly the Internet."

What exactly is digitization? Digitization is the conversion of text, sound or images to digital form, that is, in the form of numerical digits (bits and bytes) for handling by computer. Digitization has made it possible to create, record, manipulate, combine, store, retrieve and transmit information and information-based products in ways which magnetic tape, celluloid and paper did not permit. Digitization thus allows music, cinema and the written word to be recorded and transformed through similar processes and without separate material supports. Previously dissimilar industries, such as publishing and sound recording, now both produce CD-ROMs, rather than simply books and records.
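
As a minimal sketch of what "numerical digits (bits and bytes)" means for text, the following Python lines turn a short word into byte values and then into bits, and back again. The sample word and the choice of ASCII encoding are illustrative assumptions, not something prescribed by the sources quoted here.

```python
# Minimal illustration: text becomes numbers (bytes), and bytes become bits.
text = "Etext"

# Encode the characters as bytes using ASCII (one byte per character).
data = text.encode("ascii")

# Each byte is a number between 0 and 255.
codes = list(data)

# The same numbers written as 8-bit binary strings.
bits = [format(b, "08b") for b in data]

# Decoding the bytes recovers the original text exactly.
restored = data.decode("ascii")

print(codes)
print(bits)
print(restored)
```

Sound and images are digitized on the same principle, with samples or pixels taking the place of characters.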


7.2. Digital Libraries: Some Examples

Created by Michael S. Hart in 1971, Project Gutenberg was the first information provider on the Internet. It is now the oldest digital library on the Web, and the biggest in terms of the number of works (1,500) which have been digitized for it, with around 45 new titles per month. Michael Hart's purpose is to put on the Web as many literary texts as possible for a minimal price.

In his e-mail of August 23, 1998, Michael Hart explained:

"We consider Etext to be a new medium, with no real relationship to paper, other than presenting the same material, but I don't see how paper can possibly compete once people each find their own comfortable way to Etexts, especially in schools. [...] My own personal goal is to put 10,000 Etexts on the Net, and if I can get some major support, I would like to expand that to 1,000,000 and to also expand our potential audience for the average Etext from 1.x% of the world population to over 10%... thus changing our goal from giving away 1,000,000,000,000 Etexts to 1,000 time as many... a trillion and a quadrillion in US terminology."

The Etext # 1000 was Dante's Divine Comedy, in both English and Italian, and Michael Hart dreams about Etext # 2000 for January 1st, 2000. In the Project Gutenberg Newsletter of February 1998, he wrote: "If we do 36 per month for the next 23 month period, we should be able to reach 2,000 Etexts by January 1 of the year 2000. . . [...] I think it would be kind of nice to do our 2,000th Etext during the big celebration..."

An average of 50 hours is necessary to get any Etext selected, entered, proofread, edited, copyright-searched, analyzed, etc.
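
The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the number of Etexts already released in February 1998 is not stated in the newsletter excerpt, so the value computed below is what the quoted rate and deadline imply, and applying the 50-hour average to every new Etext is an assumption.

```python
TARGET = 2000            # Etext number hoped for on January 1, 2000
RATE_PER_MONTH = 36      # rate quoted in the February 1998 newsletter
MONTHS = 23              # length of the period quoted in the newsletter
HOURS_PER_ETEXT = 50     # average preparation time quoted above

# Etexts that would be added over the period at the quoted rate.
new_etexts = RATE_PER_MONTH * MONTHS

# Number of Etexts the plan implies were already available (an inference).
implied_start = TARGET - new_etexts

# Volunteer effort, assuming the 50-hour average applies to each new Etext.
volunteer_hours = new_etexts * HOURS_PER_ETEXT

print(new_etexts, implied_start, volunteer_hours)
```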

How did Project Gutenberg begin?

Project Gutenberg began in 1971 when Michael Hart was given an operator's account with $100,000,000 of computer time in it by the operators of the Xerox Sigma V mainframe at the Materials Research Lab at the University of Illinois. Michael decided there was nothing he could do, in the way of "normal computing", that would repay the huge value of the computer time he had been given... so he had to create $100,000,000 worth of value in some other manner. He immediately announced that the greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries. He then proceeded to type in the Declaration of Independence and tried to send it to everyone on the networks. Project Gutenberg was born.

There are three sections in Project Gutenberg, basically described as:

- Light Literature; such as Alice in Wonderland, Through the Looking-Glass, Peter Pan, Aesop's Fables, etc.;
- Heavy Literature; such as the Bible or other religious documents, Shakespeare, Moby Dick, Paradise Lost, etc.; and
- References; such as Roget's Thesaurus, almanacs, and a set of encyclopedias, dictionaries, etc.

"The Light Literature Collection is designed to get persons to the computer in the first place, whether the person may be a pre-schooler or a great-grandparent. We love it when we hear about kids or grandparents taking each other to an Etext of Peter Pan when they come back from watching Hook at the movies, or when they read Alice in Wonderland after seeing it on TV. We have also been told that nearly every Star Trek movie has quoted current Project Gutenberg Etext releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote finishing up the most recent, etc.) not to mention a reference to Through the Looking-Glass in JFK. This was a primary concern when we chose the books for our libraries.

We want people to be able to look up quotations they heard in conversation, movies, music, other books, easily with a library containing all these quotations in an easy to find Etext format.

With Plain Vanilla ASCII you will be easily able to search an entire library, without any program more sophisticated than a plain search program. In fact, these Project Gutenberg Etext files are so plain that you can do a search on them without even using an intermediate search program (i.e. a program between you and the disk). Norton's and other direct disk access programs can search every one of your files without you even naming them, pointing to an Etext directory, or whatever. You can simply search a raw output from the disk. . .I do this on a half gigabyte disk partition, containing all our editions."
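
The kind of "plain search program" described in this passage can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Project Gutenberg's own tooling: the function name `search_etexts`, the `.txt` file extension and the two tiny sample files are all hypothetical.

```python
import tempfile
from pathlib import Path


def search_etexts(directory, phrase):
    """Return (file name, line number, line) for every line containing phrase."""
    needle = phrase.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        # Plain ASCII files need no special parser; read them line by line.
        with open(path, encoding="ascii", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, number, line.strip()))
    return hits


# Demonstration with two tiny stand-in files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as library:
    Path(library, "alice.txt").write_text(
        "Alice was beginning to get very tired\nof sitting by her sister\n",
        encoding="ascii",
    )
    Path(library, "moby.txt").write_text("Call me Ishmael.\n", encoding="ascii")
    demo_hits = search_etexts(library, "ishmael")
    for hit in demo_hits:
        print(hit)
```

Because the files carry no markup or binary structure, a case-insensitive substring test on each line is enough to find a quotation.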

In this same spirit, Project Gutenberg selects Etexts that large portions of the audience will want and use frequently. It has also avoided requests, demands, and pressures to create authoritative editions.

"We do not write for the reader who cares whether a certain phrase in Shakespeare has a ':' or a ';' between its clauses. We put our sights on a goal to release Etexts that are 99.9% accurate in the eyes of the general reader. Given the preferences our proofreaders have, and the general lack of reading ability the public is currently reported to have, we probably exceed those requirements by a significant amount. However, for the person who wants an 'authoritative edition' we will have to wait some time until this becomes more feasible. We do, however, intend to release many editions of Shakespeare and the other classics for comparative study on a scholarly level, before the end of the year 2001, when we are scheduled to complete our 10,000 book Project Gutenberg Electronic Public Library."

"Anything that can be entered into a computer can be reproduced indefinitely." The Project Gutenberg Philosophy uses this premise to make information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search. Project Gutenberg Etexts are made available in what has become known as 'Plain Vanilla ASCII', meaning the low set of the American Standard Code for Information Interchange (ASCII). The reason for this is that 99% of the hardware and software a person is likely to run into can read and search these files. Plain Vanilla ASCII thus addresses the audience with Apples and Ataris all the way to the old homebrew Z80 computers, not to mention the audience of Mac, UNIX and mainframers. Michael Hart explains:

"When we started, the files had to be very small .... So doing the U.S. Declaration of Independence (only 5K) seemed the best place to start. This was followed by the Bill of Rights - then the whole U.S. Constitution, as space was getting large (at least by the standards of 1973). Then came the Bible, as individual books of the Bible were not that large, then Shakespeare (a play at a time), and then into general work in the areas of light and heavy literature and references...By the time Project Gutenberg got famous, the standard was 360K disks, so we did books such as Alice in Wonderland or Peter Pan because they could fit on one disk. Now 1.44 is the standard disk and ZIP is the standard compression; the practical file size is about three million characters, more than long enough for the average book.

However, pictures are still so bulky to store on disk that it will still be a while before we include even the lowres Tenniel illustrations in Alice and Looking-Glass. However we are very interested in doing them, and are only waiting for advances in technology to release a test edition. The market will have to establish some standards for graphics, however, before we can attempt to reach general audiences, at least on the graphics level."
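
Two of the constraints mentioned above, the "low set" of ASCII and the disk-size limit, can be made concrete with a short sketch. The assumptions are labeled in the comments: the low set is taken to mean the 7-bit codes 0 to 127, the "1.44" disk is taken to be the 1,474,560-byte floppy, and the helper name `is_plain_vanilla_ascii` is hypothetical.

```python
FLOPPY_BYTES = 1440 * 1024      # assumed capacity of a "1.44 MB" floppy disk
BOOK_CHARACTERS = 3_000_000     # practical file size quoted by Michael Hart


def is_plain_vanilla_ascii(text):
    """True if every character lies in the 7-bit ASCII range (codes 0-127)."""
    return all(ord(ch) < 128 for ch in text)


# Stored one character per byte, the size in bytes equals the character count.
uncompressed_bytes = BOOK_CHARACTERS

# Compression ratio needed for such a file to fit on a single disk.
required_ratio = uncompressed_bytes / FLOPPY_BYTES

print(is_plain_vanilla_ascii("Call me Ishmael."))
print(is_plain_vanilla_ascii("Bibliothèque"))
print(round(required_ratio, 2))
```

Under these assumptions a three-million-character book fits on one disk only if compression roughly halves its size, which is consistent with the quote's pairing of the disk standard with ZIP.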

The On-Line Books Page is a directory of books that can be freely read right on the Internet. It was founded in 1993 by John Mark Ockerbloom, a graduate student in computer science at Carnegie Mellon University, Pittsburgh, Pennsylvania, who remains the editor of the pages. It includes: an index of more than 7,000 on-line books on the Internet, which can be browsed by author, by title or by subject; pointers to significant directories and archives of on-line texts; and special exhibits. From the main search page, users have options to search for four types of media: books, music, art, and video.

"Along with books, The On-Line Books Page is also now listing major archives of serials (such as magazines, published journals, and newspapers), as of June 1998. Serials can be at least as important as books in library research. Serials are often the first places that new research and scholarship appear. They are sources for firsthand accounts of contemporary events and commentary. They are also often the first (and sometimes the only) place that quality literature appears. (For those who might still quibble about serials being listed on a 'books page', back issues of serials are often bound and reissued as hardbound 'books'.)"

Web space and computing resources are provided by the School of Computer Science at Carnegie Mellon University. The On-Line Books Page participates in the Experimental Search System of the Library of Congress. It works with The Universal Library Project, also hosted at Carnegie Mellon University.

In his e-mail to me of September 2, 1998, John Mark Ockerbloom explained how the site began:

"I was the original Webmaster here at CMU CS, and started our local Web in 1993. The local Web included pages pointing to various locally developed resources, and originally The On-Line Books Page was just one of these pages, containing pointers to some books put on-line by some of the people in our department. (Robert Stockton had made Web versions of some of Project Gutenberg's texts.)

After a while, people started asking about books at other sites, and I noticed that a number of sites (not just Gutenberg, but also Wiretap and some other places) had books on-line, and that it would be useful to have some listing of all of them, so that you could go to one place to download or view books from all over the Net. So that's how my index got started.

I eventually gave up the Webmaster job in 1996, but kept The On-Line Books Page, since by then I'd gotten very interested in the great potential the Net had for making literature available to a wide audience. At this point there are so many books going on-line that I have a hard time keeping up (and in fact have a large backlog of books to list). But I hope to keep up my on-line books work in some form or another."

In his e-mail of September 1, 1998, he explained the way he sees the relationship between the print media and the Internet:

"I certainly find both the print media and the Internet very useful, and am very excited about the potential of the Internet as a mass communication medium in the coming years. I'd also like to stay involved, one way or another, in making books available to a wide audience for free via the Net, whether I make this explicitly part of my professional career, or whether I just do it as a spare-time volunteer."

Created by Carnegie Mellon University in Pittsburgh, Pennsylvania, the Universal Library Project is chaired by Raj Reddy. According to the website:

"The mission of the Universal Library Project is to start a worldwide movement to make available on the Internet all the Authored Works of Mankind so that anyone can access these works from any place at any time. This is a major new initiative in digital libraries that will build a technically realistic and economically practical infrastructure for putting and accessing library documents on the World Wide Web. In this regard, access to the Universal Library would be free and have the same stated goal as the Carnegie Library of the last century.

[It] has a vision that goes beyond the scope of most other digital library projects. Simply put, our goal is to spark a lasting movement, in which all of the institutions responsible for the collection of mankind's works will place these works on the Internet to educate and inspire all of the world's people. Our project will, therefore, serve as an umbrella over all of these efforts, with common indices, guidelines, and systems that allow the quickest, simplest access possible."

In summer 1998, The Universal Library was working on the Book Object project:

"The Universal Library Book Object is intended to let you read a book off the web the way you would like to read it, by giving you book presentation options. You can either download the whole book as a single HTML or ASCII MIME object. Download by the screen-full. Download by the section or chapter. You can have the book in HTML, in ASCII, in Postscript, in RTF, or image GIF. In short, you don't have to read the book in the same form in which it is stored on the remote server. Such conversion of original presentation format is already common in printer drivers, although we also provide a means to permission use.

To complement the users' freedom to read the book in the form in which they desire to read it, the Book Object also has complementary provisions by which a book owner can control or restrain the freedoms allowed. This includes not only presentation constraints, but also permission to print or permission that may require monetary payments. The Universal Library Book Object is still a work in progress, but we have now overcome a few of the more fundamental hurdles in establishing the question of its feasibility."

Founded in 1992 by Paul Southworth, The ETEXT Archives are home to electronic texts of all kinds, from the sacred to the profane, and from the political to the personal. Their duty is to provide electronic versions of texts without judging their content.

The contents are:
- E-zines: electronic periodicals from the professional to the personal;
- Politics: political zines, essays, and home pages of political groups;
- Fiction: publications of amateur authors;
- Religion: mainstream and off-beat religious texts;
- Poetry: an eclectic mix of mostly amateur poetry; and
- Quartz: the archive formerly hosted at quartz.rutgers.edu.

At their start in the summer of 1992, the ETEXT Archives were hosted by the User Services Department of the University of Michigan's Information Technology Division.

"The Web was just a glimmer, gopher was the new hot technology, and FTP was still the standard information retrieval protocol for the vast majority of users. The origin of the project has caused numerous people to associate it with the University of Michigan, although in fact there has never been an official relationship and the project is supported entirely by volunteer labor and contributions. The equipment is wholly owned by the project maintainers.

The project was started in response to the lack of organized archiving of political documents, periodicals and discussions disseminated via Usenet on newsgroups such as alt.activism, misc.activism.progressive, and alt.society.anarchy. The alt.politics.radical-left group came later and was also a substantial source of both materials and regular contributors.

Not long thereafter, electronic 'zines (e-zines) began their rapid proliferation on the Internet, and it was clear that these materials suffered from the same lack of coordinated collection and preservation, not to mention the fact that the lines between e-zines (which at the time were mostly related to hacking, phreaking, and Internet anarchism) and political materials on the Internet were fuzzy enough that most e-zines fit the original mission of The ETEXT Archives. One thing led to another, and e-zines of all kinds -- many on various cultural topics unrelated to politics -- invaded the archives in significant volume."

The Logos Wordtheque is a word-by-word multilingual library with a massive database (325,916,827 words as of December 10, 1998) containing multilingual novels, technical literature and translated texts.

Logos, an international translation company based in Modena, Italy, gives free access to the linguistic tools used by its translators: 200 translators at its headquarters and 2,500 translators on-line all over the world, who process around 200 texts per day. Apart from the Logos Wordtheque, the tools include the Logos Dictionary, a multilingual dictionary with 7,580,560 entry words (as of December 10, 1998); Linguistic Resources, a database of 553 glossaries; and the Universal Conjugator, a database for conjugation of verbs in 17 languages.

When interviewed by Annie Kahn in the French daily newspaper Le Monde of December 7, 1997, Rodrigo Vergara, the Head of Logos, explained:

"We wanted all our translators to have access to the same translation tools. So we made them available on the Internet, and while we were at it we decided to make the site open to the public. This made us extremely popular, and also gave us a lot of exposure. The operation has in fact attracted a great number of customers, but also allowed us to widen our network of translators, thanks to the contacts made in the wake of the initiative."

In the same article, Annie Kahn wrote:

"The Logos site is much more than a mere dictionary or a collection of links to other on-line dictionaries. A system cornerstone is the document search software, which processes a corpus of literary texts available free of charge on the Web. If you search for the definition or the translation of a word ('didactique', for example), you get not only the answer sought, but also a quote from one of the literary works containing the word (in our case, an essay by Voltaire). All it takes is a click on the mouse button to access the whole text or even to order the book, thanks to a partnership agreement with Amazon.com, the famous on-line book shop. Foreign translations are also available. If however no text containing the required word is found, the system acts as a search engine, sending the user to other websites concerning the term in question. In the case of certain words, you can even hear the pronunciation. If there is no translation currently available, the system calls on the public to contribute. Everyone can make their own suggestion, after which Logos translators and the company verify the translations forwarded."

Begun in 1997, Gallica is a massive undertaking by the Bibliothèque nationale de France to digitize thousands of texts and images relating to French history, life and culture. The first step of the program - the pictures and texts of the French 19th century - is now available on the Web.

Many organizations have a digital library organized around a subject. For example, the Electronic Frontier Foundation (EFF), a non-profit civil liberties organization working in the public interest to protect privacy, free expression, and access to public resources and information on-line, as well as to promote responsibility in new media, runs the EFF Archives, with documents on civil liberties.

Are there only English texts on the Web? Not any longer - what was true at the beginning of the Internet, when it was a network created in the US before becoming worldwide, is not true any more. More and more digital libraries are offering texts in languages other than English.

Project Gutenberg is now developing its foreign collections, as announced in the Project Gutenberg Newsletter of October 1997. In the Newsletter of March 1998, Michael Hart, its founder and executive director, mentioned that Project Gutenberg's volunteers were now working on Etexts in French, German, Portuguese and Spanish, and he was also expecting to have some coming in the following languages: Arabic, Chinese, Danish, Dutch, Esperanto, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latin, Lithuanian, Polish, Romanian, Russian, Slovak, Slovene, and Valencian (Catalan).

Founded in 1993, the ABU: la bibliothèque universelle (ABU: The Universal Library) offers a collection of French-language texts in the public domain. It gives free access to 223 texts by 76 authors (as of November 1998).

Located on the site of the University of Geneva, Switzerland, Athena is a digital library of documents in several languages about philosophy, science, classics, literature, history, economics, etc. It also focuses on putting French texts at the disposal of the Internet community. The Helvetia section gathers documents about Switzerland. The site offers links to other digital libraries.

The Bielefeld University Library (Bibliothek der Universität Bielefeld), Germany, offers a collection of digitized German texts. Michael Behrens, who is responsible for the digital library, answered my questions in his e-mail of September 25, 1998.

ML: "When did you begin your digital library?"

MB: "[It] depends on what the term would be understood to mean. To some here, 'digital library' seems to be everything that, even remotely, has to do with the Internet. The library started its own web server some time in summer 1995. There's no exact date to give because it took some time until we got it to work in a reasonably reliable way. Before that, it had been offering most of its services via Telnet, which wasn't used much by patrons, although in theory they could have accessed a lot of material from home. But in those days almost nobody really had Internet access at home... We started digitizing rare prints from our own library, and some that were sent in via library loan, in November 1996."

ML: "How many digitized texts do you have?"

MB: "In that first phase of our attempts at digitization, starting Nov. 1996 and ending June 1997, 38 rare prints were scanned as image files and made available via the Web. During the same time, there were also a few digital materials prepared as accompanying material for lectures held at the university (image files as excerpts from printed works). These are, for copyright reasons, not available outside of campus. The next step, which is just being completed, is the digitization of the Berlinische Monatsschrift, a German periodical from the Enlightenment, comprising 58 volumes, 2,574 articles on 30,626 pages.

A somewhat bigger digitization project of German periodicals from the 18th and early 19th century is planned. The size will be about 1,000,000 pages. These periodicals will be not just from the holdings of this library, but the project would be coordinated here, and some of the technical work would be done here, also."

Projekt Gutenberg-DE is a German digital library created in 1994 because there were very few German texts on the Web. Texts are organized for reading on-line, with longer works divided into chapters. There is an alphabetic list of authors, each with a biography and a list of works, and a full-text search for titles.

In Italy, Liber Liber, whose maxim is: "Nullus amicus magis liber quam liber", is a non-profit cultural association whose aim is the promotion of any kind of artistic and intellectual expression. In particular, it is an attempt to draw humanistic and scientific culture together thanks to the qualified use of computer technologies in the humanistic field.

Liber Liber promotes the Manuzio project (progetto Manuzio), a collection of electronic texts in Italian named after the famous publisher from Venice who in the 16th century improved the printing techniques created by Gutenberg.

The Manuzio project has the ambition to make a noble idea real: the idea of making culture available to everybody. How? By making books, graduation theses, articles, tales or any other document which can be stored on a computer available all over the world, at any minute and free of charge. Via modem, or using floppy disks (in which case there is only the cost of the disk and the delivery), it is already possible to get hundreds of books. And Progetto Manuzio needs only a few people to make such a masterpiece as Dante Alighieri's Divina Commedia available to millions of people.

Created by the University of Virginia and the University of Pittsburgh, the Japanese Text Initiative (JTI) is a collaborative effort to make texts of classical Japanese literature available on the World Wide Web. Its goal is "to put on-line on the Web texts of classical Japanese literature in Japanese characters. Our primary audience is English-speaking scholars and students. Where possible, the Japanese texts will be accompanied by English translations. All JTI texts will be tagged in Standard Generalized Markup Language (SGML), according to Text Encoding Initiative (TEI) standards, and converted to HTML for display on the Web. An important purpose is to make JTI texts in both Japanese and English searchable, both individually and as a group."

Venezuela Analítica, an electronic magazine conceived as a public forum for exchanging ideas on politics, economics, culture, science and technology, created BitBlioteca in May 1997. This digital library comprises about 700 texts, mainly in Spanish but also in French, English and Portuguese.

In his e-mail of September 3, 1998, Roberto Hernández Montoya, Head of BitBlioteca, explains the way he sees the relationship between the print media and the Internet:

"The printed text can't be replaced, at least not for the foreseeable future. The paper book is a tremendous 'machine'. We can't leaf through an electronic book in the same way as a paper book. On the other hand electronic use allows us to locate text chains more quickly. In a certain way we can more intensively read the electronic text, even with the inconvenience of reading on the screen. The electronic book is less expensive and can be more easily distributed worldwide (if we don't count the cost of the computer and the Internet connection).

[The use of the Internet] has been very important for me personally. It became my main way of life. As an organization it gave us the possibility to communicate with thousands of people, which would have been economically impossible if we had published a paper magazine. I think the Internet is going to become the essential means of communication and of information exchange in the coming years."

Projekt Runeberg is a digital library initiated in December 1992 by Lysator, a students' computer club, in cooperation with Linköping University, Sweden. It is an open and voluntary initiative to create and collect free electronic editions of classic Nordic literature and art. Around 200 titles are available in full text, and there is also data on more than 6,000 Nordic authors.

Some digital libraries are organized around an author, for example The Complete Works of William Shakespeare, The Dante Project or The Marx/Engels Internet Archive (MEIA).

Begun in 1996, The Marx/Engels Internet Archive (MEIA) "is continually expanding, as one work after another is brought on-line [...] Pictures/photos now adorn the site, with many more to come". The Marx & Engels WWW Library gives a chronology of the collected works of Karl Marx and Frederick Engels, and access to a number of them. The Photo Gallery presents the Marx and Engels clan from 1839 to 1894, and their dwellings from 1818 to 1895.

The MEIA Search allows searching in the entire Marx/Engels Internet Library. "As larger works come on-line, they will also have small search pages made for them alone - for instance, Capital will have a search page for that work alone." The biographical archive gives access to biographies of Marx and Engels, and also short notices and photographs of the members of their family and their friends. The link "Others" gives access to short biographies and the works of Marxist writers, including James Connolly, Daniel DeLeon, and Hal Draper. The MEIA Non-English Archive lists the works of Marx and Engels in other languages (Danish, French, German, Greek, Italian, Japanese, Polish, Portuguese, Spanish, and Swedish), with links to them. The following statement is posted on the website:

"There's no way to monetarily profit from this project. 'Tis a labor of love undertaken in the purest communitarian sense. The real 'profit' will hopefully manifest in the form of individual enlightenment through easy access to these classic works. Besides, transcribing them is an education in itself... Let me also add that this is not a sectarian/One-Great-Truth effort. Help from any individual or any group is welcome. We have but one slogan: 'Piping Marx & Engels into cyberspace!'"


7.3. Digital Image Collections

Other digital libraries include pictures, for example the impressive Gallica. Available since 1997, Pictures and Texts of the French 19th Century is the first part of the massive project of the French National Library (Bibliothèque nationale de France), which is digitizing thousands of texts and images relating to French history, life and culture.

The digital collections of American Memory are a major component of the Library of Congress's National Digital Library Program. The National Digital Library Program (NDLP) is an effort to digitize and deliver electronically the distinctive, historical Americana holdings at the Library of Congress, including photographs, manuscripts, rare books, maps, recorded sound, and moving pictures.

"The Library of Congress National Digital Library Program (NDLP) is assembling a digital library of reproductions of primary source materials to support the study of the history and culture of the United States. Begun in 1995 after a five-year pilot project, the program began digitizing selected collections of Library of Congress archival materials that chronicle the nation's rich cultural heritage. In order to reproduce collections of books, pamphlets, motion pictures, manuscripts and sound recordings, the Library has created a wide array of digital entities: bitonal document images, grayscale and color pictorial images, digital video and audio, and searchable texts."

There are currently over 30 collections in American Memory, for example:

(1) African American Perspectives: Pamphlets from the Daniel A. P. Murray Collection, 1818-1907: 351 rare pamphlets offering insight into attitudes and ideas of African Americans between Reconstruction and the First World War;

(2) Architecture and Interior Design for 20th Century America: Photographs by Samuel Gottscho and William Schleisner, 1935-1955: Approximately 29,000 photographs of buildings, interiors, and gardens of renowned architects and interior designers.

The New York Public Library Digital Collections provide the public with digital versions of books, manuscripts, photographs, engravings, and other items as well as tools to browse, search, and analyze these materials remotely via the Internet. Four general sections allow the browsing of the collections: Digital Schomburg (Center for Research in Black Culture); Archival finding aids; Cooperative projects; and On-Line Exhibitions.

SPIRO (UC Berkeley Architecture/Slide Library Slide and Photograph Collection) is the visual on-line public access catalog (VOPAC) for the UC (University of California) Berkeley's Architecture Slide Library (ASL) collection of 200,000 35mm slides.

"SPIRO can be accessed using either Image Query, a powerful database retrieval package, or the World Wide Web. ImageQuery2.0 was developed originally by UC Berkeley's Information Systems and Technology, Advanced Technology Planning (ATP) Office under the direction of Barbara Morgan. ImageQuery2.0 is currently maintained by the Museum Informatics Project (MIP). ImageQuery SPIRO permits access to the collection by ten access points: period; place; creator name; object name; view type; subject terms from the Art and Architecture Thesaurus; source of image; creation dates; classification number; image identification number. The vast majority of images in SPIRO are copyrighted."
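The idea of retrieving images through a fixed set of access points can be illustrated with a minimal Python sketch. This is not ImageQuery or SPIRO code: the field names are hypothetical renderings of the ten access points listed in the quotation, the sample records are invented, and the matching rule (case-insensitive substring) is an assumption made for the example.

```python
# Hypothetical field names for the ten access points named in the SPIRO description.
ACCESS_POINTS = (
    "period", "place", "creator_name", "object_name", "view_type",
    "subject_terms", "source_of_image", "creation_dates",
    "classification_number", "image_id",
)


def search(records, **criteria):
    """Return the records that match every criterion.

    A criterion matches when its value occurs, ignoring case,
    inside the corresponding field of the record.
    """
    unknown = set(criteria) - set(ACCESS_POINTS)
    if unknown:
        raise ValueError(f"not an access point: {sorted(unknown)}")

    def matches(record):
        return all(
            str(wanted).lower() in str(record.get(field, "")).lower()
            for field, wanted in criteria.items()
        )

    return [record for record in records if matches(record)]


# Invented sample records, for illustration only.
SAMPLE = [
    {"image_id": "0001", "creator_name": "Example Architect A",
     "place": "Berkeley", "view_type": "exterior"},
    {"image_id": "0002", "creator_name": "Example Architect B",
     "place": "Paris", "view_type": "interior"},
]

if __name__ == "__main__":
    for hit in search(SAMPLE, place="paris", view_type="interior"):
        print(hit["image_id"], hit["creator_name"])
```

Combining several access points in one call narrows the result set, which is the practical benefit of cataloging each image under many independent fields.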

IMAGES 1 (on-line images of the National Library of Australia's Pictorial Collection) contains over 15,000 historical and contemporary images relating to Australia and its place in the world, including paintings, drawings, rare prints, objects and photographs. The images have been selected from more than 40,000 paintings, drawings and prints and more than 550,000 photographs held in the National Library's Pictorial Collection. Topics covered include first impressions of Australia, convict days, gold mining and Australian towns.

IMAGES 1 offers a number of search options to enhance access to the images, including searching by creator (for example photographer or artist); other names associated with a work or collection; title; subject; the image number in the database; and format (for example, watercolor or photograph).

Founded in 1989 by Bill Gates, the head of Microsoft, Corbis is a major provider of visual content and services in the digital age, offering more than 20 million photographs and fine-art images (1.3 million of them on-line) for access worldwide via the Internet, on CD-ROM, and through traditional stock catalogs. The images include contemporary stock photography, photojournalism, archival photography, and royalty-free images, available to both creative professionals and private consumers.


7.4. Future Trends for Digital Libraries

The rapid development of digital libraries leads us to define the role of the digital library, a very recent concept, in relation to the much older "traditional" library, and vice versa.

In the same way that the paper document is not going to be "killed" by the electronic document, at least not in the near future, many librarians believe the "traditional" library is not going to be "killed" by the digital library.

When interviewed by Jérôme Strazzulla in Le Figaro of June 3, 1998, Jean-Pierre Angremy, president of the French National Library (Bibliothèque nationale de France), stated: "We cannot, we will not be able to digitize everything. In the long term, a digital library will only be one element of the whole library".

Digital libraries give instant access to many works in the public domain. They also give instant access to old and rare texts and images. Full-screen images still take quite a long time to download, so many sites fall back on presenting small images, so as not to ask too much of the cybernaut's patience. Most of the time a bigger format can be requested by clicking on the selected image. This problem should be solved in the future with improvements in data transmission.

Digital libraries also facilitate textual research on one or several works at the same time, such as the works of Shakespeare, Dante's Divine Comedy, different versions of The Bible, etc.

The major problem of the cyberlibrary is the fact that recent documents cannot be posted because they don't belong to the public domain. Some projects, like DOI: The Digital Object Identifier System, an identification system for digital media, are intended to enable automated copyright management systems.

Another problem is format harmonization, to allow the downloading of the texts by any hardware and software. Libraries often choose the ASCII format (ASCII: American standard code for information interchange) or the SGML format (SGML: standard generalized markup language).
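What choosing plain ASCII means in practice can be shown with a short Python sketch. It is only an illustration under stated assumptions, not the procedure of any library mentioned here: the function name to_ascii is hypothetical, and the approach (decomposing accented letters, then dropping whatever does not fit in 7-bit ASCII) is one simple option among several.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Reduce text to 7-bit ASCII (hypothetical helper, for illustration).

    Accented letters are decomposed into a base letter plus combining
    marks, and every character outside ASCII is then dropped.
    Characters with no decomposition are lost entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Bibliothèque nationale de France"))
    print(to_ascii("Linköping"))
```

The result can be read by practically any hardware and software, but accents and other typographic detail are lost, which is the trade-off behind the choice between ASCII and a markup format such as SGML.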

Many organizations are involved in research relating to digital libraries.

Sponsored by The Library of UC Berkeley and Sun Microsystems, SunSITE is the site where the Berkeley Digital Library builds digital collections and services while providing information and support to others doing the same. Its contents are: catalogs and indexes; help/search tools and administrative info; Java corner; teaching and training; text and image collections; information for digital library developers; research and development: where digital libraries are being built; tools: software for building digital libraries.

The Digital Library Technology (DLT) Project supports the development of new technologies to facilitate public access to the data of NASA (National Aeronautics and Space Administration) via computer networks, particularly technologies that develop tools, applications, and software and hardware systems that are able to scale upward to accommodate evolving user requirements and order-of-magnitude increases in user access.

The Stanford University Digital Libraries Project deals primarily with computing literature, with a strong focus on networked information sources. It is one participant among five universities of the Digital Library Initiative, supported by the NSF (National Science Foundation), DARPA (Defense Advanced Research Projects Agency), and NASA (National Aeronautics and Space Administration). "The Initiative's focus is to dramatically advance the means to collect, store, and organize information in digital forms, and make it available for searching, retrieval, and processing via communication networks - all in user-friendly ways."

Library 2000 gives the historical record of a project run by the MIT Laboratory for Computer Science (MIT: Massachusetts Institute of Technology) between Fall 1995 and February 1998. Library 2000 was a computer systems research project that explored the implications of large-scale on-line storage using the future electronic library as an example. The project was pragmatic, developing a prototype using the technology and system configurations expected to be economically feasible in the year 2000.

Based at the Corporation for National Research Initiatives (CNRI), the D-Lib Program supports the community of people with research interests in digital libraries and electronic publishing. D-Lib Magazine, the magazine of digital library research, is a monthly compilation of contributed stories, commentary, and briefings.

The International Federation of Library Associations and Institutions (IFLA) maintains a very interesting section entitled Electronic Collections and Services.



© 1999 Marie Lebert