The ACH-ACL-ALLC Text Encoding Initiative: A Brief Overview

Nancy Ide
TEI Steering Committee

Document Number: TEI J17
21 March 1996 (Version 6)

1 WHAT IS THE TEI?

The Text Encoding Initiative (TEI) is an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research, and to satisfy a broad range of uses by the language industries more generally.

The TEI is sponsored by the Association for Computers and the Humanities (ACH), the Association for Computational Linguistics (ACL), and the Association for Literary and Linguistic Computing (ALLC). Major support for the project has come from the U.S. National Endowment for the Humanities (NEH), Directorate XIII of the Commission of the European Communities (CEC/DG-XIII), the Andrew W. Mellon Foundation, and the Social Science and Humanities Research Council of Canada.

The TEI was established at a planning conference convened by ACH at Vassar College, Poughkeepsie, New York on 12-13 November 1987, in response to the pressing need for a common text encoding scheme demonstrated by the chaotic diversity of formats in use in the mid-1980s. Since then, the need for standardized encoding practices has become even more critical as the need to use and, most importantly, reuse vast amounts of electronic text has dramatically increased for both research and industry.

The growing diversity of applications for electronic texts includes natural language processing, scholarly editions, information retrieval, hypertext, electronic publishing, various forms of literary and historical analysis, and lexicography. The central objective of the TEI is to ensure that any text that is created can be used for any number of these applications and for more, as yet not fully understood, purposes.

2 ORGANIZATION

The TEI is managed by a Steering Committee consisting of two representatives from each of the sponsoring organizations.
Fifteen scholarly organizations are represented on the project's Advisory Board, which approved the plan of work and will endorse the published Guidelines. Two editors, one European and one North American, coordinate the work and are responsible for drafting the TEI Guidelines.

3 THE TEI GUIDELINES

In May 1994, the TEI issued its "Guidelines for the Encoding and Interchange of Machine-Readable Texts." The Guidelines provide encoding conventions for describing the physical and logical structure of a large range of text types and features relevant for research in language technology, the humanities, and computational linguistics. These include character sets, language corpora, general linguistics, dictionaries, terminological data, spoken texts, hypermedia, literary prose, verse, drama, historical source materials, and text critical apparatus.

The Guidelines treat common text encoding problems, including intra- and inter-textual cross reference, demarcation of arbitrary text segments, alignment of parallel elements, overlapping hierarchies, etc. In addition, they provide conventions for linking texts to acoustic and visual data.

The TEI Guidelines answer the fundamental needs of a wide range of users: scholars of language, literature and the social and natural sciences, publishers, librarians, and those concerned generally with document retrieval and storage. They also answer many of the needs of the growing "language technology" community, which is amassing substantial multi-lingual, multi-modal corpora of spoken and written texts and lexicons in order to advance research in human language understanding, production, and translation.

4 BULLETIN BOARD

The TEI maintains an electronic bulletin board on which news about the TEI is posted and which provides a forum for detailed discussion of the TEI Guidelines.
To subscribe, send an electronic mail containing only the line

    SUBSCRIBE TEI-L

to LISTSERV@UICVM.BITNET or LISTSERV@UICVM.UIC.EDU

5 THE FUTURE

The TEI has achieved a major milestone in establishing an intellectual foundation for text encoding and a set of encoding conventions substantial enough to serve the fundamental needs of most encoding projects, both large and small. However, much of this development has necessarily taken place in advance of experience.

The TEI has moved into a new phase in which the primary focus is the wide-spread and large-scale implementation of the Guidelines. Actual use of the Guidelines is now the primary force driving the development of extensions and modifications to it. A central concern of this phase is systematic evaluation and review, again accomplished on the basis of actual experience using the Guidelines. The results will guide the further development of the Guidelines.

The shift in emphasis from the creation of the standard to its use and evaluation means that activity within the TEI will focus on user support in this phase. This will involve development of tutorials, workshops, and other user support materials, as well as the provision of consulting services for individuals and projects using the Guidelines.

6 FURTHER INFORMATION

For further information, contact the editors:

In Europe: Lou Burnard, Oxford University Computing Services, 13 Banbury Rd, Oxford OX2 6NN, England. E-mail: lou@vax.ox.ac.uk. Phone: (+44 865) 273238. Fax: (+44 865) 273275. URL: http://info.ox.ac.uk/~archive

Elsewhere: C. M. Sperberg-McQueen, Computer Center (M/C 135), University of Illinois at Chicago, 1940 W. Taylor St., Room 124, Chicago IL 60612-7352, USA. Bitnet: u35395@uicvm Internet: tei@uic.edu Phone: (+1 312) 413-0317 Fax: (+1 312) 996-6834.
URL: http://www-tei.uic.edu/orgs/tei

Also, for additional information on the TEI and how to obtain the TEI Guidelines, send a note to Listserv@uicvm.uic.edu with the message:

    GET TEIINTRO PACKAGE