Date: Thu, 27 Apr 89 09:29 EST
From:
To: u35395@uicvm

Brief Project Description for Circulation to AB Members

The first meeting of the Text Encoding Initiative's Advisory Board brought together seventeen representatives from key professional and learned societies representing academic disciplines across the spectrum from hard core computer science to lexicography, literary studies and anthropology as well as the professional interests of librarians and publishers. The purpose of the event, hosted by the University of Illinois at Chicago, was to seek the views of the newly constituted Advisory Board concerning the structure and proposed strategy of the Text Encoding Initiative (TEI), to explain its relevance to the interests of the societies and to encourage active participation in the work of the Initiative by the societies' members.

History and Structure of TEI

In the fall of 1987, the Association for Computers and the Humanities (ACH) under the directorship of Nancy M. Ide organized a conference at Vassar College from which emerged a set of resolutions upon the necessity and feasibility of defining a set of guidelines to facilitate both the interchange of existing encoded texts and the creation of newly encoded texts. The guidelines would specify both what features should be encoded and also how they should be encoded, as well as suggesting ways of describing the resulting encoding scheme and its relationship with other pre-existing such schemes. In the intervening period, ACH, together with the Association for Literary and Linguistic Computing (ALLC) and the Association for Computational Linguistics (ACL), has defined a four year work plan to achieve these goals, which was presented at the Chicago Meeting.
Funding for the work plan has been provided by a substantial grant from the American National Endowment for the Humanities, which will cover the bulk of the costs of American participation in the Initiative for the first phase of the project, due to end June 1990. Equivalent funding for European participation has been obtained from the European Economic Community, and it is also hoped to secure further support from industry and government.

Committee Structure

The work plan will be co-ordinated by a six-member steering committee and two Editors, one American and one British. It calls initially for the setting up of four Working Committees, each responsible for a distinct part of the work plan.

Committee 1, the Committee for Text Documentation, with a membership drawn largely from the library and archive management communities, will deal with issues concerning the cataloguing and identification of key features of encoded texts. It will draw on work already done in this field for social science data, for example in the establishment of the Standard Study Description. All the Committees will be expected to work within established frameworks where these are available; the relevance here of work already done in establishing Anglo-American Cataloguing Rules for machine-readable sources is apparent.

Committee 2, for Text Representation, is concerned with the encoding of such features as layout and character sets. It will aim to provide precise recommendations covering all the features of continuous discourse for which a convention already exists in printed or written sources. This will involve a consideration of the character sets of all alphabetic scripts currently used in computer-based research. Explicit consideration of non-alphabetic scripts, though not excluded, has been deferred; transcriptions of spoken language will however be included. It will also recommend ways of representing the structural divisions of a text (book, chapter, paragraph etc.)
and all other features conventionally signalled in printed or written texts, such as emphasis, quotation, critical apparatus etc.

Committee 3, the Committee for Text Analysis and Interpretation, has the largest and most open-ended set of responsibilities of the four. It will aim to provide discipline-specific sets of tags appropriate to the analytic procedures favored by each discipline, but in such a way as to permit their extension and generalization to other disciplines using analogous procedures. Because this is a very large task, Committee 3 will focus initially on a single discipline (linguistics), chosen primarily because of its clear relevance to all other text-based types of analysis. As work proceeds, the focus of Committee 3 will shift toward literary analysis and other humanities disciplines.

Committees 1, 2, and 3, with an average membership of ten, will set up subcommittees which will do the preliminary design work for tag sets within specialized areas. Committee 3 already has one subcommittee, concerned with tag sets for dictionary markup, which has produced a set of preliminary guidelines for monolingual dictionaries. A subcommittee of Committee 2 is also being formed, concerned with the tagging of historical sources, to take advantage of the substantial progress already made in this area by the informal network of European scholars collaborating on the Kleio project at the Max-Planck-Institut fuer Geschichte in Goettingen (FRG), at Graz University in Austria and elsewhere.

Committee 4 was charged at the Poughkeepsie meeting with the definition of a "metalanguage" - a language capable of specifying and describing markup languages.
The emergence of an ISO standard (ISO 8879, Standard Generalized Markup Language) and its increasing acceptance within both government and publishing communities has removed that burden, but replaced it with that of assessing the extent of compatibility possible between the tag sets proposed by the other three committees and the SGML standard. The Guidelines will work within the syntactic framework of SGML, departing from it if (and only if) it proves inadequate to the needs of research. So far no areas of divergence have been identified, though there has been considerable discussion within the Committee (which began work last month) on the extent to which all features of SGML can be recommended. The Committee's main task will be to validate and test the Guidelines as they emerge, to arbitrate on matters of SGML conformance and also to propose ways of mapping existing encoding standards to the Guidelines.

The Chicago Meeting

In addition to the three sponsoring organizations, the following associations are currently represented on the Advisory Board: American Anthropological Association; American Historical Association; American Philological Association; American Society for Information Science; Association for Computing Machinery; Association for Documentary Editing; Association for History and Computing; Association Internationale Bible et Informatique; Canadian Linguistic Association; Dictionary Society of North America; Electronic Publishing SIG; International Federation of Library Associations and Institutions; Linguistic Society of America; Modern Language Association.

After an initial presentation about the history, background, objectives and structure of the TEI, delegates were invited to comment on their own interests and the constituencies they served. A series of presentations followed concerning the implications of the TEI for Humanities Research, for Computational Linguistics and for the Language Industries. The goals and responsibilities of each of the working committees were then described, as outlined above.

The second full day of the meeting began with a very brief tutorial on SGML and a longer description of the design principles, scope and end products of the Guidelines. After a wide-ranging and useful discussion, in which some constructively critical reactions were expressed, members of the Advisory Board expressed approval of the objectives, organisational structure and design goals of the Initiative, as they had been presented at the meeting. The Board also noted the draft work plans submitted by the Heads of Committees to the meeting, with the understanding that these would be revised in accordance with experience and the suggestions made by the Board.
If you would like more information about the TEI, please contact

\nopagenumbers
\centerline{\bf Sponsoring and Advisory Organizations}
\vskip .25in
\parindent=0pt
\centerline{Sponsoring Organizations and Representatives}
\bigskip
{\obeylines
Association for Computers and the Humanities
Nancy M. Ide
Vassar College
\medskip
Association for Computational Linguistics
Donald E. Walker
Bell Communications Research
\medskip
Association for Literary and Linguistic Computing
Susan Hockey
Oxford University Computing Service
\bigskip
\centerline{Advisory Board Organizations and Representatives}
\bigskip
American Anthropological Association
Chad K. MacDaniel
University of Maryland
\medskip
American Historical Association
Elizabeth A. R. Brown
Brooklyn College
\medskip
American Philological Association
Jocelyn Penny Small
Rutgers University
\medskip
American Society for Information Science
Clifford A. Lynch
University of California
\medskip
Association for Computing Machinery/Special Interest Group for Information Retrieval
Scott Deerwester
University of Chicago
\medskip
Association for Documentary Editing
David Chestnutt
University of South Carolina
\medskip
Association for History and Computing
Dr. Manfred Thaller
Max-Planck-Institut f\"ur Geschichte
\medskip
Association Internationale Bible et Informatique
Wilhelm Ott
Universit\"at T\"ubingen
\medskip
Canadian Linguistic Association
Anne-Marie di Sciullo
Universit\'e du Qu\'ebec \`a Montr\'eal
\medskip
Dictionary Society of North America
Thomas Cresswell
\vfill\eject
Electronic Publishers Special Interest Group
Betsy Kiser
Online Computer Library Center (OCLC)
\medskip
International Federation of Library Associations and Institutions
Dr. J. D. Byrum Jr.
The Library of Congress
\medskip
Linguistic Society of America
Stephen Anderson
Johns Hopkins University
\medskip
Modern Language Association
Randy Jones
Brigham Young University
\medskip
}
\bye