Date: Sat, 4 Mar 89 17:12 GMT
From: Lou Burnard
To: U35395@UICVM
Subject: draft AB press-release

This is as much as I've been able to get done for you so far. It will clearly need some revision, and I'm not sure of the exact register to aim at, which is why it starts and finishes as a press release, with a lot of background guff inserted in between. You may find it amusing to read my perspective on the deal anyway. If I hadn't lost my notebook there'd be more nitty-gritty about what actually went down at the meeting, but then I expect Board Members will prefer to add their own views on that anyway. You will also note that I have not named any names anywhere: this is a very British habit of reticence and decorum (like not mentioning the cost of anything), which you may revise as you see fit. And I'm sorry it hasn't been done faster; you wouldn't believe the backlog of jobs I've had these last 2 weeks. And no crack in the clouds till way past Easter... <\comment>

Brief Project Description for Circulation to AB Members

That bumper-sticker dictum about the necessity of using a computer "to really foul things up" took another knock at an unusual gathering in Chicago last month. The first meeting of the Text Encoding Initiative's Advisory Board brought together around two dozen representatives of key professional and learned societies, covering academic disciplines across the spectrum from hard-core computer science to lexicography, literary studies and anthropology, as well as the professional interests of librarians and publishers.
The purpose of the event, hosted by the University of Illinois at Chicago, was to seek the views of the newly constituted Advisory Board concerning the structure and proposed strategy of the Text Encoding Initiative (TEI), to explain its relevance to the interests of the societies and to encourage active participation in the work of the Initiative by the societies' members.

History and Structure of TEI

It is now over a year since the Association for Computers and the Humanities (ACH) organised a conference at Vassar College, from which emerged a set of resolutions (now known as the "Poughkeepsie Principles") on the necessity and feasibility of defining a set of guidelines to facilitate both the interchange of existing encoded texts and the creation of newly encoded ones. The guidelines would specify both what<\em> features should be encoded and how<\em> they should be encoded, and would also suggest ways of describing the resulting encoding scheme and its relationship to other existing schemes. In the intervening period, ACH, together with the Association for Literary and Linguistic Computing (ALLC) and the Association for Computational Linguistics (ACL), has defined a three-year work plan to achieve these goals, which was presented at the Chicago meeting. Funding for the work plan has been provided by a substantial grant from the US National Endowment for the Humanities, which will cover the bulk of the costs of American participation in the Initiative during the first phase of the project, due to end in June 1990. Equivalent funding for European participation is currently being negotiated with the European Commission, and it is also hoped to secure further support from industry and government.

Committee Structure

The work plan will be co-ordinated by a six-member Steering Committee and two Editors, one American and one British. It calls initially for the setting up of four Working Committees, each responsible for a distinct part of the work plan.

Committee 1, the Committee for Text Documentation, with a membership drawn largely from the library and archive management communities, will deal with issues concerning the cataloguing and identification of key features of encoded texts. It will draw on work already done in this field for social science data, for example in the establishment of the Standard Study Description. All the Committees will be expected to work within established frameworks where these are available; here, the relevance of existing work on the Anglo-American Cataloguing Rules for machine-readable sources is clear.

Committee 2, the Committee for Text Representation, is concerned with the encoding of such features as layout and character sets. It will aim to provide precise recommendations covering all the features of continuous discourse for which a convention already exists in printed or written sources. This will involve considering the character sets of all alphabetic scripts currently used in computer-based research. Explicit consideration of non-alphabetic scripts, though not excluded, has been deferred; transcriptions of spoken language will, however, be included. The Committee will also recommend ways of representing the structural divisions of a text (book, chapter, paragraph, etc.) and all other features conventionally signalled in printed or written texts, such as emphasis, quotation and critical apparatus.

Committee 3, the Committee for Text Analysis and Interpretation, has the largest and most open-ended set of responsibilities of the four.
It will aim to provide discipline-specific sets of tags appropriate to the analytic procedures favoured by each discipline, but in such a way as to permit their extension and generalisation to other disciplines using analogous procedures. Because this is a very large brief, Committee 3 will focus initially on a single discipline (linguistics) and a single analytic procedure (lexico-syntactic analysis), chosen primarily because of its clear relevance to all other types of text-based analysis. Its focus will shift as work proceeds, probably towards literary and more introspective types of analysis. It is anticipated that all four committees, each with a maximum membership of ten, will frequently need to set up sub-committees. Committee 3 already has one, concerned with tagsets for dictionary markup, which has produced a set of guidelines based on work done at the University of Waterloo on tagging the New Oxford English Dictionary<\it>. It is hoped to form a second, concerned with the tagging of historical sources, to take advantage of the substantial progress already made in this area by the informal network of European scholars collaborating on the Kleio<\it> project at the Max Planck Institut fuer Geschichte in Goettingen (FRG), at Graz University in Austria and elsewhere.

Committee 4 was charged at the Poughkeepsie meeting with the definition of a "metalanguage": a language capable of specifying and describing markup languages. The emergence of an ISO standard (ISO 8879, Standard Generalized Markup Language) and its increasing acceptance within both the government and publishing communities have removed that burden, but replaced it with that of assessing how far the tag sets proposed by the other three committees can be made compatible with the SGML standard. The Guidelines will work within the syntactic framework of SGML, departing from it if (and only if) it proves inadequate to the needs of research.
So far no areas of divergence have been identified, though there has been considerable discussion within the Committee (which began work last month) on the extent to which all features of SGML can be recommended. The Committee's main task will be to validate and test the Guidelines as they emerge, to arbitrate on matters of SGML conformance, and to propose ways of mapping existing encoding standards onto the Guidelines.

The Chicago Meeting

In addition to the three sponsoring organisations, the following associations are currently represented on the Advisory Board: American Anthropological Association; American Historical Association; American Philological Association; American Society for Information Science; Association for Computing Machinery; Association for Documentary Editing; Association for History and Computing; Association Internationale Bible et Informatique; Canadian Linguistic Association; Dictionary Society of North America; Electronic Publishing SIG; International Federation of Library Associations and Institutions; Linguistic Society of America; Modern Language Association.<\it> After an initial presentation about the history, background, objectives and structure of the TEI, delegates were invited to comment on their own interests and on the constituencies they served. After lunch came a series of presentations on the implications of the TEI for humanities research, for computational linguistics and for the language industries. The goals and responsibilities of each of the working committees were then described, as outlined above. The second full day of the meeting began with a very brief tutorial on SGML and a longer description of the design principles, scope and end products of the Guidelines. After a wide-ranging and useful discussion, in which some constructively critical reactions were voiced, the Advisory Board expressed approval of the objectives, organisational structure and design goals of the Initiative as presented at the meeting. The Board also noted the draft work plans submitted by the Heads of Committees, on the understanding that these would be revised in the light of experience and of the suggestions made by the Board. <\doc>