EEC contract for 1990-1992: technical annex I

1. The goal of the Text Encoding Initiative is to provide guidelines for a common publicly-defined format, in which linguists, computational linguists, developers of language industry and information technologists can exchange textual and lexical data, in machine-readable form, either before analysis or in an enriched form, incorporating the results of the analysis of the texts. In particular, the project aims:

- to specify a common interchange format for machine-readable texts and their linguistic analysis, and for machine-readable dictionaries;
- to provide a set of recommendations for encoding new textual material and its linguistic analysis;
- to document the existing major encoding schemes and to develop methods for describing them formally and rendering them into the common interchange format.

It has been agreed that the Standard Generalized Markup Language (SGML) will form the syntactic basis for the guidelines and that the guidelines should be compatible with SGML, unless the needs of research require modifications or extensions to SGML principles. The Office Document Architecture (ODA) proposals will also be taken into consideration.

2. The Text Encoding Initiative has been initiated and jointly sponsored by the Association for Computational Linguistics, the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities. The organizational structure of the project is as follows:

Several learned and professional societies have agreed, through a memorandum of understanding, to participate in the general coordination of the project through membership in an Advisory Board, to which the guidelines will be submitted, in particular for the final vote.

The project is managed by a Steering Committee of 6 members representing the sponsoring organizations. Two of the members are European.

In order to prepare the guidelines, four working committees are addressing the questions of text documentation, text representation, text analysis and interpretation, and formal description. The committee on text documentation will substantially complete its work in the first half of 1990, and is not expected to meet during the second development cycle unless new problems arise out of the work of the other committees. The three continuing committees have 32 members overall; each is led by a committee head, who is responsible for setting up subcommittees to work on specific relevant topics. One of the committee heads and 15 of the committee members are European.

Several independent projects are affiliated with the Text Encoding Initiative and will provide test beds for testing the draft guidelines during the second cycle. The group of affiliated projects at this time includes two European projects; more may be added by the Steering Committee.

An American editor and a European associate editor coordinate the results of the work of the 4 committees and will edit and review the final documents.

3. The participation of American researchers and developers is already supported through U.S. governmental and private funding agencies. The purpose of this contract is to support European participation in the project, in particular by ensuring reimbursement of a part of the following expenditures:

- the fees and expenses of the European associate editor;
- the fees and expenses of the European committee head;
- the participation of European members in the meetings of the Steering Committee, the Advisory Board, and of the 3 continuing working committees.

The names of the European associate editor, of the European head of working groups, and of the European members of the various committees are established by the Steering Committee. The names of those presently serving are given below.
Should they be unable to continue serving, they will be replaced by other European researchers at the discretion of the Steering Committee.

Associate editor:
- Lou D. Burnard, Oxford University

Text Representation:
- Stig Johansson, Universitetet i Oslo (head)
- Roberto Cencioni, Commission of the European Communities
- Susan Hockey, Oxford University
- Wilhelm Ott, Universität Tübingen
- Manfred Thaller, Max-Planck Institut, Göttingen

Text Analysis and Interpretation:
- Branimir Boguraev, IBM
- Nicoletta Calzolari, Università di Pisa
- Winfried Lenders, Universität Bonn
- Nelleke Oostdijk, Universiteit Nijmegen
- Serge Perschke, Commission of the European Communities
- Hans Uszkoreit, Universität Saarbrücken
- Antonio Zampolli, Università di Pisa

Metalanguage and Syntax Issues:
- Jean-Pierre Gaspart, SEMA Group, Bruxelles
- Eugenio Picchi, Università di Pisa
- Giovanni Varile, Commission of the European Communities

The EC contribution to the expenses shall be reimbursed by the Contractor to the institutions to which they belong or to the sponsoring associations respectively.

4. The work of the project is divided into two development cycles. The first cycle will end 31 May 1990. The second cycle will begin 1 June 1990 and end 30 June 1992. It will extend the coverage of the Guidelines to include text types and subject areas not explicitly addressed during the first cycle; it will also subject the results of the first cycle to systematic testing with reference to their adequacy and utility, machine processability, and conformance to relevant standards.

The 6 major phases of this second cycle can be characterized as follows:

First, the results of the first cycle will be circulated to the project advisory board, cooperating projects, and interested members of the research community. Simultaneously, the working committees will specify in fuller detail the work items required for extension of the draft guidelines to other text types and subject areas (by October 1990).

Committees and subcommittees will analyze the textual and linguistic features relevant to the areas of extension defined in the preceding phase, discuss existing practice, and consider basic outlines of possible recommendations. Simultaneously, the editors will be accumulating comments and suggestions on the draft guidelines from affiliated projects (by March 1991).

Committees and subcommittees will formalize their extensions to the guidelines, and consider comments on the first draft of the guidelines. Formal replies to comments will be issued, and the extensions and revisions will be incorporated into a second (interim) public draft of the guidelines. The interim draft will be released (by August 1991).

The interim draft will be distributed to the advisory board and affiliated projects; the editors will accumulate comments and suggestions. Committees will determine areas still requiring extensions to the guidelines, analyze the areas, and develop the substance of their recommendations. They will also prepare revisions of the guidelines if required by experience (by January 1992).

Committees and subcommittees will formalize their further extensions to the guidelines, and consider comments on the interim draft. Formal replies to comments will be issued, and the extensions and revisions will be incorporated into a final draft of the guidelines. The final draft will be distributed to the advisory board for consideration (by April 1992).

The Advisory Board will meet to consider and endorse the guidelines (by June 1992).