Text Encoding Initiative

A Joint Project of the Association for Computational Linguistics, Association for Computers and the Humanities, and Association for Literary and Linguistic Computing

Minutes of the meeting of the temporary Steering Committee
Pisa, 12-13 December 1987

Present: Susan Hockey (ALLC), Nancy Ide (ACH), Michael Sperberg-McQueen (ACH), Antonio Zampolli (ALLC)
Absent: Robert Amsler (ACL), Donald Walker (ACL)

1. Agenda

The agenda suggested by NI was accepted:

1 formalization of existing situation
  a memorandum of understanding among ACH, ACL, and ALLC (the 'sponsoring organizations') [see section 3 below]
  b memorandum of understanding between the sponsoring organizations and other participating organizations [see section 4 below]
2 planning of next stages
  a editor(s): how many, full or half time, how chosen [see section 8 below]
  b committees: how many, how named, method of work, etc. [see sections 6 and 9 below]
  c review-and-comment and publication phases: brief discussion
3 funding [see section 10 below]
  a potential sources of funds
  b timing of applications
  c who writes the applications
  d timing of effort as a whole (must be coordinated with timing requirements of funding agencies)
4 return to point 1 to complete drafting of memoranda

In the press of time, item 2c was skipped, and some topics were taken up out of turn. These minutes preserve the order of discussion.

2. Status of decisions

It was noted that two members of the temporary steering committee were absent, and confirmed several times that the decisions being taken were to be submitted to them for comment and the issues reopened for further discussion if necessary.

3. Role of Sponsoring Organizations (agenda item 1a)

It was agreed that the day-to-day supervision of the project to create guidelines and an interchange format for text encoding should be performed by the permanent steering committee, when it is constituted.
The three sponsoring organizations must take the lead in the effort and accept final responsibility for the result; since they cannot exercise detailed supervision directly, they must exercise their responsibility through the steering committee which they appoint.

The three sponsoring organizations should undertake:

1 to name two members each to a permanent steering committee, which will supervise the work until completion, naming the editor(s) and voting members of the working committees, supervising the work, and checking the results
2 to establish some mechanism for maintaining the guidelines and interchange format after the first version is completed (N.B. this commitment could conceivably have financial repercussions for the organizations)
3 to endorse, publish, and disseminate the guidelines
4 to contribute the technical expertise of their members to the success of the project

These points will be included in a formal memorandum of understanding which will be drafted by the temporary steering committee. The first draft of this memorandum will be prepared by MS-McQ and circulated to the rest of the temporary steering committee for comment.

4. Participating Organizations (agenda item 1b)

4.a. Prospective Participants [see also Addenda]

The "participating organizations" are expected to include: the Modern Language Association of America (MLA), the Linguistic Society of America (LSA), the Association of Documentary Editors (ADE), the American Historical Association (AHA), the Association for Computing Machinery Special Interest Group for Information Retrieval (ACM/SIGIR), the American Philological Association (APA), and the Association of American Publishers (AAP), which have all expressed interest in participating, although they have not yet officially signed any memorandum of understanding.
The organizations already named are predominantly North American; it was decided to ask equivalent European organizations to participate in the project, but strictly national organizations in the European countries will not be invited to participate, because they are too numerous and the project would bog down.

Individual text archives, similarly, will not be invited to participate as organizations, in part because of their numbers and in part because of the difficulty in deciding whom to invite and whom not to invite. The archives will be officially informed of the initiative by individual letters from the steering committee, and their cooperation will be requested and encouraged, but most of their staff are members of the sponsoring organizations, so their participation can be channeled through those groups; the steering committee foresees no formal role for the archives as institutions.

The following organizations will definitely be approached regarding participation:

  the International Linguistic Association (AILA)
  the Association for History and Computing (AHC)
  the European Association for Lexicography (Euralex)
  the Societas Linguistica Europaea (SLE)

The AHC is already working on the encoding of historical materials in a committee comprising Manfred Thaller of Goettingen, Bozzi of Pisa, and Jean-Philippe Genet of Paris; they must be approached cautiously. AZ will confer informally with them by telephone before a formal approach is made.

Other organizations considered without a final decision being made:

  the Dictionary Society of North America (DSNA)
  the British Academy
  the various library associations, notably the (British) Library Association, the American Library Association, and the Canadian Committee on Cataloguing

The British Academy may be approached for funding instead of or in addition to participation.
The library community should be represented, if possible, by a single body rather than many, for example the Joint Steering Committee for the Revision of AACR II, which unites representatives of the three Anglophone library associations just named.

4.b. Role of Participating Organizations

The temporary steering committee will draw up a memorandum of understanding to govern the participation of these organizations (first draft to be prepared by NI). It will specify that each participating organization:

1 endorses the idea of a common interchange format and set of guidelines for new texts, to be prepared in accordance with the Poughkeepsie Statement
2 will name one member of an advisory board, who will represent the organization at meetings and be responsible for circulating draft documents within the organization for informed comment
3 may suggest individuals qualified to serve on the working committees
4 will publicize the initiative among their members
5 will circulate drafts among their members and staff so as to encourage adequate technical review and comment

The advisory board consisting of the representatives of all the participating organizations will meet once or twice, probably at the beginning of the drafting phase and again at the end of the revision phase -- this later meeting should produce the final approval of the guidelines from all participating organizations. The advisory board will be too large to meet more often; members will have to be kept informed of progress by mail and by the distribution of drafts.

5. Letterhead, Name [see also Addenda]

Since the letters to known and prospective participating organizations should be signed by the steering committee, it was decided to prepare letterhead stationery for the initiative with funds remaining from the Vassar conference.
The name of the initiative was, after discussion and reflection, formulated as:

  ACH/ACL/ALLC Text Encoding Initiative:
  Initiative for Text Encoding Guidelines and a Common Interchange Format for Literary and Linguistic Data

NI will have the stationery printed and distribute it as soon as possible, so that the memoranda of understanding can be distributed with cover letters on the stationery, in time for the Christmas meetings of the various North American groups.

It was explicitly decided to avoid the term 'standard' for the moment; the end product of the initiative will, it is hoped, effect a spontaneous normalization of practice, but this can occur without ISO recognition. If ISO recognition seems a reasonable goal later, the sponsoring organizations can pursue it then.

6. Working Committee Organization (agenda item 2b)

6.a. Number and Constitution of Committees

It was decided to constitute four committees, each responsible for one of the four areas delineated in point 6 of the Poughkeepsie Statement: text documentation, text representation, text analysis and interpretation, and development of a formal metalanguage and description of existing schemes.

The chair of each committee is to be named by the steering committee with the editor(s), and will be requested to report in writing on the progress of the committee, as well as meeting from time to time with the steering committee, the editor(s), and the other committee chairs. Other formal members of the committees are to be named by the steering committee; participating and sponsoring organizations will be invited to suggest names, but the steering committee will not be bound by their suggestions.

Subcommittees on specific topics will be organized by each chairman; their membership will be open to all volunteers, and will not be limited to members of the parent committee.
Subcommittees will report to a parent committee, and the four parent committees will be responsible for ensuring compatibility among the recommendations of the various subcommittees.

6.b. Coordination of Committee Work

Coordination among the subcommittees and among the four parent working committees will be handled by meetings of the committee chairs and by the exchange of documents. The editor(s) will have to take particular care that communication among the parent committees takes place as required; for this reason, the editor will be expected to attend meetings of the working committees whenever possible.

The editor(s), in consultation with the steering committee, will draft general working instructions for the committees, showing concretely what documents the committees are responsible for producing. A rough division of labor among the committees will also be prepared by the editor(s) and the steering committee. From these preliminary documents each committee chair will be encouraged to prepare an initial analysis of the area to be covered by the committee; this initial paper prepared by the chair will serve as a starting point for the committee's work.

7. Publication of Working Papers

Since the heart of the work to be done will involve intensive analytic work by members of the subcommittees, it is essential that the individuals who perform this work have an opportunity to publish their results in an appropriate form. It was decided to seek commitments from the editors of Literary and Linguistic Computing, Computational Linguistics, and Computers and the Humanities (as well as Linguistica Computazionale) that they will accept appropriate papers arising out of the preparation of working papers for the project. It was also decided that the steering committee will arrange to publish a series of working papers as required, and will explicitly undertake to help ensure that work performed for the project will be published in appropriate organs.

8. Editor(s) (agenda item 2a) [see also Addenda]

It was decided after brief discussion, MS-McQ dissenting, that a single editor should oversee the project rather than two co-editors. Funding will be sought to cover 100% of this editor's time.

It was also decided (MS-McQ abstaining) to ask MS-McQ to be the editor. He replied that he would be willing to accept the task, circumstances permitting, and outlined some developments which might make that impossible. It will be known by March or April, and possibly as early as the end of December, whether MS-McQ will or will not be able to act as editor at all. Whether he can act full-time is a separate question; his superiors have thus far placed the following conditions upon any editorial position:

1 it should not exceed 50% of MS-M's time
2 the true cost of salary must be funded (salary + overhead)
3 the editorial position should be so discharged as to help resolve satisfactorily certain matters internal to the university

The steering committee saw no objection to point (2), MS-M undertook to satisfy point 3 without prejudice to the intellectual content of the guidelines, and the issue of full-time versus half-time will be taken up with MS-McQ's superiors at the University of Illinois at Chicago.
The tasks of the editor were outlined as follows:

1 to ensure the proper circulation of documents among those working on the project, and perform other duties of a scientific secretariat
2 to coordinate the work of the four working committees and their subcommittees, serving as liaison among the committees and between the committees and the steering committee
3 to draft the documents describing the organization of committee work, in conjunction with the steering committee
4 to draft the basic charge or list of responsibilities for each committee
5 to receive from the committees the results of their work with all the relevant information, and ensure their compatibility each with the other
6 to review and edit the final document, integrating the work of the various committees into a single coherent whole and rewriting portions of the document as necessary so as to ensure consistency
7 to administer the scientific secretariat of the initiative, and distribute travel funds and subsidies

In all of these tasks the editor will work under the supervision of the steering committee, and with their help.

The active role of the editor in coordinating content and style of the guidelines was discussed at some length; such active intervention by editors is more common in Anglophone countries than on the Continent, and it was observed that in discussing the project with non-English speakers the term 'editor' and its cognates should perhaps be avoided or heavily qualified, to avoid misunderstanding.

9. Working Committees -- Division of Labor (agenda item 2b)

9.a. Text Documentation Committee

The text documentation committee will work primarily on tags for identifying the text in a file, its source edition, and the parties responsible for the machine-readable encoding. This committee will need coordination with library-based efforts to catalogue machine-readable data files, but no serious intellectual problems are expected.
The computational documentation of the file (in the form of declarations, etc.) may also be considered by this committee, but the major responsibility for the content of declarations will be borne by committees 2 and 3, and for the syntax of the declarations by committee 4.

9.b. Text Representation

Committee 2 (text representation) will consider techniques for encoding all the information explicitly present in a copy text on the physical or graphetic level, as well as all information conventionally represented on the graphetic level, whether present in the copy text or added by the encoder or later analysts. Topics included in the field of this committee thus include the marking or encoding of:

  quotations
  editorial additions, deletions, or corrections
  editorial apparatus (apparatus criticus)
  mathematical formulas
  diacritics
  character sets
  logical structure of a text (chapters, paragraphs, etc.)
  topographical / layout information
  figures, tables, and illustrations
  captions
  conventional reference numbers for a text
  lineation (on page, in column, in logical subdivision, etc.)
  recto and verso, color of page, etc.
  hyphenations (including declaration of how hyphenation is treated)
  punctuation
  change of language or alphabets
  the conventional use, in a given encoding, of characters as alphabetics, punctuation, diacritics, or separator
  special problems of numismatic, epigraphic, or paleographic material
  special problems posed by the physical realization of a genre (e.g. comic strips)

Special problems of spoken texts are not taken up here but in the committee on text analysis and interpretation.

It was noted that the distinction between committees 2 and 3 was not that between 'objective' and 'subjective' or 'interpretive' information, since the graphetic level of the text can indicate specific editorial interpretations by font, layout, or special punctuation. Committee 2 is not limited to strictly objective information.

9.c. Text Analysis and Interpretation

Committee 3 will be responsible for all interpretive material not conventionally represented physically in an edition. The boundary between it and committee 2 is known to be porous, and will require special attention from the editor and both chairs. Certainly included under committee 3 are:

  phonological transcriptions
  syntactic analyses
  content analyses
  thematic analyses
  stylistic analyses (to the extent they are included at all)
  metrical analyses
  special problems of spoken language
  special problems of dictionary encoding
  special problems of other reference works (encyclopedias, etc.)

The results of statistical analyses, and some types of textual-critical analysis, represent known borderline cases.

The steering committee will suggest that committee 3 begin with the most common kinds of analysis, striving first to reach a consensus on the formal syntax of data representation (e.g. a common syntax for representing tree structures), and attempting further, if possible, to achieve some consensus on aspects of content that are relevant for exchange of data among scholars of divergent theoretical stances.

9.d. Metalanguage Definition

[tabled]

10. Funding (agenda item 3)

10.a. National Endowment for the Humanities

NI reported that the National Endowment for the Humanities (NEH) has requested a proposal from us by the end of January, so it can be sent out for review and considered at an NEH meeting in May. A second application, also for the entire amount of the grant, should be made to the National Science Foundation (NSF) at the same time. NEH and NSF will agree between them how to share the costs of supporting the work. NEH is pleased that the initiative is international in scope, but feels strongly that European participation in the effort should be funded by European sources.

NI and MS-M will draft the NEH/NSF proposal and distribute it for comment to the members of the steering committee.
It will be signed by all six members of the committee. (No physical signatures will be required, but all six names will be on the title page.)

10.b. European Funding Sources

AZ reviewed the various potential sources for European funding.

The European Science Foundation's Standing Committee on Humanities may be interested in funding the initiative, but the secretary is not available for a meeting. The standing committee meets in January or February; any funding proposal must be discussed at two consecutive meetings. AZ will explore the possibilities. Any application to ESF will have to stress, and support, the research and analytic aspects, rather than the standardization aspects, of the project.

Some funding for limited purposes may be found from various institutes such as the ICL in Pisa or similar institutions in other countries. SH observed that the British Library is funding a research worker to work with Lou Burnard on documenting encoding schemes used in the Oxford Text Archive (among other problems). SH will also look into applications to the British Academy for some funds.

The EEC has allocated funds for a large effort which is to include "development of methods and tools for the reusability of lexical resources in computerized applications" and "creation of standards for lexicological and terminological data". Mechanisms for distributing these funds are now being devised; when they are complete, we may be able to apply for funding.

Because we are just beginning our contacts with the European funding agencies, it will be some time before European funds can be available.

10.c. Tentative Budget

A tentative rough budget was drawn up, based on a three-year duration:

  ca. $200 000   Editor's salary, including overhead costs of home institution ($60000/year, with rises)
  ca.  100 000   Editor's travel (NI and MS-M now suggest reducing this to ca. $36 000, and putting remainder into travel subsidies for committees)
  ca.   40 000   Eight steering committee meetings (each ca. $7000)
  ca.   20 000   Two meetings of the advisory board
  ca.  100 000   Travel subsidies for committee work (NI and MS-M suggest now: 164 000)
  ca.   40 000   Clerical help, office expenses, documents, etc.

(N.B. The editor's salary is computed here to include university overhead but the other amounts do NOT include overhead of the administering institution. If the host institution insists on charging overhead for these amounts, the amount to be requested in the grant will rise.)

It was agreed to ask NEH and NSF for about $450 000 to $500 000, and ask the European funding sources for about $150 000 to $200 000.

Further work on the budget and on the timetable of committee meetings will be performed by NI and MS-M in connection with drafting the NEH/NSF proposal.

11. Steering Committee Chair

This meeting proceeded without a chair, but it was decided to name a chair for administrative purposes, and let it rotate among the organizations: ACH in the first year, ACL in the second, and ALLC in the third.

12. Agenda for the near future:

1 proposal to NEH and NSF. Due 31 January 1988. NI and MS-M will draft this and distribute it to the steering committee.
2 final constitution of permanent steering committee. ACH's executive council will meet December 29 and should name its two permanent representatives then. ALLC's council will vote by mail and should have a final decision by 9 January. ACL appears not to require council confirmation of Walker and Amsler; NI will check.
3 memorandum of understanding among the sponsoring organizations. MS-M will draft and circulate by 20 December.
4 memorandum of understanding for participating organizations. NI will draft and circulate by 20 December.
5 funding discussions: NI will continue to confer with NEH, AZ will explore possibilities within ESF, EEC, and the Council of Europe. He expects to have something to report by late February.
6 approach to prospective participating organizations. Indirect and informal approaches by telephone first, followed by letters signed by steering committee, or by one member for entire committee. AZ will phone the European organizations. (?)
7 editorial site. MS-M will discuss possibilities with his university and report.
8 additional funding. AZ will suggest a meeting on text encoding as the topic for the Grosseto meeting in October.
9 next steering committee meeting. Suggested for a weekend in late February 1988. Tentatively: 20-21 February 1988, in New York or northern New Jersey. If not then, the meeting will have to be postponed until March. [see Addenda]

Final note: many thanks are due to Antonio Zampolli for his initiative in organizing this meeting, his willingness to host it, and the wonderful hospitality (even including access to the network!) he and his institute extended to the way-weary representatives from Britain and the U.S. Many thanks to him, and may all our meetings be as pleasant and as productive.

Respectfully submitted, 12 January 1988,
Michael Sperberg-McQueen, Secretary pro tem.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Addenda (from the discussion of the earlier draft of these minutes)

1 The ACL representatives have urged some role as sponsoring or participating organization for the American Society for Information Science (ASIS).
2 The name of the initiative has been reconsidered, and the consensus appears to be that the name should be simply 'Text Encoding Initiative' with the names of the sponsoring organizations below it in smaller type on the letterhead.
3 BA would like to discuss some more formal role for the text archives.
4 The management of the University of Illinois at Chicago Computer Center is willing to allow MSM to devote half time to the text encoding initiative for three years, if funding can be found; full time release from present duties is not possible.
5 A February date is not possible for a New Jersey meeting; DW has proposed March 13-14.