Computational Lexica Work Group
Minutes of Meeting Held in Cambridge, Massachusetts
31 October 1990
Robert Ingria
Document Number: TEI AI6M1

The TEI working group on computational lexica had its initial meeting on October 31, 1990, at BBN in Cambridge, Massachusetts. Present were Robert Ingria (chair), James Pustejovsky, and Susan Warwick.

At this meeting, one parameter on the nature of possible tag sets for lexicons was decided upon. (Note that the term ``lexicon'' here denotes a machine-readable text containing lexical information, which is principally intended for machine consumption.) It is not necessary for an interchange tag set for lexicons to preserve the physical structure of the input text. Unlike other texts that might be encoded according to TEI guidelines, a lexicon has no original print form that needs to be preserved. Only the informational structure needs to be preserved.

Given this background, we looked at a number of actual lexical entries, principally from Machine Translation lexicons. (Sample MT entries had been gathered by Susan Warwick via e-mail solicitation.) Among the systems represented were Critter, Polygloss, METAL, ELU, DLT, and the Utrecht Mimo system. These systems differed along a number of dimensions:

(1) Declarative vs. procedural representation.

(2) Full bilingual lexica for individual language pairs vs. monolingual lexica for each language and separate transfer lexica for each language pair.

There were also brief examinations of entries from parsing lexicons, and of the two volumes of Igor Melcuk's Explanatory Combinatory Dictionary for French which have appeared so far.

After examining the range of parameters on which these lexicons and lexical databases vary, we came up with the following proposals.

(1) The members of the Working Group will each examine one of the major lexicon types, consulting existing lexicons, in a feasibility study to determine whether it is even possible to come up with an interchange format for each individual lexicon type. Tasks were divided up as follows:

    Ingria - will update his Grosseto survey on parsing lexicons and will also look at generation lexicons.

    Pustejovsky - will look at lexical knowledge bases, such as Miller's WordNet, Melcuk's ECD, and others, such as that currently being developed at Brandeis.

    Warwick - will continue her examination of MT lexicons.

(2) Since Warwick has been invited to attend the DARPA Spoken Language Workshop in Pacific Grove, California, on February 19-22, 1991, that seems like a reasonable time for a meeting at which the Working Group members will report on their conclusions. (There may be one meeting of all the members, if this is possible, or there may be a meeting of Ingria and Pustejovsky in Boston, and one between Ingria and Warwick in Pacific Grove. This is yet to be firmed up.)

(3) The feasibility studies will determine what the group can usefully do within the mandate of the TEI to propose standards for exchange of lexical data. If possible, we will propose an initial set of guidelines. To the greatest extent possible, we will try to be consistent with the proposals of the dictionary working group and other working groups (such as syntax, morphology, etc.).

(4) We will have a follow-up meeting, probably in Europe, to examine the individual draft proposals and to see whether they can be combined into a single set of guidelines or should remain separate. Even if they are separate tag sets, it is likely that identical or analogous pieces of information will be given the same tagging. Scheduling for this meeting will follow the successful outcome of our February meeting(s).

(5) At the end of the first year of this working group, depending on the outcome of the meetings in (2) and (4), we will form one or more new working groups to draft more complete specifications for the requisite tag set(s).

The following items were handed out to the participants:

By the Chair:

    An SGML version of TEI document AI6 P1
    A text version of TEI document AI6 P1
    An excerpt from the minutes of the Tucson meeting dealing with a discussion of the goals of the lexicon working group
    Discussion notes

By Susan Warwick-Armstrong:

    Susan Warwick: ``Automated Lexical Resources in Europe: a survey''
    Example MT entries:
        heid.ex  = Polygloss entries
        metal.ex = Metal entries
        aval.ex  = ELU Transfer dictionary
        dlt.ex   = DLT examples
        pim1.ex  = MIMO system
        pim2.ex  = ditto
        CRITTER examples