Date: Mon, 17 Feb 92 18:36 EDT
From: "NANCY M. IDE (914) 437 5988"
Subject: minutes of mitre
To: u35395@UICVM.BITNET

From: IN%"amsler@starbase.MITRE.ORG" 9-FEB-1992 13:19:40.49
To: Carol.Van.Ess.Dykema@nl.cs.cmu.edu, GLOTTOLO%ICNUCEVM.CNUCE.CNR.IT@CUNYVM.CUNY.EDU, amsler@starbase.mitre.org, fwtompa@waterloo.edu, ide@vassar.edu, ingria@bbn.com, jjohn@apollo.lap.upenn.edu, louise@nmsu.edu, warwick%cui.unige.ch@relay.cs.net
CC:
Subj: Emergency Sub-Group PDWG meeting

Date: Sun, 9 Feb 92 13:10:42 EST
From: amsler@starbase.MITRE.ORG
Subject: Emergency Sub-Group PDWG meeting
To: Carol.Van.Ess.Dykema@nl.cs.cmu.edu, GLOTTOLO%ICNUCEVM.CNUCE.CNR.IT@CUNYVM.CUNY.EDU, amsler@starbase.mitre.org, fwtompa@waterloo.edu, ide@vassar.edu, ingria@bbn.com, jjohn@apollo.lap.upenn.edu, louise@nmsu.edu, warwick%cui.unige.ch@relay.cs.net
Message-id: <9202091810.AA09860@starbase.mitre.org>

This is a quick rundown on what went on, presented to solicit (1) feedback from the people who were there and (2) opinions and reminders from those who weren't about when/where we may have undone things that seemed to be agreed to previously or forgotten things we should have remembered. The intent is to send Michael Sperberg-McQueen and Lou Burnard some report on this meeting ASAP, especially since we are petitioning for a two-week grace period in which to rewrite materials for the P2 draft.

====

The meeting was held in McLean, Virginia, at MITRE Corp.'s Hayes Building, 7525 Colshire Drive; it started at 9 AM and ran until 4 PM. Nancy Ide, Jean Veronis, John Fought, Carol Van Ess Dykema, and myself, Robert Amsler, were in attendance.

The agenda was to discuss the state of the AI5 standard write-up and the materials distributed by MSM.

We quickly reached a consensus that there were essentially three levels of standard being described. The first was the "base tag" level. The second was the allowance of "nesting" of base tags inside specific labeled tags.
The third was the "Grouping Tag" discussion, and in particular the idea of a generic group tag which could bundle almost arbitrary subcollections of nesting and base tags together for the purpose of making explicit the semantics of inheritance of information.

We decided that the concept of the grouping tag had not been well explained so far and that this was causing difficulty. Nancy Ide agreed to supply a new description of the grouping tag to help alleviate this problem.

We then went about creating a new outline of the AI5 guidelines, since none of us could determine what the guidelines actually looked like from the fragmentary materials provided by MSM (Sorry, Michael, we couldn't read the stuff well enough to know what the whole thing looked like).

The remainder of this message details the discussion of the standard as we recalled it. Before that, however, our conclusion was that we'd like two weeks in which to prepare and submit new prose and tag lists for consideration. The new tag list is being prepared by Fought and Van Ess Dykema, the new prose description by Nancy Ide. These will be quickly distributed to the rest of the AI5 committee for discussion and sent on to the editors in two weeks.

------

[[Comment by Amsler: Please supplement and correct the following ]]

The new outline was as follows:

Three MAIN sections,

    I.   Context/Introduction
    II.  Tags
    III. Examples & DTD

In I.

    1. Introduction
    2. Issues in Dictionary Encoding
       2.1 Multiple Goals of Encoding (to cover the goals of being faithful to the "printed form" vs. being faithful to the "logical structure" of the material; recording "explicit" vs. "implicit" information; and recording "literal" vs.
"interpreted" information) [Some concern here was expressed that too much had been deleted from the EUROLEX paper materials by LB such that there were serious problems in explaining that the intent of the person creating a representation of a print dictionary would force choices between these options that could not be reconciled in one encoding.] 2.2 Multiple Levels of Encoding ("base", "nested" and "grouped" for semantic inheritance). A matrix was developed to explain these two axes, which may be visualized as follows: || LEVELS || || base tags | nested tags | "group" tag ========================================================== faithful || | | G to the || | | O printed || | | A book || | | L ---------------------------------------------------------- S "augmented"|| | | || | | ---------------------------------------------------------- "altered" || | | || | | ---------------------------------------------------------- The vertical dimension "can" be contracted into only two levels, with "augmented" and "altered" being considered the same. The idea was that some changes to the dictionary, such as normalizing the abbreviation used for "feminine" if the dictionary used all of "f.", "fem." and "feminine" and expanding the elliptical forms of things like "talk, -ed, -ing" to "talk, talked, talking" are really just "augmentations" rather than alterations, whereas others such as parceling out the British vs. American pronunciations between senses which are listed as US only or Brit. only, when the pronunciations were given once at the top as both applying to the entry, may be considered "altering". (It was noted, that SOME changes, such as eliminating end-of-line hyphens, are not considered either "augmentations" or "alterations", since these are artifacts of the typographer's formatting rather than any content-based changes. A discussion here about which of the following forms were allowed was enlightening. If value1 is the form directly printed in the dictionary, e.g. 
"fem." or "fem" or "feminine" or "f" or "F", etc. and value2 is some cannonical form desired by the encoder, e.g. "f" then the options for how to record this information include the following six cases: Legal or not (X) X 1. (no data, original preserved) X 2. (no data, original lost) [[Comment by RA: Numbering on value in cases 1 and 2 reversed to be consistent with other cases]] X 3. value2 (only changed data kept) 4. value1 (only original data kept) 5. value1 (both kept) 6. value2 (both kept) Thus, cases 4,5 and 6 were considered to be the ones we wanted to endorse. This is a bit prescriptive, since it is quite possible in a preface to say explicitly what one has done, but we felt 1,2 and 3 were more inappropriate. The Outline continued with Section II, Tags, and has three parts 3. Base Tags for a Dictionary Entry 4. Nesting Tags for a Dictionary Entry 5. The Abstract Group Tag for a Dictionary Entry Section III, contained two parts 6. Example Entries Tagged with Base, Nesting and Group tags - monolingual, bilingual 7. DTD ---- The production of this outline consumed most of the morning session, we broke for lunch in the MITRE cafeteria, resuming shortly after 1 The afternoon session deepened the discussions of the topics of the morning, but primarily focused on an effort to enumerate the base and nesting tags. The base tag list was found to contain (with some implicit ordering for order of apperance in the entry): (back again after another debate on whether it is needed. The pform tag is basically for "monstrosities" in entries which present intermingled syllable, stress, pronunciation and/or orthographic forms, with things like centered dots used for syllables in the recording of headwords. This may mean pform is limited to the "faithful to printed book" row of the matrix. 
(headword, used to note the single word selected as the basis for the alphabetical order in which the entry appears, though this "could" be an attribute on the particular "orth")
    [[Comment by RA: This latter option sounds better to me, i.e. an hwd is merely an orth with the attribute type=hwd]]
(sense number)
(e.g. singular/plural)
(part of speech)

[[Comment by RA: It strikes me we did nothing about morphology (i.e. syllables), pronunciation (stress), or Synonyms/Antonyms]]

These "base" tags were nested inside the next level of tags, in many cases such that some of the nesting tags could hardly be left out if their base tags were used--or conversely, some of the nesting tags were so essential that it was hard to imagine not recording them if the base tags were used.

This leads to a bit of a dilemma, in that nesting seems to be a variable thing, for example, seems quite likely to be wanted whereas the base tags inside it are a bit more problematic to encode. and seem to both be used together or else would be a "base" tag?

The nesting tags are:
, which contains , and
, which contains the etymology
, which contains
which contains , , and
which contains , , , as well as the base tags
which contains (i.e. , , and ).
which contains all of the above and itself.

[[Comment by RA: The difficulty of being unable to specify the attributes of a nesting tag seems to create a need for some special forms of some tags, as for to be the unique, single tag which contains all the material selected to appear at one alphabetic position in the dictionary). ??]]

The meeting concluded with a new discussion of how the tag was actually intended to cover the strangeness of the tag seemingly also being required to contain at some times as well as even containing other tags, whereas and may not contain each other. The tree structure of

           SNS (i.e. Sense, a.k.a. "group")
            |
  ---------------------------------
  |    |     |     |    |     |    |
  eg  etym  form  SNS  gram  desc  xref

was offered, but not conclusively agreed to.

Another comment was that what we REALLY wanted was a 95% DTD for the canonical dictionary entry and then a 5% DTD which allowed basically everything else to happen. The assumption would be that the user used the 95% "orthodox" DTD most of the time, but then occasionally noted that some entries didn't fit that form and then used the "liberal" "anything goes" DTD for the remaining entries. This reflects the experience of the OUP folks with the OED encoding, which left them feeling that there really was some sense of a real DTD in the OED, but that some entries violated it, making it impossible to state one DTD with any real character for the whole book.

Another thought which seemed important was the comment made by Jean Veronis that we should maintain a collection of special examples that illustrate the aberrancies of dictionary encodings. In Lou Burnard's words, a "zoo" of strange (lexical) creatures whose entries lead to all these fine distinctions. I like this idea a lot.
It would seem to require some means of classifying the entries to reveal why the contributor feels they are unusual, but perhaps that can be handled in comments initially.
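
As a footnote to the six cases listed earlier, here is a small Python sketch of one possible reading of the rule we arrived at: an encoding is acceptable when the tag has content and the printed form (value1) survives somewhere. The `Encoding` class, its `content` and `attribute` fields, and the `is_endorsed` function are hypothetical names invented for this illustration. They are not part of the AI5 tag set, and the sketch makes no claim about which attribute would actually carry the second value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encoding:
    """One way of recording a value (hypothetical model, not an AI5 structure)."""
    content: Optional[str]    # text placed inside the tag, if any
    attribute: Optional[str]  # second value kept alongside the content, if any


def is_endorsed(enc: Encoding, value1: str) -> bool:
    """Assumed rule: the tag has content and the printed form value1 is kept."""
    has_content = bool(enc.content)
    keeps_original = value1 in (enc.content, enc.attribute)
    return has_content and keeps_original


VALUE1 = "fem."  # form as printed in the dictionary
VALUE2 = "f"     # canonical form chosen by the encoder

# The six cases from the discussion, keyed by case number.
CASES = {
    1: Encoding(content=None, attribute=VALUE1),    # no data, original preserved
    2: Encoding(content=None, attribute=None),      # no data, original lost
    3: Encoding(content=VALUE2, attribute=None),    # only changed data kept
    4: Encoding(content=VALUE1, attribute=None),    # only original data kept
    5: Encoding(content=VALUE1, attribute=VALUE2),  # both kept
    6: Encoding(content=VALUE2, attribute=VALUE1),  # both kept
}

endorsed = [n for n, enc in CASES.items() if is_endorsed(enc, VALUE1)]
print(endorsed)  # [4, 5, 6]
```

Under that reading only cases 4, 5 and 6 pass, which matches what we wanted to endorse. Case 1 fails only because the tag is left empty, so if the group meant something else by "no data" the first test in `is_endorsed` would need to change.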