On Lexical Ambiguity
 
                     Document Number:  TEI-AI-W-21
 
                             March 24, 1990
 
 
                         D. Terence Langendoen
 
                       Department of Linguistics
                         University of Arizona
                            Tucson, AZ 85721
 
 
   Suppose we wish to mark up the example:
 
     Wash sinks.
 
simply for lexical ambiguity, using a standard feature-structure
interpretation of the Lund corpus markup tags.  First, we can give it a
textual markup along the lines of my proposal in TEI-AI-W-20, as follows
(again, details concerning character markup are omitted).
 
     <s id=s1>
           <w id=w1>  Wash  </w> &rbl;
           <w id=w2>  sinks </w>
           <c id=c11> .     </c>
     </s>
 
Pointers to the lexicon are not put in here, since they are to be
considered part of the analysis.
 
   Next, we provide an analysis for each word.  I propose that multiple
analyses be grouped under a single tag, for which I suggest the name
"analysis-list".  This tag should be allowed to have attributes which
specify whether the alternatives are ranked and, if so, the basis for the
ranking.  Each analysis is tagged with "analysis", as in my earlier
proposals.  A "rank" attribute on this tag specifies the ranking; it
takes numerical values, with rank=1 being the highest rank, rank=2 the
next highest, etc., and rank=0 meaning that this interpretation should
be ignored.  I assume that each analysis is a single feature structure,
tagged with "fstruct", which is permitted, besides its own ID attribute,
at least one IDREF attribute which points to an entry in the lexicon.  I
will call this latter attribute "lexp", as in TEI-AI-W-20.
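
   The ranking convention can be illustrated with a short sketch.  It is
written in Python rather than SGML, and the names "Analysis" and "ranked"
are hypothetical, introduced only for this illustration.  It assumes that
analyses with rank=0 are dropped and that the rest are ordered from the
highest rank (rank=1) downward.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Analysis:
    """One <analysis>: a rank plus a label for its feature structure."""
    rank: int                    # 1 = highest rank; 0 = ignore this reading
    tag: str                     # label standing for a feature structure
    lexp: Optional[str] = None   # IDREF to a lexicon entry, if any


def ranked(analyses: List[Analysis]) -> List[Analysis]:
    """Drop rank-0 analyses and order the rest from highest rank to lowest."""
    live = [a for a in analyses if a.rank != 0]
    return sorted(live, key=lambda a: a.rank)


# Sample data (an assumption for illustration): three readings of one word.
readings = [
    Analysis(rank=2, tag="VA0", lexp="e2"),
    Analysis(rank=0, tag="NP"),
    Analysis(rank=1, tag="NC", lexp="e2"),
]
preferred = [a.tag for a in ranked(readings)]
```

Treating rank=0 as a filter rather than as the lowest rank keeps an
ignored reading in the markup without letting it compete with the others.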
 
   I also assume entity definitions which have the following effect.  (I
don't specify these in SGML because I don't quite know how.)
 
Entity      Feature specification
______      _____________________

&noun;      <feature><fname>cat</fname><fstruct>noun</fstruct></feature>

&common;    <feature><fname>subcat</fname><fstruct>common</fstruct></feature>

&proper;    <feature><fname>subcat</fname><fstruct>proper</fstruct></feature>

&unmarked;  <feature><fname>subsub</fname><fstruct>unmarked</fstruct></feature>

&plural;    <feature><fname>subsub</fname><fstruct>plural</fstruct></feature>

&verb;      <feature><fname>cat</fname><fstruct>verb</fstruct></feature>

&main;      <feature><fname>subcat</fname><fstruct>main</fstruct></feature>

&base;      <feature><fname>subsub</fname><fstruct>base</fstruct></feature>

&s-form;    <feature><fname>subsub</fname><fstruct>s-form</fstruct></feature>

&NC;        &noun;&common;&unmarked;

&NC2;       &noun;&common;&plural;

&NP;        &noun;&proper;&unmarked;

&VA0;       &verb;&main;&base;

&VA3;       &verb;&main;&s-form;
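
   The intended effect of these entity definitions can be sketched as
follows.  This is a Python illustration, not an SGML declaration; the
names "ATOMIC", "COMPOSITE", and "expand" are hypothetical, and the sketch
assumes that a feature structure can be flattened into a mapping from
feature names to values.

```python
import re
from typing import Dict, Tuple

# Entities that expand directly to one feature: name -> (fname, value).
ATOMIC: Dict[str, Tuple[str, str]] = {
    "noun": ("cat", "noun"),
    "common": ("subcat", "common"),
    "proper": ("subcat", "proper"),
    "unmarked": ("subsub", "unmarked"),
    "plural": ("subsub", "plural"),
    "verb": ("cat", "verb"),
    "main": ("subcat", "main"),
    "base": ("subsub", "base"),
    "s-form": ("subsub", "s-form"),
}

# Entities that expand to a sequence of other entity references.
COMPOSITE: Dict[str, str] = {
    "NC": "&noun;&common;&unmarked;",
    "NC2": "&noun;&common;&plural;",
    "NP": "&noun;&proper;&unmarked;",
    "VA0": "&verb;&main;&base;",
    "VA3": "&verb;&main;&s-form;",
}

# An entity reference: "&", a name, then ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(text: str) -> Dict[str, str]:
    """Expand the entity references in text into a flat feature mapping."""
    features: Dict[str, str] = {}
    for name in ENTITY_REF.findall(text):
        if name in COMPOSITE:
            features.update(expand(COMPOSITE[name]))
        else:
            fname, value = ATOMIC[name]  # KeyError for an undefined entity
            features[fname] = value
    return features


nc = expand("&NC;")
va3 = expand("&VA3;")
```

Here "nc" holds the three features cat, subcat, and subsub with the
values noun, common, and unmarked, which is what the reference &NC; is
meant to abbreviate.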
 
   Here is a representation of the various analyses.
 
     <analysis-list id=w1 ranking=yes>
           <analysis rank=1>
                 <fstruct lexp=e2> &NC;
                 </fstruct>
           </analysis>
           <analysis rank=0>
                 <fstruct> &NP;
                 </fstruct>
           </analysis>
           <analysis rank=2>
                 <fstruct lexp=e2> &VA0;
                 </fstruct>
           </analysis>
     </analysis-list>
     <analysis-list id=w2 ranking=yes>
           <analysis rank=1>
                 <fstruct lexp=e1> &VA3;
                 </fstruct>
           </analysis>
           <analysis rank=2>
                 <fstruct lexp=e1> &NC2;
                 </fstruct>
           </analysis>
     </analysis-list>
 
   Finally, the lexicon is as follows, where, as before, the "lexicon"
tag simply indicates where the lexicon section starts.
 
     <lexicon>
           <entry id=e1> sink </entry>
           <entry id=e2> wash </entry>
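
   How the "lexp" pointers tie the analyses to the lexicon can be shown
with a small sketch.  It is in Python, and the names "LEXICON",
"ANALYSES", and "resolve" are hypothetical; the data are copied by hand
from the markup in this note rather than parsed from it.

```python
from typing import Dict, List, Optional, Tuple

# Lexicon entries keyed by their ID attribute.
LEXICON: Dict[str, str] = {"e1": "sink", "e2": "wash"}

# For each word ID, the (entity label, lexp) pairs of its analyses.
# None marks an analysis that carries no lexp pointer.
ANALYSES: Dict[str, List[Tuple[str, Optional[str]]]] = {
    "w1": [("NC", "e2"), ("NP", None), ("VA0", "e2")],
    "w2": [("VA3", "e1"), ("NC2", "e1")],
}


def resolve(word_id: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each analysis of a word with the lexicon entry it points to."""
    resolved: List[Tuple[str, Optional[str]]] = []
    for label, lexp in ANALYSES[word_id]:
        if lexp is None:
            resolved.append((label, None))   # no pointer to follow
        elif lexp in LEXICON:
            resolved.append((label, LEXICON[lexp]))
        else:
            raise KeyError("dangling lexp pointer: " + lexp)
    return resolved


w1 = resolve("w1")
w2 = resolve("w2")
```

Both readings of "sinks" resolve to the single entry "sink", so the
ambiguity is recorded in the analyses and not duplicated in the lexicon.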
 
 
 
 
 
 
 
 
 
 
 
 
Terry Langendoen             phone: (+1 602) 621-6898
Department of Linguistics    bitnet: langendt@arizvm1
University of Arizona        internet: langendt@arizvm1.ccit.arizona.edu
Tucson, AZ 85721 USA         fax: (+1 602) 621-9424