.sr docfile = &sysfnam. ;.sr docversion = 'Draft' .im teigmlp1 .* Document proper begins. Tagging Parts of Speech <author>C. M. Sperberg-McQueen <docnum>TEI &docfile. <date>&docdate. </titlep> <abstract> A set of feature structures sufficient to express the analysis underlying the tags of the tagged LOB corpus is provided, with a set of entity names for tagging text. The method of construction, similar to that described in AIW21, "On Lexical Ambiguity," is such as to allow simple extension to other languages and other grammatical features. The possible meanings of underspecified analyses are described and discussed. <note>The paper is not now complete; it offers a full list of the basic grammatical categories in the LOB scheme, and full specification of the features of verbs and some other categories. Full definitions of the LOB tags in feature-structure notation are given only for verb tags. The work on non-lexical parts of speech, especially, does not agree with normal linguistic analysis and should be brought into line. <p> Much work remains to be done in this area; I believe it should proceed as follows: <ol> <li>assume the method described here and in AI W 21 for representing complex feature structures with simple entity references built up out of other entity references for simple feature-value pairs <li>develop a set of feature structures (ignoring for the moment the SGML formalism) for <q>standard average European</q> as commonly annotated in large corpora: number, gender, case, tense, etc. <li>ensure that the feature structures so developed are upward compatible with commonly used schemes like LOB, Brown, etc. That is, LOB, Brown and other common schemes should fall out of the TEI scheme as simplifications or as particular sets of values. <li>if consensus can be achieved, propose a specific set of tags using this standard-average-European feature set, for use in corpus annotation. 
<li>using the method assumed in point 1, create the required entity definitions for features and feature structures. Optionally provide DTD modifications for enforcing the standard average feature set. </ol> </note> </abstract> <toc> </frontm> <body> <h1>Introduction This paper describes one approach to expressing part-of-speech tags using the feature-structure markup proposed by the A&I committee. It takes as a given the part-of-speech classification of the LOB corpus and seeks an equivalent expression of that classification in TEI-conformant SGML. <note>It is intended that this paper eventually provide a full specification of the LOB tags in feature-structure notation. A partial specification, however, is enough to make clear the direction being suggested and to allow for comment. The paper is thus being distributed in a half-complete state.</note> The grammatical features now outlined in this paper include those required for a full treatment of the LOB scheme's verb tags, and the grammatical features required for LOB's treatment of nouns, pronouns, conjunctions, numerals, and determiner-pronouns. Features required for adverbs, determiners and articles, adjectives, qualifiers, and WH-pronouns must still be added; this is a matter of transcription from the <q>naive</q> SGML form in which they have already been worked out. Full feature-structure definitions are given only for the LOB verb tags; similar definitions for the other categories remain to be formulated, a task which should be straightforward. Further work should attempt to extend these definitions to other classifications. <h1>Description of the LOB Tags We begin with a list of the tags used in the LOB markup. This is taken from <cit>The Tagged LOB Corpus: User's Manual</cit>, by Stig Johansson in collaboration with Eric Atwell, Roger Garside, and Geoffrey Leech (Bergen: Norwegian Computing Centre for the Humanities, 1986). 
<note>List to be added.</note> <h2>LOB Verb Tags For the moment, let's work with just the verb tags: <dl> <dt>BE <dd>the verb TO BE <dt>BED <dd>the verb TO BE, past tense <dt>BEDZ <dd>the verb TO BE, past tense, 3d person singular <dt>BEG <dd>the participle BEING <dt>BEM <dd>am, 'm <dt>BEN <dd>been <dt>BER <dd>are, 're <dt>BEZ <dd>is, 's <dt>DO <dd>the verb DO as auxiliary <dt>DOD <dd>did <dt>DOZ <dd>does <dt>HV <dd>have <dt>HVD <dd>had (as past tense) <dt>HVG <dd>having <dt>HVN <dd>had (as past participle) <dt>HVZ <dd>has <dt>MD <dd>modal auxiliary verb <dt>VB <dd>lexical verb <dt>VBD <dd>lexical verb in past tense <dt>VBG <dd>lexical verb, present participle <dt>VBN <dd>lexical verb, past participle <dt>VBZ <dd>lexical verb, third-person singular present tense </dl> <h2>Naive Transcription into SGML The semantics of these 22 tags can be reduced to a few atomic notions; if we use conventional (traditional?) grammatical terms, we can arrange these tags in a (sparse) matrix along the following axes: <dl> <dt>lexical type <dd>lexical verb, BE, DO, HAVE, auxiliary, or modal <dt>number <dd>singular, plural, or unmarked <dt>person <dd>1st, 2nd, 3rd, unmarked, or not applicable (for participles) <dt>tense <dd>present, past, future </dl> Because this is modern English, these axes are not truly orthogonal: <term>plural</term> occurs only for BER, and <term>3rd-person</term> and <term>singular</term> correlate strongly. An analysis having <emph>only</emph> modern English in mind might thus collapse these features for reasons of economy; I keep them separate because this traditional analysis is clear and commonly understood and because it can more readily be extended to historical forms of English and to other Indo-European languages. The full analysis will also be required in the pronoun system, in any case. The feature structures for the LOB tags for verbs can be built out of these primitive notions. 
One straightforward approach would use the major category as a generic identifier and specify feature-value pairs using the attribute-value notation. The element and attribute declarations would look like this: <xmp font=mono><![CDATA[ <!ELEMENT verb> <!ATTLIST verb n (sg, pl, ind) ind -- number: singular, plural, indefinite -- p (1, 2, 3, 0) 0 -- person: 1, 2, 3, or unmarked -- pt (part, nonpart) nonpart -- participles: yes or no -- t (pres, past, fut) pres -- tense: present, preterite, future -- lex (lex, be, do, have, aux, mod) lex -- lexical, auxiliary (and which), or modal -- > ]]> </xmp> So the various LOB verb tags could be specified thus: <gl> <gt>BE (be) <gd>verb lex=be <gt>BED (were) <gd>verb lex=be t=past <gt>BEDZ (was) <gd>verb lex=be t=past p=3 n=sg <gt>BEG (being) <gd>verb lex=be t=pres pt=part <gt>BEM (am, 'm) <gd>verb lex=be t=pres p=1 n=sg <gt>BEN (been) <gd>verb lex=be t=past pt=part <gt>BER (are, 're) <gd>verb lex=be t=pres n=pl <gt>BEZ (is, 's) <gd>verb lex=be t=pres p=3 n=sg <gt>DO (do) <gd>verb lex=do <gt>DOD (did) <gd>verb lex=do t=past <gt>DOZ (does) <gd>verb lex=do t=pres p=3 n=sg <gt>HV (have) <gd>verb lex=have <gt>HVD (had, 'd) <gd>verb lex=have t=past <gt>HVG (having) <gd>verb lex=have t=pres pt=part <gt>HVN (had (pp)) <gd>verb lex=have t=past pt=part <gt>HVZ (has, 's) <gd>verb lex=have t=pres p=3 n=sg <gt>MD (modal aux) <gd>verb lex=mod <gt>VB (base verb) <gd>verb <gt>VBD (past tense) <gd>verb t=past <gt>VBG (present participle, gerund) <gd>verb pt=part t=pres <gt>VBN (past participle) <gd>verb pt=part t=past <gt>VBZ (3d pers sg) <gd>verb p=3 n=sg </gl> In the notes which follow, this direct translation of category names and values into generic identifiers, attribute names, and attribute values will be called the <q>naive</q> approach. 
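The effect of attribute defaulting in the naive approach can be illustrated outside SGML. What follows is only a minimal sketch in Python, not part of any LOB or TEI specification: the names <q>DEFAULTS</q>, <q>NAIVE_TAGS</q>, and <q>resolve</q> are hypothetical, only a handful of the tags are included, and the default values (<q>ind</q> for number, <q>nonpart</q> for participle) are my reading of what the declaration above intends.

```python
# Illustrative sketch only. DEFAULTS, NAIVE_TAGS, and resolve are
# hypothetical names; they are not part of LOB or of any TEI proposal.

# Assumed default attribute values, read off the ATTLIST for <verb>.
DEFAULTS = {"lex": "lex", "n": "ind", "p": "0", "pt": "nonpart", "t": "pres"}

# A few LOB verb tags, with only the attributes listed explicitly for them.
NAIVE_TAGS = {
    "BEZ": {"lex": "be", "t": "pres", "p": "3", "n": "sg"},
    "BED": {"lex": "be", "t": "past"},
    "VBD": {"t": "past"},
    "VBN": {"pt": "part", "t": "past"},
    "VB": {},
}


def resolve(tag):
    """Return the full attribute set for a tag, filling in the defaults."""
    attrs = dict(DEFAULTS)          # start from the declared defaults
    attrs.update(NAIVE_TAGS[tag])   # explicit values override them
    return attrs


if __name__ == "__main__":
    for tag in sorted(NAIVE_TAGS):
        print(tag, resolve(tag))
```

In this sketch an unspecified attribute silently takes its default, so VBD acquires a value for number and person even though the tag list does not mention them.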
<h2>Transcription into SGML Using Feature Structures The naive SGML version of the LOB verb tags can be translated directly into the feature-structure notation devised by the A&I committee. The structure for BEZ, for example, might be expressed thus: <xmp font=mono> <![ CDATA [ <f.struct> <f.struct.name>BEZ <feature><f.name>category <f.struct>verb </feature> <feature><f.name>lexical type <f.struct>copula </feature> <feature><f.name>number <f.struct>singular </feature> <feature><f.name>person <f.struct>3rd </feature> <feature><f.name>tense <f.struct>present </feature> </f.struct> ]]> </xmp> while the feature structure for VBD might be somewhat simpler: <xmp font=mono> <![ CDATA [ <f.struct> <f.struct.name>VBD <feature><f.name>category <f.struct>verb </feature> <feature><f.name>lexical type <f.struct>full verb</feature> <feature><f.name>tense <f.struct>preterite</feature> </f.struct> ]]> </xmp> <h2>Interpretation of Missing Features Here we encounter a minor conundrum. By leaving <term>number</term> and <term>person</term> unspecified, this rendition of VBD could conceivably be claiming any of the following: <ol> <li id=free >1. that <term>number</term> is either <term>singular</term> or <term>plural</term> or <term>unmarked</term>, those being the allowable values <li id=default>2. that the word in question is not marked for number (so the feature defaults to the value <term>unmarked</term>) <li id=unknown>3. that the feature has an unknown value (e.g. the analysis is not complete and may or may not be completed later) <li id=inappli>4. that the feature does not apply here (i.e. the analysis is complete without it and cannot ever supply a value for this feature) </ol> It seems better to forbid the second interpretation (<liref refid=default>), and insist that failure to specify a value says <emph>nothing</emph> about the value: no defaulting mechanism is provided or allowed. 
Similarly, the first interpretation (<liref refid=free>) can be forced by explicitly providing an OR of the various possible values over which the feature can range or by providing a value like <term>unmarked</term>, which may have a similar effect, as it does here. The final interpretation (<liref refid=inappli>) may be tempting, but it would be unenforceable by any SGML parser. Moreover it would be redundant to specify this interpretation, by silence or any other means, every time a feature was not mentioned. It would suffice to specify such information once in a grammar; I conclude that a grammar is where such claims belong, and that we can therefore eliminate the final interpretation. Inapplicable features will always be passed over in silence, but not all features passed over in silence need be interpreted as inapplicable.<fn>Alternatively, a value of <term>unknown</term> or <term>unspecified</term> could be required for all features. This would eliminate the ambiguity between feature values not yet analyzed and inapplicable features, but it also makes the provision of underspecified analyses much more cumbersome. I do not recommend it.</fn> More properly, then, the tag VBD ought to be analyzed this way, specifying the value <term>unmarked</term> for both <term>person</term> and <term>number</term>. <xmp font=mono> <![ CDATA [ <f.struct> <f.struct.name>VBD <feature><f.name>category <f.struct>verb </feature> <feature><f.name>lexical type <f.struct>full verb</feature> <feature><f.name>number <f.struct>unmarked </feature> <feature><f.name>person <f.struct>unmarked </feature> <feature><f.name>tense <f.struct>preterite</feature> </f.struct> ]]> </xmp> Or we can be more explicit about the combinatorial possibilities, banning the value <term>unmarked</term> and restricting the values to 1st, 2nd, or 3rd person and singular or plural. 
In this case we provide explicit alternations to show the range of possibilities: <xmp font=mono> <![ CDATA [ <f.struct> <f.struct.name>VBD <feature><f.name>category <f.struct>verb </feature> <feature><f.name>lexical type <f.struct>full verb</feature> <feature><f.name>number <f.s.OR><f.struct>singular</f.struct> <f.struct>plural </f.struct> </f.s.OR> </feature> <feature><f.name>person <f.s.OR><f.struct>1st</f.struct> <f.struct>2nd</f.struct> <f.struct>3rd</f.struct> </f.s.OR> </feature> <feature><f.name>tense <f.struct>preterite</feature> </f.struct> ]]> </xmp> <h1>Definitions of Primitive Grammatical Elements It seems clear that feature structures like those just described may conveniently be expressed by general entity references which occur within <tag>f.struct</tag> tags, or which themselves contain the <tag>f.struct</tag> tags. Thus in a running text one might have: <xmp font=mono> Wash <f.struct>&nn; </f.struct> sinks <f.struct>&vbz; </f.struct> . <f.struct>&punct.stop;</f.struct> </xmp> It also seems clear that such complex entity values are best built up from smaller primitive entity values, each describing one feature. This has the advantage of allowing all analyses which use a grammatical feature (e.g. <term>number</term>) to use the same definitions. In the remainder of this paper I will give the entity definitions required for the LOB tags and give some simple examples of their possible use. <h2>Major Categories The major categories (<q>parts of speech</q>) assumed by the LOB tagging can be treated as values of a feature called <term>category</term>. 
<xmp font=mono> <![ cdata [ <!ENTITY v "<feature><f.name> category </f.name> <f.struct> verb </f.struct> </feature>" > <!ENTITY adv "<feature><f.name> category </f.name> <f.struct> adverb </f.struct> </feature>" > <!ENTITY n "<feature><f.name> category </f.name> <f.struct> noun </f.struct> </feature>" > <!ENTITY pron "<feature><f.name> category </f.name> <f.struct> pronoun </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!ENTITY conj "<feature><f.name> category </f.name> <f.struct> conjunction</f.struct> </feature>" > <!ENTITY num "<feature><f.name> category </f.name> <f.struct> numeral </f.struct> </feature>" > <!-- determiner-pronoun class includes determiners, quantifiers, --> <!-- and qualifiers which can act as determiners or pronominally. --> <!-- AB subclass is pre-qualifiers and pre-quantifiers. --> <!ENTITY AB "<feature><f.name> category </f.name> <f.struct> determiner-pronoun</f.struct> </feature> <feature><f.name> position </f.name> <f.struct> pre-posed</f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- AP subclass is post-determiner/pronoun --> <!ENTITY AP "<feature><f.name> category </f.name> <f.struct> determiner-pronoun</f.struct> </feature> <feature><f.name> position </f.name> <f.struct> post-posed</f.struct> </feature>" > <!ENTITY det "<feature><f.name> category </f.name> <f.struct> determiner </f.struct> </feature>" > <!ENTITY article "<feature><f.name> category </f.name> <f.struct> article </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!ENTITY ex "<feature><f.name> category </f.name> <f.struct> existential THERE</f.struct> </feature>" > <!ENTITY prep "<feature><f.name> category </f.name> <f.struct> preposition </f.struct> </feature>" > <!ENTITY adj "<feature><f.name> category </f.name> <f.struct> adjective </f.struct> </feature>" > <!ENTITY qual "<feature><f.name> category </f.name> <f.struct> qualifier </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!ENTITY to 
"<feature><f.name> category </f.name> <f.struct> infinitival TO </f.struct> </feature>" > <!ENTITY uh "<feature><f.name> category </f.name> <f.struct> interjection </f.struct> </feature>" > <!ENTITY wh "<feature><f.name> category </f.name> <f.struct> WH-determiner </f.struct> </feature>" > <!ENTITY not "<feature><f.name> category </f.name> <f.struct> NOT </f.struct> </feature>" > <!ENTITY letter "<feature><f.name> category </f.name> <f.struct> letter </f.struct> </feature>" > <!ENTITY punct "<feature><f.name> category </f.name> <f.struct> punctuation </f.struct> </feature>" > <!ENTITY formula "<feature><f.name> category </f.name> <f.struct> formula </f.struct> </feature>" > <!ENTITY foreign "<feature><f.name> category </f.name> <f.struct> foreign phrase </f.struct> </feature>" > ]]> </xmp> <h2>Lexical Subcategorizations Like most linguists, LOB distinguishes among subgroups of the major categories; these subcategorizations may be expressed in feature-structure notation this way: <note>This section is not complete for all categories.</note> <xmp font=mono> <![ cdata [ <!-- VERBS --> <!-- Lexical class of verb: LOB distinguishes lexical verbs, --> <!-- auxiliaries, and modals. We use +/- AUX, +/- MODAL, and --> <!-- a LEXITEM feature to make these distinctions. --> <!-- An alternative analysis would use a single feature and allow --> <!-- it the values LEXICAL, MODAL, BE, DO, HAVE. This would be --> <!-- very close to the tag construction of LOB, but seems less --> <!-- general. 
--> <!-- Lexical verbs are -AUX -MOD --> <!ENTITY vb.lex "&v; <feature><f.name>AUX</f.name><minus></feature> <feature><f.name>MOD</f.name><minus></feature>" > <!-- Modal verbs are +AUX +MOD --> <!ENTITY vb.mod "&v; <feature><f.name>AUX</f.name><plus></feature> <feature><f.name>MOD</f.name><plus></feature>" > <!-- Auxiliary verbs are +AUX -MOD and get a LEXITEM feature --> <!ENTITY vb.be "&v; <feature><f.name>AUX</f.name><plus></feature> <feature><f.name>MOD</f.name><minus></feature> <feature><f.name> Lexical item</f.name> <f.struct> be </f.struct> </feature>" > <!ENTITY vb.do "&v; <feature><f.name>AUX</f.name><plus></feature> <feature><f.name>MOD</f.name><minus></feature> <feature><f.name> Lexical item</f.name> <f.struct> do </f.struct> </feature>" > <!ENTITY vb.have "&v; <feature><f.name>AUX</f.name><plus></feature> <feature><f.name>MOD</f.name><minus></feature> <feature><f.name> Lexical item</f.name> <f.struct> have </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- NOUNS --> <!-- Lexical class of noun: LOB distinguishes common and proper --> <!-- nouns. Common nouns may be marked additionally as capped. --> <!-- Such nouns are common nouns habitually written with --> <!-- uppercase initial, which act syntactically and --> <!-- morphologically as common nouns, not proper nouns. --> <!-- Examples: Jew, Englishman, the English, Urdu, a Thatcherite,--> <!-- an Etonian, Gaullism. --> <!-- Proper nouns may also be marked as locative or titular. --> <!-- The locatives are locative words written with initial cap. --> <!-- E.g. Bay, Bight, Cape, Firth, Hill, \0Is, Island, Isle, --> <!-- Lake, Loch, \0Mt, Mount, Mountain, Peninsula, Plain, Point, --> <!-- \0Rd, Road, \0St, Street, Square, Valley, Wood. Loch_NPL --> <!-- Ness_NP, the Firth_NPL of Forth_NP, the Houses_NPLS of --> <!-- Parliament_NP. --> <!-- An alternative analysis would use a single feature and allow --> <!-- it the values COMMON, PROP, PROPTIT, PROPLOC, CAP. 
As for --> <!-- verbs, we prefer what appears a more general construction. --> <!-- Some special features are also marked by LOB: nouns of --> <!-- measure (UNIT), cited words, and nouns used adverbially. --> <!-- Common nouns are -PROPER, -CAP and otherwise unmarked. --> <!ENTITY n.com "&n; <feature><f.name>proper </f.name><minus> </feature> <feature><f.name>capitalized </f.name><minus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Capitalized nouns are -PROPER, +CAP. --> <!ENTITY n.cap "&n; <feature><f.name>proper </f.name><minus> </feature> <feature><f.name>capitalized </f.name><plus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Proper nouns are +PROPER, +CAP. LOB apparently does not --> <!-- recognize -CAP proper nouns. (Treatment of 'van' and 'de' --> <!-- should be checked to make sure this is correct.) 
--> <!ENTITY n.proper "&n; <feature><f.name>proper </f.name><plus> </feature> <feature><f.name>capitalized </f.name><plus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Locative proper nouns have +LOC -TITLE --> <!ENTITY np.loc "&n; <feature><f.name>proper </f.name><plus> </feature> <feature><f.name>capitalized </f.name><plus> </feature> <feature><f.name>locative </f.name><plus> </feature> <feature><f.name>title </f.name><minus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Titles have -LOC +TITLE --> <!ENTITY np.title "&n; <feature><f.name>proper </f.name><plus> </feature> <feature><f.name>capitalized </f.name><plus> </feature> <feature><f.name>locative </f.name><minus> </feature> <feature><f.name>title </f.name><plus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Cited nouns are tagged by LOB as otherwise like common nouns.--> <!ENTITY n.cited "&n; <feature><f.name>proper </f.name><minus> </feature> <feature><f.name>capitalized </f.name><minus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><plus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Unit nouns are otherwise like common nouns. --> <!ENTITY n.unit "&n; <feature><f.name>proper </f.name><minus> </feature> <feature><f.name>capitalized </f.name><minus> </feature> <feature><f.name>unit noun </f.name><plus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><minus> </feature>" > <!-- Adverbial nouns are otherwise like common nouns. 
--> <!ENTITY n.adverb "&n; <feature><f.name>proper </f.name><minus> </feature> <feature><f.name>capitalized </f.name><minus> </feature> <feature><f.name>unit noun </f.name><minus> </feature> <feature><f.name>cited word </f.name><minus> </feature> <feature><f.name>noun-as-adv </f.name><plus> </feature>" > <!-- Note that LOB does not define an orthogonal set of tags --> <!-- for the various imaginable interactions among these features --> ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- ADVERBS --> <!-- LOB distinguishes denominative, prepositional, particle, --> <!-- and other (unmarked) adverbs. --> <!-- adv.nom are nominal adverbs, e.g. here, now, ... --> <!ENTITY adv.nom "&adv; <feature><f.name> adv.type </f.name> <f.struct> denominative </f.struct> </feature>" > <!-- adv.prep are adverb homographs of prepositions --> <!ENTITY adv.prep "&adv; <feature><f.name> adv.type </f.name> <f.struct> prepositional </f.struct> </feature>" > <!-- adv.part are adverbial particles like 'back' ... --> <!ENTITY adv.part "&adv; <feature><f.name> adv.type </f.name> <f.struct> particle </f.struct> </feature>" > <!ENTITY adv.com "&adv; <feature><f.name> adv.type </f.name> <f.struct> unmarked </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- PRONOUNS --> <!-- Lexical class of pronoun: LOB distinguishes nominal --> <!-- pronouns (anybody, anyone, anything, everybody, ...), --> <!-- determiners, personal pronouns, and reflexive pronouns. --> <!ENTITY pro.nom "&pron; <feature><f.name> pron.type </f.name> <f.struct> nominal </f.struct> </feature>" > <!-- Possessive pronominal determiners include "my" etc. 
--> <!ENTITY pro.det "&pron; <feature><f.name> pron.type </f.name> <f.struct> determiner </f.struct> </feature>" > <!ENTITY pro.pers "&pron; <feature><f.name> pron.type </f.name> <f.struct> personal </f.struct> </feature>" > <!ENTITY pro.refl "&pron; <feature><f.name> pron.type </f.name> <f.struct> reflexive </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- CONJUNCTIONS --> <!-- LOB distinguishes coordinating and subordinating conj. --> <!ENTITY CC "&conj; <feature><f.name>subordinating</f.name><minus> </feature>" > <!ENTITY CS "&conj; <feature><f.name>subordinating</f.name><plus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- NUMERALS --> <!-- LOB distinguishes cardinals and ordinals. Other --> <!-- distinctions are made, and may be found below. --> <!ENTITY num.card "&num; <feature><f.name>ordinal</f.name><minus> </feature>" > <!ENTITY num.ord "&num; <feature><f.name>ordinal</f.name><plus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- PREPOSED PRONOUN-DETERMINER --> <!-- LOB distinguishes qualifiers and quantifiers --> <!ENTITY AB.qual "&AB; <feature><f.name> det.type </f.name> <f.struct> qualifier </f.struct> </feature>" > <!ENTITY AB.quant "&AB; <feature><f.name> det.type </f.name> <f.struct> quantifier </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- ADJECTIVES --> <!-- LOB distinguishes attributive-only adjectives from those --> <!-- which can be either attributive or predicative. --> <!ENTITY jj.attr "<feature><f.name>attrib-only</f.name><plus> </feature>" > <!ENTITY jj.pred "<feature><f.name>attrib-only</f.name><minus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- WH-pronouns --> <!-- LOB distinguishes determiners, pronouns, and relatives. --> <!-- The first two can be marked with the CATEGORY feature --> <!-- already defined, (assuming we don't mind having two values --> <!-- for the same feature). 
The last requires a RELATIVE feature --> <!ENTITY rel.yes "<feature><f.name>relative</f.name><plus> </feature>" > <!ENTITY rel.no "<feature><f.name>relative</f.name><minus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- PUNCTUATION --> <!-- LOB distinguishes !, open and close bracket, open and --> <!-- close quote, dash, comma, stop, ellipsis, colon, semicolon, --> <!-- and question mark. --> <!ENTITY p.bang "<feature><f.name>character</f.name> <f.struct> ! </f.struct> </feature>" > <!ENTITY p.openbr "<feature><f.name>character</f.name> <f.struct> ( </f.struct> </feature>" > <!ENTITY p.closbr "<feature><f.name>character</f.name> <f.struct> ) </f.struct> </feature>" > <!ENTITY p.openq "<feature><f.name>character</f.name> <f.struct> &ldquo </f.struct> </feature>" > <!ENTITY p.closq "<feature><f.name>character</f.name> <f.struct> &rdquo </f.struct> </feature>" > <!ENTITY p.dash "<feature><f.name>character</f.name> <f.struct> &dash </f.struct> </feature>" > <!ENTITY p.comma "<feature><f.name>character</f.name> <f.struct> , </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!ENTITY p.stop "<feature><f.name>character</f.name> <f.struct> . </f.struct> </feature>" > <!ENTITY p.ellips "<feature><f.name>character</f.name> <f.struct> &hellip </f.struct> </feature>" > <!ENTITY p.colon "<feature><f.name>character</f.name> <f.struct> : </f.struct> </feature>" > <!ENTITY p.semi "<feature><f.name>character</f.name> <f.struct> ; </f.struct> </feature>" > <!ENTITY p.query "<feature><f.name>character</f.name> <f.struct> ? </f.struct> </feature>" > ]]> </xmp> <h2>Number, Person, Case, Gender, and Other Grammatical Features Features of traditional grammar like number, gender, and case appear in many of the LOB tags. <note>This section is not complete for all categories.</note> <xmp font=mono> <![ cdata [ <!-- Number: English words marked for number are sing or plur. --> <!-- This feature is used for verbs, nouns, pronouns, and --> <!-- numerals. 
--> <!ENTITY sing "<feature><f.name> number </f.name> <f.struct> singular </f.struct> </feature>" > <!ENTITY plur "<feature><f.name> number </f.name> <f.struct> plural </f.struct> </feature>" > <!ENTITY num.no "<feature><f.name> number </f.name> <f.struct> unmarked </f.struct> </feature>" > <!-- We define "unmarked" as a placeholder, so that we can --> <!-- specify that a given word is not marked for number, rather --> <!-- than either leaving it out or specifying an exhaustive --> <!-- list of alternatives. --> ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Person: English words marked for person are 1st, 2nd, 3rd. --> <!-- This feature is used for verbs and pronouns. --> <!-- We distinguish IMPERSONAL as a value for pronouns and --> <!-- UNMARKED as a value for verbs which are not marked. --> <!-- Participles are not marked for person and have their own --> <!-- binary feature. --> <!ENTITY p1 "<feature><f.name> person </f.name> <f.struct> 1st </f.struct> </feature>" > <!ENTITY p2 "<feature><f.name> person </f.name> <f.struct> 2nd </f.struct> </feature>" > <!ENTITY p3 "<feature><f.name> person </f.name> <f.struct> 3rd </f.struct> </feature>" > <!ENTITY impers "<feature><f.name> person </f.name> <f.struct> none </f.struct> </feature>" > <!-- We might say MINUS but PERSON is not binary so we don't --> <!ENTITY per.no "<feature><f.name> person </f.name> <f.struct> unmarked </f.struct> </feature>" > <!ENTITY partic "<feature><f.name> participle </f.name> <plus> </feature>" > <!ENTITY par.no "<feature><f.name> participle </f.name> <minus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Tense: English tenses are present and preterite. --> <!-- This feature is used for verbs. --> <!-- This omits the compound tenses because they are analytic in --> <!-- English and we are worrying only about word tags. --> <!-- To allow for the compound tenses, e.g. 
for phrase tagging, --> <!-- we add a future tense and introduce a +/- PERFECTIVE feature --> <!-- and perform the Cartesian product. --> <!ENTITY present "<feature><f.name> tense </f.name> <f.struct> present </f.struct> </feature>" > <!ENTITY preterite "<feature><f.name> tense </f.name> <f.struct> preterite </f.struct> </feature>" > <!-- The features above are all that are needed for LOB tags. --> <!-- The following features are added proleptically for other --> <!-- uses. --> <!ENTITY future "<feature><f.name> tense </f.name> <f.struct> future </f.struct> </feature>" > <!ENTITY presperf "<feature><f.name> tense </f.name> <f.struct> present </f.struct> </feature> <feature><f.name> perfective </f.name><plus> </feature>" > <!ENTITY pluperf "<feature><f.name> tense </f.name> <f.struct> preterite</f.struct> </feature> <feature><f.name> perfective </f.name><plus> </feature>" > <!ENTITY futperf "<feature><f.name> tense </f.name> <f.struct> future </f.struct> </feature> <feature><f.name> perfective </f.name><plus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Degree: English modifiers are pos, comp, or sup. --> <!-- This feature is used for adverbs and adjectives. --> <!ENTITY pos "<feature><f.name> degree </f.name> <f.struct> positive </f.struct> </feature>" > <!ENTITY comp "<feature><f.name> degree </f.name> <f.struct> comparative </f.struct> </feature>" > <!ENTITY sup "<feature><f.name> degree </f.name> <f.struct> superlative </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Case: English words marked for case are nom, gen, or acc. --> <!-- This feature is used for adverbs (sometimes marked GEN), --> <!-- nouns (NOM or GEN), pronouns, numerals (sometimes GEN), --> <!-- determiner-pronouns, and determiners. --> <!-- NOM is nominative or "subjective" case. We use NOM not SUB --> <!-- because we hope to generalize to other IE languages later. 
--> <!ENTITY nom "<feature><f.name> case </f.name> <f.struct> nominative </f.struct> </feature>" > <!ENTITY gen "<feature><f.name> case </f.name> <f.struct> genitive </f.struct> </feature>" > <!-- ACC is accusative or "objective" case. We use ACC not OBJ --> <!-- or OBLIQUE to make other IE languages easier later. --> <!ENTITY acc "<feature><f.name> case </f.name> <f.struct> accusative </f.struct> </feature>" > <!ENTITY case.no "<feature><f.name> case </f.name> <f.struct> unmarked </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Gender: English words marked for gender are masculine, --> <!-- feminine, neuter, or common. We add unmarked just in case. --> <!-- This feature is used for personal pronouns (3rd-person only) --> <!ENTITY masc "<feature><f.name> gender </f.name> <f.struct> masculine </f.struct> </feature>" > <!ENTITY fem "<feature><f.name> gender </f.name> <f.struct> feminine </f.struct> </feature>" > <!ENTITY neut "<feature><f.name> gender </f.name> <f.struct> neuter </f.struct> </feature>" > <!-- Common gender in English is masculine or feminine. Other --> <!-- languages might need to define it as a distinct value. --> <!-- Danish, for instance? --> <!ENTITY common "<feature><f.name> gender </f.name> <f.s.OR> <f.struct> masculine </f.struct> <f.struct> feminine </f.struct> </f.s.OR> </feature>" > <!ENTITY gend.no "<feature><f.name> gender </f.name> <f.struct> unmarked </f.struct> </feature>" > ]]> </xmp> <h2>Miscellaneous Features Some other features are not recognizable as traditional grammatical notions. <note>This section is not complete for all categories.</note> <xmp font=mono> <![ cdata [ <!-- LOB distinguishes cardinals with the value 1, and others. --> <!-- Because LOB tokenizes on spaces, hyphenated pairs are also --> <!-- distinguished, here with a COUNT feature whose value is the --> <!-- number of numerals in the unit being tagged. 
--> <!ENTITY num.one "<feature><f.name>unitary value</f.name><plus> </feature>" > <!ENTITY num.plur "<feature><f.name>unitary value</f.name><minus> </feature>" > <!ENTITY num.pair "<feature><f.name> count </f.name> <f.struct> 2 </f.struct> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- LOB distinguishes the word BOTH from other ABNs because it --> <!-- can serve as a double conjunction. --> <!-- No distinction is made among uses of BOTH. --> <!-- Determiners also distinguish double and single conjunctions. --> <!ENTITY conj.dbl "<feature><f.name>double-conj </f.name><plus> </feature>" > <!ENTITY c.dbl.no "<feature><f.name>double-conj </f.name><minus> </feature>" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- LOB distinguishes various words which can be pre-posed, --> <!-- post-posed, or both. --> <!ENTITY pre.yes "<feature><f.name>preposable </f.name><plus> </feature>" > <!ENTITY pre.no "<feature><f.name>preposable </f.name><minus> </feature>" > <!ENTITY post.yes "<feature><f.name>postposable </f.name><plus> </feature>" > <!ENTITY post.no "<feature><f.name>postposable </f.name><minus> </feature>" > ]]> </xmp> <h1>Combinations of Primitives We can define the verbal tags of the LOB scheme fully as follows: <xmp font=mono> <![ cdata [ <!-- Simple verbs: VB, VBD, VBG, VBN, VBZ --> <!ENTITY VB "&vb.lex; &num.no; &per.no; &par.no; &present;" > <!ENTITY VBD "&vb.lex; &num.no; &per.no; &par.no; &preterite;" > <!ENTITY VBG "&vb.lex; &num.no; &partic; &present;" > <!ENTITY VBN "&vb.lex; &num.no; &partic; &preterite;" > <!ENTITY VBZ "&vb.lex; &sing; &p3; &par.no; &present;" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- Modal verbs: MD --> <!ENTITY MD "&vb.mod; &num.no; &per.no; &par.no; &present;" > <!-- BE Auxiliaries: BE, BED, BEDZ, BEG, BEM, BEN, BER, BEZ --> <!-- Some might wish for a +/- INFINITE feature to distinguish --> <!-- infinitives; except for BE, however, the English infinitive --> <!-- is always the same as the form unmarked for person and num. 
--> <!-- And BE marks all forms for per/num. So we don't need INFIN. --> <!ENTITY BE "&vb.be; &num.no; &per.no; &par.no; &present;" > <!ENTITY BED "&vb.be; &num.no; &per.no; &par.no; &preterite;" > <!ENTITY BEDZ "&vb.be; &sing; &p3; &par.no; &preterite;" > <!ENTITY BEG "&vb.be; &num.no; &per.no; &partic; &present;" > <!ENTITY BEM "&vb.be; &sing; &p1; &par.no; &present;" > <!ENTITY BEN "&vb.be; &num.no; &per.no; &partic; &preterite;" > <!ENTITY BER "&vb.be; &plur; &per.no; &par.no; &present;" > <!ENTITY BEZ "&vb.be; &sing; &p3; &par.no; &present;" > ]]> </xmp> <xmp font=mono> <![ CDATA [ <!-- DO Auxiliaries: DO, DOD, DOZ --> <!ENTITY DO "&vb.do; &num.no; &per.no; &par.no; &present;" > <!ENTITY DOD "&vb.do; &num.no; &per.no; &par.no; &preterite;" > <!ENTITY DOZ "&vb.do; &sing; &p3; &par.no; &present;" > ]]> </xmp> Note that in VBD, DOD, and BED, the string <q><entity>num.no</entity> <entity>per.no</entity></q> says, correctly, that the verbs in question are not marked for person and number. In the case of VBD and DOD, however, this means <term>person</term> can be 1st, 2nd, or 3rd, and <term>number</term> can be singular or plural, in any combination; in the case of BED, it means that <term>person</term> and <term>number</term> can be any combination except 3rd-person singular. This is a simple fact of English grammar. Our choice of expression, modeled on the choices made in the LOB tag scheme, places the burden for handling this fact on the grammar and the application program; one could also change the definitions of these entities to make it explicit here. This facility effectively allows us to specify, in our entity declarations, just what we mean by a given part-of-speech classification, and thus represents an advantage over the naive approach presented earlier. All the other LOB tags can be similarly defined; completion of the definition is for now left to the reader as an exercise. 
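<p>
The alternative mentioned above, making the interpretation of unmarked
person and number explicit in the declarations themselves, might be
sketched as follows for VBD, using the disjunction pattern of the
<entity>common</entity> gender entity. This is an illustration only:
the entity names <entity>num.any</entity>, <entity>per.any</entity>,
and <entity>VBD.x</entity> are hypothetical, and the feature names and
values are assumed to match those listed in the summary of features.
<xmp font=mono>
<![ CDATA [
<!-- Hypothetical: number is explicitly singular or plural. -->
<!ENTITY num.any "<feature><f.name> number </f.name>
                  <f.s.OR> <f.struct> singular </f.struct>
                           <f.struct> plural </f.struct> </f.s.OR>
                  </feature>" >
<!-- Hypothetical: person is explicitly 1st, 2nd, or 3rd. -->
<!ENTITY per.any "<feature><f.name> person </f.name>
                  <f.s.OR> <f.struct> 1st </f.struct>
                           <f.struct> 2nd </f.struct>
                           <f.struct> 3rd </f.struct> </f.s.OR>
                  </feature>" >
<!-- Hypothetical variant of VBD with the combinations spelled out. -->
<!ENTITY VBD.x   "&vb.lex; &num.any; &per.any; &par.no; &preterite;" >
]]>
</xmp>
BED could not be handled with these two entities alone, since the
combination it excludes involves person and number jointly; it would
need a disjunction over whole sets of features rather than over the
values of a single feature.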
</body>
<appendix>
<h1>Definition of all LOB tags in feature-structure notation
</appendix>
<!-- appendix to EDW12, part of speech tagging -->
<appendix>
<h1>Summary of Features
<h2>Binary Features
<xmp>
+/-AUX   auxiliary      /* verb */
+/-MOD   modal          /* verb */
+/-PROP  proper         /* noun */
+/-CAP   capitalized    /* noun, adjective */
+/-SUB   subordinating  /* conjunction */
+/-ORD   ordinal        /* number */
+/-PERF  perfective     /* tensed verbs */
+/-PART  participle     /* verbs */
+/-LOC   locative term  /* proper nouns */
+/-TITL  title          /* proper nouns */
+/-UNIT  unit-term      /* noun */
+/-CITE  cited-word     /* noun */
+/-ATTR  attributive    /* adjectives */
+/-PRED  predicative    /* adjectives -- redundant? */
+/-DBLC  double-conj    /* determiner/pronouns, and determiners */
+/-PRE   preposable     /* ? may precede its head */
+/-POST  postposable    /* ? may follow its head */
+/-PTCL  particle       /* ? adverb ?==? inverse of +/-takes-complement? */
+/-REL   relative       /* pronouns -- alternative to pron.type */
+/-PERS  personal       /* pronouns -- alternative to pron.type */
+/-REFL  reflexive      /* pronouns -- alternative to pron.type */
+/-WH    WH-word        /* pronouns, adverbs */
/* cross-category usages: */
+/-pseudo-adverb  /* i.e. can appear in adverbial positions -- noun */
+/-pseudo-noun    /* i.e. can appear in noun positions -- adverb */
+/-also-prep      /* i.e. is also a preposition -- adverb */
+/-DET            /* i.e. is a determiner -- pronoun */
+/-exnoun         /* formed from a noun -- pronoun (anybody ...)
*/
</xmp>
<h2>N-way Features
<xmp>
/* Base categories */
CAT   category  = verb | adverb | noun | pronoun | conjunction | number
                | determiner | article | THERE | preposition | adjective
                | qualifier | TO | interjection | [WH] | NOT | letter
                | punctuation | formula | foreign
/* Sub-categorization */
LEX   lexitem   = (string)   /* verbs */
CHAR  character = (string)   /* punctuation */
CNT   count     = (integer)  /* numbers -- for pairs, ranges */
[ATYP adv.type  = nominal | preposition | particle | unmarked ]
[prefer binary +/-pseudo-noun +/-also.prep +/-ptcl ]
[PTYP pron.type = nominal | determiner | personal | reflexive ]
[prefer binary +/-exnoun +/-det +/-pers +/-refl +/-wh +/-rel ]
DTYP  det.type  = qualifier | quantifier
/* Categories of Traditional Grammar */
NUM   number    = singular | plural | unmarked
PER   person    = 1st | 2nd | 3rd | none | unmarked
TEN   tense     = present | preterite | future
DEG   degree    = positive | comparative | superlative
CASE  case      = nominative | genitive | accusative | unmarked
GEN   gender    = masculine | feminine | neuter | unmarked [ | common ]
</xmp>
<h2>Cross-Category Groupings in LOB
<xmp>
noun-but-can-serve-as-adverb
adverb-but-can-serve-as-noun (as prepositional object)
adverb-or-preposition (RI)
adverb-or-preposition-without-object (RP)
</xmp>
<h1>Definitions of LOB, Brown, and Lancaster Tags
<h2>LOB tags
<xmp>
Summary of tags (with spaces inserted for clarity):
  1  A B L  pre-qualifier (quite, rather, such)  7.12
        CAT=(DET|PRON), DTYP=QUALIFIER, +PRE
  2  A B N  pre-quantifier (all, half)  7.12
        CAT=(DET|PRON), DTYP=QUANTIFIER, +PRE
  3  A B X  pre-quantifier/pronoun/double conjunction (both)
        CAT=(DET|PRON), DTYP=QUANTIFIER, +PRE, +DBLC
  4  A P  post-determiner/pronoun.
        CAT=(DET|PRON), +POST
  5  A P $  other's
        CAT=(DET|PRON), +POST, CASE=GEN
  6  A P S  others
        CAT=(DET|PRON), +POST, NUM=PLURAL
  7  A P S $  others'
        CAT=(DET|PRON), +POST, CASE=GEN, NUM=PLURAL
  8  A T  article, singular (a, an, every)  7.12
        CAT=ARTICLE, NUM=SINGULAR
  9  A T I  article, sing or plural (the, no)  7.12
        CAT=ARTICLE, NUM=UNMARKED
 10  BE  be
        CAT=VERB +AUX -MOD -PART TEN=PRES LEX=BE
 11  BE D  were
        CAT=VERB +AUX -MOD -PART TEN=PRET LEX=BE
 12  BE D Z  was
        CAT=VERB +AUX -MOD -PART TEN=PRET NUM=SING PER=3 LEX=BE
 13  BE G  being
        CAT=VERB +AUX -MOD +PART TEN=PRES LEX=BE
 14  BE M  am, 'm
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=1 LEX=BE
 15  BE N  been
        CAT=VERB +AUX -MOD +PART TEN=PRET LEX=BE
 16  BE R  are, 're
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=BE
 17  BE Z  is, 's
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=3 LEX=BE
 18  CC  coordinating conjunction (and, and/or, but, nor, only, or, yet)
        CAT=CONJ -SUB
 19  CD  2, 3, two, three, hundred, thousand, dozen, zero -  7.17
 20  CD $  cardinal + genitive
 21  CD -CD  hyphenated pair of cardinals  7.17
 22  CD 1  one, 1  7.17
 23  CD 1 $  one's
 24  CD 1 S  ones
 25  CD S  cardinal + plural (tens, millions, dozens, etc.)
 26  CS  subordinating conjunction (after, although, etc.)
        7.14-15
        CAT=CONJ +SUB
 27  DO  do  7.5
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=DO
 28  DO D  did
        CAT=VERB +AUX -MOD -PART TEN=PRET NUM=UNMKD PER=UNMKD LEX=DO
 29  DO Z  does
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=3 LEX=DO
 30  DT  singular determiner (another, each, that, this)  7.12
 31  DT $  singular determiner + genitive (another's)
 32  DT I  singular or plural determiner (any, enough, some)
 33  DT S  plural determiner (those, these)
 34  DT X  determiner/double conjunction (either, neither)  7.12
 35  EX  existential 'there'
 36  HV  have  7.5
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD LEX=HAVE
 37  HV D  had, 'd
        CAT=VERB +AUX -MOD -PART TEN=PRET NUM=UNMKD PER=UNMKD LEX=HAVE
 38  HV G  having
        CAT=VERB +AUX -MOD +PART TEN=PRES LEX=HAVE
 39  HV N  had (past participle)
        CAT=VERB +AUX -MOD +PART TEN=PRET LEX=HAVE
 40  HV Z  has, 's
        CAT=VERB +AUX -MOD -PART TEN=PRES NUM=SING PER=3 LEX=HAVE
 41  IN  preposition (about, above, etc.)  7.13, 7.15
 42  JJ  adjective  7.3-4, 7.8-9, 7.11
 43  JJ B  attributive-only adjective (chief, main, entire, etc.)
 44  JJ R  comparative adjective  7.9, 7.11
 45  JJ T  superlative adjective  7.9, 7.11
 46  J NP  adj with word-initial capital (English, German, etc.)
 47  MD  modal auxiliary
        CAT=VERB +AUX +MOD TEN=PRES NUM=UNMKD PER=UNMKD
 48  N C  cited word  7.23
        NC CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT +CITE
 49  N N  noun, sg, common  7.4, 7.6, 7.7
        NN CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT -CITE
 50  N N $  noun, sg, common, + genitive  7.6
        NN$ CAT=NOUN N=SING CASE=GEN -PROP -CAP -UNIT -CITE
 51  N N P  noun, sg, common, with word-initial capital  7.7
        NNP CAT=NOUN N=SING CASE=NOM -PROP +CAP -UNIT -CITE
 52  N N P $  noun, sg, common, with word-init cap and genitive
        NNP$ CAT=NOUN N=SING CASE=GEN -PROP +CAP -UNIT -CITE
 53  N N P S  noun, pl, common, with word-init cap
        NNPS CAT=NOUN N=PLUR CASE=NOM -PROP +CAP -UNIT -CITE
 54  N N P S $  noun, pl, common, with word-init cap and genitive
        NNPS$ CAT=NOUN N=PLUR CASE=GEN -PROP +CAP -UNIT -CITE
 55  N N S  noun, pl, common  7.6, 7.7
        NNS CAT=NOUN N=PLUR CASE=NOM -PROP -CAP -UNIT -CITE
 56  N N S $  noun, pl, common, + genitive
        NNS$ CAT=NOUN N=PLUR CASE=GEN -PROP -CAP -UNIT -CITE
 57  N N U  noun, abbrev unit of measurement (hr., lb., etc.)
        NNU CAT=NOUN N=SING CASE=NOM -PROP -CAP +UNIT -CITE
 58  N N U S  noun, abbrev unit of measurement, pl (gns, yds, etc.)
        NNUS CAT=NOUN N=PLUR CASE=NOM -PROP -CAP +UNIT -CITE
 59  N P  noun, sg, proper  7.7
        NP CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
 60  N P $  noun, sg, proper, + genitive
        NP$ CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
 61  N P L  noun, sg, locative with word-initial cap (Abbey,
        NPL CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
 62  N P L $  ditto + genitive
        NPL$ CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
 63  N P L S  noun, pl, locative with word-initial cap
        NPLS CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
 64  N P L S $  ditto + genitive
        NPLS$ CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE +LOC -TITL -PS.ADV
 65  N P S  noun, pl, proper  7.7
        NPS CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
 66  N P S $  noun, pl, proper, + genitive
        NPS$ CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE -LOC -TITL -PS.ADV
 67  N P T  noun, sg, titular with word-initial cap
        NPT CAT=NOUN N=SING CASE=NOM +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
 68  N P T $  noun, sg, titular, cap, + genitive
        NPT$ CAT=NOUN N=SING CASE=GEN +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
 69  N P T S  noun, pl, titular, cap
        NPTS CAT=NOUN N=PLUR CASE=NOM +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
 70  N P T S $  noun, pl, titular, cap, + genitive
        NPTS$ CAT=NOUN N=PLUR CASE=GEN +PROP +CAP -UNIT -CITE -LOC +TITL -PS.ADV
 71  N R  noun, sg, adverbial (Jan, Feb, east, today,
        NR CAT=NOUN N=SING CASE=NOM -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
 72  N R $  noun, sg, adverbial + genitive
        NR$ CAT=NOUN N=SING CASE=GEN -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
 73  N R S  noun, pl, adverbial
        NRS CAT=NOUN N=PLUR CASE=NOM -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
 74  N R S $  noun, pl, adverbial + genitive
        NRS$ CAT=NOUN N=PLUR CASE=GEN -PROP -CAP -UNIT -CITE -LOC -TITL +PS.ADV
 75  OD  ordinal (1st, 2nd, first, ...)
        7.17
 76  OD $  ordinal + genitive
 77  P N  nominal pron (anybody, anyone, anything; everybody,
 78  P N $  nominal pron + genitive
 79  P P $  poss determiner (my, your, etc.)  7.12
 80  P P $$  poss pron (mine, yours, etc.)
 81  P P 1 A  pers pron, 1st pers sing nom (I)
 82  P P 1 A S  pers pron, 1st pers plur nom (we)
 83  P P 1 O  pers pron, 1st pers sing acc (me)
 84  P P 1 O S  pers pron, 1st pers plur acc (us)
 85  P P 2  pers pron, 2nd pers (you, thou, thee, ye)
 86  P P 3  pers pron, 3rd pers sing nom + acc (it)
 87  P P 3 A  pers pron, 3rd pers sing nom (he, she)
 88  P P 3 A S  pers pron, 3rd pers plur nom (they)
 89  P P 3 O  pers pron, 3rd pers sing acc (him, her)
 90  P P 3 O S  pers pron, 3rd pers plur acc (them, 'em)
 91  P P L  refl pron, sg
 92  P P L S  refl pron, pl; reciprocal pron
 93  QL  qualifier (as, awfully, less, more, so, too, very, ...)
 94  QL P  post-qualifier (enough, indeed)
 95  R B  adverb  7.10-7.11
        CAT=ADV DEG=POS CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
 96  R B $  adverb + genitive (else's)
        CAT=ADV CASE=GEN -PSEUDO.NOUN -ALSO.PREP -WH
 97  R B R  comparative adverb
        CAT=ADV DEG=COMP CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
 98  R B T  superlative adverb
        CAT=ADV DEG=SUP CASE=UNMKD -PSEUDO.NOUN -ALSO.PREP -WH
 99  R I  adverb (homograph of preposition: below, near, ...)
        CAT=ADV DEG=POS CASE=UNMKD -PSEUDO.NOUN +ALSO.PREP -PTCL -WH
100  R N  nominal adverb (here, now, there, then)  7.10
        CAT=ADV DEG=POS CASE=UNMKD +PSEUDO.NOUN -ALSO.PREP -WH
101  R P  adverbial particle (back, down, off, ...)
        7.10, 7.13
        CAT=ADV DEG=POS CASE=UNMKD -PSEUDO.NOUN +ALSO.PREP +PTCL -WH
102  TO  infinitival 'to'
        CAT=TO
103  UH  interjection
        CAT=INTERJECTION
104  VB  base form of verb (uninflected present tense, imper)
        CAT=VERB -AUX -MOD -PART TEN=PRES NUM=UNMKD PER=UNMKD
105  VB D  past tense of verb  7.3
        CAT=VERB -AUX -MOD -PART TEN=PRET
106  VB G  present participle, gerund  7.4
        CAT=VERB -AUX -MOD +PART TEN=PRES
107  VB N  past participle  7.3
        CAT=VERB -AUX -MOD +PART TEN=PRET
108  VB Z  3rd person sg
        CAT=VERB -AUX -MOD -PART TEN=PRES NUM=SING PER=3
109  W DT  WH-determiner (what, whatever, interrogative
110  W DT R  WH-determiner, relative (which)  7.16
111  W P  WH-pron, interrogative, nom+acc (who, whoever)
112  W P $  WH-pron, interrogative, gen (whose)
113  W P $ R  WH-pron, relative, gen (whose)
114  W P A  WH-pron, nom (whosoever)
115  W P O  WH-pron, interrogative, acc (whom, whomsoever)
116  W P O R  WH-pron, relative, acc (whom)
117  W P R  WH-pron, relative, nom+acc (that, relative who)  7.14,
118  W RB  WH-adverb (how, when, ...)  7.16
119  XNOT  'not'
120  ZZ  letter
121  !  exclamation mark
122  &FO  formula  7.22
123  &FW  foreign word  7.21
124  (  left bracket (round or square)
125  )  right bracket (round or square)
126  *'  begin quote (single or double)  2.6
127  **'  end quote (single or double)  2.6
128  *-  dash  7.24
129  ,  comma  7.24
130  .  full stop  7.24
131  ...  ellipsis
132  :  colon  7.24
133  ;  semicolon  7.24
134  ?  question mark
</xmp>
</appendix>
</gdoc>