To: Michael Sperberg-McQueen 312 996-2477 -2981
Subject: Word class tags - summary
Date: Wed, 09 Jan 91 13:00:50 EST
From: anderson@sapir.COG.JHU.EDU

Michael -

I've summarized our results on how to tag words below. Well, it's not just a summary, because I've managed to sneak in a few changes that occur to me as I go over the whole thing. This list includes stuff from your notes, and also from the discussion Geoff and Nicoletta and I had while you, Gary and Terry were solving the real problems.

It seems to me that if the formalism supports it, it's worth imposing a bit more structure on this: in particular, adding properties that are at the same level as "wordclass", like "form" below. So I've divided this summary into three parts: (1) a set of properties that recur within the specification of multiple classes, though generally subject within any given language to some class-specific restrictions implemented via the prologue mechanism you were describing to me; (2) the overall tag for a word, which at present contains an internally-structured class and a specification of word-form type; and (3) the content of the individual word classes.

Of course I didn't prepare this in the form of an SGML-tagged document.....hope you can read it anyway. I'll be in touch again when I have a chance to go through grammars of Danish and Russian with this scheme in mind.

Steve

------------------(included file tei_wordtags.txt)------------------------

TEI tags for words

1. Some properties that recur in more than one word class:

   Person = 1st | 2nd | 2Polite | 2Familiar | 3 | indefinite | unspecified
            [add 1Inclusive | 1exclusive ?]
   Number = sg | pl | unspecified
            [add | du ?]
   Gender = masculine | feminine | neuter | common | invariant | unspecified
   Animacy = animate | inanimate | unspecified
   Case = nom | gen | dat | acc | voc | instr | loc | subj | obj | obl | unspecified
   Degree = positive | comparative | superlative
   Definiteness = definite | indefinite | unspecified
            [add specific | nonspecific | generic ?]
   Deixis = proximal | distal | remote | unspecified
   Affect = diminutive | augmentative | pejorative

2. An overall tag for words:

   Word
      Wordclass = Noun | Pronoun | Adjective | Verb | Adverb | Preposition |
                  Coordinator | Subordinator | Particle | Interjection |
                  Punctuation
      Form = phrase | compound | full | reduced | clitic | proclitic | enclitic

3. Properties of the various classes:

   a. Noun:
      number
      gender
      animacy
      case
      proper = proper | common
      definiteness
            [add deixis for French "cette femme-ci/là" ?]
      affect
      [ What's "w.i.c.", and does it still belong here? -sra]

   b. Pronoun:
      person
      number
      gender
      case
      deixis
      type = personal | demonstrative | indefinite | expletive |
             relative | interrogative | possessive | reflexive |
             reciprocal | partitive | locative | propredicate | zero
      pro-form = emphatic | disjunctive | conjunctive

   c. Adjective:
      number
      gender
      animacy
      case
      degree
      numeral = cardinal | ordinal
      pronominal = possessive = [ person gender number ] | relative |
                   interrogative | demonstrative | indefinite
      declension = strong | weak | long | short

   d. Verb:
      agreement = [ person number gender case ]
            (we omitted this, but participle agreement may need it - sra)
      tense = present | past | future | imperfect | aorist | perfect |
              pluperfect | future-perfect | conditional
      aspect = perfective | imperfective
      mood = indicative | subjunctive | imperative | conditional
      voice = active | middle | passive | mediopassive
      polarity = positive | negative
      verb-form = finite | infinitive | gerund | supine |
                  participle = [ participle-type = present | past | future ]
      p-incorporation = [ do = [] io = [] partitive locative ]
      verb-type = lexical | auxiliary | modal | copula

   e. Adverb:
      degree
      deixis
      directionality = static | dynamic   (e.g. Danish ind vs. inde)
      interrogative = + | -
      [add type = locative | temporal | manner | .... ?]
      [add incorporation = something, for German dabei, darauf, herein, etc.?]

   f. Preposition:
      art-incorporation = [ number gender case definiteness ]
            (for Italian agli, French du, German zum, etc.)
      pro-incorporation = [ number gender case ]
            (for Portuguese, maybe also German damit, etc.? If so, need
            more properties)

   g. Coordinator: (no internal structure)

   h. Subordinator: (no internal structure - except that some Dutch and
      spoken German dialects have agreement with the lower subject marked
      on some complementizers.....)

   i. Particle: (any uninflectable item that doesn't belong somewhere
      else - no general internal structure motivated)

   j. Interjection: (no internal structure)

   k. Punctuation:
      orientation = open | close | matched | unary

---------------------

Existing TEI tags , , seem to be usable for those objects without extension.
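
To make the three-part layout above concrete, here is a minimal sketch in Python of how the scheme might be checked mechanically: shared properties are declared once, each word class picks the ones it uses, and a tag is validated against its class. Everything about the encoding is an assumption made for illustration (the names SHARED, FORMS, CLASS_PROPS, pick and validate, and the dict layout of a tag); none of it is part of the proposal or of any TEI formalism. Only a few classes are filled in, the bracketed "[add ...]" candidates are left out, and nested values such as agreement or p-incorporation are not modelled.

```python
# Illustrative sketch only: all names and the dict-based encoding are hypothetical.

# Part 1: properties that recur in more than one word class.
SHARED = {
    "person": {"1st", "2nd", "2Polite", "2Familiar", "3", "indefinite", "unspecified"},
    "number": {"sg", "pl", "unspecified"},
    "gender": {"masculine", "feminine", "neuter", "common", "invariant", "unspecified"},
    "animacy": {"animate", "inanimate", "unspecified"},
    "case": {"nom", "gen", "dat", "acc", "voc", "instr", "loc",
             "subj", "obj", "obl", "unspecified"},
    "degree": {"positive", "comparative", "superlative"},
    "definiteness": {"definite", "indefinite", "unspecified"},
    "deixis": {"proximal", "distal", "remote", "unspecified"},
    "affect": {"diminutive", "augmentative", "pejorative"},
}

# Part 2: the word-form type, which sits at the same level as the word class.
FORMS = {"phrase", "compound", "full", "reduced", "clitic", "proclitic", "enclitic"}


def pick(*names):
    """Select shared properties by name for use in a class definition."""
    return {name: SHARED[name] for name in names}


# Part 3: per-class properties (a subset of the classes, flat values only).
CLASS_PROPS = {
    "Noun": {
        **pick("number", "gender", "animacy", "case", "definiteness", "affect"),
        "proper": {"proper", "common"},
    },
    "Adverb": {
        **pick("degree", "deixis"),
        "directionality": {"static", "dynamic"},
        "interrogative": {"+", "-"},
    },
    "Coordinator": {},  # no internal structure
    "Punctuation": {"orientation": {"open", "close", "matched", "unary"}},
}


def validate(tag):
    """Return a list of problems found in a word tag (empty if none)."""
    problems = []
    wordclass = tag.get("wordclass")
    if wordclass not in CLASS_PROPS:
        return [f"unknown or unmodelled wordclass: {wordclass!r}"]
    form = tag.get("form")
    if form is not None and form not in FORMS:
        problems.append(f"unknown form: {form!r}")
    allowed = CLASS_PROPS[wordclass]
    for prop, value in tag.get("props", {}).items():
        if prop not in allowed:
            problems.append(f"{wordclass} has no property {prop!r}")
        elif value not in allowed[prop]:
            problems.append(f"{prop} cannot take value {value!r}")
    return problems


if __name__ == "__main__":
    ok = {"wordclass": "Noun", "form": "full",
          "props": {"number": "pl", "case": "dat", "proper": "common"}}
    bad = {"wordclass": "Noun", "props": {"tense": "past"}}
    print(validate(ok))
    print(validate(bad))
```

Declaring the shared value sets once and letting each class select from them mirrors the split between parts 1 and 3; language-specific restrictions of the kind the prologue mechanism is meant to carry could be layered on by narrowing a class's value sets, though that is not attempted here.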