Notes on the Encoding of Linguistic Analysis

D. Terence Langendoen
Department of Linguistics
University of Arizona
Tucson, AZ 85721 USA
E-mail: langendt@arizvm1 (bitnet)
Phone: (602) 621-6898

18 January 1990
Very preliminary version

1 SAMPLE MARKUP POSSIBILITIES FOR THE ENGLISH WORD 'UNPACKED'

Assuming a fully specified lexicon and (word-formation) grammar, here are schematic markups for the three interpretations of the English word 'unpacked'. They assume that none of these interpretations is entered in the lexicon, but that there are entries for the following: 'pack', 'unpack', 'un' (two different ones) and 'ed' (two different ones).

1. unpacked

The category tag (here ) also contains attributes identifying its argument structure and selectional restrictions. Rule un1r is the rule for forming 'negative adjectives' from adjectives. The rule has two parts: the prefix identified in the lexicon as un1l, and something which is itself the result of an analysis. The lexical item un1l is prefixed to an adjective, which in this case is composed by the rule ed2r; ed2r suffixes the form lexically identified as ed2l to the lexical entry identified as pack3l. This entry may in fact be a subentry under the lemma 'pack', to be identified by a mechanism such as the one Gary Simons suggests in AIW12. No attributes are provided here, though they could be.

2.

Rule ed3r forms passive past participles from verb stems. I assume for purposes of illustration that this rule is distinct both from the rule that forms active past participles and from the rule that forms adjectives from verbs with the same morphology. I also assume for this illustration that the verb 'unpack' is listed directly in the lexicon and does not have to be formed by a morphological rule from the prefix un2l and the verb stem pack3l.

3. unpacked

I use parentheses to enclose list items, pending clarification as to the correct SGML syntax to use in this situation. The three preceding analyses could be provided together, with or without a ranking.
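As a rough illustration of the nesting in analysis 1, and of bundling alternative analyses with an optional ranking, here is a minimal Python sketch. It is not SGML and not a proposed encoding. The classes Lex, Analysis and Alternatives, the category labels, and the lexical ids unpack1l and ed1l are hypothetical: the notes name only un1l, un2l, pack3l, ed2l and the rules un1r, ed2r and ed3r. The default of equal ranking follows the convention stated in these notes for an omitted ranking.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lex:
    """Pointer to a lexicon entry (or subentry), e.g. 'pack3l'."""
    entry_id: str
    form: str


@dataclass(frozen=True)
class Analysis:
    """One application of a word-formation rule; parts are in surface order."""
    rule: str
    category: str  # hypothetical label standing in for the category tag
    parts: tuple   # each part is a Lex or a nested Analysis

    def surface(self) -> str:
        # Recompute the word form by concatenating the parts left to right.
        return "".join(
            p.form if isinstance(p, Lex) else p.surface() for p in self.parts
        )


@dataclass
class Alternatives:
    """Competing analyses of one word, optionally ranked (1 = most preferred)."""
    analyses: list
    ranking: Optional[list] = None

    def ranks(self) -> list:
        if self.ranking is None:
            # No ranking supplied: all alternatives count as equally ranked.
            return [1] * len(self.analyses)
        if len(self.ranking) != len(self.analyses):
            raise ValueError("one rank per analysis is required")
        return list(self.ranking)


# Analysis 1: un1r( un1l, ed2r( pack3l, ed2l ) ), a negative adjective.
neg_adj = Analysis("un1r", "ADJ", (
    Lex("un1l", "un"),
    Analysis("ed2r", "ADJ", (Lex("pack3l", "pack"), Lex("ed2l", "ed"))),
))

# Analysis 2: ed3r applied to the listed verb 'unpack'.
# HYPOTHETICAL ids: "unpack1l" and "ed1l" are not given in the notes.
passive = Analysis("ed3r", "V-PASS-PTCP", (
    Lex("unpack1l", "unpack"),
    Lex("ed1l", "ed"),
))

word = Alternatives([neg_adj, passive])            # unranked
ranked = Alternatives([neg_adj, passive], [2, 1])  # passive preferred
```

Both analyses yield the same surface string, which is exactly why the structure, not the string, has to carry the distinction. Storing only rules and ordered parts also means the surface form is derived rather than duplicated.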
Presumably, ids should be provided for each of the constituent analysis tags. If the ranking is omitted, the alternatives are assumed to be equally ranked.

4. unpacked is an embedded analysis-tag.

We now give a variant of analysis 1 in which we flatten the analysis; that is, we provide an analysis that identifies the rules and morphological elements that combine, but does not indicate the order of combination.

5. unpacked

If we now eliminate reference to rules, we have a representation that indicates just the morphological parts. These parts could in fact be identified directly as character strings, as in:

6. unpacked un pack ed
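The step from the nested analysis to the flattened variant (5), and then to bare parts and character strings (6), amounts to discarding information in stages. The Python sketch below makes that explicit under stated assumptions; the class and function names (Lex, Analysis, flatten, parts_only, as_strings) are hypothetical. Rules come back as an unordered set, so the order of combination is lost, while the morphological elements keep their left-to-right order in the word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lex:
    """Pointer to a lexicon entry, e.g. 'pack3l'."""
    entry_id: str
    form: str


@dataclass(frozen=True)
class Analysis:
    """One rule application; parts (Lex or Analysis) are in surface order."""
    rule: str
    parts: tuple


def morphs(node) -> list:
    """Leaf lexical items of an analysis, in left-to-right surface order."""
    if isinstance(node, Lex):
        return [node]
    out = []
    for part in node.parts:
        out.extend(morphs(part))
    return out


def rules(node) -> frozenset:
    """Every rule used, as an unordered set (order of combination dropped)."""
    if isinstance(node, Lex):
        return frozenset()
    found = {node.rule}
    for part in node.parts:
        found |= rules(part)
    return frozenset(found)


def flatten(node):
    """Flattened analysis: which rules and morphemes combine, not in what order."""
    return rules(node), [m.entry_id for m in morphs(node)]


def parts_only(node) -> list:
    """Drop the rules: just the lexical identifiers of the parts."""
    return [m.entry_id for m in morphs(node)]


def as_strings(node) -> list:
    """Identify the parts directly as character strings."""
    return [m.form for m in morphs(node)]


# Analysis 1 from the notes: un1r( un1l, ed2r( pack3l, ed2l ) ).
neg_adj = Analysis("un1r", (
    Lex("un1l", "un"),
    Analysis("ed2r", (Lex("pack3l", "pack"), Lex("ed2l", "ed"))),
))

print(as_strings(neg_adj))  # ['un', 'pack', 'ed']
```

Because flatten keeps only a set of rules, two analyses that use the same rules and parts but combine them in a different order receive the same flattened representation; that loss is what separates the flattened variant from analysis 1.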