26 Feature System Declaration

Part 5

Auxiliary Document Types

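The Feature System Declaration (FSD) is an auxiliary file used in conjunction with a TEI-conforming text that makes use of <fs> (that is, feature structure) elements. The FSD serves three purposes: it documents the feature names and feature values used in the markup and what each represents; it defines what counts as a valid feature structure of a given type; and it defines the intended interpretation of underspecified feature structures by supplying default values.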

As a component of the interchange standard for encoded text, the FSD serves an important function in documenting precisely what the encoder intended by the system of feature structure markup used in the encoded text. As application software is developed which makes use of encoded texts, the FSD will also become an important resource that will allow software to validate the feature structure markup in a text and to infer the full interpretation of underspecified feature structures.

This chapter begins by describing how the encoded text uses header information to make links to any associated FSDs. The second through fourth sections describe the overall structure of an FSD and give details of how to encode its parts. The final section offers a full example. [ see note 128 ]

26.1 Linking a TEI Text to Feature System Declarations

In order for application software to use feature system declarations to aid in the automatic interpretation of encoded texts, or even for human readers to find the appropriate declarations which document the feature system used in markup, there must be a formal link from the encoded texts to the declarations. As it turns out, the mechanism for linking texts to FSDs parallels the mechanism for linking texts to writing system declarations (WSDs).

The linkage is made in two places. First, in the DTD subset of the document type declaration at the beginning of the text file, an external entity is defined for each FSD that is associated with the encoded text. That entity declaration associates an entity name with the name of a file on the host system. It appends the SUBDOC keyword to tell the processor that the named file is a self-contained SGML document. See the example below for details of syntax.

The second place in which the linkage from text to FSDs is made is in the TEI header, as mentioned in section 5.3.7. Within the <encodingDesc> element, a special <fsdDecl> element may be used for each distinct feature structure type; its type attribute names that type, matching the type attribute of the <fs> elements in the text.

Note that one <fsdDecl> element must be specified for each distinct type of feature structure used in the markup. The fsd attribute supplies the name of the external entity containing the actual declaration for that type of feature structure.

There may be multiple <fsdDecl> elements for a given FSD, one for each type of feature structure it defines. For instance, in the following example, the file lex.fsd contains an FSD that contains definitions of feature structures for both lexical entries (<fs type=entry>) and lexical subentries (<fs type=subentry>).

The following example shows the markup for linking a TEI document to two WSDs and two FSDs. The linkage to both WSDs and FSDs is shown in order to illustrate the parallel nature of the linking mechanisms for both kinds of auxiliary files.

   <!DOCTYPE tei.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN"
                          "tei2.dtd" [
      <!-- In the DTD subset, we declare external
           entities for our FSDs and WSDs.        -->
      <!ENTITY wsd.english system 'en.wsd'   SUBDOC >
      <!ENTITY wsd.french  system 'fr.wsd'   SUBDOC >
      <!ENTITY fsd.gazdar  system 'gpsg.fsd' SUBDOC >
      <!ENTITY fsd.lexicon system 'lex.fsd'  SUBDOC >
   ]>
   <tei.2>

   <teiHeader>
      <!-- In the encoding description of the TEI header,
           we link the FSDs to the fs TYPE attribute;
           in the profile description,
           we link the WSDs to the global LANG attribute.
           -->
      <fileDesc> ... </fileDesc>
      <encodingDesc>
           <!-- ... -->
           <fsdDecl type=GPSG     fsd=fsd.gazdar>
           <fsdDecl type=entry    fsd=fsd.lexicon>
           <fsdDecl type=subentry fsd=fsd.lexicon>
           <!-- ... -->
      </encodingDesc>
      <profileDesc>
           <!-- ... -->
           <langUsage>
               <language id=EN wsd=wsd.english>English</>
               <language id=FR wsd=wsd.french >French</>
           </langUsage>
      </profileDesc>
   </teiHeader>
   <!-- The text goes here -->
   </TEI.2>
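
The two-step linkage shown above (an <fsdDecl> names an entity, and the entity declaration names a file) can be followed mechanically. The following Python sketch is illustrative only: it assumes the DTD subset and the header are available as plain strings, that attribute values are unquoted and appear in the order type, fsd, and that regular expressions are adequate for such well-behaved input. The function name resolve_fsd_files is hypothetical; a real application would use an SGML parser.

```python
import re

# Sample input copied from the example above (assumed to be available as strings).
DOCTYPE_SUBSET = """
<!ENTITY wsd.english system 'en.wsd'   SUBDOC >
<!ENTITY fsd.gazdar  system 'gpsg.fsd' SUBDOC >
<!ENTITY fsd.lexicon system 'lex.fsd'  SUBDOC >
"""

HEADER = """
<fsdDecl type=GPSG     fsd=fsd.gazdar>
<fsdDecl type=entry    fsd=fsd.lexicon>
<fsdDecl type=subentry fsd=fsd.lexicon>
"""

# Matches: <!ENTITY name system 'file' SUBDOC >
ENTITY_RE = re.compile(
    r"<!ENTITY\s+([\w.\-]+)\s+system\s+(['\"])(.*?)\2\s+SUBDOC\s*>",
    re.IGNORECASE,
)
# Matches: <fsdDecl type=X fsd=entity>  (unquoted values, fixed order: an assumption)
FSDDECL_RE = re.compile(
    r"<fsdDecl\s+type=([\w.\-]+)\s+fsd=([\w.\-]+)\s*>",
    re.IGNORECASE,
)


def resolve_fsd_files(dtd_subset, header):
    """Map each feature structure type to the file holding its FSD."""
    entities = {m.group(1): m.group(3) for m in ENTITY_RE.finditer(dtd_subset)}
    # A KeyError here means an fsdDecl names an entity that was never declared.
    return {m.group(1): entities[m.group(2)] for m in FSDDECL_RE.finditer(header)}


if __name__ == "__main__":
    for fs_type, filename in resolve_fsd_files(DOCTYPE_SUBSET, HEADER).items():
        print(fs_type, "->", filename)
```

Note that both entry and subentry resolve to the same file, which reflects the point made above that a single FSD may define several types of feature structure.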

The auxiliary tag set for feature system declarations is contained in the file teifsd2.dtd, which has the public identifier -//TEI P3//DTD Auxiliary Document Type: Feature System Declaration//EN and the overall structure shown below:

<!-- 26.1:  Feature System Declaration                        -->
<!-- Text Encoding Initiative: Guidelines for Electronic      -->
<!-- Text Encoding and Interchange. Document TEI P3, 1994.    -->

<!-- Copyright (c) 1994 ACH, ACL, ALLC. Permission to copy    -->
<!-- in any form is granted, provided this notice is          -->
<!-- included in all copies.                                  -->

<!-- These materials may not be altered; modifications to     -->
<!-- these DTDs should be performed as specified in the       -->
<!-- Guidelines in chapter "Modifying the TEI DTD."           -->

<!-- These materials subject to revision. Current versions    -->
<!-- are available from the Text Encoding Initiative.         -->
<!-- First, we declare basic parameter entities and embed     -->
<!-- auxiliary files.                                         -->

<!-- Embed entities for TEI generic identifiers.              -->

<!ENTITY % TEI.elementNames system 'teigis2.ent'                >
%TEI.elementNames;

<!-- Embed entities for TEI keywords.                         -->

<!ENTITY % TEI.keywords.ent system 'teikey2.ent'                >
%TEI.keywords.ent;

<!-- Define element classes for content models, shared        -->
<!-- attributes for element classes, and global attributes.   -->
<!-- (This all happens within the file teiclas2.ent.)         -->

<!ENTITY % TEI.elementClasses system 'teiclas2.ent'             >
%TEI.elementClasses;

<!-- Define element classes for feature structure             -->
<!-- declarations.                                            -->

<!ENTITY % x.boolean ''                                         >
<!ENTITY % m.boolean '%x.boolean any | none'                    >
<!ENTITY % x.binary ''                                          >
<!ENTITY % m.binary '%x.binary minus | plus'                    >
<!ENTITY % x.singleVal ''                                       >
<!ENTITY % m.singleVal '%x.singleVal %m.binary; | %m.boolean; | 
           dft | msr | nbr | rate | str | sym | uncertain'      >
<!ENTITY % x.complexVal ''                                      >
<!ENTITY % m.complexVal '%x.complexVal alt | fs | vAlt'         >
<!ENTITY % x.featureVal ''                                      >
<!ENTITY % m.featureVal '%x.featureVal %m.complexVal; | 
           %m.singleVal; | null'                                >

<!-- Now, we declare the elements for FSDs proper.            -->

<!-- ... declarations from section 26.2                       -->
<!--     (Feature System Declaration)                         -->
<!--     go here ...                                          -->
<!-- ... declarations from section 26.3                       -->
<!--     (Feature definitions)                                -->
<!--     go here ...                                          -->
<!-- ... declarations from section 26.4                       -->
<!--     (Feature structure constraints)                      -->
<!--     go here ...                                          -->

<!-- The elements for feature structures themselves are       -->
<!-- declared in teifs2.dtd                                   -->

<!ENTITY % TEI.fs.dtd system 'teifs2.dtd'                       >
%TEI.fs.dtd;

<!-- Finally, embed the TEI header and core tag sets.         -->

<!ENTITY % TEI.header.dtd system 'teihdr2.dtd'                  >
%TEI.header.dtd;
<!ENTITY % TEI.core.dtd system 'teicore2.dtd'                   >
%TEI.core.dtd;

26.2 The Overall Structure of a Feature System Declaration

A feature system declaration is encoded as a document of type <teiFsd2>. It has two parts: an obligatory header (which provides bibliographic information for the file) and a set of feature structure declarations (each of which defines one type of feature structure). Each feature structure declaration in turn has three parts: an optional description (which gives a prose comment on what that type of feature structure encodes), an obligatory set of feature declarations (which specify range constraints and default values for the features in that type of structure), and optional feature structure constraints (which specify co-occurrence restrictions on feature values). The header is encoded as a <teiHeader>, just as for any TEI.2 document; see chapter 5. The other components listed above are unique to feature system declarations. Thus, the following new elements are involved: <teiFsd2> (the feature system declaration as a whole), <fsDecl> (the declaration of one type of feature structure, named by its obligatory type attribute, with an optional baseType attribute), <fsDescr> (the prose description), <fDecl> (the declaration of a single feature), and <fsConstraints> (the feature structure constraints).

Feature declarations and feature structure constraints are described in the next two sections of this chapter. Note that the specification of similar <fsDecl> elements can be simplified by devising an inheritance hierarchy for the feature structure types. Each <fsDecl> may name a baseType from which it inherits feature declarations and constraints. For instance, suppose that <fsDecl type=Basic> contains <fDecl name=One> and <fDecl name=Two>, and that <fsDecl type=Derived baseType=Basic> contains just <fDecl name=Three>. Then any instance of <fs type=Derived> may include all three features. This is because <fsDecl type=Derived> inherits the two feature declarations from <fsDecl type=Basic> when it specifies a baseType of Basic.
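
The inheritance just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the FSD tag set: the dictionary layout and the function name inherited_features are assumptions made for the example. The sketch only collects feature names along the baseType chain; it takes no position on what should happen when a derived type redeclares an inherited feature, since that case is not discussed here.

```python
def inherited_features(fs_decls, fs_type):
    """Return the feature names available to fs_type, base types first.

    fs_decls maps a type name to a dict with two keys (a hypothetical layout):
      "baseType": the name of the base type, or None
      "features": the feature names declared directly on that type
    """
    chain = []
    seen = set()
    current = fs_type
    while current is not None:
        if current in seen:
            raise ValueError("cyclic baseType chain at " + current)
        seen.add(current)
        chain.append(current)
        current = fs_decls[current].get("baseType")

    names = []
    # Walk from the most basic type down to the requested one.
    for type_name in reversed(chain):
        for name in fs_decls[type_name]["features"]:
            if name not in names:
                names.append(name)
    return names


# The Basic/Derived example from the text.
FS_DECLS = {
    "Basic": {"baseType": None, "features": ["One", "Two"]},
    "Derived": {"baseType": "Basic", "features": ["Three"]},
}

if __name__ == "__main__":
    print(inherited_features(FS_DECLS, "Derived"))
```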

The following sample shows the overall structure of a complete FSD. Note that as a stand-alone document it begins with a DOCTYPE declaration which identifies the associated DTD.

  <!DOCTYPE teiFsd2 PUBLIC "-//TEI P3//DTD Auxiliary Document Type:
        Feature System Declaration//EN"
       "teifsd2.dtd" >
  <teiFsd2>
     <teiHeader>
        <!-- The header is as for any TEI.2 document -->
     </teiHeader>
     <fsDecl type=SomeName>
        <fsDescr>Describes what this type of fs represents</>
        <fDecl name=featureOne>
           <!-- The declaration for featureOne -->
        </fDecl>
        <fDecl name=featureTwo>
           <!-- The declaration for featureTwo -->
        </fDecl>
        <fsConstraints>
           <!-- The feature structure constraints go here -->
        </fsConstraints>
     </fsDecl>
     <fsDecl type=AnotherType>
           <!-- Declare another type of feature structure -->
     </fsDecl>
  </teiFsd2>

The formal definition of <teiFsd2> and feature structure declarations is as follows:

<!-- 26.2:  Feature System Declaration                        -->
<!ELEMENT teiFsd2       - -  (teiHeader, fsDecl+)               >
<!ATTLIST teiFsd2            %a.global;                         >
<!ELEMENT fsDecl        - -  (fsDescr?, fDecl+, fsConstraints?) 
                                                                >
<!ATTLIST fsDecl             %a.global;
          baseType           CDATA               #IMPLIED
          type               CDATA               #REQUIRED      >
<!ELEMENT fsDescr       - O  (%paraContent;)                    >
<!ATTLIST fsDescr            %a.global;                         >
<!-- This fragment is used in sec. 26.1                       -->

26.3 Feature Declarations

Each feature is declared in an <fDecl> element whose name attribute identifies the feature being declared; this matches the name attribute of the <f> elements it declares. An <fDecl> also has an org attribute which declares the organizing principle for the values of the <f> elements it declares. That is, the value may be a unit (a single value), a set (in which the order is not significant and there are no duplicates), a bag (in which the order is not significant but duplicates are allowed), or a list (in which the order is significant). (See the definition of the org attribute of <f> in section 16.6.)

An <fDecl> has three parts: an optional prose description (which should explain what the feature and its values represent), an obligatory range specification (which declares what values the feature is allowed to have), and an optional default specification (which declares what default value should be supplied when the named feature does not appear in an <fs>). A single unconditional default value may be specified, or multiple conditional values. If no default is specified, or if none of the conditions is met, then the default value is <none>; in other words, the feature is not applicable (see section 16.8 for a discussion of the <none> element).
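
The four organizing principles named by the org attribute differ only in whether order and duplicates matter when two values are compared. The following Python sketch illustrates that difference; the function name same_value and the representation of multi-valued features as Python lists are assumptions made for the example.

```python
from collections import Counter


def same_value(org, a, b):
    """Compare two feature values under the given organizing principle."""
    if org == "unit":
        return a == b                    # a single value
    if org == "set":
        return set(a) == set(b)          # order and duplicates ignored
    if org == "bag":
        return Counter(a) == Counter(b)  # order ignored, duplicates counted
    if org == "list":
        return list(a) == list(b)        # order significant
    raise ValueError("unknown org: " + org)


if __name__ == "__main__":
    print(same_value("set", ["a", "b"], ["b", "a"]))
    print(same_value("bag", ["a", "a", "b"], ["a", "b"]))
    print(same_value("list", ["a", "b"], ["b", "a"]))
```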

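The tags used in feature declarations are the following: <fDecl> (declares a single feature, with the name and org attributes described above), <fDescr> (the prose description of the feature), <vRange> (the range of allowed values), <vDefault> (the default value or values), <if> (a conditional default, pairing a condition with the value to supply when it is met), and <then> (an empty tag separating the condition from the value within <if>).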

The logic for validating feature values and for matching the conditions for supplying default values is based on the operation of subsumption. Subsumption is a standard operation in feature-structure-based formalisms. Informally, a feature structure fs subsumes all feature structures that are at least as informative as itself; that is, all feature structures that specify at least as many features as fs with values at least as informative as those given in fs (Pereira 1987:6; see also Shieber 1986:14-16). [ see note 129 ] A more formal definition requires that we first define the notion of ``domain of a feature structure.'' A feature structure can be viewed as a partial function that maps features onto values; when viewed in this way, the domain of a feature structure is the set of top-level features it contains (that is, excluding features in embedded feature structures). We can now offer a more precise definition: ``fs subsumes fs′ if both are identical primitive values, or if the domain of fs is a subset of the domain of fs′, and for every feature f in the domain of fs, the value of f in fs subsumes the value of f in fs′.''
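
The formal definition translates almost directly into a recursive function. The Python sketch below is an illustration under stated assumptions: feature structures are represented as dictionaries mapping feature names to values, and primitive values as strings. Neither representation is prescribed by these Guidelines.

```python
def subsumes(fs, fs_prime):
    """Return True if fs subsumes fs_prime under the definition above."""
    if isinstance(fs, dict) and isinstance(fs_prime, dict):
        # The domain of fs must be a subset of the domain of fs_prime, and
        # each value in fs must subsume the corresponding value in fs_prime.
        return all(
            feature in fs_prime and subsumes(value, fs_prime[feature])
            for feature, value in fs.items()
        )
    if isinstance(fs, dict) or isinstance(fs_prime, dict):
        return False  # a structure never subsumes a primitive, nor the reverse
    return fs == fs_prime  # identical primitive values


if __name__ == "__main__":
    general = {"AGR": {"PERS": "3"}}
    specific = {"AGR": {"PERS": "3", "NUM": "sg"}, "INV": "-"}
    print(subsumes(general, specific))
    print(subsumes(specific, general))
```

An empty feature structure subsumes every feature structure, since its domain is a subset of any other domain.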

Following the spirit of the informal definition above, we can extend subsumption in a straightforward way to cover alternation, negation, special primitive values, and the use of attributes in the SGML markup. For instance, a <vAlt> containing the value v subsumes v. The negation REL=ne of value v subsumes any value that is not v. The value <unknown> subsumes any value. The value <any> subsumes any value that is in the range of a feature. <fs type=X> </fs> subsumes any feature structure with TYPE=X. <nbr rel=ge value=0> subsumes any <nbr> with value greater than or equal to zero.
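
Three of these extensions (alternation, negation, and a numeric comparison) can be sketched as follows. The Python classes Val and VAlt and the function value_subsumes are hypothetical names introduced for the example; the sketch assumes that the target is always a fully specified value and treats eq as the relation used when none is given. The special values <any> and <unknown> and typed feature structures are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Val:
    kind: str          # "sym", "str" or "nbr"
    value: object
    rel: str = "eq"    # "eq" (assumed default), "ne" or "ge"


@dataclass(frozen=True)
class VAlt:
    options: Tuple[object, ...]


def value_subsumes(pattern, target):
    """Return True if pattern subsumes the fully specified value target."""
    if isinstance(pattern, VAlt):
        # An alternation subsumes whatever any one of its members subsumes.
        return any(value_subsumes(option, target) for option in pattern.options)
    if not isinstance(target, Val) or pattern.kind != target.kind:
        return False
    if pattern.rel == "eq":
        return target.value == pattern.value
    if pattern.rel == "ne":
        return target.value != pattern.value
    if pattern.rel == "ge":
        return target.value >= pattern.value
    raise ValueError("relation not handled in this sketch: " + pattern.rel)


if __name__ == "__main__":
    conj_range = VAlt((Val("sym", "and"), Val("sym", "or")))
    print(value_subsumes(conj_range, Val("sym", "and")))
    print(value_subsumes(Val("nbr", 0, rel="ge"), Val("nbr", 3)))
```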

As an example of feature declarations, consider the following extract from Generalized Phrase Structure Grammar by Gerald Gazdar, Ewan Klein, Geoffrey Pullum, and Ivan Sag (Harvard University Press, 1985). In the appendix to their book (pages 245-247), they propose a feature system for English of which this is just a sampling:

   feature    value range

   INV        {+, -}
   CONJ       {and, both, but, either, neither, nor, or, NIL}
   COMP       {for, that, whether, if, NIL}
   AGR        CAT
   PFORM      {to, by, for, ...}

   Feature specification defaults

   FSD 1:  [-INV]
   FSD 2:  ~[CONJ]
   FSD 9:  [INF, +SUBJ] --> [COMP for]

The INV feature, which encodes whether or not a sentence is inverted, allows only the values plus (+) and minus (-). If the feature is not specified, then the default rule (FSD 1 above) says that a value of minus is always assumed. The feature declaration for this feature would be encoded as follows:

        <fDecl name=INV>
           <fDescr>inverted sentence</fDescr>
           <vRange>
               <vAlt><plus><minus></vAlt></vRange>
           <vDefault>
               <minus></vDefault>
        </fDecl>

The value range is specified as an alternation (more precisely, an exclusive disjunction) of <plus> and <minus>. That is, the value must be one or the other, but not both or neither.

The CONJ feature indicates the surface form of the conjunction used in a construction. The ~ in the default rule (see FSD 2 above) represents negation. This means that by default the feature is not applicable, in other words, no conjunction is taking place. This corresponds to the simple value <none>; see section 16.8. Note that this is distinct from the NIL value allowed in the value range. In their analysis, NIL means that the phenomenon of conjunction is taking place but there is no explicit conjunction in the surface form of the sentence. The feature declaration for this feature would be encoded as follows:

<fDecl name=CONJ>
   <fDescr>surface form of the conjunction</fDescr>
   <vRange>
      <vAlt>
         <sym value=and>
         <sym value=both>
         <sym value=but>
         <sym value=either>
         <sym value=neither>
         <sym value=nor>
         <sym value=or>
         <sym value=NIL>
      </vAlt></vRange>
   <vDefault>
      <none></vDefault>
</fDecl>

Note that the <vDefault> is not strictly necessary in this case, since <none> is the value assumed in the absence of a default specification.

The COMP feature indicates the surface form of the complementizer used in a construction. In value range, it is analogous to CONJ. However, its default rule (see FSD 9 above) is conditional. It says that if the verb form is infinitival (the VFORM feature is not mentioned in the rule since it is the only feature that can take INF as a value), and the construction has a subject, then a `for' complementizer must be used. For instance, to make John the subject of the infinitive in `It is necessary to go,' a `for' complementizer must be used; that is, `It is necessary for John to go.' The feature declaration for this feature would be encoded as follows:

<fDecl name=COMP>
   <fDescr>surface form of the complementizer</fDescr>
   <vRange>
      <vAlt>
         <sym value=for>
         <sym value=that>
         <sym value=whether>
         <sym value=if>
         <sym value=NIL>
      </vAlt></vRange>
   <vDefault>
      <if><fs><f name=VFORM><sym value=INF></f>
              <f name=SUBJ><plus></f></fs>
      <then><sym value=for></if>
      </vDefault>
</fDecl>
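
The way a conditional default such as this one is resolved can be sketched in Python. The sketch is illustrative only: feature structures are represented as dictionaries, the function names are hypothetical, and the choice to return the value of the first matching condition is an assumption, since the order of evaluation among several <if> elements is not stated here. When no condition is met the sketch returns the string "none", standing for the <none> element.

```python
def subsumes(pattern, target):
    """Plain subsumption over dictionaries and primitive values."""
    if isinstance(pattern, dict) and isinstance(target, dict):
        return all(
            feature in target and subsumes(value, target[feature])
            for feature, value in pattern.items()
        )
    if isinstance(pattern, dict) or isinstance(target, dict):
        return False
    return pattern == target


def default_value(conditional_defaults, fs):
    """Pick the default for a feature that is missing from fs.

    conditional_defaults is a list of (condition, value) pairs, one per <if>.
    """
    for condition, value in conditional_defaults:
        if subsumes(condition, fs):
            return value   # first matching condition wins (an assumption)
    return "none"          # no default given, or no condition met


# The conditional default for COMP shown above.
COMP_DEFAULTS = [({"VFORM": "INF", "SUBJ": "+"}, "for")]

if __name__ == "__main__":
    print(default_value(COMP_DEFAULTS, {"VFORM": "INF", "SUBJ": "+"}))
    print(default_value(COMP_DEFAULTS, {"VFORM": "FIN", "SUBJ": "+"}))
```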

The AGR feature stores the features relevant to subject-verb agreement. Gazdar et al. specify the range of this feature as CAT. This means that the value is a category, which is their term for a feature structure. This is actually too weak a statement. Not just any feature structure is allowable here; it must be a feature structure for agreement (which is defined in the complete example at the end of the chapter to contain the features of person and number). The following feature declaration encodes this constraint on the value range:

<fDecl name=AGR>
   <fDescr>agreement for person and number</fDescr>
   <vRange><fs type=Agreement></fs></vRange>
</fDecl>

That is, the value must be a feature structure of type Agreement. The complete example at the end of this chapter includes the <fsDecl type=Agreement> which includes <fDecl name=PERS> and <fDecl name=NUM>.

The PFORM feature indicates the surface form of the preposition used in a construction. Since PFORM is specified above as an open set, <str> is used in the range specification below rather than <sym>.

<fDecl name=PFORM>
   <fDescr>word form of a preposition</fDescr>
   <vRange><str rel=ne></str></vRange>
</fDecl>

This example makes use of a negation. <str rel=ne> </str> subsumes any string that is not the empty string.

The formal definition for feature declarations follows. Note that the class featureVal includes all possible single feature values, including a <vAlt>.

<!-- 26.3:  Feature definitions                               -->
<!ELEMENT fDecl         - -  (fDescr?, vRange, vDefault?)       >
<!ATTLIST fDecl              %a.global;
          name               NMTOKEN             #REQUIRED
          org                (unit | set | bag | list) 
                                                 unit           >
<!ELEMENT fDescr        - O  (%paraContent;)                    >
<!ATTLIST fDescr             %a.global;                         >
<!ELEMENT vRange        - O  (%m.featureVal)                    >
<!ATTLIST vRange             %a.global;                         >
<!ELEMENT vDefault      - -  ((%m.featureVal)+ | if+)           >
<!ATTLIST vDefault           %a.global;                         >
<!ELEMENT if            - -  ((fs | f | fAlt), then, 
                             (%m.featureVal) )                  >
<!ATTLIST if                 %a.global;                         >
<!ELEMENT then          - O  EMPTY                              >
<!ATTLIST then               %a.global;                         >
<!-- This fragment is used in sec. 26.1                       -->

26.4 Feature Structure Constraints

Ensuring the validity of feature structures may require much more than simply specifying the range of allowed values for each feature. There may be constraints on the co-occurrence of one feature value with the value of another feature in the same feature structure or in an embedded feature structure.

Such constraints on valid feature structures are expressed as a series of conditional and biconditional tests in the <fsConstraints> part of an <fsDecl>. A particular feature structure is valid only if it meets all the constraints. The <cond> element encodes the conventional if-then conditional of boolean logic which succeeds when both the antecedent and consequent are true, or whenever the antecedent is false. The <bicond> element encodes the biconditional (if and only if) operation of boolean logic. It succeeds only when both antecedent and consequent are true, or both are false. In feature structure constraints the antecedent and consequent are expressed as feature structures; they are considered true if they subsume (see section 26.3) the target feature structure. The following elements make up the <fsConstraints> part of an FSD: <fsConstraints> (the container for the constraints), <cond> (a conditional constraint), <bicond> (a biconditional constraint), and the empty tags <then> and <iff>, which separate the antecedent from the consequent in <cond> and <bicond> respectively.

For an example of feature structure constraints, consider the following `feature co-occurrence restrictions' extracted from the feature system for English proposed by Gazdar, Klein, Pullum, and Sag (1985:246-247):

The first constraint says that if a construction is inverted, it must also have an auxiliary and a finite verb form. That is,

<cond><fs><f name=INV><plus></f></fs>
      <then>
      <fs>
         <f name=AUX><plus></f>
         <f name=VFORM><sym value=FIN></f>
      </fs>
</cond>

The second constraint says that if a construction has a BAR value of zero (i.e., it is a lexical, word-level category), then it must have a value for the features N, V, and SUBCAT. By the same token, because it is a biconditional, if it has values for N, V, and SUBCAT, it must have BAR=0. That is,

<bicond>
   <fs><f name=BAR><sym value=0></f></fs>
   <iff>
   <fs>
      <f name=N><any></f>
      <f name=V><any></f>
      <f name=SUBCAT><any></f>
   </fs>
</bicond>

The final constraint says that if a construction has a BAR value of 1 (i.e., it is a phrase), then the SUBCAT feature is irrelevant (~). This is not biconditional, since there are other instances under which the SUBCAT feature is irrelevant. That is,

<cond><fs><f name=BAR><sym value=1></f></fs>
      <then>
      <fs><f name=SUBCAT><none></f></fs>
</cond>
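
The three constraints above can be checked against a feature structure with a short Python sketch. It rests on several assumptions that are not part of the FSD tag set: feature structures are dictionaries, the strings "<any>" and "<none>" stand for the corresponding elements, <any> is taken to match any value other than <none> (the range check is omitted), and a feature that is absent from the target is treated as having the value <none>, as it would if no default applied.

```python
ANY = "<any>"
NONE = "<none>"


def subsumes(pattern, target):
    """Subsumption extended with the <any> and <none> conventions above."""
    if pattern == ANY:
        return target != NONE
    if isinstance(pattern, dict):
        if not isinstance(target, dict):
            return False
        # An absent feature is looked up as <none> (an assumption).
        return all(subsumes(v, target.get(f, NONE)) for f, v in pattern.items())
    return pattern == target


def cond(antecedent, consequent, fs):
    """Succeeds if the antecedent is false, or if both parts are true."""
    return (not subsumes(antecedent, fs)) or subsumes(consequent, fs)


def bicond(antecedent, consequent, fs):
    """Succeeds only if both parts are true or both are false."""
    return subsumes(antecedent, fs) == subsumes(consequent, fs)


def is_valid(fs, constraints):
    """A feature structure is valid only if it meets all the constraints."""
    return all(test(a, c, fs) for test, a, c in constraints)


CONSTRAINTS = [
    (cond, {"INV": "+"}, {"AUX": "+", "VFORM": "FIN"}),
    (bicond, {"BAR": "0"}, {"N": ANY, "V": ANY, "SUBCAT": ANY}),
    (cond, {"BAR": "1"}, {"SUBCAT": NONE}),
]

if __name__ == "__main__":
    print(is_valid({"INV": "+", "AUX": "+", "VFORM": "FIN"}, CONSTRAINTS))
    print(is_valid({"INV": "+", "VFORM": "FIN"}, CONSTRAINTS))
```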

The DTD fragment for feature structure constraints is as follows. Note that <cond> and <bicond> use the empty tags <then> and <iff>, respectively, to separate the antecedent and consequent. These are primarily for the sake of enhancing human readability.

<!-- 26.4:  Feature structure constraints                     -->
<!ELEMENT fsConstraints - -  (cond | bicond)*                   >
<!ATTLIST fsConstraints      %a.global;                         >
<!ELEMENT cond          - O  ((fs | f | fAlt), then, (fs | f | 
                             fAlt))                             >
<!ATTLIST cond               %a.global;                         >
<!ELEMENT bicond        - O  ((fs | f | fAlt), iff, (fs | f | 
                             fAlt))                             >
<!ATTLIST bicond             %a.global;                         >
<!ELEMENT iff           - O  EMPTY                              >
<!ATTLIST iff                %a.global;                         >
<!-- This fragment is used in sec. 26.1                       -->

26.5 A Complete Example

To summarize this chapter, the complete FSD for the example that has run through the chapter is reproduced below:

<!DOCTYPE teiFsd2 SYSTEM "teifsd2.dtd">
<teiFsd2>
<teiHeader>
<fileDesc>
<titleStmt>
   <title>A sample FSD based on an extract from Gazdar
          et al.'s GPSG feature system for English</title>
   <resp>
      <role>encoded by</role>
      <name>Gary F. Simons</name>
   </resp>
</titleStmt>
<publicationStmt>
<p>This sample was first encoded by Gary F. Simons (Summer
Institute of Linguistics, Dallas, TX) on January 28, 1991.
Revised April 8, 1993 to match the specification of FSDs
in version P2 of the TEI Guidelines.</p>
</publicationStmt>
<sourceDesc><p>
This sample FSD does not describe a complete feature
system.  It is based on extracts from the feature system
for English presented in the appendix (pages 245-247) of
Generalized Phrase Structure Grammar, by Gazdar, Klein,
Pullum, and Sag (Harvard University Press, 1985).
</sourceDesc>
</fileDesc>
</teiHeader>
<!-- ************************************************** -->
<fsDecl type=GPSG>
   <fsDescr>Encodes a feature structure for the GPSG analysis
     of English (after Gazdar, Klein, Pullum, and Sag)</fsDescr>
   <fDecl name=INV>
      <fDescr>inverted sentence</fDescr>
      <vRange>
          <vAlt><plus><minus></vAlt></vRange>
      <vDefault>
          <minus></vDefault>
   </fDecl>
   <fDecl name=CONJ>
      <fDescr>surface form of the conjunction</fDescr>
      <vRange>
          <vAlt>
              <sym value=and><sym value=both>
              <sym value=but><sym value=either>
              <sym value=neither><sym value=nor>
              <sym value=or><sym value=NIL>
          </vAlt></vRange>
      <vDefault>
          <none></vDefault>
      </fDecl>
   <fDecl name=COMP>
      <fDescr>surface form of the complementizer</fDescr>
      <vRange>
          <vAlt>
              <sym value=for><sym value=that><sym value=whether>
              <sym value=if><sym value=NIL>
          </vAlt></vRange>
      <vDefault>
          <if><fs><f name=VFORM><sym value=INF></f>
                  <f name=SUBJ><plus></f></fs>
           <then><sym value=for></if>
          </vDefault>
      </fDecl>
   <fDecl name=AGR>
      <fDescr>agreement for person and number</fDescr>
      <vRange><fs type=Agreement></fs></vRange>
      </fDecl>
   <fDecl name=PFORM>
      <fDescr>word form of a preposition</fDescr>
      <vRange><str rel=ne></str></vRange>
      </fDecl>
   <!-- The complete analysis includes additional features -->
   <fsConstraints>
      <cond><fs><f name=INV><plus></f></fs>
         <then>
            <fs>
               <f name=AUX><plus></f>
               <f name=VFORM><sym value=FIN></f>
            </fs></cond>
      <bicond><fs><f name=BAR><sym value=0></f></fs>
         <iff>
            <fs>
               <f name=N><any></f>
               <f name=V><any></f>
               <f name=SUBCAT><any></f>
            </fs></bicond>
      <cond><fs><f name=BAR><sym value=1></f></fs>
         <then>
            <fs><f name=SUBCAT><none></f></fs>
            </cond>
   </fsConstraints>
</fsDecl>
<!-- ************************************************** -->
<fsDecl type=Agreement>
   <fsDescr>This type of feature structure encodes the features
      for subject-verb agreement in English</fsDescr>
   <fDecl name=PERS>
      <fDescr>person (first, second, or third)</fDescr>
      <vRange><vAlt><sym value=1><sym value=2><sym value=3>
         </vAlt></vRange>
   </fDecl>
   <fDecl name=NUM>
      <fDescr>number (singular or plural)</fDescr>
      <vRange><vAlt><sym value=sg><sym value=pl>
         </vAlt></vRange>
   </fDecl>
</fsDecl>
</teiFsd2>

