20 Names and Dates

Part 4

Additional Tag Sets

This chapter describes an additional tag set which may be used for the encoding of proper names and other phrases descriptive of persons, places, organizations, and also of dates and times, in a manner more detailed than that possible using the elements already provided for these purposes in the core tag set described in chapter 6.

In section 6.4 it was noted that the elements provided in the core allow the encoder to specify that a given text segment is a proper noun, or a referring string, and to specify the kind of object named or referred to only by supplying a value for the type attribute. The elements provided by the present tag set allow the encoder both to supply a detailed sub-structure for such referring strings, and also to distinguish explicitly between names of persons, places or organizations.

Similarly, the elements provided here allow the encoder to supply a detailed analysis of the component parts of any expression which denotes a date or time, which is not possible using the elements described in section 6.4.4.

It should be noted, however, that no provision is made by the present tag set for the representation of the abstract structures, or ``virtual objects'', to which names or dates may be said to refer. In simple terms, where the core tag set allows one to represent a name, this additional tag set allows one to represent a personal name, but neither provides for the direct representation of a person. Appropriate mechanisms for the encoding of such interpretative gestures may be found in chapters 15 and 16.

To enable the additional tag set described in the present chapter, a parameter entity TEI.names.dates must be declared in the document type subset with the value INCLUDE, as further described in section 3.3. A document using the prose base tag set and this additional tag set will thus begin as follows:

 <!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN"
                        "tei2.dtd" [
      <!ENTITY % TEI.prose 'INCLUDE' >
      <!ENTITY % TEI.names.dates 'INCLUDE' >
 ]>
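
The prolog above follows a fixed pattern: one parameter entity declaration per tag set to be enabled. As a minimal sketch only, the Python helper below assembles such a prolog from a list of entity names. The function name tei_doctype is hypothetical and is not part of the TEI scheme; the public identifier and system identifier are copied from the example above.

```python
def tei_doctype(tag_sets):
    """Build a DOCTYPE declaration that enables the given tag sets.

    `tag_sets` is a list of parameter entity names such as 'TEI.prose'.
    This helper is illustrative only; it is not defined by the TEI.
    """
    lines = [
        '<!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN"',
        '                        "tei2.dtd" [',
    ]
    for name in tag_sets:
        # Each enabled tag set is declared with the value INCLUDE.
        lines.append(f"      <!ENTITY % {name} 'INCLUDE' >")
    lines.append("]>")
    return "\n".join(lines)


print(tei_doctype(["TEI.prose", "TEI.names.dates"]))
```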

The chapter begins by discussing additional tags for the encoding of component parts of personal names (section 20.1), place names (section 20.2) and organizational names (section 20.3). Detailed encoding of dates and times is described in section 20.4.

The additional tag set for names and dates, included in the file teind2.dtd, has the following overall structure:

<!-- 20:  Additional tags for names and dates                 -->
<!-- ... declarations from section 20.1                       -->
<!--     (Personal names)                                     -->
<!--     go here ...                                          -->
<!-- ... declarations from section 20.2.3                     -->
<!--     (Names for places)                                   -->
<!--     go here ...                                          -->
<!-- ... declarations from section 20.3                       -->
<!--     (Organization names)                                 -->
<!--     go here ...                                          -->
<!-- ... declarations from section 20.4.2                     -->
<!--     (Date components)                                    -->
<!--     go here ...                                          -->
<!-- <dtdref dtdfrag=dndmeas> -->

When this tag set is enabled, three additional element classes called personPart, placePart, and temporalExpr are declared. The parameter entities corresponding to these classes are declared in the file teind2.ent, as follows:

<!-- 20:  Additional classes for names and dates              -->
<!ENTITY % x.personPart ''                                      >
<!ENTITY % m.personPart '%x.personPart addName | forename | 
           genName | nameLink | roleName | surname'             >
<!ENTITY % x.placePart ''                                       >
<!ENTITY % m.placePart '%x.placePart bloc | country | distance 
           | geog | offset | region | settlement'               >
<!ENTITY % x.temporalExpr ''                                    >
<!ENTITY % m.temporalExpr '%x.temporalExpr dateStruct | day | 
           distance | hour | minute | month | occasion | offset 
           | second | timeStruct | week | year'                 >
<!ENTITY % a.names '
          reg                CDATA               #IMPLIED
          key                CDATA               #IMPLIED'      >
<!ENTITY % a.personPart '    %a.names;
          sort               NUMBER              #IMPLIED
          type               CDATA               #IMPLIED
          full               (yes | abb | init)  yes'           >
<!ENTITY % a.placePart '     %a.names;
          type               CDATA               #IMPLIED
          full               (yes | abb | init)  yes'           >
<!ENTITY % a.temporalExpr '
          reg                CDATA               #IMPLIED
          value              CDATA               #IMPLIED
          type               CDATA               #IMPLIED
          full               (yes | abb | init)  yes'           >

20.1 Personal Names

The core <rs> and <name> elements can distinguish names in a text but are insufficiently powerful to mark their internal components or structure. To conduct nominal record linkage or even to create an alphabetically sorted list of personal names, it is important to distinguish between a family name, a forename and an honorary title. Similarly, when confronted with a referring string such as ``John, by the grace of God, king of England, lord of Ireland, duke of Normandy and Aquitaine, and count of Anjou'', the analyst will often wish to distinguish among components giving some hint as to the status, occupation or residence of the person to whom the name belongs. The following elements are provided for these and related purposes: <persName>, <surname>, <forename>, <roleName>, <addName>, <nameLink> and <genName>.

As members of the names class, all of these elements share the following attributes: key and reg.

Additionally, all of the above elements except for <persName> are members of the class personPart, and thus share the following attributes: sort, type and full.

The <persName> element may be used in preference to the general <name> element irrespective of whether or not the components of the personal name are also to be marked. Its key and reg attributes are used in exactly the same way as those on the <rs> and <name> elements (see section 6.4). The tag <persName> is synonymous with the tag <name type=person>, except that its type attribute allows for further subcategorization of the personal name, for example as a ``married'', ``maiden'', ``pen'', ``pseudo'' or ``religious'' name. Consequently the following examples are equivalent:

 

That silly man <rs key=DPB1 type=person reg='Brown, David
Paul'>David Paul Brown</rs> has suffered the furniture of
his office to be seized the third time for rent.

That silly man
<rs key=DPB1 type=person reg='Brown, David Paul'>
<name>David Paul Brown</name>
</rs> has suffered the furniture of
his office to be seized the third time for rent.

That silly man
<name key=DPB1 type=person reg='Brown, David Paul'>
David Paul Brown</name> has suffered ...

That silly man
<persName key=DPB1 type=person reg='Brown, David Paul'>
David Paul Brown</persName> has suffered ...
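
For processing purposes, the equivalence just described means that software looking for personal names has to accept all three forms. The following Python sketch shows one way to test for this. It assumes the text has been converted to XML syntax with quoted attribute values so that a standard XML parser can read it; the function name names_person is hypothetical.

```python
import xml.etree.ElementTree as ET


def names_person(element):
    """Return True if the element marks a personal name or reference."""
    if element.tag == "persName":
        return True
    # <rs> and <name> only identify a person through their type attribute.
    return element.tag in ("rs", "name") and element.get("type") == "person"


# XML renderings of the encodings shown above (an assumption of this sketch).
samples = [
    '<rs key="DPB1" type="person">David Paul Brown</rs>',
    '<name key="DPB1" type="person">David Paul Brown</name>',
    '<persName key="DPB1">David Paul Brown</persName>',
]
print([names_person(ET.fromstring(s)) for s in samples])  # [True, True, True]
```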

The <persName> element is more powerful than the <rs> and <name> elements because distinctive name components occurring within it can be marked as such.

Many cultures distinguish between a family or inherited surname and additional personal names, often known as given names. These should be tagged using the <surname> and <forename> elements respectively and may occur in any order:

<persName key=FDR1>
  <surname>Roosevelt</surname>,
  <forename>Franklin</forename>
  <forename>Delano</forename>
</persName>

<persName key=FDR1>
  <forename>Franklin</forename>
  <forename>Delano</forename>
  <surname>Roosevelt</surname>
</persName>

The type attribute may be used with both <forename> and <surname> elements to provide further culture- or project-specific detail about the name component, for example:

<persName key=FDR1>
 <forename type='first'>Franklin</forename>
 <forename type='middle'>Delano</forename>
 <surname type='last'>Roosevelt</surname>
</persName>

<persName key=MRT1>
 <forename type='Christian'>Margaret</forename>
 <forename type='unused'>Hilda</forename>
 <surname type='maiden'>Roberts</surname>
 <surname type='married'>Thatcher</surname>
</persName>

<persName key=MUAL1 type=religious>
 <forename>Muhammad</forename>
 <surname>Ali</surname>
</persName>

In the following two examples the type attribute of the <surname> element is used to indicate so-called double-barrelled or hyphenated surnames:

<persName key=KHS1>
  <forename>Kara</forename>
  <surname type=combine>Hattersley-Smith</surname>
</persName>

<persName key=NSJS1>
   <forename>Norman</forename>
   <surname type=combine>St John Stevas</surname>
</persName>

In most cases, patronymics should be treated as forenames, thus:

 ... but it remained for
 <persName>
   <forename>Snorri</>
   <forename>Sturluson</>
 </persName>
 to combine the two traditions in cyclic form.

When a patronymic is used as a surname, however (e.g. by an individual who otherwise would have no surname, but lives in a culture which requires surnames), it may be tagged as such:

Even <persName><forename>Finnur</> <surname>Jonsson</>
</persName> acknowledged the artificiality of the
procedure:  <q>As <title>Njála</> now begins,
no original saga ever began.</q>

In the following example, the type attribute is used to distinguish a patronymic from other forenames:

<persName key=pn9>
  <forename sort=2>Sergei</forename>
  <forename sort=3 type='patronym'>Mikhailovic</forename>
  <surname sort=1>Uspensky</surname>
</persName>

This example also demonstrates the use of the sort attribute common to all members of the personPart class; its effect is to state the sequence in which <forename> and <surname> elements should be combined when constructing a sort key for the name.
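
A minimal sketch of how an application might use the sort attribute follows. It assumes an XML rendering of the example above (attribute values quoted) and a simple convention of joining components with single spaces; neither the function name sort_key nor the joining convention is prescribed by this tag set.

```python
import xml.etree.ElementTree as ET

# XML rendering of the example above; quoting the attributes is an assumption.
SAMPLE = """<persName key="pn9">
  <forename sort="2">Sergei</forename>
  <forename sort="3" type="patronym">Mikhailovic</forename>
  <surname sort="1">Uspensky</surname>
</persName>"""


def sort_key(pers_name_xml):
    """Combine name components in the order given by their sort attributes."""
    root = ET.fromstring(pers_name_xml)
    parts = [
        (int(el.get("sort")), "".join(el.itertext()).strip())
        for el in root
        if el.get("sort") is not None  # components without sort are skipped
    ]
    parts.sort(key=lambda part: part[0])
    return " ".join(text for _, text in parts)


print(sort_key(SAMPLE))  # Uspensky Sergei Mikhailovic
```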

Some names include generational or dynastic information, such as ``Junior'' or ``Senior'', or a number: the <genName> element may be used to distinguish these from other parts of the name, as in the following examples:

<persName key=HEMA1>
  <surname>Marques</surname>
  <genName>Junior</genName>,
  <forename>Henrique</forename>
</persName>

<persName>
  <forename>Charles</forename>
  <genName>II</genName>
</persName>

It is also often convenient to distinguish phrases (historically similar to the generational labels mentioned above) used to link parts of a name together, such as ``von'', ``of'', ``de'' etc. It is often a matter of arbitrary choice whether such components are regarded as part of the surname; the <nameLink> element is provided as a means of making clear what the correct usage should be in a given case, as in the following examples:

<persName key=DUDO1>
 <roleName type=honorific full=abb>Mme</roleName>
 <nameLink>de la</nameLink>
 <surname>Rochefoucault</surname>
</persName>

<persName>
  <forename>Walter</forename>
  <surname>de la Mare</surname>
</persName>

Finally, the <addName> and <roleName> elements are used to mark all name components other than those already listed. The distinction between them is that a <roleName> encloses an associated name component such as an aristocratic or official title which exists in some sense independently of its bearer. The distinction is not always a clear one. As elsewhere, the type attribute may be used with either element to supply culture- or application-specific distinctions. Some typical values for this attribute for names in the Western European tradition follow:

Here are some further examples of the usage of these elements:

<persName key=PGK1>
<roleName type=nobility>Princess</roleName>
<forename>Grace</forename>
</persName>

<persName key=GRMO1 type=pseudo>
<addName type=honorific>Grandma</addName>
<surname>Moses</surname>
</persName>

<persName key=MRSRO1>
<addName type=honorific>Mrs</addName>
<surname>Robinson</surname>
</persName>

<persName key=STAU1>
<roleName type=office>Saint</roleName>
<forename>Augustine</forename>
</persName>

<persName key=SLWICL1>
<roleName type=office>President</roleName>
<forename>Bill</forename>
<surname>Clinton</surname>
</persName>

<persName key=MOGA1>
<roleName type=military>Colonel</roleName>
<surname>Gaddafi</surname>
</persName>

<persName key=FRTG1>
<forename>Frederick</forename>
<addName type=epithet>the Great</addName>
</persName>

A name may have any combination of the above elements:

     <persName key=EGBR1>
     <roleName type=office>Governor</roleName>
     <forename sort=2>Edmund</forename>
     <forename sort=3 full=init reg='Gerald'>G</forename>.
     <addName type=nick>Jerry</addName>
     <addName type=epithet>Moonbeam</addName>
     <surname sort=1>Brown</surname>
     <genName full=abb>Jr</genName>.
     </persName>
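
The reg and full attributes in this example lend themselves to simple processing. The Python sketch below lists each component with its regularized text, preferring the reg attribute where the encoder supplied one. It assumes an XML rendering of the example (quoted attribute values) and applies the default value yes for full, as declared in the a.personPart entity shown earlier; the function name components is hypothetical.

```python
import xml.etree.ElementTree as ET

# XML rendering of the example above; quoting the attributes is an assumption.
SAMPLE = """<persName key="EGBR1">
  <roleName type="office">Governor</roleName>
  <forename sort="2">Edmund</forename>
  <forename sort="3" full="init" reg="Gerald">G</forename>.
  <addName type="nick">Jerry</addName>
  <addName type="epithet">Moonbeam</addName>
  <surname sort="1">Brown</surname>
  <genName full="abb">Jr</genName>.
</persName>"""


def components(pers_name_xml):
    """List (element name, regularized text, full) for each name component."""
    root = ET.fromstring(pers_name_xml)
    rows = []
    for el in root:
        written = "".join(el.itertext()).strip()
        # Prefer the regularized form when a reg attribute is present.
        text = el.get("reg", written)
        # 'yes' is the declared default for the full attribute.
        rows.append((el.tag, text, el.get("full", "yes")))
    return rows


for row in components(SAMPLE):
    print(row)
```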

Although highly flexible, these mechanisms for marking personal name components will not cater for every personal name and processing need. Where the internal structure of personal names is highly complex or where name components are particularly ambiguous, feature structures are recommended as the most appropriate mechanism to mark and analyze them, as further discussed in chapter 16.

The elements discussed in this section are formally defined as follows:

<!-- 20.1:  Personal names                                    -->
<!ELEMENT persName      - -  (#PCDATA | %m.personPart; | 
                             %m.phrase; )*                      >
<!ATTLIST persName           %a.global;
                             %a.names;
          type               CDATA               #IMPLIED       >
<!ELEMENT surname       - -  (%phrase.seq;)                     >
<!ATTLIST surname            %a.global;
                             %a.personPart;                     >
<!ELEMENT forename      - -  (%phrase.seq;)                     >
<!ATTLIST forename           %a.global;
                             %a.personPart;                     >
<!ELEMENT genName       - -  (%phrase.seq;)                     >
<!ATTLIST genName            %a.global;
                             %a.personPart;                     >
<!ELEMENT nameLink      - -  (%phrase.seq;)                     >
<!ATTLIST nameLink           %a.global;
                             %a.personPart;                     >
<!ELEMENT addName       - -  (%phrase.seq;)                     >
<!ATTLIST addName            %a.global;
                             %a.personPart;                     >
<!ELEMENT roleName      - -  (%phrase.seq)                      >
<!ATTLIST roleName           %a.global;
                             %a.personPart;                     >
<!-- This fragment is used in sec. 20                         -->

20.2 Place Names

Like other proper nouns or noun phrases used as names, place names can simply be marked up with the <rs> element, or with the <name> element. For cartographers and historical geographers, however, the component parts of a place name provide important information about the relation between the name and some spot in space and time. They also provide important evidence in historical linguistics. For such applications and others in which the internal structure of a place name is to be encoded, the <placeName> element and its subcomponents should be used.

As members of the names class, all these elements share the following attributes: key and reg.

Additionally, all of the above elements are members of the class placePart, and thus share the following attributes: type and full.

Like the <persName> element discussed in section 20.1, the <placeName> element may be regarded simply as an abbreviation for the tags <name type=place> or <rs type=place>. The following encodings are thus equivalent: [see note 113]

After spending some time in our
<rs type=place key=NY1>modern
  <name type=place key=BA1>Babylon</name>
</rs>,
<name type=place key=NY1>New York</name>,
I have proceeded to the
<rs type=place key=PH1>City of Brotherly Love</rs>.

After spending some time in our
<placeName key=NY1>modern
  <placeName key=BA1>Babylon</placeName>
</placeName>,
<placeName key=NY1>New York</placeName>,
I have proceeded to the
<placeName key=PH1>City of Brotherly Love</placeName>.

As indicated above, the <placeName> element may simply contain a character string and its type attribute may be used to provide a sub-categorization of place names. Alternatively, it may contain more detailed subcomponents. A place name may be analysed in several different ways: as a geo-political unit, using a hierarchy of descriptive names (see section 20.2.1); in terms of geographic features such as mountains and rivers (see section 20.2.2); or relative to other place names (see section 20.2.3).

20.2.1 Geo-political Place Names

A place name is sometimes given as a sequence of geo-political or administrative units, often arranged in ascending sequence according to their size or administrative importance, for example: ``Rochester, New York'', or as a single such unit, for example ``Belgium''. The more detailed component elements listed above (<settlement> for a settlement, such as a village, town or city; <region> for any administrative unit such as a county, parish or state; <country> for a politically recognized national entity; or <bloc> for any grouping of such entities) have been chosen for their generality of application. They may be tailored more closely to project- and culture-specific needs by specifying appropriate values in their respective type attributes, as in the following example:

<placeName key=RNY1>
  <settlement type=city>Rochester</settlement>,
  <region type=state>New York</region>
</placeName>

<placeName key=LSEA1>
  <country type=nation>Laos</country>,
  <bloc type=sub-continent>Southeast Asia</bloc>
</placeName>

Note that, even in the case where only one of these component place name elements is used, the <placeName> element must still be present.

I'd rather be in
<placeName>
<settlement key=RNY1 type=city>Rochester</settlement>
</placeName>
than any other place I know.

20.2.2 Geographic Names

Places may also be named in terms of geographic features such as mountains, lakes or rivers, independently of geo-political units. The <geogName> element is provided to mark up such names, as an alternative to the <placeName> element discussed above. It contains a sequence of phrase level elements, optionally extended by the following special element: <geog>.

For example:

  <geogName key=MIRI1 type=river>
    Mississippi River
  </geogName>

Where the <geog> element is used to characterize the kind of geographic feature being named, the <name> element will generally also be used to mark the associated proper noun or noun phrase:

  <geogName key=MIRI1 type=river>
    <name>Mississippi</name>
    <geog>River</geog>
  </geogName>

A more complex example, showing a variety of practices, follows:

The isolated ridge separates two great corridors which run from
<name key=GLCO1 type=place>Glencoe</name> into
<geogName key=GLET1 type=glen>
  <geog reg='glen'>Glen</geog>
  <name>Etive</name>
</geogName>, the
<geogName key=LAGA1 type=hill>
  <geog lang='gaelic' reg='sloping hill face'>Lairig</geog>
  <name>Gartain</name>
</geogName> and the
<geogName key=LAEI1 type=hill>
  <geog lang='gaelic' reg='sloping hill face'>Lairig</geog>
  <name>Eilde</name>
</geogName>

20.2.3 Relative Place Names

All the place name specifications so far discussed are absolute, in the sense that they define only one place. A place may however be specified in terms of its relationship to another place, for example ``10 miles northeast of Paris'' or ``near the top of Mount Sinai''. These relative place names will contain a place name which acts as a referent (e.g. ``Paris'' and ``Mount Sinai''). They will also contain a word or phrase indicating the position of the place being named in relation to the referent (e.g. ``the top of'', ``northeast of''). A distance, possibly only vaguely specified, between the referent place and the place being indicated may also be present (e.g. ``10 miles'', ``near'').

Relative place names may be encoded using the <offset> and <distance> elements in combination with either a <placeName> or a <geogName> element.

Some examples of relative place names are:

<placeName key=NRPA1>
 <offset>near the top of
  </offset>
  <geogName>
     <geog>Mount</geog>
     <name>Sinai</name>
  </geogName>
</placeName>

<placeName key=NEPA1>
  <distance>10 miles</distance>
  <offset>north of</offset>
  <settlement type=city>Paris
  </settlement>
</placeName>
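
The three parts of a relative place name (distance, offset and referent) map naturally onto a small record. The Python sketch below extracts them from an XML rendering of the examples above. The class RelativePlace and the function parse_relative_place are hypothetical names, and treating every child other than <distance> and <offset> as part of the referent is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelativePlace:
    distance: Optional[str]  # e.g. '10 miles'; None if not given
    offset: Optional[str]    # e.g. 'north of'; None if not given
    referent: str            # the place name acting as referent


def _text(el):
    """Element text with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(el.itertext()).split())


def parse_relative_place(place_name_xml):
    root = ET.fromstring(place_name_xml)
    distance = offset = None
    referent = []
    for child in root:
        if child.tag == "distance":
            distance = _text(child)
        elif child.tag == "offset":
            offset = _text(child)
        else:
            # Anything else is assumed to belong to the referent name.
            referent.append(_text(child))
    return RelativePlace(distance, offset, " ".join(referent))


PARIS = """<placeName key="NEPA1">
  <distance>10 miles</distance>
  <offset>north of</offset>
  <settlement type="city">Paris
  </settlement>
</placeName>"""

print(parse_relative_place(PARIS))
```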

The internal structure of place names is, like that of personal names, complex and subject to an enormous amount of variation across time and different cultures. The recommendations in this section will be adequate for a majority of users and applications. They may not, however, satisfy the most specialized inquiries and applications, in which case it is recommended that the internal structure of place names be represented using feature structures (see chapter 16).

The elements discussed in this section are formally defined as follows:

<!-- 20.2.3:  Names for places                                -->
<!ELEMENT placeName     - -  ((#PCDATA | %m.placePart; | 
                             %m.phrase; )*)                     >
<!ATTLIST placeName          %a.global;
                             %a.names;                          >
<!ELEMENT settlement    - -  (%phrase.seq;)                     >
<!ATTLIST settlement         %a.global;
                             %a.placePart;                      >
<!ELEMENT region        - -  (%paraContent)                     >
<!ATTLIST region             %a.global;
                             %a.placePart;                      >
<!ELEMENT country       - o  (%paraContent)                     >
<!ATTLIST country            %a.global;
                             %a.placePart;                      >
<!ELEMENT bloc          - -  (%phrase.seq)                      >
<!ATTLIST bloc               %a.global;
                             %a.placePart;                      >
<!ELEMENT offset        - -  (#PCDATA)                          >
<!ATTLIST offset             %a.global;
          value              CDATA               #IMPLIED
                             %a.placePart;                      >
<!ELEMENT distance      - -  (%phrase.seq)                      >
<!ATTLIST distance           %a.global;
          key                CDATA               #IMPLIED
          value              CDATA               #IMPLIED
          type               CDATA               #IMPLIED
          full               (yes | abb | init)  yes
          reg                CDATA               #IMPLIED
          exact              (Y | N | U)         U              >
<!ELEMENT geogName      - -  (#PCDATA | geog | name )*          >
<!ATTLIST geogName           %a.global;
                             %a.names;
          type               CDATA               #IMPLIED       >
<!ELEMENT geog          - -  (#PCDATA)                          >
<!ATTLIST geog               %a.global;
                             %a.placePart;                      >
<!-- This fragment is used in sec. 20                         -->

20.3 Organization names

Like names of persons or places, organization names can be marked as referring strings or as proper names with the <rs> and <name> elements. For certain applications it may be desirable to mark the component parts of an organization name. In some historical and social scientific studies, for example, the component parts of an organization name may give crucial clues which help to characterize the organization in terms of its geographical location, ownership, likely number of employees, management structure etc. The elements discussed in this section are recommended for this purpose and include: <orgname>, <orgtitle>, <orgtype> and <orgdivn>.

The <orgname> element should be used when it is desirable to mark an organization name irrespective of whether or not its components are also to be marked. In effect the <orgname> element is a special case of a <name> and thus of an <rs> element. Consequently, the following examples are synonymous, though the last is preferred:

About a year back, a question of considerable interest was
agitated in the <rs type=org key=PAS1>Pennsyla. Abolition Society</rs>.

About a year back, a question of considerable interest was
agitated in the <rs type=org key=PAS1><name>Pennsyla. Abolition
Society</name></rs>.

About a year back, a question of considerable interest was
agitated in the <name type=org key=PAS1>Pennsyla. Abolition Society</name>.

About a year back, a question of considerable interest was
agitated in the <orgname key=PAS1 type=voluntary
reg="Pennsylvania Abolition Society">Pennsyla. Abolition Society</orgname>.

Like the <rs> and <name> elements, the <orgname> element has a key attribute with which an external identifier such as a database key can be assigned to the organization name. It also has a type attribute with which the organization named in the expression can be described, and a reg attribute with which the organization name can be presented in a regularized form.

The <orgtitle> element is used to mark the expression which provides the proper name component of an organization name, for example:

Mr Frost will be able to earn an extra fee from
<orgname type=media key=BSB1>
  <orgtitle type=acronym>BSkyB</orgtitle>
</orgname>
rather than the
<orgname type=media key=BBC1>
   <orgtitle type=acronym
             reg='British Broadcasting Corporation'>
   BBC</orgtitle>
</orgname>

Where personal names are encountered as component parts of an organization's title, as in ``Ernst & Young'', these may be tagged with the appropriate personal name elements as discussed in section 20.1. Examples include:

<orgname type='accountancy partnership' key=EY1>
<orgtitle>
  <persname><surname>Ernst</surname></persname> &
  <persname><surname>Young</surname></persname>
</orgtitle>
</orgname>

Organization names may also contain within them place names which, in some applications, may yield vital clues as to the organization's location and/or sphere of influence. These components should be tagged with the appropriate place name tags (see section 20.2). Examples include:

A spokesman from
<orgname type=computers key=IBM1>
  <orgtitle reg='International Business Machines'>IBM</orgtitle>
  <placeName><country reg='United Kingdom' key=UNKI1>UK
  </country></placeName>
</orgname> said...

The feeling in
<country type=nation key=CAN1>Canada</country>
is one of strong aversion to the
<orgname type=government key=USG1>
  United States  Government
</orgname>, and of predilection for self-government
under the
<orgname type=government
      reg='British monarchy'>English Crown</orgname>

The <orgtype> element is used to mark those components of an organization name which indicate something about the structure or function of the organization. Examples include:

<orgname type='utility company' key=WWPC1>
  <name type=state>Washington</name>
  <orgtype type=function>Water Power</orgtype>
  <orgtype reg='incorporated' type=structure>Inc.</orgtype>
</orgname>

THE TICKET which you will receive herewith has been formed by
the
<orgname type=political reg='Whig party' key=WHI1>
  <orgtitle>Democratic Whig</orgtitle>
  <orgtype type=function>Party</orgtype>
</orgname> after the most careful deliberation,
with a reference to all the great objects of NATIONAL, STATE,
COUNTY and CITY concern, and with a single eye to the
<emph>Welfare and Best Interests of the Community</emph>.

Organizational names may also be specified hierarchically, particularly where the named organization is itself a department or a branch of a larger organizational entity. ``The Department of Modern History, Glasgow University'' is an example. The <orgdivn> element is recommended wherever it is desirable to isolate the independent levels of an organizational hierarchy that are specified in an organization name. Examples include:

<orgname type=academic key=DMHGU1>
  <orgdivn type=department>
     Department of Modern History
  </orgdivn>,
  <name type=city>Glasgow</name>
  <orgtype type=function>University</orgtype>
</orgname>

Although highly flexible, the mechanisms discussed here for marking the components of organization names will not cater for every processing need or organizational name that is encountered. Where the internal structure of organization names is highly complex, where name components are particularly ambiguous, or where it is important to indicate the assumptions made in the evaluation of an organization name, then feature structure notation is recommended (see chapter 16).

The elements discussed in this section are formally defined as follows:

<!-- 20.3:  Organization names                                -->
<!ELEMENT orgName       - -  (orgTitle | orgType | orgDivn | 
                             %m.phrase; | #PCDATA)*             >
<!ATTLIST orgName            %a.global;
          reg                CDATA               #IMPLIED
          key                CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT orgTitle      - -  (%phrase.seq)                      >
<!ATTLIST orgTitle           %a.global;
          reg                CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT orgType       - -  (%phrase.seq)                      >
<!ATTLIST orgType            %a.global;
          reg                CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT orgDivn       - -  (%phrase.seq)                      >
<!ATTLIST orgDivn            %a.global;
          reg                CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 20                         -->

20.4 Dates and Time

The following elements for the encoding of dates and times were introduced in section 6.4.4 :

While adequate for many applications, these elements do not allow for the representation of the internal structure of expressions indicating dates or times, which may however be of importance for the correct interpretation of such expressions, or for certain kinds of analytic applications. In this section, we introduce the following special-purpose elements, for use when the internal structure of a temporal expression is to be encoded: <dateStruct> and <timeStruct>.

Two types of temporal expressions are envisaged for dates and times: absolute and relative. An absolute temporal expression is composed of a sequence of the following elements, possibly interspersed with character data: <day>, <month>, <week>, <year>, <occasion>, <hour>, <minute> and <second>.

A relative temporal expression describes a date or time with reference to some other (absolute) temporal expression, and thus contains the following elements in addition to those listed above: <distance> and <offset>.

As members of the class temporalExpr (temporal expression) these elements all share the following attributes: reg, value, type and full.

20.4.1 Absolute Dates and Times

An absolute temporal expression which is a date will contain only a sequence of <day>, <month>, <week>, <year> or <occasion> elements, as in the following examples:

The university's view of American affairs produced
a stinging attack by Edmund Burke in the Commons
debate of
<dateStruct value='26-10-1775'>
  <day value='26'>26</day>
  <month value='10'>October</month>
  <year value='1775'>1775</year>
</dateStruct>

Component elements of a <dateStruct> may be repeated, provided that only a single temporal expression is intended:

<dateStruct value='14-05-1993'>
  <day type=name>Friday</day>,
  <day type=number>14</day>
  <month>May</month>
  <year>1993</year>
</dateStruct>
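
Because repeated components must describe a single date, an application can cross-check them against the value attribute. The Python sketch below does this for the weekday name. It assumes that value follows the day-month-year pattern which the two examples above appear to use (the actual form is application-dependent), that day names are in English, and that the proleptic Gregorian calendar of Python's datetime module is appropriate; the function names are hypothetical.

```python
from datetime import date

# English day names indexed by date.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse_value(value):
    """Parse a value such as '14-05-1993', assumed to be day-month-year."""
    day, month, year = (int(part) for part in value.split("-"))
    return date(year, month, day)


def weekday_matches(value, day_name):
    """Check that a <day type=name> component agrees with the value."""
    return DAY_NAMES[parse_value(value).weekday()] == day_name


print(weekday_matches("14-05-1993", "Friday"))  # True
```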

The <occasion> element may be used for any component of a temporal expression which is given in terms of a named event, such as a public holiday for dates, or a named time such as ``tea time'' or ``matins'':

In New York,
  <dateStruct value='1-1'>
    <occasion type=holiday>New Years Day</occasion>
  </dateStruct> is the quietest of holidays,
  <dateStruct value='4 July'>
    <occasion type=holiday>Independence Day</occasion>
  </dateStruct>
the most turbulent.

These components may be applied to dates in any calendar system whose subcomponents are equivalent to those listed above:

<title>Le Vieux Cordelier:
Journal rédigé par Camille Desmoulins</title>,
<dateStruct type=Revolutionary value='03-02-1794'>
   <day type=name>Quintidi</day>
   <month>Pluviose</month>
   <week value=2>2e décade</week>,
   <year value=2>l'an 2 de la République Indivisible</year>
</dateStruct>
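
The value given on the <dateStruct> above can be checked by hand: Quintidi is the fifth day of a ten-day décade, so the fifth day of the second décade is the 15th of Pluviose, the fifth month of the Revolutionary year. The following Python sketch reproduces that arithmetic. It is illustrative only and forms no part of the tag set: the function name republican_to_gregorian is hypothetical, the month and day names are spelled without accents to match the example, and only the opening day of year 2 (22 September 1793) is supplied.

```python
from datetime import date, timedelta

# Twelve 30-day months of the Revolutionary calendar, in order.
MONTHS = ["vendemiaire", "brumaire", "frimaire", "nivose", "pluviose",
          "ventose", "germinal", "floreal", "prairial", "messidor",
          "thermidor", "fructidor"]

# The ten day names of a decade, in order.
DECADE_DAYS = ["primidi", "duodi", "tridi", "quartidi", "quintidi",
               "sextidi", "septidi", "octidi", "nonidi", "decadi"]

# Gregorian date of 1 Vendemiaire. Only year 2 is covered in this sketch;
# other years would need their own entries.
YEAR_START = {2: date(1793, 9, 22)}


def republican_to_gregorian(year, month_name, decade, day_name):
    """Convert a Revolutionary date given as year, month name,
    decade number within the month (1-3) and day name."""
    month_index = MONTHS.index(month_name.lower())
    day_of_month = (decade - 1) * 10 + DECADE_DAYS.index(day_name.lower()) + 1
    day_of_year = month_index * 30 + day_of_month
    return YEAR_START[year] + timedelta(days=day_of_year - 1)


d = republican_to_gregorian(2, "Pluviose", 2, "Quintidi")
print(f"{d.day:02d}-{d.month:02d}-{d.year}")  # prints 03-02-1794
```

The result agrees with the value attribute in the example, 03-02-1794.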

Absolute temporal expressions denoting times which are given in terms of seconds, minutes, hours or well-defined events (e.g. ``noon'', ``sunset'') may similarly be represented using the <timeStruct> element:

The train leaves for Boston at
  <timeStruct type='24hour' zone=EST>
    <hour>13</hour>:<minute>45</minute>
   </timeStruct>

At <timeStruct>
     <occasion>sunset</occasion>
   </timeStruct> we walked to the beach.

The train leaves for Boston at
  <timeStruct type='descriptive' zone=EST value='13:45'>
     a quarter of <hour reg='1400'>two</hour>
  </timeStruct>

The type attribute may be used to distinguish sub-types of component elements (for example, months or days presented as words or as numbers) or to provide additional information about the function of this particular component (for example, to distinguish types of <occasion> ). The value and reg attributes are both used to provide a standardized or regularized form of the content of an element. The distinction is that the value specified by the reg attribute is simply that chosen as a convenient way of grouping together a number of variant forms, whereas that specified for the value attribute must always be given in some application-dependent standard form, described in the <stdVals> element of the TEI header.

For example:

<dateStruct value='09-06-1807'>
  <month type=name value=06>June</month>
  <day type=number value=09>9th</day>
</dateStruct>:
The period is approaching which will terminate my
present copartnership. On the
<dateStruct value='01-01-1808'>
  <day type=number value=01>1st</day>
  <month type=name value=01 reg='January'>Jany.</month>
</dateStruct> next, it expires by its own limitation.
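
Most of the value attributes in the examples of this section follow a day-month-year pattern, sometimes without the year (as in '1-1'). Because the standard form is application-dependent, the following Python sketch shows only one way a processing application might read such values. The names DateValue and parse_value are hypothetical, and the day-month-year order is an assumption drawn from the examples rather than a requirement of the tag set.

```python
from typing import NamedTuple, Optional


class DateValue(NamedTuple):
    day: int
    month: int
    year: Optional[int]  # None when the value omits the year, as in '1-1'


def parse_value(value: str) -> DateValue:
    """Split a 'DD-MM' or 'DD-MM-YYYY' string into numeric components.

    The order of components is an assumption taken from the examples;
    a real application would follow whatever form its header declares.
    """
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 2:
        day, month = parts
        year = None
    elif len(parts) == 3:
        day, month, year = parts
    else:
        raise ValueError(f"unrecognized value form: {value!r}")
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError(f"day or month out of range: {value!r}")
    return DateValue(day, month, year)


print(parse_value("09-06-1807"))
print(parse_value("1-1"))
```

A value such as '4 July', which does not follow this pattern, is rejected with a ValueError. This illustrates why a single standard form should be chosen and documented.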

20.4.2 Relative Dates and Times

As noted above, relative dates and times such as ``in the Two Hundredth and First Year of the Republic'', ``twenty minutes before noon'', and, more ambiguously, ``after the lamented death of the Doctor'' or ``an hour after the game'' have two distinct components. As well as the absolute temporal expression or event to which reference is made (e.g. ``noon'', ``the game'', ``the death of the Doctor'', ``[the foundation of] the Republic''), they also contain a description of the `distance' between the time or date which is indicated and the referent expression (e.g. ``the Two Hundredth and First Year'', ``twenty minutes'', ``an hour''); and (optionally) an `offset' describing the direction of the distance between the time or date indicated and the referent expression (e.g. ``of'' implying after, ``before'', ``after'').

The elements <distance> (or <measure>) and <offset> are used to encode these last two components within a <dateStruct> or <timeStruct>. The absolute temporal expression contained within the relative expression may be encoded using an <occasion> element, or by a nested <dateStruct> or <timeStruct>, or by a simple <date> or <time>. This allows for endlessly recursive structures such as ``the third Sunday after the first Monday before Lammastide in the fifth year of the King's second marriage ...'' --- but so does natural language. In the following examples, the reg attribute has been used to simplify processing of variant forms of expression:

<dateStruct value='11-12-1786'>
     <distance reg='14 days'>A fortnight</distance>
     <offset>before</offset>
     <dateStruct>
       <occasion type='holiday'>Christmas</occasion>
       <year>1786</year>
     </dateStruct>
</dateStruct>

I reached the station
<timeStruct value='14:15'>
 <distance exact=N reg='30 minutes'>
   about a half hour</distance>
 <offset>after</offset>
 <occasion value='13:45'>
   the departure of the afternoon train to Boston</occasion>
</timeStruct>
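
The regularized distances in the two examples above make it possible to compute the value of the enclosing expression mechanically from the referent and the offset. The following Python sketch shows one way this might be done. The function names are hypothetical, the "<number> <unit>" form of the reg attribute is an assumption based on these examples only, and the calendar date attached to the train's departure is an arbitrary placeholder needed for the arithmetic.

```python
from datetime import date, datetime, timedelta

# Units this sketch understands in a regularized distance.
UNITS = {"days", "hours", "minutes"}


def parse_distance(reg: str) -> timedelta:
    """Turn a regularized distance such as '14 days' into a timedelta."""
    amount, unit = reg.split()
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return timedelta(**{unit: int(amount)})


def resolve(referent, reg_distance: str, offset: str):
    """Apply a distance to a referent date or datetime in the direction
    given by the offset ('before' or 'after')."""
    delta = parse_distance(reg_distance)
    if offset == "before":
        return referent - delta
    if offset == "after":
        return referent + delta
    raise ValueError(f"unsupported offset: {offset!r}")


def fmt_date(d) -> str:
    return f"{d.day:02d}-{d.month:02d}-{d.year}"


def fmt_time(t) -> str:
    return f"{t.hour:02d}:{t.minute:02d}"


# "A fortnight before Christmas 1786"
print(fmt_date(resolve(date(1786, 12, 25), "14 days", "before")))  # 11-12-1786

# "about a half hour after the departure of the afternoon train" (13:45);
# the calendar date here is a placeholder.
departure = datetime(2000, 1, 1, 13, 45)
print(fmt_time(resolve(departure, "30 minutes", "after")))  # 14:15
```

Both results agree with the value attributes given in the examples. Where the exact attribute marks a distance as imprecise, as in the second example, a computed value of this kind is of course only approximate.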

In the following example, the exact attribute has been used to indicate a lack of precision in the distance stated:

In practice, festival candles are lit
<timeStruct>
  <distance exact=N>just</distance>
  <offset>before</offset>
  <occasion reg='evening'>sundown</occasion>
</timeStruct>

In the following example, a nested <dateStruct> element is used to show that ``my birthday'' and the cited date are parts of the same temporal expression, and hence to disambiguate the phrase ``A week before my birthday on 9th December'':

<dateStruct value='02-12'>
  <distance>A week</distance>
  <offset>before</offset>
  <dateStruct value='09-12'>
    <occasion>my birthday</occasion>
    on <day>9th</day><month>December</month>
  </dateStruct>
</dateStruct>
The alternative reading of this phrase would be encoded as follows:
<dateStruct value='09-12'>
  <distance>A week</distance>
  <offset>before</offset>
  <occasion>my birthday</occasion>
  on <day>9th</day><month>December</month>
</dateStruct>
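
The difference between the two readings can be made concrete by working out the dates each one implies. The following Python sketch is illustrative only. Since the phrase names no year, an arbitrary placeholder year is assumed for the arithmetic; the helper dd_mm is hypothetical.

```python
from datetime import date, timedelta

YEAR = 2001  # arbitrary placeholder; the phrase itself names no year
WEEK = timedelta(days=7)
stated = date(YEAR, 12, 9)  # "9th December"


def dd_mm(d: date) -> str:
    """Format a date as day-month, in the style of the examples."""
    return f"{d.day:02d}-{d.month:02d}"


# Reading 1: the birthday falls on 9 December, and the whole
# expression denotes the day a week earlier.
reading1 = {"birthday": dd_mm(stated), "denoted": dd_mm(stated - WEEK)}

# Reading 2: the whole expression denotes 9 December, so the
# birthday falls a week later.
reading2 = {"birthday": dd_mm(stated + WEEK), "denoted": dd_mm(stated)}

print(reading1)  # {'birthday': '09-12', 'denoted': '02-12'}
print(reading2)  # {'birthday': '16-12', 'denoted': '09-12'}
```

The denoted dates correspond to the value attributes on the outer <dateStruct> in each encoding: 02-12 for the nested reading and 09-12 for the alternative.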

Where more complex or ambiguous expressions are involved, and where it is desirable to make more explicit the interpretive processes required, the feature structure notation described in chapter 16 is recommended. Consider, for example, the following temporal expression which occurs in the Scottish Temperance Review of August 1850, referring to the summer holiday known in Glasgow simply as ``the Fair'':

Not only is the city, <date ana=GF50>during the Fair</date>, a
horrible nucleus of immorality and wickedness; it sends our
multitudes to pollute and demoralize the country.

For the definition of the ana attribute, see chapter 15. It is used here to link the temporal phrase with an interpretation of it. Like most traditional fairs and market days, the Glasgow Fair was established by local custom and could vary from year to year. Consequently, in order to provide such an interpretation, it is necessary to draw upon additional information which may or may not be located in the particular text in question. In this case, it is necessary at least to know the spatial and temporal context (year and place) of the fair referred to. These and other features required for the analysis of this particular temporal expression may be combined as one feature structure of type date-analysis:

<fs type=date-analysis id=GF50>
<f name=event><str>the Fair</str></f>
<f name=place><str>Glasgow</str></f>
<f name=year><nbr value=1850></f>
<f name=from-value><str>08-08-1850</str></f>
<f name=to-value><str>19-09-1850</str></f>
</fs>
The elements described in this section are formally defined as follows:
<!-- 20.4.2:  Date components                                 -->
<!ELEMENT dateStruct    - -  (#PCDATA | %m.temporalExpr;)*      >
<!ATTLIST dateStruct         %a.global;
                             %a.temporalExpr;
          exact              CDATA               #IMPLIED
          calendar           CDATA               #IMPLIED       >
<!ELEMENT day           - -  (#PCDATA)                          >
<!ATTLIST day                %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT week          - -  (#PCDATA)                          >
<!ATTLIST week               %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT month         - -  (#PCDATA)                          >
<!ATTLIST month              %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT year          - -  (#PCDATA)                          >
<!ATTLIST year               %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT occasion      - -  (%phrase.seq;)                     >
<!ATTLIST occasion           %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT timeStruct    - -  ((#PCDATA | %m.temporalExpr;)*)    >
<!ATTLIST timeStruct         %a.global;
                             %a.temporalExpr;
          zone               CDATA               #IMPLIED       >
<!ELEMENT second        - -  (#PCDATA)                          >
<!ATTLIST second             %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT minute        - -  (#PCDATA)                          >
<!ATTLIST minute             %a.global;
                             %a.temporalExpr;                   >
<!ELEMENT hour          - -  (#PCDATA)                          >
<!ATTLIST hour               %a.global;
                             %a.temporalExpr;                   >
<!-- offset and distance were defined above                   -->

<!-- This fragment is used in sec. 20                         -->

