6 Elements Available in All TEI Documents

Part 2

Core Tags and General Rules

This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents. Most of these elements are freely floating phrases, which can appear at any point within the textual structure, although they must generally be contained by a higher-level element of some kind (such as a paragraph). A few of the elements described in this chapter (for example, bibliographic citations and lists) have a comparatively well-defined internal structure, but most of them have no consistent inner structure of their own. In the general case, they contain only a few words, and are often identifiable in a conventionally printed text by the use of typographic conventions such as shifts of font, use of quotation or other punctuation marks, or other changes in layout.

To use the terminology introduced in section 3.7.3, most of the elements described in this chapter are members of the class phrase, and a small number are members of the classes chunk or inter.

This chapter begins by describing the <p> tag used to mark paragraphs, which serve as the fundamental formal unit for running text in many base tag sets, and are available in all. This is followed, in section 6.2, by a discussion of some specific problems associated with the interpretation of conventional punctuation, and the methods proposed by the current Guidelines for resolving ambiguities therein.

The next section (section 6.3) describes a number of phrase-level elements commonly marked by typographic features (and thus well-represented in conventional markup languages). These include features commonly marked by font shifts (section 6.3.2) and features commonly marked by quotation marks (section 6.3.3) as well as such features as terms, cited words, and glosses (section 6.3.4).

The next section (section 6.4) describes several phrase-level and inter-level elements which, although often of interest for analysis or processing, are rarely explicitly identified in conventional printing. These include names (section 6.4.1), numbers and measures (section 6.4.3), dates and times (section 6.4.4), abbreviations (section 6.4.5), and addresses (section 6.4.2).

Section 6.5 introduces some phrase-level elements which may be used to record simple editorial emendation or correction of the encoded text. The tags described here constitute a simple subset of the full mechanisms for encoding such information (described in full in chapter 18), which should be adequate to most commonly encountered situations.

In the same way, the following section (section 6.6) presents only a subset of the facilities available for the encoding of cross-references or text-linkage. The full story may be found in chapter 14; the tags presented here are intended to be usable for a wide variety of simple applications.

Sections 6.7 and 6.8 describe two kinds of quasi-structural elements, lists and notes, which may appear either within chunk-level elements such as paragraphs, or between them. Several kinds of lists are catered for, of an arbitrary complexity. The section on notes discusses both notes found in the source and simple mechanisms for adding annotations of an interpretive nature during the encoding; again, only a subset of the facilities described in full elsewhere (specifically, in chapter 15) is discussed.

Next, section 6.9 describes methods of encoding within a text the conventional system or systems used when making references to the text. Some reference systems have attained canonical authority and must be recorded to make the text useable in normal work; in other cases, a convenient reference system must be created by the creator or analyst of an electronic text.

Like lists and notes, the bibliographic citations discussed in section 6.10 may be regarded as structural elements in their own right. A range of possibilities is presented for the encoding of bibliographic citations or references, which may be treated as simple phrases within a running text, or as highly-structured components suitable for inclusion in a bibliographic database.

Additional elements for the encoding of passages of verse or drama (whether prose or verse) are discussed in section 6.11.

The chapter concludes with a technical overview of the structure and organization of the tag set described here. This should be read in conjunction with chapter 3, describing the structure of the TEI document type definition.

6.1 Paragraphs

The paragraph is the fundamental organizational unit for all prose texts, being the smallest regular unit into which prose can be divided. Prose can appear in all TEI texts, not simply in those using the prose base (section 8); the paragraph is therefore described here, as an element which can appear in any kind of text.

Paragraphs can contain any of the other elements described within this chapter, as well as some other elements which are specific to individual text types. We distinguish phrase-level elements, which must be entirely contained within a paragraph and cannot appear except within one, from chunks, which can appear between, but not within, paragraphs, and from inter-level elements, which can appear either within a single paragraph or between paragraphs. The class of phrases includes emphasized or quoted phrases, names, dates, etc. The class of inter-level elements includes bibliographic citations, notes, lists, etc. The class of chunks includes the paragraph itself.
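
As a minimal sketch of these three classes (the text is invented for illustration), the fragment below shows a phrase-level <emph> inside a paragraph, an inter-level <list> used first within a paragraph and then between two paragraphs, and the <p> elements themselves as chunks. The first paragraph is closed with an explicit end-tag so that the second list falls between paragraphs rather than inside one.

<p>The survey covered <emph>three</emph> regions:
<list>
<item>the north
<item>the coast
<item>the islands
</list>
and was completed within a year.</p>
<list>
<item>Fieldwork: six months
<item>Analysis: six months
</list>
<p>Results are given in the next chapter.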

Because paragraphs may appear in different base or additional tag sets, their possible contents may differ in different kinds of documents. In particular, additional elements not listed in this chapter may appear in paragraphs in certain kinds of text. However, the elements described in this chapter are always by default available in all kinds of text.

The paragraph is marked using the <p> element:

If a consistent internal subdivision of paragraphs is desired, the <s> or <seg> (`segment') elements may be used, as discussed in chapters 14 and 15 respectively. More usually, however, paragraphs have no firm internal structure, but contain prose encoded as a mix of characters, entity references, phrases marked as described in the rest of this chapter, and embedded elements like lists, figures, or tables.

Since paragraphs are usually explicitly marked in Western texts, typically by indentation, the application of the <p> tag usually presents few problems.

In some cases, the body of a text may comprise but a single paragraph:

<body>
<p>I fully appreciate Gen. Pope's splendid achievements with their
invaluable results; but you must know that Major Generalships in the
Regular Army, are not as plenty as blackberries.
</body>

This news story shows typically short journalistic paragraphs:

<head>SARAJEVO, Bosnia and Herzegovina, April 19</head>
<p>Serbs seized more territory in this struggling new
country today as the United States Air Force ended a
two-day airlift of humanitarian aid into the capital,
Sarajevo.
<p>International relief workers called on European
Community nations to step up their humanitarian aid to
the former Yugoslav republic, in conjunction with new
American aid flights if necessary.
<p>A special envoy from the European Community, Colin
Doyle, harshly condemned the decision by Serbs to shell
Sarajevo on Saturday night during a visit to the Bosnian
capital by a senior American official, Deputy Assistant
Secretary of State Ralph R. Johnson.
<p>...

The following extract from a Russian fairy tale demonstrates how other phrase-level elements (in this case <q> elements representing direct speech; see section 6.3.3) may be nested within, but not across, paragraphs:

<p>A fly built a castle, a tall and mighty castle.
There came to the castle the Crawling Louse.  <q>Who,
who's in the castle?  Who, who's in your house?</q>
said the Crawling Louse.  <q>I, I, the Languishing Fly.
And who art thou?</q> <q>I'm the Crawling Louse.</q>

<p>Then came to the castle the Leaping Flea.  <q>Who,
who's in the castle?</q> said the Leaping Flea.  <q>I,
I, the Languishing Fly, and I, the Crawling Louse.  And
who art thou?</q> <q>I'm the Leaping Flea.</q>

<p>Then came to the castle the Mischievous Mosquito.
<q>Who, who's in the castle?</q> said the Mischievous
Mosquito.  <q>I, I, the Languishing Fly, and I, the
Crawling Louse, and I, the Leaping Flea.  And who art
thou?</q> <q>I'm the Mischievous Mosquito.</q>

The <p> element is formally declared as follows:

<!-- 6.1:  Paragraph                                          -->
<!ELEMENT p             - O  (%paraContent;)                    >
<!ATTLIST p                  %a.global;                         >
<!-- This fragment is used in sec. 6.12                       -->

6.2 Treatment of Punctuation

Punctuation marks cause problems for text markup because they may not be available in the character set used and because they are often ambiguous. In the former case entity names should be used to render the punctuation mark (see chapter 4). In the latter case, ambiguous punctuation may be treated as described below.

Full stop (period) may mark (orthographic) sentence boundaries, abbreviations, decimal points, or serve as a visual aid in printing numbers. These usages can be distinguished by tagging S-units, abbreviations, and numbers, as described in sections 14.3, 6.4.5, and 6.4.3. There are independent reasons for tagging these, whether or not they are marked by full stops. Alternatively, the following TEI-specific entity names may be used to distinguish stops (and other characters) used for these purposes:

These entities are defined in the file teipunc2, which is documented in chapter 37.
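
A minimal sketch of the tagging approach, using an invented sentence: the three full stops below are disambiguated by the elements that contain them, so no special entity names are needed. It assumes the <s> element for S-units is available in the tag sets selected.

<s><abbr>Dr.</abbr> Lee paid <num>3.50</num> for the book.</s>

Here the first stop belongs to an abbreviation, the second is a decimal point, and the third closes the S-unit.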

Question mark and exclamation mark typically mark the end of orthographic sentences, but may also be used as a mid-sentence comment by the author (`!' to express surprise or some other strong feeling, `?' to query a word or expression or mark a sentence as dubious in linguistic discussion). These uses may be distinguished by marking S-units, in which case the mid-sentence uses of these punctuation marks may be left unmarked.
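
For illustration only (invented text, and again assuming that <s> is available): once S-units are marked, the mid-sentence exclamation mark below cannot be mistaken for a sentence boundary and needs no tagging of its own.

<s>He claims to have read all twelve (!) volumes in a week.</s>
<s>Did he really?</s>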

Hyphens at line-end may or may not indicate permanent (`hard') hyphens in the word. Where the lineation of the machine-readable text differs from the original, the editor may eliminate soft (line-end) hyphens or replace them by a reference to the entity shy (`soft hyphen'), which is defined in the standard public entity set ISOnum defined in ISO 8879 (which should be invoked in the DTD subset if the entity shy is to be used). The solution chosen should be reported in the <hyphenation> tag of the encoding declarations in the TEI header. See chapter 5 for discussion of the TEI header and encoding declarations.

Creators of machine-readable texts are recommended to avoid soft hyphens, as one cannot tell whether the hyphens are soft or hard in the case of compounds or prefixed words which might or might not be hyphenated in mid-sentence.
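
The two policies for line-end hyphens might be sketched as follows, for an invented source in which `manuscript' is broken as `manu-' at a line end and `well-known' carries a hard hyphen. The wording of the <hyphenation> note is hypothetical, and the public identifier shown for ISOnum is the form usually quoted for that entity set; it should be checked against the local installation.

<!-- in the DTD subset: invoke ISOnum so that shy is defined -->
<!ENTITY % ISOnum PUBLIC
  "ISO 8879-1986//ENTITIES Numeric and Special Graphic//EN">
%ISOnum;

<!-- policy 1: soft hyphen eliminated -->
<p>The manuscript was well-known to scholars.

<!-- policy 2: soft hyphen recorded with the entity shy -->
<p>The manu&shy;script was well-known to scholars.

<!-- in the TEI header: report the policy chosen -->
<hyphenation>
<p>Line-end hyphens judged to be soft have been replaced by
references to the entity shy; all other hyphens are retained.
</hyphenation>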

Dashes are best distinguished in form by using the entity names provided in the public entity set ISOpub, defined in ISO 8879: mdash, ndash, and dash (the `true' hyphen). Dashes are used for a variety of purposes: insertion, interruption, new speaker (in dialogue), list item. In the latter two cases it is preferable to mark the underlying feature using the elements <q> or <item>, on which see section 6.3.3 and section 6.7, respectively.

Quotation marks should generally be replaced by the tags <q> or <quote>, especially as quotations are not always marked by quotation marks (notably long quotations) or may be marked in a variety of ways; see the discussion of quotation and related features in section 6.3.3.

Apostrophes must be distinguished from single quote marks. This is best done by tagging quotations or other uses of quotation marks (see above). However, apostrophes have a variety of uses. In English they mark contractions, genitive forms, and (occasionally) plural forms. Full disambiguation of these uses belongs to the level of linguistic analysis and interpretation.

Parentheses and other marks of suspension such as dashes or ellipses are often used to signal information about the syntactic structure of a text fragment. Full disambiguation of their uses also belongs to the level of linguistic analysis and interpretation, and is therefore discussed in chapter 15.

Where punctuation marks are disambiguated by tagging the underlying feature they signal, it may be debated whether they should be excluded or left as part of the text. In the case of quotation marks, it may sometimes be more convenient to distinguish opening from closing marks simply by using the appropriate entity reference, rather than using the <q> element, with or without a rend attribute. The solution chosen will vary depending upon the feature and depending upon the purpose of the project.
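
As a hedged illustration of the choices described here (invented text; ldquo and rdquo are assumed to be available from the standard entity sets, and the rend value is only one possible convention, not a prescribed one):

<!-- marks kept as entity references, no feature tagged -->
&ldquo;Who is there?&rdquo; she asked.

<!-- feature tagged, marks excluded, their form noted in rend -->
<q rend='PRE ldquo POST rdquo'>Who is there?</q> she asked.

<!-- feature tagged, marks left as part of the text -->
<q>&ldquo;Who is there?&rdquo;</q> she asked.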

6.3 Highlighting and Quotation

This section deals with a variety of textual features, all of which have in common that they are frequently realized in conventional printing practice by the use of such features as underlining, italic fonts, or quotation marks, collectively referred to here as highlighting. After an initial discussion of this phenomenon and alternate approaches to encoding it, this section describes ways of encoding the following textual features, all of which are conventionally rendered using some kind of highlighting:

6.3.1 What Is Highlighting?

By `highlighting' we mean the use of any combination of typographic features (font, size, hue, etc.) in a printed or written text in order to distinguish some passage of a text from its surroundings. [see note 41] The purpose of highlighting is generally to draw the reader's attention to some feature or characteristic of the passage highlighted; this section describes the elements recommended by these Guidelines for the encoding of such textual features.

In conventionally printed modern texts, highlighting is often employed to identify words or phrases which are regarded as being one or more of the following:

The textual functions signalled by highlighting may not be rendered consistently in different parts of a text or in different texts. (For example, a foreign word may appear in italics if the surrounding text is in roman, but in roman if the surrounding text is in italics.) For this reason, these Guidelines distinguish between the encoding of rendering itself and the encoding of the underlying feature expressed by it.

Highlighting as such may be encoded by using the global rend attribute which can be specified for any element in the TEI scheme. This allows the encoder both to specify the function of a highlighted phrase or word, by selecting the appropriate element described here or elsewhere in the Guidelines, and to further describe the way in which it is highlighted, by means of the rend attribute. If the encoder wishes to offer no interpretation of the feature underlying the use of highlighting in the source text, then the <hi> element may be used, which indicates only that the text so tagged was highlighted in some way.
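
A short sketch of the distinction, using an invented sentence and the Latin phrase `ad hoc'. In the first paragraph the function is encoded with <foreign> and the rendering with rend; the second shows the same function with reversed rendering, inside a passage that is itself italic; the third records only the fact of highlighting. The rend values are illustrative, not prescribed.

<p>The committee was formed <foreign lang=la rend=italic>ad
hoc</foreign>.
<p><hi rend=italic>The committee was formed <foreign lang=la
rend=roman>ad hoc</foreign>.</hi>
<p>The committee was formed <hi rend=italic>ad hoc</hi>.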

The possible values carried by the rend attribute are not formally defined in this version of the Guidelines. Since the rend attribute may be used to document any peculiarity of the way a given segment of text was rendered in the original source text, it may need to express a very large range of typographic features, by no means restricted to type face, type size, etc.

Where it is both appropriate and feasible, these Guidelines recommend that the textual feature marked by the highlighting should be encoded, rather than just the simple fact of the highlighting. This is for the following reasons:

In many, if not most, cases the underlying function of a highlighted phrase will be obvious and non-controversial, since the distinctions indicated by a change of highlighting correspond with distinctions discussed elsewhere in these Guidelines. It should be recognized, however, that cases do exist in which it is not economically feasible to mark the underlying function of highlighting (e.g. in the preparation of large text corpora), as well as cases in which it is not intellectually appropriate (as in the transcription of some older materials, or in the preparation of material for the study of typographic practice). In such cases, the <hi> element should be used, as further discussed below.

Elements which are sometimes realized by typographic distinction but which are not discussed in this section include <title> (discussed in section 6.10) and <name> (discussed in section 6.4.1).

6.3.2 Emphasis, Foreign Words, and Unusual Language

This subsection discusses the following elements: <foreign>, <emph>, <hi>, and <distinct>.

6.3.2.1 Foreign Words or Expressions

Words or phrases which are not in the main language of the text should be tagged as such, at least where the fact is indicated in the text. Where the word or phrase concerned is already distinguished from the rest of the text by virtue of its function (for example, because it is a name, a technical term, a quotation, a mentioned word, etc.) then the global lang attribute should be used to specify additionally that its language distinguishes it from the surrounding text. Any element in the TEI scheme may take a lang attribute, which specifies both the writing system and the language used by its content (see section 4.2 for discussion of this attribute). Where there is no other applicable element, the tag <foreign> may be used to provide a peg onto which the lang attribute may be attached.

<q>Aren't you confusing <foreign lang=la>post hoc</foreign>
with <foreign lang=la>propter hoc</foreign>?</q> said the
Bee Master.  <q>Wax-moth only succeed when weak bees let
them in.</q>

The <foreign> tag should not be used to encode foreign words which are mentioned or glossed within the text: for these use the appropriate element from section 6.3.4 below. Compare the following example sentences:

 
John eats a <foreign lang=fr>croissant</foreign> every morning.

<mentioned lang=fr>Croissant</mentioned> is difficult to
pronounce with your mouth full.

A <term lang=fr>croissant</term> is a crescent-shaped piece
of light, buttery pastry that is usually eaten for
breakfast, especially in France.

The <foreign> element is formally defined as follows:

<!-- 6.3.2.1:  Highlighted phrases                            -->
<!ELEMENT foreign       - -  (%paraContent;)                    >
<!ATTLIST foreign            %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          rend               CDATA               #IMPLIED
          n                  CDATA               #IMPLIED
          lang               IDREF               #IMPLIED       >
<!-- (continued in sec. 6.3.2.2, 6.3.2.3, 6.3.3, 6.3.4)       -->
<!-- This fragment is used in sec. 6.12                       -->

6.3.2.2 Emphatic Words and Phrases

The <emph> element is provided to mark words or phrases which are linguistically emphatic or stressed. Text which is only typographically `emphasized' falls into the class of highlighted text, and may be tagged with the <hi> element. In printed works, emphasis is generally indicated by devices such as the use of an italic font, a large typeface or extra wide letter spacing; in manuscripts and typescripts, it is usually indicated by the use of underlining. As the following examples demonstrate, an encoder may choose whether or not to make explicit the particular type of rendition associated with the emphasis, by use of the rend attribute. If a source text consistently renders a particular feature (e.g. emphasis or words in foreign languages) in a particular way, the rendering associated with that feature may be described in the TEI header and the rend attribute used only to describe examples which deviate from the norm.

<q>Sex, sir, is <emph>purely</emph> a question of appetite!</q>
   Tarr exclaimed.

<q>What it all comes to is this,</q> he said.
<q><emph rend=italic>What does Christopher
Robin do in the morning nowadays?</emph></q>

<l>Here Thou, great <name rend=italics>Anna</name>!
      whom three Realms obey,
<l>Doth sometimes Counsel take —
      and sometimes <emph rend=italic>Tea</emph>.
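
Where the rendering is consistent, the default might be stated once in the header and rend supplied only for exceptions. The sketch below is illustrative: it assumes that the <tagsDecl> and <tagUsage> elements of the TEI header (see chapter 5) are used for this purpose, and the sentence is invented.

<!-- in the TEI header -->
<tagsDecl>
<tagUsage gi=emph>Rendered in italic in the source unless the
rend attribute states otherwise.</tagUsage>
</tagsDecl>

<!-- in the text -->
<q>I said <emph>no</emph>, and I meant
<emph rend=bold>never</emph>.</q>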

The <hi> element is used to mark words or phrases which are highlighted in some way, but for which identification of the intended distinction is difficult, controversial or impossible. It enables an encoder simply to record the fact of highlighting, possibly describing it by the use of a rend attribute, as discussed above, without however taking a position as to the function of the highlighting. This may also be useful if the text is to be processed in two stages: representing simply typographic distinctions during a first pass, and then replacing the <hi> tags with more specific tags in a second pass.

Some simple examples:

<hi rend=gothic>And this Indenture further witnesseth</hi>
that the said <hi rend=italic>Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ...

In this example, the first highlighted phrase is printed in black letter (gothic) type to mimic the appearance of a legal document, and the second is in italic to mark `Walter Shandy' as a name. In a second pass, the elements <head> or <label> might be appropriate for the first use, and the element <name> for the second.

The heaviest rain, and snow, and hail, and sleet, could
boast of the advantage over him in only one respect.  They
often <hi rend=quoted>came down</hi> handsomely, and
Scrooge never did.

In this example, the phrase `came down' uses inverted commas to indicate a play on words. [see note 42] In a second pass, the element <soCalled> might be preferred.
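
After such a second pass, the two passages might read as follows. This is only one possible outcome, following the suggestions above; <name> is chosen for the second phrase of the first passage, whose first phrase is left as <hi> because the choice between <head> and <label> depends on context.

<hi rend=gothic>And this Indenture further witnesseth</hi>
that the said <name rend=italic>Walter Shandy</name>, merchant,
in consideration of the said intended marriage ...

They often <soCalled rend=quoted>came down</soCalled>
handsomely, and Scrooge never did.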

The <emph> and <hi> elements are formally defined as follows:

<!-- 6.3.2.2:  Highlighted phrases (cont'd)                   -->
<!-- (continuation of sec. 6.3.2.1)                           -->
<!ELEMENT emph          - -  (%paraContent;)                    >
<!ATTLIST emph               %a.global;                         >
<!ELEMENT hi            - -  (%paraContent;)                    >
<!ATTLIST hi                 %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          n                  CDATA               #IMPLIED
          rend               CDATA               #IMPLIED       >

6.3.2.3 Other Linguistically Distinct Material

For some kinds of analysis, it may be desirable to encode the linguistic distinctiveness of words and phrases with more delicacy than is allowed by the <foreign> element. The <distinct> element is provided for this purpose. Its attributes allow the nature of the linguistic distinction to be characterized in two ways. First, the type attribute simply attaches a user-defined code to the word or phrase, assigning it to some register, sub-language, etc. No recommendations as to the set of values for this attribute are provided at this time, as little consensus exists in the field.

Alternatively, the remaining three attributes may be used in combination to place a word or phrase on a three-dimensional scale sometimes used in descriptive linguistics. [see note 43] The time attribute places a word diachronically, for example as archaic, old-fashioned, contemporary, futuristic, etc.; the space attribute places a word diatopically, that is, with respect to a geographical classification, for example as national, regional, international, etc.; the social attribute places a word diastatically, that is, with respect to a social classification, for example as technical, polite, impolite, restricted, etc. Again, no recommendations are made for the values of these attributes at this time; the encoder should provide a description of the scheme used in the appropriate section of the header (see section 5.3).

Examples:

Next morning a boy in that dormitory confided to his
bosom friend, a <distinct type=psSlang>fag</distinct> of
Macrea's, that there was trouble in their midst which
King <distinct type=archaic>would fain</distinct> keep
secret.

Next morning a boy in that dormitory confided to his
bosom friend, a
<distinct time=1900 social=publicschool space=GB>fag</distinct>
of Macrea's, that there was trouble in their midst which
King <distinct time=archaic>would fain</distinct> keep
secret.

Where more complex (or more rigorous) interpretive analyses of the associations of a word are required, the more detailed and general mechanisms described in chapter 16 should be preferred to these simple characterizations. It may also be preferable to record the kinds of analysis suggested here by means of the simple annotation element <note> described in section 6.8, or the <span> element described in section 15.3.

The <distinct> element has the following formal definition:

<!-- 6.3.2.3:  Highlighted phrases (cont'd)                   -->
<!-- (continuation of sec. 6.3.2.1)                           -->
<!ELEMENT distinct      - -  (%phrase.seq;)                     >
<!ATTLIST distinct           %a.global;
          social             CDATA               #IMPLIED
          space              CDATA               #IMPLIED
          time               CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >

6.3.3 Quotation

This section discusses the following elements, all of which are often rendered by the use of quotation marks: <q>, <quote>, <cit>, and <soCalled>.

One form of presentational variation found particularly frequently in written and printed texts is the use of quotation marks. As with the typographic variations discussed in the preceding section, it is generally helpful to separate the encoding of the underlying textual feature (for example, a quotation or a piece of direct speech) from the encoding of its rendering (for example, the use of a particular style of quotation marks).

The most common and important use of quotation marks is, of course, to mark quotation, by which we mean simply any part of the text attributed by the author or narrator to some agency other than the narrative voice. Typical examples include passages cited from other works, for which the element <quote> may be used, and words or phrases attributed to other voices within the current work, for which the element <q> may be used. If this distinction between intra-textual and inter-textual voices cannot be made reliably, or is not of interest, then all quoted matter may simply be marked using the <q> tag. The editorial policy in this respect should be stated in the encoding description of the TEI Header. The <soCalled> element is used for cases where the author or narrator distances him or herself from the words in question without however attributing them to any other voice in particular.

Quotation may be rendered by changes in type face, by special punctuation marks (single or double or angled quotes, dashes, etc.) and by layout (indented paragraphs, etc.). If these characteristics are of interest, an appropriate value for the rend attribute should be given, to record how the <q> or <quote> element is rendered. For discussion of suggested values for this attribute, see below.

Quotation marks themselves may, like other punctuation marks, be felt for some purposes to be worth retaining within a text, quite independently of their description by the rend attribute. Where this is done, an appropriate entity reference should be chosen from the standard entity sets listed in chapter 37; this has the advantage that the entity may be redefined as null when the punctuation is to be ignored for some analytic purpose. Well-known ambiguities, such as whether the character ' represents an apostrophe or a closing single quotation mark, or whether the character " represents an opening or closing double quotation mark may all be resolved by the use of appropriate entity references, as discussed in section 6.2.
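
A minimal sketch of such a redefinition, assuming the text uses references to ldquo and rdquo: because SGML honours the first declaration of an entity that it meets, declarations placed in the document type declaration subset, ahead of the standard entity sets, take precedence over them.

<!-- in the DTD subset, before the standard entity sets are invoked -->
<!ENTITY ldquo "">
<!ENTITY rdquo "">

With these declarations in force, &ldquo;Who is there?&rdquo; is processed as if it contained no quotation marks.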

Alternatively, the encoder may suppress all quotation marks, possibly recording their form using the rend attribute. Where this is done, the following list of entity names (taken from the public entity sets ISOpub and ISOnum) may be found useful to describe quotation-mark styles common in European and American typesetting:

These may be used in the rend attribute to show how the quotation was opened and closed. For example, if the keywords `PRE' and `POST' are used to indicate preceding and following punctuation, then the following example would describe a conventional American book printed using single quotation marks:

<q rend='PRE lsquo POST rsquo'>Who-e debel you?</q>
&mdash he at last said &mdash <q
rend='PRE lsquo POST rsquo'>you no speak-e,
damme, I kill-e.</q>  And so saying,
the lighted tomahawk began flourishing
about me in the dark.

The following example demonstrates alternative policies which may be adopted with respect to encoding of the punctuation used to mark quotation:

Adolphe se tourna vers lui :
<q>&mdash Alors, Albert, quoi de neuf?</q>
<q>&mdash Pas grand-chose.</q>
<q>&mdash Il fait beau,</q> dit Robert.

Adolphe se tourna vers lui :
<q rend='PRE mdash'>Alors, Albert, quoi de neuf ?</q>
<q rend='PRE mdash'>Pas grand-chose.</q>
<q rend='PRE mdash'>Il fait beau,</q> dit Robert.

To make explicit who is speaking, which is not always stated in the above example, the who attribute should be used:

Adolphe se tourna vers lui :
<q who='Adolphe'>&mdash Alors, Albert, quoi de neuf?</q>
<q who='Albert'>&mdash Pas grand-chose.</q>
<q who='Robert'>&mdash Il fait beau,</q> dit Robert.

The who attribute is also useful as a means of supplying a normalized form of the speaker's name, to facilitate selection of text by particular speakers. As indicated above, it may be supplied whether or not an indication of the speaker is given explicitly in the text.

Where investigation of `narrative voice' is the primary object of the encoding, it may be convenient to identify each speaker as a participant in the work, and to associate individual speeches with them by means of SGML's ID/IDREF mechanism. See section 23.2.2 for discussion of the participant description component of the TEI Header.
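
A hedged sketch of this approach, reusing the speakers of the French example above. The identifiers ADO, ALB and ROB are invented, the content shown for each <person> is minimal, and the who values are simply made to match the identifiers; see section 23.2.2 for what the participant description may contain.

<!-- in the TEI header -->
<particDesc>
<person id=ADO><p>Adolphe</p></person>
<person id=ALB><p>Albert</p></person>
<person id=ROB><p>Robert</p></person>
</particDesc>

<!-- in the text -->
<q who=ADO>Alors, Albert, quoi de neuf?</q>
<q who=ALB>Pas grand-chose.</q>
<q who=ROB>Il fait beau,</q> dit Robert.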

For such analyses, it may also be useful to distinguish representations of speech from representations of thought, in modern printed texts often indicated by a change of typeface. The type attribute should be used for this purpose, as in this example:

<q type=speech>Oh yes,</q> said Henry, <q type=speech>I mean
Gordon Macrae, for example...</q>
<q type=thought>Jungian Analyst with Winebox! That's what you
called him, you callous bastard, didn't you? Eh? Eh?</q>

Quoted matter may be embedded within quoted matter, as when one speaker reports the speech of another:

<q who=Wilson>Spaulding, he came down into the office just this
day eight weeks with this very paper in his hand, and he
says:—<q who=Spaulding>I wish to the Lord, Mr. Wilson, that
I was a red-headed man.</q></q>

Direct speech nested in this way is treated in the same way as elsewhere: a change of rendition may occur, but the same element should be used. An encoder may however choose to distinguish between direct speech which contains quotations from extra-textual matter and direct speech itself, as in the following example:

<p><q>The Lord! The Lord! It is Sakya Muni himself,</q> the
lama half sobbed; and under his breath began the wonderful
Buddhist invocation:-
<q><quote><l>To Him the Way -- the Law -- Apart --
<l>Whom Maya held beneath her heart
<l>Ananda's Lord -- the Bodhisat
</quote>
And He is here! The Most Excellent Law is here also.  My
pilgrimage is well begun.  And what work! What work!</q>

Quotations from other works are often accompanied by a reference to their source. The <cit> element may be used to group together the quotation and its associated bibliographic reference, which should be encoded using the elements for bibliographic references discussed in section 6.10, as in the following example.

<div id=MM01 type=chapter><head>Chapter 1</head>
<epigraph>
  <cit><quote><l>Since I can do no good because a woman
  <l>Reach constantly at something that is near it.</quote>
  <bibl><title>The Maid's Tragedy</title>
        <author>Beaumont and Fletcher</author></bibl>
  </cit>
</epigraph>
<p>Miss Brooke had that kind of beauty which seems to be thrown into
relief by poor dress...

Like other bibliographic references, the citation attached to a quotation may be represented simply by a pointer, as in this example:

Lexicography has shown little sign of being affected by the
work of followers of J.R. Firth, probably best summarized
in his slogan, <cit><quote>You shall know a word by the company it
keeps.</quote> <ref target=FI57>(Firth, 1957)</ref></cit>

Unlike most of the other elements discussed in this chapter, direct speech and quotations may frequently contain other high-level elements such as paragraphs or verse lines, as well as being themselves contained by such elements. Three possible solutions exist for this well-known structural problem; for further discussion, and several examples, see chapter 31.

Finally, in this section, the element <soCalled> is provided for all cases in which quotation marks are used to distance the quoted text from the narrator or speaker. Common examples include the `scare' quotes often found in newspaper headlines and advertising copy, where the effect is to cast doubts on the veracity of an assertion:

<head>PM dodges <soCalled>election threat</soCalled>
in interview</head>

The same element should be used to mark a variety of special ironic usages. Some further examples follow:

He hated <soCalled>good</soCalled> books.

<soCalled>Croissants</soCalled> indeed! toast not good enough for you?

Although Chomsky's decision that all NL sentences are finite
objects was never justified by arguments from the attested
properties of NLs, it did have a certain
<soCalled>social</soCalled> justification.  It was
commonly assumed in works on logic until fairly recently
that the notion <mentioned>language</mentioned> is
necessarily restricted to finite strings.

The elements discussed in this section are formally defined as follows:

<!-- 6.3.3:  Highlighted phrases (cont'd)                     -->
<!-- (continuation of sec. 6.3.2.1)                           -->
<!ELEMENT q             - -  (%specialPara)                     >
<!ATTLIST q                  %a.global;
          who                CDATA               #IMPLIED
          type               CDATA               #IMPLIED
          direct             (y | n | unspecified) 
                                                 unspecified    >
<!ELEMENT quote         - -  (%specialPara;)                    >
<!ATTLIST quote              %a.global;                         >
<!ELEMENT cit           - -  (((q | quote), (bibl | loc)) | 
                             ((bibl | loc), (q | quote)))       >
<!ATTLIST cit                %a.global;                         >
<!ELEMENT soCalled      - -  (%phrase.seq;)                     >
<!ATTLIST soCalled           %a.global;                         >

6.3.4 Terms, Glosses, and Cited Words

This section describes the following textual elements, all of which have in common that they may be variously realized using italics, quotation marks or other devices:

Technical terms are often italicized or emboldened upon first mention in printed texts; an explanation or gloss is sometimes given in quotation marks. Linguistic analyses conventionally cite words in languages under discussion in italics, providing a gloss immediately following marked with single quotation marks. Other texts in which individual words or phrases are mentioned (for example, as examples) rather than used may mark them either with italics or with quotation marks, and will gloss them less regularly.

A <term> may appear with or without a gloss, as may a <mentioned> element. Where the <gloss> is present, it may be linked to the term it is glossing by means of SGML's ID/IDREF mechanism. To establish such a link, the encoder should give an id value to the <term> or <mentioned> element and provide that id as the value of the target attribute on the <gloss> element. The following examples demonstrate this facility; for more discussion of this and other kinds of linkage within TEI documents, see chapter 14.

Examples:

We may define <term rend=sc id=tdpv>discoursal point of view</term>
as <gloss target=tdpv>the relationship, expressed through discourse
structure, between the implied author or some other addresser,
and the fiction.</gloss>

<gloss target=T1 rend=unmarked>A computational device
that infers structure from grammatical strings of words</gloss>
is known as a <term id=T1>parser</term>, and much of the
history of NLP over the last 20 years has been occupied with
the design of parsers.

There is thus a striking accentual difference between a
verbal form like <mentioned lang=grc
id=cw234>eluthemen</mentioned> <gloss target=cw234>we
were released,</gloss> accented on the second
syllable of the word, and its participial derivative
<mentioned id=cw235 lang=grc>lutheis</mentioned>
<gloss target=cw235>released,</gloss> accented on the last.

The elements discussed in this section have the following formal definitions:

<!-- 6.3.4:  Highlighted phrases (cont'd)                     -->
<!-- (continuation of sec. 6.3.2.1)                           -->
<!ELEMENT term          - -  (%phrase.seq;)                     >
<!ATTLIST term               %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT mentioned     - -  (%phrase.seq;)                     >
<!ATTLIST mentioned          %a.global;                         >
<!ELEMENT gloss         - -  (%phrase.seq;)                     >
<!ATTLIST gloss              %a.global;
          target             IDREF               #IMPLIED       >

6.3.5 Some Further Examples

As a simple example of the elements discussed here, consider the following sentence: ``On the one hand the Nibelungenlied is associated with the new rise of romance of twelfth-century France, the romans d'antiquité, the romances of Chrétien de Troyes, and the German adaptations of these works by Heinrich van Veldeke, Hartmann von Aue, and Wolfram von Eschenbach.'' A first approximation to the encoding of this sentence might be simply to record the fact that the phrases printed above in italics are highlighted, as follows:

On the one hand the <hi rend=italic>Nibelungenlied</hi> is
associated with the new rise of romance of twelfth-century
France, the <hi rend=italic lang=fr>romans
d'antiquité</hi>, the romances of Chrétien de
Troyes, ...

This encoding would however lose the important distinction between an italicized title and an italicized foreign phrase. Many other phrases might also be italicized in the text, and a retrieval program seeking to identify foreign terms (for example) would not be able to produce reliable results by simply looking for italicized words. Where economic and intellectual constraints permit, therefore, it would be preferable to encode both the function of the highlighted phrases and their appearance, as follows:

On the one hand the <title
rend=italic>Nibelungenlied</title> is associated with the
new rise of romance of twelfth-century France, the <foreign
rend=italic>romans d'antiquité</foreign>, the
romances of Chrétien de Troyes, ...

In this example, the decision as to which textual features are distinguished by the highlighting is relatively uncontroversial. As a less straightforward example, consider the use of italic font in the following passage from Samuel Richardson's Clarissa (1747). ``A pretty common case, I believe; in all vehement debatings. She says I am too witty; Anglicè, too pert; I, that she is too wise; that is to say, being likewise put into English, not so young as she has been: in short, she is grown so much into a mother, that she had forgotten she ever was a daughter. ...''

Clearly, the word `vehement' is not italicized for the same reason as the phrase `not so young as she has been'; the former is emphasized, while the latter is proverbial. It also provides an ironic gloss for the words `too wise', in the same way as `too pert' glosses `too witty'. The glossed phrases are not however technical terms or cited words, but quoted phrases, as if Clarissa were putting words into her own and her mother's mouths. Finally, the words `mother' and `daughter' are apparently italicized simply to oppose them in the sentence; certainly they do not fit into any of the categories so far proposed as reasons for italicizing. Note also that the word `Anglicè' is not italicized although it is not generally considered an English word.

The following sample encoding for the above passage attempts to take into account all the above points:

A pretty common case, I believe; in all
<emph>vehement</emph> debatings.  She says I am
<q rend=italic>too witty</q>; <foreign lang=la
rend=roman>Anglicè</foreign>, <gloss rend=italic>too
pert</gloss>; I, that she is <q rend=italic>too wise</q>;
that is to say, being likewise put into English, <gloss
rend=italic>not so young as she has been</gloss>:  in short,
she is grown so much into a <hi rend=italic>mother</hi>,
that she had forgotten she ever was a
<hi rend=italic>daughter</hi>.

6.4 Names, Numbers, Dates, Abbreviations, and Addresses

This section describes a number of textual features which it is often convenient to distinguish from their surrounding text. Names, dates and numbers are likely to be of particular importance to the scholar treating a text as source for a database; distinguishing such items from the surrounding text is however equally important to the scholar primarily interested in lexis.

The treatment of these textual features proposed here is not intended to be exhaustive: fuller treatments for names, numbers, measures and dates are provided in the additional tag set for names and dates (see chapter 20).

6.4.1 Referring Strings

A referring string is a phrase which refers to some person, place, object etc. Two elements are provided to mark such strings:

Where it is thought useful to do so, the kind of object referred to may be specified using the type attribute.

Examples include:

<q>My dear <rs type=person>Mr. Bennet</rs>, </q>
said his lady to him one day, <q>have you heard
that <rs type=place>Netherfield Park</rs> is let
at last?</q>

Collectors of water-rents were appointed by the
<rs type=organization>Watering Committee</rs>.
They were paid a commission not exceeding four per
cent, and gave bond.
 
It being one of the principles of the
<rs type=org>Circumlocution Office</rs> never, on any
account whatsoever, to give a straightforward answer,
<rs type=person>Mr Barnacle</rs> said, <q>Possibly.</q>

As the following example shows, the <rs> element may be used for any reference to a person, place, etc., not only to references in the form of a proper noun or noun phrase.

<q>My dear <rs type=person>Mr. Bennet</rs>,</q>
said <rs type=person>his lady</rs> to him
one day...

The <name> element by contrast is provided for the special case of referring strings which consist only of proper nouns; it may be used synonymously with the <rs> element, or nested within it if a referring string contains a mixture of common and proper nouns. The following example shows an alternative way of encoding the short sentence from Pride and Prejudice quoted above:

<q>My dear <name type=person>Mr. Bennet</name>,</q>
said <rs type=person>his lady</rs> to him one day,
<q>have you heard that <name type=place>Netherfield
Park</name> is let at last?</q>

The following example shows how a proper name may be nested within a referring string:

<rs>His Excellency the Life President,
<name>Ngwazi Dr H. Kamuzu Banda</name></rs>
<!-- ... -->

Simply tagging something as a name is generally not enough to enable automatic processing of personal names into the canonical forms usually required for reference purposes. The name as it appears in the text may be inconsistently spelled, partial, or vague. Moreover, name prefixes such as `van' or `de la' may or may not be included as part of the reference form of a name, depending on the language and country of origin of the bearer.

The following attributes, common to all members of the names element class, are provided to help overcome these difficulties:

Either or both of these attributes may be specified, as appropriate. The key attribute may be useful as a means of gathering together all references to the same individual or location scattered throughout a document:

<q>My dear <rs type=person key=BENM1>Mr. Bennet</rs>,
</q> said <rs type=person key=BENM2>his lady</rs>
to him one day, <q>have you heard that
<rs type=place key=NETP1>Netherfield Park</rs>
is let at last?</q>
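
As a rough illustration of how software might gather such references, the following Python sketch groups the content of <rs> and <name> elements by the value of their key attribute. It is only a sketch under stated assumptions: it uses regular expressions rather than a real SGML parser, it assumes that the tagged strings contain no nested tags, and the function name references_by_key is hypothetical.

```python
import re
from collections import defaultdict

# Matches <rs ...>content</rs> or <name ...>content</name>.
# Assumption: the content contains no nested tags.
ELEMENT = re.compile(r"<(rs|name)\b([^>]*)>([^<]*)</\1>")
# Matches key=VALUE, where VALUE is a bare token or is quoted with ' or ".
KEY = re.compile(r"""\bkey\s*=\s*(?:'([^']*)'|"([^"]*)"|([^\s>]+))""")


def references_by_key(text):
    """Group the content of <rs> and <name> elements by their key attribute."""
    groups = defaultdict(list)
    for match in ELEMENT.finditer(text):
        attributes, content = match.group(2), match.group(3)
        key = KEY.search(attributes)
        if key:
            value = next(g for g in key.groups() if g is not None)
            # Collapse line breaks and runs of spaces inside the content.
            groups[value].append(" ".join(content.split()))
    return dict(groups)


sample = """<q>My dear <rs type=person key=BENM1>Mr. Bennet</rs>,
</q> said <rs type=person key=BENM2>his lady</rs>
to him one day, <q>have you heard that
<rs type=place key=NETP1>Netherfield Park</rs>
is let at last?</q>"""

for key, strings in references_by_key(sample).items():
    print(key, strings)
```

Elements without a key attribute are simply skipped, so the same function can be run over text in which only some referring strings have been keyed.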

This use should be distinguished from the case of the reg (regularization) attribute, which provides a means of marking the standard form of a referring string, as demonstrated below:

My personal life during the administration of
<rs type=person key=POJA1 reg='Polk, James K.'>Col. Polk</rs>
has but poorly compensated me for the suspended
enjoyments and pursuits of private and professional spheres

<name type=person key=VOM1 reg='Volanges, Mme de'>
Mme. de Volanges</name>
marie sa fille: c'est encore un secret; mais elle m'en
a fait part hier.

<name type=person key=WADLM1 reg='de la Mare, Walter'>
Walter de la Mare
</name>
was born at
<name key=Ch1 type=place>Charlton</name>, in
<name key=KT1 type=county>Kent</name>, in 1873.

<name type=place>Montaillou</name> is not a large parish.
At the time of the events which led to
<name type=person reg='Benedict XII, Pope of Avignon
(Jacques Fournier)'>Fournier's</name> investigations,
the local population consisted of between 200 and
250 inhabitants.

This method is adequate for many simple applications. For more complex applications, such as onomastics, or wherever a detailed analysis of the component parts of a name is needed, the specialized elements described in chapter 20 or the analytical tools described in chapter 16 should be used.

These elements are formally declared as follows:

<!-- 6.4.1:  Proper Nouns                                     -->
<!ELEMENT name          - -  (%phrase.seq;)                     >
<!ATTLIST name               %a.global;
                             %a.names;
          type               CDATA               #IMPLIED       >
<!ELEMENT rs            - -  (%phrase.seq)                      >
<!ATTLIST rs                 %a.global;
                             %a.names;
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.4.2 Addresses

The simplest way of encoding an address is to regard it as a series of distinct lines, just as they might be printed on an envelope. The following elements support this view:

Alternatively, an address may be encoded as a structure composed of the following elements, which constitute the addrPart element class:

Any number of addrPart elements may appear within an address and in any order. None of them is required. Where code letters are commonly used in addresses (for example, to identify regions or countries) a useful practice is to supply the full name of the region or country as the content of the element, but to supply the abbreviatory code as the value of the global n attribute, so that (for example) an application preparing formatted labels can readily find the required information. Other components of addresses should be represented using the general-purpose <name> element.

Some examples follow:

<address>
<addrLine>110 Southmoor Road, </>
<addrLine>Oxford, OX2 6RB,</>
<addrLine>UK</>
</address>

The above address could also be represented as follows:

<address>
<street>110 Southmoor Road</street>
<name type=city n=OX>Oxford</name>
<postCode>OX2 6RB</postCode>
<name type=country n=UK>United Kingdom</name>
</address>

The order of elements within an address is highly culture-specific, and is therefore unconstrained:

<address>
<name type=org>Università di Bologna</name>
<name type=country n=I>Italy</name>
<postCode>40126</postCode>
<name type=city>Bologna</name>
<street>via Marsala 24</street>
</address>

For further discussion of ways of regularizing the names of places, see section 6.4. A full postal address may also include the name of the addressee, tagged as above using the general-purpose <name> element. When the additional tag set for names and dates is enabled, more specific elements such as <publisher> or <org> may be used, as further discussed in chapter 20.

The <address> element and its components are formally described as follows:

<!-- 6.4.2:  Addresses and their components                   -->
<!ELEMENT address       - O  (addrLine+ | (%m.addrPart)*)       >
<!ATTLIST address            %a.global;                         >
<!ELEMENT addrLine      - o  (%phrase.seq)                      >
<!ATTLIST addrLine           %a.global;                         >
<!ELEMENT street        - o  (%phrase.seq)                      >
<!ATTLIST street             %a.global;                         >
<!ELEMENT postCode      - o  (#PCDATA)                          >
<!ATTLIST postCode           %a.global;                         >
<!ELEMENT postBox       - o  (#PCDATA)                          >
<!ATTLIST postBox            %a.global;                         >
<!-- Other components of addresses should be represented      -->
<!-- using the general purpose NAME element                   -->

<!-- This fragment is used in sec. 6.12                       -->

6.4.3 Numbers and Measures

This section describes two elements provided for the simple encoding of numbers and measures and gives some indication of circumstances in which this may usefully be done. The following phrase level elements are provided for this purpose:

Like names or abbreviations, numbers can occur virtually anywhere in a text. Numbers are special in that they can be written with either letters or digits (`twenty-one', `xxi', and `21') and their presentation is language-dependent (e.g. English `5th' becomes Greek `5.'; English `123,456.78' equals French `123.456,78').

For many kinds of application, e.g. natural-language processing or machine translation, numbers are not regarded as `lexical' in the same way as other parts of a text. For these and other applications, the <num> element provides a convenient method of distinguishing numbers from the surrounding text. For other kinds of application, numbers are only useful if normalized: here the <num> element is useful precisely because it provides a standardized way of representing a numerical value.

For example:

<num value='33'>xxxiii</num>
<num type=cardinal value='21'>twenty-one</num>
<num type=percentage value='10'>ten percent</num>
<num type=percentage value='10'>10%</num>
<num type=ordinal value='5'>5th</num>
<num type=fraction value='0,5'>one half</num>
<num type=fraction value='0,5'>1/2</num>
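
To suggest how an application might make use of such normalized values, the following Python sketch converts the string held in a value attribute into an exact number. It is a minimal sketch, not a definitive treatment: the function name numeric_value is hypothetical, and it assumes that either a comma or a full stop may serve as the decimal separator and that no thousands separators are present.

```python
from fractions import Fraction


def numeric_value(value):
    """Convert the string from a value attribute into an exact Fraction.

    Assumptions: a comma or a full stop may be used as the decimal
    separator, and the string contains no thousands separators.
    """
    return Fraction(value.strip().replace(",", "."))


# The written forms differ, but the normalized values can be compared.
print(numeric_value("33") == 33)
print(numeric_value("0,5") == numeric_value("0.5"))
```

With values normalized in this way, `xxxiii' and `33', or `one half' and `1/2', can be treated as the same quantity regardless of how they are written in the text.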

The word `measure' is used here to refer to a special kind of referring string, the referent of which is a `virtual object'. In its fullest form, a measure consists of a number, a phrase expressing units of measure and a phrase expressing the commodity being measured. Not all of these components need be present in every case. For some applications, particularly quantitative ones, the internal components of measure need to be marked so that their values can be calculated. Thus, in order to evaluate a monetary measure according to some standard, it is necessary to mark its currency unit (e.g. US dollars, pounds sterling). Similarly, the expression `2 ounces' will have a different meaning when it is associated with `flour' from that which it has when associated with `water'.

Such applications will require the elements discussed in chapter 20, or the more powerful analytical tools discussed in chapter 16. Elsewhere, it may be sufficient simply to encode measures as such, perhaps also indicating their numeric content with the <num> element, as in the following examples:

<l>I've measured it from side to side
<l>'Tis
<measure type=length reg='0.9144m'>
    <num value=3>three</num>
    feet
</measure>
long, and
<measure type=length reg='0.6096m'>
   <num value=2>two</num>
   feet
</measure>
wide.
</l>

As the above example also demonstrates, the <measure> element is a member of the class names, like other referring strings, and may thus bear a reg attribute to indicate a normalized value. It may also carry a key attribute to indicate a database key value, as in the following example:

<list>
<item><measure key=BH2 type=volume>
         <num value=2>ii</num>
          bags hops
       </measure>
<item> <measure key=TW6 type=volume>
         <num value=6>six</num>
        trusses Woolen and linen goods
       </measure>
<item><measure  key=WC5 type=weight>
        5 tonnes coale
      </measure>
<!-- ... -->
</list>

These elements are formally defined as follows:

<!-- 6.4.3:  Numbers and measures                             -->
<!ELEMENT num           - -  (%phrase.seq;)                     >
<!ATTLIST num                %a.global;
          value              CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT measure       - -  (%phrase.seq;)                     >
<!ATTLIST measure            %a.global;
                             %a.names;
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.4.4 Dates and times

Dates and times, like numbers, can appear in widely varying culture- and language-dependent forms, and can pose similar problems in automatic language processing. The following elements are provided to identify them:

Dates can occur virtually anywhere in a text, but in some contexts (e.g. bibliographic citations) their encoding is recommended or required rather than optional. Times can also appear anywhere but are generally optional.

Partial dates or times (e.g. `1990', `September 1990', `twelvish') can be expressed in the value attribute by simply omitting a part of the value supplied. Imprecise dates or times (for example `early August', `some time after ten and before twelve') may be expressed as date or time ranges. If either end of the date or time range is known to be accurate (for example, `at some time before 1230', `a few days after Hallowe'en'), the exact attribute may be used to specify this.

Where the certainty (i.e. reliability) of the date or time itself is in question, rather than its precision, the encoder should record this fact using the mechanisms discussed in chapter 17.

These mechanisms are useful primarily for fully specified dates or times known with certainty. If component parts of dates or times are to be marked up, or if a more complex analysis of the meaning of a temporal expression is required, the techniques described in chapter 20 should be used in preference to the simple method outlined here.

Examples:

<date value='1980-02-21'>21 Feb 1980</date>

Given on the <date value='1977-06-12'>Twelfth Day of June
in the Year of Our Lord One Thousand Nine Hundred and
Seventy-seven of the Republic the Two Hundredth and first
and of the University the Eighty-Sixth.</date>

<date value='1990'>1990</date>

<date value='1990-09'>September 1990</date>

Those five years — <dateRange from=1918 to=1923>
1918 to 1923</dateRange> — had been, he suspected,
somehow very important.

The Eddic poems are preserved in a unique
manuscript (Codex Regius 2365) from
<dateRange from=1250 to=1300>
the second half of the thirteenth
century</dateRange>, and <title>Hervarar
saga</title> dates from <date value=1300>
around 1300</date>.
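
Partial values such as those in the examples above can be handled by software quite simply. The following Python sketch reads a value and reports which parts were supplied. It assumes, on the evidence of the examples only, that values take the form YYYY, YYYY-MM or YYYY-MM-DD; the function name parse_date_value is hypothetical.

```python
import re

# Assumed format for value attributes: YYYY, YYYY-MM or YYYY-MM-DD.
DATE_VALUE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_date_value(value):
    """Return (year, month, day); parts omitted from the value are None."""
    match = DATE_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized date value: {value!r}")
    return tuple(
        int(part) if part is not None else None for part in match.groups()
    )


for value in ("1980-02-21", "1990-09", "1990"):
    print(value, parse_date_value(value))
```

Returning None for the omitted parts keeps the distinction between a partial date and a fully specified one, which would be lost if missing parts were silently filled in with a default such as the first of the month.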

These elements are formally defined as follows:

<!-- 6.4.4:  Dates and times                                  -->
<!ELEMENT date          - -  (%phrase.seq;)                     >
<!ATTLIST date               %a.global;
          value              CDATA               #IMPLIED
          calendar           CDATA               #IMPLIED
          certainty          CDATA               #IMPLIED       >
<!ELEMENT dateRange     - O  (%phrase.seq;)                     >
<!ATTLIST dateRange          %a.global;
          to                 CDATA               #IMPLIED
          exact              (to | from | both | none) 
                                                 #IMPLIED
          from               CDATA               #IMPLIED
          calendar           CDATA               #IMPLIED       >
<!ELEMENT time          - -  (%phrase.seq;)                     >
<!ATTLIST time               %a.global;
          value              CDATA               #IMPLIED
          type               (am | pm | 24hour | descriptive) 
                                                 #IMPLIED
          zone               CDATA               #IMPLIED       >
<!ELEMENT timeRange     - -  (%phrase.seq;)                     >
<!ATTLIST timeRange          %a.global;
          to                 CDATA               #IMPLIED
          exact              (to | from | both | none) 
                                                 #IMPLIED
          from               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.4.5 Abbreviations and Their Expansions

It is sometimes desirable to mark abbreviations in the copy text, whether to trigger special processing for them, to provide the full form of the word or phrase abbreviated, or to allow for different possible expansions of the abbreviation. Abbreviations may be transcribed as they stand, or expanded; they may be left unmarked, or marked using these tags:

The <abbr> element is useful as a means of distinguishing semi-lexical items such as acronyms or jargon:

We can sum up the above discussion as follows:  the identity of a
<abbr>CC</abbr> is defined by that calibration of values which
motivates the elements of its <abbr>GSP</abbr>; ...

Every manufacturer of <abbr>3GL</abbr> or <abbr>4GL</abbr>
languages is currently nailing on <abbr>OOP</abbr> extensions.

The type attribute may be used to distinguish types of abbreviation by their function, and the expan attribute may be used to supply an expansion:

 <abbr type=title>Dr.</abbr> <abbr type=initial>M.</abbr> Deegan
 is the Director of the <abbr expan='Computers in Teaching Initiative'
 type=acronym>CTI</abbr> Centre for Textual Studies.

Abbreviations such as `Dr M' above may be treated as two abbreviations, as above, or as one:

 <abbr>Dr. M.</abbr> Deegan is the Director of the
 <abbr>CTI</abbr> Centre for Textual Studies.

This element is particularly useful when transcribing manuscript materials in which abbreviation is very frequent. For example:

<l>Ex<abbr type=brevigraph expan='per'
           resp=PG>&per;</abbr>ience, thogh noon auctoritee
<l>Were in this world, is right ynogh for me
<l>To speke of wo that is in mariage;

Here the entity reference &per; has been used to represent the common manuscript symbol `crossed-p', and its expansion supplied in the associated <abbr> tag. The same lines might be transcribed, expanded, as follows:

<l>Ex<expan type=brevigraph abbr='&per;'
          resp=PG>per</expan>ience, thogh noon auctoritee
<l>Were in this world, is right ynogh for me
<l>To speke of wo that is in mariage;

In practice, it may be most convenient to transcribe the abbreviation as an entity reference; this allows the entity reference itself to be expanded either as an <abbr> or as an <expan> element, depending on the processing to be done at the moment. (For further discussion of such documentation, see section 25.4.3.) The text shown here:

<l>Ex&per;ience, thogh noon auctoritee
<l>Were in this world, is right ynogh for me
<l>To speke of wo that is in mariage;

may be expanded as desired by providing the appropriate choice between the two entity declarations:

<!ENTITY per "<abbr type=brevigraph expan='per'
              resp=PG>&p.crossed;</abbr>" >
<!ENTITY per "<expan type=brevigraph abbr='&p.crossed;'
              resp=PG>per</expan>" >

For further discussion of manuscript abbreviations, see chapter 18.

These elements are formally defined as follows:

<!-- 6.4.5:  Abbreviations                                    -->
<!ELEMENT abbr          - -  (%phrase.seq;)                     >
<!ATTLIST abbr               %a.global;
          cert               CDATA               #IMPLIED
          expan              CDATA               #IMPLIED
          resp               IDREF               %INHERITED
          type               CDATA               #IMPLIED       >
<!ELEMENT expan         - -  (%phrase.seq;)                     >
<!ATTLIST expan              %a.global;
          cert               CDATA               #IMPLIED
          abbr               CDATA               #IMPLIED
          resp               IDREF               %INHERITED
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.5 Simple Editorial Changes

As in editing a printed text, so in encoding a text in electronic form, it may be necessary to accommodate editorial comment on the text and to render account of any changes made to the text in preparing it. The tags described in this section may be used to record such editorial interventions, whether made by the encoder, by the editor of a printed edition used as a copy text, by earlier editors, or by the copyists of manuscripts.

The tags described here handle most common types of editorial intervention and stereotyped comment; where less structured commentary of other types is to be included, it should be marked using the <note> element described in section 6.8. Systematic interpretive annotation is also possible using the various methods described in chapter 14. The examples given here illustrate only simple cases of editorial intervention; in particular, they permit economical encoding of two alternative readings of a text only. To encode more than two views of any one segment of text, the mechanisms described in chapters 14 and 19 must be used.

The first two pairs of elements here discussed (<sic> and <corr>, <reg> and <orig>) may both be used to record simultaneously a text in its `original', uncorrected and unaltered form and also in an `edited' form. In this way they resemble the pair <abbr> and <expan>, described in section 6.4.5. Such paired elements enable software to move automatically from one `view' of the text to the other.

Three categories of editorial intervention are discussed in this section:

A more extended treatment of the use of these tags in transcriptional and editorial work is given in chapter 18.

6.5.1 Correction of Apparent Errors

When the copy text is manifestly faulty, an encoder or transcriber may elect simply to correct it without comment. For scholarly purposes, it will often be more generally useful to record both the correction and the original state of the text. The elements described here enable this to be done in such a way as not to distract the reader.

The following examples show alternative treatment of the same material. The copy text reads: ``Another property of computer-assisted historical research is that data modelling must permit any one textual feature or part of a textual feature to be a part of more than one information model and to allow the researcher to draw on several such models simultaneously, for example, to select from a machine-readable text those marginal comments which indicate that the date's mentioned in the main body of the text are incorrect.''

An encoder may choose to correct the typographic error, either silently or with an indication that a correction has been made, as follows:

 ... marginal comments which indicate that the
 <corr>dates</corr> mentioned in
 the main body of the text are incorrect.

Alternatively, the encoder may simply record the typographic error without correcting it, either without comment or with a <sic> element to indicate the error is not a transcription error in the encoding:

 ... marginal comments which indicate that the
 <sic>date's</sic> mentioned in
 the main body of the text are incorrect.

If the encoder elects both to record the original source text and to provide a correction for the sake of word-search and other programs, either <sic> or <corr> may be used with the appropriate attribute:

 ... marginal comments which indicate that the
 <sic corr='dates' resp=MSM>date's</sic> mentioned in
 the main body of the text are incorrect.

 ... marginal comments which indicate that the
 <corr sic="date's" resp='MSM'>dates</corr> mentioned in
 the main body of the text are incorrect.

If both readings are given, the choice between <sic> and <corr> is largely a question of individual preference; since both record the same information, either may be mechanically transformed into the other. If the original reading contains SGML tags, it will prove more convenient to use <sic> than <corr> (and vice versa if there are tags within the corrected reading), since SGML tags are not recognized in attribute values. If both readings contain subordinate tags, then recourse must be had to the methods described in chapter 19.
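
The mechanical transformation mentioned above can be illustrated with a short program. The following Python sketch is not part of these Guidelines; the function name sic_to_corr is hypothetical, and the code assumes that the content of each <sic> element holds no nested tags, that no attribute value contains a `>' character, and that no value contains both kinds of quotation mark.

```python
import re

# Matches <sic ...>text</sic> where the content holds no nested tags.
SIC = re.compile(r"<sic\b([^>]*)>([^<]*)</sic>")
# Matches name='value', name="value", or an unquoted name=value token.
ATTR = re.compile(r"""(\w+)\s*=\s*(?:'([^']*)'|"([^"]*)"|([^\s>]+))""")


def quote(value):
    """Quote an attribute value, using double quotes if it holds an apostrophe."""
    return f'"{value}"' if "'" in value else f"'{value}'"


def sic_to_corr(text):
    """Rewrite each <sic corr=...> element as the equivalent <corr sic=...>."""
    def swap(match):
        attrs = {}
        for name, single, double, bare in ATTR.findall(match.group(1)):
            attrs[name] = single or double or bare
        if "corr" not in attrs:
            return match.group(0)  # no correction recorded: leave untouched
        corrected = attrs.pop("corr")
        # The original reading moves from element content into the sic attribute.
        new_attrs = {"sic": match.group(2), **attrs}
        rendered = " ".join(f"{k}={quote(v)}" for k, v in new_attrs.items())
        return f"<corr {rendered}>{corrected}</corr>"
    return SIC.sub(swap, text)
```

Applied to the <sic> encoding of the example above, such a function would be expected to yield the <corr> encoding, with the original reading quoted in double quotation marks because it contains an apostrophe. The reverse transformation could be written in the same way.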

The cert attribute on the <sic> and <corr> elements permits a statement of the degree of editorial confidence in a particular correction. For example, using a confidence scale of one to ten, an editor may indicate the conjectural status of a correction by assigning a value to this attribute of less than ten. In the following instance, some uncertainty is expressed concerning a commonly-accepted emendation:

An <corr sic='Antony' cert=8>Autumn</corr> it was,
That grew the more by reaping
See further the discussion in section 18.1.3 .

Where the correction takes the form of adding text, the encoder must choose whether to use the <corr> (or <sic> ) tag, the <add> tag (see section 6.5.3 below), or the more detailed facilities provided by the additional tag set for primary source description. The following discussion may be helpful when making this decision:

The formal definition of these elements is as follows:

<!-- 6.5.1:  Editorial tags for correction                    -->
<!ELEMENT sic           - -  (%specialPara;)                    >
<!ATTLIST sic                %a.global;
          cert               CDATA               #IMPLIED
          corr               CDATA               #IMPLIED
          resp               IDREF               %INHERITED     >
<!ELEMENT corr          - -  (%specialPara;)                    >
<!ATTLIST corr               %a.global;
          cert               CDATA               #IMPLIED
          sic                CDATA               #IMPLIED
          resp               CDATA               %INHERITED     >
<!-- This fragment is used in sec. 6.12                       -->

6.5.2 Regularization and Normalization

When the source text makes extensive use of variant forms or non-standard spellings, it may be desirable for a number of reasons to regularize it: that is, to provide `standard' or `regularized' forms equivalent to the non-standard forms. [ see note 44 ]

As with other such changes to the copy text, the changes may be made silently (in which case the TEI header should specify the types of silent changes made) or may be explicitly marked using the following elements:

Typical applications for these elements include the production of editions intended for student or lay readers, linguistic research in which spelling or usage variation is not the main question at issue, production of spelling dictionaries, etc.

Consider this 16th-century text: ``how godly a dede it is to overthrowe so wicked a race the world may judge: for my part I thinke there canot be a greater sacryfice to God.''

An encoder may choose to preserve the original spelling of this text, but simply flag it as nonstandard by using the <orig> element with no attributes specified, as follows:

how godly a <orig>dede</orig> it is to
<orig>overthrowe</orig> so wicked a race the
world may judge:  for my part I <orig>thinke</orig>
there <orig>canot</orig> be a greater
<orig>sacryfice</orig> to God.

Alternatively, the encoder may simply indicate that certain words have been modernized by using the <reg> element with no attributes specified, as follows:

how godly a <reg>deed</reg> it is to
<reg>overthrow</reg> so wicked a race the
world may judge:  for my part I <reg>think</reg>
there <reg>cannot</reg> be a greater
<reg>sacrifice</reg> to God.

More usefully, the encoder may elect to record both old and new spellings, so that (for example) the same electronic text may serve as the basis of an old- or new-spelling edition:

how godly a <reg orig='dede'>deed</reg> it is to
<reg orig='overthrowe'>overthrow</reg> so wicked a race the
world may judge:  for my part I <reg orig='thinke'>think</reg>
there <reg orig='canot'>cannot</reg> be a greater
<reg orig='sacryfice'>sacrifice</reg> to God.

Alternatively, the <orig> tag might be preferred:

how godly a <orig reg='deed'>dede</orig> it is to
<orig reg='overthrow'>overthrowe</orig> so wicked a race the
world may judge:  for my part I <orig reg='think'>thinke</orig>
there <orig reg='cannot'>canot</orig> be a greater
<orig reg='sacrifice'>sacryfice</orig> to God.
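
As an illustration of how software might derive an old- or new-spelling text from such an encoding, the following Python sketch extracts either reading. It is offered only as an example and is not part of these Guidelines: the function name view is hypothetical, attribute values are assumed to be quoted, the orig or reg attribute is assumed to come first in the tag, and element content is assumed to hold no nested tags.

```python
import re

# A quoted attribute value: group 1 (single quotes) or group 2 (double quotes).
ATTR_VALUE = r"""(?:'([^']*)'|"([^"]*)")"""
REG = re.compile(rf"<reg\s+orig\s*=\s*{ATTR_VALUE}[^>]*>([^<]*)</reg>")
ORIG = re.compile(rf"<orig\s+reg\s*=\s*{ATTR_VALUE}[^>]*>([^<]*)</orig>")


def view(text, which):
    """Return the 'original' or 'regularized' reading of an encoded passage."""
    if which not in ("original", "regularized"):
        raise ValueError(which)

    def pick(attr_is_original):
        def replace(m):
            attr_value = m.group(1) if m.group(1) is not None else m.group(2)
            content = m.group(3)
            # Use the attribute when it holds the reading that was asked for.
            use_attr = (which == "original") == attr_is_original
            return attr_value if use_attr else content
        return replace

    text = REG.sub(pick(True), text)    # <reg orig=...>: attribute is original
    text = ORIG.sub(pick(False), text)  # <orig reg=...>: attribute is regularized
    return text
```

Either style of encoding, or a mixture of the two, yields the same pair of readings, which reflects the point that the two styles record the same information.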

The resp attribute should be used to specify the agency responsible for the regularization. This may be an identifiable individual, for example an editor, or a descriptive phrase such as `copyist'. For example, in the first stanza of the Old Norse poem Grógaldr, the manuscript form `dura' is usually regularized in modern editions to `dyra' (`doors'). The manuscript's ``vek ek þik dauðra dura'' might thus be recorded together with its regularization in two ways, as follows:

 vek ek þik dauðra <reg orig='dura' resp=ed>dyra</reg>
or:
 vek ek þik dauðra <orig reg='dyra' resp=ed>dura</orig>

These elements are formally defined as follows:

<!-- 6.5.2:  Editorial tags for regularization                -->
<!ELEMENT reg           - -  (%phrase.seq;)                     >
<!ATTLIST reg                %a.global;
          orig               CDATA               #IMPLIED
          resp               CDATA               #IMPLIED       >
<!ELEMENT orig          - -  (%phrase.seq;)                     >
<!ATTLIST orig               %a.global;
          reg                CDATA               #IMPLIED
          resp               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.5.3 Additions, Deletions and Omissions

The following elements are used to indicate when words or phrases have been omitted from, added to, or marked for deletion from, a text. Like the other editorial elements, they allow for a wide range of editorial practices:

Encoders may choose to omit parts of the copy text for reasons ranging from illegibility of the source or impossibility of transcribing it, to editorial policy, e.g. a systematic exclusion of poetry or prose from an encoding. The full details of the policy decisions concerned should be documented in the TEI Header (see section 5.3 ). Each place in the text at which omission has taken place should be marked with a <gap> element, with optionally further information about the reason for the omission, its extent and the person or agency responsible for it, as in the following examples:

 <gap desc="Prose commentary" reason="sampling"
       extent="120 lines" resp=PR>
 ... Their arrangement with respect to Jupiter and to each
 other was as follows:
 <gap desc="diagram" reason="sampling" extent="2 cm x 1 col">
 That is, there were two stars on the easterly side and one
 to the west; ...
 <gap reason="illegible" desc="ink blot" extent="two words">
 <gap reason='overwriting, illegible'
       resp='H1'
       extent='8 chars'>

The <add> and <del> elements may be used to record where words or phrases have been added or deleted in the copy text. They are not appropriate where longer passages have been added or deleted, which span several SGML elements; for these, the elements <addSpan> and <delSpan>, or other mechanisms described in chapter 18, must be used.

Additions to a text may be recorded for a number of reasons. Sometimes they are marked in a distinctive way in the source text, for example by brackets or insertion above the line (supralinear insertion), as in the following example, taken from a 19th century manuscript:

The story I am going to relate is true as to its main
facts, and as to the consequences <add
place='supralinear' resp='auth'>of these facts</add>
from which this tale takes its title.

The <add> element should not be used to mark editorial changes, such as supplying a word omitted by mistake from the source text or a passage present in another version. In these cases, either the <corr> or <supplied> tags should be used, as discussed above in section 6.5.1 , and in section 18.1.3 , respectively.

The <unclear> element is used to mark passages in the original which cannot be read with confidence, or about which the transcriber is uncertain for other reasons, as for example when transcribing a partially inaudible or illegible source. Its reason and resp attributes are used, as with the <gap> element, to indicate the cause of uncertainty and the person responsible for the conjectured reading.

For example:

<l>And where the sandy mountain Fenwick scald</>
<l><unclear reason="ink blot" resp=LB>The</unclear> sea between
   yet hence his pray'r prevail'd</>
or from a spoken text:
and then <unclear reason='passing truck'>marbled queen</unclear>
Where the material affected is entirely illegible or inaudible, the <gap> element discussed above should be used in preference.

The <del> element is used to mark material which is deleted in the source but which can still be read with some degree of confidence, as opposed to material which has been omitted by the encoder or transcriber either because it is entirely illegible or for some other reason. This is of particular importance in transcribing manuscript material, though deletion is also found in printed texts, sometimes for humorous purposes:

<l>One day I will sojourn to your shores</>
<l>I live in the middle of England</>
<l>But!</>
<l>Norway! My soul resides in your watery
   <del type='overstrike'>fiords fyords fiiords</del></>
<l>Inlets.</>

The type attribute may be used to distinguish different methods of deletion in manuscript or typescript material, as in this line from the typescript of Eliot's Waste Land :

<l><del type=overtyped>Mein</del> Frisch
   <del type=overstrike>schwebt</del> weht der Wind</>

Deletion in manuscript or typescript is often associated with addition:

<l><del type=overstrike>Inviolable</del>
   <add place=infralinear>Inexplicable</add>
   splendour of Corinthian white and gold</>
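
Given an encoding of this kind, a program can recover the reading of the passage either before or after alteration. The following Python sketch is an illustration only and is not part of these Guidelines. The function name reading is hypothetical; the code assumes that <add> and <del> elements hold no nested tags and are closed with full end-tags, it does not interpret other tags such as <l>, and it collapses line breaks and runs of spaces into single spaces.

```python
import re


def _strip(tag, text):
    """Remove every <tag> element together with its content."""
    return re.sub(rf"<{tag}\b[^>]*>[^<]*</{tag}>", " ", text)


def _unwrap(tag, text):
    """Remove the tags of every <tag> element but keep its content."""
    return re.sub(rf"<{tag}\b[^>]*>([^<]*)</{tag}>", r"\1", text)


def reading(text, state):
    """Return the 'first' (before alteration) or 'final' (after) reading."""
    if state == "final":
        drop, keep = "del", "add"   # deletions vanish, additions stay
    elif state == "first":
        drop, keep = "add", "del"   # additions vanish, deletions stay
    else:
        raise ValueError(state)
    text = _unwrap(keep, _strip(drop, text))
    return " ".join(text.split())   # normalize whitespace
```

For the line above, the first reading would begin with `Inviolable' and the final reading with `Inexplicable'.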

The <del> element should not be used where the deletion is such that material cannot be read with confidence, or read at all, or where the material has been omitted by the transcriber or editor for some other reason. Where the material cannot be read with confidence following deletion, the <unclear> tag should be used with the reason attribute indicating that the difficulty of transcription is due to deletion. Where material has been omitted by the transcriber or editor, this may be indicated by use of the <corr> (or <sic>) and <gap> elements. Observe that the distinction between recommended uses of the <del>, <corr> and <gap> tags parallels the distinction drawn between the <add>, <corr> and <supplied> tags in section 6.5.1 and section 18.1.3:

For any detailed transcription of a manuscript or typescript with more than trivial amounts of alteration, the reader should consult chapter 19 , and chapter 18 .

These elements are formally defined as follows:

<!-- 6.5.3:  Other editorial tags                             -->
<!ELEMENT gap           - O  EMPTY                              >
<!ATTLIST gap                %a.global;
          desc               CDATA               #IMPLIED
          extent             CDATA               #IMPLIED
          resp               IDREF               %INHERITED
          hand               IDREF               %INHERITED
          reason             CDATA               #IMPLIED
          agent              CDATA               #IMPLIED       >
<!ELEMENT add           - -  (%specialPara;)                    >
<!ATTLIST add                %a.global;
          cert               CDATA               #IMPLIED
          place              CDATA               #IMPLIED
          resp               IDREF               %INHERITED
          hand               IDREF               %INHERITED     >
<!ELEMENT del           - -  (%phrase.seq;)                     >
<!ATTLIST del                %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          n                  CDATA               #IMPLIED
          cert               CDATA               #IMPLIED
          resp               IDREF               %INHERITED
          hand               IDREF               %INHERITED
          status             CDATA               'unremarkable'
          rend               CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT unclear       - O  (%paraContent;)                    >
<!ATTLIST unclear            %a.global;
          cert               CDATA               #IMPLIED
          resp               CDATA               %INHERITED
          hand               IDREF               %INHERITED
          reason             CDATA               #IMPLIED
          agent              CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.6 Simple Links and Cross References

Cross-references or links between one location in a document and another, or between one location and several others, may be encoded using the elements <ptr> and <ref> , as discussed in this section. These elements both `point' from one location in a document, the place that the element itself appears, to another (or to several), specified by the target attribute. Linkages of several other kinds are also provided for in these guidelines; see further chapter 14 .

The pointing facility of these elements depends on the ability to supply a unique identifier for any element in the TEI scheme, using the global id attribute. Where the object or objects of a cross-reference are not identifiable in this way, either because they are located in a distinct SGML document or because no id attribute is available, the elements <xptr> or <xref> may be used instead. [ see note 45 ] Alternatively, if no explicit link is to be encoded, but it is simply required to mark the phrase as a cross-reference, the <ref> element may be used without a target attribute.

The elements <ptr> and <ref> share, as members of the element class pointer , the following attributes:

The shared attributes of the two elements may be used in the same way; the difference between the elements is that while the <ptr> element is empty, the <ref> element may contain phrases specifying, or defining more exactly, the target of a cross reference, which form the content of the element. Since its content thus serves as a human-readable pointer, in the simplest case a <ref> element need not identify its target in any other way. For example:

    See <ref>section 12 on page 34</ref>.

More usually, it will be desirable to identify the target of the cross-reference using the target attribute, so that processing software can access it directly, for example to implement a linkage or to generate an appropriate reference. Assuming that section 12 in the previous example has been tagged <div1 id=SEC12> , the same cross reference might more exactly be encoded as

    See especially <ref target=SEC12>section 12 on page 34</ref>.

If the text for the cross reference is to be generated according to a fixed pattern, or if no text is to appear in the body of the cross reference, the <ptr> element would be used as follows:

    See in particular <ptr target=SEC12>.

A cross-reference may point to any number of locations simultaneously, simply by giving more than one identifier as the value of its target attribute. This may be particularly useful where an analytic index is to be encoded, as in the following example:

    <list>
    <item>Saints aid rejected in mel. <ptr target=p299></item>
    <item>Sallets censured <ptr target="p143 p144"></item>
    <item>Sanguine mel. signs <ptr target="p263 p312 p332"></item>
    <item>Scilla or sea onyon, a purger of mel. <ptr target=p442></item>
   <!-- ... -->
    </list>

Here the targets of the cross references are simply page numbers; it is assumed that corresponding elements with identifiers p299, p143, etc. have been provided in the body of the text. If it is desired to check that the target elements are of a particular type, the targType (target type) attribute may be specified:

    <list>
    <item>Saints aid rejected in mel <ptr target=p299 targType=pb></item>
    <item>Sallets censured <ptr target="p143 p144" targType=pb></item>
    <!-- ... -->
    </list>
Here, a processing application can check that the elements with identifiers p299 , p143 , and p144 are all <pb> (page-break) elements. It is a semantic error in a text if the targets given do not match the values specified on a targType attribute.
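
The check described above might be carried out along the following lines. This Python sketch is an illustration only and is not part of these Guidelines: the function name check_targets and both of its parameters are hypothetical, the attributes of the pointer are assumed to have been extracted into a dictionary already, and names are compared without regard to case on the assumption that the SGML declaration in use folds names to a single case.

```python
def check_targets(ptr_attrs, element_types):
    """Return a list of problems found for one <ptr> or <ref> element.

    ptr_attrs:     the pointer's attributes, e.g.
                   {"target": "p143 p144", "targType": "pb"}
    element_types: maps each identifier in the document to the name of
                   the element that bears it, e.g. {"p143": "pb"}
    """
    # Assumption: identifiers and element names are case-insensitive.
    types = {k.lower(): v.lower() for k, v in element_types.items()}
    allowed = {name.lower() for name in ptr_attrs.get("targType", "").split()}
    problems = []
    for ident in ptr_attrs.get("target", "").split():
        found = types.get(ident.lower())
        if found is None:
            problems.append(f"{ident}: no element with this id")
        elif allowed and found not in allowed:
            problems.append(f"{ident}: <{found}> is not a permitted target type")
    return problems
```

An empty list indicates that every target exists and, where targType is given, that each target is of one of the types named.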

The type and resp attributes may be used, as elsewhere, to categorize the cross reference according to any system of importance to the encoder and to supply a code identifying the person or agency responsible for the cross reference. If bibliographic references require special processing (e.g. in order to provide a consistent short-form reference), they might be tagged thus:

 Similar forms, often called <term rend='ldquo
 rdquo'>rewriting systems</term>, have a long
 history among mathematicians, but the specific form
 of <ptr targType=fig target=Fig22> was first
 studied extensively by Chomsky <ptr type=bibliog
 targType='bibl bibl.struct bibl.full' target=Chom59>.
Here type=bibliog signals for the processing appropriate to a bibliographic reference, while ``targType='bibl bibl.struct bibl.full''' restricts the legal targets to bibliographic elements, and target=Chom59 indicates which bibliographic element actually is being referred to. For further discussion of bibliographic references, see section 6.10.3 .

If the order in which the objects of a multi-headed cross reference are specified is of importance, the targOrder (target order) attribute should be specified.

<p>The following discussions of this topic should
be consulted for further information:
<ptr target='ch3 sec332 sec45 sec722' targOrder=Y>

The <ptr> and <ref> tags have many applications in addition to the simple cross-referencing facilities illustrated in this section. In conjunction with the analytic tools discussed in chapters 14 , 15 , and 16 , they may be used to link analyses of a text to their object, to combine corresponding segments of a text, or to align segments of a text with a temporal or other axis or with each other.

These elements are formally defined as follows:

<!-- 6.6:  Simple cross references                            -->
<!ENTITY % a.pointer '
          crdate             CDATA               #IMPLIED
          evaluate           (all | one | none)  #IMPLIED
          resp               CDATA               #IMPLIED
          targOrder          (Y | N | U)         U
          type               CDATA               #IMPLIED
          targType           NAMES               #IMPLIED'      >
<!ELEMENT ptr           - O  EMPTY                              >
<!ATTLIST ptr                %a.global;
                             %a.pointer;
          target             IDREFS              #REQUIRED      >
<!ELEMENT ref           - -  (%paraContent;)                    >
<!ATTLIST ref                %a.global;
                             %a.pointer;
          target             IDREFS              #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.7 Lists

The following elements are provided for the encoding of lists, their constituent items, and the labels or headings associated with them:

The <list> element should be used to mark any kind of list: numbered, lettered, bulleted, or unmarked. Lists formatted as such in the copy text should in general be encoded using this element, with an appropriate value for the type attribute. Lists given as run-on text may also be encoded using this element, where this is felt to be appropriate.

Each distinct item in the list should be encoded as a distinct <item> element. If the numbering or other identification for the items in a list is unremarkable and may be reconstructed by any processing program, no enumerator need be specified. If however an enumerator is retained in the encoded text, it may be supplied either by using the n attribute on the <item> element, or by using a <label> element. The following examples are thus equivalent:

I will add two facts, which have seldom occurred in
the composition of six, or at least of five quartos.
<list type=ordered rend=runon>
<label>(1)</><item>My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<label>(2)</><item>Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>
I will add two facts, which have seldom occurred in
the composition of six, or at least of five quartos.
<list type=ordered rend=runon>
<item n='1'>My first rough manuscript, without any
intermediate copy, has been sent to the press.</>
<item n='2'>Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</>
</list>
The two styles may not be mixed in the same list: if one item is preceded by a label, all must be.
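
This constraint follows from the content model declared for <list> at the end of this section, and may be checked mechanically. The following Python sketch is an illustration only and is not part of these Guidelines; the function name valid_list_content is hypothetical, and the code assumes that the names of the child elements of a <list> have already been collected, in document order, by some parser.

```python
def valid_list_content(children):
    """Check a <list>'s child element names against the content model
    (head?, ((item*) | (headLabel?, headItem?, (label, item)*)))."""
    names = list(children)
    if names and names[0] == "head":
        names = names[1:]                # optional heading for the whole list
    if all(n == "item" for n in names):
        return True                      # unlabelled items only (or no items)
    # Otherwise: optional column headings, then strict label/item pairs.
    for heading in ("headLabel", "headItem"):
        if names and names[0] == heading:
            names = names[1:]
    if len(names) % 2:
        return False                     # an unpaired label or item remains
    pairs = zip(names[0::2], names[1::2])
    return all(pair == ("label", "item") for pair in pairs)
```

A list whose children are label, item, item would thus be rejected, since the second item lacks a label.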

A list need not necessarily be displayed in list format. For example,

On those remote pages it is written that animals are
divided into <list><item n='a'>those that belong to the
Emperor,</><item n='b'> embalmed ones, </><item n='c'> those
that are trained, </><item n='d'> suckling pigs, </><item n='e'>
mermaids,  </><item n='f'> fabulous ones, </><item n='g'> stray
dogs, </><item n='h'> those that are included in this
classification, </><item n='i'> those that tremble as if they
were mad, </><item n='j'> innumerable ones, </><item n='k'> those
drawn with a very fine camel's-hair brush, </><item n='l'>
others, </><item n='m'> those that have just broken a flower
vase, </><item n='n'> those that resemble flies from a
distance.</></list>

A list may be given a heading or title, for which the <head> element should be used, as in the next example, which also demonstrates simple use of the <label> element to mark a tabular or glossary list in which each item is associated with a word or phrase rather than a numeric or alphabetic enumerator:

<list type=gloss>
<head>Report of the conduct and progress of Ernest Pontifex.
Upper Vth form &mdash half term ending Midsummer 1851</head>
<label>Classics</><item>Idle listless and unimproving</>
<label>Mathematics</><item>ditto</>
<label>Divinity</><item>ditto</>
<label>Conduct in house</><item>Orderly</>
<label>General conduct</><item>Not satisfactory, on account
of his great unpunctuality and inattention to duties</>
</list>

In such a list, the individual items have internal structure. In complex cases, where list items contain many components, the list is better treated as a table, on which see chapter 22 . A particularly important instance of the simple two-column table is the `glossary list', which should be marked by the tag <list type=gloss> . In such lists, each <label> element contains a term and each <item> its gloss; it is a semantic error for a list tagged with type=gloss not to have labels. For example:

<list type=gloss>
<head>Unit Three -- Vocabulary</head>
<label lang=la>acerbus, -a, -um      </><item>bitter, harsh</>
<label lang=la>ager, agrī, M.  </><item>field</>
<label lang=la>audiō, īre,
       īvī, ītus   </><item>hear, listen (to)</>
<label lang=la>bellum, -ī, N.  </><item>war</>
<label lang=la>bonus, -a, -um        </><item>good</>
<!-- etc. -->
</list>
Additionally, the <term> and <gloss> elements discussed in section 6.3.4 might be used to make explicit the role that each column in the glossary list has, as follows:
<list type=gloss>
<head>Unit Three -- Vocabulary</head>
<label><term lang=la>acerbus, -a, -um</term></>
<item><gloss>bitter, harsh</gloss></item>
<label><term lang=la>ager, agrī, M. </term></>
<item><gloss>field</gloss></item>
<label><term lang=la>audiō, īre,
       īvī, ītus  </term></>
<item><gloss>hear, listen (to)</gloss></item>
<label><term lang=la>bellum, -ī, N. </term></>
<item><gloss>war</gloss></item>
<label><term lang=la>bonus, -a, -um</term></>
<item><gloss>good</gloss></item>
<!-- etc. -->
</list>
Note in the above examples the use of the global lang attribute to specify on the <label> (or <term> ) element what language the term is from. For further discussion of the lang attribute see section 3.5 , and section 4.2 . A more elaborate markup for this glossary would distinguish the headword forms from the grammatical information (principal parts and gender), using tags described more fully in chapters 13 or 12 .

In addition to the <head> element used to supply a title or heading for the whole list, headings for the two columns of a glossary-style list may be specified using the two special elements <headLabel> and <headItem> :

The simple, straightforward statement of an idea is
preferable to the use of a worn-out expression.
<list type=gloss>
<headLabel>TRITE</>
<headItem>SIMPLE, STRAIGHTFORWARD</>
<label>bury the hatchet  </><item>stop fighting, make peace</>
<label>at loose ends     </><item>disorganized</>
<label>on speaking terms </><item>friendly</>
<label>fair and square   </><item>completely honest</>
<label>at death's door   </><item>near death</>
</list>

The elements <label> , <head> , <headLabel> , and <headItem> may contain only phrase-level elements. The <item> element however may contain paragraphs or other `chunks', including other lists. In this example, a glossary list contains two items, each of which is itself a simple list:

<list type=gloss><label>EVIL</label>
<item><list type=simple>
   <item>I am cast upon a horrible desolate island, void
          of all hope of recovery.</item>
   <item>I am singled out and separated as it were from
         all the world to be miserable.</item>
   <item>I am divided from mankind &mdash a solitaire; one
           banished from human society.</item>
     </list> <!-- end of first nested list --></item>
<label>GOOD</label>
<item><list type=simple>
     <item>But I am alive; and not drowned, as all my
              ship's company were.</item>
     <item>But I am singled out, too, from all the ship's
             crew, to be spared from death...</item>
     <item>But I am not starved, and perishing on a barren place,
            affording no sustenances....</item>
     </list><!-- end of second nested list --></item>
</list><!-- end of glossary list -->

Lists of different types may be nested to arbitrary depths in this way.

The formal declarations for lists and list items are as follows.

<!-- 6.7:  Lists and List Items                               -->
<!ELEMENT list          - -  (head?, ( (item*) | (headLabel?, 
                             headItem?, (label, item)*)))       >
<!ATTLIST list               %a.global;
          type               CDATA               simple         >
<!ELEMENT item          - O  (%specialPara;)                    >
<!ATTLIST item               %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          n                  CDATA               #IMPLIED       >
<!ELEMENT label         - O  (%phrase.seq;)                     >
<!ATTLIST label              %a.global;                         >
<!ELEMENT head          - O  (%paraContent;)                    >
<!ATTLIST head               %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT headLabel     - O  (%phrase.seq;)                     >
<!ATTLIST headLabel          %a.global;                         >
<!ELEMENT headItem      - O  (%phrase.seq;)                     >
<!ATTLIST headItem           %a.global;                         >
<!-- This fragment is used in sec. 6.12                       -->

6.8 Notes, Annotation, and Indexing

6.8.1 Notes and Simple Annotation

The following elements are provided for the encoding of discursive notes, either already present in the copy text or supplied by the encoder:

A note is any additional comment found in a text, marked in some way as being out of the main textual stream. All notes should be marked using the same tag, <note> , whether they appear as block notes in the main text area, at the foot of the page, at the end of the chapter or volume, in the margin, or in some other place.

Notes may be in a different hand or typeface, may be authorial or editorial, and may have been added later. Attributes may be used to specify these and other characteristics of notes, as detailed below.

Where possible, the body of a note should be inserted in the text at the point at which its identifier or mark first appears. This may not be possible for example with marginal notes, which may not be anchored to an exact location. For simplicity, it may be adequate to position marginal notes before the relevant paragraph or other element. In some cases, however, it may be desirable to transcribe notes not at their point of attachment to the text but at their point of appearance (at the end of the volume, or the end of the chapter --- not, in general, when the notes appear at the foot of the page); in this case the target and targetEnd attributes should be used to specify the point of attachment. In some cases, the note is explicitly attached not to a point but to a span of text; for a full discussion of pointing to points and spans in the text, see section 6.6 .

Examples:

<l>The self-same moment I could pray
<l>And from my neck so free
<l>The albatross fell off, and sank
<l>Like lead into the sea.
<note type=auth place=margin>The spell begins to break</note>
Collections are ensembles of distinct entities or objects
of any sort.<note place=foot n=1>We explain below why we use
the uncommon term <mentioned>collection</mentioned>
instead of the expected <mentioned>set</mentioned>.
Our usage corresponds to the <mentioned>aggregate</mentioned> of many
mathematical writings and to the sense of <mentioned>class</mentioned>
found in older logical writings.
</note>
The elements ...

In addition to transcribing notes from the copy text, researchers may wish to annotate the electronic text itself, by attaching analytic notes in some structured vocabulary to particular passages of text, e.g. to specify the topics or themes of a text. The empty <span> element is provided for such applications; it is available only when the additional tag set for simple analysis is selected (see section 15.3 ).

The formal declaration for the <note> element is as follows:

<!-- 6.8.1:  Annotation                                       -->
<!ELEMENT note          - O  (%specialPara;)                    >
<!ATTLIST note               %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          target             IDREFS              #IMPLIED
          place              CDATA               'unspecified'
          targetEnd          IDREFS              #IMPLIED
          resp               CDATA               #IMPLIED
          anchored           (yes | no)          yes
          n                  CDATA               #IMPLIED
          type               CDATA               #IMPLIED       >
<!-- ... declarations from section 6.8.2                      -->
<!--     (Index Entries)                                      -->
<!--     go here ...                                          -->
<!-- This fragment is used in sec. 6.12                       -->

6.8.2 Index Entries

Machine-readable versions of existing texts rarely reproduce any index published with the copy text. Should a printed index be transcribed, the <div1> tag or a <div> tag at an appropriate level should be used to demarcate the index, and the index itself may be transcribed as a structured list or table.

It is convenient, however, to be able to generate a new index from a machine-readable text, whether the text is being written for the first time with the tags here defined or was transcribed from some other source. The <index> tag is provided for this purpose; it may be useful for marking points of particular interest for whatever reason, and not merely for generating printed indexes for a printed version of the text. The <divGen> element indicates the point at which an index, or any other generated text (e.g. a table of contents), is to appear in the output of a text production process.

The tag <index> associates up to four levels of index terms with a specific point in the text. The index terms are supplied in attributes named level1 , level2 , level3 , and level4 . An index attribute associates the entry with a particular index, so multiple indices are possible.

All index terms must be supplied as attribute values; no part of the text itself is taken as a term. This may require words or phrases to be repeated, as illustrated below; it also allows spelling to be normalized, as the example shows:

 
The students understand procedures for Arabic lemmatisation
<index level1='Arabic lemmatization'>and are beginning
to build parsers.

The <divGen> element marks the place at which an index generated from the <index> elements should be inserted into the output of a processing program; typically, this will be at some point within the back matter of the document; its type attribute should be used to specify which index is to be generated, and its n attribute to specify a name for the index:

<back>
<div><head>Examples</head>
<p> ...
</div>
<div><head>Bibliography</head>
<listBibl>
  <bibl> ... </bibl>
</listBibl>
</div>
<divGen type='index 1' n='Index Nominum'>
<divGen type='index 2' n='Index Rerum'  >
</back>
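
As a rough illustration of what a processing program might do when it reaches a <divGen>, the following Python sketch groups index entries by the value of their index attribute and by their level terms. It assumes that the attributes of each <index> element have already been extracted into a dictionary and paired with a location string. The function name build_indexes, the location strings, and the use of an empty string for entries that carry no index attribute are all assumptions made for this example; none of them is defined by these Guidelines.

```python
from collections import defaultdict

# Attribute names for the four levels of index terms, as described above.
LEVELS = ("level1", "level2", "level3", "level4")


def build_indexes(entries):
    """Group (attributes, location) pairs into one term table per index.

    `entries` is an iterable of (attrs, location) pairs, where `attrs` is a
    dict of the attributes found on one <index> element and `location` is a
    caller-supplied string (hypothetical: page, section, or element id).
    """
    indexes = defaultdict(lambda: defaultdict(list))
    for attrs, location in entries:
        terms = []
        for level in LEVELS:
            if level not in attrs:
                break  # stop at the first level that was not supplied
            terms.append(attrs[level])
        # Entries without an index attribute go into the index named ""
        # (an assumption of this sketch).
        indexes[attrs.get("index", "")][tuple(terms)].append(location)
    # Sort the terms of each index so that they can be printed in order.
    return {name: dict(sorted(table.items())) for name, table in indexes.items()}


if __name__ == "__main__":
    sample = [
        ({"level1": "Arabic lemmatization"}, "p. 12"),
        ({"index": "1", "level1": "Ovid", "level2": "Amores"}, "p. 3"),
        ({"level1": "Arabic lemmatization"}, "p. 40"),
    ]
    for name, table in build_indexes(sample).items():
        print(repr(name), table)
```

Because the terms come only from attribute values, the normalized spelling given in level1 is what appears in the generated index, whatever spelling the running text uses.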

The formal declarations for these elements are as follows. The <index> element is a member of the element class metadata and may thus be used anywhere within the <text> element.

<!-- 6.8.2:  Index Entries                                    -->
<!ELEMENT index         - O  EMPTY                              >
<!ATTLIST index              %a.global;
          level1             CDATA               #REQUIRED
          index              CDATA               #IMPLIED
          level3             CDATA               #IMPLIED
          level4             CDATA               #IMPLIED
          level2             CDATA               #IMPLIED       >
<!ELEMENT divGen        - O  EMPTY                              >
<!ATTLIST divGen             %a.global;
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.8.1                      -->

6.9 Reference Systems

By `reference system' we mean the system by which names or references are associated with particular passages of a text (e.g. `Ps. 23:3' for the third verse of Psalm 23 or `Amores 2.10.7' for Ovid's Amores , book 2, poem 10, line 7). Such names make it possible to mark a place within a text and enable other readers to find it again. A reference system may be based on structural units (chapters, paragraphs, sentences; stanza and verse), typographic units (page and line numbers), or divisions created specifically for reference purposes (chapter and verse in Biblical texts). Where one exists, the traditional reference system for a text should be preserved in an electronic transcript of it, if only to make it easier to compare electronic and non-electronic versions of the text.

Reference systems may be recorded in TEI-encoded texts in any of the following ways:

The specific method used to record traditional or new reference systems for a text should be declared in the TEI header, as further described in section 6.9.4 and in chapter 32 .

When a text has no pre-existing associated reference system of any kind, these Guidelines recommend as a minimum that at least the page boundaries of the source text be marked using one of the methods outlined in this section. Retaining page breaks in the markup is also recommended for texts which have a detailed reference system of their own. Line breaks in prose texts may be, but need not be, tagged. [ see note 46 ]

6.9.1 Using the ID and N Attributes

When traditional reference schemes represent a hierarchical structuring of the text which mirrors that of the SGML document, the n attribute defined for all elements may be used to indicate the traditional identifier of the relevant structural units. The n attribute may also be used to record the numbering of sections or list items in the copy text if the copy-text numbering is important for some reason, for example because the numbers are out of sequence.

For example, a traditional reference to Ovid's Amores might be `Amores 2.10.7'---book 2, poem 10, line 7. Book, poem, and line are structural units of the work and will therefore be tagged in any case. (See chapter 9 for a discussion of structural units in verse collections.) In such cases, it is convenient to record traditional reference numbers of the structural units using the n attribute. The relevant tags for our example would be:

<div0 n=Amores type=volume>
    <div1 n='2' type=book>
         <div2 n='10' type=poem>
              <l n='7'> ...

One may also place the entire standard reference for each portion of the text into the appropriate value for the n attribute, though for obvious reasons this takes more space in the file:

<div0 n=Amores type=volume>
    <div1 n='Amores 2' type=book>
         <div2 n='Amores 2.10' type=poem>
              <l n='Amores 2.10.7'> ...

If the names used by the traditional reference system can be formulated as SGML identifiers, then the references can be given as values for the id attribute; this requires that the reference be given without internal spaces, begin with a letter, and contain no characters other than letters, digits, hyphens, and full stops. [ see note 47 ] Unlike values for the n attribute, values for the id attribute must be unique throughout the document. Our example then looks like this:

<div0 id=Amores type=volume>
    <div1 id='Am.2' type=book>
         <div2 id='Am.2.10' type=poem>
              <l id='Am.2.10.7'> ...
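
The constraints just stated can be checked mechanically before traditional references are committed to id attributes. The following Python sketch tests a candidate string against the rules given above (a leading letter, then only letters, digits, hyphens, and full stops) and reports values that occur more than once. The function names are hypothetical, and the restriction to unaccented ASCII letters is an assumption; the SGML declaration in force may permit other name characters or impose a length limit that this sketch does not check.

```python
import re

# A leading letter followed by letters, digits, hyphens, or full stops.
# ASCII letters only: an assumption of this sketch.
_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_usable_as_id(reference):
    """Return True if the reference satisfies the rules stated above."""
    return _ID_PATTERN.fullmatch(reference) is not None


def find_duplicates(references):
    """Return, in order of discovery, each value that occurs more than once."""
    seen = set()
    duplicates = []
    for reference in references:
        if reference in seen and reference not in duplicates:
            duplicates.append(reference)
        seen.add(reference)
    return duplicates


if __name__ == "__main__":
    for candidate in ("Am.2.10.7", "Amores 2.10.7", "2.10.7"):
        print(candidate, is_usable_as_id(candidate))
    print(find_duplicates(["Am.2", "Am.2.10", "Am.2"]))
```

A reference such as `Amores 2.10.7' fails the check because of its internal space, which is why the example above abbreviates it to Am.2.10.7.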

To document the usage and to allow automatic processing of these standard references, it is recommended that the TEI header be used to declare whether standard references are recorded in the n or id attributes and which elements may carry standard references or portions of them. For examples of declarations for the reference systems just shown, see section 6.9.4 .

Using the n attribute, one can specify only a single standard referencing system. This limitation is not without problems, since some editions may define structural units differently and thus create alternative reference systems. For example, another edition of the Amores considers poem 10 a continuation of poem 9, and therefore would specify the same line as `Amores 2.9.31'. In order to record both of these reference systems, one must either use the id or n attribute for one system and record the competing systems in concurrent markup hierarchies, as discussed below in section 31.6 , or else use the <milestone> tags described in section 6.9.3 .

6.9.2 Creating New Reference Systems

If a text has no canonical reference system of its own, a reference system, if needed, may be derived from the structure of the electronic text, specifically from the SGML markup of the text. As with any reference system intended for long-term use, it is important to see the reference as an established, unchanging point in the text. Should the text be revised or rearranged, the reference-system identifiers associated with any bit of text must stay with that bit of text, even if it means the reference numbers fall out of sequence. (A new reference system may always be created beside the old one if out-of-sequence numbers must be avoided.)

The global attributes n and id may be used to assign reference identifiers to segments of the text. Identifiers specified by either attribute apply to the entire element for which they are given. SGML enforces uniqueness on ID attributes within a single document, and ID values must begin with a letter. No such restrictions are made on the values of n attributes.

A convenient method of mechanically generating unique values for id or n attributes based on the SGML structure of the document is to construct, for each element, a domain-style address comprising a series of components separated by full stops, with one component for each level of the SGML document hierarchy. Two methods may be used. In the typed path form of identifier, each component in the identifier takes the form element-type `-' number . The element name specifies which type of element is to be sought, and the number specifies which occurrence of that element type is to be selected. (The hyphen and number may be omitted if there is only one element of the given type.) In the untyped path form of identifier, each component consists of a number, indicating which element in the sequence of nodes at each level is to be selected. A fixed prefix beginning with a letter may be used to make the untyped path legal as an SGML ID value.

Identifiers generated with these methods should use the <text> element as their starting point, rather than the <tei.2> or <body> elements. The <tei.2> element may be taken as a starting point only if identifiers need to be generated for the <teiHeader> , which is not usually the case; using the <body> element as a root would prevent assignment of identifiers for the front and back matter. The component corresponding to the root element can be omitted from identifiers, if no confusion will result. In collections and corpora, the component corresponding to the root may be replaced by the unique identifier assigned to the text or sample.

In the following example, each element within the <text> element has been given a typed-path identifier as its id value, and an untyped-path identifier as its n value; the latter are prefixed with the string `AB', which may be imagined to be the general identifier for this text.

   <text id='text-1' n='AB'>
     <front id='front' n='AB.1'>
       <div       id='front.div-1'     n='AB.1.1'><p> ... </div>
       <titlePage id='front.titlePage' n='AB.1.2'>
                 <titlePart> ... </titlePart>       </titlePage>
       <div       id='front.div-2'     n='AB.1.3'><p> ... </div>
     </front>
     <body id='body' n='AB.2'>
       <p id='body.p-1' n='AB.2.1'> ... </p>
       <p id='body.p-2' n='AB.2.2'> ... </p>
       <div id='body.div-1' n='AB.2.3'>
         <head id='body.div-1.head' n='AB.2.3.1'> ... </head>
         <p    id='body.div-1.p-1'  n='AB.2.3.2'> ... </p>
         <p    id='body.div-1.p-2'  n='AB.2.3.3'> ... </p>
         </div>
       <div id='body.div-2' n='AB.2.4'>
         <head id='body.div-2.head' n='AB.2.4.1'> ... </head>
         <p    id='body.div-2.p-1'  n='AB.2.4.2'> ... </p>
         <p    id='body.div-2.p-2'  n='AB.2.4.3'> ... </p>
         </div>
     </body>
  </text>
The typed and untyped path methods are convenient, but are in no way required for anyone creating a reference system.
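
The two path forms can be generated in a single pass over the document tree. The following Python sketch is one possible reading of the method, under stated assumptions: the text has been converted to well-formed XML so that the standard library parser can read it, the component for the root <text> element is omitted from the typed path, and "only one element of the given type" is taken to mean only one among the children of the same parent. The function name generate_paths and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def generate_paths(root, prefix):
    """Return (element name, typed path, untyped path) for every descendant.

    The root element contributes no component to the typed path; the
    untyped path starts from `prefix`, which should begin with a letter
    if the result is to be usable as an SGML ID.
    """
    rows = []

    def walk(parent, typed_parent, untyped_parent):
        totals = Counter(child.tag for child in parent)
        seen = Counter()
        for position, child in enumerate(parent, start=1):
            seen[child.tag] += 1
            component = child.tag
            if totals[child.tag] > 1:
                # Several siblings share this element type: add the number.
                component = f"{child.tag}-{seen[child.tag]}"
            typed = f"{typed_parent}.{component}" if typed_parent else component
            untyped = f"{untyped_parent}.{position}"
            rows.append((child.tag, typed, untyped))
            walk(child, typed, untyped)

    walk(root, "", prefix)
    return rows


# A hypothetical well-formed XML skeleton with the same shape as the example.
SAMPLE = """<text>
  <front><div><p/></div><titlePage><titlePart/></titlePage><div><p/></div></front>
  <body><p/><p/>
    <div><head/><p/><p/></div>
    <div><head/><p/><p/></div>
  </body>
</text>"""

if __name__ == "__main__":
    for tag, typed, untyped in generate_paths(ET.fromstring(SAMPLE), "AB"):
        print(f"{tag:10} id={typed:28} n={untyped}")
```

Identifiers produced this way depend on the current shape of the tree, so they should be generated once and then stored; regenerating them after the text has been rearranged would break the stability that a reference system requires.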

If the id attribute is used to record the reference identifiers generated, each value should record the entire path. If the n attribute is used, each value may record either the entire path or only the subpath from the SGML parent element. The attribute used, the elements which can bear standard reference identifiers, and the method for constructing standard reference identifiers, should all be declared in the header as described in section 5.3.5 .

When the hierarchy of the SGML-encoded document and that of the reference system differ (e.g. for reference systems based on page and line numbers) or when more than one reference system is to be encoded, the encoder may choose to represent the alternative reference system(s) as elements in one or more concurrent document hierarchies. For an introduction to the concept of concurrent hierarchies, see the discussion of the CONCUR feature in section 2.5.2 . For further discussion of this and other mechanisms, see chapter 31 .

6.9.3 Milestone Tags

If concurrent markup is not desired (e.g. because the available SGML parser does not support the CONCUR feature), or if the desired reference system does not correspond to any particular structural hierarchy, it may be more convenient to mark up changes in the reference system by using one or more of the following milestone elements:

These elements simply mark the points in a text at which some category in a reference system changes. They have no content but subdivide the text into regions, rather in the same way as milestones divide a road into segments. The elements <pb> , <cb> , and <lb> are provided to mark specific types of milestone, namely page, column, and line boundaries, as further described in chapter 18 . No SGML validation of a reference system based on <milestone> tags is possible, so it will be the responsibility of the encoder or the application software to ensure that milestone tags occur in a correct order.

Milestone tags may be useful where a text has two competing structures. For example, many English novels were first published as serial works, individual parts of which do not always contain a whole number of chapters. An encoder may decide to represent the chapter-based structure using <div1> elements, with <milestone> elements to mark the points at which individual parts end; or the reverse. Thus, an encoding in which chapters are regarded as more important than parts might encode some work in which chapter three begins in part one and is concluded in part two as follows:

<text><body>
  <milestone unit=part>
  <div1 n=1 type=chapter>
  <!-- text of chapter 1 here -->
  </div1>
  <div1 n=2 type=chapter>
  <!-- text of chapter 2 here -->
  </div1>
  <div1 n=3 type=chapter>
  <!-- part of text of chapter 3 here -->
  <milestone unit=part>
  <!-- remainder of text of chapter 3 here -->
  </div1>
</body></text>
An encoding of the same work in which parts are regarded as more important than chapters might begin as follows:
 
<text><body>
  <div1 n=1 type=part>
  <milestone unit=chapter>
  <p><!-- text of chapter 1 here -->
  <milestone unit=chapter>
  <p><!-- text of chapter 2 here -->
  <milestone unit=chapter>
  <p><!-- part of text of chapter 3 here -->
  </div1><div1 n=2 type=part>
  <p><!-- remainder of text of chapter 3 here -->
  <milestone unit=chapter>
  <!-- ... -->
</body></text>

Milestone tags also make it possible to record the reference systems used in a number of different editions of the same work. The reference system of any one edition can be recreated from a text in which all are marked by simply ignoring all elements that do not specify that edition on their ed attribute.

As a simple example, assuming that edition E1 of some collection of poems regards the first two poems as constituting the first book, while edition E2 regards the first poem as prefatory, a markup scheme like the following might be adopted:

 <milestone ed=E1 unit=work>
 <milestone ed=E2 unit=work>
 <milestone ed=E1 unit=book>
 <milestone ed=E1 unit=poem>
 <milestone ed=E2 unit=poem>
  <!-- text of first poem here -->
 <milestone ed=E2 unit=book>
 <milestone ed=E1 unit=poem>
 <milestone ed=E2 unit=poem>
  <!-- text of second poem here -->

In this case no n value is specified, since the numbers rise predictably and the application can keep a count from the start of the document, if desired.
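
The counting described here can be sketched as follows in Python. The sketch assumes that the text has already been reduced to a sequence of events, each either a milestone (with its ed and unit values) or a stretch of text; this event representation, the function name references_for_edition, and the rule that a new larger unit restarts the count of every smaller unit are assumptions of the example, not requirements of these Guidelines.

```python
# Assumed nesting of units, outermost first, taken from the example above.
UNIT_ORDER = ["work", "book", "poem"]


def references_for_edition(events, edition):
    """Pair each stretch of text with the unit counts in force for `edition`.

    `events` is a list of ("milestone", ed, unit) and ("text", label) tuples.
    Milestones that specify a different edition are ignored.
    """
    counters = {}
    located = []
    for event in events:
        if event[0] == "text":
            located.append((event[1], dict(counters)))
            continue
        _, ed, unit = event
        if ed != edition:
            continue  # belongs to some other edition's reference system
        counters[unit] = counters.get(unit, 0) + 1
        # A new larger unit restarts the count of every smaller unit.
        for smaller in UNIT_ORDER[UNIT_ORDER.index(unit) + 1:]:
            counters.pop(smaller, None)
    return located


# The milestone sequence from the example above.
EVENTS = [
    ("milestone", "E1", "work"),
    ("milestone", "E2", "work"),
    ("milestone", "E1", "book"),
    ("milestone", "E1", "poem"),
    ("milestone", "E2", "poem"),
    ("text", "first poem"),
    ("milestone", "E2", "book"),
    ("milestone", "E1", "poem"),
    ("milestone", "E2", "poem"),
    ("text", "second poem"),
]

if __name__ == "__main__":
    for edition in ("E1", "E2"):
        print(edition, references_for_edition(EVENTS, edition))
```

For edition E1 the second poem is counted as poem 2 of book 1, whereas for edition E2 it is poem 1 of book 1, the first poem having preceded any book milestone for that edition.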

The value of the n attribute may but need not include the identifiers used for any larger sections. That is, either of the following styles is legitimate:

 <milestone ed=E1 unit=work n='Amores'>
 <milestone ed=E1 unit=book n=1>
 <milestone ed=E1 unit=poem n=1>
  <!-- text of Amores 1.1 -->
 <milestone ed=E1 unit=poem n=2>
  <!-- text of Amores 1.2 -->
 <milestone ed=E1 unit=poem n=3>
or
 <milestone ed=E1 unit=work n='Amores'>
 <milestone ed=E1 unit=book n=1>
 <milestone ed=E1 unit=poem n='1.1'>
 <!-- text of Amores 1.1 -->
 <milestone ed=E1 unit=poem n='1.2'>
  <!-- text of Amores 1.2 -->
 <milestone ed=E1 unit=poem n='1.3'>

When using <milestone> tags, line numbers may be supplied for every line or only periodically (every fifth, every tenth line). The latter may be simpler; the former is more reliable.

The style of numbering used in the values of n is unrestricted: for the example above, I.i , I.ii , and I.iii could have been used equally well if preferred. The special value unnumbered should be reserved for marking sections of text which fall outside the normal numbering system (e.g. chapter heads, poem numbers, titles, or speaker attributions in a verse drama).

Because the ed attribute is unrestricted, no change need be made to the document type declaration of a file before adding tags to describe a new reference system. (The value of ed may be restricted to a defined set of edition symbols by using the techniques described in chapter 29 .)

See below, section 6.9.4 , for examples of declarations for the reference systems just shown.

The milestone elements are formally defined as follows:

<!-- 6.9.3:  Milestone tags                                   -->
<!ELEMENT milestone     - O  EMPTY                              >
<!ATTLIST milestone          %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          ed                 CDATA               #IMPLIED
          n                  CDATA               #IMPLIED
          unit               CDATA               #REQUIRED      >
<!ELEMENT pb            - O  EMPTY                              >
<!ATTLIST pb                 %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          ed                 CDATA               #IMPLIED
          n                  CDATA               #IMPLIED       >
<!ELEMENT lb            - O  EMPTY                              >
<!ATTLIST lb                 %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          ed                 CDATA               #IMPLIED
          n                  CDATA               #IMPLIED       >
<!ELEMENT cb            - O  EMPTY                              >
<!ATTLIST cb                 %a.analysis;
                             %a.linking;
                             %a.terminology;
          id                 ID                  #IMPLIED
          lang               IDREF               %INHERITED
          rend               CDATA               #IMPLIED
          ed                 CDATA               #IMPLIED
          n                  CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 6.12                       -->

6.9.4 Declaring Reference Systems

Whatever kind of reference system is used in an electronic text, it is recommended that the TEI header contain a description of its construction in the <refsDecl> element described in section 5.3.5 . As described there, the declaration may consist either of a formal declaration using the <step> tag or an informal description in prose. The former is recommended because, unlike prose, it can be processed by software.

The three examples given in section 6.9.1 would be declared as follows. The first example encodes the standard references for Ovid's Amores one level at a time, using the n attribute on the <div0> , <div1> , <div2> , and <l> tags. The header for such an encoding should look something like this:

<teiHeader>
    <fileDesc> ...
    </fileDesc>
    <encodingDesc>
         ...
         <refsDecl>
              <step refunit='work' delim=' '
                    from='DESCENDANT (1 DIV0 N %1)' to='DITTO' >
              <step refunit='book' delim='.'
                    from='CHILD (1 DIV1 N %1)' to='DITTO' >
              <step refunit='poem' delim='.'
                    from='CHILD (1 DIV2 N %1)' to='DITTO' >
              <step refunit='line'
                    from='CHILD (1 L N %1)' to='DITTO' >
         </refsDecl>
         ...
    </encodingDesc>
</teiHeader>
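
To show how such a declaration might be used, the following Python sketch splits a canonical reference into its components, one per step, consuming the string up to each step's delimiter in turn. It covers only the partitioning of the reference string, not the resolution of the from and to expressions, and it assumes that the delim value marks the end of the component handled by that step. The function name split_reference and the list-of-pairs representation of the steps are hypothetical.

```python
def split_reference(reference, steps):
    """Split a canonical reference into one component per declared step.

    `steps` is a list of (refunit, delim) pairs in declaration order; a
    delim of None means the step takes whatever remains of the string.
    """
    parts = {}
    rest = reference
    for refunit, delim in steps:
        if delim is None:
            parts[refunit] = rest
            rest = ""
            continue
        head, separator, rest = rest.partition(delim)
        if not separator:
            raise ValueError(
                f"delimiter {delim!r} for {refunit!r} not found in {reference!r}"
            )
        parts[refunit] = head
    return parts


# The steps declared above, reduced to their refunit and delim values.
AMORES_STEPS = [("work", " "), ("book", "."), ("poem", "."), ("line", None)]

if __name__ == "__main__":
    print(split_reference("Amores 2.10.7", AMORES_STEPS))
```

Each component would then be substituted into the from expression of its step to locate the <div0> , <div1> , <div2> , or <l> element whose n value matches.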

The second example encodes the same reference system, again using the n attribute on the <div0> , <div1> , <div2> , and <l> tags, but giving the reference string in full on each tag. If canonical references are made only to lines, the reference system could be declared as follows:

 <refsDecl>
   <step refunit='line'
         from='DESCENDANT (1 L N %1)' to='DITTO' >
 </refsDecl>
Since no delimiter is specified, the entire canonical reference string is sought as the value of the n attribute on an <l> element.

In order to handle references to works, books, and poems as well as to individual lines, the declaration for the reference system must be more complicated:

 <refsDecl>
   <step from='DESCENDANT (1 (DIV[012]|L) N %1)' to='DITTO' >
 </refsDecl>
This declaration indicates that the entire reference string must be sought as the value of the n attribute on a <div0> , <div1> , <div2> , or <l> element.

The third example encodes the same reference system, this time giving the entire reference string as the value of the id attribute on the relevant tags. The reference system declaration for such an encoding would be:

<refsDecl>
    <step from="ID (%1)" to="DITTO">
</refsDecl>

As in the previous example, no single value can be given for the refunit attribute in this declaration, as the single step handles references to works, books, and poems, as well as to lines. The type attribute on the <div0> , <div1> , and <div2> elements may be used, however, to indicate the type of the result returned from a match.

Reference systems recorded by means of milestone tags can also be declared; the following prose description could be used to declare the example given in section 6.9.3 .

<refsDecl>
    <p>Standard references to work, book, poem, and line may be
    constructed from the Milestone tags in the text.
</refsDecl>
Alternatively, the reference scheme derived from edition E1 may be declared formally, as follows:
<refsDecl>
    <state delim=' ' unit=work ed='E1'>
    <state delim='.' unit=book ed='E1'>
    <state delim='.' unit=poem ed='E1'>
    <state unit=line ed='E1'>
</refsDecl>

This is synonymous with the following declaration using the <step> element:

<refsDecl>
    <step refunit='work' delim=' '
          from='DESCENDANT (1 MILESTONE ED E1 UNIT work N %1)'
          to=  'FOLLOWING (1 MILESTONE ED E1 UNIT work)' >
    <step refunit='book' delim='.'
          from='DESCENDANT (1 MILESTONE ED E1 UNIT book N %2)'
          to=  'FOLLOWING (1 MILESTONE ED E1 UNIT book)' >
    <step refunit='poem' delim='.'
          from='DESCENDANT (1 MILESTONE ED E1 UNIT poem N %3)'
          to=  'FOLLOWING (1 MILESTONE ED E1 UNIT poem)' >
    <step refunit='line'
          from='DESCENDANT (1 MILESTONE ED E1 UNIT line N %4)'
          to=  'FOLLOWING (1 MILESTONE ED E1 UNIT line)' >
</refsDecl>

6.10 Bibliographic Citations and References

Bibliographic references (that is, full descriptions of bibliographic items such as books, articles, films, broadcasts, songs, etc.) or pointers to them may appear at various places in a TEI text. They are required at several points within the TEI Header's source description, as discussed in section 5.2.7 ; they may also appear within the body of a text, either singly (for example, within a footnote) or collected together in a list as a distinct part of a text.

In printed texts, the individual constituents of a bibliographic reference are conventionally marked off from each other and from the flow of text by such features as bracketing, italics, special punctuation conventions, underlining, etc. In electronic texts, such distinctions are also important, whether in order to produce acceptably formatted output or to facilitate intelligent retrieval processing, [ see note 48 ] quite apart from the need to distinguish the reference itself as a textual object with particular linguistic properties.

It should be emphasized that for references as for other textual features, the primary or sole consideration is not how the text should be formatted when it is printed. The distinctions permitted by the scheme outlined here may not necessarily be all that particular formatters or bibliographic styles require, although they should prove adequate to the needs of many such commonly used software systems. [ see note 49 ] The features distinguished and described below (in section 6.10.2 ) constitute a set which has been useful for a wide range of bibliographic purposes and in many applications, and which moreover corresponds to a great extent with existing bibliographic and library cataloguing practice. For a fuller account of that practice as applied to electronic texts see section 5.2.7 ; for a brief mention of related library standards see section 5.7 .

6.10.1 Elements of Bibliographic References

The following elements are used to mark individual bibliographic references as wholes, or in groups:

These elements all share a number of possible component sub-elements. For the <bibl> and <biblStruct> elements, exactly the same sub-elements are concerned, and they are described together in section 6.10.2 ; for the <biblFull> element, the sub-elements concerned are fully described in section 5.2 .

Different levels of specific tagging may be appropriate in different situations. In some cases, it may be felt necessary to mark just the extent of the reference itself, with perhaps a few distinctions being made within it (for example, between the part of the reference which identifies a title or author and the rest). Such references, containing a mixture of text with specialized bibliographic elements, are regarded as <bibl> elements, and tagged accordingly. For example:

<p>A book which had a great influence on him
was <bibl>Tufte's <title>Envisioning
Information</title></bibl>, although he may
never have actually read it.
Indeed, some encoders may find it unnecessary to mark the bibliographic reference at all:
<p>A book which had a great influence on him
was Tufte's <title>Envisioning Information</title>,
although he may never have actually read it.

Some bibliographic references are extremely elliptical, often only a string of the form `Baxter, 1983'. If no further details of Baxter's book are given in the source text and none are supplied by the encoder, then the reference thus given should be tagged as a <bibl> :

All of this is of course much more fully treated
in <bibl>Baxter, 1983</bibl>.
In general, however, normal modern bibliographic practice, and these Guidelines, distinguish between a bibliographic reference, which is a self-sufficient description of a bibliographic item, and a bibliographic pointer, which is a short-form citation (such as `Baxter, 1983') which serves usually as a place-holder or pointer to a full long-form reference found elsewhere in the text. The usual encoding of short-form references such as `Baxter, 1983' is not as <bibl> elements but as cross-references to such elements; see section 6.10.3 below.

In cases where the encoder wishes to impose more structure on the bibliographic information, for example to make sure it conforms to a particular style-sheet or retrieval processor, the <biblStruct> element should be used. Note that several of the features in this and later examples are explained later in the current section.

<biblStruct>
    <monogr>
    <author>Edward R. Tufte</>
    <title>Envisioning Information</>
    <imprint>
         <pubPlace>Cheshire, Conn.</>
         <publisher>Graphics Press</>
         <date>1990</>
    </imprint></monogr>
</biblStruct>
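
The benefit of the added structure for a retrieval processor can be seen in a small Python sketch, which reads the monograph-level fields of a <biblStruct> into a dictionary. It assumes that the entry has been converted to well-formed XML with full end tags, since the standard library parser does not accept the SGML short end tag used above; the function name monograph_fields and the choice of fields are hypothetical.

```python
import xml.etree.ElementTree as ET

# The entry above, rewritten as well-formed XML (an assumption of this sketch).
BIBL = """<biblStruct>
  <monogr>
    <author>Edward R. Tufte</author>
    <title>Envisioning Information</title>
    <imprint>
      <pubPlace>Cheshire, Conn.</pubPlace>
      <publisher>Graphics Press</publisher>
      <date>1990</date>
    </imprint>
  </monogr>
</biblStruct>"""


def monograph_fields(bibl_struct):
    """Return the monograph-level fields of a <biblStruct> element.

    A field that is absent from the entry is returned as None.
    """
    monogr = bibl_struct.find("monogr")
    return {
        "author": monogr.findtext("author"),
        "title": monogr.findtext("title"),
        "pubPlace": monogr.findtext("imprint/pubPlace"),
        "publisher": monogr.findtext("imprint/publisher"),
        "date": monogr.findtext("imprint/date"),
    }


if __name__ == "__main__":
    print(monograph_fields(ET.fromstring(BIBL)))
```

The same lookup could not be written so simply for a loosely tagged <bibl> , in which untagged text and bibliographic elements may be mixed in any order.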

The highest level of detail and the most complex structure supported by the current proposals is provided by the <biblFull> element, which closely resembles the <fileDesc> element of the TEI Header (section 5.2 ).

<biblFull>
    <titleStmt>
         <title>Envisioning Information</title>
         <author>Tufte, Edward R[olf]</author>
      </titleStmt>
    <extent>126 pp.</extent>
    <publicationStmt>
         <publisher>Graphics Press</publisher>
         <pubPlace>Cheshire, Conn. USA</pubPlace>
         <date>1990</date>
    </publicationStmt>
</biblFull>

A list of bibliographic items, of whatever kind, may be treated in the same way as any other list (see section 6.7 ). Alternatively, the specialized <listBibl> element may be used. The difference between the two is that a <list> contains <item> elements, within which bibliographic elements (<bibl> , <biblStruct> or <biblFull> ) may appear, as well as other phrase- and paragraph-level elements, whereas the <listBibl> may contain only bibliographic elements, optionally preceded by a heading and a series of introductory paragraphs. The former would be appropriate for a list of bibliographic elements in which descriptive prose predominated, and the latter for a more formal bibliography. The following are thus both legal encodings of a list of bibliographic entries: a <listBibl> :

<listBibl>
<head>Bibliography</head>
<biblStruct id=NEL80>
    <analytic>
         <author>Nelson, T. H.</>
         <title>Replacing the printed word:
                a complete literary system.</>
    </analytic>
    <monogr>
         <title>Information Processing '80:  Proceedings of the IFIPS
                Congress, October 1980</>
         <editor>Simon H. Lavington</>
         <imprint>
              <publisher>North-Holland</>
              <pubPlace>Amsterdam</>
              <date>1980</>
         </imprint>
         <biblScope>pp 1013-23
    </monogr>
    <note>Apparently a draft of section 4 of <title>Literary
          Machines</title>.</note>
</biblStruct>
<bibl id=NEL88>Ted Nelson:  <title>Literary Machines</title>
    (privately published, 1987)</bibl>
<bibl id=BAX88>
    <author>Baxter, Glen</author>
    <title>Glen Baxter His Life: the years of struggle</title>
    London: Thames and Hudson, 1988.
    </bibl>
</listBibl>
or a simple <list> :
<list><head>Bibliography</>
<item><bibl id=NEL80>
        <author>Nelson, T. H.</>
        <title level=a>Replacing the printed word:
             a complete literary system.</>
        <title level=m>Information Processing '80:
             Proceedings of the IFIPS Congress, October 1980</>
        <editor>Simon H. Lavington</>
        <publisher>North-Holland</>
        <pubPlace>Amsterdam</>
        <date>1980</>
        <biblScope>pp 1013-23
        <note>Apparently a draft of section 4 of <title>Literary
         Machines</title>.</note>
    </bibl></item>
<item><bibl id=NEL88>Ted Nelson:  <title>Literary Machines</title>
    (privately published, 1987)</bibl></item>
<item><bibl id=BAX88>
    <author>Baxter, Glen</author>
    <title>Glen Baxter His Life: the years of struggle</title>
    London: Thames and Hudson, 1988.
    </bibl></item>
</list>

The formal declarations for these elements are as follows:

<!-- 6.10.1:  Tags for Bibliographic References               -->
<!ELEMENT bibl          - o  (#PCDATA | %m.phrase; | 
                             %m.biblPart;)*                     >
<!ATTLIST bibl               %a.global;
                             %a.declarable;                     >
<!ELEMENT biblStruct    - o  (analytic?, (monogr, series*)+, 
                             (note | idno)*)                    >
<!ATTLIST biblStruct         %a.global;
                             %a.declarable;                     >
<!ELEMENT biblFull      - o  (titleStmt, editionStmt?, extent?, 
                             publicationStmt, seriesStmt?, 
                             notesStmt?, sourceDesc*)           >
<!ATTLIST biblFull           %a.global;
                             %a.declarable;                     >
<!ELEMENT listBibl      - -  (head?, (bibl | biblStruct | 
                             biblFull)+, trailer?)              >
<!ATTLIST listBibl           %a.global;
                             %a.declarable;                     >
<!-- (continued in sec. 6.10.2.1, 6.10.2.2, 6.10.2.3)         -->
<!-- This fragment is used in sec. 6.12                       -->

6.10.2 Components of Bibliographic References

This section discusses a number of very commonly occurring component elements of bibliographic references. They fall into four groups:

The following sections describe the elements which may be used to represent such information within a <bibl> or <biblStruct> element. Within the former, any or all of these may be used, in any order. Within the latter, whichever of these elements are present for a given reference must be distinguished, and must also be presented in a specific order, discussed further below (section 6.10.2.6).

6.10.2.1 Analytic, Monographic, and Series Levels

In common library practice a clear distinction is made between an individual item within a larger collection and a free-standing book, journal, or collection. Similarly a book in a series is distinguished sharply from the series within which it appears. An article forming part of a collection which itself appears in a series thus has a bibliographic description with three quite distinct levels of information:

  1. the analytic level, giving the title, author, etc., of the article;
  2. the monographic level, giving the title, editor, etc., of the collection;
  3. the series level, giving the title of the series, possibly the names of its editors, etc., and the number of the volume within that series.
In the same way, an article in a journal requires at least two levels of information: the analytic level describing the article itself, and the monographic level describing the journal.

These three levels may be distinguished within a <bibl> element, and must be distinguished within a <biblStruct> element if present, by means of the <analytic>, <monogr>, and <series> elements.

For purposes of TEI encoding, journals and anthologies are both treated as monographs; a journal title will thus be tagged either <title level=j> or <monogr><title> ... </title> ... </monogr>. Individual articles in the journal or collected texts should be treated at the `analytic' level. When an article has been printed in more than one journal or collection, the bibliographic reference may have more than one <monogr> element, each possibly followed by one or more <series> elements. A <series> element always relates to the most recently preceding <monogr> element. (Whether reprints of an article are treated in the same bibliographic reference or a separate one varies among different styles. Library lists typically use a different entry for each publication, while academic footnoting practice typically treats all publications of the same article in a single entry.)

For example, the article cited in this example has been published twice, once in a journal and once in a collection which appeared in a series:

    <biblStruct>
         <analytic>
              <author>Thaller, Manfred</author>
              <title level=a>A Draft Proposal for a Standard for the
                     Coding of Machine Readable Sources</>
         </analytic>
         <monogr>
              <!-- In -->
              <title level=j>Historical Social Research</title>
              <imprint>
                    <biblScope type=vol>40</>
                    <date>October 1986</>
                    <biblScope type=pages>3-46</>
              </imprint>
         </monogr>
         <monogr>
              <!-- Rpt. in -->
              <title level=m>Modelling Historical Data:
                     Towards a Standard for Encoding and Exchanging
                     Machine-Readable Texts</title>
              <editor>Daniel I. Greenstein</editor>
              <imprint>
                   <pubPlace>St. Katharinen</pubPlace>
                   <publisher>Max-Planck-Institut für Geschichte
                         In Kommission bei
                         Scripta Mercaturae Verlag</publisher>
                   <date>1991</date>
              </imprint>
         </monogr>
         <series>
              <title level=s>Halbgraue Reihe
                   zur Historischen Fachinformatik</title>
              <respStmt><resp>Herausgegeben von</resp>
                    <name type=person>Manfred Thaller</>
                    <name type=org>Max-Planck-Institut für
                        Geschichte</>
              </respStmt>
              <title level=s>Serie A:  Historische Quellenkunden</>
              <biblScope>Band 11</biblScope>
         </series>
    </biblStruct>

Punctuation may not appear between the elements within a structured bibliographic entry; if punctuation is to be given explicitly in the encoding, it must be contained within the elements it delimits. As the example shows, it is possible to encode the entry without any inter-element punctuation: this facilitates use of the <biblStruct> element in systems which can render bibliographic references in any of several styles.
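
The rule that a <series> element relates to the most recently preceding <monogr> element can be sketched in a few lines of Python. This is only an illustration, not part of the TEI scheme: the function name group_series and the input format (a list of element-name and label pairs, assumed to have been extracted already from one <biblStruct>) are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def group_series(children: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Attach each series to the most recently preceding monogr.

    `children` is a hypothetical, already-parsed list of
    (element name, label) pairs for the monogr and series children
    of a single biblStruct, in document order.
    """
    groups: List[Dict[str, Any]] = []
    for name, label in children:
        if name == "monogr":
            # Each monogr starts a new group with no series yet.
            groups.append({"monogr": label, "series": []})
        elif name == "series":
            if not groups:
                raise ValueError("series with no preceding monogr")
            # A series always belongs to the latest monogr seen so far.
            groups[-1]["series"].append(label)
        else:
            raise ValueError(f"unexpected element: {name}")
    return groups
```

Applied to the example above, the journal publication would end up with no series, and the reprint in the collection would carry the single series that follows it.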

The formal declarations for the elements defined in this section are as follows:

<!-- 6.10.2.1:  Tags for Bibliographic References (cont'd)    -->
<!-- (continuation of sec. 6.10.1)                            -->
<!ELEMENT analytic      - O  (author | editor | respStmt | 
                             title)*                            >
<!ATTLIST analytic           %a.global;                         >
<!ELEMENT monogr        - O  ( ( ((author | editor | 
                             respStmt)+, title+, (editor | 
                             respStmt)*) | (title+, (author | 
                             editor | respStmt)*))?, (note | 
                             meeting)*, (edition, (editor | 
                             respStmt)*)*, imprint, (imprint | 
                             extent | biblScope)* )             >
<!ATTLIST monogr             %a.global;                         >
<!ELEMENT series        - O  (title | editor | respStmt | 
                             biblScope | #PCDATA)*              >
<!ATTLIST series             %a.global;                         >

6.10.2.2 Authors, Titles, and Editors

Bibliographic references typically begin with a statement of the title being cited and the names of those intellectually responsible for it. For articles in journals or collections, such statements should appear both for the analytic and for the monographic level. The following elements are provided for tagging such elements:

In bibliographic references, all titles should be tagged as such, whether analytic, monographic, or series titles. The single element <title> is used for all these cases. When it appears directly within an <analytic>, <monogr>, or <series> element, <title> is interpreted as belonging to the appropriate level. When it appears elsewhere, its level attribute should be used to signal its bibliographic level. It is a semantic error to give a value for the level attribute which is inconsistent with the context; such values may be ignored. The level value a implies the analytic level; the values m, j, and u imply the monographic level; the value s implies the series level. Note, however, that the semantic error occurs only if the nested title is directly enclosed by the <analytic>, <monogr>, or <series> element; if it is enclosed only indirectly, no semantic error need be present. For example, the analytic title may contain a monographic title:

<biblStruct>
    <analytic><author>Lucy Allen Paton</>
              <title>Notes on Manuscripts
                   of the <title level=m>Prophécies
                   de Merlin</title></title>
    </analytic>
    <monogr><title level=j>PMLA</>
          <imprint><biblScope type=vol>8</>
            <date>1913</date>
            <biblScope type=pages>122</>
      </imprint>
    </monogr>
</biblStruct>
In this case, the analytic title ``Notes on Manuscripts of the Prophécies de Merlin'' needs no level attribute because it is directly contained by the <analytic> element; the monographic title contained within it, ``Prophécies de Merlin,'' does not create a semantic error because it is not directly contained by the <analytic> element.
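
The consistency rule for the level attribute can be expressed as a small check. The following Python sketch is illustrative only; the names IMPLIED_LEVELS and level_conflicts are hypothetical, and the check assumes that the name of the directly enclosing element and the level value (if any) have already been extracted from the document.

```python
from typing import Optional

# Levels implied by each structural container, as stated in the text:
# a for analytic; m, j, and u for monographic; s for series.
IMPLIED_LEVELS = {
    "analytic": {"a"},
    "monogr": {"m", "j", "u"},
    "series": {"s"},
}


def level_conflicts(parent: str, level: Optional[str]) -> bool:
    """Return True if `level` contradicts the level implied by `parent`.

    `parent` is the name of the element that directly encloses the title.
    A title with no level value, or one whose direct parent is not a
    structural container (for example another title), never conflicts.
    """
    if level is None or parent not in IMPLIED_LEVELS:
        return False
    return level.lower() not in IMPLIED_LEVELS[parent]
```

In the PMLA example, the inner title has level m but its direct parent is <title>, not <analytic>, so level_conflicts("title", "m") reports no conflict; level_conflicts("analytic", "m") would.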

In some bibliographic applications, it may prove useful to distinguish main titles from subordinate titles, parallel titles, etc. The type attribute is provided to allow this distinction to be recorded.

The following reference, from a national standard for bibliographic references, [ see note 50 ] illustrates this type of analysis with its distinction between main and subordinate titles.

<bibl>Saarikoski, Pirkko-Liisa, and Paavo Suomalainen,
<title level=a type=main>Studies on the physiology of
the hibernating hedgehog, 15</title>
<title level=a type=subordinate>Effects of seasonal
and temperature changes on the in vitro
glycerol release from brown adipose tissue</>
<title level=j>Ann. Acad. Sci. Fenn., Ser. A4</>
<date>1972</> <biblScope type='vol: pp'>187: 1-4</>
</bibl>

Slightly more complex is the distinction made below among main, subordinate, and parallel titles, in an example from the same source (p. 63). The punctuation and the bibliographic analysis are those given in ANSI Z39.29-1977; the punctuation is in the style prescribed by the International Standard Bibliographic Description (ISBD). [ see note 51 ]

<bibl>Tchaikovsky, Peter Ilich.
<title level=m type=main>The swan lake ballet</>
= <title level=m type=parallel>Le lac des cygnes</>
: <title level=m type=subordinate>grand ballet en 4 actes</>
: <title level=m type=subordinate>op. 20</>
[Score].
New York:  Broude Brothers; [1951] (B.B. 59). vi, 685 p.
</bibl>

The elements <author> and <editor> have, for printed books and articles, a fairly obvious significance; for other kinds of bibliographic items their proper usage may be less obvious. The <author> element should be used for the person or agency with primary responsibility for a work's intellectual content, and the element <editor> for an editor of the work. Thus an organization such as a radio or television station is usually accounted `author' of a broadcast, for example, while the author of a Government report will usually be the agency which produced it.

For anyone else with responsibility for the work, the <respStmt> element should be used. The nature of the responsibility is indicated by means of a <resp> element, and the person, organization etc. responsible by a <name> element. At least one of each of these must be given within the <respStmt> element, followed optionally by any number of either. Examples of secondary responsibility of this kind include the roles of illustrator, translator, editor, annotator. The <respStmt> element may also be used for editors, if it is desired to record the specific terms in which their role is described.
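
The requirement that a <respStmt> contain at least one <resp> and at least one <name> is stated in the prose; the declaration (resp | name)+ given at the end of this section does not by itself enforce it. An application wishing to check the stronger rule might do something like the following Python sketch, in which the function name resp_stmt_ok and the input format (a list of child element names in document order) are hypothetical.

```python
from typing import List


def resp_stmt_ok(children: List[str]) -> bool:
    """Check the prose rule for the children of one respStmt.

    `children` is a hypothetical, already-extracted list of the names
    of the child elements. The rule: only resp and name may occur, and
    at least one of each must be present.
    """
    if any(child not in ("resp", "name") for child in children):
        return False
    return "resp" in children and "name" in children
```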

Examples of <author> and <editor> may be found in sections 6.10.1 and 6.10.2.1; wherever <author> and <editor> may occur, the <respStmt> element may also occur. When one of these elements precedes or immediately follows a title, it applies to that title; when it follows an <edition> element or occurs within an edition statement, it applies to the edition in question.

In this example, the <respStmt> elements apply to the work as a whole, not merely to the first edition:

 <bibl>
    <author>Lominadze, D. G.</>
    <title level=m>Cyclotron waves in plasma.</title>
    <respStmt><resp>translated by</> <name>A. N. Dellis;</name>
              <resp>edited by</> <name>S. M. Hamberger.</name>
    </respStmt>
    <edition>1st ed.</>
    <imprint><pubPlace>Oxford:</> <publisher>Pergamon Press,</>
       <date>1981.</></imprint>
    <extent>206 p.</>
    <title level=s>International series in natural philosophy.</>
    <note place=inline>Translation of:
      <title level=m lang=RU>Ciklotronnye volny v plazme.</></>
</bibl>

In this example, by contrast, the <respStmt> element applies to the edition, and not to the collection per se (Moser and Tervooren were not responsible for the first thirty-five printings); the elements of the reference have been reordered from their appearance on the title page of the volume in order to ensure the correct relationship of the collection title, the edition statement, and the statement of responsibility.

<biblStruct>
<monogr>
    <title>Des Minnesangs Frühling</title>
    <note place=inline>Mit 1 Faksimile</note>
    <edition>36., neugestaltete und erweiterte Auflage</edition>
    <respStmt>
         <resp>Unter Benutzung der Ausgaben
         von <name>Karl Lachmann</> und <name>Moriz Haupt</>,
         <name>Friedrich Vogt</> und <name>Carl von Kraus</>
         bearbeitet von
         </resp>
         <name>Hugo Moser</>
         <!-- und -->
         <name>Helmut Tervooren</>
    </respStmt>
    <imprint>
         <biblScope type=volume>I</>
         <biblScope type='volume title'>Texte</>
         <pubPlace>Stuttgart</>
         <publisher>S. Hirzel Verlag</>
         <date>1977</>
    </imprint>
</monogr>
</biblStruct>
With the exception of the <name> element (for which see section 6.4), the elements described in this section are defined as follows:
<!-- 6.10.2.2:  Tags for Bibliographic References (cont'd)    -->
<!-- (continuation of sec. 6.10.1)                            -->
<!ELEMENT author        - o  (%phrase.seq)                      >
<!ATTLIST author             %a.global;                         >
<!ELEMENT editor        - o  (%phrase.seq)                      >
<!ATTLIST editor             %a.global;
          role               CDATA               editor         >
<!ELEMENT respStmt      - o  (resp | name)+                     >
<!ATTLIST respStmt           %a.global;                         >
<!ELEMENT resp          - o  (%phrase.seq;)                     >
<!ATTLIST resp               %a.global;                         >
<!ELEMENT title         - o  (%paraContent)                     >
<!ATTLIST title              %a.global;
          level              (a | m | j | s | u) #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT meeting       - -  (%paraContent)                     >
<!ATTLIST meeting            %a.global;                         >

6.10.2.3 Imprint, Pagination, and Other Details

By `imprint' is meant all the information relating to the publication of a work: the person or organization by whose authority and in whose name a bibliographic entity such as a book is made public or distributed (whether a commercial publisher or some other organization), the place of publication, and a date. It may also include a full address for the publisher or organization. Full bibliographic references usually specify either the number of pages in a print publication (or equivalent information for non-print materials), or the specific location of the material being cited within its containing publication. The following elements are provided to hold this information:

For bibliographic purposes, usually only the place (or places) of publication is required, possibly including the name of the country, rather than a full address; the element <pubPlace> is provided for this purpose. Where, however, the full postal address is likely to be of importance in identifying or locating the bibliographic item concerned, it may be supplied and tagged using the <address> element described in section 6.4.2. Alternatively, if desired, the <rs> or <name> elements described in section 6.4.1 may be used; this involves no claim that the information given is either a full address or the name of a city.

The name of the publisher of an item should be marked using the <publisher> tag even if the item is made public (`published') by an organization other than a conventional publisher, as is frequently the case with technical reports:

<biblStruct>
<monogr>
    <author>Nicholas, Charles K.</author>
    <author>Welsch, Lawrence A.</author>
    <title>On the interchangeability of SGML and ODA</title>
    <imprint>
         <pubPlace>Gaithersburg, MD</pubPlace>
         <publisher>National Institute of Standards and Technology
         </publisher>
         <date value='1992-01'>January 1992</date>
    </imprint>
    <extent>19 pp.</extent>
</monogr>
<idno type='NIST'>NISTIR 4681</idno>
</biblStruct>
and with dissertations:
<biblStruct>
<monogr>
    <author>Hansen, W.</>
    <title level=u>Creation of hierarchic text
         with a computer display</>
    <note place=inline>Ph.D. dissertation</>
    <imprint>
         <publisher>Dept. of Computer Science, Stanford Univ.</>
         <pubPlace>Stanford, CA</>
         <date value='1971-06'>June 1971</>
    </imprint>
</monogr>
</biblStruct>

When an item has been reprinted, especially reprinted without change from a specific earlier edition, the reprint may appear in a <monogr> element with only the <imprint> and other details of the reprint. In the following example, a microform reprint has been issued without any change in the title or authorship. The series statement here applies only to the second <monogr> element.

<biblStruct>
<monogr>
    <author>Shirley, James</>
    <title type=main>The gentlemen of Venice</>
    <title type=subordinate>a tragi-comedie presented at
         the private house in Salisbury Court by
         Her Majesties servants</>
    <note place=inline>[Microform]</>
    <imprint>
         <pubPlace>London</>
         <publisher>H. Moseley</>
         <date>1655</>
    </imprint>
    <extent>78 p.</>
</monogr>
<monogr>
    <imprint>
         <pubPlace>New York</>
         <publisher>Readex Microprint</>
         <date>1953</>
    </imprint>
   <extent>1 microprint card, 23 x 15 cm.</>
</monogr>
<series>
    <title>Three centuries of drama:  English, 1642-1700</>
</series>
 </biblStruct>

A bibliographic description, particularly for an analytic title, will often include some additional information specifying its location, for example as a volume number, page number, range of page numbers, or name or number of a subdivision of the host work. The element <biblScope> may be used to identify such information if it is present. Where it is desired to distinguish different classes of such information (volume number, page number, chapter number, etc.), the type attribute may be used with any convenient typology.

When the item being cited is a journal article, the <imprint> element describing the issue in which it appeared will typically contain <biblScope> elements for volume and page numbers, together with a <date> element.

For example:

<biblStruct>
<analytic>
    <author>Wrigley, E. A.</>
    <title>Parish registers and the historian</>
</analytic>
<!-- in -->
<monogr>
    <editor>Steel, D. J.</>
    <title>National index of parish registers</>
    <imprint>
         <pubPlace>London</pubPlace>
         <publisher>Society of Genealogists</>
         <date value='1968'>1968</>
    </imprint>
    <biblScope type=volume>vol. 1</>
    <biblScope type=pages>pp. 155-167.</>
</monogr>
</biblStruct>

The type attribute on <biblScope> is optional: both the following are legal examples:

<biblStruct>
<analytic>
    <author>Boguraev, Branimir</>
    <author>Neff, Mary</>
    <title>Text Representation, Dictionary Structure,
         and Lexical Knowledge</>
</analytic>
<!-- in -->
<monogr>
    <title level=j>Literary &amp; Linguistic Computing</>
    <imprint>
         <biblScope type=volume>7</>
         <biblScope type=issue>2</>
         <date>1992</>
         <biblScope type=pages>110-112</>
    </imprint>
</monogr>
</biblStruct>
<biblStruct>
<analytic>
    <author>Chesnutt, David</>
    <title>Historical Editions in the States</>
</analytic>
<!-- in -->
<monogr>
    <title level=j>Computers and the Humanities</>
    <imprint>
         <biblScope>25.6</>
         <date value='1991-12'>(December, 1991):</>
         <biblScope>377-380</>
    </imprint>
</monogr>
</biblStruct>

Formal definitions for the elements described in this section are as follows:

<!-- 6.10.2.3:  Tags for Bibliographic References (cont'd)    -->
<!-- (continuation of sec. 6.10.1)                            -->
<!ELEMENT imprint       - O  (pubPlace | publisher | date | 
                             biblScope)*                        >
<!ATTLIST imprint            %a.global;                         >
<!ELEMENT publisher     - o  (%phrase.seq)                      >
<!ATTLIST publisher          %a.global;                         >
<!ELEMENT biblScope     - o  (%phrase.seq)                      >
<!ATTLIST biblScope          %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT pubPlace      - -  (%phrase.seq;)                     >
<!ATTLIST pubPlace           %a.global;
                             %a.names;                          >

<!-- Note and date are defined elsewhere, as are extent,      -->
<!-- address, and idno.                                       -->

6.10.2.4 Series Information

Series information may (in <bibl> elements) or must (in <biblStruct> elements) be enclosed in a <series> element or (in a <biblFull> element) a <seriesStmt> element. The title of the series may be tagged <title level=s>, the volume number <biblScope type=volume>, and responsibility statements for the series (e.g. the name and affiliation of the editor, as in the example in section 6.10.2.1) may be tagged <editor> or <respStmt>.

6.10.2.5 Notes and Other Additional Information

Explanatory notes about the publication of unusual items, the form of an item (e.g. `[Score]' or `[Microform]'), or its provenance (e.g. `translation of ...') may be tagged using the <note> element. The same element may be used for any descriptive annotation of a bibliographic entry in a database.

For example:

<bibl><author>Coombs, James H., Allen H. Renear,
       and Steven J. DeRose.</author>
       <title level=a>Markup Systems and the Future of Scholarly
       Text Processing.</title>
       <title level=j>Communications of the ACM</title>
       <biblScope>30.11 (November 1987):  933-947.</biblScope>
       <note>Classic polemic supporting descriptive over procedural
            markup in scholarly work.</note>
</bibl>

6.10.2.6 Order of Components within References

The order of elements in <bibl> elements is not constrained.

In <biblStruct> elements, the <analytic> element, if it occurs, must come first, followed by one or more <monogr> and <series> elements, which may appear intermingled (as long as a <monogr> element comes first). Within <analytic>, the title(s), author(s), editor(s), and other statements of responsibility may appear in any order; it is recommended that all forms of the title be given together. Within <monogr>, the author, editor, and statements of responsibility may either come first or else follow the monographic title(s). Following these, the elements must appear in the following order, as given by the declaration of <monogr> in section 6.10.2.1:

  1. any <note> and <meeting> elements;
  2. any <edition> elements, each followed by any related <editor> or <respStmt> elements;
  3. the <imprint> element, which is required;
  4. any further <imprint>, <extent>, or <biblScope> elements.

Within <imprint> , the elements allowed may appear in any order.

Finally, within the <series> information in a <biblStruct> , the sequence of elements is not constrained.
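
The top-level ordering within a <biblStruct> follows the content model declared in section 6.10.1, namely (analytic?, (monogr, series*)+, (note | idno)*). As a rough illustration, that model can be transcribed into a regular expression over a sequence of element names. The Python sketch below is not a substitute for an SGML parser; the names BIBLSTRUCT_MODEL and biblstruct_order_ok are hypothetical, and the input is assumed to be the already-extracted list of direct child element names.

```python
import re
from typing import List

# Transcription of (analytic?, (monogr, series*)+, (note | idno)*).
# Each element name is followed by one space so that names cannot
# run together when the sequence is joined into a single string.
BIBLSTRUCT_MODEL = re.compile(
    r"(analytic )?(monogr (series )*)+((note|idno) )*"
)


def biblstruct_order_ok(children: List[str]) -> bool:
    """Return True if the child element names match the content model."""
    sequence = "".join(name + " " for name in children)
    return BIBLSTRUCT_MODEL.fullmatch(sequence) is not None
```

A sequence such as analytic, monogr, monogr, series is accepted, while one that begins with series, or that places analytic after monogr, is rejected.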

If more detailed structuring of a bibliographic description is required, the <biblFull> element should be used. This is not further described here, as its contents are essentially equivalent to those of the <fileDesc> element in the <teiHeader>, which is fully described in section 5.2.

6.10.3 Bibliographic Pointers

References which are pointers to bibliographic items, of whatever kind, should be treated in the same way as other cross-references (see section 6.6). As discussed in that section, cross-referencing within TEI texts is in general represented by means of <ptr> or <ref> elements. A target attribute on these elements is used to supply an identifying value for the target of the cross-reference, which should be, in the case of bibliographic elements, a bibliographic reference of some kind. Where the form of the reference itself is unimportant, or may be reconstructed mechanically, or is not to be encoded, the <ptr> element is used, as in the following example:

As shown above (<ptr target=NEL80>)...

Where the form of the reference is important, or contains additional qualifying information which is to be kept but distinguished from the surrounding text, the <ref> element should be used, as in the following example:

Nelson claims <ref target=NEL80>(ibid, passim)</ref> ...
It may be important to distinguish between the short form of a bibliographic reference and some qualifying or additional information. The latter should not appear within the scope of the <ref> element when this is the case, as for example in an application concerned to normalize bibliographic references:
Nelson claims (<ref target=NEL80>Nelson [1980]</ref>, pages 13-37) ...
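
The target values used in these examples are expected to match the id value of some bibliographic element, such as NEL80, NEL88, and BAX88 in the <listBibl> shown in section 6.10.1. An application might check for dangling pointers along the following lines. This Python sketch is illustrative only: the function name unresolved_targets is hypothetical, both lists are assumed to have been extracted from the document already, and the comparison is an exact string match that ignores any case folding an SGML parser may apply.

```python
from typing import Iterable, List


def unresolved_targets(targets: Iterable[str],
                       bibl_ids: Iterable[str]) -> List[str]:
    """Return the target values that match no known bibliographic id.

    `targets` holds the target values found on ptr and ref elements;
    `bibl_ids` holds the id values of bibl, biblStruct, and biblFull
    elements. Order of the input targets is preserved in the result.
    """
    known = set(bibl_ids)
    return [target for target in targets if target not in known]
```

For instance, with the three ids above, a pointer to NEL80 resolves, whereas a pointer to an id that does not occur in the list (XYZ99 in the test data, an invented value) is reported.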

6.10.4 Relationship to Other Bibliographic Schemes

The bibliographic tagging defined here can capture the distinctions required by most bibliographic encoding systems; for the benefit of users of some commonly used systems, the following lists of equivalences are offered, showing the relationship of the markup defined here to the fields defined for bibliographic records in the Scribe, BibTeX, and ProCite systems.

The various bibliographic fields defined for use in the Scribe and BibTeX systems of bibliographic databases have the following equivalents in the scheme presented here: [ see note 52 ] Elements and structures available in the tag set defined here which have no analogues in Scribe and BibTeX are not noted.

6.11 Passages of Verse or Drama

The following elements are included in the core tag set for the convenience of those encoding texts which include mixtures of prose, verse and drama.

Full details of other, more specialized, elements for the encoding of texts which are predominantly verse or drama are given in the appropriate chapter of part three (for verse, see the verse base described in chapter 9; for performance texts, see the drama base described in chapter 10). In this section, we describe only the elements listed above, all of which can appear in any text, whichever of the three modes (prose, verse, or drama) predominates in it.

6.11.1 Core Tags for Verse

Like other written texts, verse texts or poems may be hierarchically subdivided, for example into books or cantos. These structural subdivisions should be encoded using the general purpose <div> or <div1> (etc.) elements described below in chapters 8 and 9. The fundamental unit of a verse text, however, is the verse line rather than the paragraph.

The <l> element is used to mark up verse lines, that is, metrical rather than typographic lines. Where a metrical line is interrupted by a typographic line break, the encoder may choose to ignore the fact entirely or to use the empty <lb> (line break) element discussed in section 6.9. In the copy text, the following example is printed on four typographic lines, beginning with the words `There', `From', `The', and `the'.

<l>There they lie, in the largest, in an
   open space in the woods,
<l>From 500 to 600 poor fellows — the groans
   and screams —
<l>The odor of blood, mixed with the fresh scent
   of the night, <lb>the grass, the trees —
   that Slaughter-house!

Where verse lines are not properly nested within the enclosing hierarchy (for example where verse lines cross larger boundaries such as verse paragraphs or speeches) the encoder may choose to use one of the techniques discussed in chapter 31, or to use the part attribute to indicate that the verse line is incomplete, as in the following example:

 <l>Thou fumblest <name>Eros</>, and my Queenes a Squire
 <l>More tight at this, then thou:  Dispatch. O Loue,
 <l>That thou couldst see my Warres to day, and knew'st
 <l>The Royall Occupation, thou should'st see
 <l part=i>A Workeman in't.
 <stage>Enter an Armed Soldier.</stage>
 <l part=f>Good morrow to thee, welcome. ...
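
In the example above, a line marked as initial (i) is completed by a later line marked as final (f). A program might check that such fragments pair up properly. The Python sketch below is only an illustration and rests on assumptions that go beyond the text: it reads I, M, and F as initial, medial, and final parts of one metrical line, treats Y and N as lines that belong to no such run, and regards a Y or N occurring before an open run has been closed as an error. The function name parts_well_formed is hypothetical, and the part values are assumed to have been collected already from successive <l> elements.

```python
from typing import Iterable


def parts_well_formed(parts: Iterable[str]) -> bool:
    """Check that I/M/F values form complete initial-medial-final runs.

    Values are compared case-insensitively, since the examples in the
    text write them in either case.
    """
    run_open = False  # True between an I and its closing F
    for raw in parts:
        part = raw.upper()
        if part == "I":
            if run_open:
                return False  # a new run starts before the last one ended
            run_open = True
        elif part == "M":
            if not run_open:
                return False  # medial part with no initial part
        elif part == "F":
            if not run_open:
                return False  # final part with no initial part
            run_open = False
        elif part in ("N", "Y"):
            if run_open:
                return False  # assumption: these may not interrupt a run
        else:
            raise ValueError(f"unknown part value: {raw}")
    return not run_open  # a run left open at the end is incomplete
```

The part values of the six lines above (four with the default N, then i and f) pass this check; the intervening <stage> element is not an <l> and so plays no role in it.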

In some verse forms, regular groupings of lines are regarded as units of some kind, often identified by a regular verse scheme. In stichic verse and couplets, groups of lines analogous to paragraphs are often indicated by indentation. In other verse forms, lines are grouped into irregular sequences indicated simply by white space. The neutral <lg> (line group) element may be used to mark any such grouping of lines; the type attribute is available to further categorize the line group where this is felt desirable, as in the following example. This example also uses the rend attribute to indicate whether or not a line is indented.

<lg type=stanza>
<l>Come fill up the Glass,
<l rend=indent>Round, round let it pass,
<l>'Till our Reason be lost in our Wine:
<l rend=indent>Leave Conscience's Rules
<l rend=indent>To Women and Fools,
<l>This only can make us divine.
</lg>
<lg type=refrain n='Chorus'>
<l>Then a Mohock, a Mohock I'll be,
<l>No Laws shall restrain
<l>Our Libertine Reign,
<l>We'll riot, drink on, and be free.
</lg>

For some kinds of analysis, it may be useful to identify different kinds of line group within the same piece of verse. Such line groups may self-nest, in much the same way as the un-numbered <div> element described in chapter 8. For example:

The part attribute may also be attached to a <lg> element to indicate that it is incomplete, for example because it forms part of a group that is divided between two speakers, as in the following example:

<sp><speaker>First Voice</>
<lg type=stanza part=I>
<l>But why drives on that ship so fast
<l>Withouten wave or wind?
</lg>
<sp><speaker>Second Voice</speaker>
<lg part=F>
<l>The air is cut away before,
<l>And closes from behind.
</lg>

For alternative methods of aligning groups of lines which do not form simple hierarchic groups, or which are discontinuous, see the more detailed discussion in chapter 14. For discussion of other elements and attributes specific to the encoding of verse, see chapter 9.

These elements are defined as follows:

<!-- 6.11.1:  Verse                                           -->
<!ELEMENT l             - O  (%paraContent)                     >
<!ATTLIST l                  %a.global;
                             %a.enjamb;
                             %a.metrical;
          part               (Y | N | I | M | F) N              >
<!ELEMENT lg            - O  ((%m.divtop)*, (l | lg)+, 
                             (%m.divbot)*)                      >
<!ATTLIST lg                 %a.global;
                             %a.divn;
                             %a.metrical;                       >
<!-- This fragment is used in sec. 6.12                       -->

6.11.2 Core Tags for Drama

Like other written texts, dramatic and other performance texts such as cinema or TV scripts are often hierarchically organized, for example into acts and scenes. These structural subdivisions should be encoded using the general purpose <div> or <div1> (etc.) elements described below in chapters 8 and 10. Within these divisions, the body of a performance text typically consists of speeches, often prefixed by a phrase indicating who is speaking, and occasionally interspersed with stage directions of various kinds.

In the following simple example, each speech consists of a single paragraph:

<div2 type=scene n='I.2'>
<head>Scene 2.</head>
<stage type=setting>Peachum, Filch.</stage>
<sp><speaker>FILCH.</speaker><p>Sir, Black Moll hath sent word her
Trial comes on in the Afternoon, and she hopes you will order Matters
so as to bring her off.
<sp><speaker>PEACHUM.</speaker><p>Why, she may plead her Belly
at worst; to my Knowledge she hath taken care of that Security.
But, as the Wench is very active and industrious, you may satisfy
her that I'll soften the Evidence.
<sp><speaker>FILCH.</speaker><p>Tom Gagg, sir, is found guilty.

In the following example, each speech consists of a sequence of verse lines, some of them being marked as metrically incomplete:

<div1 type='Act' n='I'><head>ACT I</>
<div2 type='Scene' n='1'><head>SCENE I</head>
<stage rend=italic>
Enter Barnardo and Francisco, two Sentinels, at several doors</stage>
<sp><speaker>Barn</><l part=Y>Who's there?
<sp><speaker>Fran</><l>Nay, answer me.  Stand and unfold yourself.
<sp><speaker>Barn</><l part=i>Long live the King!
<sp><speaker>Fran</><l part=m>Barnardo?
<sp><speaker>Barn</><l part=f>He.
<sp><speaker>Fran</><l>You come most carefully upon your hour.
<sp><speaker>Barn</><l>'Tis now struck twelve.  Get thee to bed,
 Francisco.
<sp><speaker>Fran</><l>For this relief much thanks.  'Tis bitter cold,
<l part=i>And I am sick at heart.

In some cases, as here in the First Quarto of Hamlet, the printed speaker attributions need to be supplemented by use of the who attribute; again, the lines are marked as complete or incomplete:

<stage>Enter two Centinels.
<add place=right resp=ms>
Now call'd Bernardo & Francesco.
</add></stage>
<sp who='Francisco'><speaker>1.</>
    <l part=y>STand:  who is that?</sp>
<sp who='Barnardo'><speaker>2.</>
    <l part=y>Tis I.</sp>
<sp who='Francisco'><speaker>1.</>
    <l>O you come most carefully vpon your
       watch,</sp>
<sp who='Barnardo'><speaker>2.</>
    <l>And if you meete Marcellus and Horatio,
    <l>The partners of my watch, bid them make haste.</sp>
<sp who='Francisco'><speaker>1.</>
    <l part=y>I will:  See who goes there.</sp>
<stage>Enter Horatio and Marcellus.</>
<sp who='Horatio'><speaker>Hor.</>
    <l part=i>Friends to this ground.</sp>
<sp who='Marcellus'><speaker>Mar.</>
    <l part=f>And leegemen to the Dane,
    <l>O farewell honest souldier, who hath
    releeued you?</sp>
<sp who='Francisco'><speaker>1.</>
    <l>Barnardo hath my place, giue you good night.</sp>

By contrast with the preceding examples, the following encodes an early printed edition without making any assumption about which parts are prose or verse:

<div1 type=act n='I'>
<div2 type=scene n='1'>
<head rend=italic>Actus primus, Scena prima.</head>
<stage type=setting rend=italic>
A tempestuous noise of Thunder and Lightning heard:  Enter
a Ship-master, and a Boteswaine.</stage>
<sp><speaker>Master.</speaker><p> Bote-swaine.</sp>
<sp><speaker>Botes.</speaker><p> Heere Master: What cheere?</sp>
<sp><speaker>Mast.</speaker><p> Good: Speake to th' Mariners: fall
too't, yarely, or we run our selues a ground,
bestirre, bestirre.  <stage type=move>Exit.</stage></sp>
<stage type=move>Enter Mariners.</stage>
<sp><speaker>Botes.</speaker>
<p>Heigh my hearts, cheerely, cheerely my harts:
yare, yare: Take in the toppe-sale: Tend to th' Masters
whistle: Blow till thou burst thy winde, if roome e-nough.</sp>

The <sp> and <stage> elements should also be used to mark parts of a text otherwise in prose which are presented as if they were dialogue in a play. For example:

<sp><speaker>The Reverend Doctor Opimian</speaker>
<p>I do not think I have named a single unpresentable
fish. </sp>
<sp><speaker>Mr Gryll</speaker><p>Bream, Doctor: there
is not much to be said for bream.</sp>
<sp><speaker>The Reverend Doctor
Opimian</speaker><p>On the contrary, sir, I think
there is much to be said for him.  In the first
place....
<p>Fish, Miss Gryll -- I could discourse to you on
fish by the hour:  but for the present I will
forbear...
</sp>
<sp><speaker>Lord Curryfin</speaker>
<stage>(after a pause).</stage>
<p><q>Mass</q> as the second grave-digger
says in <title>Hamlet</title>,
<q>I cannot tell.</q>
</sp>
<p>A chorus of laughter dissolved the sitting.

These elements are defined as follows:

<!-- 6.11.2:  Drama                                           -->
<!ELEMENT sp            - O  (speaker?, (p | l | lg | ab | seg 
                             | stage)+)                         >
<!ATTLIST sp                 %a.global;
          who                IDREFS              #IMPLIED       >
<!ELEMENT speaker       - O  (%phrase.seq)        -(speaker)    >
<!ATTLIST speaker            %a.global;                         >
<!ELEMENT stage         - -  (%specialPara)       -(stage)      >
<!ATTLIST stage              %a.global;
          type               CDATA               mix            >
<!-- This fragment is used in sec. 6.12                       -->
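
Because who is declared above as IDREFS, a single speech may point to more than one speaker. The following sketch assumes three hypothetical identifiers (w1, w2, w3), which would have to be declared as ID values elsewhere in the document; where and how they are declared is not shown here:

```sgml
<!-- w1, w2 and w3 are hypothetical IDs, assumed to be declared elsewhere -->
<sp who='w1 w2 w3'><speaker>All.</speaker>
<l>Double, double toil and trouble;
<l>Fire burn and cauldron bubble.
</sp>
```

The attribute value must be quoted because it contains spaces separating the individual references.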

6.12 Overview of the Core Tag Set

Except for those tags designed to be used in concurrent markup streams, all the elements described in this chapter belong to the core TEI tag set, defined by the following DTD fragment.

<!-- 6.12: Elements available in all forms of the TEI main    -->
<!-- DTD                                                      -->
<!-- Definition of elements, sub-group by sub-group.          -->

<!-- ... declarations from section 6.1                        -->
<!--     (Paragraph)                                          -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.3.2.1                    -->
<!--     (Highlighted phrases)                                -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.4.1                      -->
<!--     (Proper Nouns)                                       -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.4.3                      -->
<!--     (Numbers and measures)                               -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.4.4                      -->
<!--     (Dates and times)                                    -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.4.5                      -->
<!--     (Abbreviations)                                      -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.5.1                      -->
<!--     (Editorial tags for correction)                      -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.5.2                      -->
<!--     (Editorial tags for regularization)                  -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.5.3                      -->
<!--     (Other editorial tags)                               -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.4.2                      -->
<!--     (Addresses and their components)                     -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.6                        -->
<!--     (Simple cross references)                            -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.7                        -->
<!--     (Lists and List Items)                               -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.8.1                      -->
<!--     (Annotation)                                         -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.9.3                      -->
<!--     (Milestone tags)                                     -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.10.1                     -->
<!--     (Tags for Bibliographic References)                  -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.11.1                     -->
<!--     (Verse)                                              -->
<!--     go here ...                                          -->
<!-- ... declarations from section 6.11.2                     -->
<!--     (Drama)                                              -->
<!--     go here ...                                          -->

