<!--
Copyright TEI Consortium. 
Dual-licensed under CC-by and BSD2 licences 
See the file COPYING.txt for details.
$Date$
$Id$
-->


<?xml-model href="http://tei.oucs.ox.ac.uk/jenkins/job/TEIP5/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>

<div xmlns="http://www.tei-c.org/ns/1.0" type="div1" xml:id="SA" n="14">
  <head>Linking, Segmentation, and Alignment</head>
  
<p>This chapter discusses a number of ways in which encoders may
represent analyses of the structure of a text which are not
necessarily linear or hierarchic. The module defined by this chapter
provides for the following common requirements:
  <list rend="bulleted">
    <item>to link disparate elements 
    using the <att>xml:id</att> attribute (section <ptr target="#SAPT"/>);</item>
    <item>to link disparate elements without using the
    <att>xml:id</att> attribute (sections <ptr target="#SAUR"/> and <ptr target="#SATS"/>);</item>
    <item>to segment text into elements convenient for
    the encoder and to mark arbitrary points within documents (section
    <ptr target="#SASE"/>);</item>
    <item>to represent correspondence or
    alignment among groups of text elements, both those
    with content and those which are empty (section <ptr target="#SACS"/>);<note place="bottom">We use the term <term>alignment</term> for a
    special case of the more general notion of correspondence. Let A
    be a short form for <q>an element with its attribute <att>xml:id</att>
    set to the value <val>A</val></q>, and suppose that elements A1, A2,
    and A3 occur in that order and form one group, while elements B1,
    B2, and B3 occur in that order and form another group. Then a
    relation in which A1 corresponds to B1, A2 corresponds to B2, and
    A3 corresponds to B3 is an alignment. On the other hand, a
    relation in which A1 corresponds to B2, B1 to C2, and C1 to A2 is
    not an alignment.</note></item>
    <item>to synchronize elements of a
    text, that is to represent temporal correspondences and alignments
    among text elements (section <ptr target="#SASY"/>) and also to
    align them with specific points in time (section <ptr target="#SASYMP"/>);</item>
    <item>to specify that one text element is identical
    to or a copy of another (section <ptr target="#SAIE"/>);</item>
    <item>to aggregate possibly noncontiguous elements
    (section <ptr target="#SAAG"/>);</item>
    <item>to specify that different elements are
    alternatives to one another and to express
    preferences among the alternatives (section <ptr target="#SAAT"/>);</item>
    <item>to store markup separately from the data it describes (section <ptr target="#SASO"/>);</item>
    <item>to associate segments of a text
    with interpretations or analyses of their significance (section
    <ptr target="#SAAN"/>).</item>
  </list></p>
<p>These facilities all use the same set of techniques based on the
W3C XPointer framework (<ptr target="#XPTRFMWK"/>). This provides a
variety of <term rend="noindex">schemes</term>, the most convenient of
which, and the one recommended by these Guidelines, makes use of the
global <att>xml:id</att> attribute, as defined in section <ptr target="#STGA"/> and introduced in the section of <ptr target="#SG"/>
titled <ptr target="#SG-id"/>. When the <ident type="module">linking</ident> module is included in a schema, the
attribute class <ident type="class">att.global</ident> is extended to
include eight additional attributes to support the various kinds of
linking listed above. Each of these attributes is introduced in the
appropriate section below. In addition, for many of the topics
discussed, a choice of methods of encoding is offered, ranging from
simple but less general ones, which use attribute values only, to more
elaborate and more general ones, which use specialized elements.</p>
  <div type="div2" xml:id="SAPT">
    <head>Links</head>
    <p>We say that one element <term rend="noindex">points</term> to
    others if the first has an attribute whose value is a reference to
    the others: such an element is called a <term>pointer
    element</term>, or simply a <term>pointer</term>. Among the
    pointers that have been introduced up to this point in these
    Guidelines are <gi>note</gi>, <gi>ref</gi>, and <gi>ptr</gi>.
    These elements all indicate an association between one place in
    the document (the location of the pointer itself) and one or more
    others (the elements whose identifiers are specified by the
    pointer's <att>target</att> attribute). The module described in
    this chapter introduces  a
    variation on this basic kind of pointer, known as a
    <term>link</term>,  which specifies both <soCalled>ends</soCalled>
    of an association. In addition, we define a syntax for
    representing locations in a document by a variety of means not
    dependent on the use of <att>xml:id</att> attributes.</p>
    <div type="div3" xml:id="SAPTL">
<head>Pointers and Links</head>
<p>In section <ptr target="#COXR"/> we introduced the simplest
pointer elements, <gi>ptr</gi> and <gi>ref</gi>. Here we
introduce additionally the <gi>link</gi> element, which
represents an association between two (or more) locations by
specifying each location explicitly. Its own location is
irrelevant to the intended linkage. All three elements use the
attribute <att>target</att>, provided by the <ident
type="class">att.pointing</ident> class as a means of indicating the
location or locations referenced or pointed to. 
<specList>
  <specDesc key="att.pointing" atts="target"/>
  <specDesc key="link"/>
</specList>
The <gi>ptr</gi> element may be called a <soCalled>pure
pointer</soCalled>, because its primary function is simply to point. A
pointer sets up a <term rend="noindex">connection</term> between an
element (which, in the case of a pure pointer, is
simply a location in a document), and one or more others, known
collectively as its <term>target</term>. The <gi>ptr</gi> and
<gi>ref</gi> elements <!-- bear a <att>target</att> attribute (in the
singular), because they --> point, conceptually, at a single target, even
if that target may be discontinuous in the document. The <gi>link</gi>
element <!-- bears a <att>targets</att> attribute (in the plural), because
it --> specifies at least two targets and represents an association
between them, independent of its own location. </p>
<p>These three elements also share a common set of attributes, derived
from the <ident type="class">att.pointing</ident> and <ident
type="class">att.typed</ident> classes:
<specList>
  <specDesc key="att.pointing" atts="evaluate"/>
  <specDesc key="att.typed" atts="type subtype"/>
</specList></p>
<p>Double connection among elements could also be expressed by a
combination of pointer elements, for example, two <gi>ptr</gi>
elements, or one <gi>ptr</gi> element and one <gi>note</gi>
element. All that is required is that the value of the
<att>target</att> (or other pointing) attribute of the one be
the value of the <att>xml:id</att> attribute of the other. What
the <gi>link</gi> element accomplishes is the handling of double
connection by means of a single element. Thus, in the following
encoding:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><ptr xml:id="sa-p1" target="#sa-p2"/><ptr xml:id="sa-p2" target="#sa-p1"/></egXML>
<val>sa-p1</val> points to <val>sa-p2</val>, and <val>sa-p2</val>
points to <val>sa-p1</val>. This is logically
equivalent to the more compact encoding:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><link target="#sa-p1 #sa-p2"/></egXML></p>
<p>As noted elsewhere, the <att>target</att> <!-- and
<att>targets</att> attributes--> attribute may take as its value one or
more URI references. In the simplest case, each such reference will
indicate an element in the current document (or in some other
document), for example by supplying the value used for its global
<att>xml:id</att> attribute. It may however carry as value any form of
URI, such as a URL pointing to some other document or location on the
Internet. Pointing or linking to external documents and pointing and
linking where identifiers are not available is described below in
section <ptr target="#SAXP"/>.</p>
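<p>As a minimal sketch of the multi-valued case, the following
pointer carries two whitespace-separated URI references in a single
<att>target</att> attribute: one to an element in the current document
and one to an element in another document. The identifier
<val>sa-p1</val> is reused from the example above; the address
<code>http://www.example.org/other.xml</code> and the identifier
<val>intro</val> are invented for the purposes of illustration:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><ptr target="#sa-p1 http://www.example.org/other.xml#intro"/></egXML></p>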
    </div>
    <div type="div3" xml:id="SAPTEG">
<head>Using Pointers and Links</head>
<p>As an example of the use of mechanisms which establish
connections among elements, consider the practice (common in
18th-century English verse and elsewhere) of providing footnotes
citing parallel passages from classical authors. <figure xml:id="POPE" rend="float fullpage">
<graphic url="Images/dunpic.png"/><figDesc>The
figure shows the original page of Pope's Dunciad
which is discussed in the text.</figDesc></figure> Such
footnotes can of course simply be encoded using the
<gi>note</gi> element (see section <ptr target="#CONO"/>) without
a <att>target</att> attribute, placed adjacent to the passage to
which the note refers:<note place="bottom">The <att>type</att>
attribute on the note is used to classify the notes using the
typology established in the Advertisement to the work: <q>The
<term rend="noindex">Imitations</term> of the Ancients are
added, to gratify those who either never read, or may have
forgotten them; together with some of the Parodies, and
Allusions to the most excellent of the Moderns.</q> In the
source text, the text of the poem shares the page with two sets
of notes, one headed <q>Remarks</q> and the other
<q>Imitations</q>.</note>
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3"><l>(Diff'rent our parties, but with equal grace</l>
<l>The Goddess smiles on Whig and Tory race,</l>
<l><note type="imitation" place="bottom" anchored="false">    
    <bibl>Virg. Æn. 10.</bibl>
    <quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
    </quote>
  </note>'Tis the same rope at sev'ral ends they twist,</l>
<l>To Dulness, Ridpath is as dear as Mist)</l></egXML>
<!-- Pope, Dunciad (1729) III.284 --></p>
<p>This use of the <gi>note</gi> element can be called
<term>implicit pointing</term> (or <term>implicit
linking</term>). It relies on the juxtaposition of the note to
the text being commented on for the connection to be understood.
If it is felt that the mere juxtaposition of the note to the
text does not make it sufficiently clear exactly what text
segment is being commented on (for example, is it the
immediately preceding line, or the immediately preceding two
lines, or what?), or if it is decided to place the note at some
distance from the text, then the pointing or the linking must be
made explicit. We now consider various methods for doing
that.</p>
<p>Firstly, a <gi>ptr</gi> element might be placed at an
appropriate point within the text to link it with the
annotation:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3"><l>(Diff'rent our parties, but with equal grace</l>
<l>The Goddess smiles on Whig and Tory race,
   <ptr rend="unmarked" target="#note3.284"/></l>
<l>'Tis the same rope at sev'ral ends they twist,</l>
<l>To Dulness, Ridpath is as dear as Mist)</l>
<note xml:id="note3.284" type="imitation" place="bottom" anchored="false">    
   <bibl>Virg. Æn. 10.</bibl>
   <quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem.</l>
   </quote>
</note></egXML>
<!-- Pope, Dunciad (1729) III.284 -->
The <gi>note</gi> element has been given an arbitrary identifier
(<val>note3.284</val>) to enable it to be specified
as the target of the pointer element. Because there is nothing
in the text to signal the existence of the annotation, the
<att>rend</att> attribute has been given the value <val>unmarked</val>.</p>
<p>Secondly, the <att>target</att> attribute of the
<gi>note</gi> element can be used to point at its associated
text, provided that an <att>xml:id</att> attribute has been
supplied for the associated text:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3"><l xml:id="L3.283">(Diff'rent our parties, but with equal grace</l>
<l xml:id="L3.284">The Goddess smiles on Whig and Tory race,</l>
<l xml:id="L3.285">'Tis the same rope at sev'ral ends they twist,</l>
<l xml:id="L3.286">To Dulness, Ridpath is as dear as Mist)</l>
<!-- ... -->
</egXML>
Given this encoding of the text itself, we can now link the various
notes to it. In this case, the note
itself contains a pointer to the place in the text which it is
annotating; this could be encoded using a <gi>ref</gi>
element, which bears a <att>target</att> attribute of its own
and contains a (slightly misquoted) extract from the text marked
as a <gi>quote</gi> element:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3"><note type="imitation" place="bottom" anchored="false" target="#L3.284">    
   <ref rend="sc" target="#L3.284">Verse 283–84.
 <quote>
   <l>——. With equal grace</l>
   <l>Our Goddess smiles on Whig and Tory race.</l>
</quote>
   </ref>
   <bibl>Virg. Æn. 10.</bibl>
   <quote>    
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem. </l>
   </quote>
</note></egXML>
<!-- Pope, Dunciad (1729) III.284 --></p>
<p>Combining these two approaches gives us the following
associations:
<list rend="bulleted">
  <item>a pointer within one line indicates the note</item>
  <item>the note indicates the line</item>
  <item>a pointer within the note indicates the line</item>
</list>
Note that we do not have any way of pointing from the line itself to
the note: the association is implied by containment of the pointer. We
do not as yet have a true double link between text and note. To
achieve that we will need to supply identifiers for the annotations as
well as for the verse lines, and use a <gi>link</gi> element to
associate the two. Note that the <gi>ptr</gi> element and the
<att>target</att> attribute on the <gi>note</gi> may now be dispensed
with:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3">
<note xml:id="n3.284" type="imitation" place="bottom" anchored="false">    
   <ref rend="sc" target="#L3.284">Verse 283–84.
<quote>
   <l>——. With equal grace</l>
   <l>Our Goddess smiles on Whig and Tory race.</l>
</quote></ref>
   <bibl>Virg. Æn. 10.</bibl>
   <quote>
<l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
<l>—— Rex Jupiter omnibus idem. </l>
   </quote>
</note>
<link target="#n3.284 #L3.284"/></egXML>
<!-- Pope, Dunciad (1729) III.284 --></p>
<p>The <att>target</att> attribute of the <gi>link</gi> element
here bears the identifier of the note followed by that of the
verse line.  We could also allocate an
identifier to the reference within the note and encode the
association between it and the verse line in the same way:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SAPTEG-eg-3"><note type="imitation" place="bottom" anchored="false">    
   <ref rend="sc" xml:id="r3.284" target="#L3.284">Verse 283–84.
<quote>
   <l>——. With equal grace</l>
   <l>Our Goddess smiles on Whig and Tory race.</l>
</quote></ref>
   <!-- ... -->
</note>
<!-- ... -->
<link target="#r3.284 #L3.284"/></egXML>
<!-- Pope, Dunciad (1729) III.284 -->
Indeed, the two <gi>link</gi>s could be combined into one, as
follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><link target="#n3.284 #r3.284 #L3.284"/></egXML></p>
    </div>
    <div type="div3" xml:id="SAPTLG">
<head>Groups of Links</head>
<p>Clearly, there are many reasons for which an encoder might
wish to represent a link or association between different
elements. For some of them, specific elements are provided in
these Guidelines; some of these are discussed elsewhere in the
present chapter. The <gi>link</gi> element is a general purpose
element which may be used for any kind of association. The
element <gi>linkGrp</gi> may be used to group links of a
particular type together in a single part of the document; such
a collection may be used to represent what is sometimes referred
to in the literature of Hypertext as a <term>web</term>, a term
introduced by the Brown University FRESS project in 1969, and not to
be confused with the World Wide Web.
<specList><specDesc key="linkGrp"/></specList>
As a member of the class <ident type="class">att.pointing.group</ident>, this element shares the
following attributes with other members of that class:
<specList><specDesc key="att.pointing.group" atts="domains targFunc "/></specList>
It is also a member of the <ident type="class">att.pointing</ident>
and <ident type="class">att.typed</ident> classes, and therefore also carries the attributes specified in
section <ptr target="#SAPTL"/> above, in particular the
<att>type</att> attribute. </p>
<p>The <gi>linkGrp</gi> element provides a convenient way of
establishing a default for the <att>type</att> attribute on a
group of links of the same type: by default, the <att>type</att>
attribute on a <gi>link</gi> element has the same value as that
given for <att>type</att> on the enclosing <gi>linkGrp</gi>.</p>
<p>Typical software might hide a web entirely from the user, but
use it as a source of information about links, which are
displayed independently at their referenced locations.
Alternatively, software might provide a direct view of the link
collection, along with added functions for manipulating the
collection, as by filtering, sorting, and so on.
To continue our previous example, this text contains many other
notes of a kind similar to the one shown above. Here are a few
more of the lines to which annotations have to be attached,
followed by the annotations themselves:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples"><l xml:id="L2.79">A place there is, betwixt earth, air and seas</l>
<l xml:id="L2.80">Where from Ambrosia, Jove retires for ease.</l>
<!-- ... -->
<l xml:id="L2.88">Sign'd with that Ichor which from Gods distills.</l>
<!-- ... -->
<note xml:id="n2.79" place="bottom" anchored="false">
   <bibl>Ovid Met. 12.</bibl>
   <quote xml:lang="la">  
<l>Orbe locus media est, inter terrasq; fretumq;</l>
<l>Cœlestesq; plagas —</l>
   </quote>
</note>
<note xml:id="n2.88" place="bottom" anchored="false">
    Alludes to <bibl>Homer, Iliad 5</bibl> ...
</note></egXML>
To avoid having
to repeat the specification of <att>type</att> as <val>imitation</val> on each <gi>note</gi>,
we may specify it once for all on a <gi>linkGrp</gi> element
containing all links of this type.
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<linkGrp type="imitation">
   <link target="#n2.79 #L2.79"/>
   <link target="#n2.88 #L2.88"/>
   <link target="#n3.284 #L3.284"/>
</linkGrp></egXML></p>
<p>Additional information for applications that use
<gi>linkGrp</gi> elements can be provided by means of special
attributes. First, the <att>domains</att> attribute can be used
to identify the text elements within which the individual
targets of the links are to be found. Suppose that the text
under discussion is organized into a <gi>body</gi> element,
containing the text of the poem, and a <gi>back</gi> element
containing the notes. Then the <att>domains</att> attribute can
have as its value the identifiers of the <gi>body</gi> and the
<gi>back</gi>, to enable an application to verify that the link
targets are in fact contained by appropriate elements, or to
limit its search space:
  <egXML  xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
  <!-- ... -->
  <linkGrp type="imitation" domains="#dunciad #dunnotes">
    <link target="#n2.79 #L2.79"/>
    <link target="#n2.88 #L2.88"/>
    <!-- ... -->
    <link target="#n3.284 #L3.284"/>
    <!-- ... -->
  </linkGrp>
</egXML></p>
<p>Note that there must be a single parent element for each
<soCalled>domain</soCalled>; if some notes are contained by a
section with identifier <val>dunnotes</val>, and
others by a section with identifier <val>dunimits</val>,
an intermediate pointer must be
provided (as described in section <ptr target="#SAPTIP"/>) within
the <gi>linkGrp</gi> and its identifier used instead.</p>
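<p>A sketch of that arrangement might look as follows, assuming that
the two sections of notes carry the identifiers <val>dunnotes</val>
and <val>dunimits</val> mentioned above; the identifier
<val>dunallnotes</val> given to the intermediate pointer is invented
for this illustration:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<linkGrp type="imitation" domains="#dunciad #dunallnotes">
  <!-- intermediate pointer standing for both sections of notes -->
  <ptr xml:id="dunallnotes" target="#dunnotes #dunimits"/>
  <link target="#n2.79 #L2.79"/>
  <link target="#n2.88 #L2.88"/>
  <!-- ... -->
</linkGrp></egXML></p>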
<p>Next, the <att>targFunc</att> attribute can be used to
provide further information about the role or function of the
various targets specified for each link in the group. The value
of the <att>targFunc</att> attribute is a list of names
(formally, name tokens), one for each of the targets in the
link; these names can be chosen freely by the encoder, but their
significance should be documented in the encoding description in
the header.<note place="bottom">Since no special element is
provided for this purpose in the present version of these
Guidelines, the information should be supplied as a series of
paragraphs at the end of the <gi>encodingDesc</gi> element
described in section <ptr target="#HD5"/>.</note> In the current
example, we might think of the note as containing the <term rend="noindex">source</term> of the imitation and the verse line
as containing the <term rend="noindex">goal</term> of the
imitation. Accordingly, we can specify the <gi>linkGrp</gi> in
the preceding example thus:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><linkGrp type="imitation" domains="#dunciad #dunnotes" targFunc="source goal">
   <link target="#n2.79 #L2.79"/>
   <link target="#n2.88 #L2.88"/>
   <!-- ... -->
   <link target="#n3.284 #L3.284"/>
   <!-- ... -->
</linkGrp></egXML></p>
<specGrp xml:id="DSAPT" n="Links">
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/link.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/linkGrp.xml"/>
</specGrp>

    </div>
    <div type="div3" xml:id="SAPTIP">
<head>Intermediate Pointers</head>
<p>In the preceding examples, we have shown various ways of
linking an annotation and a single verse line. However, the
example cited in fact requires us to encode an association
between the note and a <emph>pair</emph> of verse lines (lines
283 and 284); we call these two lines a <term>span</term>.</p>
<p>There are a number of possible ways of correcting this error: one
could use the <att>target</att> attribute to indicate one end of the
span and the special purpose <att>targetEnd</att> attribute on the
<gi>note</gi> element to point to the other. Another possibility might
be to create an element which represents the whole span itself and
assign that an <att>xml:id</att> attribute, which can then be linked
to the <gi>note</gi> and <gi>ref</gi> elements. This could be done
using for example the <gi>lg</gi> element defined in section <ptr
target="#COVE"/> or the <soCalled>virtual</soCalled> <gi>join</gi>
element discussed in section <ptr target="#SAAG"/>.</p>
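<p>The first of these possibilities might be sketched as follows. The
content of the note is abbreviated here, and the identifiers are those
assigned to the verse lines in the preceding section:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples"><note type="imitation" place="bottom" anchored="false"
      target="#L3.283" targetEnd="#L3.284">
   <bibl>Virg. Æn. 10.</bibl>
   <!-- ... -->
</note></egXML></p>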
<p>A third possibility would be to use an
<soCalled>intermediate pointer</soCalled> as follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<ptr xml:id="L3.283-284" target="#L3.283 #L3.284"/></egXML>
When the <att>target</att> attribute of a <gi>ptr</gi> or
<gi>ref</gi> element specifies more than one element, the
indicated elements are intended to be combined or aggregated in
some way to produce the object of the pointer. (Such aggregation
is, however, the task of a processing application, and cannot be
defined simply by the markup.) The <att>xml:id</att> attribute
of the <gi>ptr</gi> then provides an identifier which can be linked to the
<gi>note</gi> and <gi>ref</gi> elements:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><link evaluate="all" target="#n3.284 #r3.284 #L3.283-284"/></egXML>
<!-- Pope, Dunciad (1729) III.284 --></p>
<p>The <val>all</val> value of <att>evaluate</att> is used on the
<gi>link</gi> element to specify that any pointer encountered as
a target of that element is itself evaluated. If
<att>evaluate</att> had the value <val>none</val>, the link target would be the pointer
itself, rather than the objects it points to.</p>
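<p>For comparison, a hypothetical variant of the same link with
<att>evaluate</att> set to <val>none</val> would, on the reading given
above, associate the note and the reference with the <gi>ptr</gi>
element identified as <val>L3.283-284</val> itself, and not with the
two verse lines to which that pointer refers:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><link evaluate="none" target="#n3.284 #r3.284 #L3.283-284"/></egXML></p>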
<p>Where a <gi>linkGrp</gi> element is used to group a
collection of <gi>link</gi> elements, any intermediate pointer
elements used by those <gi>link</gi> elements should be included
within the <gi>linkGrp</gi>.</p>
    </div> 
  </div>
  <div type="div2" xml:id="SAXP">
    <head>Pointing Mechanisms</head>
    <p>This section introduces more formally the pointing mechanisms
    available in the TEI. In addition to those
    discussed so far, the TEI provides methods of pointing:
    <list rend="bulleted">
<item>into documents other than the current document;</item>
<item>to a particular element in a document other than the
current document using its <att>xml:id</att>;</item>
<item>to a particular element whether in the current document or
not, using its position in the XML element tree;</item>
<item>at arbitrary content in any XML document using TEI-defined
XPointer schemes.</item>
    </list>
    </p>
<p>All TEI attributes used to point at something else are declared as
having the datatype <ident type="datatype">data.pointer</ident>, which
is defined as a URI reference<note place="bottom">The URI (Uniform
Resource Identifier) is defined in <ref
target="http://www.ietf.org/rfc/rfc3986.txt">RFC 3986</ref>.</note>; the
cases so far discussed are all simple examples of a URI
reference. Another familiar example is the mechanism used in XHTML to
represent hypertext links by means of the XHTML <att
scheme="XHTML">href</att> attribute. A URI reference can identify the
whole of an
XML resource such as a document or an XML element, or a
sub-portion of such a resource, identified by means of an appropriate <term>fragment
identifier</term>. Technically speaking, the <soCalled>fragment
identifier</soCalled> is that portion of a URI reference following the
first unescaped <q>#</q> character; in practice, it provides a means
of accessing some part of the resource described by the URI which is
less than the whole.  <!-- The details of the interpretation of the
Fragment-ID depend on the Internet MIME-type of the resource
identified by the URI. --></p>
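<p>The distinction may be illustrated by the following sketch, in
which the address <code>http://www.example.org/usc17.xml</code> and
the identifier <val>sect106</val> are invented for the example. The
first reference identifies a whole resource; the second identifies
the part of that resource selected by the fragment identifier
<code>sect106</code>; the third supplies a fragment identifier only,
and so identifies a part of the current document:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<ref target="http://www.example.org/usc17.xml">the whole document</ref>
<ref target="http://www.example.org/usc17.xml#sect106">one section of that document</ref>
<ref target="#sect106">one section of the current document</ref>
</egXML></p>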
    <p>The first three of the following subsections provide only a
    brief overview and some examples of the W3C mechanisms
    recommended. More detailed information on the use of these
    mechanisms is readily available elsewhere.<!-- where? --></p>
    <div type="div3" xml:id="SAUR">
<head>Pointing Elsewhere</head>
<p>Like the ubiquitous if misnamed XHTML pointing attribute <att
scheme="XHTML">href</att>, the TEI pointing attributes can point to a
document that is not the current document (the one that contains the
pointing element) whether it is in the same local filesystem as the
current document, or on a different system entirely. In either case,
the pointing can be accomplished absolutely (using the entire address
of the target document) or relatively (using an address relative to
the current base URI in force). The <soCalled>current base
URI</soCalled> is defined according to <ref target="#XMLBASE">Marsh
2001</ref>. If there is none, the base URI is that of the current
document.  In common practice the current base URI in force is likely
to be the value of the <att>xml:base</att> attribute of the closest
ancestor that has one. However, this may not be the case: where an
<att>xml:base</att> value is itself a relative URI reference, it is
resolved against the base URI of the parent element, so that the
effective base URI is built up from the top of the hierarchy down to
the context node.</p>
<p>The following example demonstrates an absolute URI reference
that points to a remote document: 
<egXML xmlns="http://www.tei-c.org/ns/Examples">The current base URI in force is as defined in the
 W3C <ref target="http://www.w3.org/TR/xmlbase/">XML
 Base</ref> recommendation.</egXML></p>
<p>This example points explicitly to a location on the Web,
accessible via HTTP<!--, the web Protocol-->. Suppose however that we wish
to access a document stored locally in a file. Again we will
supply an absolute URI reference, but this time using a
different protocol:
<egXML xmlns="http://www.tei-c.org/ns/Examples">This Debian package is distributed under the terms
 of the <ref target="file:///usr/share/common-licenses/GPL-2">GNU General Public License</ref>.</egXML></p>
<p>In the  following example, we use a relative URI reference
to point to a local document: 
<egXML xmlns="http://www.tei-c.org/ns/Examples"><figure rend="float fullpage">
  <graphic url="Images/compic.png"/>
  <figDesc>The figure shows the page from the <title>Orbis
  pictus</title> of Comenius which is discussed in the text.</figDesc>
  </figure></egXML> 

Since no <att>xml:base</att> is specified here, the location of the resource
<ident type="file">Images/compic.png</ident> is determined relative to the
resource indicated by the current base URI, which is the current
document.
</p>


<p>In the following example, however, we first change the current base
URI by setting a new value for <att>xml:base</att>. The resource
required is then identified by means of a relative URI:

<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="chap" xml:base="http://classics.mit.edu/">
  <head>On Ancient Persian Manners</head>
  <p>In the very first story of <ref target="Sadi/gulistan.2.i.html"><title>The Gulistan of
  Sa'di</title></ref>,
  Sa'di relates moral advice worthy of Miss Minners ...</p>
  <!-- ... -->
</div></egXML></p>
<p>As noted above, the current base URI is usually that given by the
nearest ancestor carrying an <att>xml:base</att> attribute. This
provides a useful way of abbreviating URIs within a given scope:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><body>
  <div n="A">
    <p>The base URI here is the current document. A URI such as
    <code>a.xml</code> is equivalent to
    <code>./a.xml</code>.</p>
  </div>
  <div n="B" xml:base="http://www.example.org/">
    <p>The base URI here is
    <code>http://www.example.org/</code>. A
    URI such as <code>a.xml</code> is equivalent to
    <code>http://www.example.org/a.xml</code>.</p>
  </div>
  <div n="C" xml:base="ftp://ftp.example.net/mirror/">
    <p>The base URI here is
    <code>ftp://ftp.example.net/mirror/</code>. A URI such
    as
    <code>a.xml</code> is equivalent to
    <code>ftp://ftp.example.net/mirror/a.xml</code>.</p>
  </div>
  <div n="D">
    <p>The base URI here is the current document. A URI such as
    <code>a.xml</code> is equivalent to
    <code>./a.xml</code>.</p>
  </div>
</body></egXML>
</p>
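<p>Where an <att>xml:base</att> value is itself relative, it is
resolved against the base URI already in force, as specified by the
XML Base recommendation. In the following sketch (the addresses and
the file name are invented for the example) the inner <gi>div</gi>
supplies a relative value:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div xml:base="http://www.example.org/texts/">
  <div xml:base="poetry/">
    <p>The base URI here is
    <code>http://www.example.org/texts/poetry/</code>. A URI such as
    <code>poem.xml</code> is equivalent to
    <code>http://www.example.org/texts/poetry/poem.xml</code>.</p>
  </div>
</div></egXML></p>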


    </div>
    <div type="div3" xml:id="SABN">
<head>Pointing Locally</head>
<p>Because the default base URI is the current document, a pointer
that is specified as a <term>bare name</term> fragment identifier<note
place="bottom">In more recent W3C documents, the term <soCalled>bare
name</soCalled> is deprecated in favour of the more explicit
<term>shorthand pointer</term>.</note> alone acts as a pointer to an
element in the current document, as in the following example. <egXML
xmlns="http://www.tei-c.org/ns/Examples" source="#SA-eg-01"><div
type="section" xml:id="sect106"><!-- ... --></div>
<div type="section" n="107" xml:id="sect107">
  <head>Limitations on exclusive rights: Fair use</head>
  <p>Notwithstanding the provisions of
  <ref target="#sect106">section 106</ref>, the fair use of a
  copyrighted work, including such use by reproduction in copies 
  or phonorecords or by any other means specified by that section,
  for purposes such as criticism, comment, news reporting,
  teaching (including multiple copies for classroom use),
  scholarship, or research, is not an infringement of copyright.
  In determining whether the use made of a work in any particular
  case is a fair use the factors to be considered shall
  include — 
  <list rend="bulleted">
    <item n="(1)">the purpose and character of the use, including
    whether such use is of a commercial nature or is for nonprofit
    educational purposes;</item>
    <item n="(2)">the nature of the copyrighted work;</item>
    <item n="(3)">the amount and substantiality of the portion
    used in relation to the copyrighted work as a whole;
    and</item>
    <item n="(4)">the effect of the use upon the potential market
    for or value of the copyrighted work.</item>
  </list>
  The fact that a work is unpublished shall not itself bar a
  finding of fair use if such finding is made upon consideration
  of all the above factors.</p>
</div></egXML>
This method of pointing, by referring to the <att>xml:id</att> of the
target element as a bare name only (e.g., <val>#sect106</val>), is 
the simplest and often the best approach where it can be applied, i.e. where
both the source element and target element are in the same XML
document, and where the target element carries an identifier. It
is the method used extensively in previous sections of this
chapter and elsewhere in these Guidelines. </p>
    </div>
    
    <div type="div3" xml:id="SAPU">
      <head>Using Abbreviated Pointers</head>
      
      <p><att>xml:base</att> is a useful way of handling the repeated use of long external URIs. However, it is less convenient when your text contains many references to a variety of different sources in different locations. Even in the case of relative links on the local file system, <att>ref</att> or <att>target</att> attributes may become quite lengthy and make XML code difficult to read. To deal with this problem, the TEI provides a useful method of using abbreviated pointers and documenting a way to dereference them automatically.</p>
      
      <p>Imagine a project which has a large collection of XML documents organized like this:</p>
      
      <list>
        <item>anthology
          <list>
            <item>poetry
              <list>
                <item><ident type="file">poem.xml</ident></item>
              </list>
            </item>
            <item>prose
              <list>
                <item><ident type="file">novel.xml</ident></item>
              </list>
            </item>
          </list>
        </item>
        <item>references
          <list>
            <item>people
              <list>
                <item><ident type="file">personography.xml</ident></item>
              </list>
            </item>
          </list>
        </item>
      </list>
      
      
      <p>If you want to link a <gi>name</gi> in the <ident type="file">novel.xml</ident> file to a <gi>person</gi> in the <ident type="file">personography.xml</ident> file, the link will look like this:
        
        <egXML xmlns="http://www.tei-c.org/ns/Examples">
          <name ref="../../references/people/personography.xml#fred">Fred</name>
        </egXML>
        
        If there are many names to tag in a single paragraph, the XML encoding will be congested, and such lengthy links are prone to typographical error. In addition, if the project organization is changed, every relative link will have to be found and altered.</p>
      
      <p>One way to deal with this is to use what is often referred to as a "magic token". You could make such links using the <att>key</att> attribute:
        <egXML xmlns="http://www.tei-c.org/ns/Examples">
          <name key="fred">Fred</name>
        </egXML>
        
        and document the meaning of the key using (for instance) a <gi>taxonomy</gi> element in the TEI header, as described in <ptr target="#CONARS"/>. However, such a link cannot be mechanically processed by an external system that does not know how to interpret it; a human will have to read the header explanation and write code explicitly to reconstruct the intended link.</p>
      
      <p>A more robust alternative is to use a <term>private URI scheme</term>. This is a method of constructing a simple, key-like token which functions as a <ident type="datatype">data.pointer</ident>, and can therefore be used as the value of any attribute which has that datatype, such as <att>ref</att> and <att>target</att>. Such a scheme consists of a prefix with a colon, and then a value. You might, for example, use the prefix <val>psn</val> (for "person"), and structure your name tags like this:
        
        <egXML xmlns="http://www.tei-c.org/ns/Examples">
          <name ref="psn:fred">Fred</name>
        </egXML>
        
        How is this different from a <soCalled>magic token</soCalled>? Essentially, it isn't, except that TEI provides a structured method of dereferencing it (turning it into a computable path, such as <val>../../references/people/personography.xml#fred</val>) by means of a declaration inside <gi>encodingDesc</gi> in the TEI header, using the elements and attributes for prefix declaration: 
      
        <specList>
          <specDesc key="listPrefixDef"/>
          <specDesc key="prefixDef" atts="ident"/>
          <specDesc key="att.patternReplacement" atts="matchPattern replacementPattern"/>
        </specList>
        
      </p>
        
        
        <p>This is how you might document a private URI scheme using the <val>psn:</val> prefix:
        
        <egXML xmlns="http://www.tei-c.org/ns/Examples">
          <listPrefixDef>
            <prefixDef ident="psn" matchPattern="([a-z]+)"
              replacementPattern="../../references/people/personography.xml#$1">
              <p>
                In the context of this project, private URIs with the prefix 
                "psn" point to <gi>person</gi> elements in the project's 
                personography.xml file.
              </p>
            </prefixDef>
          </listPrefixDef>
        </egXML>
        
        This specifies that where a <ident type="datatype">data.pointer</ident> value is constructed with a <val>psn:</val> prefix, a regular-expression replace operation can be performed on it to construct the full or relative URI to the target document or fragment. <gi>listPrefixDef</gi> is a child of <gi>encodingDesc</gi>, and it contains any number of <gi>prefixDef</gi> elements. Each <gi>prefixDef</gi> element provides a method of dereferencing or expanding an abbreviated pointer, based on a regular expression. The <att>ident</att> attribute specifies the prefix to which the expansion applies (without the colon). The <att>matchPattern</att> attribute contains a regular expression which is matched against the component of the pointer following the first colon, and the <att>replacementPattern</att> provides the string which will be used as a replacement. In this example, using <val>psn:fred</val>, the value <val>fred</val> would be matched by the <att>matchPattern</att>, and also captured (through the parentheses in the regular expression); it would then be replaced by the value <val>../../references/people/personography.xml#fred</val> (with the <val>$1</val> in the <att>replacementPattern</att> being replaced by the captured value). The <gi>p</gi> element inside the <gi>prefixDef</gi> can be used to provide a human-readable explanation of the usage of this prefix.</p>
      
      <p>Through this mechanism, any processor which encounters a <ident type="datatype">data.pointer</ident> with a protocol unknown to it can check the <gi>listPrefixDef</gi> in the header to see if there is an available expansion for it, and if there is, it can automatically provide the expansion and generate a full or relative URI.</p>
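      <p>The expansion described above might be sketched in XSLT 2.0 as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name <ident>my:expand-pointer</ident> and its namespace are hypothetical, the sketch consults only the first <gi>prefixDef</gi> with the relevant <att>ident</att>, it assumes that the <att>matchPattern</att> has no top-level alternation (so that it can be anchored by simple concatenation), and it ignores any differences between the regular expression dialects involved.
        
        <eg xml:space="preserve"><![CDATA[
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:my="http://www.example.com/ns/functions">

  <!-- Hypothetical helper: expand a private URI such as "psn:fred"
       using the prefix definitions found in the document $doc. -->
  <xsl:function name="my:expand-pointer" as="xs:string">
    <xsl:param name="pointer" as="xs:string"/>
    <xsl:param name="doc" as="document-node()"/>
    <!-- The prefix is everything before the first colon. -->
    <xsl:variable name="prefix" select="substring-before($pointer, ':')"/>
    <!-- The key is everything after the first colon. -->
    <xsl:variable name="key" select="substring-after($pointer, ':')"/>
    <!-- Take the first prefixDef declared for this prefix, if any. -->
    <xsl:variable name="def"
      select="($doc//tei:listPrefixDef/tei:prefixDef[@ident = $prefix])[1]"/>
    <!-- Anchor the pattern so that it must match the whole key. -->
    <xsl:variable name="anchored"
      select="concat('^', $def/@matchPattern, '$')"/>
    <!-- Expand when a definition exists and matches;
         otherwise return the pointer unchanged. -->
    <xsl:sequence select="
      if (exists($def) and matches($key, $anchored))
      then replace($key, $anchored, $def/@replacementPattern)
      else $pointer"/>
  </xsl:function>

</xsl:stylesheet>
]]></eg>
        
        Given the <gi>listPrefixDef</gi> shown above, a call such as <code>my:expand-pointer('psn:fred', /)</code> would be expected to return <val>../../references/people/personography.xml#fred</val>, while a pointer with an undeclared prefix would be returned as it stands.</p>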
      
      <p>For any given prefix, it may be useful to supply more than one expansion. For instance, in addition to pointing at the <gi>person</gi> element in the personography file, it might also be useful to point to an external source which is available on the network, representing the same information in a different way. So there might be a second <gi>prefixDef</gi> like this:
        
        <egXML xmlns="http://www.tei-c.org/ns/Examples">
          <prefixDef ident="psn" matchPattern="([a-z]+)"
            replacementPattern="http://www.example.com/personography.html#$1">
            <p>
              Private URIs with the prefix "psn" can be converted to point 
              to a fragment on the Personography page of the project Website.
            </p>
          </prefixDef>
        </egXML>
        
        Any number of <gi>prefixDef</gi> elements may be provided for the same prefix. A processor may decide to process one or all of them; if it processes only one, it should choose the first one with the correct <att>ident</att> value, so the primary or most important <gi>prefixDef</gi> for any given prefix should appear first in its parent <gi>listPrefixDef</gi>.</p>
      
      <p>When creating private URI schemes, it is recommended that you avoid using any existing registered prefix. A list of registered prefixes is maintained by IANA at <ref target="http://www.iana.org/assignments/uri-schemes.html">http://www.iana.org/assignments/uri-schemes.html</ref>.</p>
      
      <p>Note that this mechanism can also be used to dereference other abbreviated pointing systems which are based on prefixes, such as Tag URIs.</p>
      
      <p>The <att>matchPattern</att> and <att>replacementPattern</att> attributes are also used in dereferencing canonical reference patterns, and further examples of the use of regular expressions are shown in <ptr target="#SACR"/>.</p>
      
    </div>
    
<div type="div3" xml:id="SATS">
<head>TEI XPointer Schemes</head>

<p>The pointing schemes described in this chapter are part of a number of
such schemes envisaged by the W3C, which together constitute a
framework for addressing data within XML documents, known as the
XPointer Framework (<ref target="#XPTRFMWK">Grosso et al
2003</ref>). This framework permits the definition of many other named
addressing methods, each of which is known as an <term>XPointer
Scheme</term>. The W3C has predefined a set of such schemes, and
maintains a register for their expansion. </p>

<p>One important scheme, also defined by the W3C and recommended
by these Guidelines, is the <name type="xpscheme">xpath()</name> pointer
scheme, which allows for any part of an XML structure to be selected
using the syntax defined by the XPath specification. This is further
discussed below, <ptr target="#SATSXP"/>. These Guidelines also define
six other pointer schemes, which provide access to parts of an XML
document such as points within  data content or stretches of data
content. These additional TEI pointer schemes are defined in sections
<ptr target="#SATSL"/> to <ptr target="#SATSMA"/> below. </p>

<div type="div4" xml:id="SATSin"><head>Introduction to TEI Pointers</head>

<p>Before discussing the TEI pointer schemes, we introduce slightly
more formally the terminology used to define them. So far, we have
discussed only ways of pointing at nodes in the XML information
set, such as elements and attributes. However, there is often a
need in text analysis to address additional types of location such as
the <soCalled>point</soCalled> locations <emph>between</emph> 
<soCalled>nodes</soCalled>, and <soCalled>sequences</soCalled> that 
may arbitrarily cross the boundaries of nodes in a document.  The 
content of an XML document is organized sequentially as well as 
hierarchically, and it makes sense to consider ranges of characters 
within a document independently of the nodes to which they belong. 
From the perspective of most of the pointer schemes discussed below, 
a TEI document is a tree structure superimposed upon a character stream. 
Nodes are entities available only in the tree, while points are available 
only in the stream. For this reason, the schemes below that rely upon 
character positions (<code>string-index()</code>, 
<code>string-range()</code>, and <code>match()</code>) cannot take nodes 
into account. Similarly, XPath, being a method for locating nodes in the 
tree, treats those nodes as atomic, and is unable to address parts of nodes 
in their document context.</p>

<p>The TEI pointer scheme thus distinguishes the following
kinds of object:

<list type="gloss">
<label>Node</label>
<item>A node is an instance of one of the node kinds defined in
the <ref target="http://www.w3.org/TR/xpath-datamodel/">XQuery 
1.0 and XPath 2.0 Data Model (Second Edition)</ref>. It represents 
a single item in the XML information set for a document. For pointing
purposes, the only nodes that are of interest are Text Nodes,
Element Nodes, and Attribute Nodes.</item> 
<label>Sequence</label>
<item>A Sequence follows the definition in the XPath 2.0 Data 
Model, with one alteration. A Sequence is an ordered collection
of zero or more items, where an item is either a node or a partial
text node. 
</item>
<label>Text Stream</label>
<item>A Text Stream is the concatenation of the text nodes in a document
and behaves as though all tags had been removed. A text stream begins
at a reference node and encompasses all of the text inside that node (if any)
and all the text following it in document order. In XPath terms, this would
encompass all of the text nodes beginning at a particular node, and following 
it on the <ref target="http://www.w3.org/TR/xpath20/#axes">following axis</ref>.
</item>
<label>Point</label>
<item>A Point represents a dimensionless point between nodes or characters in 
a document. Every point is adjacent to either characters or elements, and
never to another point. Points can only be referenced in relation to an 
element or text node in the document (i.e. something addressable by either 
an XPath or a fragment identifier). Points occur either immediately before 
or after an element, or at a numbered position inside a text stream. 
Position zero in the stream would be immediately before the first character. 
Note that points within attribute values cannot mark the beginning or end of 
a range extending beyond the attribute value, because points indicate a 
position within a document. Since attribute nodes are by definition un-ordered, 
they cannot be said to have a fixed position. 
</item>
</list>
</p>

<p>The TEI recommends the following seven pointer schemes:
<list type="gloss">
<label><name type="xpscheme">xpath()</name></label>
<item>addresses a node or node set using the XPath syntax (<ptr target="#SATSXP"/>)</item>
<label><name type="xpscheme">left()</name> and <name
type="xpscheme">right()</name></label>
<item>address the point before (left) or after (right) a node or node 
set (<ptr target="#SATSL"/> and <ptr target="#SATSR"/>)</item>
<label><name type="xpscheme">string-index()</name></label>
<item>addresses a point inside a text node (<ptr target="#SATSSI"/>)</item>
<label><name type="xpscheme">range()</name></label>
<item>addresses the range between two points (<ptr target="#SATSRN"/>)</item>
<label><name type="xpscheme">string-range()</name></label>
<item>addresses a range of a specified length starting from a
specified point (<ptr target="#SATSSR"/>)</item>
<label><name type="xpscheme">match()</name></label>
<item>addresses a range which matches a specified string within a node
(<ptr target="#SATSMA"/>)</item>
</list>
</p>
<p>The <name type="xpscheme">xpath()</name> scheme refers to the
existing XPath specification which is adopted with one modification:
the default namespace for any XPath used as a parameter to this 
scheme is assumed to be the TEI namespace <code>http://www.tei-c.org/ns/1.0</code>.
</p>
<p>The other six schemes overlap in functionality with a W3C draft
specification known as the <name type="xpscheme">XPointer
scheme</name> draft, but are individually much simpler. At the time of
this writing, there is no current or scheduled activity at the W3C
towards revising this draft or issuing it as a recommendation. </p>

<p><hi rend="bold">A note on namespaces</hi>: The W3C defines an 
<name type="xpscheme">xmlns()</name> scheme (see 
<ref target="http://www.w3.org/TR/xptr-xmlns/">XPointer xmlns() Scheme</ref>) 
which when prepended to a resolvable pointer allows for the definition of 
namespace prefixes to be used in XPaths in subsequent pointers. TEI Pointer 
schemes assume that un-prefixed element names in TEI Pointer XPaths are in the 
TEI namespace, <code>http://www.tei-c.org/ns/1.0</code>. The use of 
<name type="xpscheme">xmlns()</name> is thus optional, 
provided no new prefixes need to be defined. If the schemes described here
are used to address non-TEI elements, then any new prefixes to be used in 
pointer XPaths may be defined using the  <name type="xpscheme">xmlns()</name> 
scheme.</p>
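<p>As a minimal sketch of how the two schemes might be combined, the 
following pointer declares a prefix for the SVG namespace before using it 
in an XPath. The file name <ident type="file">figures.xml</ident> and the 
choice of SVG as the target vocabulary are assumptions made purely for 
illustration:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ptr target="figures.xml#xmlns(svg=http://www.w3.org/2000/svg)xpath(//svg:rect[1])"/>
</egXML>
Here the <name type="xpscheme">xmlns()</name> part binds the prefix 
<val>svg</val>, and the <name type="xpscheme">xpath()</name> part which 
follows it uses that prefix to select elements outside the TEI namespace.</p>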
</div>

<div type="div4" xml:id="SATSXP">
<head>xpath()</head>
<p><rs>Sequence</rs> <code>xpath( XPATH )</code></p>
<p>The <name type="xpscheme">xpath()</name> scheme locates a node
within an XML Information Set. The single argument
<rs>XPATH</rs> is an XPath path expression, following the latest 
scheme adopted by the W3C (currently 
<ref target="http://www.w3.org/TR/xpath20/">XPath 2.0</ref>), that 
returns a sequence. XPaths returning atomic values (e.g. 
<name>substring()</name>) are illegal in the 
<name type="xpscheme">xpath()</name> scheme because they 
represent extracted values rather than locations in the source 
document. XPath expressions that address attribute nodes are only 
advisable in the <name type="xpscheme">xpath()</name> scheme.
</p>
<p>The example below, and all subsequent examples in this section, refer 
to the following TEI fragment<anchor xml:id="SATSXP-ex"/>:
<!-- Ostrakon from Trimithis (O.Trim 1, 1)  http://papyri.info/ddbdp/o.trim;1;1 -->
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<div xml:lang="la" type="edition" xml:space="preserve"><ab>
<lb n="1"/><supplied reason="lost">si</supplied> non <choice><reg>habui</reg><orig>abui</orig></choice> quidquam vaco 
<lb n="2"/>si<gap reason="illegible" quantity="3" unit="character"/>b<gap reason="illegible" quantity="3" unit="character"/> 
  cohort<unclear>e</unclear> mi rescribas 
<lb n="3"/><unclear>s</unclear>emp<unclear>er</unclear> in <choice><reg>mente</reg><orig>mentem</orig></choice> 
  <choice><reg>habe</reg><orig>abe</orig></choice> supra res 
<lb n="4"/>scriptas<gap reason="lost" extent="unknown" unit="character"/> 
<lb n="5"/>auge et opto u<unclear>t</unclear> bene valeas</ab></div>
</egXML>
</p>
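<p>A TEI Pointer that references the "normalized" form in the 
<code>choice</code> in line 1 of the example might look like:
<lb/><code>#xpath(//lb[@n='1']/following-sibling::choice[1]/reg)</code>. 
The positional predicate <code>[1]</code> restricts the selection to the 
first <code>choice</code> following the <code>lb</code>; without it, the 
path would also select the <code>reg</code> elements in line 3.
</p>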
<p>When an XPath is interpreted by a TEI processor, the
information set of the referenced document is interpreted
without any additional information supplied by any schema
processing that may or may not be present. In particular this
means that no whitespace normalization is applied to a
document before the XPath is interpreted.
</p>
<p>This pointer scheme allows easy, direct use of the most
widely-implemented XML query method. It is probably the most
robust pointing mechanism for the common situation of
selecting an XML element or its contents where an
<att>xml:id</att> is not present. The ability to use element
names and attribute names and
values makes <name type="xpscheme">xpath()</name> pointers more
robust than the other mechanisms discussed in this section 
even if the designated document changes. For durability in the
presence of editing, use of <att>xml:id</att> is always
recommended when possible.</p>
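<p>As an illustration of how such a pointer might be used from another 
document, the following sketch attaches a note to the first 
<gi>unclear</gi> element following the second <gi>lb</gi> in the 
<ref target="#SATSXP-ex">example</ref>. The file name 
<ident type="file">ostrakon.xml</ident> and the wording of the note are 
assumptions made for the purposes of the example:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<note type="commentary"
  target="ostrakon.xml#xpath(//lb[@n='2']/following-sibling::unclear[1])">
  The final letter of <mentioned>cohorte</mentioned> is not securely read.
</note>
</egXML>
</p>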
</div>

<div type="div4" xml:id="SATSL">
<head>left()</head>
<p><rs>Point</rs> <code>left( IDREF | XPATH )</code></p>
<p>The <name type="xpscheme">left()</name> scheme locates the
point immediately preceding the node addressed by its argument,
which is either an <rs>XPATH</rs> as defined above or an
<rs>IDREF</rs>, the value of an <att>xml:id</att>
occurring in the document addressed by the base URI in effect
for the pointer.</p>
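<p>Example: the pointer <code>#left(//gap[1])</code> 
indicates the point immediately before the first <code>gap</code> in the 
<ref target="#SATSXP-ex">example</ref> above, that is, the point between 
the text <q>si</q> in line 2 and that <code>gap</code>.</p>
<p>Example: <code>#left(l1)</code> indicates the point immediately before
the <code><![CDATA[<lb n="1"/>]]></code> element, on the assumption that this 
element carries an <att>xml:id</att> with the value <val>l1</val> (not shown 
in the example).</p></div>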

<div type="div4" xml:id="SATSR">
<head>right()</head>
<p><rs>Point</rs> <code>right( IDREF | XPATH )</code></p>
<p>The <name type="xpscheme">right()</name> scheme locates the
point immediately following the node addressed by its argument.</p>
<p>Example: the pointer <code>#right(//lb[@n='3'])</code> 
indicates the point between the third <code>lb</code> and the 
<code><![CDATA[<unclear>s</unclear>]]></code> element
in the <ref target="#SATSXP-ex">example</ref>.</p>
</div>

<div type="div4" xml:id="SATSSI">
<head>string-index()</head>
<p><rs>Point</rs> <code>string-index( IDREF | XPATH, OFFSET )</code></p>
<p>The <name type="xpscheme">string-index()</name> scheme locates a
  point based on character positions in a text stream relative 
  to the node identified by the IDREF or XPATH parameter. The <rs>OFFSET</rs>
parameter is a positive, negative, or zero integer which determines 
the position of the <rs>point</rs>. An offset of 0 represents the 
position immediately before the first character in either the first 
text node descendant of the node addressed in the first parameter or the 
first following text node, if the addressed element contains 
no text node descendants.</p>
<p>Example: <code>#string-index(//lb[@n='2'],1)</code> indicates the point
between the <q>s</q> and the <q>i</q> in the word <q>si</q> in line 2.</p>
</div>

<div type="div4" xml:id="SATSRN">
<head>range()</head>
<p><rs>Sequence</rs> <code>range( POINTER, POINTER[, POINTER, POINTER ...])</code></p>
<p>The <name type="xpscheme">range()</name> scheme takes as parameters one 
or more pairs of <rs>POINTER</rs>s, which are each members of the set <rs>IDREF</rs>,
<rs>XPATH</rs>, <name type="xpscheme">left()</name>, 
<name type="xpscheme">right()</name>, or 
<name type="xpscheme">string-index()</name>. A 
<name type="xpscheme">range()</name> locates a (possibly non-contiguous) 
sequence beginning at the first POINTER parameter and ending at the 
last. If the POINTER locates a node (i.e. is an XPATH or IDREF), then 
that node is a member of the addressed sequence. If a sequence addressed
by a range pointer overlaps, but does not wholly contain, an element
(i.e. it contains only the start but not the end tag or vice-versa),
then that element is not part of the sequence.</p>
<p><name type="xpscheme">Range()</name>s may address sequences of 
non-contiguous nodes. For example, a range() might select text beginning 
before an <gi>app</gi>, encompassing the content of a single <gi>rdg</gi> 
and continuing after the <gi>app</gi>.</p>
<p>Example: <code>#range(left(//lb[@n='3']),left(//lb[@n='4']))</code> indicates 
the whole of <ref target="#SATSXP-ex">line 3</ref> from the 
<code><![CDATA[<lb n="3"/>]]></code> to the point right before the 
following <code><![CDATA[<lb n="4"/>]]></code>.</p>
<p>Example: <code>#range(right(//lb[@n='3']),string-index(//lb[@n='3'],15))</code>
  indicates the sequence <code><![CDATA[<unclear>s</unclear>emp<unclear>er</unclear> in mente]]></code>.</p>
<p>Example: <code>#range(string-index(//lb[@n='3'],7),string-index(//lb[@n='3'],10),string-index(//lb[@n='3'],15),string-index(//lb[@n='3'],21))</code> indicates
the non-contiguous sequence <q>in mentem</q>.</p>
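<p>The offsets used in the examples above can be checked by writing out 
the text stream which follows <code><![CDATA[<lb n="3"/>]]></code>. The 
following table is an illustrative reading of the 
<ref target="#SATSXP-ex">example</ref>, on the assumption that the content 
of both <gi>reg</gi> and <gi>orig</gi> counts towards the stream, since a 
text stream is the concatenation of all the text nodes:
<table>
  <row role="label"><cell>Characters</cell><cell>Containing element</cell><cell>Offsets</cell></row>
  <row><cell>semper</cell><cell><gi>ab</gi> and <gi>unclear</gi></cell><cell>0 to 5</cell></row>
  <row><cell>(space)</cell><cell><gi>ab</gi></cell><cell>6</cell></row>
  <row><cell>in</cell><cell><gi>ab</gi></cell><cell>7 to 8</cell></row>
  <row><cell>(space)</cell><cell><gi>ab</gi></cell><cell>9</cell></row>
  <row><cell>mente</cell><cell><gi>reg</gi></cell><cell>10 to 14</cell></row>
  <row><cell>mentem</cell><cell><gi>orig</gi></cell><cell>15 to 20</cell></row>
</table>
Since an offset names the point immediately before the character at that 
position, point 15 lies between <q>mente</q> and <q>mentem</q>, and point 
21 lies immediately after <q>mentem</q>.</p>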
</div>

<div type="div4" xml:id="SATSSR">
<head>string-range()</head>
<p><rs>Sequence</rs> <code>string-range(IDREF | XPATH, OFFSET, LENGTH[, OFFSET, LENGTH ...])</code></p>
<p>The string-range() scheme
locates a sequence based on character positions in a text stream relative 
to the node identified by the first parameter. The location of the 
beginning of the addressed sequence is determined precisely
as for <name type="xpscheme">string-index()</name>. The <rs>OFFSET</rs>
parameter is defined as above in <name type="xpscheme">string-index()</name>.
The <rs>LENGTH</rs> parameter is a positive integer that denotes
the length of the text stream captured by the sequence. As with 
<name type="xpscheme">range()</name>, the addressed sequence may 
contain text nodes and/or elements. The 
<name type="xpscheme">string-range()</name> scheme can accept multiple 
OFFSET, LENGTH pairs to address a non-contiguous sequence in much the
same way that range() can accept multiple pairs of pointers.</p> 
<p>Because string-range() addresses points in the text stream, tags are
invisible to it. For example, if an empty tag like <gi>lb</gi> is
encountered while processing a string-range(), it will be included in
the resulting sequence, but the LENGTH count will not increment when
it is captured.</p>
<p>Example: <code>#string-range(//lb[@n='5'],0,27)</code> indicates 
the whole of <ref target="#SATSXP-ex">line 5</ref> from the text immediately 
following the <code>lb</code> to the point right before the closing 
<code>ab</code> tag.</p>
<p>Example: <code>#string-range(//lb[@n='3'],7,8)</code>
indicates the sequence <q>in mente</q>.</p>
<p>Example: <code>#string-range(//lb[@n='3'],7,3,15,6)</code> indicates
the non-contiguous sequence <q>in mentem</q>.</p>
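<p>Because a string-range() is defined by a starting point and a length, 
the same stretch of text can usually also be expressed as a range() 
between two string-index() points. As a sketch, taking 
<code><![CDATA[<lb n="3"/>]]></code> in the 
<ref target="#SATSXP-ex">example</ref> as the reference node, the 
following two pointers would be expected to address the same eight 
characters, <q>in mente</q>, which begin at offset 7 and end just before 
offset 15:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ptr target="#string-range(//lb[@n='3'],7,8)"/>
<ptr target="#range(string-index(//lb[@n='3'],7),string-index(//lb[@n='3'],15))"/>
</egXML>
The end point in the second form is simply the offset plus the length 
(7 + 8 = 15).</p>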
</div>
  
<div type="div4" xml:id="SATSMA">
<head>match()</head>
<p><rs>Sequence</rs> <code>match(IDREF | XPATH, 'REGEX' [, INDEX])</code></p>
<p>The match scheme locates a sequence based on matching the REGEX parameter
against a text stream relative to the reference node identified by the first 
parameter. REGEX is a regular expression as defined by 
<ref target="http://www.w3.org/TR/xpath-functions/#regex-syntax">XQuery 
1.0 and XPath 2.0 Functions and Operators (Second Edition)</ref>, with some
modifications: 
<list>
  <item>Because the regular expression is delimited by apostrophe 
    characters, any such characters (<code>'</code> or <code>\u0027</code>) 
    occurring inside the expression must be escaped using the URI 
    percent-encoding scheme <code>%27</code>. </item>
  <item>Regular expressions in <code>match()</code> are assumed to
    operate in multi-line mode. The end of the string to be matched
    against is either the end of the text contained by the element in the
    first parameter or the end of the document, if that parameter
    indicates an empty element. The meta-character <code>^</code>
    therefore matches the beginning of the text stream inside or following
    the reference node, and the meta-character <code>$</code> matches the
    end of that stream.
  </item>
</list>
The optional INDEX parameter is an integer greater than 0 which specifies which 
match should be chosen when there is more than one possibility. If omitted, the 
first match in the text stream will be used.</p>
<p>Like <code>string-range()</code>, <code>match()</code> may capture elements 
in the returned sequence, even though they are ignored for purposes of evaluating 
the match.</p>
<p>Example: <code>#match(//lb[@n='5'],'opto.*valeas')</code> indicates the sequence
<code><![CDATA[opto u<unclear>t</unclear> bene valeas]]></code> in 
<ref target="#SATSXP-ex">line 5</ref>.</p>
<p>Example: <code>#match(//lb[@n='3'],'semper')</code> would indicate the 
word <q>semper</q>, but would not capture the <code>unclear</code> elements
in <code><![CDATA[<unclear>s</unclear>emp<unclear>er</unclear>]]></code>, just 
their text children.</p>
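<p>The optional INDEX parameter can be illustrated with the same 
<ref target="#SATSXP-ex">example</ref>. The following pointer is a sketch 
only, and assumes that the content of <gi>supplied</gi> forms part of the 
text stream like any other text node:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ptr target="#match(//lb[@n='1'],'si',2)"/>
</egXML>
In the text stream following <code><![CDATA[<lb n="1"/>]]></code>, the 
string <q>si</q> first occurs as the supplied text at the start of line 1, 
and next occurs at the start of line 2. With an INDEX of 2, the pointer 
would therefore address the <q>si</q> in line 2; with the third parameter 
omitted, it would address the first occurrence instead.</p>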
</div>
</div>
<div type="div3" xml:id="SACR">
<head>Canonical References</head>

<p>By <soCalled>canonical</soCalled> reference we mean any means
of pointing into documents, specific to a community or
corpus. For example, biblical scholars might understand <q>Matt
5:7</q> to mean <q>the book called <title>Matthew</title>, chapter
5, verse 7.</q> They might then wish to translate the string
<q>Matt 5:7</q> into a pointer into a TEI-encoded document,
selecting the element which corresponds to the seventh
<gi>div</gi> element within the fifth <gi>div</gi> element
within the <gi>div</gi> element with the <att>n</att> attribute
valued <q>Matt</q>.</p>
<p>Several elements in the TEI scheme (<gi>gloss</gi>,
<gi>ptr</gi>, <gi>ref</gi>, and <gi>term</gi>) bear a special
attribute, <att>cRef</att>, just for this purpose. Using the
system described in this section, an encoder may specify
references to canonical works in a discipline-familiar format,
and expect software to derive a complete URI from it. The value
of the <att>cRef</att> attribute is processed as described in
this section, and the resulting URI reference is treated as if
it were the value of the <att>target</att> attribute. The
<att>cRef</att> and <att>target</att> attributes are mutually
exclusive: only one or the other may be specified on any given
occurrence of an element.</p>
<p>For the <att>cRef</att> attribute to function as required, a mechanism is needed to define the
mapping between (for example) <q>the book called
<title>Matt</title></q> and the part of the XML structure which
corresponds with it. This is provided by the  <gi>refsDecl</gi> element <!--(which is a member of class
<ident type="class">att.declaring</ident>)--> in the TEI header,
which contains an algorithm for translating a canonical reference string
(like <val>Matt 5:7</val>) into a URI such as <code>#xpath(//div[@n='Matt']/div[5]/div[7])</code>. The
<gi>refsDecl</gi> element is described in section <ptr target="#HD54"/>; the following example is discussed in more
detail below in section <ptr target="#SACRWE"/>.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><refsDecl xml:id="biblical">
   <cRefPattern matchPattern="(.+) (.+):(.+)" replacementPattern="#xpath(//div[@n='$1']/div[@n='$2']/div[@n='$3'])">
    <p>This pointer pattern extracts and references the <q>book,</q>
<q>chapter,</q> and <q>verse</q> parts of a biblical reference.</p>
   </cRefPattern>
   <cRefPattern matchPattern="(.+) (.+)" replacementPattern="#xpath(//div[@n='$1']/div[$2])">
    <p>This pointer pattern extracts and references the <q>book</q> and
<q>chapter</q> parts of a biblical reference.</p>
   </cRefPattern>
   <cRefPattern matchPattern="(.+)" replacementPattern="#xpath(//div[@n='$1'])">
    <p>This pointer pattern extracts and references just the <q>book</q>
part of a biblical reference.</p>
   </cRefPattern>
</refsDecl></egXML>
</p>
<p>When an application encounters a canonical reference as the
value of a <att>cRef</att> attribute, it might follow this sequence of
specific steps to transform it into a URI reference:
<list rend="numbered">
  <item>Ascertain the correct <gi>refsDecl</gi>
  following the rules summarized in section <ptr target="#CCAS3"/>.</item>
  <item>For each <gi>cRefPattern</gi> element encountered in
  the appropriate <gi>refsDecl</gi>, in the order encountered:
  <list rend="numbered">
    <item>match the value of the <att>cRef</att> attribute to the regular
    expression found as the value of the <att>matchPattern</att>
    attribute</item>
    <item>if the value of the <att>cRef</att> attribute matches:
      <list rend="numbered">
        <item>take the value of the <att>replacementPattern</att> 
          attribute and substitute the back references ($1, $2, 
          etc.) with the corresponding matched substrings</item>
        <item>the result is taken as if it were a relative or 
          absolute URI reference specified on the <att>target</att> 
          attribute; i.e., it should be used as is or combined with
          the current <att>xml:base</att> attribute value as usual</item>
        <item>no further processing of this value of the <att>cRef</att> 
          attribute against the <gi>refsDecl</gi> should take place</item>
      </list>
    </item>
    <item>if, however, the value of the <att>cRef</att> attribute does not match
    the regular expression specified in the value of the <att>matchPattern</att> attribute,
    proceed to the next <gi>cRefPattern</gi></item>
  </list>
  </item>
  <item>If all the <gi>cRefPattern</gi> elements are
  examined in turn and none matches, the pointer fails.</item>
</list></p>
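<p>To illustrate these steps with the <gi>refsDecl</gi> shown above, 
consider a reference which names a book and a chapter only. This is a 
worked sketch rather than the output of any particular implementation:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ptr cRef="Matt 5"/>
</egXML>
The value <val>Matt 5</val> contains no colon, so it does not match the 
first <att>matchPattern</att>, <code>(.+) (.+):(.+)</code>. It does match 
the second, <code>(.+) (.+)</code>, with <val>Matt</val> and <val>5</val> 
as the first and second matched substrings. Substituting these into the 
corresponding <att>replacementPattern</att> gives a result which is 
treated as if the encoder had written:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<ptr target="#xpath(//div[@n='Matt']/div[5])"/>
</egXML>
The third <gi>cRefPattern</gi> is never consulted, because processing 
stops at the first pattern which matches.</p>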
<p>The regular expression language used as the value of the
<att>matchPattern</att> attribute is that used for the
<term>pattern</term> facet of the World Wide Web Consortium's
XML Schema Language, as defined in an <ref target="http://www.w3.org/TR/xmlschema-2/#regexs">Appendix to
XML Schema Part 2</ref>.<note place="bottom">As always
seems to be the case, no two regular expression languages are
precisely the same. For those used to Perl regular expressions,
be warned that while in Perl the pattern <code>tei</code>
matches any string that contains <mentioned>tei</mentioned>, in
the W3C language it only matches the string <q>tei</q>.</note>
The value of the <att>replacementPattern</att> attribute is simply a string,
except that occurrences of <q>$1</q> through <q>$9</q> are
replaced by the corresponding substring match. Note that since a
maximum of nine substring matches are permitted, the string
<q>$18</q> means <q>the value of the first matched substring
followed by the character <q>8</q></q> as opposed to <q>the
eighteenth matched substring</q>. If there is a need for an
actual string including a dollar sign followed by a digit that is
not supposed to be replaced, the dollar sign should be written
as <code>$$</code>. Implementations must convert <code>$$</code> 
to <code>$</code> during processing.</p>
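<p>As a minimal sketch of these two points, the following
<gi>cRefPattern</gi> relies on whole-string matching and uses
<code>$$</code> to emit a literal dollar sign. The pattern, the
reference form <val>item 42</val>, and the <code>price-</code>
prefix are all invented for illustration and are not taken from
any real <gi>refsDecl</gi>.
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<cRefPattern matchPattern="item ([0-9]+)" replacementPattern="#price-$$1-item$1">
  <p>Hypothetical pattern: matches values such as
  <val>item 42</val>, and nothing longer or shorter.</p>
</cRefPattern>
</egXML>
On the rules given above, a <att>cRef</att> value of <val>item 42</val>
matches and should yield <code>#price-$1-item42</code>, whereas
<val>see item 42</val> does not match, because the pattern must
account for the whole string. To accept such a value, the pattern
would have to be written as, for example,
<code>.*item ([0-9]+)</code>.</p>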
<!--<p>A TEI application
encounters a canonical reference, for example <egXML
xmlns="http://www.tei-c.org/ns/Examples">This story is continued
in <ptr cRef="Matt 5:7" decls="#biblical"/>.</egXML> and wants
to be able to convert it to a standard URI Reference that
corresponds to <q>Matt 5:7</q>.</p>
  <p>The application first follows the URI in the
  <ident>decls</ident> attribute, which points to a
  <gi>refsDecl</gi> element in the local document or a remote
  document<note place="bottom" resp="#sdb">As with other elements
  in the <ident type="class">tei.delcarable</ident> class, the
  default is the <gi>refsDecl</gi> in the <gi>teiHeader</gi> of
  the current document; or, if there are more than one
  <gi>refsDecl</gi>s, the one that has a <val>yes</val> value
  specified for the <att>default</att> attriubte.</note> Within
  that declaration (see example above), it refers to the list of
  <gi>cRefPattern</gi>s, and for each pattern in the order
  specified, applies the regular expression found on the
  <att>matchPattern</att> attribute to the reference <q>Matt 5:7</q>.</p>
  <p>If
  the first regular expression matches, it applies the matched
  substrings (in this case, <q>Matt</q>, <q>5</q>, and <q>7</q>)
  to the string in the <att>replacementPattern</att> attribute of that
  <gi>cRefPattern</gi> element, substituting the first
  matched substring for $1, the second for $2, and so on, to
  produce an absolute or relative URI. In the case that a
  relative URI is produced, it is relative to whatever
  <att>xml:base</att> is in force for the pointer itself, i.e.
  the <gi>ptr</gi> or <gi>ref</gi> element that bore the
  <att>cRef</att> attripte.</p>
  <p>If the regular expression in the first
  <gi>cRefPattern</gi> element does not match, the regular
  expression in the second <gi>cRefPattern</gi> element is
  tried, and so on.</p> -->
<div type="div4" xml:id="SACRWE">
  <head>Worked Example</head>
  <p>Let us presume that with the example <gi>refsDecl</gi>
  above, an application comes across a <att>cRef</att> value of
  <val>Matt 5:7</val> inside a <gi>div</gi> which has an
  <att>xml:base</att> of
  <val>http://www.example.org/resources/books/Bible.xml</val>. The
  application would first apply the regular expression
  <code>(.+) (.+):(.+)</code> to <q>Matt 5:7</q>. This regular
  expression would successfully match. The first matched
  substring would be <q>Matt</q>, the second <q>5</q>, and the
  third <q>7</q>. The application would then apply these
  substrings to the pattern
  <code>#xpath(//div[@n='$1']/div[$2]/div[$3])</code>, producing
  <code>#xpath(//div[@n='Matt']/div[5]/div[7])</code>. It would
  append this to the <att>xml:base</att> in force, thus
  generating the complete URI Reference
  <code>http://www.example.org/resources/books/Bible.xml#xpath(//div[@n='Matt']/div[5]/div[7])</code>.
  </p>
  <p>If, however, the input string had been <q>Matt 5</q>, the
  first regular expression would not have matched. The
  application would have then tried the second, <code>(.+)
  (.+)</code>, producing a successful match, and the matched
  substrings <q>Matt</q> and <q>5</q>. It would then have
  substituted those matched substrings into the pattern
  <code>#xpath(//div[@n='$1']/div[$2])</code> to produce a
  fragment identifier, which when appended to the
  <att>xml:base</att> in force produces the absolute URI
  reference 
  <code>http://www.example.org/resources/books/Bible.xml#xpath(//div[@n='Matt']/div[5])</code>.</p>
  <p>If the input string had been <q>Matt</q>, neither the first
    nor the second regular expressions would have successfully
    matched. The application would have then tried the third,
    <code>(.+)</code>, producing the matched substring <q>Matt</q>,
    and the URI Reference
    <code>http://www.example.org/resources/books/Bible.xml#xpath(//div[@n='Matt'])</code>.</p>
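  <p>The three cases just traced could arise from pointers such
  as the following. This is only a sketch: the surrounding prose
  is invented, and the <att>xml:base</att> value is simply the
  one assumed in the discussion above.
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
<div xml:base="http://www.example.org/resources/books/Bible.xml">
  <p>A single verse: <ptr cRef="Matt 5:7"/></p>
  <p>A whole chapter: <ptr cRef="Matt 5"/></p>
  <p>A whole book: <ptr cRef="Matt"/></p>
</div>
    </egXML>
  Each pointer is resolved independently, and the patterns are
  tried in document order, so the most specific pattern needs to
  come first in the <gi>refsDecl</gi>.</p>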
  <p>A <gi>cRefPattern</gi> should not reference more matched substrings than its <att>matchPattern</att> can produce. For example:
    <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><cRefPattern matchPattern="(.+) (.+):(.+)" replacementPattern="//div[@n='$1']/div[$2]/div[$3]/p[$4]"/></egXML>
    is faulty, since only three matched
    substrings would have been produced, but a fourth (<code>$4</code>) was
    referenced.</p>
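  <p>One way to repair such a pattern, sketched here on the
  assumption that references take a hypothetical four-part form
  such as <val>Matt 5:7.2</val> (book, chapter, verse, and
  paragraph), is to add a fourth parenthesized group so that
  <code>$4</code> has a substring to refer to:
    <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><cRefPattern matchPattern="(.+) (.+):(.+)\.(.+)" replacementPattern="//div[@n='$1']/div[$2]/div[$3]/p[$4]"/></egXML>
  Alternatively, if no fourth component is wanted, the
  <code>/p[$4]</code> step can be dropped from the value of the
  <att>replacementPattern</att> attribute.</p>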
</div>
<div type="div4" xml:id="SACRex">
  <head>Complete and Partial URI Examples</head>
  <p>In the above example, the value of <att>cRef</att> was used
  to generate a Fragment Identifier, which in turn was used to
  generate a complete URI. The complete URI could be generated
  directly, as in the following example.
    <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
    <refsDecl xml:id="USC">
<cRefPattern matchPattern="([0-9][0-9])\s*U\.?S\.?C\.?\s*[Cc](h(\.|ap(ter|\.)?)?)?\s*([1-9][0-9]*)" replacementPattern="http://uscode.house.gov/download/pls/$1C$5.txt">
  <p>Matches most standard references to particular
  chapters of the United States Code, e.g.
  <val>11USCC7</val>, <val>17 U.S.C. Chapter 3</val>, or
  <val>14 USC Ch. 5</val>. Note that a leading zero is
  required for the title (must be two digits), but is not
  permitted for the chapter number.</p>
</cRefPattern>
<cRefPattern matchPattern="([0-9][0-9])\s*U\.?S\.?C\.?\s*[Pp](re(lim(inary)?)?)?\s*[Mm](at(erial)?)?" replacementPattern="http://uscode.house.gov/download/pls/$1T.txt">
  <p>Matches references to the preliminary material for a
  given title, e.g. <val>11USCP</val>, <val>17 U.S.C.
  Prelim Mat</val>, or <val>14 USC pm</val>.</p>
</cRefPattern>
<cRefPattern matchPattern="([0-9][0-9])\s*U\.?S\.?C\.?\s*[Aa](ppend(ix)?)?" replacementPattern="http://uscode.house.gov/download/pls/$1A.txt">
  <p>Matches references to the appendix of a given title,
  e.g. <val>05USCA</val>, <val>11 U.S.C. Appendix</val>,
  or <val>18 USC Append</val>.</p>
</cRefPattern>
    </refsDecl>
    <!-- ... -->
    <p>The example in section 10 is taken
    from <ref cRef="17 USC Ch 1">Subject Matter and Scope of
    Copyright</ref>.</p>
  </egXML>
  </p>
  <p>See <ptr target="#SAPU"/> for another related use of the <att>matchPattern</att> and <att>replacementPattern</att> attributes.</p>
</div>
<div type="div4" xml:id="SACRmu"><head>Miscellaneous Usages</head>
<p>Canonical reference pointers are intended for use by TEI
    encoders. However, this specification might be useful to the
    development of a process for recognizing canonical
    references in non-TEI documents (such as plain text
    documents), possibly as part of their conversion to TEI.</p>
</div>
      
    </div>
  </div>
  <div type="div2" xml:id="SASE">
    <head>Blocks, Segments, and Anchors</head>

<p>In this section, we discuss three general-purpose elements which
may be used to mark and categorize both a span of text and a point
within one. These elements have several uses, most notably to provide
elements which can be given identifiers for use when aligning or
linking to parts of a document, as discussed elsewhere in this
chapter. They also provide a convenient way of extending the semantics
of the TEI markup scheme in a theory-neutral manner, by providing for
two neutral or <soCalled>anonymous</soCalled> elements to which the
encoder can add any meaning not supplied by other TEI defined
elements.
    <specList>
<specDesc key="anchor"/>
<specDesc key="ab" />
<specDesc key="seg"/>
    </specList>
    The elements <gi>anchor</gi>, <gi>ab</gi>, and <gi>seg</gi> are members of
    the class <ident type="class">att.typed</ident>, from which they
    inherit the following attributes:
    <specList>
<specDesc key="att.typed" atts="type subtype"/>
    </specList>
The elements <gi>ab</gi> and <gi>seg</gi> are members of
    the class <ident type="class">att.fragmentable</ident>, from which they
    inherit the following attribute:
    <specList>
<specDesc key="att.fragmentable" atts="part"/>
</specList>
    The <gi>seg</gi> element is also a member of the class <ident type="class">att.segLike</ident> from which it inherits the
    following attribute:
    <specList>
      <specDesc key="att.segLike" atts="function"/>
    </specList>
</p>
    <p>The <gi>anchor</gi> element may be thought of as an empty
    <gi>seg</gi>, or as an artifice enabling an identifier to be
    attached to any position in a text. Like the <gi>milestone</gi>
    element discussed in section <ptr target="#CORS"/>, it is useful
    where multiple views of a document are to be combined, for
    example, when a logical view based on paragraphs or verse lines is
    to be mapped on to a physical view based on manuscript lines. Like
    that element, it is a member of the class <ident type="class">model.global</ident> and can therefore appear
    anywhere within a document when the module defined by this chapter
    is included in a schema. Unlike the other elements in its class,
    the <gi>anchor</gi> element is primarily intended  to mark
    an arbitrary point used for alignment, or as the target of a
    spanning element such as those discussed in section <ptr target="#PHAD"/>, rather than as a means of marking segment
    boundaries for some arbitrary segmentation of a text.</p>
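    <p>The second of these uses might be sketched as follows,
    with an <gi>anchor</gi> marking the point at which a spanning
    element ends. The identifier <val>del-end1</val> and the text
    are invented for illustration, and the <gi>delSpan</gi>
    element is assumed to be available from the module which
    defines the elements discussed in section <ptr target="#PHAD"/>.
    <egXML xmlns="http://www.tei-c.org/ns/Examples"><p>This sentence is retained. <delSpan spanTo="#del-end1"/>This sentence
begins a deletion which runs on into the next paragraph.</p>
<p>The deletion stops here.<anchor xml:id="del-end1"/> This sentence is
retained.</p></egXML>
    Because the deleted passage crosses a paragraph boundary, it
    could not be enclosed in a single element; the empty
    <gi>anchor</gi> supplies the end point instead.</p>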
    <p>For example, suppose that we wish to mark the end of the fifth
    word following each occurrence of some term in a particular text,
    perhaps to assist with some collocational analysis. This can most
    easily be done with the help of the <gi>anchor</gi> element, as
    follows:
    <!-- Vladimir Nabokov, Pnin, 1953, p14 of 1967 Avon pb reprinting -->
    <egXML xmlns="http://www.tei-c.org/ns/Examples">English language. Except for not very<anchor xml:id="eng1"/>
English at all at the time<anchor xml:id="eng2"/>
English was still full of flaws<anchor xml:id="eng3"/>
English. This was revised by young<anchor xml:id="eng4"/></egXML>
    In section <ptr target="#SACS1"/> we discuss ways in which these
    <gi>anchor</gi> points might be used to represent an alignment
    such as one might get in a keyword-in-context concordance.</p>
    <p>The <gi>seg</gi> element may be used at the encoder's
    discretion to mark almost any segment of the text of interest for
    processing. One use of the element is to mark text features for
    which no appropriate markup is otherwise defined, i.e. as a simple
    extension mechanism. Another use is to provide an identifier for
    some segment which is to be pointed at by some other element, i.e.
    to provide a target, or a part of a target, for a <gi>ptr</gi> or
    other similar element.</p>
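    <p>A minimal sketch of this second use follows; the
    identifier <val>flaw-claim</val> and the wording are invented
    for illustration.
    <egXML xmlns="http://www.tei-c.org/ns/Examples"><p>He insisted that <seg xml:id="flaw-claim">his English was still
full of flaws</seg>.</p>
<!-- ... -->
<p>The same claim (<ptr target="#flaw-claim"/>) is repeated
in a later letter.</p></egXML>
    Here the <gi>seg</gi> carries no analytic meaning of its own:
    it exists only so that the <gi>ptr</gi> has an identified
    span of text to point at.</p>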
    <p>Several examples of uses for the <gi>seg</gi> element are
    provided elsewhere in these Guidelines. For example:
    <list rend="bulleted">
<item>as a means of marking segments significant in a metrical
or rhyming analysis (see section <ptr target="#VEME"/>)</item>
<item>as a means of marking typographic lines in drama (see
section <ptr target="#DRBOD"/>) or title pages (see section <ptr target="#DSTITL"/>)</item>
<item>as a means of marking prosody- or pause-defined units in
transcribed speech (see section <ptr target="#TSSASE"/>)</item>
<item>as a means of marking linguistic or other analyses in a
theory-neutral manner (see chapter <ptr target="#AI"/>
passim)</item></list></p>

<p>In the following simple example, the <gi>seg</gi> element simply
delimits the extent of a stutter, a textual feature for which no
element is provided in these Guidelines.  <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SASE-eg-32"><q>Don't say <q><seg type="stutter">I-I-I</seg>'m afraid,</q> Melvin, just say <q>I'm
afraid.</q></q></egXML>
<!-- David Shields, Dead Languages, 1990, p10 -->
The <gi>seg</gi> element is particularly useful for the markup
of linguistically significant constituents such as the phrases
that may be the output of an automatic parsing system. This
example also demonstrates the use of the <att>xml:id</att>
attribute to carry an identifier which other parts of a document
may use to point to, or align with:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SASE-eg-33">  <seg xml:id="bl0034" type="sentence">  
    <seg xml:id="bl0034.1" type="phrase">Literate and illiterate speech</seg>
    <seg xml:id="bl0034.2" type="phrase">in a language like English</seg>
    <seg xml:id="bl0034.3" type="phrase">are plainly different.</seg>
</seg></egXML>
<!-- Bloomfield, "Literate and Illiterate Speech", 1927   --></p>
<p>As the above example shows, <gi>seg</gi> elements may be
nested directly within one another, to any degree of analysis
considered appropriate. This is taken a little further in the
following example, where the <att>type</att> and
<att>subtype</att> attributes have been used to further
categorize each word of the sentence (the <att>xml:id</att>
attributes have been removed to reduce the complexity of the
example):
<egXML xmlns="http://www.tei-c.org/ns/Examples">  <seg type="sentence" subtype="declarative">  
    <seg type="phrase" subtype="noun">    
<seg type="word" subtype="adjective">Literate</seg>
<seg type="word" subtype="conjunction">and</seg>
<seg type="word" subtype="adjective">illiterate</seg>
<seg type="word" subtype="noun">speech</seg>
    </seg>
    <seg type="phrase" subtype="preposition">    
<seg type="word" subtype="preposition">in</seg>
<seg type="word" subtype="article">a</seg>
<seg type="word" subtype="noun">language</seg>
<seg type="word" subtype="preposition">like</seg>
<seg type="word" subtype="noun">English</seg>
    </seg>
    <seg type="phrase" subtype="verb">    
<seg type="word" subtype="verb">are</seg>
<seg type="word" subtype="adverb">plainly</seg>
<seg type="word" subtype="adjective">different</seg>
    </seg>
    <seg type="punct">.</seg>
  </seg></egXML></p>
<p>(The example values shown are chosen for simplicity of
comprehension, rather than verisimilitude.) It should also be
noted that specialized segment elements are defined in section
<ptr target="#AILC"/> to facilitate this particular kind of
analysis. These allow for the explicit markup of units called
<term>s-units</term>, <term>clauses</term>,
<term>phrases</term>, <term>words</term>, <term>morphemes</term>,
and <term>characters</term>, which may be felt preferable to the
more generic approach typified by use of the <gi>seg</gi>
element. Using these, the first phrase above might be encoded
simply as
<egXML xmlns="http://www.tei-c.org/ns/Examples">  <phr type="noun">    
    <w type="adjective">Literate</w>
    <w type="conjunction">and</w>
    <w type="adjective">illiterate</w>
    <w type="noun">speech</w>
</phr></egXML>
Note the way in which the <att>type</att> attribute of these
specialized elements now carries the value carried by the
<att>subtype</att> attribute of the more general <gi>seg</gi>
element. For an analysis not using these traditional linguistic
categories however, the <gi>seg</gi> element provides a simple
but powerful mechanism.</p>
<p>In language corpora and similar material, the <gi>seg</gi>
element may be used to provide an end-to-end segmentation as an
alternative to the more specific <gi>s</gi> element proposed in
section <ptr target="#AILC"/> for the markup of orthographic
sentences, or <term>s-units</term>. However, it may be more
useful to use the <gi>s</gi> element for this purpose, since
this means that the <gi>seg</gi> element can then be used to
mark both features within s-units and segments composed of
s-units, as in the following example:<note place="bottom">See
section <ptr target="#AISP"/>, where the text from which this
fragment is taken is analyzed.</note>
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#AI-eg-01"><seg xml:id="s1s3" type="narrative_unit">
  <s xml:id="s1">Sigmund, the <seg type="patronymic">son of Volsung</seg>,
 was a king in Frankish country.</s>
  <s xml:id="s2">Sinfiotli was the eldest of his sons.</s>
  <s xml:id="s3"> ... </s>
</seg></egXML></p>
<p>Like other elements, the <gi>seg</gi> element must be properly
nested within other elements. Thus, a single <gi>seg</gi>
element can be used to group together words in different
sentences only if the sentences are not themselves tagged. The
first of the following two encodings is legal, but the second is
not.
<!-- Made up example -->
<egXML xmlns="http://www.tei-c.org/ns/Examples">Give me <seg type="phrase">a dozen. Or two or three.</seg></egXML>
<eg xml:space="preserve"><![CDATA[<!-- Illegal! -->
<s>Give me <seg type="phrase">a dozen.</s>
<s>Or two or three.</s></seg>]]></eg></p>
<p>The <att>part</att> attribute may be used as one simple
method of overcoming this restriction:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><s>Give me <seg type="phrase" part="I">a dozen.</seg></s>
<s><seg part="F">Or two or three.</seg></s></egXML>
Another solution is to use the <gi>join</gi> element discussed
in section <ptr target="#SAAG"/>; this requires that each of the
<gi>seg</gi> elements be given an identifier. For further
discussion of this generic encoding problem, see also chapter
<ptr target="#NH"/>.</p>
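<p>The <gi>join</gi> alternative might be sketched as follows.
The identifiers <val>doz1</val> and <val>doz2</val> are invented
for illustration, and the placement of the <gi>join</gi> element
is only one of several possibilities.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><s>Give me <seg xml:id="doz1">a dozen.</seg></s>
<s><seg xml:id="doz2">Or two or three.</seg></s>
<!-- ... -->
<join target="#doz1 #doz2" result="seg" type="phrase"/></egXML>
In this sketch the <att>result</att> attribute indicates that the
two fragments, taken together, are to be understood as a single
virtual <gi>seg</gi>, while the sentences themselves remain
properly nested.</p>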
<p>The <gi>seg</gi> element has the same content as a paragraph
in prose: it can therefore be used to group together consecutive
sequences of <ident type="class">model.inter</ident> class elements,
such as lists, quotations, notes, stage directions, etc. as well
as to contain sequences of phrase-level elements. It cannot
however be used to group together sequences of paragraphs or
similar text units such as verse lines; for this purpose, the
encoder should use intermediate pointers, as described in
section <ptr target="#SAPTIP"/> or the methods described in
section <ptr target="#SAAG"/>. It is particularly important that
the encoder provide a clear description of the principles by
which a text has been segmented, and the way in which that
segmentation is represented. This should include a description
of the method used and the significance of any categorization
codes. The description should be provided as a series of
paragraphs within the <gi>segmentation</gi> element of the
encoding description in the TEI header, as described in section
<ptr target="#HD53"/>.</p>
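<p>Such a description might be sketched as follows. The wording
and the categories named are invented for illustration and would
need to be replaced by an account of the encoder's actual
practice.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><encodingDesc>
  <editorialDecl>
    <segmentation>
      <p>Each orthographic sentence is marked with a <gi>seg</gi>
      element of type <val>sentence</val>. Within sentences,
      phrases identified by manual inspection are marked with
      nested <gi>seg</gi> elements of type <val>phrase</val>.</p>
      <p>The <att>subtype</att> attribute records a traditional
      part-of-speech label for each word.</p>
    </segmentation>
  </editorialDecl>
</encodingDesc></egXML></p>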
<p>The <gi>seg</gi> element may also be used to encode
simultaneous or mutually exclusive variants of a text when the
more special purpose elements for simple editorial changes,
abbreviation and expansion, addition and deletion, or for a
critical apparatus are not appropriate. In these circumstances,
one <gi>seg</gi> is encoded for each possible variant, and the
set of them is enclosed in a <gi>choice</gi> element. </p>
<p>For example, if one were writing dual-platform instructions for
installation of software, it might be useful to use <gi>seg</gi>
to record platform-specific pieces of mutually exclusive text.
<egXML xmlns="http://www.tei-c.org/ns/Examples">…pressing <choice><seg type="platform" subtype="Mac">option</seg>
<seg type="platform" subtype="PC">alt</seg></choice>-f will …</egXML></p>
<p>Elsewhere in this chapter we provide a number of examples
where  the <gi>seg</gi> element is used simply to provide an
element to which an identifier may be attached, for example so
that another segment may be linked or related to it in some
way.</p>
<p>The <gi>ab</gi> (anonymous block) element performs a similar
function to that of the <gi>seg</gi> element, but is used for portions
of the text which do not occur within paragraphs or other component-level
elements, but are themselves at the component level. It is therefore a
member of the <ident type="class">model.pLike</ident> class.</p>

<p>The <gi>ab</gi> element may be used, for
example, to tag the canonical verse divisions of Biblical texts:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SASE-eg-40"><div1 n="Gen" type="book">
  <head>The First Book of Moses, Called</head>
  <head type="main">Genesis</head>
  <div2 n="1" type="chapter">
    <ab n="1">In the beginning God created the heaven and the
earth.</ab>
    <ab n="2">And the earth was without form, and void; and darkness
<hi>was</hi> upon the face of the deep. And the Spirit of God
moved upon the face of the waters.</ab>
    <ab n="3">And God said, Let there be light: and there was
light.</ab>
  </div2>
</div1></egXML>
<!--* Authorized Version *--></p>
<p>In other cases, where the text clearly indicates paragraph
divisions containing one or more verses, the <gi>p</gi> element
may be used to tag the paragraphs, and the <gi>seg</gi> element
used to subdivide them. The <gi>ab</gi> element is provided as
an alternative to the <gi>p</gi> element; it may
<emph>not</emph> be used within paragraphs. The <gi>seg</gi>
element, by contrast, may appear only within and not between
paragraphs (or anonymous block elements).
<egXML xml:lang="de" xmlns="http://www.tei-c.org/ns/Examples" source="#SASE-eg-41"><div1 n="Gen" type="book">
  <head>Das Erste Buch Mose.</head>
  <div2 n="1" type="chapter">
    <p>
<seg n="1">Am Anfang schuff Gott Himel vnd Erden.</seg>
<seg n="2">Vnd die Erde war wüst vnd leer / vnd es war
  finster auff der Tieffe / Vnd der Geist Gottes schwebet auff
  dem Wasser.</seg></p>
    <p>
<seg n="3">Vnd Gott sprach / Es werde Liecht / Vnd es ward
   Liecht.</seg></p>
  </div2>
</div1></egXML>
<!--* Martin Luther [tr]. Die gantze Heilige Schrifft Deudsch.
Wittenberg 1545. Letzte zu Luthers Lebzeiten erchienene Ausgabe, hsg.
Hans Volz unter Mitarbeit von Heinz Blanke. Textredaktion Friedrich Kur.
M&uuml;nchen: Rogner & Bernhard, 1972. *--></p>
<p>The <gi>ab</gi> element is also useful for marking dramatic
speeches when it is not clear whether the speech is to be
regarded as prose or verse. If, for example, an encoder does not
wish to express an opinion as to whether the opening lines of
Shakespeare's <title>The Tempest</title> are to be regarded as
prose or as verse, they might be tagged as follows: <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#CODR-eg-295"><div1 n="I" type="act">
  <div2 n="1" type="scene">
    <head rend="italic">Actus primus, Scena prima.</head>
    <stage rend="italic" type="setting"> A tempestuous noise of 
Thunder and Lightning heard: 
Enter a Ship-master, and a Boteswaine.</stage>
    <sp><speaker>Master.</speaker>
  <ab>Bote-swaine.</ab></sp>
    <sp><speaker>Botes.</speaker>
  <ab>Heere Master: What cheere?</ab></sp>
    <sp><speaker>Mast.</speaker>
  <ab>Good: Speake to th' Mariners: fall too't, yarely,
     or we run our selues a ground, bestirre, bestirre. 
     <stage type="move">Exit.</stage>
  </ab></sp>
    <stage type="move">Enter Mariners.</stage>
    <sp><speaker>Botes.</speaker>
  <ab>Heigh my hearts, cheerely, cheerely my harts: yare, yare:
    Take in the toppe-sale: Tend to th' Masters whistle: Blow
    till thou burst thy winde, if roome e-nough.</ab></sp>
  </div2>
</div1></egXML>
See further <ptr target="#CODR"/> and <ptr target="#DRPAL"/>.</p>
<specGrp xml:id="DSASA" n="Blocks Segments and Anchors">
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/ab.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/anchor.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/seg.xml"/>
</specGrp>
    </div>
    <div type="div2" xml:id="SACS">
<head>Correspondence and Alignment</head>
<p>In this section we introduce the notions of
<term>correspondence</term>, expressed by the <att>corresp</att>
attribute, and of <term>alignment</term>, which is a special
kind of correspondence involving an ordered set of
correspondences. Both cases may be represented using the
<gi>link</gi> and <gi>linkGrp</gi> elements introduced in
section <ptr target="#SAPT"/>. We also discuss the special case
of alignment in time or <term>synchronization</term>, for which
special purpose elements are proposed in section <ptr target="#SASY"/>.</p>
<div type="div3" xml:id="SACS1">
  <head>Correspondence</head>
  <p>A common requirement in text analysis is to represent
  correspondences between two or more parts of a single
  document, or between places in different documents. Provided
  that explicit elements are available to represent the parts or
  places to be linked, then the global linking attribute
  <att>corresp</att> may be used to encode such correspondence,
  once it has been identified.
  <specList>
    <specDesc key="att.global.linking" atts="corresp"/>
  </specList>
  This is one of the attributes made available by the mechanism
  described in the introduction to this chapter (<ptr target="#SA"/>). Correspondence can also be expressed by means
  of the <gi>link</gi> element introduced in section <ptr target="#SAPT"/>.</p>
  <p>Where the correspondence is between <emph>spans</emph>, the
  <gi>seg</gi> element should be used, if no other element is
  available. Where the correspondence is between
  <emph>points</emph>, the <gi>anchor</gi> element should be
  used, if no other element is available.</p>
  <p>The use of the <att>corresp</att> attribute with spans of
  content is illustrated by the following example:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SACS1-eg-48"><title xml:id="SHIRLEY">Shirley</title>, which made
its Friday night debut only a month ago, was
not listed on <name xml:id="NBC">NBC</name>'s new schedule,
although <seg corresp="#NBC" xml:id="NETWORK">the network</seg>
says <seg corresp="#SHIRLEY" xml:id="SHOW">the show</seg>
still is being considered.</egXML>
<!-- from G Leech. -->
  <!-- commas after 'Shirley' and 'ago' added conjecturally by  -->
  <!-- msm after consultation with lb.  -->
  Here the anaphoric phrases <mentioned>the network</mentioned>
  and <mentioned>the show</mentioned> have been associated
  directly with the elements to which they refer by means of
  <att>corresp</att> attributes. This mechanism is simple to
  apply, but has the drawback that it is not possible to specify
  more exactly what kind of correspondence is intended. Where
  this attribute is used, therefore, encoders are encouraged to
  specify their intent in the associated encoding description
  in the TEI header.</p>
  <p>Essentially, what the <att>corresp</att> attribute does is
  to specify that elements  bearing this attribute and those
to which the attribute points are doubly linked. In the example above,
the use of the <att>corresp</att> attribute indicates that the <gi>seg</gi> element containing <q>the show</q> and the 
<gi>title</gi> element containing <q>Shirley</q> correspond to each
other: the correspondence relationship is not <soCalled>from</soCalled>
one to the other, but <soCalled>between</soCalled> the two
objects. It is thus different from the <att>target</att>  attribute,
and provides functionality more similar to that of the  <gi>link</gi> and <gi>linkGrp</gi> elements defined in
  section <ptr target="#SAPT"/>, although it lacks the ability to
  indicate more precisely what kind of correspondence is
  intended as in the following retagging of the preceding
  example.
<egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="en"><title xml:id="shirley">Shirley</title>, which made
its Friday night debut only a month ago, was not
listed on <name xml:id="nbc">NBC</name>'s new schedule,
although <seg xml:id="network">the network</seg> says
<seg xml:id="show">the show</seg> still is being considered.
<linkGrp type="anaphoric_link" targFunc="antecedent anaphor">
   <link target="#shirley #show"/>
   <link target="#nbc #network"/>
</linkGrp></egXML></p>
<p>In the following example, we use the same mechanism to
express a correspondence amongst the anchors introduced following the
fifth word after <mentioned>English</mentioned> in a text:
<egXML xml:lang="en" xmlns="http://www.tei-c.org/ns/Examples">
English language. Except for not very<anchor xml:id="en1"/>
<!-- ... -->
English at all at the time<anchor xml:id="en2"/>
<!-- ... -->
English was still full of flaws<anchor xml:id="en3"/>
<!-- ... -->
English. This was revised by young<anchor xml:id="en4"/>
<!-- ... -->
<linkGrp type="five-word_collocates">
  <link type="collocates_of_ENGLISH" target="#en1 #en2 #en3 #en4"/>
  <!-- ... -->
</linkGrp>
</egXML></p></div>
  <div type="div3" xml:id="SACSAL">
    <head>Alignment of Parallel Texts</head>
    <p>One very important application area for the alignment of
    parallel texts is multilingual corpora. Consider, for
    example, the need to align <soCalled>translation
    pairs</soCalled> of sentences drawn from a corpus such as
    the Canadian Hansard, in which each sentence is given in
    both English and French. Concerning this problem, Gale and
    Church write: <q rend="display">Most English sentences match
    exactly one French sentence, but it is possible for an
    English sentence to match two or more French sentences. The
    first two English sentences [in the example below]
    illustrate a particularly hard case where two English
    sentences align to two French sentences. No smaller
    alignments are possible because the clause
    <q>...sales...were higher...</q> in the first English
    sentence corresponds to (part of) the second French
    sentence. The next two alignments ... illustrate the more
    typical case where one English sentence aligns with exactly
    one French sentence. The final alignment matches two English
    sentences to a single French sentence. These alignments
    [which were produced by a computer program] agreed with the
    results produced by a human judge.<note place="bottom">See
    <ptr type="cit" target="#SA-BIBL-1"/>, from which the example in the
    text is taken.</note></q></p>
    <p>The alignment produced by Gale and Church's program can
    be expressed in four different ways. The encoder must first
    decide whether to represent the alignment in terms of points
    within each text (using the <gi>anchor</gi> element) or in
    terms of whole stretches of text, using the <gi>seg</gi>
    element. To some extent the choice will depend on the
    process by which the software works out where alignment
    occurs, and the intention of the encoder. Secondly, the
    encoder may elect to represent the actual alignment using
    either <att>corresp</att> attributes attached to the
    individual <gi>anchor</gi> or <gi>seg</gi> elements, or
    using a free-standing <gi>linkGrp</gi> element.</p>
    <p>We present first a solution using <gi>anchor</gi>
    elements bearing only <att>corresp</att> attributes:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SA-BIBL-1"><div xml:lang="en" type="subsection">
<p><anchor corresp="#fa1" xml:id="ea1"/>According to our survey, 1988
sales of mineral water and soft drinks were much higher than in 1987,
reflecting the growing popularity of these products. Cola drink
manufacturers in particular achieved above-average growth rates.
<anchor corresp="#fa2" xml:id="ea2"/>The higher turnover was largely
due to an increase in the sales volume.
<anchor corresp="#fa3" xml:id="ea3"/>Employment and investment levels also climbed.
<anchor corresp="#fa4" xml:id="ea4"/>Following a two-year transitional period,
the new Foodstuffs Ordinance for Mineral Water came into effect on
April 1, 1988. Specifically, it contains more stringent requirements
regarding quality consistency and purity guarantees.</p>
</div>
<div xml:lang="fr" type="subsection">
<p><anchor corresp="#ea1" xml:id="fa1"/>Quant aux eaux minérales
et aux limonades, elles rencontrent toujours plus d'adeptes. En effet,
notre sondage fait ressortir des ventes nettement supérieures
à celles de 1987, pour les boissons à base de cola
notamment. <anchor corresp="#ea2" xml:id="fa2"/>La progression des
chiffres d'affaires résulte en grande partie de l'accroissement
du volume des ventes. <anchor corresp="#ea3" xml:id="fa3"/>L'emploi et
les investissements ont également augmenté.
<anchor corresp="#ea4" xml:id="fa4"/>La nouvelle ordonnance fédérale
sur les denrées alimentaires concernant entre autres les eaux
minérales, entrée en vigueur le 1er avril 1988 après
une période transitoire de deux ans, exige surtout une plus
grande constance dans la qualité et une garantie de la
pureté.</p>
</div></egXML></p>
  <p>There is no requirement that the <att>corresp</att>
  attribute be specified in both English and French texts, since
  (as noted above) this attribute is defined as representing a
  mutual association. However, it may simplify processing to do
  so, and also avoids giving the impression that the English is
  translating the French, or vice versa. More seriously, this
  encoding does not make explicit that it is in fact
  the entire stretch of text between the anchors which is being
  aligned, not simply the points themselves. If for example one
  text contained material omitted from the other, this approach
  would not be appropriate.</p>
  <p>We now present the same passage using the alternative
  <gi>linkGrp</gi> mechanism and marking explicitly the segments
  which have been aligned:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples"><div xml:id="div-e" xml:lang="en" type="subsection">
  <p>
    <seg xml:id="e_1">According to our survey, 1988 sales of mineral
water and soft drinks were much higher than in 1987,
reflecting the growing popularity of these products. Cola
drink manufacturers in particular achieved above-average
growth rates.</seg>
    <seg xml:id="e_2">The higher turnover was largely due to an
increase in the sales volume.</seg>
    <seg xml:id="e_3">Employment and investment levels also climbed.</seg>
    <seg xml:id="e_4">Following a two-year transitional period, the new
Foodstuffs Ordinance for Mineral Water came into effect on
April 1, 1988. Specifically, it contains more stringent
requirements regarding quality consistency and purity
guarantees.</seg></p>
</div>
<div xml:id="div-f" xml:lang="fr" type="subsection">
  <p>
    <seg xml:id="f_1">Quant aux eaux minérales et aux limonades,
elles rencontrent toujours plus d'adeptes. En effet, notre
sondage fait ressortir des ventes nettement
supérieures à celles de 1987, pour les
boissons à base de cola notamment.</seg>
    <seg xml:id="f_2">La progression des chiffres d'affaires
résulte en grande partie de l'accroissement du volume
des ventes.</seg>
    <seg xml:id="f_3">L'emploi et les investissements ont
également augmenté.</seg>
    <seg xml:id="f_4">La nouvelle ordonnance fédérale sur
les denrées alimentaires concernant entre autres les
eaux minérales, entrée en vigueur le 1er avril
1988 après une période transitoire de deux
ans, exige surtout une plus grande constance dans la
qualité et une garantie de la pureté.</seg></p>
</div>
<linkGrp type="alignment" domains="#div-e #div-f">
  <link target="#e_1 #f_1"/>
  <link target="#e_2 #f_2"/>
  <link target="#e_3 #f_3"/>
  <link target="#e_4 #f_4"/>
</linkGrp></egXML></p>
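<p>The other two combinations follow the same pattern. As a minimal
sketch only, the points marked by <gi>anchor</gi> elements might
equally be aligned by a free-standing <gi>linkGrp</gi> rather than by
<att>corresp</att> attributes. The identifiers used here
(<val>div-ea</val>, <val>ea_2</val>, and so on) are invented for this
illustration, and only the second and third sentences of each text
are shown:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples"><div xml:id="div-ea" xml:lang="en" type="subsection">
<!-- identifiers in this sketch are illustrative assumptions -->
<p><anchor xml:id="ea_2"/>The higher turnover was largely
due to an increase in the sales volume.
<anchor xml:id="ea_3"/>Employment and investment levels also climbed.</p>
</div>
<div xml:id="div-fa" xml:lang="fr" type="subsection">
<p><anchor xml:id="fa_2"/>La progression des
chiffres d'affaires résulte en grande partie de l'accroissement
du volume des ventes. <anchor xml:id="fa_3"/>L'emploi et
les investissements ont également augmenté.</p>
</div>
<linkGrp type="alignment" domains="#div-ea #div-fa">
  <link target="#ea_2 #fa_2"/>
  <link target="#ea_3 #fa_3"/>
</linkGrp></egXML>
As with the version using <att>corresp</att> attributes, this records
only the aligned points, not the stretches of text between them.</p>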
<p>Note that use of the <gi>ab</gi> element allows us to mark up the
orthographic sentences in both languages independently of the alignment:
the first translation pair in this example might be marked up as
follows:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples"><div xml:id="english" xml:lang="en" type="subsection">
  <ab xml:id="english1">  
    <s>According to our survey, 1988 sales of mineral water and soft
drinks were much higher than in 1987, reflecting the growing popularity
of these products.</s>
    <s>Cola drink manufacturers in particular achieved above-average
growth rates.</s>
  </ab>
</div>
<div xml:id="french" xml:lang="fr" type="subsection">
  <ab xml:id="french1">  
    <s xml:id="fs1">Quant aux eaux minérales et aux limonades, elles
rencontrent toujours plus d'adeptes.</s>
    <s xml:id="fs2">En effet, notre sondage fait ressortir des ventes nettement
supérieures à celles de 1987, pour les boissons à
base de cola notamment.</s>
  </ab>
</div></egXML>
</p></div>
<div type="div3" xml:id="SACSXA">
<head>A Three-way Alignment</head>
<p>The preceding encoding of the alignment of parallel passages from
two texts requires that those texts and the alignment all be part of
the same document. If the texts are in separate documents, then
complete URIs, whether absolute or relative (section <ptr target="#SA"/>), will be required. These external pointers may appear
anywhere within the document, but if they are created solely for use
in encoding links, they may for convenience be grouped within the
<gi>linkGrp</gi> (or other grouping element that uses them for
linking).</p>
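<p>As a sketch of the separate-document case, suppose (purely for
illustration) that the English and French texts of the preceding
section were stored in files named <ident type="file">english.xml</ident>
and <ident type="file">french.xml</ident>. Neither file name comes from
the source material. A free-standing alignment might then point into
both documents using relative URIs:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<!-- hypothetical file names; fragment identifiers as in the preceding example -->
<linkGrp type="alignment" domains="english.xml#div-e french.xml#div-f">
  <link target="english.xml#e_1 french.xml#f_1"/>
  <link target="english.xml#e_2 french.xml#f_2"/>
  <link target="english.xml#e_3 french.xml#f_3"/>
  <link target="english.xml#e_4 french.xml#f_4"/>
</linkGrp></egXML>
Because the links are held apart from both texts, neither text needs
to be modified in order to record the alignment.</p>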
<p>To demonstrate this facility, we consider how we might encode the
alignments in an extract from Comenius' <title>Orbis Sensualium
Pictus</title>, in the
English translation of Charles Hoole (1659).
<!-- Whenever this <figure> is changed, copy-and-paste it into the
     example in SAUR above. -->
<figure xml:id="COMENIUS" rend="float fullpage">
<graphic url="Images/compic.png"/>
</figure>
Each topic covered in this work has three parts: a
picture, a prose text in Latin describing the topic, and a
carefully-aligned translation of the Latin into English, German, or some
other vernacular. Key terms in the two texts are typographically
distinct, and are linked to the picture by numbers, which appear in the
two texts and within the picture as well.</p>
<p>First, we consider the text portions. The English and Latin portions
have been encoded as distinct <gi>div</gi> elements. Identifiers have
been attached to each typographic line, but no other encoding added, to
simplify the example.
<!-- S tag changed to SEG -DTL  -->
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SA-BIBL-2"><div xml:id="e98" xml:lang="en" type="lesson">
   <head>The Study</head>
   <p>
<seg xml:id="e9801">The Study</seg>
<seg xml:id="e9802">is a place</seg>
<seg xml:id="e9803">where a Student,</seg>
<seg xml:id="e9804">a part from men,</seg>
<seg xml:id="e9805">sitteth alone,</seg>
<seg xml:id="e9806">addicted to his Studies,</seg>
<seg xml:id="e9807">whilst he readeth</seg>
<seg xml:id="e9808">Books,</seg></p>
</div>
<div xml:id="l98" xml:lang="la" type="lesson">
   <head>Muséum</head>
   <p>
<seg xml:id="l9801">Museum</seg>
<seg xml:id="l9802">est locus</seg>
<seg xml:id="l9803">ubi Studiosus,</seg>
<seg xml:id="l9804">secretus ab hominibus,</seg>
<seg xml:id="l9805">solus sedet,</seg>
<seg xml:id="l9806">Studiis deditus,</seg>
<seg xml:id="l9807">dum lectitat</seg>
<seg xml:id="l9808">Libros,</seg></p>
</div></egXML></p>
<!--<?tei winita?>
<note resp="#sdb" place="inline">Need to re-work this entire graphical example; probably DGD or CC.</note>-->
<p>Next we consider the non-textual parts of the page. Encoding this
requires providing two distinct components: firstly a digitized rendering of the
page itself, and secondly a representation of the areas within that image
which are to be aligned. In section <ptr target="#PHFAX"/> we present a
simple way of doing this using the TEI-defined markup for alignment of
facsimiles. In the present chapter we demonstrate a more powerful
means of aligning arbitrary polygons and points, which uses the XML notation SVG (see <ref target="#SVG-11">SVG</ref>). This provides appropriate facilities for both these
requirements:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples" valid="false">
  <svg  xmlns="http://www.w3.org/2000/svg"
   xmlns:xlink="http://www.w3.org/1999/xlink">
    <image
     xlink:href="p1764.png"
     width="597"  height="897"
     id="p981" />
  <rect id="p982" x="75" y="75"  width="25" height="10"/>
  <rect id="p983" x="55" y="42"  width="25" height="10"/>
  </svg>
</egXML>
This example of SVG defines two rectangles 
at the locations with the specified x and y coordinates. A view is
defined on these, enabling them to be 
mapped by an SVG processor to the image found at the URL specified
(<ident type="file">p1764.png</ident>). It also defines unique identifiers for
the whole image, and the two views of it, which we will use within our
alignment, as shown next (for further discussion of the handling of
 images and graphics, see section <ptr target="#FTGRA"/>; for further
 discussion of using non-TEI XML vocabularies such as SVG within a TEI
 document, see section <ptr target="#ST-aliens"/>).
</p>
<p>As printed, the Comenius text exhibits three kinds of alignment.
 <list rend="numbered">
   <item n="1">The English and Latin portions are printed in two
   parallel columns, with corresponding phrases (represented above by
   <gi>seg</gi> elements) more or less next to each other.</item>
   <item n="2">Particular words or phrases are marked as terms in the
   two languages by a change of rendition: the English text, which
   otherwise uses black letter type throughout, has the words
   <mentioned>The Study</mentioned>, <mentioned>a Student</mentioned>,
   <mentioned>Studies</mentioned>, and <mentioned>Books</mentioned> in
   a roman font; in the Latin text, which is printed in roman, the
   corresponding words (<mentioned>Museum</mentioned>,
   <mentioned>Studiosus</mentioned>, <mentioned>Studiis</mentioned>,
   and <mentioned>Libros</mentioned>) are all in italic.</item>
   <item n="3">Numbered labels appear within the text portions,
   linking keywords to each other and to sections of the picture.
   These labels, which have been left out of the above encoding, are
   attached to the first, third, and last segments in each language
   quoted above, and also appear (rather indistinctly) within the
   picture itself. Thus, the images of the study, the student, and his
   books are each aligned with the correct term for them in the two
   languages. 
   </item></list></p>
<p>The first kind of alignment might be represented by using the
<att>corresp</att> attribute on the <gi>seg</gi> element. The second
kind might be represented by using the <gi>gloss</gi> and <gi>term</gi>
mechanism described in section <ptr target="#COHQU"/>. The third kind of
alignment might be represented using pointers embedded within the
texts, for example:
<egXML xml:lang="en" xmlns="http://www.tei-c.org/ns/Examples">
<!--... -->
<seg xml:id="xe9803">where a <ref n="2" target="#p982">Student</ref>,</seg>
<seg xml:id="xl9803">ubi <ref n="2" target="#p982">Studiosus</ref>,</seg>
<!--... -->
</egXML>

We choose however to use
the <gi>link</gi> element, since this provides a more efficient way of
representing the three-way alignment between English, Latin, and picture
without redundancy.
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><linkGrp type="alignment">
   <link target="#e9801 #l9801 #p981"/>
   <link target="#e9802 #l9802"/>
   <link target="#e9803 #l9803 #p982"/>
   <link target="#e9804 #l9804"/>
   <link target="#e9805 #l9805"/>
   <link target="#e9806 #l9806"/>
   <link target="#e9807 #l9807"/>
   <link target="#e9808 #l9808 #p983"/>
</linkGrp></egXML></p>
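<p>For comparison, the first kind of alignment could be recorded with
<att>corresp</att> attributes on the <gi>seg</gi> elements themselves.
The following is a sketch only; the identifiers carry a <val>c</val>
prefix, invented here to keep them distinct from those used above:
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples">
<!-- English column; identifiers are illustrative assumptions -->
<seg xml:id="ce9802" corresp="#cl9802">is a place</seg>
<seg xml:id="ce9803" corresp="#cl9803">where a Student,</seg>
<!-- Latin column -->
<seg xml:id="cl9802" corresp="#ce9802">est locus</seg>
<seg xml:id="cl9803" corresp="#ce9803">ubi Studiosus,</seg>
</egXML>
Each pairing is stated twice in this form, and the picture regions
would still need pointers of their own, which appears to be the
redundancy that the <gi>link</gi> approach avoids.</p>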
<p>This map, of course, only aligns whole segments and image portions,
since these are the only parts of our encoding which bear identifiers
and can therefore be pointed to. To add to it the alignment between
the typographically distinct words mentioned above, new elements must
be defined, either within the text itself or externally by using
stand-off techniques. Encoding these word pairs as <gi>term</gi>
and <gi>gloss</gi>, although intuitively obvious, requires a
non-trivial decision as to whether the Latin text is glossing the
English, or vice versa. Tagging all the marked words as <gi>term</gi>
avoids the difficult decision, but might be thought by some encoders
to convey the wrong information about the words in question. Simply
tagging them as additional embedded <gi>seg</gi> elements with
identifiers that can be aligned like the others is also a possibility.
</p>
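<p>The last of these options might look as follows. This is a sketch
under stated assumptions: the inner <gi>seg</gi> elements, their
<att>type</att> value, and all of the identifiers are invented for
illustration, and only the sixth segment of each text is shown.
<egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples">
<!-- hypothetical identifiers and type value -->
<seg xml:id="ke9806">addicted to his <seg xml:id="ke9806w" type="keyword">Studies</seg>,</seg>
<seg xml:id="kl9806"><seg xml:id="kl9806w" type="keyword">Studiis</seg> deditus,</seg>
<linkGrp type="alignment">
  <!-- alignment of the whole segments -->
  <link target="#ke9806 #kl9806"/>
  <!-- alignment of the typographically distinct words within them -->
  <link target="#ke9806w #kl9806w"/>
</linkGrp></egXML></p>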
<p>These solutions all require the addition of further markup to the text. This
may pose no problems, or it may be infeasible, for example because the text is
held on a read-only medium. If it is not feasible to add more markup
to the original text, some form of stand-off markup will be
needed. Any item within the text that can be pointed to using the
various pointer schemes discussed in this chapter may be used, not
simply those which rely on the existence of an <att>xml:id</att>
attribute. Suppose our example had been
more lightly tagged, as follows:
  <egXML xml:lang="mul" xmlns="http://www.tei-c.org/ns/Examples" source="#SA-BIBL-2">
    <div xml:id="E98" xml:lang="en" type="lesson">
      <head>The Study</head>
      <ab>The Study</ab>
      <ab>is a place</ab>
      <ab>where a Student,</ab>
    </div>
    <div xml:id="L98" xml:lang="la" type="lesson">
      <head>Muséum</head>
      <ab>Museum</ab>
      <ab>est locus</ab>
      <ab>ubi Studiosus,</ab>
    </div>
  </egXML></p>
<p>To express the same alignment mentioned above, we could use an
XPath expression to identify the required <gi>ab</gi> elements:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
    <linkGrp type="alignment">
      <link target="#xpath(//div[@xml:id='L98']/ab[1]) #xpath(//div[@xml:id='E98']/ab[1])"/>
      <link target="#xpath(//div[@xml:id='L98']/ab[2]) #xpath(//div[@xml:id='E98']/ab[2])"/>
    </linkGrp></egXML>
In the absence of any markup around individual substrings of
the element content, the string-range pointer scheme discussed in <ptr target="#SATSSR"/> may also be helpful: for example, to indicate that the words
<mentioned>Studies</mentioned> and <mentioned>Studiis</mentioned>
correspond, we might express the link between them as follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples">
<link target="#string-range(e9806,16,7) #string-range(l9806,0,7)"/></egXML></p>

</div></div>
<div type="div2" xml:id="SASY">
<head>Synchronization</head>
<p>In the previous section we discussed two particular kinds of
alignment: alignment of parallel texts in different languages; and
alignment of texts and portions of an image. In this section we address
another specialized form of alignment: synchronization. The need to
mark the relative positions of text components with respect to time
arises most naturally and frequently in transcribed spoken texts, but it
may arise in any text in which quoted speech occurs, or events are
described within a time frame. The methods described here are also
generalizable for other kinds of alignment (for example, alignment of
text elements with respect to space).</p>
<div type="div3" xml:id="SASYNC">
<head>Aligning Synchronous Events</head>

<p>Provided that explicit elements are available to represent the
parts or places to be synchronized, then the global linking attribute
<att>synch</att> may be used to encode such synchronization, once it
has been identified.
  <specList>
    <specDesc key="att.global.linking" atts="synch"/>
  </specList>
  This is another of the attributes made globally available by
  the mechanism described in the introduction to this chapter.
  Alternatively, the <gi>link</gi> and <gi>linkGrp</gi> elements
  may be used to make explicit the fact that the synchronous
  elements are aligned.</p>
<p>To illustrate the use of these mechanisms for marking synchrony,
consider the following representation of a spoken text:
<!-- From BNC, via LB, retranscribed without SGML markup -->
<eg xml:space="preserve"><![CDATA[B: The first time in twenty five years, we've cooked Christmas
   (unclear) for a blooming great load of people.
A: So you're [1] (unclear) [2]
B: [1] It will be [2] nice in a way, but, [3] be strange. [4]
A: [3] Yeah [4], yeah, cos it, it's [5] the [6]
B: [5] not [6]]]></eg></p>
<p>This representation uses numbers in brackets to mark the points at
which speakers overlap each other. For example, the <mentioned>[1]</mentioned>
in A's first speech is to be understood as coinciding with the
<mentioned>[1]</mentioned> in B's second speech.<note place="bottom">This sample is taken from
a conversation collected and transcribed for the British National
Corpus.</note></p>
<p>To encode this we use the spoken texts module, described
in chapter <ptr target="#TS"/>, together with the module
described in the present chapter. First, we transcribe this text,
marking the synchronous points with <gi>anchor</gi> elements, and
providing a <att>synch</att> attribute on one member of each pair of
synchronous anchors. As noted in the example given above (section <ptr target="#SACSAL"/>), correspondence, and hence synchrony, is a
symmetric relation; therefore the attribute need only be specified on
one member of each pair.
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SA-eg-02">    <div xml:id="BNC-d1" type="convers">
<u xml:id="u2b" who="#b">
  The first time in twenty five years,
  we've cooked Christmas <unclear> for a blooming great
  load of people.</unclear> </u>
<u xml:id="u3a" who="#a">So you're
  <anchor synch="#t1b" xml:id="t1a"/>
  <unclear> <anchor synch="#t2b" xml:id="t2a"/> </unclear> </u>
<u xml:id="u3b" who="#b"><anchor xml:id="t1b"/>It will be <anchor xml:id="t2b"/>
  nice in a way, but, <anchor xml:id="t3b"/>
  be strange.<anchor xml:id="t4b"/> </u>
<u xml:id="u4a" who="#a"><anchor synch="#t3b" xml:id="t3a"/>Yeah
  <anchor synch="#t4b" xml:id="t4a"/>, yeah, cos it, its
  <anchor synch="#t5b" xml:id="t5a"/>the
  <anchor synch="#t6b" xml:id="t6a"/> </u>
<u xml:id="u4b" who="#b"><anchor xml:id="t5b"/>not<anchor xml:id="t6b"/> </u>
<!-- ... --></div></egXML></p>
<p>We can encode this same example using <gi>link</gi> and
<gi>linkGrp</gi> elements to make the temporal alignment explicit. A <gi>back</gi>
element has been used to enclose the <gi>linkGrp</gi> element, but the links
may be located anywhere the encoder finds convenient:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><back>
    <linkGrp xml:id="lg1" domains="#BNC-d1 #BNC-d1" targFunc="speaker.a speaker.b" type="synchronous_alignment">
<link xml:id="L1" target="#t1a #t1b"/>
<link xml:id="L2" target="#t2a #t2b"/>
<link xml:id="L3" target="#t3a #t3b"/>
<link xml:id="l4" target="#t4a #t4b"/>
<link xml:id="l5" target="#t5a #t5b"/>
<link xml:id="l6" target="#t6a #t6b"/>
    </linkGrp>
  </back></egXML>
The
<att>xml:id</att> attributes are provided for the <gi>link</gi> and
<gi>linkGrp</gi> elements here for reasons discussed in the next
section, <ptr target="#SASYMP"/>. 
</p>
<p>As with other forms of alignment, synchronization may be expressed
between stretches of speech as well as between points. When complete
utterances are synchronous, for example, if one person says
<mentioned>What?</mentioned> and another <mentioned>No!</mentioned> at the same time,
that can be represented without <gi>anchor</gi> elements as follows.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u synch="#u02" xml:id="u01" who="#a">What?</u>
<u xml:id="u02" who="#b">No!</u></egXML></p>
<p>A simple way of expressing <term>overlap</term> (where one speaker
starts speaking before another has finished) is thus to use the
<gi>seg</gi> element to encode the overlapping portions of speech. For
example,
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u who="#a"> So you're <unclear synch="#u-b1"/> </u>
<u who="#b"><seg xml:id="u-b1"> It will be </seg> nice in a way, but,
     <seg synch="#u-a3"> be strange. </seg> </u>
<u who="#a"><seg xml:id="u-a3"> Yeah </seg>, yeah, cos it,
     its <seg synch="#u-b2"> the </seg> </u>
<u xml:id="u-b2" who="#b"> not </u></egXML>
Note in this encoding how synchronization has been effected between an
empty <gi>unclear</gi> element and the content of a <gi>seg</gi> element, and between the
content of an
<gi>u</gi> element and that of another <gi>seg</gi>, using the <att>synch</att>
attribute. Alternatively, a <gi>linkGrp</gi> could be used in the same
way as above.</p></div>
<div type="div3" xml:id="SASYMP">
<head>Placing Synchronous Events in Time</head>
<p>A synchronous alignment specifies which points in a spoken text occur
at the same time, and the order in which they occur, but does not say at
what time those points actually occur. If that information is available
to the encoder it can be represented by means of the <gi>when</gi> and
<gi>timeline</gi> elements, whose description and attributes are the
following:
<specList><specDesc key="when" atts="absolute interval unit since"/><specDesc key="timeline" atts="origin interval unit"/></specList></p>
<p>Each <gi>when</gi> element indicates a point in time, either directly
by means of the <att>absolute</att> attribute, whose value is a string
which specifies a particular time, or indirectly by means of the
<att>since</att> attribute, which points to another <gi>when</gi>. If
the <att>since</att> attribute is used, then the <att>interval</att> and
<att>unit</att> attributes should also be used to indicate the amount of
time that has elapsed since the time specified by the element pointed to
by the <att>since</att> attribute; the value <val>unknown</val>
can be given to indicate that the interval is not known.</p>
<p>If the <gi>when</gi> elements are uniformly spaced in time, then the
<att>interval</att> and <att>unit</att> values need be given only once, in the
<gi>timeline</gi>, and not repeated in any of the <gi>when</gi>
elements. If the intervals vary, but the units are all the same, then
the <att>unit</att> attribute alone can be given in the
<gi>timeline</gi> element, and the <att>interval</att> attribute given
in the <gi>when</gi> element.</p>
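<p>The uniformly spaced case might be sketched as follows. The
identifiers, the starting time, and the five-second spacing are all
invented for illustration. On the reading assumed here, each
<gi>when</gi> falls one interval after the point indicated by its
<att>since</att> attribute, with the interval and unit taken from the
enclosing <gi>timeline</gi>:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><timeline xml:id="tLreg" origin="#v0" unit="s" interval="5">
   <!-- illustrative values: interval and unit stated once only, on the timeline -->
   <when xml:id="v0" absolute="11:30:00"/>
   <when xml:id="v1" since="#v0"/>
   <when xml:id="v2" since="#v1"/>
   <when xml:id="v3" since="#v2"/>
</timeline></egXML></p>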
<p>The <att>origin</att> attribute in the <gi>timeline</gi> element
points to a <gi>when</gi> element which specifies the reference or
origin for the timings within the <gi>timeline</gi>; this must, of
course, specify its position in time absolutely. If the origin of a
timeline is unknown, then this attribute may be omitted.</p>
<p>The following <gi>timeline</gi> might be used to accompany the
marked-up conversation shown in the preceding section:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><timeline xml:id="tL1" origin="#w0" unit="ms">
   <when xml:id="w0" absolute="11:30:00"/>
   <when xml:id="w1" interval="unknown" since="#w0"/>
   <when xml:id="w2" interval="100" since="#w1"/>
   <when xml:id="w3" interval="200" since="#w2"/>
   <when xml:id="w4" interval="150" since="#w3"/>
   <when xml:id="w5" interval="250" since="#w4"/>
   <when xml:id="w6" interval="100" since="#w5"/>
</timeline></egXML>
The information in this <gi>timeline</gi> could now be linked to the
information in the <gi>linkGrp</gi> which provides the temporal
alignment (synchronization) for the text, as follows:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><linkGrp type="temporal_specification" domains="#lg1 #tL1" targFunc="synch.points when">
   <link target="#L1 #w1"/>
   <link target="#L2 #w2"/>
   <link target="#L3 #w3"/>
   <link target="#l4 #w4"/>
   <link target="#l5 #w5"/>
   <link target="#l6 #w6"/>
</linkGrp></egXML></p>
<p>To avoid the need for two distinct link groups (one marking the
synchronization of anchors with each other, and the other marking their
alignment with points on the time line) it would be better to link the
<gi>when</gi> elements with the synchronous points directly:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><linkGrp type="temporal_specification" domains="#BNC-d1 #BNC-d1 #tL1" targFunc="speaker.a speaker.b when">
   <link target="#t1a #t1b #w1"/>
   <link target="#t2a #t2b #w2"/>
   <link target="#t3a #t3b #w3"/>
   <link target="#t4a #t4b #w4"/>
   <link target="#t5a #t5b #w5"/>
   <link target="#t6a #t6b #w6"/>
</linkGrp></egXML></p>
<?tei winita ?>
<!-- We need to completely rewrite this para; I've started, but will
need help to get the SMIL stuff right -sdb -->
<p>Finally, suppose that a digitized audio recording is also
available, together with an XML file that assigns identifiers to the
various temporal spans of sound. Such a file might contain the
following Synchronized Multimedia Integration Language (SMIL,
pronounced <q>smile</q>) fragment:
<egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples" valid="false">
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au1" begin="05.2s" />
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au2" begin="05.7s" />
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au3" begin="05.9s" />
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au4" begin="06.3s" />
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au5" begin="06.9s" />
<audio xmlns="http://www.w3.org/2001/SMIL20/Language" src="rtsp://soundstage.pi.cnr.it:554/home/az/bncSound/xmas4lots.mp3" 
 xml:id="au6" begin="07.4s" />
</egXML>
URIs pointing to the <gi scheme="SMIL">audio</gi> elements could also
be included as a fourth component in each of the above <gi>link</gi>
elements, thus providing a synchronized audio track to complement the
transcribed text.</p>
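<p>A sketch of such four-way links is given below. It assumes, for
illustration only, that the SMIL fragment is stored in a file named
<ident type="file">xmas4lots.smil</ident>; that file name is
hypothetical, and the other identifiers are those used in the
preceding examples:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><linkGrp type="temporal_specification">
   <!-- fourth target: hypothetical URI of the corresponding SMIL audio element -->
   <link target="#t1a #t1b #w1 xmas4lots.smil#au1"/>
   <link target="#t2a #t2b #w2 xmas4lots.smil#au2"/>
   <link target="#t3a #t3b #w3 xmas4lots.smil#au3"/>
   <link target="#t4a #t4b #w4 xmas4lots.smil#au4"/>
   <link target="#t5a #t5b #w5 xmas4lots.smil#au5"/>
   <link target="#t6a #t6b #w6 xmas4lots.smil#au6"/>
</linkGrp></egXML></p>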
<p>For further discussion of this and related aspects of encoding
transcribed speech, refer to chapter <ptr target="#TS"/>.</p>
<specGrp xml:id="DSASYMP" n="Temporal specification">
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/when.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/timeline.xml"/>
</specGrp></div></div>
<div type="div2" xml:id="SAIE">
<head>Identical Elements and Virtual Copies</head>
<p>This section introduces the notion of a <term>virtual element</term>,
that is, an element which is not explicitly present in a text, but the
presence of which an application can infer from the encoding supplied.
In this section, we are concerned with virtual elements made by simply
cloning existing elements. In the next section (<ptr target="#SAAG"/>), we
discuss virtual elements made by aggregating existing elements. </p>
<p>Provided
  that explicit elements are available to represent the parts or
  places to be linked, then the global linking attributes
  <att>sameAs</att> and <att>copyOf</att> may be used to encode 
this kind of equivalence:
  <specList>
    <specDesc key="att.global.linking" atts="sameAs copyOf"/>
  </specList>
</p>
<p>It is useful to be able to represent the fact that one element of
text is identical to others, for analytical purposes, or (especially if
the elements have lengthy content) to obviate the need to repeat the
content. For example, consider the repetition of the <gi>date</gi>
element in the following material:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SA-eg-03"><p>In small clumsy letters he wrote:
<q rend="centered italic"><date xml:id="d840404">April 4th,
1984</date>.</q></p>
<p>He sat back. A sense of complete helplessness had
descended upon him. ...</p>
<p>His small but childish handwriting straggled up
and down the page, shedding first its capital letters
and finally even its full stops:
<q rend="italic"><date>April 4th, 1984</date>.
Last night to the flicks. ... </q></p></egXML>
Suppose now that we wish to encode the fact that the second
<gi>date</gi> element above has identical content to the first. The
<att>sameAs</att> attribute is provided for this purpose. Using it, we
could recode the last line of the above example as follows:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><date sameAs="#d840404">April 4th,
1984</date>
Last night to the flicks ... </egXML></p>
<p>The <att>sameAs</att> attribute may be used to document the fact
that two elements have identical content. It may be regarded as a
special kind of link. It should only be attached to an element with
identical content to that which it targets, or to one the content of which
clearly designates it as a repetition, such as the word
<mentioned>repeat</mentioned> or <mentioned>bis</mentioned> in the
representation of the chorus of a song, the second time it is to be
sung. The relation specified by the <att>sameAs</att> attribute is
symmetric: if a chorus is repeated three times and each repetition
bears a <att>sameAs</att> attribute indicating the first occurrence of
the element concerned, it is implied that each chorus is identical,
and there is no need for the first occurrence to specify any of its
copies.</p>
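<p>The case of a repeated chorus might be sketched as follows. The
lines of verse are invented placeholders, as are the identifier and
the <att>type</att> values. The second occurrence repeats the content
in full, while the third merely designates itself as a repetition:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><lg type="refrain" xml:id="refr1">
   <!-- placeholder text, not taken from any source -->
   <l>First line of the refrain</l>
   <l>Second line of the refrain</l>
</lg>
<lg type="stanza"><l>A line of the second verse</l></lg>
<lg type="refrain" sameAs="#refr1">
   <l>First line of the refrain</l>
   <l>Second line of the refrain</l>
</lg>
<lg type="stanza"><l>A line of the third verse</l></lg>
<lg type="refrain" sameAs="#refr1">
   <l>bis</l>
</lg></egXML>
The first occurrence carries no <att>sameAs</att> attribute of its
own, since the relation is understood to hold in both directions.</p>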
<p>The <att>copyOf</att> attribute is used in a similar way to
indicate that the content of the element bearing it is identical to
that of another. The difference is that the content is not itself
repeated. The effect of this attribute is thus to create a
<term>virtual copy</term> of the element indicated. Using this
attribute, the repeated date in the first example above could be
recoded as follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><date rend="italic" copyOf="#d840404"/></egXML></p>
<p>An application program should replace whatever is the actual content
of an element bearing a <att>copyOf</att> attribute with the content of
the element specified by it. If the content of the element specified
includes other elements, these will become embedded within the element
bearing the attribute. Care must be taken to ensure that the document
is valid both before and after this embedding takes
place. If, for example, the element bearing a <att>copyOf</att>
attribute requires a mandatory sub-component, then this component must
be present (though possibly empty), even though it will be replaced by
the content of the targeted element.</p>
<p>The following example demonstrates how the <att>copyOf</att>
attribute may be used in conjunction with the <gi>seg</gi> element to
highlight the differences between almost identical repetitions:

<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#COVE-eg-284"><sp><speaker>Mikado</speaker>
   <l>My <seg xml:id="Mik-L1s">object all sublime</seg></l>
   <l>I shall <seg xml:id="Mik-L2s">achieve in time</seg>—</l>
   <l xml:id="Mik-L3">To let <seg xml:id="Mik-L3s">the punishment fit the crime</seg>,</l>
   <l xml:id="Mik-l4"><seg copyOf="#Mik-L3s"/>;</l>
   <l xml:id="Mik-l5">And make each pris'ner pent</l>
   <l xml:id="Mik-l6">Unwillingly represent</l>
   <l xml:id="Mik-l7">A source <seg xml:id="Mik-l7s">of innocent merriment</seg>,</l>
   <l xml:id="Mik-l8"><seg copyOf="#Mik-l7s"/>!</l>
</sp>
<sp><speaker>Chorus</speaker>
   <l>His <seg copyOf="#Mik-L1s"/></l>
   <l>He will <seg copyOf="#Mik-L2s"/></l>
   <l copyOf="#Mik-L3"/>
   <l copyOf="#Mik-l4"/>
   <l copyOf="#Mik-l5"/>
   <l copyOf="#Mik-l6"/>
   <l copyOf="#Mik-l7"/>
   <l copyOf="#Mik-l8"/>
</sp></egXML>
</p>
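<p>To make the intended processing concrete, the first four lines of
the Chorus might be expanded by an application into something like the
following. This is a sketch of one plausible result, not a
specification of processor behaviour; in particular, leaving out the
<att>xml:id</att> values of the copied elements, so that no identifier
occurs twice, is an assumption made here.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><sp><speaker>Chorus</speaker>
   <!-- copyOf attributes resolved; identifiers of the originals not carried over (assumption) -->
   <l>His <seg>object all sublime</seg></l>
   <l>He will <seg>achieve in time</seg></l>
   <l>To let <seg>the punishment fit the crime</seg>,</l>
   <l><seg>the punishment fit the crime</seg>;</l>
   <!-- ... -->
</sp></egXML>
The fourth line suggests that resolution may need to be repeated: the
line being copied itself contains a <gi>seg</gi> which is a virtual
copy of another.</p>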
<p>For further examples of the use of this attribute, see <ptr target="#SAAT"/> and <ptr target="#GDAT"/>.</p></div>
<div type="div2" xml:id="SAAG">
<head>Aggregation</head>
<p>Because of the strict hierarchical organization of elements, or for
other reasons, it may not always be possible or desirable to include
all the parts of a possibly fragmented text segment within a single
element. In section <ptr target="#SAPTIP"/> we introduced the notion of
an intermediate pointer as a way of pointing to discontinuous segments
of this kind. In this section we first describe another way of linking
the parts of a discontinuous whole, using a set of linking attributes,
which are made available for any tag by following the procedure
described at the beginning of this chapter. We then describe how the
<gi>link</gi> element may be used to aggregate such segments, and
finally introduce the <gi>join</gi> element, which is a
special-purpose linking element specifically for representing the
aggregation of parts, and the <gi>joinGrp</gi> for grouping
<gi>join</gi> elements.</p>
<p>The linking attributes for aggregation are <att>next</att> and
<att>prev</att>; each of these attributes takes a single pointer as its
value:
<specList><specDesc key="att.global.linking" atts="next prev"/></specList></p>
<p>The <gi>join</gi> element is also a member of the class of <ident type="class">att.pointing</ident> elements, and so may carry any of the
attributes of that class; for the list, see section <ptr target="#SAPT"/>.</p>
<p>Here is the material on which we base our first illustration of the
use of these mechanisms. Our problem is to represent the s-units
identified below as <val>qs3</val> and <val>qs4</val> as a single (but discontinuous) whole:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SAAG-eg-72"><q><s xml:id="qs2">Monsieur Paul, after he has taken equal
parts of goose breast and the finest pork, and
broken a certain number of egg yolks into them,
and ground them <emph>very</emph>, very fine,
cooks all with seasoning for some three hours.</s>
   <s xml:id="qs3">
<emph>But</emph>,</s> </q>
<s xml:id="ps2">she pushed her face nearer, and looked with
   ferocious gloating at the pâté
   inside me, her eyes like X rays,</s>
<q>
   <s xml:id="qs4">he never stops stirring it!</s>
   <s xml:id="qs5">Figure to yourself the work of it —</s>
   <s xml:id="qs6">stir, stir, never stopping!</s>
</q></egXML>
<!-- M.F.K. Fisher, I was really very hungry, As They Were, p43  -->
</p>
<p>Using the <att>prev</att> and <att>next</att> attributes, we can
link the s-units with identifiers <val>qs3</val> and <val>qs4</val>, either singly or doubly as follows:
<eg xml:space="preserve"><![CDATA[  <s xml:id="qs3" next="#qs4"><emph>But</emph>,</s>
  <s xml:id="qs4">he never stops stirring it!</s>]]></eg>
<eg xml:space="preserve"><![CDATA[  <s xml:id="qs3"><emph>But</emph>,</s>
  <s xml:id="qs4" prev="#qs3">he never stops stirring it!</s>]]></eg>
<eg xml:space="preserve"><![CDATA[  <s xml:id="qs3" next="#qs4"><emph>But</emph>,</s>
  <s xml:id="qs4" prev="#qs3">he never stops stirring it!</s>]]></eg>
Double linking of the two s-units, as illustrated by the last of these
encodings, is equivalent to specifying a <gi>link</gi> element:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><link type="join" target="#qs3 #qs4"/></egXML></p>
<p>Such a <gi>link</gi> element must carry a <att>type</att>
attribute with a value of <val>join</val> to specify that the link is
to be understood as joining its targets into a single aggregate.</p>
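<p>The same attributes can be chained when a whole has more than two
parts. The following is a minimal sketch only: the identifiers
<val>frag1</val>, <val>frag2</val>, and <val>frag3</val> and the text
itself are hypothetical and are not taken from the passage quoted
above. Each part points forward to its successor and back to its
predecessor; by analogy with the two-part case, a single
<gi>link</gi> of type <val>join</val> naming the three parts in order
would presumably express the same aggregation:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><p><q><seg xml:id="frag1" next="#frag2">I never,</seg></q> she said,
   <q><seg xml:id="frag2" prev="#frag1" next="#frag3">never once,</seg></q> she repeated,
   <q><seg xml:id="frag3" prev="#frag2">agreed to that.</seg></q></p>
<!-- hypothetical equivalent using a single link -->
<link type="join" target="#frag1 #frag2 #frag3"/></egXML></p>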
<p>The <gi>join</gi> element is equivalent to a <gi>link</gi> element of
type <val>join</val>.
<!-- LB added following, possibly contentious, clarification -->
Unlike the <gi>link</gi> element, the <gi>join</gi> element can
additionally specify information about the virtual element which it
represents, by means of its <att>result</att> attribute. And finally,
unlike the <gi>link</gi> element, the position of a <gi>join</gi>
element within a text is significant: it must be supplied at a position
where the element indicated by its <att>result</att> attribute would be
contextually legal.
<specList><specDesc key="join" atts="result"/><specDesc key="joinGrp" atts="result"/></specList>
To conclude the above example, we now use a <gi>join</gi> element to
represent the virtual sentence formed by the aggregation of <val>qs3</val> and <val>qs4</val>:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><join target="#qs3 #qs4" result="s"/></egXML>
As a further example, consider the following list of authors' names.
The object of the <gi>join</gi> element here is to provide another
list, composed of those authors from the larger list who happen to
come from Heidelberg:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><list>
<head>Authors</head>
   <item xml:id="a_uf">Figge, Udo </item>
   <item xml:id="a_ch">Heibach, Christiane </item>
   <item xml:id="a_gh">Heyer, Gerhard </item>
   <item xml:id="a_bp">Philipp, Bettina </item>
   <item xml:id="a_ms">Samiec, Monika </item>
   <item xml:id="a_ss">Schierholz, Stefan </item>
</list>
<join target="#a_ch #a_bp #a_ss" result="list">
<desc>Authors from Heidelberg</desc></join></egXML></p>
<p>The following example shows how <gi>join</gi> can be used to
reconstruct a text cited in fragments presented out of order. The poem
being remembered (an unusual translation of a well-known poem by Basho)
runs <q>When the old pond / gets a new frog, / it's a new pond.</q></p>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<sp><speaker>Hughie</speaker>
  <p>How does it go?
  <q>
    <l xml:id="frog-x1">da-da-da</l>
    <l xml:id="frog-L2">gets a new frog</l>
    <l>...</l>
  </q>
  </p>
</sp>
<sp><speaker>Louie</speaker>
  <p>
    <q>
<l xml:id="frog-L1">When the old pond</l>
<l>...</l>
    </q>
  </p>
</sp>
<sp><speaker>Dewey</speaker>
  <p>
    <q>... 
    <l xml:id="frog-L3">It's a new pond.</l>
    </q>
  </p>
  <join target="#frog-L1 #frog-L2 #frog-L3" result="lg" scope="root"/>
</sp>
</egXML>
<p>As with other forms of link, a grouping element <gi>joinGrp</gi>
is available for use when a number of <gi>join</gi> elements of the
same kind co-occur. This avoids the need to specify the
<att>result</att> attribute for each <gi>join</gi> if they are all of
the same type, and also allows us to restrict the domain within which
their target elements are to be found, in the same way as for
<gi>linkGrp</gi> elements (see <ptr target="#SAPTLG"/>). Like a
<gi>join</gi>, a <gi>joinGrp</gi> may appear only where the elements
represented by its contents are legal. Thus if we had created many
<gi>join</gi> tags of the sort just described, we could group them
together, and require that their components are all contained by an
element with the identifier <val>mfkfhungry</val> as
follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><joinGrp domains="#mfkfhungry #mfkfhungry" result="s">
   <join target="#qs3 #qs4"/>
   <join target="#qs5 #qs6"/>
</joinGrp>
</egXML>
</p>
<p>The <gi>join</gi> element is useful as a means of representing
non-hierarchic structures (as further discussed in chapter <ptr target="#NH"/>). It may also be used as a convenient way of representing a
variety of analytic units, like the <gi>span</gi> and <gi>interp</gi>
elements discussed in chapter <ptr target="#AI"/>. As an example, consider
the following famous Zen koan:
<q rend="display">
<p>Zui-Gan called out to himself every day, <q>Master.</q></p>
<p>Then he answered himself, <q>Yes, sir.</q></p>
<p>And then he added, <q>Become sober.</q></p>
<p>Again he answered, <q>Yes, sir.</q></p>
<p><q>And after that,</q> he continued, <q>do not be deceived by
others.</q></p>
<p><q>Yes, sir; yes, sir,</q> he replied.</p></q></p>
<p>Suppose now that we wish to represent an interpretation of the above
passage in which we distinguish between the various
<soCalled>voices</soCalled> adopted by Zui-Gan. In the
following encoding, the <att>who</att> attribute has been used for this
purpose; its value on each occasion supplies a pointer to the
<q>voice</q> to which each speech is attributed. (For convenience in
this example, we use simply the first occurrence of the names used for
each voice as the target for these pointers.) Note also that we add
<att>xml:id</att> attributes to each distinct speech fragment, which
we can then use to link the material
spoken by each voice:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SA-eg-04"><text xml:id="zuitxt">
   <body>
<p><name xml:id="zuigan">Zui-Gan</name> called out to himself every day,
   <q next="#zuiq2" xml:id="zuiq1" who="#zuigan"><name xml:id="master">Master</name>.</q></p>
<p>Then he answered himself,
   <q next="#zuiq4" xml:id="zuiq2" who="#zuigan">Yes, sir.</q></p>
<p>And then he added,
   <q next="#zuiq5" xml:id="zuiq3" who="#master">Become sober.</q></p>
<p>Again he answered,
   <q next="#zuiq7" xml:id="zuiq4" who="#zuigan">Yes, sir.</q></p>
<p><q next="#zuiq6" xml:id="zuiq5" who="#master">And after that,</q>
   he continued,
   <q xml:id="zuiq6" who="#master">do not be deceived by others.</q></p>
<p><q xml:id="zuiq7" who="#zuigan">Yes, sir; yes, sir,</q>
   he replied.</p>
   </body>
</text></egXML></p>
<p>However, by using the <gi>join</gi> element, we can directly
represent the complete speech attributed to each voice:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><joinGrp result="q">
   <join target="#zuiq1 #zuiq2 #zuiq4 #zuiq7">
    <desc>what Zui-Gan said</desc></join>
   <join target="#zuiq3 #zuiq5 #zuiq6"> 
     <desc>what Master said</desc></join>
</joinGrp></egXML></p>
<p>Note the use of the <gi>desc</gi> child element within the two
<gi>join</gi>s grouped by the <gi>joinGrp</gi> element here. These enable us
to document the speakers of the two virtual <gi>q</gi> elements
represented by the <gi>join</gi> elements; this is necessary because
there is no way of specifying the attributes
to be associated with a virtual element, and in particular no way
to specify a
<att>who</att> value for them.</p>
<p>Suppose now that <att>xml:id</att> attributes, for whatever
reasons, are not available on the speech fragments themselves. Then
<gi>ptr</gi> elements pointing to those fragments may be created
using any of the methods described in section <ptr target="#SATS"/>. 
The identifiers of <emph>these</emph> <gi>ptr</gi> elements may then be
supplied as the value of the <att>target</att> attribute on the <gi>join</gi> elements.
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#SA-eg-04"><text>
 <body>
<!-- five div1 elements -->
<div1>
    <p>Zui-Gan called out to himself every day, <q>Master.</q></p>
    <p>Then he answered himself, <q>Yes, sir.</q></p>
    <p>And then he added, <q>Become sober.</q></p>
    <p>Again he answered, <q>Yes, sir.</q></p>
    <p><q>And after that,</q> he continued, <q>do not be deceived by others.</q></p>
    <p><q>Yes, sir; yes, sir,</q> he replied.</p>
    <ab type="aggregation">
<ptr xml:id="rzuiq1" target="./#xpath(//div1[6]/p[1]/q[1])"/>
<ptr xml:id="rzuiq2" target="./#xpath(//div1[6]/p[2]/q[1])"/>
<ptr xml:id="rzuiq3" target="./#xpath(//div1[6]/p[3]/q[1])"/>
<ptr xml:id="rzuiq4" target="./#xpath(//div1[6]/p[4]/q[1])"/>
<ptr xml:id="rzuiq5" target="./#xpath(//div1[6]/p[5]/q[1])"/>
<ptr xml:id="rzuiq6" target="./#xpath(//div1[6]/p[5]/q[2])"/>
<ptr xml:id="rzuiq7" target="./#xpath(//div1[6]/p[6]/q[1])"/>
<joinGrp evaluate="one" result="q">
  <join target="#rzuiq1 #rzuiq2 #rzuiq4 #rzuiq7">
   <desc>what Zui-Gan said</desc></join>
  <join target="#rzuiq3 #rzuiq5 #rzuiq6">
    <desc>what Master said</desc></join>
</joinGrp>
    </ab>
</div1>  </body>
</text></egXML></p>
<p>The pointer with identifier
<val>rzuiq2</val>, for example, may be read as <q>the first
<gi>q</gi> in the second <gi>p</gi>, within the sixth <gi>div1</gi>
element of the current document.</q></p>

<specGrp xml:id="DSAAG" n="Aggregation">
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/join.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/joinGrp.xml"/>
</specGrp></div>
<div type="div2" xml:id="SAAT">
<head>Alternation</head>
<p>This section proposes elements for the representation of alternation.
We say that two or more elements are in <term>exclusive
alternation</term> if any of those elements could be present in a text,
but one and only one of them is; in addition, we say that those elements
are <term>mutually exclusive</term>. We say that the elements are in
<term>inclusive alternation</term> if at least one (and possibly more)
of them is present. The elements that are in alternation may also be
called <term>alternants</term>.</p>
<p>The need to mark exclusive alternation arises frequently in text
encoding. A common situation is one in which it can be determined that
exactly one of several different words appears in a given location, but
it cannot be determined which one. One way to mark such an exclusive
alternation is to use the linking attribute <att>exclude</att>. Having
marked an exclusive alternation, it can sometimes later be determined
which of the alternants actually appears in the given location. To
preserve the fact that an alternation was posited, one can add the
linking attribute <att>select</att> to a tag which hierarchically
encompasses the alternants, which points to the one which actually
appears. To assign responsibility and degree of certainty to the
choice, one can use the <gi>certainty</gi> tag described in
chapter <ptr target="#CE"/>. Also see that chapter for further discussion of
certainty in general.</p>
<p>The <att>exclude</att> and <att>select</att> attributes may be used
with any element assuming that they have been declared following the
procedure discussed in the introduction to this chapter.
 <specList>
   <specDesc key="att.global.linking" atts="exclude select"/>
</specList></p>
<p>A more general way to mark alternation, encompassing both exclusive
and inclusive alternation, is to use the linking element <gi>alt</gi>.
The description and attributes of this tag and of the associated
grouping tag <gi>altGrp</gi> are as follows. These elements are also
members of the <ident type="class">att.pointing</ident> class and therefore
have all the attributes associated with that class.
<specList><specDesc key="alt" atts="weights"/><specDesc key="altGrp"/></specList></p>
<p>To take a simple hypothetical example, suppose in transcribing a
spoken text, we encounter an utterance that we can understand either as
<mentioned>We had fun at the beach today.</mentioned> or as <mentioned>We had sun at
the beach today.</mentioned> We can represent the exclusive alternation of
these two possibilities by means of the <att>exclude</att> attribute as
follows.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <u exclude="#we.sun1" xml:id="we.fun1">We had fun at the beach today.</u>
  <u exclude="#we.fun1" xml:id="we.sun1">We had sun at the beach today.</u>
</div></egXML></p>
<p>If it is then determined that the speaker said <mentioned>fun</mentioned>,
not <mentioned>sun</mentioned>, the encoder could amend the text by deleting the
alternant containing <mentioned>sun</mentioned> and the <att>exclude</att>
attribute on the remaining alternant. Alternatively, the encoder could
preserve the fact that there was uncertainty in the original
transcription by retaining the alternants, and assigning the value
<val>#we.fun2</val> to the <att>select</att> attribute on the <gi>div</gi> element that
encompasses the alternants, as in:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div select="#we.fun2" type="interview">
   <u exclude="#we.sun2" xml:id="we.fun2">We had fun at the beach
   today.</u>
   <u exclude="#we.fun2" xml:id="we.sun2">We had sun at the beach today.</u>
</div></egXML></p>
<p>The above alternation (including the <att>select</att> attribute)
could be recoded by assigning the <att>exclude</att> attributes to tags
that enclose just the words or even the characters that are mutually
exclusive, as in:<note place="bottom">See section <ptr target="#AILC"/> for discussion of the
<gi>w</gi> and <gi>c</gi> tags that can be used in the following
examples instead of the <tag>seg type="word"</tag> and <tag>seg
type="character"</tag> tags.</note>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <u select="#fun3">We had
    <seg exclude="#sun3" xml:id="fun3" type="word">fun</seg>
    <seg exclude="#fun3" xml:id="sun3" type="word">sun</seg>
    at the beach today.</u>
</div></egXML>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <u>We had
    <seg select="#id-f" type="word">
<seg exclude="#id-s" xml:id="id-f" type="character">f</seg>
<seg exclude="#id-f" xml:id="id-s" type="character">s</seg>
un</seg>
    at the beach today.</u>
</div></egXML></p>
<p>Now suppose that the transcriber is uncertain whether the first word
in the utterance is <mentioned>We</mentioned> or <mentioned>Lee</mentioned>, but is
certain that if it is <mentioned>Lee</mentioned>, then the other uncertain word
is definitely <mentioned>fun</mentioned> and not <mentioned>sun</mentioned>. The three
utterances that are in mutual exclusion can be encoded as follows.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <!-- ... -->
  <u exclude="#we.sun4 #lee.fun4" xml:id="we.fun4">We had fun at the beach today.</u>
  <u exclude="#we.fun4 #lee.fun4" xml:id="we.sun4">We had sun at the beach today.</u>
  <u exclude="#we.fun4 #we.sun4" xml:id="lee.fun4">Lee had fun at the beach today.</u>
  <!-- ... -->
</div></egXML></p>
<p>The preceding example can also be encoded with <att>exclude</att>
attributes on the word segments <mentioned>We</mentioned>, <mentioned>Lee</mentioned>,
<mentioned>fun</mentioned>, and <mentioned>sun</mentioned>:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u>
  <seg exclude="#lee" xml:id="we" type="word">We</seg>
  <seg exclude="#we #sun" xml:id="lee" type="word">Lee</seg>
  had
  <seg exclude="#sun" xml:id="fun" type="word">fun</seg>
  <seg exclude="#fun #lee" xml:id="sun" type="word">sun</seg>
  at the beach today.</u></egXML></p>
<p>The value of the <att>select</att> attribute is defined as a list of
identifiers; hence it can also be used to
narrow down the range of alternants, as in:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div select="#we.fun5 #lee.fun5" type="interview">
  <u exclude="#we.sun5 #lee.fun5" xml:id="we.fun5">We had fun at the beach today.</u>
  <u exclude="#we.fun5 #lee.fun5" xml:id="we.sun5">We had sun at the beach today.</u>
  <u exclude="#we.fun5 #we.sun5" xml:id="lee.fun5">Lee had fun at the beach today.</u>
</div></egXML>
This is interpreted to mean that either the first or the third
<gi>u</gi> element appears, and is thus equivalent to just the alternation
of those two elements:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <u exclude="#lee.fun6" xml:id="we.fun6">We had fun at the beach
  today.</u>
  <u exclude="#we.fun6" xml:id="lee.fun6">Lee had fun at the beach today.</u>
</div></egXML></p>
<p>The <att>exclude</att> attribute can also be used in case there is
uncertainty about the tag that appears in a certain position. For
example, the occurrence of the word <mentioned>May</mentioned> in the s-unit
<mentioned>Let's go to May</mentioned> can be interpreted, in the absence of
other information, either as a person's name or as a date. The
uncertainty can be rendered as follows, using the <att>exclude</att>
attribute.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><s>Let's go to
<name exclude="#mayd" xml:id="mayn">May</name>
   <date copyOf="#mayn" exclude="#mayn" xml:id="mayd"/>.</s></egXML></p>
<p>Note the use of the <att>copyOf</att> attribute discussed in
section <ptr target="#SAIE"/>; this avoids having to repeat the content of the
element whose correct tagging is in doubt.</p>
<p>The <att>copyOf</att> and the <att>exclude</att> attributes also
provide for a simple way of indicating uncertainty about exactly where a
particular element occurs in a document.<note place="bottom">An alternative way of
representing this problem is discussed in chapter <ptr target="#CE"/>.</note>

For example, suppose that a particular <gi>div2</gi>
element appears either as the third and last of the <gi>div2</gi>
elements within the first <gi>div1</gi> element in the body of a
document, or as the first <gi>div2</gi> of the second <gi>div1</gi>.
One solution would be to record the <gi>div2</gi> in its entirety in the
first of these positions, and a virtual copy of it in the second, and
mark them as excluding each other as follows:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><body>
  <div1 xml:id="C1">
    <div2 xml:id="C1S3" exclude="#C2S1">
    </div2>
  </div1>
  <div1 xml:id="C2">
    <div2 xml:id="C2S1" copyOf="#C1S3" exclude="#C1S3"/>
  </div1>
</body></egXML>
In this case, the <att>select</att> attribute, if used, would appear on
the <gi>body</gi> element.</p>
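<p>As a sketch of that last point, and assuming purely for illustration
that the first position is later judged to be the correct one, the
choice might be recorded as follows. The identifiers used here
(<val>C1x</val>, <val>C1S3x</val>, and so on) are hypothetical variants
of those above, renamed only to keep them distinct:
  <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><body select="#C1S3x">
  <div1 xml:id="C1x">
    <div2 xml:id="C1S3x" exclude="#C2S1x">
    </div2>
  </div1>
  <div1 xml:id="C2x">
    <div2 xml:id="C2S1x" copyOf="#C1S3x" exclude="#C1S3x"/>
  </div1>
</body></egXML>
Both alternants are retained, so the fact that the position was once in
doubt remains visible in the encoding.</p>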
<p>Mutual exclusion can also be expressed using a <gi>link</gi>; the
first example in this section can be recoded by removing the
<att>exclude</att> attributes from the <gi>u</gi> elements, and adding a
<gi>link</gi> element as follows:<note place="bottom">In this example, we have
placed the <gi>link</gi> next to the elements that represent the
alternants. It could also have been placed elsewhere in the document,
perhaps within a <gi>linkGrp</gi>. </note>
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div type="interview">
  <u xml:id="we.had.fun">We had fun at the beach today.</u>
  <u xml:id="we.had.sun">We had sun at the beach today.</u>
  <link type="exclusiveAlternation" target="#we.had.fun #we.had.sun"/>
</div></egXML></p>
<p>Now we define the specialized linking element <gi>alt</gi>, making
it a member of the class <ident type="class">att.pointing</ident>, and
assigning it a <att>mode</att> attribute, which can have either of the
values <val>excl</val> (for exclusive) or <val>incl</val> (for
inclusive). Then the following equivalence holds:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><alt target="#a #b" mode="excl"/></egXML> = 
<egXML xmlns="http://www.tei-c.org/ns/Examples"><link target="#a #b" type="exclusiveAlternation"/></egXML></p>
<p>The preceding <gi>link</gi> element may therefore be recoded as the
  following <gi>alt</gi> element. <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><alt target="#we.had.fun #we.had.sun" mode="excl"/></egXML></p>
<p>Another attribute that is defined specifically for the <gi>alt</gi>
element is <att>weights</att>, which is to be used if one wishes to assign
<term>probabilistic weights</term> to the targets (alternants). Its
value is a list of numbers, corresponding to the targets, expressing the
probability that each target appears. <!--The <att>percent</att> attribute
is used to indicate whether the weights are stated as percentages
(<code>percent="Y"</code>, the default) or as the actual probabilities
(<code>percent="N"</code>)--> If the alternants are mutually exclusive, then
the weights must sum to 1.0.</p>
<p>Suppose in the preceding example that it is equiprobable whether
<mentioned>fun</mentioned> or <mentioned>sun</mentioned> appears. Then
the <gi>alt</gi> element that represents the alternation may be stated
as follows: <egXML xml:lang="und" xmlns="http://www.tei-c.org/ns/Examples"><alt target="#we.had.fun #we.had.sun" mode="excl" weights="0.5 0.5"/></egXML></p>
<p>The assignment of a weight of 1.0 to one target (and weights of 0
to all the others) is equivalent to selecting that target. Thus the
following encoding is equivalent to the second example at the beginning
of this section.
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u xml:id="we.fun">We had fun at the beach today.</u>
<u xml:id="we.sun">We had sun at the beach today.</u>
<alt target="#we.fun #we.sun" mode="excl" weights="1 0"/>
</egXML>
<!-- 
<p>Inclusive alternation can only be expressed by the    -->
  <!-- use of a linking element, either a <tag>link -->
  <!-- type='inclusive alternation'</tag> tag, or an <tag>alt   -->
  <!-- excl=N</tag> tag.  For example, suppose in the example   -->
  <!-- concerning the <tag>div2</tag> tag that can appear in    -->
  <!-- two different places, that it is also possible that it   -->
  <!-- appears in both plases.  This situation can be encoded   -->
  <!-- as follows.    -->
  <!-- 
<eg><![CDATA[-->
  <!-- <body> -->
  <!-- 
<div1 xml:id='C1'> -->
  <!-- ... -->
  <!-- 
<div2 xml:id='C1S3'> -->
  <!-- Text of the "copyable" div2 appears here. -->
  <!-- </div2> -->
  <!-- 
<div1 xml:id='C2'> -->
  <!-- 
<div2 xml:id='C2S1' copyOf='c1s3'></div2> -->
  <!-- ... -->
  <!-- </div1> -->
  <!-- ... -->
  <!-- </body> -->
  <!-- ... -->
  <!-- <alt excl='N' targType='div2 div2' targets='c1s3 c2s1'> -->
  <!-- ]]> -->
  <!-- </eg> -->
  <!-- 
<p>For inclusive alternation, the <att>weights</att>     -->
  <!-- attribute works as follows:  each weight states the-->
  <!-- probability that the target appears given that one or    -->
  <!-- more of the other targets appear.  For the current -->
  <!-- example, suppose that the probability that the     -->
  <!-- <tag>div2</tag> appears in the first position given-->
  <!-- that it appears in the second position is 20%, but that  -->
  <!-- the probability that it appears in the second position   -->
  <!-- given that it appears in the first position is 0%. -->
  <!-- Then the preceding <tag>alt</tag> tag can be revised to  -->
  <!-- read:    -->
  <!-- 
<eg><![CDATA[-->
  <!-- <alt excl='N' targType='div2 div2' targets='c1s3 c2s1' weights='20 0'> -->
  <!-- ]]> -->
  <!-- </eg> -->
The sum of the weights for <tag>alt mode="incl"</tag> elements ranges from 0 to
<code>k</code>, where <code>k</code> is the number of targets. If the sum is 0, then
the alternation is equivalent to exclusive alternation; if the sum is
<code>k</code>, then all of the alternants must appear, and the situation is
better encoded without an <gi>alt</gi> element.</p>

<p>If it is desired, <gi>alt</gi> elements may be grouped together in
an <gi>altGrp</gi> element, and attribute values shared by the
individual <gi>alt</gi> elements may be identified on the
<gi>altGrp</gi> element. The <att>targFunc</att> attribute defaults to
the value <val>first.alternant next.alternant</val>. <!-- Thus, specifying
the value <val>2</val> for the <att>extendTarg</att> attribute permits
the alternants to be extended indefinitely.--></p>

<p>To illustrate, consider again the example of a transcribed
utterance, in which it is uncertain whether the first word is
<mentioned>We</mentioned> or <mentioned>Lee</mentioned>, whether the
third word is <mentioned>fun</mentioned> or
<mentioned>sun</mentioned>, but that if the first word is
<mentioned>Lee</mentioned>, then the third word is
<mentioned>fun</mentioned>. Now suppose we have the following
additional information: if <mentioned>we</mentioned> occurs, then the
probability that <mentioned>fun</mentioned> occurs is 50% and that
<mentioned>sun</mentioned> occurs is 50%; if
<mentioned>fun</mentioned> occurs, then the probability that
<mentioned>we</mentioned> occurs is 40% and that
<mentioned>Lee</mentioned> occurs is 60%. This situation can be
encoded as follows.

<egXML xmlns="http://www.tei-c.org/ns/Examples"><u>
  <seg exclude="#lee2" xml:id="we2" type="word">We</seg>
  <seg exclude="#we2" xml:id="lee2" type="word">Lee</seg>
  had
  <seg exclude="#sun2" xml:id="fun2" type="word">fun</seg>
  <seg exclude="#fun2" xml:id="sun2" type="word">sun</seg>
  at the beach today.</u>
<altGrp>
   <alt target="#we2 #lee2"/>
   <alt target="#fun2 #sun2"/>
   <alt target="#we2 #fun2" mode="incl" weights="0.4 0.5"/>
   <alt target="#lee2 #fun2" mode="incl" weights="0.6 1.0"/>
</altGrp></egXML>
When the <att>mode</att> attribute on an
<gi>alt</gi> has the value <val>incl</val>, each weight states
the probability that the corresponding alternative occurs, given that at least one of the other alternatives occurs.
</p>
<p>From the information in this encoding, we can determine that the
probability is about 28.5% that the utterance is <mentioned>We had fun at the
beach today</mentioned>, 28.5% that it is <mentioned>We had sun at the beach
today</mentioned>, and 43% that it is <mentioned>Lee had fun at the beach
today</mentioned>.</p>
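<p>These figures can be checked with a short calculation, offered here
as a sketch. Write <code>a</code>, <code>b</code>, and <code>c</code>
(symbols introduced only for this check) for the probabilities of
<mentioned>We had fun</mentioned>, <mentioned>We had sun</mentioned>, and
<mentioned>Lee had fun</mentioned> respectively, and assume that these
three readings are the only possibilities. The two conditional
probabilities stated above, together with that assumption, give:
<formula notation="TeX">\frac{a}{a+b} = 0.5, \qquad \frac{a}{a+c} = 0.4, \qquad a + b + c = 1</formula>
The first condition yields <code>b = a</code> and the second yields
<code>c = 1.5a</code>, so that:
<formula notation="TeX">3.5\,a = 1, \qquad a = b = \tfrac{2}{7} \approx 0.286, \qquad c = \tfrac{3}{7} \approx 0.429</formula>
which agrees, up to rounding, with the percentages quoted above.</p>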
<p>Another very similar example is the following regarding the text of a
Broadway song. In three different versions of the song, the same line
reads <q>Her skin is tender as a leather glove</q>,  <q>Her skin is
tender as a baseball glove</q>, and <q>Her skin is tender as Dimaggio's
glove.</q><note place="bottom">The variant readings are found in the commercial sheet
music, the performance score, and the Broadway cast recording.</note></p>
<p>If we wish to express this textual variation using the <gi>alt</gi>
element, we can record our relative confidence in the readings
<mentioned>Dimaggio's</mentioned> (with probability 50%), <mentioned>a
leather</mentioned> (25%), and <mentioned>a baseball</mentioned> (25%).</p>
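<p>Taken on its own, this first step might be sketched as follows; the
identifiers <val>dm0</val>, <val>lt0</val>, and <val>bb0</val> are
chosen here purely for illustration:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><l>Her skin is tender as
    <seg xml:id="dm0">Dimaggio's</seg>
    <seg xml:id="lt0">a leather</seg>
    <seg xml:id="bb0">a baseball</seg>
    glove,</l>
<!-- weights are listed in the same order as the targets -->
<alt target="#dm0 #lt0 #bb0" mode="excl" weights="0.5 0.25 0.25"/></egXML>
Because the three readings are mutually exclusive, their weights sum to
1.0.</p>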
<p>Let us extend the example with a further (imaginary) variation,
supposing for the sake of the argument that the next line is variously
given as <mentioned>and she bats from right to left</mentioned> (with
probability 50%) or <mentioned>now ain't that too damn bad</mentioned> (with
probability 50%). Using the <gi>alt</gi> element, we can express the
conviction that if the first choice for the second line is correct, then
the probability that the first line contains <mentioned>Dimaggio's</mentioned>
is 90%, and each of the others 5%; whereas if the second choice for the
second line is correct, then the probability that the first line
contains <mentioned>Dimaggio's</mentioned> is 10%, and each of the others is
45%. This can be encoded, with an <gi>altGrp</gi> element containing a
combination of exclusive and inclusive <gi>alt</gi> elements, as follows.
<!-- Bloody Mary, from Rogers & Hammerstein, South Pacific -->
<egXML xmlns="http://www.tei-c.org/ns/Examples"><div xml:id="bm" type="song">
  <l>Her skin is tender as
    <seg xml:id="dm">Dimaggio's</seg>
    <seg xml:id="lt">a leather</seg>
    <seg xml:id="bb">a baseball</seg>
    glove,</l>
  <l xml:id="rl">and she bats from right to left.</l>
  <l xml:id="db">now ain't that too damn bad.</l>
</div>
  <altGrp>
    <alt target="#dm #lt #bb" mode="excl" weights="0.5 0.25 0.25"/>
    <alt target="#rl #db" mode="excl" weights="0.50 0.50"/>
  </altGrp>
  <altGrp mode="incl">
    <alt target="#dm #rl" weights="0.90 0.90"/>
    <alt target="#lt #rl" weights="0.05 0.10"/>
    <alt target="#bb #rl" weights="0.05 0.10"/>
    <alt target="#dm #db" weights="0.10 0.10"/>
    <alt target="#lt #db" weights="0.45 0.90"/>
    <alt target="#bb #db" weights="0.45 0.90"/>
  </altGrp></egXML></p>
<specGrp xml:id="DSAAT" n="Alternation">
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/alt.xml"/>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/altGrp.xml"/>
</specGrp></div>

<div type="div2" xml:id="SASO">
<head>Stand-off Markup</head>
<div type="div3" xml:id="SASOin">
	<head>Introduction</head>

    <p>Most of the mechanisms defined in this chapter rely to a
    greater or lesser extent on the fact that tags in a marked-up
    document can both assert a property for a span of text which they
    enclose, and assert the existence of an association between
    themselves and some other span of text elsewhere. In stand-off
    markup, there is a clear separation of these two behaviours: the
    markup does not directly contain any part of the text, but
    instead includes it by reference. One specific mechanism
    recommended by these Guidelines for this purpose is the standard
    XInclude mechanism defined by the W3C; another is to use pointers
    as demonstrated elsewhere in this chapter. </p>

	<p>There are many reasons for using stand-off markup: the source
	  text might be read-only so that additional markup cannot be added,
    or a single text may need to be marked up
    according to several hierarchically incompatible schemes, or a single
    scheme may need to accommodate multiple hierarchical ambiguities, so that
    a single markup tree is not the most faithful representation of the
    source material.</p>
	<p>This section describes a generic mechanism for expressing
	  <emph>all</emph> kinds of markup externally as stand-off tags, for use
    whenever it is appropriate.</p>
<!-- this list uses technical terms defined after it appears: it
should be moved -->
	<p>Throughout this section the following terms will be systematically used in
	specific senses.
<list type="gloss">
	<label>
<term>source document</term>
	</label>
	<item>a document to which the stand-off markup refers (a source document can be either XML or
plain text); there may be more than one source document.</item>
	<label>
<term>internal markup</term>
	</label>
	<item>markup that is already present in an XML source document</item>
	<label>
<term>stand-off markup</term>
	</label>
	<item>markup that either lies outside the source document and points into it, to the data it
describes, or lies in another part of the source document and points elsewhere
within that document to the data it describes</item>
	<label>
<term>external document</term>
	</label>
	<item>a document that contains stand-off markup pointing to a separate source document</item>
	<label>
<term>internalize</term>
	</label>
	<item>the action of creating a new XML document with external markup and data integrated with the
source document data, and possibly some source document markup as well</item>
	<label>
<term>externalize</term>
	</label>
	<item>a process applied to markup from a pre-existing XML document, which splits it into two
documents, an XML (external) document containing some of the markup of the original document,
and another (source) XML document containing whatever text content and markup has not been
extracted into the stand-off document; if all markup has been externalized from a document, the
new source may be a plain text document</item>
</list>
	</p>
	<p>The three major requirements satisfied by this scheme for stand-off markup are:
	  <list rend="numbered">
	    <item n="a">any valid TEI markup can be either internal or
	external,</item>
	    <item n="b">external markup can be internalized by
	applying it to the document content by either
	substituting the existing markup or adding to it,
	to form a valid TEI document, and</item>
	    <item n="c">the external markup itself specifies whether
	an internalized document is to be created by substituting
	the existing internal markup or by adding to
	it.</item></list></p>
  
<!-- need simple example here -->
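   <p>The following sketch illustrates the idea in its simplest
   form. The file name <code>poem.txt</code> and its content are
   assumptions made purely for illustration: suppose that this plain
   text source document contains only the words <q>If bees are
   few.</q> An external document might then supply the markup and
   include the text by reference:
<eg xml:space="preserve"><![CDATA[<!-- External document: the markup lives here; the text stays in
     poem.txt (a hypothetical plain text source document) -->
<l xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="poem.txt" parse="text"/>
</l>]]></eg>
   Internalizing this external document, for instance with an
   XInclude processor, would be expected to yield an <gi>l</gi>
   element containing the text of the source file, while the source
   file itself remains untouched.</p>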
   
</div>
<div type="div3" xml:id="SASOov">
<head>Overview of XInclude</head>


   <p>Stand-off markup which relies on the inclusion of virtual
   content is supported by the W3C XInclude Recommendation,
   the use of which is recommended by these Guidelines.<note
   place="bottom">The version on which this text is based is the <ref
   target="http://www.w3.org/TR/2004/REC-xinclude-20041220/">W3C
   Recommendation dated <date when="2004-12-20">20 December
   2004</date></ref>.</note> XInclude defines a namespace
   (<mentioned>http://www.w3.org/2001/XInclude</mentioned>), which in
   these Guidelines will be associated with the prefix
   <mentioned>xi:</mentioned>, and exactly two elements,
   <gi>xi:include</gi> and <gi>xi:fallback</gi>. XInclude relies on
   the <ref target="http://www.w3.org/TR/xptr-framework/">XPointer
   framework</ref> discussed elsewhere in this chapter to point to the
   actual fragments of text to be internalized. Although XInclude only
   requires support for the <ref
   target="http://www.w3.org/TR/xptr-element/"><code>element()</code></ref>
   scheme of XPointer, these Guidelines permit the use of any of the
   pointing schemes discussed in section <ptr target="#SAXP"/>.</p>
 
	<p>XInclude is a W3C recommendation which specifies a syntax for the
	  inclusion within an XML document of data fragments placed in
	  different resources. Included resources can be either plain
	  text or XML. XInclude instructions within an XML document
	  are meant to be replaced by a resource targeted by a
	  URI, possibly augmented by an XPointer that identifies the
	  exact subresource to be included. </p>

	  <p>The <gi>xi:include</gi> element uses the <att
	  scheme="XI">href</att> attribute to specify the location of
	  the resource to be included; its value is a URI. Where only
	  part of the resource is wanted, the <att
	  scheme="XI">xpointer</att> attribute supplies an XPointer
	  identifying it, as in the examples given in this section.
	  Additionally, the <att
	  scheme="XI">parse</att> attribute (whose only valid values
	  are <val>text</val> and <val>xml</val>) specifies whether
	  the included content is plain text or an XML fragment, and
	  the <att>encoding</att> attribute provides a hint, when
	  the included fragment is text, of the character encoding of
	  the fragment. An optional <gi>xi:fallback</gi> element is
	  also permitted within an <gi>xi:include</gi>; it specifies
	  alternative content to be used when the external resource
	  cannot be fetched for some reason. Its use is not however
	  recommended for stand-off markup.</p>
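
	  <p>As a hedged illustration of these attributes, the
	  fragment below shows two inclusions. The file names
	  <code>notes.txt</code> and <code>chapter.xml</code>, and the
	  identifier <code>intro</code>, are hypothetical. The first
	  inclusion brings in a plain text resource and hints at its
	  character encoding; the second brings in a single element of
	  an XML resource, selected by a shorthand pointer, and shows
	  where an <gi>xi:fallback</gi> would be placed if one were
	  used:
<eg xml:space="preserve"><![CDATA[<div xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- plain text inclusion; encoding is only a hint to the processor -->
  <p><xi:include href="notes.txt" parse="text" encoding="UTF-8"/></p>
  <!-- XML inclusion of the element whose xml:id is "intro"
       (assumed to exist in chapter.xml) -->
  <xi:include href="chapter.xml" parse="xml" xpointer="intro">
    <!-- used only if chapter.xml cannot be retrieved -->
    <xi:fallback><p>[chapter.xml could not be retrieved]</p></xi:fallback>
  </xi:include>
</div>]]></eg></p>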

  </div>
<div type="div3" xml:id="SASOso">

	<head>Stand-off Markup in TEI</head>

     <p>The operations of internalizing and externalizing markup are
     of considerable practical importance. XInclude processing as
     defined by the W3C <emph>is</emph> internalization of the content
     of one or more source documents into a stand-off document. TEI use of
     XInclude for stand-off markup therefore allows XInclude-conformant
     software to perform this operation. However,
     internalization is not clearly defined for all stand-off files,
     because the structure of the internal and external markup trees
     may overlap. In particular, when an external markup document
     selects a range that overlaps partial elements in the source
     document, it is not clear how the semantics of internalization
     (inclusion) should work, since partial elements are not XML
     objects.<note place="bottom">This corresponds to the observation
     that overlapping XML tags reflecting a textual version of such an
     inclusion would not even be well-formed XML. This kind of overlap
     in textual phenomena of interest is in fact the major reason that
     stand-off markup is needed.</note> XInclude defines a semantics
     for this case that involves only complete elements.</p>

<p>When a range selection partially overlaps a number of elements in a
source document, XInclude specifies that the partially overlapping
elements should be included as well as all completely overlapping
elements and characters (partially overlapping characters are not
possible). The effect of this is that elements that straddle the start
or end of a selected range will be included as wrappers for those of
their children that are completely or partially selected by the
range. For example, given the following source document:
 
<egXML xml:lang="en" xmlns="http://www.tei-c.org/ns/Examples">
  <body>
  <p xml:id="par1">home, <emph>home</emph> on Brokeback Mountain.</p>
  <p xml:id="par2">That was the <emph>song</emph> that I sang</p>
  </body>
</egXML>
  and the following external document:
<eg xml:space="preserve"><![CDATA[
  <body>
     <div><include href="example1.xml" xmlns="http://www.w3.org/2001/XInclude"
xpointer="range(xpath(id('par1')//emph),xpath(id('par2')//emph))"/>
     </div>
 </body>   
]]></eg>
  the resulting document after XInclude processing of this external document
  would be:
<eg xml:space="preserve"><![CDATA[
   <body>
   <div>
     <p xml:id="par1">home, <emph>home</emph> on Brokeback Mountain.</p>
     <p xml:id="par2">That was the <emph>song</emph> that I sang</p>
   </div>
   </body>
]]></eg>
  The result of the inclusion is two complete paragraph elements,
  even though the range designated in the source document
  covered only a fragment of each paragraph.
<!-- what if it were a TEI join -->

The semantics of XInclude require the creation of well-formed XML results even though
  the  pointing mechanisms it uses do not necessarily
  respect the hierarchical structure of XML documents, as in
	this case. While
  this is a good way to ensure that internalization is always
  possible, it has implications for the use of XInclude as a
  notation for the <emph>description</emph> of overlapping
  markup structures.
  </p>
  
  <p><!--Overlapping markup cannot be represented by a source
  document combined with an external document using XInclude to
  impose elements that overlap those in the original
  document. -->When overlapping hierarchies need to be represented
  for a single document, each hierarchy must be represented by a
  separate set of XInclude tags pointing to a common source
  document. This sort of structure corresponds to common
  practice in work with linguistic text corpora. In such corpora, each
  potentially overlapping hierarchy of elements for the text is
  represented as a separate stream of stand-off
  markup. Generally the source text contains markup for
 the smallest significant units of analysis in the corpus,
  such as words or morphemes; this information and its markup
  represent a layer of common information that is shared by
  all the various hierarchies. As a way of organizing the
  representation of complex data, this technique generally
  allows a large number of <att>xml:id</att> attributes to be
  attached to the shared elements, providing robust anchors for
  links and facilitating adjustments to the source document
  without breaking external documents that reference it.
  </p>
	<!-- example please -->
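
  <p>A sketch of this arrangement follows. The file names, the
  identifiers, and the choice of <gi>l</gi> and <gi>seg</gi> for the
  two hierarchies are all assumptions made for illustration, and
  punctuation is omitted for brevity. The source document marks up
  only the words of a short verse excerpt:
<eg xml:space="preserve"><![CDATA[<!-- words.xml: hypothetical source document with word-level markup only -->
<ab xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w01">Mindless</w> <w xml:id="w02">of</w> <w xml:id="w03">its</w>
  <w xml:id="w04">just</w> <w xml:id="w05">honours</w> <w xml:id="w06">with</w>
  <w xml:id="w07">this</w> <w xml:id="w08">key</w> <w xml:id="w09">Shakespeare</w>
  <w xml:id="w10">unlocked</w> <w xml:id="w11">his</w> <w xml:id="w12">heart</w>
</ab>]]></eg>
  One external document then records the metrical hierarchy, in
  which the first line ends after <q>key</q>:
<eg xml:space="preserve"><![CDATA[<!-- lines.xml: hypothetical external document, metrical hierarchy -->
<lg xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <l>
    <xi:include href="words.xml" xpointer="w01"/>
    <xi:include href="words.xml" xpointer="w02"/>
    <xi:include href="words.xml" xpointer="w03"/>
    <xi:include href="words.xml" xpointer="w04"/>
    <xi:include href="words.xml" xpointer="w05"/>
    <xi:include href="words.xml" xpointer="w06"/>
    <xi:include href="words.xml" xpointer="w07"/>
    <xi:include href="words.xml" xpointer="w08"/>
  </l>
  <l>
    <xi:include href="words.xml" xpointer="w09"/>
    <xi:include href="words.xml" xpointer="w10"/>
    <xi:include href="words.xml" xpointer="w11"/>
    <xi:include href="words.xml" xpointer="w12"/>
  </l>
</lg>]]></eg>
  while a second external document records the syntactic hierarchy,
  in which the second clause begins at <q>with</q> and so straddles
  the line break:
<eg xml:space="preserve"><![CDATA[<!-- clauses.xml: hypothetical external document, syntactic hierarchy -->
<ab xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <seg type="clause">
    <xi:include href="words.xml" xpointer="w01"/>
    <xi:include href="words.xml" xpointer="w02"/>
    <xi:include href="words.xml" xpointer="w03"/>
    <xi:include href="words.xml" xpointer="w04"/>
    <xi:include href="words.xml" xpointer="w05"/>
  </seg>
  <seg type="clause">
    <xi:include href="words.xml" xpointer="w06"/>
    <xi:include href="words.xml" xpointer="w07"/>
    <xi:include href="words.xml" xpointer="w08"/>
    <xi:include href="words.xml" xpointer="w09"/>
    <xi:include href="words.xml" xpointer="w10"/>
    <xi:include href="words.xml" xpointer="w11"/>
    <xi:include href="words.xml" xpointer="w12"/>
  </seg>
</ab>]]></eg>
  Each external document is well-formed on its own and can be
  internalized independently; the overlap between the second clause
  and the line break is never expressed within a single tree.</p>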

  <p>Any tag can be externalized by
<!-- this example is too simple: move it earlier -->
	  removing its content and replacing it with an 
	  <gi>xi:include</gi> element that contains an XPointer
	  pointing to the desired content.</p>
	<p>For instance, the following portion of a TEI document:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#VEST-eg-1"><text>
   <body>
<head>1755</head>
<l>To make a prairie it takes a clover and one bee,</l>
<l>One clover, and a bee,</l>
<l>And revery.</l>
<l>The revery alone will do,</l>
<l>If bees are few.</l>
   </body>
</text></egXML>
	  can be externalized by placing the actual text in a separate
	  document, and reproducing the same markup around
	  <gi>xi:include</gi> elements that point to that text:
	  <lb/>	<!-- inappropriate kludge -->
	 <label rend="it">Source.xml</label>
	<eg xml:space="preserve"><![CDATA[<content>To make a prairie it takes a clover and one bee,\n
One clover, and a bee,\n
And revery.\n
The revery alone will do,\n
If bees are few.\n
</content>]]></eg>
	  <lb/>	<!-- inappropriate kludge -->
	  <label rend="it">External.xml</label>
	<eg xml:space="preserve"><![CDATA[<text xmlns:xi="http://www.w3.org/2001/XInclude">
 <body>
  <head>1755</head>
   <l>
    <xi:include href="Source.xml" parse="xml"
 xpointer="string-range(element(/1),  0, 48)"/>
   </l>
   <l>
    <xi:include href="Source.xml" parse="xml"
 xpointer="string-range(element(/1), 49, 71)"/>
   </l>
   <l>
    <xi:include href="Source.xml" parse="xml"
 xpointer="string-range(element(/1), 72, 83)"/>
   </l>
   <l>
    <xi:include href="Source.xml" parse="xml"
 xpointer="string-range(element(/1), 84,109)"/>
   </l>
   <l>
    <xi:include href="Source.xml" parse="xml"
 xpointer="string-range(element(/1),110,126)"/>
   </l>
 </body>
</text>]]></eg></p>

<!-- better eg might be to use <lb/>? -->

	<p>Please note that this specification requires that the
	XInclude namespace declaration be present in all cases. The
	<gi>xi:fallback</gi> element contains text or XML fragments to
	be placed in the document if the inclusion fails for any
	reason (for instance due to inaccessibility of an external
	resource). The <gi>xi:fallback</gi> element is optional; if it
	is not present, an XInclude processor must signal a fatal error
	when a resource is not found. Since this is the preferred behaviour
	for stand-off markup, these Guidelines recommend
	against the use of <gi>xi:fallback</gi> in that
	context.</p>
</div>
<div type="div3" xml:id="SASOva">
	<head>Well-formedness and Validity of Stand-off Markup</head>
	<p>The whole source fragment
	  identified by an XInclude element, as well as any markup
	  contained therein, is inserted in the position specified, and
	  an XInclude processor is required to ensure that the resulting
	  internalized document is
	  well-formed. This has obvious implications when the source
	  document contains XML markup. A plain text source document
	  will always create a well-formed
	  internalized document. </p>
	<p>While a TEI customization may permit
 <gi>xi:include</gi> elements in various places in a TEI document
 instance, in general these Guidelines suggest that validity be
 verified after the resolution of all the <gi>xi:include</gi>
 elements.</p>
</div>
<div type="div3" xml:id="SASOfr">
	<head>Including Text or XML Fragments</head>
	<p>When the source text is plain text the overall form of the
	  XPointer pointing to it is of minimal importance. The form
	  of the XPointer matters considerably, on the other hand,
	  when the source document is XML.</p>
	<p>In this case, it is important to distinguish whether
	  we intend to replace the source markup with the new markup, or
	  just to add new markup to it. The XPointers used in the
	  references can express both cases.</p>
	<p>A simple way is to make sure to select only textual data in
	  the XPointer. For instance, given the following
	  document:
	  <lb/>	<!-- inappropriate kludge -->
	  <label rend="it">Source.xhtml</label>
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<html xmlns="http://www.w3.org/1999/xhtml">
    <body>
  <div>To make a prairie it takes a <a href="clover.gif">clover</a>
    and one <a href="bee.gif">bee</a>,</div>
  <div>One <a href="clover.gif">clover</a>, and
    a <a href="bee.gif">bee</a>,</div>
  <div>And revery.</div>
  <div>The revery alone will do,</div>
  <div>If bees are few.</div>
    </body>
</html>
</egXML>	  
the expression
<code>range(element(/1/2/1.0),element(/1/2/11.1))</code> will select
the whole poem, text content <emph>and</emph> <gi>div</gi> elements
<emph>and</emph> hypertext links (NB: in XPointer whitespace-only text
nodes count).</p>
<p>By contrast, the expressions
<code>xpointer(//text()/range-to(.))</code> and
<code>xpointer(string-range(//text(),"To")/range-to(//text(),"few."))</code>
will select only the text of the poem, with no markup inside.</p>
	<p>Thus, the following could be a valid stand-off document for
	  the <title>Source.xhtml</title> document:
	  <lb/>	<!-- inappropriate kludge -->
	  <label rend="it">External2.xml</label>
	  <eg xml:space="preserve"><![CDATA[<text xmlns:xi="http://www.w3.org/2001/XInclude">
 <body>
  <head>1755</head>
  <l>
   <xi:include href="Source.xhtml"
 xpointer='xpointer(string-range(//div[1]/text(),"To")/range-to(//div[1]/text(),"bee,"))'/>
  </l>
  <l>
   <xi:include href="Source.xhtml"
 xpointer='xpointer(string-range(//div[2]/text(),"One")/range-to(//div[2]/text(),"bee,"))'/>
  </l>
  <l>
   <xi:include href="Source.xhtml"
 xpointer='xpointer(string-range(//div[3]/text(),"And")/range-to(//div[3]/text(),"."))'/>
  </l>
  <l>
   <xi:include href="Source.xhtml"
 xpointer='xpointer(string-range(//div[4]/text(),"The")/range-to(//div[4]/text(),","))'/>
  </l>
  <l>
   <xi:include href="Source.xhtml"
 xpointer='xpointer(string-range(//div[5]/text(),"If")/range-to(//div[5]/text(),"."))'/>
  </l>
 </body>
</text>]]></eg></p>
</div>
</div>

<div type="div2" xml:id="SAAN">
<head>Connecting Analytic and Textual Markup</head>
<p>In chapters <ptr target="#AI"/> and <ptr target="#FS"/> and elsewhere,
provision is made for analytic and interpretive markup to be represented
outside of textual markup, either in the same document or in a different
document. The elements in these separate domains can be connected,
either with the pointing attributes <att>ana</att> (for
<mentioned>analysis</mentioned>) and <att>inst</att> (for
<mentioned>instance</mentioned>), or by means of <gi>link</gi> and
<gi>linkGrp</gi> elements. Numerous examples are given in these
chapters<!-- , particularly in sections <ptr target="#AILA"/>, <ptr target="#FSFL"/> and <ptr target="#FSIL"/>-->.</p>
</div>
<div type="div2" xml:id="SAref">
<head>Module for Linking, Segmentation, and Alignment</head>
<p>The module described in this chapter makes available the following
components:
  <moduleSpec xml:id="DSA" ident="linking">
    <altIdent type="FPI">Linking, Segmentation, and Alignment</altIdent>
    <desc>Linking, segmentation and alignment</desc>
    <desc xml:lang="fr">Liens, segmentation et alignement</desc>
    <desc xml:lang="zh-TW">Linking, segmentation and alignment連結、分割與隊列</desc>
    <desc xml:lang="it">Collegamento, segmentazione e allineamento</desc>
    <desc xml:lang="pt">Ligação, segmentação e alinhamento</desc>
    <desc xml:lang="ja">リンクモジュール</desc>
  </moduleSpec>
The selection and combination of modules to form a TEI schema is described in
<ptr target="#STIN"/>.
  <specGrp>
<include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/att.global.linking.xml"/>
<!-- &att.pointing;  -->
<!-- <include xmlns="http://www.w3.org/2001/XInclude" href="../../Specs/att.pointing.group.xml"/> -->
<!-- att.pointing and att.pointing.group now defined in ST -->






<specGrpRef target="#DSAPT"/>
<specGrpRef target="#DSASA"/>
<specGrpRef target="#DSASYMP"/>
<specGrpRef target="#DSAAG"/>
<specGrpRef target="#DSAAT"/>
  </specGrp>
  </p>
</div>
</div>
