<?xml version="1.0" encoding="utf-8"?>
<!--
Copyright TEI Consortium. 
Dual-licensed under CC-by and BSD2 licences 
See the file COPYING.txt for details.
$Date$
$Id$
-->


<?xml-model href="http://tei.oucs.ox.ac.uk/jenkins/job/TEIP5/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>

<div xmlns="http://www.tei-c.org/ns/1.0" n="31" type="div1" xml:id="NH">
    <head>Non-hierarchical Structures</head>
    <p>XML employs a strongly hierarchical document model. At various
    points, these Guidelines discuss problems that arise when using
    XML to encode textual features that either do not naturally lend
    themselves to representation in a strictly hierarchical form or
    conflict with other hierarchies represented in the
    markup. Examples of such situations include: <list rend="bulleted">
            <item>
                <p>Conflict between the hierarchy established by the
                physical structure of a document (e.g., volume, page,
                column, line) and its rhetorical or linguistic
                structure (e.g., chapters, paragraphs, sentences,
                acts, scenes).</p>
            </item>
            <item>
                <p>Conflict between a verse text's metrical structure
                (e.g., its arrangement in stanzas and metrical lines)
                and its rhetorical or linguistic structure (e.g.,
                phrases, sentences, and, for plays, acts, scenes, and
                speeches).</p>
            </item>
            <item>
                <p>Conflict between metrical, rhetorical, or
                linguistic structure and the representation of direct
                speech, especially if the quoted speech is interrupted
                by other elements (e.g., <said>What</said>, she asked,
                <said>was that all about</said>) or crosses metrical,
                rhetorical, or linguistic boundaries.</p>
            </item>
            <item>
                <p>Conflict between different analytical views or
                descriptions of a text or document, e.g., between markup
                intended to encode diplomatic information about a
                word's appearance in a manuscript and markup intended
                to describe its morphology or pronunciation.</p>
            </item>
        </list>
    </p>
    <p>Non-nesting information poses fundamental problems for any
    XML-based encoding scheme, and it must be stated at the outset
    that no current solution combines all the desirable attributes of
    formal simplicity, capacity to represent all occurring or
    imaginable kinds of structures, and suitability for formal or
    mechanical validation. The representation of non-hierarchical
    information is thus necessarily a matter of trade-offs among
    various sets of advantages and disadvantages.</p>
    <p>These Guidelines support several methods for handling
    non-hierarchical information: <list rend="bulleted">
            <item>
                <p>redundant encoding of information in multiple forms
                (discussed in <ptr target="#NHME"/>)</p>
            </item>
            <item>
                <p>the use of empty elements to delimit the boundaries
                of a non-nesting structure (discussed in <ptr
                target="#NHBM"/>)</p>
            </item>
            <item>
                <p>the division of a logically single non-nesting
                element into segments that nest properly in their
                immediate hierarchical context but can also be
                reconstituted virtually across these hierarchical
                boundaries (discussed in <ptr target="#NHVE"/>)</p>
            </item>
            <item>
                <p>stand-off markup: the annotation of information by
                pointing at it, rather than by placing XML tags within
                it (discussed in <ptr target="#NHSO"/>)</p>
            </item>
        </list> Some of these methods can be used in TEI-conformant or -conformable documents. Others
        require extension. </p>
    <p>In the sections that follow, these techniques are described and their advantages and
        disadvantages are briefly discussed. The various solutions to the problem will be
        exemplified using extracts from two poems. The first is the opening quatrain from William
        Wordsworth's <title level="a">Scorn not the sonnet</title>: <quote>
            <l>Scorn not the sonnet; critic, you have frowned,</l>
            <l>Mindless of its just honours; with this key</l>
            <l>Shakespeare unlocked his heart; the melody</l>
            <l>Of this small lute gave ease to Petrarch's wound.</l>
        </quote> The second example is the third stanza from the fourth section of Robert Pinsky's
            <title level="a">Essay on Psychiatrists</title>: <quote>
            <lg>
                <l>Catholic woman of twenty-seven with five children</l>
                <l>And a first-rate body—pointed her finger</l>
                <l>at the back of one certain man and asked me,</l>
                <l>"Is that guy a psychiatrist?" and by god he was! "Yes,"</l>
                <l>She said, "He <emph>looks</emph> like a psychiatrist."</l>
                <l>Grown quiet, I looked at his pink back, and thought.</l>
            </lg>
        </quote> These two texts can be analysed in various ways. The first, which we might describe
        as the <soCalled>Metrical View</soCalled>, encodes the text according to its metrical
        features: line divisions (as here), stanzas or cantos in larger poems, and perhaps prosodic
        features like stress or syllable patterns, alliteration, or rhyme. A second view, which we
        might describe as the <soCalled>Grammatical</soCalled>, encodes linguistic and rhetorical
        features: phonemes, morphemes, words, phrases, clauses, and sentences. A third view, the
            <soCalled>Dialogic</soCalled>, might concentrate on narrative voice: distinguishing
        between the narrator and their interlocutors and identifying individual segments as direct
        quotations. In our examples, we will restrict ourselves to relatively simple conflicts: for
        the <term>Metrical View</term> we will encode only metrical lines and line groups; for the
            <term>Grammatical View</term> we will restrict ourselves to encoding sentences; and for
        the <term>Dialogic View</term>, we will distinguish only direct quotation from other
        narration.</p>
    <div type="div2" xml:id="NHME">
        <head>Multiple Encodings of the Same Information</head>
        <p>Conceptually, the simplest method of disentangling two (or
        more) conflicting hierarchical views of the same information
        is to encode it twice (or more), each time capturing a single
        view.</p>
        <p>Thus, for example, the <term>Metrical View</term> of <title level="a">Scorn not the
                sonnet</title> might be encoded as follows, using the <gi>l</gi> element to encode
            each metrical line: <egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <l>Scorn not the sonnet; critic, you have frowned,</l>
                <l>Mindless of its just honours; with this key</l>
                <l>Shakespeare unlocked his heart; the melody</l>
                <l>Of this small lute gave ease to Petrarch's wound.</l>
            </egXML></p>
        <p>The <term>Grammatical View</term> would be encoded by
        taking the same text and replacing the metrical markup with
        information about its sentence structure: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <p>
                    <seg>Scorn not the sonnet;</seg>
                    <seg>critic, you have frowned, Mindless of its just honours;</seg>
                    <seg>with this key Shakespeare unlocked his heart;</seg>
                    <seg>the melody Of this small lute gave ease to Petrarch's wound.</seg>
                </p>
            </egXML></p>
        <p>Likewise, the more complex passage from Pinsky could be
        encoded in three different ways to reflect the different
        metrical, grammatical, and dialogic views of its text: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
                <lg>
                    <l>Catholic woman of twenty-seven with five children</l>
                    <l>And a first-rate body—pointed her finger</l>
                    <l>at the back of one certain man and asked me,</l>
                    <l>"Is that guy a psychiatrist?" and by god he was! "Yes,"</l>
                    <l>She said, "He <emph>looks</emph> like a psychiatrist."</l>
                    <l>Grown quiet, I looked at his pink back, and thought.</l>
                </lg>
            </egXML>

<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
<p>
<seg>Catholic woman of twenty-seven with five children And a
first-rate body—pointed her finger at the back of one certain man and
asked me, "Is that guy a psychiatrist?" and by god he was!</seg>
</p>
<p>
<seg>"Yes," She said, "He <emph>looks</emph> like a
psychiatrist."</seg>
</p>
<p>
 <seg>Grown quiet, I looked at his pink back, and thought.</seg>
</p>
</egXML>

<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
<ab>Catholic woman of twenty-seven with five children And a first-rate
body—pointed her finger at the back of one certain man and asked me,
<said>Is that guy a psychiatrist?</said> and by god he was!
<said>Yes,</said> She said, <said>He <emph>looks</emph> like a
psychiatrist.</said> Grown quiet, I looked at his pink back, and
thought.</ab>
</egXML>

        </p>
        <p>This method is TEI-conformant. Its advantages are that each
        way of looking at the information is explicitly represented in
        the data and that the individual views are simple to
        process. The disadvantages are that the method requires the
        maintenance of multiple copies of identical textual content
        (an invitation to inconsistency) and that there is no explicit
        indication that the various views, which might be in separate
        files, are related to each other: it might prove difficult to
        combine the views or access information from one view while
        processing the file that contains the encoding of
        another.<note place="bottom">It has been shown, however, that it
        is possible to relate the different annotations in an indirect
        way: if the textual content of the annotations is identical,
        the very text can serve as a means for linking the different
        annotations, as described in <ptr
        target="#NH-BIBL-01"/>. </note></p>
    </div>
    <div type="div2" xml:id="NHBM">
        <head>Boundary Marking with Empty Elements</head>
        <p>A second method for accommodating non-hierarchical objects
        in an XML document involves marking the start and end points
        of the non-nesting material. This prevents textual features
        that fall outside the privileged hierarchy from invalidating
        the document while identifying their beginnings and ends for
        further processing. The disadvantage of this method is that no
        single XML element represents the non-nesting material and, as
        a result, processing with XML technologies is significantly
        more difficult.</p>
        <p>The empty elements used at each end are called
        <term>segment-boundary elements</term> or
        <term>segment-boundary delimiters</term>. There are several
        variations on this method of encoding.</p>

        <p>For some common structural features, the TEI provides
        milestone elements that can be used to mark the beginning of a
        textual feature. These include <gi>lb</gi>, <gi>pb</gi>,
        <gi>cb</gi>, <gi>handShift</gi>, and the generic
        <gi>milestone</gi>. Using <gi>lb</gi>, for example, it is
        possible to indicate both the physical lineation of a poem on
        the page and its grammatical division into sentences:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
  <p>
<seg><lb n="1"/>Scorn not the sonnet;</seg> <seg>critic, you have
frowned, <lb n="2"/>Mindless of its just honours;</seg> <seg>with this
key <lb n="3"/>Shakespeare unlocked his heart;</seg> <seg>the melody
<lb n="4"/>Of this small lute gave ease to Petrarch's
wound.</seg></p></egXML>
	</p>
	<p>The use of these elements is by definition TEI-conformant. 
	  Care should be taken, however, that the meaning of
	the milestone elements is preserved: semantically, for
	example, <gi>lb</gi> is used to mark the start of a new
	(typographical) line. While in much modern poetry,
	typographical and metrical line divisions correspond,
	<gi>lb</gi> does not itself make a metrical claim: in encoding
	verse from sources, such as Old English manuscripts, where
	physical line breaks are not used to indicate metrical
	lineation, the correspondence would break down entirely.</p>


        <p>The segment boundaries may also be delimited by the generic
        <gi>anchor</gi> element.  Attributes can then be used to
        indicate the type of feature being delimited and whether a
        given instance opens or closes the feature. <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <l>
                    <anchor subtype="sentenceStart" type="delimiter"/>
		    Scorn not the sonnet; 
<anchor subtype="sentenceEnd" type="delimiter"/>
                    <anchor subtype="sentenceStart" type="delimiter"/> critic, you have frowned,</l>
                <l>Mindless of its just honours; <anchor subtype="sentenceEnd" type="delimiter"/>
                    <anchor subtype="sentenceStart" type="delimiter"/> with this key</l>
                <l>Shakespeare unlocked his heart; <anchor subtype="sentenceEnd" type="delimiter"/>
                    <anchor subtype="sentenceStart" type="delimiter"/> the melody</l>
                <l>Of this small lute gave ease to Petrarch's wound. <anchor subtype="sentenceEnd" type="delimiter"/></l>
            </egXML>
        </p>
        <p>This method is TEI-conformant.</p>


        <p>Another approach is to design custom elements that provide
        richer information about the feature being delimited or its
        boundaries. This information can be included as attribute
        values or as part of the element name itself: e.g.,
        <tag>boundaryStart
        element="sentence"/</tag>... <tag>boundaryEnd
        element="sentence"/</tag>, <tag>sentenceBoundary
        position="start"/</tag>... <tag>sentenceBoundary
        position="end"/</tag>, or <tag>sentenceBoundaryStart/</tag>...
        <tag>sentenceBoundaryEnd/</tag>:

<egXML xmlns="http://www.tei-c.org/ns/Examples" xmlns:n="http://www.example.org/ns/nonTEI" source="#NH-eg-01">
<l>
<n:sentenceBoundaryStart/>Scorn not the sonnet; 
<n:sentenceBoundaryEnd/>
<n:sentenceBoundaryStart/>critic, you have frowned,</l>
<l>Mindless of its just honours; <n:sentenceBoundaryEnd/>
<n:sentenceBoundaryStart/>with this key</l>
<l>Shakespeare unlocked his heart; <n:sentenceBoundaryEnd/>
<n:sentenceBoundaryStart/>the melody</l>
<l>Of this small lute gave ease to Petrarch's wound. <n:sentenceBoundaryEnd/></l>
            </egXML>
        </p>
        <p>If the custom elements can be replaced by TEI elements and
        attributes without loss of information, this method is TEI-conformable 
        (see <ptr target="#CF"/>); if the custom elements
        introduce information or distinctions that cannot be captured
        using standard TEI elements, the method is an extension.</p>


        <p>Finally, elements that are normally used to encode nesting
        textual features (e.g., <gi>said</gi>, <gi>seg</gi>,
        <gi>l</gi>, etc.) can be adapted so that they serve as empty
        segment boundary delimiters when the features they encode
        cross hierarchical boundaries.  Additional attributes (<att
        scheme="HORSE">sID</att> and <att scheme="HORSE">eID</att> in
        the example below) are added to these elements in order to
        allow the unambiguous correlation of start and end
        points. This method has been introduced in the markup
        literature under various names, including Trojan milestones,
        HORSE markup, CLIX, and COLT. It is described in detail in
        <ptr target="#NH-BIBL-1" type="cit"/>: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <lg xmlns:hr="http://www.example.org/ns/nonTEI">
                    <l>
                        <seg>Scorn not the sonnet;</seg><hr:s sID="s02"/>critic, you have frowned, </l>
                    <l>Mindless of its just honours; <hr:s eID="s02"/>
                        <hr:s sID="s03"/>with this key </l>
                    <l>Shakespeare unlocked his heart; <hr:s eID="s03"/>
                        <hr:s sID="s04"/>the melody </l>
                    <l>Of this small lute gave ease to Petrarch's wound. <hr:s eID="s04"/>
                    </l>
                </lg>
            </egXML> Depending on how the modifications are carried out, 
          this method may be TEI-conformable, represent an extension of the 
          TEI, or produce a non-conformant document. <list rend="bulleted">
                <item>The method is TEI-conformable if the modified
                elements are placed in a distinct, non-TEI namespace
                (see <ptr target="#CFNS"/>), and if the modified
                elements and attributes can be mapped automatically,
                without loss of information, to existing TEI markup
                structures such as milestone or anchor elements (see <ptr
                target="#CF"/>).</item>
                <item>The method represents an extension if the
                modified elements are placed in a distinct, non-TEI
                namespace, but contain information or distinctions
                that cannot be algorithmically translated to existing
                TEI elements without loss of information (see <ptr
                target="#CF"/>).</item>
                <item>The method is non-conformant—and indeed strongly
                deprecated—if the modified elements and attributes are
                not placed in a distinct, non-TEI namespace (see <ptr
                target="#CFAM"/>).</item>
            </list></p>



        <p>In each of the above examples (except the last), the relationship between the start and
            end delimiters (where these exist) of a given feature is implicit: it is assumed that
            "end" delimiters close the nearest preceding "start" delimiter, or, in the case of
            milestones, that the milestone marks both the end of the preceding feature and the
            beginning of the next. Complications arise, however, when the non-nesting text overlaps
            with other non-nesting text of the same type, as, for example, in a grammatical analysis
            of the various possible interpretations of the noun phrase
            <mentioned>fast trains and planes</mentioned>. In this case, the adjective <mentioned>fast</mentioned>
            can be understood as either modifying <mentioned>trains and planes</mentioned> or just
                <mentioned>trains</mentioned>: <figure>
                <head>Two interpretations of the phrase
                <mentioned>Fast trains and planes</mentioned></head>
                <graphic url="Images/tree1-2.jpg" width="80%"/>
                <figDesc>Graphic representation of two interpretations of the phrase <mentioned>Fast
                        trains and planes</mentioned>.</figDesc>
            </figure></p>
        <p>In order to encode the possible analyses of this phrase, an
        unambiguous method of associating opening and closing segment
        boundary delimiters is required: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-03">
                <phr function="NP">
                    <anchor type="delimiter" subtype="NPstart" xml:id="NPInterpretationB"/>
                    <w function="A">Fast</w>
                    <anchor type="delimiter" subtype="NPstart" xml:id="NPInterpretationA"/>
                    <w function="N">trains</w>
                    <anchor type="delimiter" subtype="NPend" corresp="#NPInterpretationB"/>
                    <w function="C">and</w>
                    <w function="N">planes</w>
                    <anchor type="delimiter" subtype="NPend" corresp="#NPInterpretationA"/>
                </phr>
            </egXML>
        </p>
        <p>In the first interpretation encoded here, in which
        <mentioned>fast</mentioned> modifies the NP <mentioned>trains
        and planes</mentioned>, the NP <mentioned>trains and
        planes</mentioned> is opened using an <gi>anchor</gi> tag with
        the <att>xml:id</att> value
        <mentioned>NPInterpretationA</mentioned> and closed with an
        <gi>anchor</gi> tag whose <att>corresp</att> attribute points
        to that value. In the second interpretation, in which
        <mentioned>fast</mentioned> forms an NP with
        <mentioned>trains</mentioned>, the NP <mentioned>fast
        trains</mentioned> is opened using an <gi>anchor</gi> tag with
        the <att>xml:id</att> value
        <mentioned>NPInterpretationB</mentioned> and closed with an
        <gi>anchor</gi> tag whose <att>corresp</att> attribute points
        to that value.</p>
        <p>Despite their advantages, segment boundary delimiters incur
        the disadvantage of cumbersome processing: since the elements
        of the analysis (e.g., the sentences in the poems, or phrases
        in the above example) are not uniformly represented by nodes
        in the document tree, they must be reconstituted by software
        in an ad hoc fashion, which is likely to be difficult and may
        be error-prone.</p>
        <p>Most important for some encoders, the method also disguises
        the relationship between the beginning and the ending of each
        logical element. This makes it impossible for standard
        validation software to provide the same kind of validation
        possible elsewhere in the encoding. When using grammar-based
        schema languages it is not possible to define a content model
        for the range limited by empty elements.<note
        place="bottom">Grammar-based schema languages (e.g., DTD, W3C
        Schema, and RELAX NG) are used to define markup languages
        (e.g., XHTML or TEI). Rule-based schema languages (e.g.,
        Schematron) can be used to define further constraints. Such a
        rule-based schema language permits a sequence of certain
        elements between empty elements to be legitimized or
        prohibited.</note></p>
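        <p>As a hedged illustration of a rule-based constraint of
        this general kind, the following Schematron sketch checks that
        every closing delimiter in the <mentioned>fast trains and
        planes</mentioned> example points back to an opening delimiter
        that precedes it. The pattern is an assumption made for
        illustration and is not part of the TEI schemas; it presumes
        the <att>type</att>, <att>subtype</att>, and
        <att>corresp</att> conventions used in that example, a
        document in the TEI namespace, and an XSLT 2.0 query binding:
        <egXML xmlns="http://www.tei-c.org/ns/Examples"
        xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<sch:schema queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Assumed convention: closing anchors carry subtype="NPend" -->
    <sch:rule context="tei:anchor[@type = 'delimiter'][@subtype = 'NPend']">
      <!-- Strip the leading # from the pointer to recover the xml:id -->
      <sch:let name="target" value="substring-after(@corresp, '#')"/>
      <sch:assert test="preceding::tei:anchor[@xml:id = $target][@subtype = 'NPstart']">
        Each closing delimiter should point to an opening delimiter
        that precedes it in document order.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
</egXML> A rule of this sort checks only the pairing of
        delimiters; it does not by itself supply a content model for
        the material between them.</p>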
    </div>
    <div type="div2" xml:id="NHVE">
        <head>Fragmentation and Reconstitution of Virtual Elements</head>
        <p>A third method involves breaking what might be considered a
        single logical (but non-nesting) element into multiple smaller
        structural elements that fit within the dominant hierarchy but
        can be reconstituted virtually. For example, if a passage of
        direct discourse begins in the middle of one paragraph and
        continues for several more paragraphs, one could encode the
        passage as a series of <gi>said</gi> elements, each fitting
        within a <gi>p</gi> element. The resulting encoding is valid
        XML, but the text in each <gi>said</gi> element represents
        only a portion of the complete passage of direct
        discourse. For this reason these elements are sometimes called
        <soCalled>partial elements</soCalled>.</p>
        <p>In the case of our selection from Pinsky's poem, for
        example, the second passage of direct quotation, which crosses
        a line boundary and is broken up by a <mentioned>She
        said</mentioned> in the narrator's voice, can be made to fit
        within the hierarchy established by the metrical lineation by
        using two <gi>said</gi> elements:

<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
  <lg>
    <l>Catholic woman of twenty-seven with five children</l>
    <l>And a first-rate body—pointed her finger</l>
    <l>at the back of one certain man and asked me,</l>
    <l><said n="quotation1">Is that guy a psychiatrist?</said> and by god he was!
    <said n="quotation2">Yes,</said></l>
    <l>She said, <said n="quotation2">He <emph>looks</emph> like a
psychiatrist.</said></l>
    <l>Grown quiet, I looked at his pink back, and thought.</l>
  </lg>
</egXML>
</p>
	<p>Similarly, the sentences in our example from Wordsworth
	could be encoded:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
  <l>
    <seg n="sentence1">Scorn not the sonnet;</seg>
    <seg n="sentence2">critic, you have frowned,</seg>
  </l>
  <l>
    <seg n="sentence2">Mindless of its just honours;</seg>
    <seg n="sentence3">with this key</seg>
  </l>
  <l>
    <seg n="sentence3">Shakespeare unlocked his heart;</seg>
    <seg n="sentence4">the melody</seg>
  </l>
  <l>
    <seg n="sentence4">Of this small lute gave ease to Petrarch's wound.</seg>
  </l>
            </egXML></p>
        <p>There are two main problems with this type of encoding. The
        first is that it invariably means that the encoding will have
        more elements claiming to represent a feature than there are
        actual instances of that feature in the text. Thus, for
        example, the passage from <title level="a">Scorn not the
        sonnet</title> marks seven spans of text using <gi>seg</gi>,
        even though there are only four linguistic sentences in the
        passage.</p>
        <p>The second problem is that it can be semantically
        misleading. Although they are tagged using the element for
        <term>sentence</term>, for example, very few of the textual
        features encoded using <gi>seg</gi> in this example represent
        actual linguistic sentences: <mentioned>with this
        key</mentioned>, for example, is a prepositional phrase, not a
        sentence; <mentioned>Of this small lute gave ease to
        Petrarch's wound</mentioned> is a string corresponding to no
        single grammatical category.</p>
        <p>Taken together, these problems can make automatic analysis
        of the fragmented features difficult. An analysis that
        intended to count the number of sentences in Wordsworth's
        poem, for example, would arrive at an inflated figure if it
        understood the <gi>seg</gi> elements to represent complete
        rhetorical sentences; if it wanted to do an analysis of his
        syntax, it would not be able to assume that <gi>seg</gi>
        delimited linguistic sentences.</p>
        <p>The technique of fragmentation is often complemented by the
        technique of virtual joins.  Virtual joins may be used to
        combine objects in the text into a new hierarchy. Here is <title
        level="a">Scorn not the sonnet</title> again; this time the
        relationship between the parts of the fragmented sentences is
        indicated explicitly using the <att>next</att> and
        <att>prev</att> attributes described in <ptr target="#SAAG"/>.
	
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
  <l>
    <seg>Scorn not the sonnet;</seg>
    <seg next="#s2b" xml:id="s2a">critic, you have frowned,</seg>
  </l>
  <l>
    <seg prev="#s2a" xml:id="s2b">Mindless of its just honours;</seg>
    <seg next="#s3b" xml:id="s3a">with this key</seg>
  </l>
  <l>
    <seg prev="#s3a" xml:id="s3b">Shakespeare unlocked his heart;</seg>
    <seg next="#s4b" xml:id="s4a">the melody</seg>
  </l>
  <l>
    <seg prev="#s4a" xml:id="s4b">Of this small lute gave ease to Petrarch's wound.</seg>
  </l>
            </egXML> This method of virtually joining partial elements is sometimes called
                <soCalled>chaining</soCalled>. </p>
        <p>For fragments encoded using <gi>ab</gi>, <gi>l</gi>,
        <gi>lg</gi>, <gi>div</gi>, or elements that belong to the
        <ident type="class">att.segLike</ident> class, an even simpler
        mechanism for virtually joining fragments exists: the use of
        the <att>part</att> attribute with the value
        <mentioned>I</mentioned> (Initial), <mentioned>M</mentioned>
        (Medial), or <mentioned>F</mentioned> (Final) as described in
        <ptr target="#SASE"/>.  Here is the above example recoded to
        reflect this method: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <l>
                    <seg>Scorn not the sonnet;</seg>
                    <seg part="I">critic, you have frowned,</seg>
                </l>
                <l>
                    <seg part="F">Mindless of its just honours;</seg>
                    <seg part="I">with this key</seg>
                </l>
                <l>
                    <seg part="F">Shakespeare unlocked his heart;</seg>
                    <seg part="I">the melody</seg>
                </l>
                <l>
                    <seg part="F">Of this small lute gave ease to Petrarch's wound.</seg>
                </l>
            </egXML></p>
        <p>This method is TEI-conformant and simple to use. Its
        disadvantage is that it does not work well for cases of
        self-overlap, or if there are nested occurrences of the same
        element type, as it can become difficult to ascertain which
        initial, medial, or final partial element should be combined
        with which others or in which order. This problem becomes
        evident if we attempt to combine a detailed Grammatical view
        of the Pinsky example with its metrical encoding:
<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
  <lg>
    <l>
      <seg part="I">Catholic woman of twenty-seven with five children</seg>
    </l>
    <l>
      <seg part="M">And a first-rate body—pointed her finger</seg>
    </l>
    <l>
      <seg part="M">at the back of one certain man and asked me,</seg>
    </l>
    <l>
      <seg part="F">"<seg>Is that guy a psychiatrist?</seg>" and by god he was!</seg>
      <seg part="I">"<seg part="I">Yes,</seg>"</seg>
    </l>
    <l>
      <seg part="F">She said, "<seg part="F">He <emph>looks</emph> like a psychiatrist.</seg>"</seg>
    </l>
    <l>
      <seg>Grown quiet, I looked at his pink back, and thought.</seg>
    </l>
  </lg>
	</egXML></p>
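        <p>One way to remove this ambiguity, sketched here under the
        assumption that each fragment is given an identifier, is to
        chain the fragments with <att>next</att> and <att>prev</att>
        instead of relying on <att>part</att>. The <att>xml:id</att>
        values below are hypothetical and chosen only for
        illustration: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-02">
  <lg>
    <l>
      <seg next="#pin-s1b" xml:id="pin-s1a">Catholic woman of twenty-seven with five children</seg>
    </l>
    <l>
      <seg prev="#pin-s1a" next="#pin-s1c" xml:id="pin-s1b">And a first-rate body—pointed her finger</seg>
    </l>
    <l>
      <seg prev="#pin-s1b" next="#pin-s1d" xml:id="pin-s1c">at the back of one certain man and asked me,</seg>
    </l>
    <l>
      <seg prev="#pin-s1c" xml:id="pin-s1d">"<seg>Is that guy a psychiatrist?</seg>" and by god he was!</seg>
      <seg next="#pin-s2b" xml:id="pin-s2a">"<seg next="#pin-q2b" xml:id="pin-q2a">Yes,</seg>"</seg>
    </l>
    <l>
      <seg prev="#pin-s2a" xml:id="pin-s2b">She said, "<seg prev="#pin-q2a" xml:id="pin-q2b">He <emph>looks</emph> like a psychiatrist.</seg>"</seg>
    </l>
    <l>
      <seg>Grown quiet, I looked at his pink back, and thought.</seg>
    </l>
  </lg>
</egXML> Because each pointer names its partner explicitly, the
        outer sentence fragments and the nested quotation fragments
        can be rejoined independently of one another, at the cost of
        more verbose markup.</p>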

	<p>A third method for aggregating fragmented partial elements
	involves using markup that is not directly part of the
	encoding, e.g., the <gi>join</gi> element. In this method, a
	<gi>join</gi> element is used elsewhere in the document to
	indicate explicitly the members of the virtual element:

<egXML xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
  <l>
    <w xml:id="w01">Scorn</w>
    <w xml:id="w02">not</w>
    <w xml:id="w03">the</w>
    <w xml:id="w04">sonnet</w>; <w xml:id="w05">critic</w>, <w xml:id="w06">you</w>
    <w xml:id="w07">have</w>
  <w xml:id="w08">frowned</w>, </l>
  <l>
    <w xml:id="w09">Mindless</w>
    <w xml:id="w10">of</w>
    <w xml:id="w11">its</w>
    <w xml:id="w12">just</w>
    <w xml:id="w13">honours</w>; <w xml:id="w14">with</w>
    <w xml:id="w15">this</w>
    <w xml:id="w16">key</w>
  </l>
  <l>
    <w xml:id="w17">Shakespeare</w>
    <w xml:id="w18">unlocked</w>
    <w xml:id="w19">his</w>
    <w xml:id="w20">heart</w>; <w xml:id="w21">the</w>
    <w xml:id="w22">melody</w>
  </l>
  <l>
    <w xml:id="w23">Of</w>
    <w xml:id="w24">this</w>
    <w xml:id="w25">small</w>
    <w xml:id="w26">lute</w>
    <w xml:id="w27">gave</w>
    <w xml:id="w28">ease</w>
    <w xml:id="w29">to</w>
    <w xml:id="w30">Petrarch's</w>
  <w xml:id="w31">wound</w>. </l>
  
  <!-- Elsewhere in the document -->
  
  <p>
    <join result="s" scope="root" target="#w01 #w02 #w03 #w04"/>
    <join result="s" scope="root" target="#w05 #w06 #w07 #w08 #w09 #w10 #w11 #w12 #w13"/>
    <join result="s" scope="root" target="#w14 #w15 #w16 #w17 #w18 #w19 #w20"/>
    <join result="s" scope="root" target="#w21 #w22 #w23 #w24 #w25 #w26 #w27 #w28 #w29 #w30 #w31"/>
  </p>
</egXML>
	</p>
        <p>This use of <gi>join</gi> is TEI-conformant.</p>
        <p>The major advantage of fragmentation and virtual joins is
        that they allow all the hierarchies in the text to be handled
        explicitly: both the privileged one directly represented and
        the alternate hierarchy that has been split up and
        rejoined. The major disadvantages are that (like most of the
        other methods described here) they privilege one hierarchy over
        the others, require special processing to reconstitute the
        elements of the other hierarchies, and, except in the case of
        <gi>join</gi>, can be semantically misleading.</p>
    </div>
    <div type="div2" xml:id="NHSO">
        <head>Stand-off Markup</head>
        <p>Most markup is characterized by the embedding of elements
        in the text. An alternative approach separates the text and
        the elements used to describe it. This approach is known as
        stand-off markup (see section <ptr target="#SASO"/>). It
        establishes a new hierarchy by building a new tree whose nodes
        are XML elements that do not contain textual content, but
        rather links to another <term>layer</term>: <gloss>a node in
        another XML document or a span of text</gloss>. This approach
        can be subdivided according to different criteria. A first
        distinction concerns the link base, i.e. the content to which
        annotations are to be applied. Sometimes the link target
        contains markup that can be referred to explicitly, as in the
        following example where the stand-off markup uses the
        <att>xml:id</att> values on <gi>w</gi> to provide targets for
        <gi>xi:include</gi><note place="bottom">The XInclude markup is
        escaped as character data here, to prevent it from being
        processed as an actual inclusion.</note>: <egXML
        xmlns="http://www.tei-c.org/ns/Examples" source="#NH-eg-01">
                <l>
                    <w xml:id="w001">Scorn</w>
                    <w xml:id="w002">not</w>
                    <w xml:id="w003">the</w>
                    <w xml:id="w004">sonnet</w>; <w xml:id="w005">critic</w>, <w xml:id="w006">you</w>
                    <w xml:id="w007">have</w>
                    <w xml:id="w008">frowned</w>, </l>
                <l>
                    <w xml:id="w009">Mindless</w>
                    <w xml:id="w010">of</w>
                    <w xml:id="w011">its</w>
                    <w xml:id="w012">just</w>
                    <w xml:id="w013">honours</w>; <w xml:id="w014">with</w>
                    <w xml:id="w015">this</w>
                    <w xml:id="w016">key</w>
                </l>
                <l>
                    <w xml:id="w017">Shakespeare</w>
                    <w xml:id="w018">unlocked</w>
                    <w xml:id="w019">his</w>
                    <w xml:id="w020">heart</w>; <w xml:id="w021">the</w>
                    <w xml:id="w022">melody</w>
                </l>
                <l>
                    <w xml:id="w023">Of</w>
                    <w xml:id="w024">this</w>
                    <w xml:id="w025">small</w>
                    <w xml:id="w026">lute</w>
                    <w xml:id="w027">gave</w>
                    <w xml:id="w028">ease</w>
                    <w xml:id="w029">to</w>
                    <w xml:id="w030">Petrarch's</w>
                    <w xml:id="w031">wound</w>. </l>

<!-- elsewhere in the current document -->

<![CDATA[
<p xmlns:xi="http://www.w3.org/2001/XInclude"> 
  <seg>
    <xi:include xpointer="range(element(w001),element(w004))"/>
  </seg>
  <seg>
    <xi:include xpointer="range(element(w005),element(w013))"/>
  </seg>
  <seg>
    <xi:include xpointer="range(element(w014),element(w020))"/>
  </seg>
  <seg>
    <xi:include xpointer="range(element(w021),element(w031))"/>
  </seg>
</p>
]]>
            </egXML>

  Note that the layer that uses XInclude to build another hierarchy
  might well be in another document, in which case the value of <att
  scheme="HTML">href</att> of <gi>xi:include</gi> would need to be
  the URL of the document that contains the base layer, in this case
  the <gi>w</gi> elements.
        </p>
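        <p>As a minimal sketch of that arrangement, suppose the
        <gi>w</gi> elements above were stored in a separate file. The
        file name <ident>sonnet.xml</ident> is purely an assumption
        made for illustration. The first aggregate might then be
        written as follows: <egXML
        xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[
<p xmlns:xi="http://www.w3.org/2001/XInclude">
  <seg>
    <!-- href names the (hypothetical) document holding the base layer -->
    <xi:include href="sonnet.xml"
                xpointer="range(element(w001),element(w004))"/>
  </seg>
</p>
]]></egXML> Support for pointer schemes varies between XInclude
        processors, so an encoding of this kind should be tested with
        the software actually in use.</p>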

        <p>This is very similar to the use of <gi>join</gi> discussed
        above. The main advantages of the stand-off method are that it
        is possible to specify attributes on the aggregate
        <gi>seg</gi> elements, and that there exists off-the-shelf
        software that will perform appropriate processing. Stand-off
        markup may be used even when the base text being annotated is
        plain text, i.e. does not have any XML encoding. In this case,
        the range of text to be marked up is indicated by character
        offsets (see <ptr target="#SATS"/>, in particular <ptr
        target="#SATSSR"/>). Another distinction concerns the number
        of files which can serve as link targets. Often, one
        (dedicated) annotation is used as the link target of all the
        other annotations. It is also possible to freely interlink
        several layers.</p>
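        <p>One advantage noted above is that attributes can be
        specified on the aggregate <gi>seg</gi> elements. In the
        following sketch, the <att>type</att> value
        <val>sentence</val> and the identifier <val>s01</val> are
        assumptions chosen for illustration, not values prescribed
        here: <egXML
        xmlns="http://www.tei-c.org/ns/Examples"><![CDATA[
<p xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- the aggregate carries its own (illustrative) type and identifier -->
  <seg type="sentence" xml:id="s01">
    <xi:include xpointer="range(element(w001),element(w004))"/>
  </seg>
</p>
]]></egXML> Other annotation layers could then point to the
        aggregate by its identifier rather than to the individual
        <gi>w</gi> elements.</p>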
        <p>It has been noted that stand-off markup has several
        advantages over embedded annotations.  In particular, it is
        possible to produce annotations of a text even when the source
        document is read-only. Furthermore, annotation files can be
        distributed without distributing the source text. Further
        advantages mentioned in the literature are that discontinuous
        segments of text can be combined in a single annotation, that
        independent parallel coders can produce independent
        annotations, and that different annotation files can contain
        different layers of information. Lastly, it has also been
        noted that this approach is elegant.</p>
        <p>But there are also several drawbacks. First, new stand-off
        annotated layers require a separate interpretation, and the
        layers—although separate—depend on each other. Moreover,
        although all of the information of the multiple hierarchies is
        included, the information may be difficult to access using
        generic methods.</p>
        <p>Inasmuch as it uses elements not included in the TEI
        namespace, stand-off markup involves an extension of the
        TEI.</p>
    </div>
    <div type="div2" xml:id="NHNX">
        <head>Non-XML-based Approaches</head>
        <p>There exist many non-XML methods of encoding a text that
        either solve the problem of encoding overlapping hierarchies or
        do not suffer from it. These include, but are not
        limited to, the following proposals.</p>
        <list rend="bulleted">
            <item>Applying the notion of concurrent markup to XML
            (<ptr target="#NH-BIBL-2" type="cit"/>). This reintroduces
            the CONCUR feature of SGML, which was omitted from the XML
            specification.</item>
            <item>Designing a form of document representation in which
            several trees share all or part of the same frontier, and
            in which each individual view of the document has the form
            of a tree (see <ptr target="#NH-BIBL-3"
            type="cit"/>).</item>
            <item>The <soCalled>colored XML</soCalled> proposal (<ptr
            target="#NH-BIBL-4" type="cit"/>), which stores a body of
            information as a set of intertwined XML trees. This
            approach eliminates unnecessary redundancy and makes the
            database readily updatable, while allowing the user to
            exploit different hierarchical access paths.</item>
            <item>The MultiX proposal (<ptr target="#NH-BIBL-5"
            type="cit"/>), which represents documents as directed
            graphs. Because XML is used to represent the graph, the
            document is, at least in principle, manipulable with
            standard XML tools.</item>
            <item>The Just-In-Time-Trees proposal (<ptr
            target="#NH-BIBL-6" type="cit"/>), which stores documents
            using XML, but processes the XML representation in
            non-standard ways and allows it to be mapped onto data
            structures that are different from those known from
            XML.</item>
            <item>The <choice>
                    <expan>Layered Markup and Annotation
                    Language</expan> <abbr>LMNL</abbr> </choice>
                    proposal. This offers alternatives to the basic
                    XML linear form as well as its data and processing
                    models. It uses an alternative notation to XML and
                    a data structure based on Core Range Algebra (<ptr
                    target="#NH-BIBL-7" type="cit"/>).</item>
            <item><choice>
                    <expan>Markup Languages for Complex Documents</expan>
                    <abbr>MLCD</abbr> </choice>. This provides a
                    notation (TexMECS) and a data structure (Goddag)
                    as well as a draft constraint language for the
                    representation of non-hierarchical structures; see
                    <ptr target="#NH-BIBL-8" type="cit"/>.</item>
        </list>
        <p>These approaches are based either on non-standard XML
        processing or data models, or not based on XML at all. Since
        TEI is currently based on XML, they are not described any
        further in these Guidelines. Use of these methods with the TEI
        will certainly involve extensions; in most cases the documents
        will also be non-conformant.</p>
    </div>
</div>
