21 Graphs, Networks, and Trees

Part 4

Additional Tag Sets

21 Graphs, Networks, and Trees

Graphical representations are widely used for displaying relations among informational units because they help readers to visualize those relations and hence to understand them better. Two general types of graphical representations may be distinguished.

Graphs, in the strictly mathematical sense, consist of points, often called nodes or vertices, and connections among them, called arcs, or under certain conditions, edges. Among the various types of graphs are networks and trees. Graphs generally and networks in particular are dealt with directly below. Trees are dealt with separately in sections 21.2 and 21.3 . [ see note 114 ]
Charts, which typically plot data in two or more dimensions, including plots with orthogonal or radial axes, bar charts, pie charts, and the like. These can be described using the elements defined in the additional tag set for figures and graphics; see chapter 22 .

The following DTD fragment shows the overall organization of the tag set discussed in the remainder of this chapter.

<!-- 21:  Graphs, networks and trees                          -->
<!-- ... declarations from section 21.1                       -->
<!--     (Graphs)                                             -->
<!--     go here ...                                          -->
<!-- ... declarations from section 21.2                       -->
<!--     (Trees (basic method))                               -->
<!--     go here ...                                          -->
<!-- ... declarations from section 21.3                       -->
<!--     (Trees (alternate method))                           -->
<!--     go here ...                                          -->

This tag set is made available as described in 3.3 ; in a document which uses the markup described in this chapter, the document type declaration should contain the following declaration for the entity tei.nets :

<!ENTITY % tei.nets 'INCLUDE'>

The entire document type declaration for a document using this additional tag set together with the prose base might look like this:

 <!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN"
                        "tei2.dtd" [
    <!ENTITY % tei.prose 'INCLUDE' >
    <!ENTITY % tei.nets  'INCLUDE' >
 ]>

Among the types of qualitative relations often represented by graphs are organizational hierarchies, flow charts, genealogies, semantic networks, transition networks, grammatical relations, tournament schedules, seating plans, and directions to people's houses. In developing recommendations for the encoding of graphs of various types, we have relied on their formal mathematical definitions and on the most common conventions for representing them visually. However, it must be emphasized that these recommendations do not provide for the full range of possible graphical representations, and deal only partially with questions of design, layout and placement.

21.1 Graphs and Digraphs

Broadly speaking, graphs can be divided into two types: undirected and directed. An undirected graph is a set of nodes (or vertices) together with a set of pairs of those vertices, called arcs or edges. Each node in an arc of an undirected graph is said to be incident with that arc, and the two vertices which make up an arc are said to be adjacent. An directed graph is like an undirected graph except that the arcs are ordered pairs of nodes. In the case of directed graphs, the term edge is not used; moreover, each arc in an directed graph is said to be adjacent from the node from which the arc emanates, and adjacent to the node to which the arc is directed. We use the element <graph> to encode graphs as a whole, <node> to encode nodes or vertices, and <arc> to encode arcs or edges; arcs can also be encoded by attributes on the <node> element. These elements have the following descriptions and attributes:

<graph> encodes a graph, which is a collection of nodes, and arcs which connect the nodes. Attributes include:
- type describes the type of graph. Suggested values include:
  - transducer a transition network with up to two labels on each arc
  - undirected undirected graph
  - transition network a directed graph with distinguished initial and final nodes
  - directed directed graph
- label gives a label for a graph.
- order states the order of the graph, i.e., the number of its nodes.
- size states the size of the graph, i.e., the number of its arcs.
<node> encodes a node, a possibly labeled point in a graph. Attributes include:
- label gives a label for a node.
- label2 gives a second label for a node.
- value provides the value of a node, which is a feature structure or other analytic element.
- type provides a type for a node. Suggested values include:
  - final final node in a transition network
  - initial initial node in a transition network
- adjFrom gives the IDs of the nodes which are adjacent from the current node.
- adjTo gives the IDs of the nodes which are adjacent to the current node.
- adj gives the IDs of the nodes which are both adjacent to and adjacent from the current node.
- inDegree gives the in degree of the node, the number of nodes which are adjacent from the given node.
- outDegree gives the out degree of the node, the number of nodes which are adjacent to the given node.
- degree gives the degree of the node, the number of arcs with which the node is incident.
<arc> encodes an arc, the connection from one node to another in a graph. Attributes include:
- label gives a label for an arc.
- label2 gives a second label for an arc.
- from gives the ID of the node which is adjacent from this arc.
- to gives the ID of the node which is adjacent to this arc.

Before proceeding, some additional terminology may be helpful. We define a path in a graph as a sequence of nodes n1, ..., nk such that there is an arc from each ni to ni+1 in the sequence. A cyclic path, or cycle is a path leading from a particular node back to itself. A graph that contains at least one cycle is said to be cyclic; otherwise it is acyclic. We say, finally, that a graph is connected if there is a path from some node to every other node in the graph; any graph that is not connected is said to be disconnected.

Here is an example of an undirected, cyclic disconnected graph, in which the nodes are annotated with three-letter codes for airports, and the arcs connecting the nodes are represented by horizontal and vertical lines, with 90 degree bends used simply to avoid having to draw diagonal lines.

          .---LAX---.
          |         |
         LVG-------PHX---TUS    CIB

  Airline Connections in Southwestern USA

Next is a markup of the graph, using <arc> elements to encode the arcs.

<graph type=undirected
       id=cug1
       label='Airline Connections in Southwestern USA'
       order=5
       size=4>
  <node label=LAX id=LAX degree=2>
  <node label=LVG id=LVG degree=2>
  <node label=PHX id=PHX degree=3>
  <node label=TUS id=TUS degree=1>
  <node label=CIB id=CIB degree=0>
  <arc  from=LAX to=LVG>
  <arc  from=LAX to=PHX>
  <arc  from=LVG to=PHX>
  <arc  from=PHX to=TUS>
  </graph>

The label attribute on the <graph> element records a label for the graph; similarly, the label attribute on the <node> elements records the labels of those nodes. The order and size attributes on the <graph> element record the number of nodes and number of arcs in the graph respectively; these values are optional (since they can be computed from the rest of the graph), but if they are supplied, they must be consistent with the rest of the encoding. They can thus be used to help check that the graph has been encoded and transmitted correctly. The degree attribute on the <node> elements record the number of arcs that are incident with that node. It is optional (because redundant), but can be used to help in validity checking: if a value is given, it must be consistent with the rest of the information in the graph. Finally, the from and to attributes on the <arc> elements provide pointers to the nodes connected by those arcs. Since the graph is undirected, no directionality is implied by the use of the from and to attributes; the values of these attributes could be interchanged in each arc without changing the graph.

The adj , adjFrom , and adjTo attributes of the <node> element provide an alternative method of representing unlabeled arcs, their values being pointers to the nodes which are adjacent to or from that node. The adj attribute is to be used for undirected graphs, and the adjFrom and adjTo attributes for directed graphs. It is a semantic error for the directed adjacency attributes to be used in an undirected graph, and vice versa. Here is a markup of the preceding graph, using the adj attribute to represent the arcs.

<graph type=undirected
       id=cug2
       label='Airline Connections in Southwestern USA'
       order=5
       size=4>
  <node label=LAX id=LAX degree=2 adj='LVG PHX'>
  <node label=LVG id=LVG degree=2 adj='LAX PHX'>
  <node label=PHX id=PHX degree=3 adj='LAX LVG TUS'>
  <node label=TUS id=TUS degree=1 adj='PHX'>
  <node label=CIB id=CIB degree=0>
  </graph>

Note that each arc is represented twice in this encoding of the graph. For example, the existence of the arc from LAX to LVG can be inferred from each of the first two <node> elements in the graph. This redundancy, however, is not required: it suffices to describe an arc in any one of the three places it can be described (either adjacent node, or in a separate <arc> element). Here is a less redundant representation of the same graph.

<graph type=undirected
       id=cug3
       label='Airline Connections in Southwestern USA'
       order=5
       size=4>
  <node label=LAX id=LAX degree=2 adj='LVG PHX'>
  <node label=LVG id=LVG degree=2 adj='PHX'>
  <node label=PHX id=PHX degree=3 adj='TUS'>
  <node label=TUS id=TUS degree=1>
  <node label=CIB id=CIB degree=0>
  </graph>

Although in many cases the <arc> element is redundant (since arcs can be described using the adjacency attributes of their adjacent nodes), it has nevertheless been included in the tag set, in order to allow the convenient specification of identifiers, display or rendition information, and labels for each arc (using the attributes id , rend , and label ).

Next, let us modify the preceding graph by adding directionality to the arcs. Specifically, we now think of the arcs as specifying selected routes from one airport to another, as indicated by the direction of the arrowheads in the following diagram.

          +-<-LAX-<-+
          |         |
         LVG--->---PHX--->---TUS   CIB
                    |         |
                    +----<----+

  Selected Airline Routes in Southwestern USA

Here is an encoding of this graph, using the <arc> element to designate the arcs.

<graph type=directed
       id=rdg1
       label='Selected Airline Routes in Southwestern USA'
       order=5
       size=5>
   <node label=LAX id=LAX inDegree=1 outDegree=1>
   <node label=LVG id=LVG inDegree=1 outDegree=1>
   <node label=PHX id=PHX inDegree=2 outDegree=2>
   <node label=TUS id=TUS inDegree=1 outDegree=1>
   <node label=CIB id=CIB inDegree=0 outDegree=0>
   <arc  from=LAX to=LVG>
   <arc  from=LVG to=PHX>
   <arc  from=PHX to=LAX>
   <arc  from=PHX to=TUS>
   <arc  from=TUS to=PHX>
   </graph>

Here is another encoding of the graph, using the adjTo and adjFrom attributes on nodes to designate the arcs.

<graph type=directed
       id=rdg2
       label='Selected Airline Routes in Southwestern USA'
       order=5
       size=5>
   <node label=LAX id=LAX inDegree=1
         outDegree=1 adjTo=LVG adjFrom=PHX>
   <node label=LVG id=LVG inDegree=1
         outDegree=1 adjFrom=LAX adjTo=PHX>
   <node label=PHX id=PHX inDegree=2
         outDegree=2 adjTo='LAX TUS' adjFrom='LVG TUS'>
   <node label=TUS id=TUS inDegree=1
         outDegree=1 adjTo=PHX adjFrom=PHX>
   <node label=CIB id=CIB inDegree=0 outDegree=0>
   </graph>

If we wish to label the arcs, say with flight numbers, then <arc> elements must be used to carry the label attribute, as in the following example.

<graph type=directed
       id=rdg1
       label='Selected Airline Routes in Southwestern USA'
       order=5
       size=5>
   <node label=LAX id=LAX >
   <node label=LVG id=LVG >
   <node label=PHX id=PHX >
   <node label=TUS id=TUS >
   <node label=CIB id=CIB >
   <arc  from=LAX to=LVG label='SW117'>
   <arc  from=LVG to=PHX label='SW711'>
   <arc  from=PHX to=LAX label='AA218'>
   <arc  from=PHX to=TUS label='AW229'>
   <arc  from=TUS to=PHX label='AW225'>
   </graph>

The formal declarations of the <graph> , <node> and <arc> elements are as follows.

<!-- 21.1:  Graphs                                            -->
<!ELEMENT graph         - -  ((node+, arc*) | (arc*, node+))    >
<!ATTLIST graph              %a.global;
          label              CDATA               #IMPLIED
          order              NUMBER              #IMPLIED
          size               NUMBER              #IMPLIED
          type               CDATA               #IMPLIED       >
<!ELEMENT node          - O  EMPTY                              >
<!ATTLIST node               %a.global;
          label              CDATA               #IMPLIED
          label2             CDATA               #IMPLIED
          value              IDREF               #IMPLIED
          outDegree          NUMBER              #IMPLIED
          adjTo              IDREFS              #IMPLIED
          adjFrom            IDREFS              #IMPLIED
          inDegree           NUMBER              #IMPLIED
          type               CDATA               #IMPLIED
          adj                IDREFS              #IMPLIED
          degree             NUMBER              #IMPLIED       >
<!ELEMENT arc           - O  EMPTY                              >
<!ATTLIST arc                %a.global;
          label              CDATA               #IMPLIED
          to                 IDREF               #REQUIRED
          label2             CDATA               #IMPLIED
          from               IDREF               #REQUIRED      >
<!-- This fragment is used in sec. 21                         -->

21.1.1 Transition Networks

For encoding transition networks and other kinds of directed graphs in which distinctions among types of nodes must be made, the type attribute is provided for <node> elements. In the following example, the initial and final nodes (or states) of the network are distinguished. It can be understood as accepting the set of strings obtained by traversing it from its initial node to its final node, and concatenating the labels.

             OLD      MAN      COMES
            +->-+   +-->----o---->---+
            |   |   |                |
            \   /   |                |
        THE  \ /    |                |
(8)   o-->----o-----+                o
                    |                |
                    |                |
                    |                |
                    +-->----o---->---+
                      MEN       COME

<graph type='transition network'
       id=ss8
       label='(8)'
       order=5
       size=6>
   <node id=q0 inDegree=0 outDegree=1 type=initial>
   <node id=q1 inDegree=2 outDegree=3>
   <node id=q2 inDegree=1 outDegree=1>
   <node id=q3 inDegree=1 outDegree=1>
   <node id=q4 inDegree=2 outDegree=0 type=final>
   <arc  from=q0 to=q1 label=THE>
   <arc  from=q1 to=q1 label=OLD>
   <arc  from=q1 to=q2 label=MAN>
   <arc  from=q1 to=q3 label=MEN>
   <arc  from=q2 to=q4 label=COMES>
   <arc  from=q3 to=q4 label=COME>
   </graph>

A finite state transducer has two labels on each arc, and can be thought of as representing a mapping from one sequence of labels to the other. The following example represents a transducer for translating the English strings accepted by the network in the preceding example into French. The nodes have been annotated with numbers, for convenience.

         THE         MAN      COMES
      +--->---1------->-----4--->---+
      |   L'  |     HOMME     VIENT |
      |       |                     |
      |    OLD|VIEIL                |
      |       |                     |
      | THE   |                     |
(8)   0-->----2                     6
      |  LE  / \                    |
      |     /   \                   |
      |    /     \                  |
      |    | OLD |                  |
      |    +-->--+                  |
      |     VIEIL                   |
      |  THE        MEN       COME  |
      +--->---3------>----5---->----+
         LES / \   HOMMES   VIENNENT
            /   \
           /     \
           | OLD |
           +-->--+
            VIEUX

<graph type='transducer'
       order=7
       size=10>
   <node id=t0 label=0 inDegree=0 outDegree=3 type=initial>
   <node id=t1 label=1 inDegree=2 outDegree=1>
   <node id=t2 label=2 inDegree=2 outDegree=2>
   <node id=t3 label=3 inDegree=2 outDegree=2>
   <node id=t4 label=4 inDegree=1 outDegree=1>
   <node id=t5 label=5 inDegree=1 outDegree=1>
   <node id=t6 label=6 inDegree=2 outDegree=0 type=final>
   <arc  from=t0 to=t1 label=THE label2="L'">
   <arc  from=t0 to=t2 label=THE label2=LE>
   <arc  from=t0 to=t3 label=THE label2=LES>
   <arc  from=t1 to=t4 label=MAN label2=HOMME>
   <arc  from=t2 to=t1 label=OLD label2=VIEIL>
   <arc  from=t2 to=t2 label=OLD label2=VIEIL>
   <arc  from=t3 to=t3 label=OLD label2=VIEUX>
   <arc  from=t3 to=t5 label=MEN label2=HOMMES>
   <arc  from=t4 to=t6 label=COMES label2=VIENT>
   <arc  from=t5 to=t6 label=COME label2=VIENNENT>
   </graph>

21.1.2 Family Trees

The next example provides an encoding a portion of a `family tree', in which nodes are used to represent individuals, and parents of individuals, and arcs are used to represent common parentage and descent links. Let us suppose, further, that information about individuals is contained in feature structures, which are contained in feature-structure libraries elsewhere in the document (see 16.3 ). We can use the value attribute on <node> elements to point to those feature structures. Assume that, in some particular representation of the graph, nodes representing females are framed by circles, nodes representing males are framed by boxes, and nodes representing parents are framed by diamonds.

                            Mo     Fa
                  Katherine->--K+A--<-Amberley
                                |
                        So      |    Da
                   +----<-------+----->----+
                   |            | So       |
                Bertrand        +->-+    Rachel
                   |                |
                   |              Frank
         Mo     Fa | Fa     Mo
   Peter->--P+B--<-+->--D+B--<-Dora
             |           |
          So |        Da | So
         +-<-+     +--<--+-->--+
         |         |           |
       Conrad     Kate        John

<graph type='family tree' order=13 size=12>
  <node id=KathR  label='Katherine'
        value=kr1 inDegree=0 outDegree=1>
  <node id=AmbeR  label='Amberley'
        value=ar1 inDegree=0 outDegree=1>
  <node id=KAR  label='K+A'
        inDegree=2 outDegree=3>
  <node id=BertR  label='Bertrand'
        value=br1 inDegree=1 outDegree=2>
  <node id=PeteR  label='Peter'
        value=pr1 inDegree=0 outDegree=1>
  <node id=DoraR  label='Dora'
        value=dr1 inDegree=0 outDegree=1>
  <node id=PBR  label='P+B'
        inDegree=2 outDegree=1>
  <node id=DBR  label='D+B'
        inDegree=2 outDegree=2>
  <node id=FranR  label='Frank'
        value=fr1 inDegree=1 outDegree=0>
  <node id=RachR  label='Rachel'
        value=rr1 inDegree=1 outDegree=0>
  <node id=ConrR  label='Conrad'
        value=cr1 inDegree=1 outDegree=0>
  <node id=KateR  label='Kate'
        value=kr2 inDegree=1 outDegree=0>
  <node id=JohnR  label='John'
        value=jr1 inDegree=1 outDegree=0>
  <arc label='Mo' from=KathR to=KAR>
  <arc label='Fa' from=AmbeR to=KAR>
  <arc label='So' from=KAR to=BertR>
  <arc label='So' from=KAR to=FranR>
  <arc label='Da' from=KAR to=RachR>
  <arc label='Mo' from=PeteR to=PBR>
  <arc label='Fa' from=BertR to=PBR>
  <arc label='So' from=PBR to=ConrR>
  <arc label='Mo' from=DoraR to=DBR>
  <arc label='Fa' from=BertR to=DBR>
  <arc label='Da' from=DBR to=KateR>
  <arc label='So' from=DBR to=JohnR>
  </graph>

21.1.3 Historical Interpretation

For our final example, we represent graphically the relationships among various geographic areas mentioned in a seventeenth-century Scottish document. The document itself is a `sasine', which records a grant of land from the earl of Argyll to one Donald McNeill, and reads in part as follows (abbreviations have been expanded silently, and ``[...]'' marks illegible passages): ``

Item instrument of Sasine given the said Hector Mcneil confirmed and dated 28 May 1632 [...] at Edinburgh upon the 15 June 1632

Item ane charter granted by Archibald late earl of Argyle and Donald McNeill of Gallachalzie wh makes mention that ... the said late Earl yields and grants to the said Donald MacNeill ...

All and hail the two merk land of old extent of Gallachalzie with the pertinents by and in the lordship of Knapdale within the sherrifdome of Argyll

[description of other lands granted follows ...]

This Charter is dated at Inverary the 15th May 1669

In this example, we are concerned with the land and pertinents (i.e. accompanying sources of revenue) described as ``the two merk land of old extent of Gallachalzie with the pertinents by and in the lordship of Knapdale within the sherrifdom of Argyll''.

The passage concerns the following pieces of land:

the Earl of Argyll's land (i.e. the lands granted by this clause of the sasine)
two mark of land in Gallachalzie
the pertinents for this land
the Lordship of Knapdale
the sherrifdom of Argyll

We will represent these geographic entities as nodes in a graph. Arcs in the graph will represent the following relationships among them:

containment (INCLUDE)
location within (IN)
contiguity (BY)
constituency (PART OF)

Note that these relationships are logically related: ``include'' and ``in'', for example, are inverses of each other: the Earl of Argyll's land includes the parcel in Gallachalzie, and the parcel is therefore in the Earl of Argyll's land. Given an explicit set of inference rules, an appropriate application could use the graph we are constructing to infer the logical consequences of the relationships we identify.

Let us assume that feature-structure analyses are available which describe Gallachalzie, Knapdale, and Argyll. We will link to those feature structures using the value attribute on the nodes representing those places. However, there may be some uncertainty as to which noun phrase is modified by the phrase ``within the sheriffdome of Argyll'': perhaps the entire lands (land and pertinents) are in Argyll, perhaps just the pertinents are, or perhaps only Knapdale is (together with the portion of the pertinents which is in Knapdale). We will represent all three of these interpretations in the graph; they are, however, mutually exclusive, which we represent using the excl attribute defined in chapter 15 . [ see note 115 ]

We represent the graph and its encoding as follows, where the dotted lines in the graph indicate the mutually exclusive arcs; in the encoding, we use the exclude attribute to indicate those arcs.

              Earl of Argyll's land
            (land described in sasine)
                      | | |
      ,-<-INCLUDE--<--' + '->-INCLUDE---> Pertinents
      |                 :                  :  |   |
 Gallachalzie          IN  ...<..IN..<.....:  |   |
                        :  :                  |   |
                        :  :             INCLUDE INCLUDE
                        :  :              |           |
                        :  :   (part of pertinents)  (part)
                        :  :              |           |
                        :  :              BY         PART OF
                        :  :              |           |
                        :  :          Lordship of Knapdale
                        :  :                   :
                        :  :  ...<.IN..<.......:
                        :  :  :
                Sherrifdom of Argyll

The graph formalizes the following relationships:

the Earl of Argyll's land `includes' (the parcel of land in) Gallachalzie
the Earl of Argyll's land `includes' the pertinents of that parcel
the pertinents are (in part) `by' the Lordship of Knapdale
the pertinents are (in part) `part of' the Lordship of Knapdale
the Earl of Argyll's land, or the pertinents, or the Lordship of Knapdale, is `in' the Sherrifdom of Argyll

We encode the graph thus:

<graph type=directed order=7 size=9>
  <node id=earl label="Earl of Argyll's land">
  <node id=gall label="Gallachalzie" value=gallfs>
  <node id=pert label="Pertinents">
  <node id=per1 label="Pertinents part">
  <node id=per2 label="Pertinents part">
  <node id=knap label="Lordship of Knapdale" value=knapfs>
  <node id=argy label="Sherrifdome of Argyll" value=argyfs>
  <arc  id=earlgall label="INCLUDE" from=earl to=gall>
  <arc  id=earlargy label="IN" from=earl to=argy
        exclude="pertargy knapargy">
  <arc  id=earlpert label="INCLUDE" from=earl to=pert>
  <arc  id=pertper1 label="INCLUDE" from=pert to=per1>
  <arc  id=pertper2 label="INCLUDE" from=pert to=per2>
  <arc  id=pertargy label="IN" from=pert to=argy
        exclude="earlargy knapargy">
  <arc  id=per1knap label="BY" from=per1 to=knap>
  <arc  id=per2knap label="PART OF" from=per2 to=knap>
  <arc  id=knapargy label="IN" from=knap to=argy
        exclude="earlargy pertargy">
  </graph>

21.2 Trees

A tree is a connected acyclic graph. That is, it is possible in a tree graph to follow a path from any vertex to any other vertex, but there are no paths that lead from any vertex to itself. A rooted tree is a directed graph based on a tree; that is, the arcs in the graph correspond to the arcs of a tree such that there is exactly one node, called the root, for which there is a path from that node to all other nodes in the graph. For our purposes, we may ignore all trees except for rooted trees, and hence we shall use the <tree> element for rooted trees, and the <root> element for its root. The nodes adjacent to a given node are called its children, and the node adjacent from a given node is called its parent. Nodes with both a parent and children are called internal nodes, for which we use the <iNode> element. A node with no children is tagged as a <leaf> . If the children of a node are ordered from left to right, then we say that that node is ordered. If all the nodes of a tree are ordered, then we say that the tree is an ordered tree. If some of the nodes of a tree are ordered and others are not, then the tree is a partially ordered tree. The ordering of nodes and trees may be specified by an attribute; we take the default ordering for trees to be ordered, that roots inherit their ordering from the trees in which they occur, and internal nodes inherit their ordering from their parents. Finally, we permit a node to be specified as following other nodes, which (when its parent is ordered) it would be assumed to precede, giving rise to crossing arcs. The elements used for the encoding of trees have the following descriptions and attributes.

<tree> encodes a tree, which is made up of a root, internal nodes, leaves, and arcs from root to leaves. Attributes include:
- arity gives the maximum number of children of the root and internal nodes of the tree.
- ord indicates whether or not the tree is ordered, or if it is partially ordered. Legal values are:
  - Y indicates that all of the branching nodes of the tree are ordered.
  - N indicates that all of the branching nodes of the tree are unordered.
  - partial indicates that some of the branching nodes of the tree are ordered and some are unordered.
- order gives the order of the tree, i.e., the number of its nodes.
<root> represents the root node of a tree. Attributes include:
- label gives a label for a root node.
- value provides the value of the root, which is a feature structure or other analytic element.
- children provides a list of IDs of the elements which are the children of the root node.
- ord indicates whether or not the root is ordered. Legal values are:
  - Y indicates that the children of the root are ordered.
  - N indicates that the children of the root are unordered.
- outDegree gives the out degree of the root, the number of its children.
<iNode> represents an intermediate (or internal) node of a tree. Attributes include:
- label gives a label for an intermediate node.
- value provides the value of an intermediate node, which is a feature structure or other analytic element.
- children provides a list of IDs of the elements which are the children of the intermediate node.
- parent provides the ID of the element which is the parent of this node.
- ord indicates whether or not the internal node is ordered. Legal values are:
  - Y indicates that the children of the intermediate node are ordered.
  - N indicates that the children of the intermediate node are unordered.
- follow provides an ID of the element which this node follows.
- outDegree gives the out degree of an intermediate node, the number of its children.
<leaf> encodes the leaves (terminal nodes) of a tree. Attributes include:
- label gives a label for a leaf.
- value provides the value of a leaf, which is a feature structure or other analytic element.
- parent provides the ID of parent of a leaf.
- follow provides an ID of an element which this leaf follows.

Here is an example of a tree. It represents the order in which the operators of addition (symbolized by + ), exponentiation (symbolized by ** ) and division (symbolized by / ) are applied in evaluating the arithmetic formula ((a**2)+(b**2))/((a+b)**2) . In drawing the graph, the root is placed on the far right, and directionality is presumed to be to the left.

a--,
   |-**-,
2--'    |
        |-+--,
b--,    |    |
   |-**-'    |
2--'         |
             |-/
a--,         |
   |-+--,    |
b--'    |    |
        |-**-'
        |
2-------'

<tree n='ex1' arity=2 order=12>
   <root  label='/'  id=div1               children='plu1 exp1'>
   <iNode label='+'  id=plu1   parent=div1 children='exp2 exp3'>
   <iNode label='**' id=exp1   parent=div1 children='plu2 num2.3'>
   <iNode label='**' id=exp2   parent=plu1 children='vara1 num2.1'>
   <iNode label='**' id=exp3   parent=plu1 children='varb1 num2.2'>
   <iNode label='+'  id=plu2   parent=exp1 children='vara2 varb2'>
   <leaf  label='a'  id=vara1  parent=exp2>
   <leaf  label='2'  id=num2.1 parent=exp2>
   <leaf  label='b'  id=varb1  parent=exp3>
   <leaf  label='2'  id=num2.2 parent=exp3>
   <leaf  label='a'  id=vara2  parent=plu2>
   <leaf  label='b'  id=varb2  parent=plu2>
   <leaf  label='2'  id=num2.3 parent=exp1>
   </tree>

In this encoding, the arity attribute represents the arity of the tree, which is the greatest value of the outDegree attribute for any of the nodes in the tree. If, as in this case, arity=2 , we say that the tree is a binary tree.

Since the left-to-right (or top-to-bottom!) order of the children of the two + nodes does not affect the arithmetic result in this case, we could represent in this tree all of the arithmetically equivalent formulas involving its leaves, by specifying the attribute ord=N on those two <iNode> elements, the attribute ord=Y on the <root> and other <iNode> elements, and the attribute ord=partial on the <tree> element, as follows.

<tree n='ex2' ord=partial arity=2 order=13>
 <root  label='/'  id=div1   ord=Y             children='plu1 exp1'>
 <iNode label='+'  id=plu1   ord=N parent=div1 children='exp2 exp3'>
 <iNode label='**' id=exp1   ord=Y parent=div1 children='plu2 num2.3'>
 <iNode label='**' id=exp2   ord=Y parent=plu1 children='vara1 num2.1'>
 <iNode label='**' id=exp3   ord=Y parent=plu1 children='varb1 num2.2'>
 <iNode label='+'  id=plu2   ord=N parent=exp1 children='vara2 varb2'>
 <leaf  label='a'  id=vara1        parent=exp2>
 <leaf  label='2'  id=num2.1       parent=exp2>
 <leaf  label='b'  id=varb1        parent=exp3>
 <leaf  label='2'  id=num2.2       parent=exp3>
 <leaf  label='a'  id=vara2        parent=plu2>
 <leaf  label='b'  id=varb2        parent=plu2>
 <leaf  label='2'  id=num2.3       parent=exp1>
 </tree>

This encoding represents all of the following:

((a**2)+(b**2))/((a+b)**2)
((b**2)+(a**2))/((a+b)**2)
((a**2)+(b**2))/((b+a)**2)
((b**2)+(a**2))/((a+b)**2)

Linguistic phrase structure is very commonly represented by trees. Here is an example of phrase structure represented by an ordered tree with its root at the top, and a possible encoding.

      ,---PP---,
      |        |
      P    ,--NP--,
      |    |      |
     with Art     N
           |      |
          the periscope

<tree n='ex3' arity=2 order=8>
   <root  id=PP1               children='P1 NP1'     label='PP'>
   <iNode id=P1    parent=PP1  children=with1        label='P'>
   <leaf  id=with1 parent=P1                         label='with'>
   <iNode id=NP1   parent=PP1  children='the1 peri1' label='NP'>
   <iNode id=Art1  parent=NP1  children=the1         label='Art'>
   <leaf  id=the1  parent=Art1                       label='the'>
   <iNode id=N1    parent=NP1  children=peri1        label='N'>
   <leaf  id=peri1 parent=N1                         label='periscope'>
   </tree>

Finally, here is an example of an ordered tree, in which a particular node which ordinarily would precede another is specified as following it. In the drawing, the xxx symbol indicates that the arc from VB to PT crosses the arc from VP to PN.

       ,--VP--,
       |      |
   ,--VB-----xxx--,
   |          |   |
   VB         PN  PT
   |          |   |
  look       them up

<tree n='ex4' arity=2 order=8>
   <leaf  label='look' id=look1 parent=VB2>
   <leaf  label='them' id=them1 parent=PN1>
   <leaf  label='up'   id=up1   parent=PT1>
   <iNode label='VB'   id=VB2   parent=VB1 children=look1>
   <iNode label='PN'   id=PN1   parent=VP1 children=them1 >
   <iNode label='PT'   id=PT1   parent=VB1 children=up1 follow=PN1>
   <iNode label='VB'   id=VB1   parent=VP1 children='VB2 PT1'>
   <root  label='VP'   id=VP1              children='VB1 PN1'>
   </tree>

The formal declarations of the <tree> , <root> , <iNode> and <leaf> elements are as follows.

<!-- 21.2:  Trees (basic method)                              -->
<!ELEMENT tree          - -  ((leaf | iNode)*, root, (leaf | 
                             iNode)*)                           >
<!ATTLIST tree               %a.global;
          label              CDATA               #IMPLIED
          order              NUMBER              #IMPLIED
          ord                (Y | N | partial)   Y
          arity              NUMBER              #IMPLIED       >
<!ELEMENT root          - O  EMPTY                              >
<!ATTLIST root               %a.global;
          label              CDATA               #IMPLIED
          value              IDREF               #IMPLIED
          children           IDREFS              #IMPLIED
          outDegree          NUMBER              #IMPLIED
          ord                (Y | N)             #IMPLIED       >
<!ELEMENT iNode         - O  EMPTY                              >
<!ATTLIST iNode              %a.global;
          parent             IDREF               #IMPLIED
          label              CDATA               #IMPLIED
          follow             IDREF               #IMPLIED
          value              IDREF               #IMPLIED
          children           IDREFS              #REQUIRED
          outDegree          NUMBER              #IMPLIED
          ord                (Y | N)             #IMPLIED       >
<!ELEMENT leaf          - O  EMPTY                              >
<!ATTLIST leaf               %a.global;
          parent             IDREF               #IMPLIED
          label              CDATA               #IMPLIED
          follow             IDREF               #IMPLIED
          value              IDREF               #IMPLIED       >
<!-- This fragment is used in sec. 21                         -->

21.3 Another Tree Notation

In this section, we present an alternative to the method of representing the structure of ordered rooted trees that is given in section 21.2 , which is based on the observation that any node of such a tree can be thought of as the root of the subtree that it dominates. Thus subtrees can be thought of as the same type as the trees they are embedded in, hence the designation <eTree> , for embedding tree. Whereas in a <tree> , the relationship among the parts is indicated by the children attribute, and by the names of the elements <root> , <iNode> and <leaf> , the relationship among the parts of an <eTree> is indicated simply by the arrangement of their content. However, we have chosen to enable encoders to distinguish the terminal elements of an <eTree> by means of the empty <eLeaf> element, though its use is not required; the <eTree> element can also be used to identify the terminal nodes of <eTree> elements. We also provide a <triangle> element, which can be thought of as an underspecified <eTree> , that is an <eTree> in which certain information has been left out. In addition, we provide a <forest> element, which consists of one or more <tree> , <eTree> or <triangle> elements, and a <forestGrp> element, which consists of one or more <forest> elements. The elements used for the encoding of embedding trees and the units containing them have the following descriptions and attributes.

<eTree> provides an alternative to <tree> element for representing ordered rooted tree structures. Attributes include:
- label gives a label for an embedding tree.
- value provides the value of an embedding tree, which is a feature structure or other analytic element.
<triangle> provides for an underspecified <eTree> , that is, an <eTree> with information left out. Attributes include:
- label gives a label for an underspecified embedding tree.
- value provides the value of a triangle, which is the SGML identifier of a feature structure or other analytic element.
<eLeaf> provides explicitly for a leaf of an embedding tree, which may also be encoded with the <eTree> element. Attributes include:
- label gives a label for a leaf of an embedding tree.
- value provides the value of an embedding leaf, which is a feature structure or other analytic element.
<forest> provides for groups of rooted trees. Attributes include:
- type identifies the type of the forest.
<forestGrp> provides for groups of forests. Attributes include:
- type identifies the type of the forest group.

Like the <root> , <iNode> and <leaf> of a <tree> , the <eTree> , <triangle> and <eLeaf> elements may also have label and value attributes.

To illustrate the use of the <eTree> and <eLeaf> elements, here is an encoding of the second example in section 21.2 , repeated here for convenience.

      ,---PP---,
      |        |
      P    ,--NP--,
      |    |      |
     with Art     N
           |      |
          the periscope

<eTree n='ex1' label='PP'>
   <eTree label='P'><eLeaf label='with'></eTree>
   <eTree label='NP'>
      <eTree label='Art'><eLeaf label='the'></eTree>
      <eTree label='N'><eLeaf label='periscope'></eTree>
      </eTree>
   </eTree>

Next, we provide an encoding, using the <triangle> element, in which the internal structure of the <eTree> labeled NP is omitted.

      ,---PP----,
      |         |
      P        NP
      |       /  \
     with    /    \
            /______\
          the periscope

<eTree n='ex2' label='PP'>
   <eTree label='P'><eLeaf label='with'></eTree>
   <triangle label='NP'><eLeaf label='the periscope'></triangle>
   </eTree>

Ambiguity involving alternative tree structures associated with the same terminal sequence can be encoded relatively conveniently using a combination of the exclude and copyOf attributes described in sections 14.8 and 14.6 . In the simplest case, an <eTree> may be part of the content of exactly one of two different <eTree> elements. To mark it up, the embedded <eTree> may be fully specified within one of the embedding <eTree> elements to which it may belong, and a virtual copy, specified by the copyOf attribute, may appear on the other. In addition, each of the embedded elements in question is specified as excluding the other, using the exclude attribute. To illustrate, consider the English phrase `see the vessel with the periscope', which may be considered to be structurally ambiguous, depending on whether the phrase `with the periscope' is a modifier of the phrase `the vessel' or a modifier of the phrase `see the vessel'. This ambiguity is indicated in the sketch of the ambiguous tree by means of the dotted-line arcs. The markup using the copyOf and exclude attributes follows the sketch.

      ,--------VP....................
      |        |                    :
      V   ,----NP.......         ,--PP--,
      |   |    |       :         |      |
     see Art   N    ,--PP--,     P    ,-NP-,
          |    |    |      |     |    |    |
         the vessel P    ,-NP-, with Art   N
                    |    |    |       |    |
                   with Art   N      the periscope
                         |    |
                        the periscope

<eTree n='ex3' label='VP'>
   <eTree label='V'><eLeaf label='see'></eTree>
   <eTree label='NP'>
      <eTree label='Art'><eLeaf label='the'></eTree>
      <eTree label='N'><eLeaf label='vessel'></eTree>
      <eTree id=ppa exclude=ppb label='PP'>
         <eTree label='P'><eLeaf label='with'></eTree>
         <eTree label='NP'>
            <eTree label='Art'><eLeaf label='the'></eTree>
            <eTree label='N'><eLeaf label='periscope'></eTree>
            </eTree>
         </eTree>
      </eTree>
   <eTree id=ppb copyOf=ppa exclude=ppa label='PP'></eTree>
   </eTree>

To indicate that one of the alternatives is selected, one may specify the select attribute on the highest <eTree> as either select=ppa or select=ppb ; see section 14.8 .

Depending on the grammar one uses to associate structures with examples like `see the man with the periscope', the representations may be more complicated than this. For example, adopting a version of the X-bar theory of phrase structure originated by Jackendoff [ see note 116 ] , the attachment of a modifier may require the creation of an intermediate node which is not required when the attachment is not made, as shown in the following diagram. A possible encoding of this ambiguous structure immediately follows the diagram.

    ..............VP......................
    :                                    :
 ,--V'--,                          ,-----V'------,
 |      |                          |             |
 V    ,-NP--,                   ,--V'--,      ,--PP--,
 |    |     |                   |      |      |      |
see  Spec ,-N'--,               V    ,-NP-,   P    ,-NP-,
      |   |     |               |    |    |   |    |    |
     the  N'  ,-PP---,         see  Spec  N' with Spec  N'
          |   |      |               |    |        |    |
          N   P    ,-NP-,           the   N       the   N
          |   |    |    |                 |             |
      vessel with Spec  N'             vessel        periscope
                   |    |
                  the   N
                        |
                     periscope

<eTree n='ex4' label="VP">
   <eTree id=vbara exclude=vbarb label="V'">
      <eTree id=va label="V"><eLeaf label="see"></eTree>
      <eTree label="NP">
         <eTree id=spec1a label="Spec"><eLeaf label="the"></eTree>
         <eTree label="N'">
            <eTree id=nbar2a label="N'">
               <eTree label="N"><eLeaf label="vessel"></eTree>
               </eTree>
            <eTree id=ppa label="PP">
               <eTree label="P"><eLeaf label="with"></eTree>
               <eTree label="NP">
                  <eTree label="Spec"><eLeaf label="the"></eTree>
                  <eTree label="N'">
                     <eTree label="N"><eLeaf label="periscope"></eTree>
                     </eTree>
                  </eTree>
               </eTree>
            </eTree>
         </eTree>
      </eTree>
   <eTree id=vbara exclude=vbarb label="V'">
      <eTree label="V'">
         <eTree id=vb copyOf=va label="V"></eTree>
         <eTree label="NP">
            <eTree id=spec1b copyOf=spec1a label="Spec"></eTree>
            <eTree id=nbar2b copyOf=nbar2a label="N'"></eTree>
            </eTree>
         </eTree>
      <eTree id=ppb copyOf=ppa label="PP"></eTree>
      </eTree>
   </eTree>

A derivation in a generative grammar is often thought of as a set of trees. To encode such a derivation, one may use the <forest> element, in which the trees may be marked up using the <tree> , the <eTree> or the <triangle> element. The type attribute may be used to specify what kind of derivation it is. Here is an example of a two-tree forest, involving application of the `wh-movement' transformation in the derivation of `what you do' (as in `this is what you do') from the underlying `you do what' [ see note 117 ] .

     ,--S'--,                 ,--S'--,
     |      |                 |      |
    COMP ,--S--,             COMP ,--S--,
     |   |     |              |   |     |
     e   NP  ,-VP-,           NP  NP  ,-VP-,
         |   |    |           |   |   |    |
        you  V    NP        what you  V    NP
             |    |                   |    |
            do   what                do    t

<forest n='ex5' type='syntactic derivation'>
   <eTree n='Stage 1' id=s1sbar label="S'">
      <eTree id=s1comp label="COMP"><eleaf id=s1e label="e"></eTree>
      <eTree id=s1s label="S">
         <eTree id=s1np1 label="NP"><eLeaf label="you"></eTree>
         <eTree id=s1vp label="VP">
            <eTree id=s1v label="V"><eLeaf label="do"></eTree>
            <eTree id=s1np2 label="NP">
               <eLeaf id=s1wh label="what">
               </eTree>
            </eTree>
         </eTree>
      </eTree>
   <eTree n='Stage 2' id=s2sbar corresp=s1sbar label="S'">
      <eTree id=s2comp corresp=s1comp label="COMP">
         <eTree copyOf=s1np2 corresp=s1e label="NP"></eTree>
         </eTree>
      <eTree id=s2s corresp=s1s label="S">
         <eTree id=s2np1 copyOf=s1np1 label="NP"></eTree>
         <eTree id=s2vp corresp=s1vp label="VP">
            <eTree id=s2v copyOf=s1v label="V"></eTree>
            <eTree id=s2np2 corresp=s1np2 label="NP">
              <eLeaf corresp=s1wh label="t">
              </eTree>
            </eTree>
         </eTree>
      </eTree>
   </forest>

In this markup, we have used copyOf attributes to provide virtual copies of elements in the tree representing the second stage of the derivation that also occur in the first stage, and the corresp attribute (see section 14.4 ) to link those elements in the second stage with corresponding elements in the first stage that are not copies of them.

If a group of forests (e.g. a full grammatical derivation including syntactic, semantic and phonological subderivations) is to be articulated, the grouping element <forestGrp> may be used.

The formal declarations of the <eTree> , <triangle> , <eLeaf> , <forest> and <forestGrp> elements are as follows.

<!-- 21.3:  Trees (alternate method)                          -->
<!ELEMENT eTree         - -  ((eTree | triangle | eLeaf )*)     >
<!ATTLIST eTree              %a.global;
          label              CDATA               #IMPLIED
          value              IDREF               #IMPLIED       >
<!ELEMENT triangle      - -  ((eTree | triangle | eLeaf)*)      >
<!ATTLIST triangle           %a.global;
          label              CDATA               #IMPLIED
          value              IDREF               #IMPLIED       >
<!ELEMENT eLeaf         - O  EMPTY                              >
<!ATTLIST eLeaf              %a.global;
          label              CDATA               #IMPLIED
          value              IDREF               #IMPLIED       >
<!ELEMENT forest        - -  ((tree | eTree | triangle)+)       >
<!ATTLIST forest             %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT forestGrp     - -  ((forest)+)                        >
<!ATTLIST forestGrp          %a.global;
          type               CDATA               #IMPLIED       >
<!-- This fragment is used in sec. 21                         -->