Graph Description in the TEI


TEI SO W 07 Graph Description in the TEI

This is a look at how RDF(S) might be used to describe the graphs given as examples in the TEI Guidelines. (To keep the work to a minimum, I've assumed that if a solution can describe a graph it can also describe a tree, and so have not analysed the tree examples in detail.) My general conclusion is that almost all the TEI Guidelines examples can be described completely and accurately using RDF(S).

Essentially, RDF is a language for describing graphs. RDFS is RDF's vocabulary description language, and is an extension of RDF. It provides mechanisms for describing groups of related resources and the relationships between these resources. Together RDF and RDFS provide ways of describing metadata. RDF is a W3C recommendation, and RDFS is in last call to become one. A good place to start learning about RDF is the W3C's RDF Primer; another is Robin Cover's RDF page. Each of these pages has pointers to the actual specifications.

At first sight the TEI encoding has some apparent advantages: the ability to specify size and degree may make it easier to automate the drawing of a graph. However, the downside of this is that the TEI encoding can only describe closed networks, and of course the TEI mechanism cannot use the growing body of software designed to work with RDF(S). There is a convention for writing out RDF as XML, called the Striped Syntax, which I've used in writing out the examples below. Earlier versions of the striped syntax could not describe all known graphs, though this shortcoming appears to be fixed in the current version.
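The size and degree values that the TEI encoding records can be recomputed from the node and arc lists, so an encoding that omits them need not lose information. The following is a minimal sketch in Python, assuming the nodes and arcs of the airline routes example given later in this document. The helper name graph_summary is hypothetical and is not part of any TEI or RDF tool.

```python
from collections import Counter

# Nodes and arcs copied from the airline routes example in the TEI Guidelines.
NODES = ["LAX", "LVG", "PHX", "TUS", "CIB"]
ARCS = [("LAX", "LVG"), ("LVG", "PHX"), ("PHX", "LAX"),
        ("PHX", "TUS"), ("TUS", "PHX")]


def graph_summary(nodes, arcs):
    """Derive order, size and per-node degrees from the node and arc lists."""
    out_degree = Counter(source for source, _ in arcs)  # arcs leaving a node
    in_degree = Counter(target for _, target in arcs)   # arcs arriving at a node
    return {
        "order": len(nodes),  # number of nodes
        "size": len(arcs),    # number of arcs
        # Counter returns 0 for nodes that never appear, such as CIB.
        "inDegree": {n: in_degree[n] for n in nodes},
        "outDegree": {n: out_degree[n] for n in nodes},
    }


summary = graph_summary(NODES, ARCS)
print(summary["order"], summary["size"])
print(summary["inDegree"], summary["outDegree"])
```

The isolated node CIB has to be listed explicitly, since it takes part in no arc.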

In the previous version of this document, I suggested it might also be worth considering OWL, the new Web Ontology Language that is the successor to DAML+OIL, for this work. Having thought some more about it, I think this is not necessary, since we are really only dealing with instance data here, not class data.

This really limits the options for consideration to the following:
  • Recommend the use of RDF(S) serialized as XML
  • Recommend the use of RDF(S) in some other format
  • And of course, recommend that the TEI stick with the current solution

The second option seems unlikely to appeal, given that the TEI is already an application of XML. Given the increasing use of RDF for describing graphs in the web domain, I suggest the first option be given serious consideration.

TEI | Description | RDF
node (element) | encodes a node, a possibly labeled point in a graph |
label | gives a label for a node | rdf:Description/@rdf:about
label2 | gives a second label for a node | rdfs:label
value | provides the value of a node, which is a feature structure or other analytic element | URI
type | provides a type for a node | rdf:type
adjFrom | gives the identifiers of the nodes which are adjacent from the current node | Will be the object of an RDF triple
adjTo | gives the identifiers of the nodes which are adjacent to the current node | Will be the subject of an RDF triple
adj | gives the identifiers of the nodes which are both adjacent to and adjacent from the current node | The same result can be achieved using daml:inverseOf
inDegree | gives the in degree of the node, the number of nodes which are adjacent from the given node |
outDegree | gives the out degree of the node, the number of nodes which are adjacent to the given node |
degree | gives the degree of the node, the number of arcs with which the node is incident |
arc (element) | encodes an arc, the connection from one node to another in a graph |
label | gives a label for an arc | rdf:Property
label2 | gives a second label for an arc | rdf:Property
from | gives the identifier of the node which is adjacent from this arc | The subject of an RDF triple
to | gives the identifier of the node which is adjacent to this arc | The object of an RDF triple
Hopefully this will be clearer if we encode some of the examples from the TEI Guidelines in RDF.
Here is the example from P4:2002-03, page 520:
<graph type='directed'
       id='RDG1'
       label='Selected Airline Routes in Southwestern USA'
       order='5'
       size='5'>
   <node label='LAX' id='LAX' inDegree='1' outDegree='1'/>
   <node label='LVG' id='LVG' inDegree='1' outDegree='1'/>
   <node label='PHX' id='PHX' inDegree='2' outDegree='2'/>
   <node label='TUS' id='TUS' inDegree='1' outDegree='1'/>
   <node label='CIB' id='CIB' inDegree='0' outDegree='0'/>
   <arc  from='LAX' to='LVG'/>
   <arc  from='LVG' to='PHX'/>
   <arc  from='PHX' to='LAX'/>
   <arc  from='PHX' to='TUS'/>
   <arc  from='TUS' to='PHX'/>
   </graph>
In RDF this can be written as follows. The nodes are the rdf:Description elements and the arcs are the tei:route elements.
<?xml version="1.0"?>
<!-- Selected Airline Routes in Southwestern USA -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:tei="http://www.tei-c.org/P4X/GD.html">
  <rdf:Description rdf:about="tei:#LAX">
    <tei:route>
      <rdf:Description rdf:about="tei:#LVG">
        <tei:route>
          <rdf:Description rdf:about="tei:#PHX">
            <tei:route>
              <rdf:Description rdf:about="tei:#TUS">
                <tei:route rdf:resource="tei:#PHX"/>
              </rdf:Description>
            </tei:route> 
            <tei:route>
              <rdf:Description rdf:about="tei:#LAX"/>
            </tei:route>
          </rdf:Description>
        </tei:route>
      </rdf:Description>
    </tei:route>
  </rdf:Description>
  <rdf:Description rdf:about="tei:#CIB"/>
</rdf:RDF>
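The correspondence between the TEI markup and RDF triples can also be sketched in code. The following is a minimal illustration in Python using only the standard library, assuming each arc becomes one (subject, predicate, object) triple. The function name tei_graph_to_triples and the default predicate name are hypothetical choices made for this sketch.

```python
import xml.etree.ElementTree as ET

# The airline routes example from the TEI Guidelines, as quoted in this document.
TEI_GRAPH = """
<graph type='directed' id='RDG1'
       label='Selected Airline Routes in Southwestern USA'
       order='5' size='5'>
  <node label='LAX' id='LAX' inDegree='1' outDegree='1'/>
  <node label='LVG' id='LVG' inDegree='1' outDegree='1'/>
  <node label='PHX' id='PHX' inDegree='2' outDegree='2'/>
  <node label='TUS' id='TUS' inDegree='1' outDegree='1'/>
  <node label='CIB' id='CIB' inDegree='0' outDegree='0'/>
  <arc from='LAX' to='LVG'/>
  <arc from='LVG' to='PHX'/>
  <arc from='PHX' to='LAX'/>
  <arc from='PHX' to='TUS'/>
  <arc from='TUS' to='PHX'/>
</graph>
"""


def tei_graph_to_triples(xml_text, predicate="route"):
    """Return one triple per <arc>, plus the ids of nodes that no arc touches."""
    graph = ET.fromstring(xml_text)
    # 'from' is the subject of the triple and 'to' is the object.
    triples = [(arc.get("from"), predicate, arc.get("to"))
               for arc in graph.findall("arc")]
    connected = {name for s, _, o in triples for name in (s, o)}
    isolated = [node.get("id") for node in graph.findall("node")
                if node.get("id") not in connected]
    return triples, isolated


triples, isolated = tei_graph_to_triples(TEI_GRAPH)
for triple in triples:
    print(triple)
print("isolated:", isolated)
```

Isolated nodes are returned separately because they appear in no triple. In the RDF/XML they are written as an empty rdf:Description.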
In RDF we have to give the arc some sort of label; for this example I've arbitrarily chosen 'route', since this is what the arcs in the graph represent. In a later example, the routes are labeled with flight numbers, and the RDF representation becomes something like:
<?xml version="1.0"?>
<!-- Selected Airline Routes in Southwestern USA -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:tei="http://www.tei-c.org/P4X/GD.html">
  <rdf:Description rdf:about="tei:#LAX">
    <tei:SW117>
      <rdf:Description rdf:about="tei:#LVG">
        <tei:SW711>
          <rdf:Description rdf:about="tei:#PHX">
            <tei:AA225>
              <rdf:Description rdf:about="tei:#TUS"/>
            </tei:AA225> 
            <tei:AA229>
              <rdf:Description rdf:about="tei:#LAX"/>
            </tei:AA229>
          </rdf:Description>
        </tei:SW711>
      </rdf:Description>
    </tei:SW117>
  </rdf:Description>
  <rdf:Description rdf:about="tei:#CIB"/>
</rdf:RDF>
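It can help to see the flight-number triples written out flat, one per line, in the N-Triples style. The sketch below is in Python and assumes the four labeled arcs and the namespace declaration shown in the RDF above. The function name to_ntriples is hypothetical. In RDF/XML a property URI is formed by concatenating the namespace URI and the element's local name, so with the namespace exactly as declared there is no separator before the flight number. A namespace ending in '#' or '/' is the more usual choice.

```python
# Namespace URI exactly as declared for the tei prefix in the example.
TEI_NS = "http://www.tei-c.org/P4X/GD.html"

# (source node, flight number, target node), read off the RDF/XML example.
FLIGHTS = [
    ("LAX", "SW117", "LVG"),
    ("LVG", "SW711", "PHX"),
    ("PHX", "AA225", "TUS"),
    ("PHX", "AA229", "LAX"),
]


def to_ntriples(arcs, namespace=TEI_NS):
    """Write each arc as one N-Triples style line: <subject> <predicate> <object> ."""
    lines = []
    for source, flight, target in arcs:
        # The predicate URI is the namespace URI followed directly by the local name.
        lines.append(f"<tei:#{source}> <{namespace}{flight}> <tei:#{target}> .")
    return "\n".join(lines)


print(to_ntriples(FLIGHTS))
```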
The more complex transition node example simply involves labeling the nodes with types, e.g.
  <node id='T0' label='0' inDegree='0' outDegree='3' type='initial'/> 
becomes
  <rdf:Description rdf:about='T0'>
    <rdf:type rdf:resource='initial'/>
    <rdfs:label>0</rdfs:label>
  </rdf:Description>
There follow some more complex examples, all of which can be represented in RDF with a little thought. For example, the transducer graph has two labels on each arc:
  <arc from='t2' to='t1' label='OLD' label2='VIEIL'/>
In RDF this can be written out as:
  <rdf:Description rdf:about="t1">
    <tei:from>
      <label xml:lang="fr">VIEIL</label>
      <label xml:lang="en">OLD</label>
      <rdf:Description rdf:about="t2"/>
    </tei:from>
  </rdf:Description>
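Attaching two labels to one arc requires the arc itself to be something that can be described. One standard RDF mechanism for this is reification, which uses rdf:Statement with rdf:subject, rdf:predicate and rdf:object. The sketch below is in Python and offers one possible reading of the example, not necessarily the encoding intended above. The helper labelled_arc and the identifier _:arc1 are hypothetical, and plain tuples stand in for an RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def labelled_arc(arc_id, source, target, labels, predicate="tei:from"):
    """Describe one arc as a reified statement carrying one label per language."""
    triples = [
        (arc_id, RDF + "type", RDF + "Statement"),
        (arc_id, RDF + "subject", source),
        (arc_id, RDF + "predicate", predicate),
        (arc_id, RDF + "object", target),
    ]
    for lang, text in labels.items():
        # A (text, language) pair stands in for a language-tagged literal.
        triples.append((arc_id, RDFS + "label", (text, lang)))
    return triples


# The transducer arc: t1 is linked to t2, labeled OLD (English) and VIEIL (French).
arc = labelled_arc("_:arc1", "t1", "t2", {"en": "OLD", "fr": "VIEIL"})
for triple in arc:
    print(triple)
```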

Last recorded change to this page: 2007-09-16  •  For corrections or updates, contact webmaster AT tei-c DOT org