<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "tei-oucs.dtd"
[<!ENTITY eacute	"&#x00E9;"> <!-- LATIN SMALL LETTER E WITH ACUTE -->
<!ENTITY Eacute	"&#x00C9;"> <!-- LATIN CAPITAL LETTER E WITH ACUTE -->
<!ENTITY ccedil	"&#x00E7;"> <!-- LATIN SMALL LETTER C WITH CEDILLA -->
]>
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>XML and TEI in Practice</title>
      </titleStmt>
      <publicationStmt>
        <p> </p>
      </publicationStmt>
      <sourceDesc>
        <p></p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <list type="unordered">
        <item><date>July 2001</date> first draft for HCU summer seminar</item>
      </list>
    </revisionDesc>
  </teiHeader>
  <text>
<front>
<titlePage>
<docTitle>
 <titlePart type="main">XML and TEI in Practice</titlePart>
</docTitle>
<docAuthor>Lou Burnard</docAuthor>
<docDate>July 2001</docDate>
</titlePage>
</front>
<body>
<div rend="slide">
<head>What are people doing with XML and the TEI?</head>
<p>
<list type="ordered">
<item>Text collections and digital libraries</item>
<item>Digital archives of primary source materials</item>
<item>Critical (and uncritical) editions</item>
<item>Language analysis and representation</item>
</list>
</p>
</div>

<div rend="slide">
<head>Favourite kinds of material</head>
<list type="unordered">
<item>The stuff...
<list type="bulleted">
<item>Transcripts of original source documents, variously encoded</item>
<item>Page images</item>
<item>Associated metadata, sometimes in a database</item>
</list>
     </item>
<item>
... and its organization.
<list type="bulleted"><item>by author (the collected works of...)</item>
<item>by topic (readings in ...)</item>
<item>by association (an archive of ...)</item>
<item>by text (a digital edition/archive of ...)</item>
      </list></item>
    </list>
   </div>

<div rend="slide">
<head>Favourite storage and access methods</head>
<p>Storage:
<list type="bulleted">
<item>Many separate documents or fragments</item>
<item>Virtual documents from a specialised repository</item>
      </list>
</p>
<p>Access:
<list type="unordered">
<item>primarily: web readability</item>
<item>often: finding aids using sophisticated metadata</item>
<item>occasionally: text analytic methods</item>
</list></p>
<p>There seems to be scope for R &amp; D here...</p>
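<p>One low-tech way to build a virtual document from separately stored fragments is to use XML external entities. In this minimal sketch the DTD name, file names and entity names are all hypothetical, the header is elided, and each fragment file is assumed to hold a single &lt;text&gt; element:</p>
<p><eg><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE TEI.2 SYSTEM "project.dtd" [
  <!-- hypothetical fragment files, one letter per file -->
  <!ENTITY letter001 SYSTEM "letters/letter001.xml">
  <!ENTITY letter002 SYSTEM "letters/letter002.xml">
]>
<TEI.2>
  <teiHeader><!-- metadata for the assembled collection --></teiHeader>
  <text>
    <group>
      <!-- the parser splices each fragment file in at this point -->
      &letter001;
      &letter002;
    </group>
  </text>
</TEI.2>
]]></eg></p>
<p>Here the assembly is fixed when the document is parsed; a specialised repository does the same job on demand.</p>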
   </div>

<div rend="slide">
<head>Favourite delivery methods </head>
<list type="bulleted">
<item>direct delivery of XML is still rare, but increasing</item>
<item>specialised XML delivery tools (e.g. Dynaweb) are still widely used</item>
<item>hand-crafted text retrieval tools are not uncommon</item>
<item>on-the-fly conversion to HTML is not uncommon</item>
<item>one-off conversion to HTML is frequent</item>
<item>in some places the XML may even be inaccessible!</item>
<item>software that <soCalled>dumbs down</soCalled> XML to simpler formats (e.g. eBooks) may become more important</item>
      </list>
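<p>Conversion to HTML, whether one-off or on the fly, is commonly done with XSLT. A minimal XSLT 1.0 sketch, handling only the list, item and xref elements used in these slides (gloss lists are not treated specially):</p>
<p><eg><![CDATA[
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- ordered lists become <ol>; the more specific match wins -->
  <xsl:template match="list[@type='ordered']">
    <ol><xsl:apply-templates/></ol>
  </xsl:template>

  <!-- every other list becomes <ul> -->
  <xsl:template match="list">
    <ul><xsl:apply-templates/></ul>
  </xsl:template>

  <xsl:template match="item">
    <li><xsl:apply-templates/></li>
  </xsl:template>

  <!-- links: the url attribute becomes an HTML href -->
  <xsl:template match="xref">
    <a href="{@url}"><xsl:apply-templates/></a>
  </xsl:template>
</xsl:stylesheet>
]]></eg></p>
<p>The same stylesheet can be run once in batch (one-off conversion) or by a web server on each request (on-the-fly conversion).</p>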
   </div>


<div rend="slide">
<head>Digital archive examples</head>
<p>
<list>
<item><xref url="http://www.law.uc.edu/CETL/">Center for Electronic Text in
      the Law (University of Cincinnati, Law faculty)</xref></item>
<item><xref url="http://www.euromusicology.org/">Thesaurus musicarum
       italicarum (Leiden University, Informatics)</xref>
</item>
<item><xref url="http://www.indiana.edu/~letrs/vwwp/index.html">Victorian
       Women Writers Project (Indiana University Library)</xref></item>
<item><xref url="http://www.bodley.ox.ac.uk/toyota">Toyota City Imaging project (Bodleian Library, Oxford)</xref>
     </item>
    </list>
</p>
   </div>
<div rend="slide">
<head>Digital edition examples</head>
<list type="unordered">
<item><xref
       url="http://jefferson.village.virginia.edu/piers/archive.goals.html">Piers
       Plowman Archive (IATH, University of Virginia)</xref>
</item>
<item><xref
       url="http://www.mshs.univ-poitiers.fr/cescm/lancelot/index.html">La
       Charrette project (Princeton, Poitiers)</xref>
</item>
<item><xref url="http://ibsentexts.hit.uib.no/">Henrik Ibsen project
       (Oslo, Trondheim, Bergen)</xref></item>
    </list>
   </div>

<div rend="slide">
<head>Language corpus examples</head>
<list type="unordered">
<item><xref url="http://nl.ijs.si/ME">Multext East (Slovenian Academy
       et al)</xref>
</item>
<item><xref url="http://info.ox.ac.uk/bnc">British National Corpus</xref>
</item>
<item><xref url="http://www.loria.fr/Projet/Silfide">Silfide (Serveur
       Interactif pour la Langue Fran&ccedil;aise, son
       Identit&eacute;, sa Diffusion, son &Eacute;tude)</xref></item>
    </list>
<p rend="display">etc... see <xref url="http://www.tei-c.org/Applications">the TEI
       Applications pages</xref></p>
   </div>

<div rend="slide">
<head>FAQs</head>
<list type="unordered">
<item>is this text or is it data?</item>
<item>which parts of this should be preserved and how?</item>
<item>IPR: who owns this?</item>
<item>accessibility: who will use this and for what?</item>
<item>accountability: are we doing the Right Thing?</item>
<item>etc.</item>
</list>
<p>... all of these have an effect on the technical solutions chosen.</p>
   </div>

<div rend="slide">
<head>Text and data</head>
<p>We love oppositions!
<list type="unordered">
<item>structured vs unstructured</item>
<item>metadata vs content</item>
<item>interpretation vs transcription</item>
    </list>
    </p>
<p>But XML/TEI facilitates convergence:
<list type="bulleted">
      <item>text can be treated as data</item>
      <item>data can be treated as text</item>
      <item>all kinds of digital object can be integrated</item>
     </list></p></div>

<div rend="slide">
<head>Techniques for convergence</head>
<list type="unordered">
<item>resource <hi>management</hi> (whether centralised or distributed) is crucial</item>
<item>establish project-specific Guidelines and document them</item>
<item>establish conventions for naming and identification of documents and
      document fragments</item>
<item>establish which content will be subject to <soCalled>authority
      control</soCalled> and how</item>
<item>use the right tools for the job</item>
    </list>
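<p>For example, a project might derive every identifier from a fixed pattern, so that any fragment can be cited unambiguously. The scheme below (collection, document, paragraph) is purely hypothetical:</p>
<p><eg><![CDATA[
<!-- hypothetical naming scheme: collection.document.paragraph -->
<div1 type="letter" id="vwwp.l017">
  <head>...</head>
  <p id="vwwp.l017.p1">...</p>
  <p id="vwwp.l017.p2">...</p>
</div1>

<!-- any other part of the same document can then point at a fragment -->
<note>Compare <ptr target="vwwp.l017.p2"/>.</note>
]]></eg></p>
<p>Because id values are declared as IDs, a validating parser checks such pointers, but only within a single document; links between documents depend on the agreed convention alone.</p>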
   </div>

<div rend="slide">
<head>Authority control</head>
<list type="unordered">
<item>Not just about establishing preferred vocabulary</item>
<item>Also a means of multiplying access points </item>
    </list>
<p><eg><![CDATA[
<person id="p123">
  <name type="preferred">Alonso the Magnificent</name>
  <name type="other">Alonso de Cabesa de Vaca</name>
  <birth><date value="15891102">St Brigita's Day, 
     1589</date><placeName>Sevilla</placeName>
  </birth>
  <occupation>Tyrant</occupation>
  <figure entity="p123pic"/>
<!-- etc etc -->
</person>
]]>
</eg></p>
<p><eg><![CDATA[
<p>.... and was owned by 
<name role="owner" key="p123">Alonso the
Magnificent</name> ... </p>
]]></eg></p>
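<p>The payoff comes at processing time. A minimal XSLT 1.0 sketch, assuming the person records and the running text are in the same document: each name is looked up by its key attribute, so a variant form in the text can be displayed or indexed under the preferred form.</p>
<p><eg><![CDATA[
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- index the authority records by their id attribute -->
  <xsl:key name="person" match="person" use="@id"/>

  <!-- keep the name as transcribed, but link it to the authority
       record and show the preferred form as a tooltip -->
  <xsl:template match="name[@key]">
    <a href="#{@key}"
       title="{key('person', @key)/name[@type='preferred']}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
]]></eg></p>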

   </div>

<div rend="slide">
<head>Digital Preservation</head>
<list type="unordered">
<item>Scholarship implies a continuity of comprehension
<list type="bulleted">
<item>it isn't enough to preserve the data</item>
<item>we must also preserve its meaning</item>
      </list></item>
<item>XML/TEI encoding makes meaning explicit and independent of 
<list type="bulleted"><item>software</item><item>hardware</item><item>usage</item>
      </list></item>
 <item>... within limits</item>
 <item>Other possible strategies include
<list type="bulleted"><item>emulation</item>
       <item>accumulation</item>
       <item>cryogenics</item>
      </list></item></list>
   </div>
<div rend="slide">
<head>Text analysis: the next frontier</head>
<list type="bulleted">
<item>Once we have made our digital surrogates, what then?</item>
<item>Traditional activities:
      <list type="gloss">
       <label>data discovery</label><item>usually searching by external criteria</item>
       <label>data analysis</label><item>usually searching by internal characteristics</item>
       <label>data synthesis</label><item>usually by associating
       shared judgments</item>
      </list>
</item>
<item>What tools will help combine these approaches?</item>
    </list>
   </div>

<div rend="slide">
<head>Three examples of TEI application software</head>
<list>
<item>TEI web site</item>
<item>SARA: a corpus analysis tool</item>
<item>Phelix: an XML database system</item>
    </list>
   </div>
<div rend="slide">
<head>TEI web site</head>
<p>The challenge: applying the TEI scheme to the authoring, management
     and maintenance of large documentary websites
</p>
<p>Key features:
<list type="bulleted">
<item>a suitable DTD for authoring</item>
<item>tools for conversion of legacy documents</item>
<item>an effective change management system </item>
     <item>XSLT for rendering XML statically or dynamically</item>
    </list></p>
<p>See <xref url="http://www.oucs.ox.ac.uk/oucsdoc/allc.html">http://www.oucs.ox.ac.uk/oucsdoc/allc.html</xref> for full discussion; 
proofs of the pudding are at <xref
      url="http://www.oucs.ox.ac.uk">http://www.oucs.ox.ac.uk</xref>
     and indeed <xref
      url="http://www.tei-c.org">http://www.tei-c.org</xref></p>
    </div>
<div rend="slide">
<head>SARA</head>
<p>The challenge: support lexical analysis of very large amounts of
     richly encoded text</p>
<p>Key features:
<list type="bulleted">
<item>SGML-aware search and retrieval of linguistic data</item>
<item>Inverted file index of tags and content</item>
<item>User-friendly Windows client talking to a purpose-built text retrieval engine</item>
<item>Generalised to support any TEI-conformant corpus</item>
    </list></p>
<p>See <xref
      url="http://www.hcu.ox.ac.uk/SARA">http://www.hcu.ox.ac.uk/SARA</xref>
      for sample tutorials and downloads (also try the Lampeter Corpus on
      your CD).</p>
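<p>An inverted file lists, for each word or tag, where it occurs. Its simplest relative, a frequency list of element names, can be sketched in XSLT 1.0 with keyed (Muenchian) grouping. This illustrates the idea only and says nothing about how SARA itself builds or stores its index:</p>
<p><eg><![CDATA[
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- group every element in the document by its name -->
  <xsl:key name="by-name" match="*" use="name()"/>

  <xsl:template match="/">
    <!-- visit only the first element of each name -->
    <xsl:for-each
        select="//*[generate-id() = generate-id(key('by-name', name())[1])]">
      <xsl:sort select="name()"/>
      <!-- one line per element name: name, tab, occurrence count -->
      <xsl:value-of select="name()"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="count(key('by-name', name()))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
]]></eg></p>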
   </div>
<div rend="slide">
<head>Phelix</head>
<p>The challenge: support DBMS-style retrieval and management of
     richly encoded metadata fragments</p>
<p>Key features:
<list type="bulleted">
<item>Detailed set of TEI extensions</item>
<item>XML tree is decomposed into relations representing XML
       structure, not its semantics</item>
<item>XML fragments generated and rendered using XSLT</item>
      <item>Entirely web-based interface, held together with PHP</item>
    </list></p>
<p>For the proof of the pudding, see <xref
      url="http://janus.oucs.ox.ac.uk/master">http://janus.oucs.ox.ac.uk/master</xref>
</p>
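<p>Decomposing an XML tree into relations that record its structure, rather than its meaning, can be illustrated with a generic node table: one row per element, giving its identifier, its parent, its name and its position among its siblings. The XSLT 1.0 sketch below writes such rows as comma-separated text. It is not Phelix's actual schema, and a real system would also need rows for attributes and text content:</p>
<p><eg><![CDATA[
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//*">
      <!-- node id (unique within one run of the processor) -->
      <xsl:value-of select="generate-id()"/>
      <xsl:text>,</xsl:text>
      <!-- parent id; left empty for the document element -->
      <xsl:if test="parent::*">
        <xsl:value-of select="generate-id(..)"/>
      </xsl:if>
      <xsl:text>,</xsl:text>
      <!-- element name and 1-based position among element siblings -->
      <xsl:value-of select="name()"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="count(preceding-sibling::*) + 1"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
]]></eg></p>
<p>Rows like these can be loaded into any relational database; fragments are then rebuilt by following parent links and rendered with XSLT.</p>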
   </div>

</body>
</text>
</TEI.2>
