<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="viewslides.css"?>
<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd"[
  <!ATTLIST xptr url CDATA #IMPLIED >
  <!ATTLIST xref url CDATA #IMPLIED >
  <!ENTITY verbar "|">
  <!-- &#124; -->
  <!ENTITY ldots "..." >
  <!-- &#x2026; -->
  <!ENTITY null "" >
  <!ENTITY mdash "---">
  <!--&#x2014; -->
  <!ATTLIST figure scale CDATA #IMPLIED height CDATA #IMPLIED file CDATA #IMPLIED >
]>
<TEI.2> 
   <teiHeader type="text" status="new"> 
      <fileDesc> 
<titleStmt>
<title>Using XML in the Real World</title>
</titleStmt> 
<publicationStmt> 
<p>Part of the HCU Seminar series</p> 
</publicationStmt> 
<sourceDesc default="NO">
<p>Lightly revised from XML-bigpic, itself cobbled together from XML Tools and XML Choices talks</p>
</sourceDesc>
      </fileDesc> 
      <revisionDesc> 
<list type="unordered">
<item>
   <date>23 Jul 01</date>LB revised again for Summer Seminars</item> 
<item>
   <date>17 Feb 01</date>LB revised again for XML Publishing</item> 
<item>
   <date>7 Jan 01</date>LB revised for Winter Seminars</item> 
<item>
   <date>13 Jul 00</date>LB drafted</item> 
</list>
</revisionDesc> 
</teiHeader> 
<text> 
<front>
<docTitle>
<titlePart type="main">Using XML in the Real World</titlePart>
</docTitle>
<docAuthor>Lou Burnard and Sebastian Rahtz
</docAuthor> 
<docDate>July  2001</docDate> 
</front> 
<body> 

<div rend="slide">
<head>Using XML in the Real World</head>
<p>
   <figure width="4in" file="xml-logo.png"/>
</p>
<list type="unordered"><item>What is XML <emph>for</emph>?</item>
<item>How is it best used?</item>
<item>What tools are available?</item>
</list></div>

<div rend="slide">
<head>What is XML for?</head>
<list type="unordered">
<item>exchanging information
<list type="ordered">
<item>between people</item>
<item>between people and machines</item>
<item>between machines</item>
</list>
</item>
<item>preserving information
<list type="ordered">
<item>without usage-dependency</item>
<item>without medium-dependency</item>
<item>independent of time, space, and language</item>
</list>
</item>
</list>
</div>
<!--
<div rend="slide">
<head>Data reuse</head>
<p>Digital data can be reusable only to the extent that it is independent
<list type="unordered">
<item>of application</item>
<item>of platform</item>
<item>of software environment</item>
</list>
</p><p>XML therefore is the key to data reuse </p>
</div>
-->
<div rend="slide">
<head>Delivering information</head>
<p>XML is a good way of representing information. But how about
<list type="unordered">
   <item>delivering XML content on the web</item>
   <item>... and on paper</item>
   <item>storing and managing XML documents</item>
   <item>... and virtual documents</item>
</list>
</p>
<p>Can we get the best of both worlds?</p>
</div>

<div rend="slide">
<head>What tools do we need?</head>
<p>
<list type="unordered">
<item> Appropriately expressive languages (e.g. TEI XML)</item>
<item> Syntax-checking document creation tools (aka Editors)</item>
<item> Document transformation tools</item>
<item> Document delivery tools</item>
<item> Document storage and management tools</item>
<item> Programming interfaces for a variety of languages</item>
</list>
</p>
</div>


<div rend="slide">
<head>Generic languages</head>
   <list type="unordered">
   <item>DOM: Document Object Model Level 2;</item>
   <item>XML Schema (description of structures and data types);</item>
   <item>XPath: addressing parts of an XML document;</item>
   <item>XSLT: XSL Transformations, for transforming XML documents;</item>
   <item>XSL: Extensible Stylesheet Language;</item>
   <item>XLink: XML Linking Language;</item>
   <item>XPointer: XML Pointer Language.</item>
   </list>
</div>

<div rend="slide">
<head>... specialised (but generic) languages ... </head>
<list type="unordered">
   <item>SVG: Scalable Vector Graphics;</item>
   <item>MathML: Mathematical Markup Language;</item>
   <item>RDF: Resource Description Framework;</item>
   <item>SMIL: Synchronized Multimedia Integration Language</item>
</list>
<p>... etc etc etc</p>
</div>


<div rend="slide">
<head>Document creation and editing</head>
<p>
There's an ever-expanding choice of XML editing tools:
<list type="unordered">
<item> Plain text editors, typing &lt; and &gt; by hand (e.g. Notepad)</item>
<item> Customised plain text editors, with built-in tagging (e.g. NoteTab)</item>
<item> Customised programming editors (notably GNU Emacs)</item>
<item> Word processors with XML add-ons (e.g. WordPerfect)</item>
<item> Data-oriented XML editors (e.g. XML Spy)</item>
<item> Document-oriented XML editors (e.g. XMetaL)</item>
</list>
</p>
<p>And there's also the XML that gets generated without anyone
     noticing...</p>
</div>

<!-- 
<div rend="slide">
<head>Emacs XML mode</head>
<p>
<figure height="4in"  file="psgml.png"/>
</p>
</div>

<div rend="slide">
<head>XMetal</head>
<p>
<figure height="4in"  file="xmetal.png"/>
</p>
</div>

<div rend="slide">
 <head>Tree editor (Xeena)</head>
<p>
<figure height="4in" file="xeena1.png"/>
</p>
</div>

<div rend="slide">
<head>Transformation into XML</head>

<p>Automatic conversion may be  a viable solution for some common formats,
but a great deal depends on the regularity of the input. GIGO/spso</p>
<p>For example: word to XML involves:
<list type="ordered">
 <item>saving Word files as HTML</item>
 <item>using the W3C `tidy' program to convert HTML to XHTML, and tidy
 up the Word nasties</item>
 <item>using an XSLT transformation to convert XHTML to the DTD of your choice</item>
<item>... and usually a <emph>lot</emph> of hand-tweaking</item>
</list>
</p>

<p>This <soCalled>rainbow technique</soCalled> works well &mdash; if
you have used Word styles consistently, but probably not otherwise</p>
</div>

-->
<div rend="slide">
<head>Document transformation tools</head>

<p>A <term>stylesheet</term> allows you to define how XML elements
are to be transformed.</p>

<list type="unordered">
<!--
<item> Document Style Semantics and Specification Language (DSSSL): an
  ISO standard; powerful expression language based on Scheme
  (Lisp).</item>
-->
<item> XSL Transformations (XSLT): a fully-featured
transformation language;</item>

<item> Cascading Style Sheets (CSS): allows you
to add formatting styles (only) to your document; </item>

<item>A variety of proprietary stylesheet languages also exists, 
tied to specific software;</item>

<item>Or you can use whatever software you like to map XML into
something else (e.g. LaTeX, nroff, RTF, FrameMaker)
</item>
</list>

</div>

<div rend="slide">
<head>Transformation tools</head>
<list type="gloss">
<label>XSLT-based</label><item>Many, but varying in implementation level: we currently recommend <ident>saxon</ident></item>
<label>proprietary</label><item>Legacy SGML systems like Balise, OmniMark; new scripting schemes like XML Script</item>
<label>generic software</label><item>easier to develop with XML-aware
	    libraries, written to a standard API such as DOM</item>
</list>
</div>

<div rend="slide">
<head>Typical transformation jobs</head>
<list type="ordered">
<item>Render <gi>foo</gi> elements in italics</item>
<item>Render <gi>foo</gi> elements within <gi>bar</gi> elements in italics</item>
<item>Insert <code>Foo number</code> and the value of its <ident>number</ident> attribute in front of every <gi>foo</gi> element</item>
<item>Indent every <gi>p</gi> element by 1 em, except for the first one in a <gi>div</gi></item>
<item>Take the first <gi>head</gi> element inside each <gi>div</gi> and add it to a table of contents</item>
</list>
</div>
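
<div rend="slide">
<head>Typical transformation jobs: an XSLT sketch</head>
<p>As an illustration only, jobs 2 to 4 above might be written in
XSLT 1.0 roughly as follows. The HTML output vocabulary
(<gi>i</gi>, <gi>p</gi> and the <ident>style</ident> attribute) is an
assumption; <gi>foo</gi>, <gi>bar</gi> and <ident>number</ident> are
the placeholder names used above.</p>
<p><eg>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

  &lt;!-- Job 3: put "Foo number N: " in front of every foo --&gt;
  &lt;xsl:template match="foo"&gt;
    &lt;xsl:text&gt;Foo number &lt;/xsl:text&gt;
    &lt;xsl:value-of select="@number"/&gt;
    &lt;xsl:text&gt;: &lt;/xsl:text&gt;
    &lt;xsl:apply-templates/&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Job 2: a foo anywhere inside a bar is set in italics instead;
       this more specific pattern outranks match="foo" --&gt;
  &lt;xsl:template match="bar//foo"&gt;
    &lt;i&gt;&lt;xsl:apply-templates/&gt;&lt;/i&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Job 4: indent every p by 1 em ... --&gt;
  &lt;xsl:template match="p"&gt;
    &lt;p style="text-indent: 1em"&gt;&lt;xsl:apply-templates/&gt;&lt;/p&gt;
  &lt;/xsl:template&gt;

  &lt;!-- ... except a p that is the first p child of a div --&gt;
  &lt;xsl:template match="div/p[1]"&gt;
    &lt;p&gt;&lt;xsl:apply-templates/&gt;&lt;/p&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</eg>
</p>
<p>The sketch relies on XSLT's default priorities: a multi-step pattern
such as <code>bar//foo</code> is preferred to a bare element name, so
no explicit <ident>priority</ident> attribute is needed.</p>
</div>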

<div rend="slide">
<head>Less obvious transformation jobs</head>
<list type="ordered">
<item>Count <gi>foo</gi> elements occurring within <gi>bar</gi> elements </item>
<item>Sort all  <gi>foo</gi> elements by the value of their  <ident>which</ident>
      attribute, suppressing duplicates</item>
<item>Display only <gi>foo</gi> elements whose <ident>which</ident>
      attribute has the same value as a <gi>bar</gi> element elsewhere</item>
<item>Display every <gi>p</gi> element containing some string</item>
<item>Display the parent element of every <gi>foo</gi> element,
      sorting them by the value of the <ident>which</ident>
      attribute on the last <gi>bar</gi> element they contain</item>
</list>
</div>
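
<div rend="slide">
<head>Less obvious jobs: an XSLT sketch</head>
<p>Jobs 1, 2 and 4 above can be approximated in XSLT 1.0 with XPath
functions and a key. This is a sketch under assumptions: the key name
<ident>foo-by-which</ident> and the search string <code>XML</code> are
invented for the example, and output is plain text, one result per
line.</p>
<p><eg>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="text"/&gt;

  &lt;!-- index every foo by the value of its which attribute --&gt;
  &lt;xsl:key name="foo-by-which" match="foo" use="@which"/&gt;

  &lt;xsl:template match="/"&gt;
    &lt;!-- Job 1: count foo elements occurring anywhere within a bar --&gt;
    &lt;xsl:value-of select="count(//bar//foo)"/&gt;
    &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;

    &lt;!-- Job 2: keep only the first foo for each distinct which
         value, then sort the survivors by that value --&gt;
    &lt;xsl:for-each select="//foo[generate-id() =
        generate-id(key('foo-by-which', @which)[1])]"&gt;
      &lt;xsl:sort select="@which"/&gt;
      &lt;xsl:value-of select="."/&gt;
      &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;

    &lt;!-- Job 4: every p whose text contains the assumed string --&gt;
    &lt;xsl:for-each select="//p[contains(., 'XML')]"&gt;
      &lt;xsl:value-of select="."/&gt;
      &lt;xsl:text&gt;&amp;#10;&lt;/xsl:text&gt;
    &lt;/xsl:for-each&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</eg>
</p>
<p>Suppressing duplicates is the awkward part: XSLT 1.0 has no built-in
<soCalled>distinct</soCalled> operation, hence the comparison of
generated identifiers against the first node returned by the key.</p>
</div>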

<div rend="slide">
<head>XML <soCalled>parsers</soCalled> and validators</head>
<p>Embedded or free-standing, validation is an integral part of XML document processing.</p>
<p>There are lots of  products, both free and commercial:
<list type="unordered">
<item> in Java from Sun, Oracle, and IBM as well as individuals</item>
<item> in C, embedded in Perl and various applications like Netscape</item>
<item> in C++ from IBM</item>
<item> something in more or less any language you like, from Python to Dylan</item>
</list>
&ldots;plus all the existing SGML software
</p>
</div>


<div rend="slide"><head>Processing strategies</head>
<p>An XML document is a serialized tree structure. How should it be
processed? </p>
<p>There are three currently favoured approaches:
<list type="ordered">
<item>event-based (e.g. SAX)</item>
<item>tree-based (e.g. DOM)</item>
<item>declarative or functional (e.g. XSLT)</item>
</list>
</p>
</div>



<div rend="slide">
<head>XML on the web</head>
<p>Eventually, all web user agents (browsers) will be XML-aware! Until they are,
we have to choose:<list type="ordered">
   <item>transform XML to HTML on the server (statically)</item>
   <item>transform XML to HTML on the server (dynamically, 
using a servlet)</item>
   <item>render XML on the client using CSS or dynamically
with some kind of plugin</item>
</list>
</p>
</div>
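
<div rend="slide">
<head>XML on the web: a server-side sketch</head>
<p>For the first two options, the transformation itself might look
something like the following XSLT 1.0 stylesheet, which turns a
TEI-style slide document into a single HTML page. It is a minimal
sketch: it assumes <gi>div</gi>, <gi>head</gi>, <gi>p</gi>,
<gi>list</gi> and <gi>item</gi> elements in no namespace, and the
choice of HTML elements and class names is arbitrary.</p>
<p><eg>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="html"/&gt;

  &lt;!-- page skeleton: title from the header, one block per div --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;html&gt;
      &lt;head&gt;
        &lt;title&gt;&lt;xsl:value-of select="//titleStmt/title"/&gt;&lt;/title&gt;
      &lt;/head&gt;
      &lt;body&gt;&lt;xsl:apply-templates select="//body/div"/&gt;&lt;/body&gt;
    &lt;/html&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="div"&gt;
    &lt;div class="slide"&gt;&lt;xsl:apply-templates/&gt;&lt;/div&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="div/head"&gt;
    &lt;h2&gt;&lt;xsl:apply-templates/&gt;&lt;/h2&gt;
  &lt;/xsl:template&gt;

  &lt;!-- a TEI p may contain a list, so map it to an HTML div --&gt;
  &lt;xsl:template match="p"&gt;
    &lt;div class="para"&gt;&lt;xsl:apply-templates/&gt;&lt;/div&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="list"&gt;
    &lt;ul&gt;&lt;xsl:apply-templates/&gt;&lt;/ul&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="item"&gt;
    &lt;li&gt;&lt;xsl:apply-templates/&gt;&lt;/li&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</eg>
</p>
<p>Run once and saved, this gives the static option; run on each
request, the dynamic one. Elements with no template of their own fall
through to the built-in rules, which simply output their text.</p>
</div>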
<div rend="slide">
<head>XML on the web: typical architecture</head>
<p>
<figure height="4in" file="xml02-1.png"/>
</p>
</div>
<div rend="slide">
<head>XML on paper</head>

<p>The combination of XML, XSLT and a good XSL FO engine could do away
with the need for expensive proprietary DTP and word processing
systems</p>
<p>It hasn't happened yet, but it might...</p>
</div>
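
<div rend="slide">
<head>XML on paper: a formatting-object sketch</head>
<p>To give a flavour of the approach, the following XSLT 1.0
stylesheet turns headings and paragraphs into XSL formatting objects
for an FO engine to paginate. It is a sketch only: it assumes the
vocabulary of the XSL 1.0 Recommendation, and the master name
<ident>page</ident>, the A4 page size, the margins and the font
settings are arbitrary choices.</p>
<p><eg>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"&gt;

  &lt;xsl:template match="/"&gt;
    &lt;fo:root&gt;
      &lt;!-- describe one kind of page ... --&gt;
      &lt;fo:layout-master-set&gt;
        &lt;fo:simple-page-master master-name="page"
            page-height="297mm" page-width="210mm"
            margin-top="25mm" margin-bottom="25mm"
            margin-left="25mm" margin-right="25mm"&gt;
          &lt;fo:region-body/&gt;
        &lt;/fo:simple-page-master&gt;
      &lt;/fo:layout-master-set&gt;
      &lt;!-- ... and pour the document into as many as are needed --&gt;
      &lt;fo:page-sequence master-reference="page"&gt;
        &lt;fo:flow flow-name="xsl-region-body"&gt;
          &lt;xsl:apply-templates select="//body/div"/&gt;
        &lt;/fo:flow&gt;
      &lt;/fo:page-sequence&gt;
    &lt;/fo:root&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="head"&gt;
    &lt;fo:block font-size="18pt" font-weight="bold"&gt;
      &lt;xsl:apply-templates/&gt;
    &lt;/fo:block&gt;
  &lt;/xsl:template&gt;

  &lt;xsl:template match="p"&gt;
    &lt;fo:block space-after="6pt"&gt;&lt;xsl:apply-templates/&gt;&lt;/fo:block&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</eg>
</p>
</div>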

<div rend="slide">
<head>Storage strategies</head>
<p>Data has to be stored somewhere.
How should XML data be managed? There are several possibilities:<list type="ordered">
   <item>as discrete XML documents</item>
   <item>within any convenient DBMS</item>
   <item>within an XML repository</item>
</list>
</p>
</div>
<div rend="slide">
<head>XML documents</head>
<p>In the traditional docucentric world...
<list type="unordered">
   <item>information is stored in XML documents, somewhere, and in
some form</item>
   <item>entities give some degree of modularity</item>
   <item>but there has to be centralized naming and management for version control, integrity, etc.</item>
</list>
</p>
<p><eg>&lt;!ENTITY doc1 SYSTEM "docs/frag1.xml"&gt;
&lt;!ENTITY doc2 SYSTEM "docs/frag2.xml"&gt;</eg>
</p>
<p><eg>&lt;?xml version="1.0" ?&gt;
&lt;!DOCTYPE theDoc SYSTEM "theDTD.dtd" [
  &lt;!ENTITY % theDocList SYSTEM "theDocs.ent"&gt;
  %theDocList; ]&gt;
&lt;theDoc&gt;
&amp;doc1; &amp;doc2;
&lt;/theDoc&gt;</eg>
</p>
</div>

<div rend="slide">
<head>The docucentric world</head>
<!--
<p>Some examples:<list type="ordered">
   <item>
      <xref url="http://bodley.ox.ac.uk/oxxml/" to="DITTO" targOrder="U" from="ROOT">Oxlip</xref>
   </item>
   <item>
      <xref url="http://quirk.oucs.ox.ac.uk/TEI/Applications/" to="DITTO" targOrder="U" from="ROOT">TEI
website</xref>
   </item>
   <item>
      <xref url="http://nsmsweb4.oucs.ox.ac.uk/cgi-bin/lex3.pl" to="DITTO" targOrder="U" from="ROOT">Lexicon of Greek Personal
Names</xref>
   </item>
   <item>The Lampeter Corpus (on your
CD)</item>
</list>
</p> -->
<p>Good points:
<list type="unordered">
   <item>conceptually clear</item>
   <item>robust and portable</item>
</list>
</p>
<p>Less good points:<list type="unordered">
   <item>
      <emph>Everything</emph> must be an XML entity</item>
   <item>may appear inflexible or redundant</item>
</list>
</p>
</div>

<div rend="slide">
<head>Virtual documents</head>

<!--<p>But XML was designed as an <emph>interchange</emph> format to
facilitate information flow. As such, it can be dynamically generated
as and when we need it.</p>-->

<p>Storage is a special kind of
processing, like formatting, requiring a transformation in and out of 
some storage format. So we could <list type="unordered">
   <item>store information in
non-XML formats (optimized for specific functions, e.g. text retrieval or
relational tables)</item>
   <item>recover all and only the information needed from
the store in the form of a dynamically-generated XML
document/fragment</item>
   <item>or use an XML repository, where access should be in XML
terms; at present, some mapping process is usually still
needed</item>
</list>
</p>
</div>
<div rend="slide">
<head>XML databases: the options</head>
<list type="unordered">
<item>Store some information as relations, and some as XML (e.g. ProtCem) </item>
<item>Store the XML structure as relations but expose only an XML view  (e.g. Phelix)</item>
<item>Store and expose only XML (e.g. Meerkat and other RSS-based services)</item>
</list>
</div>
<div rend="slide">
<head>DBMS or XML?</head>
<p>Do you have to choose?</p>
<list type="unordered">
<item>The argument from history<list type="ordered">
      <item>flat files gave way to network DBMS</item>
      <item>network DBMS gave way to relational</item>
      <item>will relational DBMS give way to OODBMS?</item>
   </list>
</item>
<item>Getting the best of both worlds<list type="unordered">
      <item>DBMS are good at storing and managing <term>relations</term></item>
      <item>the equivalent XML technologies are not yet mature</item>
      <item>but DBMS can be cajoled into presenting their contents in XML terms</item>
   </list>
</item>
</list>
</div>
<div rend="slide">
<head>Delivery strategies</head>
<list type="unordered">
<item>Our goal is fast and efficient access to any subtree of the docuverse, of any
size</item>
<item>XPath has an adequately rich semantics</item>
<item>XSLT has an adequately rich syntax (we think)</item>
<item>The rest is a Simple Matter of Programming...</item>
</list>
</div>
<div rend="slide"><head>Delivery strategies (today)</head>
<list type="gloss">
   <label>Small-scale solution</label>
   <item>Use XSLT and XPath expressions</item>
   <label>Untidy solution</label>
   <item>Store XML in conventional database and do textual search</item>
   <label>High-tech solution</label>
   <item>Pre-index
all text in all elements, and provide one-off front-end
application</item>
   <label>Low-tech
solution</label>
   <item>Use XML-ified grep-like utility to
search documents (e.g. the LT XML tools)</item>
</list>
</div>
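
<div rend="slide">
<head>The small-scale solution: a sketch</head>
<p>As a rough illustration of the small-scale solution, an XPath
expression picks out the subtree wanted and XSLT delivers it. The
parameter name <ident>wanted</ident> and its default value are
hypothetical; the stylesheet assumes slides marked up as
<gi>div</gi> elements with a <ident>rend</ident> attribute of
<code>slide</code>.</p>
<p><eg>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;
  &lt;xsl:output method="xml" indent="yes"/&gt;

  &lt;!-- hypothetical parameter: the heading of the slide wanted --&gt;
  &lt;xsl:param name="wanted" select="'Storage strategies'"/&gt;

  &lt;xsl:template match="/"&gt;
    &lt;!-- the XPath expression addresses the subtree;
         xsl:copy-of delivers it unchanged --&gt;
    &lt;xsl:copy-of select="//div[@rend='slide'][head = $wanted]"/&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</eg>
</p>
<p>This reads the whole document for every request, which is why it
only suits small collections; the other solutions trade that simplicity
for some form of indexing.</p>
</div>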
<div rend="slide">
<head>Heterogeneity one way ... </head>
<p>
<figure height="4in" file="xml02-2.png"/>
</p>
</div>
<div rend="slide">
<head>... or
another</head>
<p>
<figure height="4in" file="xml02-3.png"/>
</p>
</div>
<div rend="slide">
<head>Development strategies</head>
<list type="unordered">
<item>XML began as a way
of smuggling SGML onto the web... </item>
<item>... but seems to have taken over
as the industry's driving force</item>
<item>Where will XML have taken us in the next few
years?</item>
<item>What should we expect to be able to
do?</item>
</list>
</div>
      </body>
   </text>
</TEI.2>


