<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TEI.2 SYSTEM "../../Lite/DTD/teixlite.dtd"[
  <!ATTLIST xptr url CDATA #IMPLIED >
  <!ATTLIST xref url CDATA #IMPLIED >
  <!ENTITY verbar "|">
  <!-- &#124; -->
  <!ENTITY ldots "..." >
  <!-- &#x2026; -->
  <!ENTITY null "" >
  <!ENTITY mdash "---">
  <!--&#x2014; -->
  <!ATTLIST figure scale CDATA #IMPLIED height CDATA #IMPLIED file CDATA #IMPLIED >
]>
<TEI.2> 
   <teiHeader type="text" status="new"> 
      <fileDesc> 
         <titleStmt>
            <title>The XML
Refresher</title>
         </titleStmt> 
         <publicationStmt> 
            <p> First version for HCU
Summer school</p> 
         </publicationStmt> 
         <sourceDesc default="NO"> 
            <p/> 
         </sourceDesc>

      </fileDesc> 
      <revisionDesc> 
         <list type="unordered">
            <item>
               <date>7 Jan 01</date>LB
revised name space discussion</item>
            <item>
               <date>12 Jul 00</date>LB
revisions</item> 
            <item>
               <date>11 Jul 00</date>LB on a train from Paris</item>

         </list> 
      </revisionDesc> 
   </teiHeader> 
   <text> 
      <front> 
         <docTitle>

            <titlePart type="main">XML: the refresher</titlePart> 
         </docTitle>

         <docAuthor>Lou Burnard

</docAuthor> 
         <docDate>July 2001</docDate> 
      </front> 
      <body> 
         <div rend="slide">
            <head>Topics</head>
            <list>
               <item>
The rules of the game</item>
               <item> Are you well formed?</item>
               <item> Making the
rules</item>
               <item> Are you valid?</item>
               <item> What use is a
DTD?</item>
            </list>
         </div>
         <div rend="slide">
            <head>Making Digital Resources</head>
            <list type="unordered">
               <item>Texts are more than
simply sequences of glyphs
               <list><item>they have <hi>structure</hi> and
<hi>context</hi> and they also have multiple readings </item></list></item>

               <item>
                  <hi>Encoding</hi> or <hi>markup</hi> provides a means of making such
readings explicit</item>
               <item>Only that which is explicit can be digitally
processed</item>
            </list>
         </div>
         <div rend="slide">
            <head>XML: what it is and why you should
care</head>
            <list>
               <item>XML is <hi>structured data</hi> represented as strings
of text</item>
               <item>XML looks like HTML, except that:-<list>
                     <item>XML is
<hi>extensible</hi> 
                     </item>
                     <item>XML must be <hi>well-formed</hi> 
                     </item>

                     <item>XML can be <hi>validated</hi> 
                     </item>
                  </list>
               </item>
               <item>XML is
application-, platform-, and vendor-independent</item>
               <item>XML empowers the
<hi>content provider</hi> and facilitates data integration</item>
            </list>
         </div>


         <div rend="slide">
            <head>XML terminology</head>
            <p>An XML document contains:-
<list>
                  <item>elements, possibly bearing attributes</item>
                  <item>processing
instructions</item>
                  <item>entity references</item>
                  <item>CDATA marked sections</item>
                  <item>IGNORE/INCLUDE marked sections (in the external DTD only)</item>
               </list>
            </p>
            <p>An XML document must be <term>well-formed</term>
and may be <term>valid</term>
            </p>
         </div>
<div rend="slide"><head>XML is an international standard</head>
<list><item>XML requires use of ISO 10646 <list><item>a 31-bit character repertoire including most human writing systems</item>
       <item>encoded as UTF-8 or UTF-16</item></list></item>
<item>other encodings may be specified at the document level</item>
     <item>language may be specified at the element level using <hi>xml:lang</hi></item>
    </list>
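<p>A minimal sketch of both mechanisms, assuming a hypothetical document stored in ISO 8859-1 rather than UTF-8 (the element names are invented for illustration):<eg>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;greetings xml:lang="en"&gt;
  &lt;greeting&gt;Hello world!&lt;/greeting&gt;
  &lt;greeting xml:lang="fr"&gt;Bonjour tout le monde!&lt;/greeting&gt;
&lt;/greetings&gt;</eg>The value of xml:lang is inherited, so the first greeting counts as English.</p>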
   </div>
         <div rend="slide">
            <head>The rules of the
XML Game</head>
            <list>
               <item>An XML document represents a (kind of)
<term>tree</term>
               </item>
               <item>It has a single <term>root</term> and many
nodes</item>
               <item>Each node can be<list> 
                     <item>a subtree</item>
                     <item>a single
<term>element</term> (possibly bearing some <term>attributes</term>)</item>
                     <item>a string of <term>character data</term>
                     </item>
                  </list>
               </item>
               <item>Each
element has a type or <term>generic identifier</term>
               </item>
               <item>Attribute
names are predefined for a given element; values can also be constrained</item>

            </list>
         </div>
         <div rend="slide">
            <head>Representing an XML tree</head>
            <list>

               <item>An XML document is encoded as a linear string of characters</item>

               <item>It normally begins with an <term>XML declaration</term>, which has the form of a <term>processing instruction</term>
               </item>

               <item>Element occurrences are marked by <term>start-</term> and
<term>end-tags</term>
               </item>
               <item> The characters &lt; and &amp; are magic and
must be "escaped" wherever they occur as data</item>
               <item>
                  <term>Comments</term> are delimited by
&lt;!-- and --&gt;</item>
               <item>
                  <term>CDATA sections</term> are delimited by
&lt;![CDATA[ and ]]&gt;</item>
               <item>Attribute name/value pairs are supplied on
the start-tag and may be given in any order</item>
               <item>Entity references are
delimited by &amp; and ;</item>
            </list>
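            <p>These conventions can be seen together in a small sketch (the <hi>note</hi> element and its attributes are invented for illustration):<eg>&lt;?xml version="1.0"?&gt;
&lt;!-- a comment: not part of the document's character data --&gt;
&lt;note n="1" type="warning"&gt;
  Escape &amp;lt; and &amp;amp; in running text: AT&amp;amp;T, 1 &amp;lt; 2.
  &lt;![CDATA[Inside a CDATA section, &lt; and &amp; need no escaping.]]&gt;
&lt;/note&gt;</eg>
            </p>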
         </div>
         <div rend="slide">
            <head>An example
XML document</head>
            <eg>   &lt;?xml version="1.0" encoding="utf-8" ?&gt;
   &lt;cookBook&gt;

     &lt;recipe n="1"&gt;
      &lt;head&gt;Nail Soup&lt;/head&gt;
      &lt;ingredientList&gt;  .... &lt;/ingredientList&gt;
      &lt;procedure&gt;  ....   &lt;/procedure&gt;  
     &lt;/recipe&gt;

     &lt;recipe n="2"&gt; 
     &lt;!-- contents of second recipe here --&gt;
     &lt;/recipe&gt;
   
   &lt;!-- hic desunt multa --&gt;

   &lt;/cookBook&gt;
</eg>
         </div>
         <div rend="slide">
            <head>XML syntax: the small print</head>
            <p>What
does it mean to be <term>well-formed</term>?</p>
            <list type="ordered">
               <item>
there is a single root node containing the whole of an XML document</item>

               <item> each subtree is properly nested within the root node</item>
               <item> names
are always case sensitive</item>
               <item> start-tags and end-tags are always
mandatory (except that a combined start-and-end tag may be used for empty
nodes)</item>
               <item> attribute values are always quoted</item>
            </list>
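            <p>A minimal sketch obeying all five rules, using invented element names:<eg>&lt;greeting type="loud"&gt;      &lt;!-- single root; attribute value quoted --&gt;
  &lt;grunt&gt;Ho&lt;/grunt&gt;          &lt;!-- nested inside the root; tags match in case --&gt;
  &lt;pause/&gt;                  &lt;!-- combined start-and-end tag for an empty node --&gt;
  world!
&lt;/greeting&gt;</eg>
            </p>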
         </div>

         <div rend="slide">
            <head>Spot the mistake</head>
            <eg>&lt;greeting&gt;Hello world!&lt;/greeting&gt;
&lt;greeting&gt;Hello world!&lt;/Greeting&gt;

&lt;greeting&gt;&lt;grunt&gt;Ho&lt;/grunt&gt; world!&lt;/greeting&gt;
&lt;grunt&gt;Ho &lt;greeting&gt;world!&lt;/greeting&gt;&lt;/grunt&gt;
&lt;greeting&gt;&lt;grunt&gt;Ho world!&lt;/greeting&gt;&lt;/grunt&gt;

&lt;grunt type=loud&gt;Ho&lt;/grunt&gt;
&lt;grunt type="loud"&gt;&lt;/grunt&gt;

&lt;grunt type= "loud"&gt;
&lt;grunt type ="loud"/&gt;</eg>
         </div>
         <div rend="slide">
            <head>Defining the
rules</head>
            <p>A <hi>valid</hi> XML document will reference a <term>document
type declaration</term> (DTD) :<eg>&lt;!DOCTYPE cookBook SYSTEM "cookbook.dtd"&gt;</eg>
            </p>

            <p>A DTD specifies:<list>
                  <item>names for all your elements </item> 
                  <item>names
and default values for their attributes</item>
                  <item>rules about how elements
can nest </item> 
                  <item>names for re-usable pieces of data (entities)</item>
                  <item>and a few other things</item>
               </list>
            </p>
            <p>n.b. A DTD does
<emph>not</emph> specify anything about what elements "mean"</p>
         </div>
<div rend="slide"><head>The DTD Subset</head>
<list>
<item><p>As well as referencing a DTD, an XML document can add some extra
     declarations known as the <term>DTD subset</term></p>

<p><eg>&lt;!DOCTYPE cookBook SYSTEM "cookbook.dtd" [
       &lt;!-- additional declarations here --&gt; 
]&gt;</eg></p>
     </item>
<item>Declarations in the subset are processed before those in the DTD</item>
<item>This gives us the ability to modify a DTD... see later!</item>
    </list>
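<p>For instance, a subset might declare an extra entity and an extra attribute. This is only a sketch: the <hi>source</hi> attribute and the way the entity is used are invented for illustration.<eg>&lt;!DOCTYPE cookBook SYSTEM "cookbook.dtd" [
  &lt;!ENTITY hcu "Humanities Computing Unit"&gt;
  &lt;!ATTLIST recipe source CDATA #IMPLIED&gt;
]&gt;
&lt;cookBook&gt;
  &lt;recipe source="&amp;hcu;"&gt; .... &lt;/recipe&gt;
&lt;/cookBook&gt;</eg></p>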
</div>

         <div rend="slide">
            <head>Defining an element</head>
            <p>An element declaration
takes the form<eg>&lt;!ELEMENT name contentModel &gt;</eg> 
            </p>

            <list type="gloss">
               <label>name</label>
               <item> is the name of the
element</item>
               <label>contentModel</label> 
               <item>defines valid content for the
element</item>
            </list>
            <p>The <term>content</term> of an element can be:<list>

                  <item>#PCDATA</item>
                  <item>EMPTY</item>
                  <item>other elements</item>
                  <item>
                     <term>mixed</term> content combines PCDATA and
                     other elements</item>
               </list>
            </p>
         </div>
         <div rend="slide">
            <head>Content models</head>

            <p>Within a content model:<list>
                  <item>
                     <term>sequence</term> is indicated by
comma </item>
                  <item>
                     <term>alternation</term> is indicated by | </item>

                  <item>
                     <term>grouping</term> is indicated by parentheses</item>
               </list>
            </p>

            <p>
               <term>Occurrence indicators</term>:

<table>

                  <row role="data">
                     <cell rows="1" role="data" cols="1">[nothing]</cell>
                     <cell rows="1" role="data" cols="1">once</cell>
                     <cell rows="1" role="data" cols="1">?</cell>
                     <cell rows="1" role="data" cols="1">optionally
once</cell>
                  </row>
                  <row role="data">
                     <cell rows="1" role="data" cols="1">+</cell>
                     <cell rows="1" role="data" cols="1">one or more
times</cell>
                     <cell rows="1" role="data" cols="1">*</cell>
                     <cell rows="1" role="data" cols="1">zero or more times</cell>
                  </row>
               </table></p>
<p>If #PCDATA appears in a content model...<list>
                  <item>it can only appear once</item>
                  <item> it must appear <hi>first</hi>
                  </item>
                  <item>if in an alternation, only the
* occurrence indicator is allowed</item>
               </list>
            </p>
         </div>
         <div rend="slide">
            <head>For example...</head>
            <p>
               <eg>&lt;!ELEMENT a (b+) &gt;
&lt;!ELEMENT b EMPTY&gt;
&lt;!ELEMENT c (#PCDATA)&gt;
&lt;!ELEMENT a (b,c) &gt;
&lt;!ELEMENT a (b|c)* &gt;
&lt;!ELEMENT a (#PCDATA|b|c)* &gt;
&lt;!ELEMENT a (b,  (c|d)*) &gt;
&lt;!ELEMENT a (b?, (c|d)+) &gt;
&lt;!ELEMENT a (b?, (c+|d+)) &gt;</eg>
            </p>
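            <p>To see what such a declaration permits, take the content model (b?, (c|d)+) and assume, for the sake of the sketch, that b is EMPTY and that c and d both contain #PCDATA:<eg>&lt;a&gt;&lt;b/&gt;&lt;c&gt;x&lt;/c&gt;&lt;d&gt;y&lt;/d&gt;&lt;/a&gt;   &lt;!-- valid: optional b, then one or more c or d --&gt;
&lt;a&gt;&lt;d&gt;y&lt;/d&gt;&lt;/a&gt;               &lt;!-- valid: b omitted --&gt;
&lt;a&gt;&lt;b/&gt;&lt;/a&gt;                   &lt;!-- invalid: at least one c or d is required --&gt;
&lt;a&gt;&lt;c&gt;x&lt;/c&gt;&lt;b/&gt;&lt;/a&gt;           &lt;!-- invalid: b may only come first --&gt;</eg>
            </p>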
         </div>

         <div rend="slide">
            <head>Defining an attribute list</head>
            <p>An attribute list
declaration takes the form<eg>&lt;!ATTLIST name attributeList &gt;</eg>
            </p>

            <list type="gloss">
               <label>name</label>
               <item> is the name of the element bearing
these attributes</item>
               <label>attributeList</label> 
               <item>is a list of
attribute specifications, each containing <list>
                     <item>an attribute name</item> 

                     <item>a declared value </item> 
                     <item>a default value</item> 
                  </list>
               </item>

            </list>
            <p>For example:<eg>&lt;!ATTLIST recipe serves CDATA #REQUIRED
                 id     ID    #IMPLIED
                 tested (yes|no|maybe) "maybe"&gt;</eg>
            </p>
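            <p>Given that declaration, start-tags such as the following would behave as noted (the attribute values are invented for illustration):<eg>&lt;recipe serves="4" id="r1" tested="yes"&gt; .... &lt;/recipe&gt;
&lt;recipe serves="2"&gt; .... &lt;/recipe&gt;   &lt;!-- valid: tested defaults to "maybe" --&gt;
&lt;recipe id="r3"&gt; .... &lt;/recipe&gt;      &lt;!-- invalid: serves is #REQUIRED --&gt;
&lt;recipe serves="6" tested="often"&gt; .... &lt;/recipe&gt;
                                     &lt;!-- invalid: "often" is not in the list --&gt;</eg>
            </p>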
         </div>

         <div rend="slide">
            <head>Defining an attribute list (2)</head>
            <p>The range of
possibilities is actually rather limited:<list type="gloss">
                  <label>declared
value</label>
                  <item>can be<list>
                        <item>an explicit list e.g.
(fish|fowl|herring)</item>
                        <item>CDATA</item>
                        <item>ID, IDREF, or
IDREFS</item>
                     </list>
                  </item>
                  <label>default value</label>
                  <item>can be<list>

                        <item>an explicit value e.g. "fish"</item>
                        <item>#IMPLIED</item>

                        <item>#REQUIRED</item>
                        <item>#FIXED (followed by an explicit value)</item>
                     </list>
                  </item>
               </list>
            </p>
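            <p>A sketch combining several of these possibilities; the attribute names <hi>seeAlso</hi> and <hi>version</hi> are hypothetical:<eg>&lt;!ATTLIST recipe id      ID    #REQUIRED
                 seeAlso IDREF #IMPLIED
                 version CDATA #FIXED "1.0"&gt;

&lt;recipe id="r1"&gt; .... &lt;/recipe&gt;
&lt;recipe id="r2" seeAlso="r1"&gt; .... &lt;/recipe&gt;</eg>In a valid document each ID value must be unique, and each IDREF value must match some ID value in the same document.</p>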
         </div>

         <div rend="slide">
            <head>An example DTD</head>
            <eg>&lt;!ELEMENT cookBook (recipe+)&gt;
&lt;!ELEMENT recipe (head?, (ingredientList|procedure|para)*) &gt;
&lt;!ATTLIST recipe n      CDATA #IMPLIED
                 serves CDATA #IMPLIED&gt;
&lt;!ELEMENT head (#PCDATA) &gt;
&lt;!ELEMENT ingredientList (ingredient+)&gt;
&lt;!ELEMENT ingredient (#PCDATA|food|quantity)* &gt;
&lt;!ELEMENT procedure (step+) &gt;
&lt;!ELEMENT food (#PCDATA)&gt;
&lt;!ATTLIST food 
   type  (veg|prot|fat|sugar|flavour|unspec) "unspec"
   calories (high|medium|low|none|unknown) "unknown" &gt;
&lt;!ELEMENT quantity EMPTY &gt;
&lt;!ATTLIST quantity value CDATA #REQUIRED
                  units CDATA #IMPLIED
                  exact (Y|N) "N"&gt;
&lt;!ELEMENT para (#PCDATA|food)*&gt;
&lt;!ELEMENT step (#PCDATA|food)*&gt;</eg>
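            <p>A document that should be valid against these declarations might look like the following sketch. It assumes the DTD has been saved as cookbook.dtd; the recipe content is invented for illustration.<eg>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;!DOCTYPE cookBook SYSTEM "cookbook.dtd"&gt;
&lt;cookBook&gt;
  &lt;recipe serves="4"&gt;
    &lt;head&gt;Nail Soup&lt;/head&gt;
    &lt;ingredientList&gt;
      &lt;ingredient&gt;&lt;quantity value="1"/&gt; &lt;food&gt;nail&lt;/food&gt;&lt;/ingredient&gt;
      &lt;ingredient&gt;&lt;quantity value="2" units="litre" exact="Y"/&gt;
        &lt;food calories="none"&gt;water&lt;/food&gt;&lt;/ingredient&gt;
    &lt;/ingredientList&gt;
    &lt;procedure&gt;
      &lt;step&gt;Boil the &lt;food&gt;nail&lt;/food&gt; in the &lt;food&gt;water&lt;/food&gt;.&lt;/step&gt;
    &lt;/procedure&gt;
    &lt;para&gt;Serve hot.&lt;/para&gt;
  &lt;/recipe&gt;
&lt;/cookBook&gt;</eg>
            </p>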
         </div>
         <div rend="slide">
            <head>Entities</head>
            <p>An <term>entity</term> is a named sequence
of characters, predefined  for convenience. </p>
            <p>Typical uses include:<list>
                  <item>to represent characters which cannot reliably be typed
in</item>
                  <item>as a shortcut for boilerplate text</item>
                  <item>as containers for
external (non-XML) data such as graphics</item>
                  <item>as a means of abbreviating
parts of a DTD (parameter entities)</item>
               </list>
            </p>
            <p>A special form of reference, the <term>character reference</term>, is available
     for any character, based on its code point in the ISO 10646 standard (for example
     &amp;#x2014; for the long dash).</p>
         </div>
         <div rend="slide">
            <head>Entities: some examples</head>
            <p>
               <eg>  &lt;!ENTITY mdash "&amp;#x2014;"&gt;
  &lt;!ENTITY hcu "Humanities Computing Unit"&gt;
  &lt;!ENTITY fig1 SYSTEM "fig1.bmp" NDATA BMP&gt;
  &lt;!ENTITY % foodTypes 
        "(veg|prot|fat|sugar|flavour|unspec)"&gt;</eg>
            </p>
            <p>A parameter entity
is one way of changing the range of values permitted for attribute values. <eg>&lt;!ATTLIST food type %foodTypes; #IMPLIED&gt;</eg>
            </p>
            <p>If a DTD contains two or more declarations for the same
     entity, the first one found wins. Because the DTD subset is read first, a declaration
     there can override one in the DTD:<eg>&lt;!DOCTYPE cookBook SYSTEM "cookbook.dtd" [
&lt;!ENTITY % foodTypes "(good|bad|indifferent)"&gt;
]&gt;</eg>
            </p>
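            <p>As a sketch of how such entities are then used: general entities are referenced by name in the text, while an unparsed (NDATA) entity needs a notation declaration and is named in an attribute of type ENTITY. The <hi>picture</hi> element, its <hi>file</hi> attribute, and the notation's system identifier are all hypothetical.<eg>&lt;!NOTATION BMP SYSTEM "image/bmp"&gt;
&lt;!ELEMENT picture EMPTY&gt;
&lt;!ATTLIST picture file ENTITY #REQUIRED&gt;

&lt;para&gt;Prepared at the &amp;hcu;&amp;mdash;see the figure.&lt;/para&gt;
&lt;picture file="fig1"/&gt;</eg>
            </p>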
         </div>
         <div rend="slide">
            <head>What use is a DTD?</head>
            <list>

               <item>A DTD is very useful at data preparation time (e.g. to enforce
consistency), but redundant at other times</item> 
               <item>If a document is
well-formed, its DTD can be (almost) entirely recreated from it. </item>

               <item>DTDs don't allow you to specify much in the way of content
validation</item>
               <item>Unlike other parts of the XML family, DTDs are not
expressed in XML</item>
            </list>
            <p>The XML Schema Language addresses these
issues, and may eventually replace the DTD entirely... maybe.</p>
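            <p>By way of illustration only, here is one possible rendering of the <hi>quantity</hi> element from the example DTD in XML Schema. It is itself an XML document, and it can insist that <hi>value</hi> be a number, which a DTD cannot do. The choice of xs:decimal is an assumption, not something the DTD states.<eg>&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"&gt;
  &lt;xs:element name="quantity"&gt;
    &lt;xs:complexType&gt;
      &lt;xs:attribute name="value" type="xs:decimal" use="required"/&gt;
      &lt;xs:attribute name="units" type="xs:string"/&gt;
      &lt;xs:attribute name="exact" default="N"&gt;
        &lt;xs:simpleType&gt;
          &lt;xs:restriction base="xs:string"&gt;
            &lt;xs:enumeration value="Y"/&gt;
            &lt;xs:enumeration value="N"/&gt;
          &lt;/xs:restriction&gt;
        &lt;/xs:simpleType&gt;
      &lt;/xs:attribute&gt;
    &lt;/xs:complexType&gt;
  &lt;/xs:element&gt;
&lt;/xs:schema&gt;</eg>
            </p>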
         </div>

         <div rend="slide"> 
            <head>XML: a licence for ill?</head> 
            <p>XML allows you to
make up your own tags, and doesn't require a DTD... isn't that rather
dangerous?<list type="unordered"> 
                  <item>XML allows you to name elements
freely</item> 
                  <item>one man's <gi TEI="yes">p</gi> is another's <gi TEI="yes">para</gi> (or is
it?)</item> 
                  <item>the appearance of interchangeability may be worse than its
absence</item> 
               </list>
            </p> 
            <p>
               <term>Namespaces</term> provide a partial
solution (but fit awkwardly with DTDs, which are not namespace-aware)</p>
         </div>

         <div rend="slide">
            <head>Namespaces</head>
            <p>A namespace associates a
<term>namespace prefix</term> with some unique identifier (a URI: it looks like a URL but
need not point to anything)</p>
            <p>It is usually defined on the root element of a document (but need
not be)<eg>&lt;root xmlns:mutt="mutt.co.uk"
            xmlns:jeff="www.jeff.org"&gt;</eg>
            </p>
            <p>The namespace prefix can
then be used to distinguish for example <eg>&lt;mutt:table&gt; .... &lt;/mutt:table&gt;
&lt;jeff:table&gt; .... &lt;/jeff:table&gt;</eg>
            </p>
            <p>An XML processor can be told to
process elements from different namespaces differently</p>
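            <p>Put together, a complete (if tiny) document using both namespaces might look like this sketch; the child element <hi>row</hi> and the attribute <hi>legs</hi> are invented for illustration:<eg>&lt;root xmlns:mutt="mutt.co.uk"
      xmlns:jeff="www.jeff.org"&gt;
  &lt;mutt:table&gt;
    &lt;mutt:row&gt;a table of data&lt;/mutt:row&gt;
  &lt;/mutt:table&gt;
  &lt;jeff:table jeff:legs="4"&gt;a piece of furniture&lt;/jeff:table&gt;
&lt;/root&gt;</eg>
            </p>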
         </div>

         <div rend="slide">
            <head>Defaulting namespaces</head>
            <list>
               <item>If no namespace
prefix appears in a tag name, the element is said to belong to the <term>default
namespace</term>
                  <eg>&lt;jeff:table&gt;&lt;!-- a jeff type table --&gt;&lt;/jeff:table&gt;
&lt;table&gt;Some other kind of table&lt;/table&gt;</eg>
               </item>
               <item>The default
namespace may be defined on the root element of the document<eg>&lt;root xmlns="mutt.co.uk"&gt;</eg>

               </item>
            </list>
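            <p>A sketch combining a default namespace with a prefixed one. The last line assumes the Namespaces recommendation's rule that an empty xmlns value removes the default for that element and its children:<eg>&lt;root xmlns="mutt.co.uk" xmlns:jeff="www.jeff.org"&gt;
  &lt;table&gt;in the default (mutt) namespace&lt;/table&gt;
  &lt;jeff:table&gt;a jeff type table&lt;/jeff:table&gt;
  &lt;table xmlns=""&gt;in no namespace at all&lt;/table&gt;
&lt;/root&gt;</eg>
            </p>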
         </div>
         <div rend="slide">
            <head>DTD : what does it <hi>really</hi>
mean? </head>
            <list type="unordered"> 
               <item>To get the best out of XML, you need
two kinds of DTD: <list type="unordered"> 
                     <item>document type
<hi>declaration</hi>: elements, attributes, entities, notations (syntactic
constraints)</item> 
                     <item>document type <hi>definition</hi>: usage and meaning
constraints on the foregoing</item> 
                  </list>
               </item> 
               <item>Published
specifications (if you can find them) for XML DTDs usually combine the two,
hence they lack modularity </item> 
            </list> 
         </div> 
         <div rend="slide"> 
            <head>Some
typical scenarios </head> 
            <list type="ordered"> 
               <item>Make up your own DTD 
<list type="unordered"> 
                     <item>... starting from scratch </item> 
                     <item>... by
combining components from one or more pre-existing conceptual frameworks (aka
<hi>architecture</hi> or <hi>namespace</hi>) </item> 
                  </list>
               </item> 

               <item>Customize a pre-existing DTD <list type="unordered"> 

                     <item>
                        <hi>definitions</hi> should be meaningful within a given user community
</item> 
                     <item>
                        <hi>declarations</hi> should be appropriate to a given set of
applications </item> 
                  </list>
               </item>
            </list> 
            <p>The TEI is a good candidate for
the second approach</p>
         </div>
      </body> 
   </text> 
</TEI.2>
