<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="./slides-tei.css"?>
<!DOCTYPE TEI.2 SYSTEM "../../../Lite/DTD/teixlite.dtd" [
<!ATTLIST figure
  width CDATA #IMPLIED
  height CDATA #IMPLIED
  url CDATA #IMPLIED >
<!ATTLIST xref url CDATA #IMPLIED>
<!-- Many browsers do not read an external DTD, so these entities are -->
<!-- declared here (even though they are also declared in -->
<!-- teixlite.dtd) so that those browsers won't throw an error-->
<!ENTITY hellip	"&#x2026;"> <!-- HORIZONTAL ELLIPSIS -->
<!ENTITY mdash	"&#x2014;"> <!-- EM DASH -->
<!ENTITY ndash	"&#x2013;"> <!-- EN DASH -->
]>
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Markup: Why Bother?</title>
        <title>with an introduction to XML</title>
        <author>Syd Bauman</author>
      </titleStmt>
      <publicationStmt>
        <date value="2005-02-25"/>
        <distributor>TEI-Consortium (via website)</distributor>
        <address>
          <addrLine>info@tei-c.org</addrLine>
        </address>
        <availability>
          <p>Copyleft 2005 by Syd Bauman and the Brown University
          Women Writers Project.</p>
        </availability>
        <pubPlace>Given at the winter 2005 TEI Workshop 2005-02-26/27 at
          the University of Illinois at Urbana-Champaign</pubPlace>
      </publicationStmt>
      <sourceDesc>
        <p>Based on the same talk from the 2003-10 UIUC TEI workshop.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change><date value="2005-02-28">Mon, 28 Feb 05</date>
        <respStmt><name>Syd Bauman</name>
        <resp>workshop co-presenter</resp></respStmt>
        <item>Updates for use on TEI website</item>
      </change>
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
      <titlePage>
        <docTitle>
          <titlePart type="main">Markup: Why Bother?</titlePart>
        </docTitle>
        <docAuthor>Syd Bauman</docAuthor>
        <docDate>2005-02-26/27</docDate>
      </titlePage>
    </front>
    <body>

      <div rend="slide" n="01">
        <head>What is Markup?</head>
        <p>Markup is a method of making explicit one's understanding of a text.</p>
      </div>

      <div rend="slide" n="02">
        <head>In the Beginning &hellip;</head>
        <p>
          <foreign>Scriptio Continua</foreign>
          <eg>NOTEXACTLYTHEMOSTREADABLESTUFFINTHEWO
RLDBUTITDIDSUFFICEFORSOMETIMEDESPITE&hellip;</eg>
        </p>
      </div>

      <div rend="slide" n="03">
        <head>Punctuational Markup</head>
        <p>Added whitespace and punctuation <eg>NOTEXACTLYTHEMOSTREADABLESTUFFINTHEWO
RLDBUTITDIDSUFFICEFORSOMETIMEDESPITE&hellip;</eg>
        </p>
        <p>
          <eg>Not exactly the most readable stuff in the world,
but it did suffice for some time despite&hellip;</eg>
        </p>
      </div>

      <div rend="slide" n="04">
        <head rend="syd">Presentational Markup</head>
        <p>
          <figure width="244" height="312" url="letter-template.png"/>
        </p>
      </div>

      <div rend="slide" n="05">
        <head>Categories of Markup</head>
        <p>One could think of markup as having two qualities, <term>mood</term>
          (<term>imperative</term> vs. <term>indicative</term>) and <term>domain</term>
            (<term>logical</term> or <term>structural</term> vs. <term>renditional</term>). <table>
            <row>
              <cell/>
              <cell>Indicative</cell>
              <cell>Imperative</cell>
            </row>
            <row>
              <cell>Logical</cell>
              <cell>&lt;head&gt;</cell>
              <cell/>
            </row>
            <row>
              <cell>Renditional</cell>
              <cell>&lt;hi&gt;</cell>
              <cell>.ce .bd</cell>
            </row>
          </table>
        </p>
      </div>

      <div rend="slide" n="06">
        <head rend="syd">Markup has Meaning</head>
        <p>
          <eg>WORDSWORTHLESSORBITSUNBOUND</eg>
        </p>
        <p>
          <eg>Words — Worthless; Orbits — Unbound</eg>
        </p>
        <p>
          <eg>Wordsworth Lessor Bit Sun-bound</eg>
        </p>
        <p>
          <eg>Words worth less, or bits unbound</eg>
        </p>
      </div>

      <div rend="slide" n="07">
        <head rend="syd">What We Mean</head>
        <p>So far only rudimentary, vulgar markup.</p>
        <p>TEI <note place="inline">indeed, any decent XML application</note> uses indicative logical markup<note
            place="inline">previously called <soCalled>descriptive</soCalled> markup</note> to
          indicate the structure of the text, labeling what various chunks of text <emph>are</emph>
          rather than what to do with them.</p>
        <p>This allows for multiple kinds of processing on a single text. Most obviously, different
          layouts or stylesheets can produce different-looking output from the same input.</p>
      </div>

      <div rend="slide" n="08">
        <head>Descriptive (Indicative, Logical) Markup</head>
       <list>
         <item>Describes the structure and function of a textual feature, not its appearance.</item>
         <item>Allows for multiple presentations and uses of the same structural data.</item>
         <item>Permits independence of output mechanism (software, hardware, etc.).</item>
       </list>
        <p>
          <q rend="display">In the end, it should be clear that descriptive markup is not just the
            best approach of the competing markup systems; it is the best imaginable approach.</q>
          <bibl>Coombs, Renear, and DeRose, <title level="a">Markup Systems and the Future of
              Scholarly Text Processing,</title> <title level="j">Communications of the ACM</title> 30.11 (1987-11): 933–947.</bibl>
        </p>
      </div>

      <div rend="slide" n="09">
        <head>Why Use (Indicative Logical) Markup?</head>
        <list type="unordered">
          <item>To separate data from metadata</item>
          <item>To separate form from content</item>
          <item>To facilitate retrieval and analysis</item>
          <item>To achieve data longevity</item>
        </list>
      </div>

      <div rend="slide" n="10">
        <head>XML</head>
        <p>The <xref url="http://www.w3.org/TR/REC-xml">Extensible Markup Language (XML)</xref> is a
          standardized method for marking up texts with
          indicative logical markup, and, optionally, for explicitly declaring the markup language
          used.</p>
      </div>

      <div rend="slide" n="11">
        <head>Why XML?</head>
        <p>XML is <list type="unordered">
            <item>easy to understand;</item>
            <item>non-proprietary plain-text: <list>
                <item>human readable,</item>
                <item>software independent,</item>
                <item>hardware independent;</item>
              </list>
            </item>
            <item>(relatively) easy to write a parser for;</item>
            <item>represents hierarchical text structures very well, <list type="unordered">
                <item>non-hierarchical not so well;</item>
              </list>
            </item>
            <item>ubiquitous, <list type="unordered">
                <item>very well supported with both commercial and free software.</item>
              </list>
            </item>
          </list>
        </p>
      </div>

      <div rend="slide" n="12">
        <head>XML compared to SGML</head>
        <p>XML is a subset of SGML, a simplification:
        <list>
          <item>Eliminates the most complex features</item>
          <item>Removes requirement for a DTD</item>
          <item>Decreases range of options (tag omission, choice of delimiters)</item>
        </list>
        </p>
        <p>Many encoding languages exist in both:
        <list>
          <item>[X]HTML</item>
          <item>TEI</item>
          <item>DocBook</item>
        </list></p>
      </div>

      <div rend="slide" n="13">
        <head>XML Basics</head>
        <p>XML is a metalanguage:
        <list>
          <item>No tags or attributes of its own</item>
          <item>Instead, a set of rules for defining tags and attributes</item>
          <item>Imposes no constraints on elements and attributes in a document</item>
          <item>Instead, defines how rules for such constraints are written</item>
        </list>
        </p>
      </div>

      <div rend="slide" n="14">
        <head>Metaphors</head>
        <list>
        <item>XML as a "tree structure"</item>
        <item>XML as an "ordered hierarchy of content objects" <note place="inline" rend="smaller">Renear, DeRose, Mylonas</note></item>
        <item>XML as a tree-like representation of other, more complex structures</item>
        </list>
      </div>

      <div rend="slide" n="15">
        <head>Parts of an XML Document Instance</head>
        <p>Everything is delimited:<list>
            <item>elements by <term>start-tags</term> and <term>end-tags</term>
            </item>
            <item>tags by <code>&lt; &hellip; &gt;</code> and <code>&lt;/ &hellip;
                &gt;</code>
            </item>
            <item>special case: an element with no content may be represented as <code>&lt;
                &hellip; /&gt;</code> as short-hand for <code>&lt; &hellip;
                &gt;&lt;/ &hellip; &gt;</code>
            </item>
            <item>entity references by <code>&amp; &hellip; ;</code>
            </item>
          </list>
        </p>
      </div>

      <div rend="slide" n="16">
        <head>Everything's Delimited: elements</head>
        <p>Text is divided into <term>elements</term> (the <soCalled>nouns</soCalled> of
          the encoding &mdash; <term>content objects</term>).</p>
        <list type="unordered">
          <item>Elements are delimited by <term>tags</term>: a start-tag at the beginning, and an
            end-tag at the end.</item>
          <item>Start-tags are delimited by <code>&lt;</code> and <code>&gt;</code>.</item>
          <item>End-tags are delimited by <code>&lt;/</code> and <code>&gt;</code>.</item>
          <item>Special case: for an empty element, the start- and end-tags are combined into a
              single tag delimited by <code>&lt;</code> and <code>/&gt;</code>.</item>
        </list>
      </div>

      <div rend="slide" n="17">
        <head>Example Elements</head>
        <p>
          <eg><![CDATA[<name>Sara Schmidt</name>]]></eg>
          <eg><![CDATA[<para>The year that Buttercup was born, the most
beautiful woman in the world]]> &hellip; &lt;/para&gt;</eg>
          <eg>&lt;line&gt;&apos;<![CDATA[Twas the night before Christmas</line>]]></eg>
          <eg><![CDATA[<para>We've had <hit-count/> hits since last
updated on <auto-date/> at <auto-time/>.</para>]]></eg>
        </p>
      </div>

      <div rend="slide" n="18">
        <head>Everything's Delimited: attributes</head>
        <p>Elements have attributes (somewhat as nouns have adjectives).</p>
        <list type="unordered">
          <item>Attributes have names: <q>rend</q>, <q>type</q>, <q>id</q>, <q>n</q>,
              <q>target</q>
          </item>
          <item>Attributes have values: <ident type="val">slant(italic)</ident>, <ident type="val"
              >chapter</ident>, <ident type="val">P0001</ident>
          </item>
          <item>Attributes are specified in the start (or empty) tag</item>
          <item>The attribute name is preceded by whitespace and followed by <code>=</code><list type="unordered"><item>Whitespace is also allowed on either side of the <code>=</code></item></list>
          </item>
          <item>The attribute value is delimited by either single or double quotation marks (<code>&#x0022;</code>s or
              <code>&#x0027;</code>s). <emph>No curly quotes!</emph></item>
        </list>
      </div>
<!--      <div rend="slide" n="19">
        <head>Examples of Attributes</head>
        <p><eg><![CDATA[<name type="person">]]></eg></p>
        <p><eg><![CDATA[<name type="person"
      reg="Flanders, Julia H."
      key="JFlanders.lfw"
          rend="slant(italic)">]]></eg></p>
        </div> 

        <div rend="slide" n="20">
        <head>Attribute functions</head>
        <p>Typology: <ident type="attrName">type</ident>, <ident type="attrName">ana</ident>, <ident
            type="attrName">unit</ident>
        </p>
        <p>Identity: <ident type="attrName">id</ident>, <ident type="attrName">key</ident>
        </p>
        <p>Pointing: <ident type="attrName">target</ident>, <ident type="attrName">corresp</ident>,
            <ident type="attrName">sameAs</ident>
        </p>
        <p>Enumeration: <ident type="attrName">n</ident>
        </p>
        <p>Description: <ident type="attrName">lang</ident>
        </p>
        <p>Annotation: <ident type="attrName">resp</ident>, <ident type="attrName">desc</ident>,
            <ident type="attrName">reason</ident>, <ident type="attrName">wit</ident>
        </p>
        <p>Regularization: <ident type="attrName">reg</ident>, <ident type="attrName">value</ident>
        </p>
        <p>Alternative: <ident type="attrName">corr</ident>, <ident type="attrName">expan</ident>
        </p>
      </div>-->

      <div rend="slide" n="21">
        <head>Example Elements with Attributes</head>
        <p>
          <eg><![CDATA[<name type='place'>Philadelphia</name>]]></eg>
          <eg><![CDATA[<acronym full=
            "The University of Illinois at Urbana-Champaign">UIUC</acronym>]]></eg>
          <eg><![CDATA[<line n="1">'Twas the night before Christmas</line>]]></eg>
          <eg><![CDATA[<book title="Better Living Through TEI"
      author="Mark Upgood"
      cost="$5.99"
      stock='12' />]]></eg>
        </p>
      </div>

      <div rend="slide" n="22">
        <head>Everything's Delimited: character references</head>
        <p>To refer to a character that is not on your keyboard, delimit its ISO 10646 (or Unicode)
          codepoint with <list>
            <item>
              <code>&amp;#</code> and <code>;</code> for decimal values, or</item>
            <item>
              <code>&amp;#x</code> and <code>;</code> for hexadecimal values.</item>
          </list>
        </p>
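        <p>As a small illustration (the sample word is an assumption made for this sketch, not
          taken from any source text), either of the following references should yield the same
          character, é (U+00E9):</p>
        <p>
          <eg><![CDATA[caf&#233;    decimal reference
caf&#xE9;    hexadecimal reference]]></eg>
        </p>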
      </div>

      <div rend="slide" n="23">
        <head>Everything's Delimited: entity references</head>
        <p>Names delimited by <code>&amp;</code> and <code>;</code> are references to the string
          of characters or bits declared as associated with that name.</p>
        <p>How to declare such an entity is not covered here.</p>
        <p>Five entities come pre-declared: <code>amp</code>, <code>lt</code>, <code>gt</code>,
            <code>apos</code>, and <code>quot</code>.</p>
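        <p>A minimal sketch of the predeclared entities in use; the sample text and the attribute
          value are invented for illustration:</p>
        <p>
          <eg><![CDATA[<p>Smith &amp; Jones showed that 3 &lt; 5.</p>]]></eg>
          <eg><![CDATA[<name reg='O&apos;Brien, Pat'>Pat</name>]]></eg>
        </p>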
      </div>

      <div rend="slide" n="24">
        <head>Entity references</head>
        <p>A way to handle special characters (i.e. not lower ASCII) and boilerplate text</p>
        <list><item>Delimited by <code>&amp;</code> and <code>;</code></item>
        <item>&amp;eacute; = é</item>
        <item>&amp;copyright; = "this text is copyrighted by the Women Writers
          Project&hellip;"</item></list>
      </div>

      <div rend="slide" n="25">
        <head>Well-formedness</head>
        <p>A simple set of rules on document syntax:<list>
        <item>single <soCalled>root</soCalled> element</item>
          <item>every element has a start-tag and an end-tag (or is a single empty-element tag)</item>
          <item>no element overlap</item>
          <item>all entities and attributes properly delimited</item>
        </list></p>
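        <p>A hedged illustration of the overlap rule, using an invented sentence: the first
          fragment is well-formed because <code>&lt;name&gt;</code> nests entirely inside
          <code>&lt;emph&gt;</code>; the second is not, because the two elements overlap.</p>
        <p>
          <eg><![CDATA[<p>She is <emph>very much <name>Alice</name></emph> today.</p>]]></eg>
          <eg><![CDATA[<p>She is <emph>very much <name>Alice</emph></name> today.</p>]]></eg>
        </p>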
      </div>

      <div rend="slide" n="26">
        <head>Validity</head>
        <p>A valid document follows the rules of a DTD:
        <list>
          <item>which elements can go where</item>
          <item>which attributes can be (or must be) on which elements</item>
          <item>minimal constraint on values of attributes</item>
        </list>
        </p>
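        <p>To make these three kinds of rule concrete, here is a rough sketch of a few DTD
          declarations. The element and attribute names are hypothetical and are not taken from
          the TEI DTD. The first three lines say which elements can go where; the last
          declaration says that <code>id</code> must be present on <code>book</code> and that
          <code>stock</code> may be, with only loose constraints on their values.</p>
        <p>
          <eg><![CDATA[<!ELEMENT book   (title, author+)>
<!ELEMENT title  (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ATTLIST book
  id    ID    #REQUIRED
  stock CDATA #IMPLIED >]]></eg>
        </p>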
      </div>
<!--  
  </div>

  <div rend="slide" n="27">
        <head>Namespaces</head>
        <p>A way of using tag vocabularies from a different markup language</p>
        <p>Allows for specialization of markup languages (by discipline, by function)</p>
        <p>Good for metadata: can use TEI header in a METS record</p>
        <p>Good for specialized markup: e.g. MathML, MusicML</p>
        <p>No need for every markup language to handle everything</p>
  </div>-->

      <div rend="slide" n="28">
        <head>Samples</head>
        <p><eg>&lt;p&gt;&lt;q&gt;Curiouser and curiouser&lt;/q&gt; said
&lt;name&gt;Alice&lt;/name&gt; to herself.&lt;/p&gt;</eg></p>
      </div>

      <div rend="slide" n="29">
        <head>Samples</head>
        <eg>&lt;lg&gt;
   &lt;l&gt;A bird came down the walk&lt;/l&gt;
   &lt;l&gt;He did not know I saw,&lt;/l&gt;
   &lt;l&gt;He bit an angleworm in half,&lt;/l&gt;
   &lt;l&gt;And ate the fellow raw.&lt;/l&gt;
 &lt;/lg&gt;</eg>
      </div>

      <div rend="slide" n="30">
        <head>Samples</head>
<eg>&lt;lg&gt;
  &lt;l&gt;&lt;s&gt;A bird came down the walk&lt;/l&gt;
  &lt;l&gt;He did not know I saw.&lt;/s&gt;&lt;/l&gt;
  &lt;l&gt;&lt;s&gt;He bit an angleworm in half,&lt;/l&gt;
  &lt;l&gt;And ate the fellow raw.&lt;/s&gt;&lt;/l&gt;
&lt;/lg&gt;</eg>
      </div>
</body>
  </text>
</TEI.2>
