TEI Tite RFP: Validating XML


Validating TEI Tite documents

XML transcriptions returned by vendors must validate against one or more schemas for TEI Tite provided by the TEI Consortium. Current instructions for using these schemas follow; these may be amended once the conversion program is in place.

DTD

The vendor must be able to validate XML files against the TEI Tite DTD. The DTD may be downloaded from http://www.tei-c.org/release/xml/tei/custom/schema/dtd/tei_tite.dtd. Depending on whether the vendor's XML tools support the fetching of a DTD over the Internet, the DTD may be invoked either as a system file or via a URL:

  • system: <!DOCTYPE text SYSTEM "tei_tite.dtd">
  • URL: <!DOCTYPE text SYSTEM "http://www.tei-c.org/release/xml/tei/custom/schema/dtd/tei_tite.dtd">

RELAX NG

In addition, the vendor should be able to validate XML files against the TEI Tite RELAX NG schema, which checks data types more strictly than the DTD does. The schema may be downloaded from http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_tite.rng. When using RELAX NG to validate TEI Tite files, both the DTD and the RELAX NG schema must be invoked in order for RELAX NG to inherit the proper namespaces for elements. The method for doing this will vary according to the vendor's XML tools. In the oXygen XML editor, for example, one would add a processing instruction immediately after the document type declaration:

<!DOCTYPE text SYSTEM "tei_tite.dtd">
<?oxygen RNGSchema="tei_tite.rng" type="xml"?>
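For vendors working from the command line rather than in oXygen, the two validation passes might be scripted along the following lines. This is only a sketch: it assumes the xmllint tool from libxml2 is installed, the function names and the default file name tei_tite.rng are illustrative, and whether the URL form of the DTD can be fetched depends on how the local xmllint was built.

```python
import shutil
import subprocess

# URL of the TEI Tite DTD, as given in the DTD section above.
DTD_URL = "http://www.tei-c.org/release/xml/tei/custom/schema/dtd/tei_tite.dtd"


def dtd_command(xml_path, dtd=None):
    """Build an xmllint command line for DTD validation.

    With dtd=None, xmllint validates against the DTD named in the file's
    own DOCTYPE (--valid). Otherwise it validates against the given DTD
    path or URL (--dtdvalid), ignoring the DOCTYPE.
    """
    if dtd is None:
        return ["xmllint", "--noout", "--valid", xml_path]
    return ["xmllint", "--noout", "--dtdvalid", dtd, xml_path]


def relaxng_command(xml_path, rng_path="tei_tite.rng"):
    """Build an xmllint command line for RELAX NG validation.

    rng_path is assumed to be a local copy of the TEI Tite RELAX NG schema.
    """
    return ["xmllint", "--noout", "--relaxng", rng_path, xml_path]


def run(command):
    """Run a command built above; return True if the tool exits with status 0."""
    if shutil.which(command[0]) is None:
        raise RuntimeError(command[0] + " was not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0
```

A file would be accepted only if both run(dtd_command("sample.xml")) and run(relaxng_command("sample.xml")) return True, where sample.xml stands for a transcription file.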

Other Schemas

A W3C XML Schema (.xsd file) provided by the TEI Consortium may be a future option for file validation.

A vendor should be able to validate XML files using a Schematron schema. The TEI and/or individual content holders may wish to provide Schematron schemas for enforcement of specific encoding practices.

Test Document

Potential vendors may quickly test their ability to validate a simple TEI Tite document using the following XML code. Using the TEI Tite DTD alone, it should be reported as valid; using the DTD together with the RELAX NG schema, it should be reported as invalid because date/@when has an illegal format (5/20/2009 rather than an ISO-style value such as 2009-05-20).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text SYSTEM "tei_tite.dtd">
<?oxygen RNGSchema="tei_tite.rng" type="xml"?>
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>This is an <i>italicized</i> word. Today is <date when="5/20/2009">20 May 2009</date>.</p>
  </body>
</text>

(The above XML code works with the oXygen editor; if you are using a different platform, you may need to substitute an appropriate processing instruction.)
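To illustrate why the test document fails the stricter check, the following sketch scans a document for date/@when values that are not in year, year-month, or year-month-day form. It is not a substitute for RELAX NG validation: the pattern is a deliberate simplification of the date formats the schema accepts, and the names ISO_DATE and bad_when_values are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Simplified stand-in for the schema's date types: YYYY, YYYY-MM, or YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

# The test document from this section. The parser does not fetch the DTD.
TEST_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text SYSTEM "tei_tite.dtd">
<?oxygen RNGSchema="tei_tite.rng" type="xml"?>
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>This is an <i>italicized</i> word. Today is <date when="5/20/2009">20 May 2009</date>.</p>
  </body>
</text>"""


def bad_when_values(xml_bytes):
    """Return the date/@when values that do not match the simplified pattern."""
    root = ET.fromstring(xml_bytes)
    values = (d.get("when") for d in root.iter("{%s}date" % TEI_NS))
    return [v for v in values if v is not None and not ISO_DATE.fullmatch(v)]


print(bad_when_values(TEST_DOC))  # prints ['5/20/2009']
```

Changing the attribute to when="2009-05-20" makes the function return an empty list, which corresponds to the form the RELAX NG schema expects.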