TEI TF on SGML to XML Migration: Tools Page


Contents

This page provides pointers to the tools recommended by the task force.

OpenSP

OpenSP (based on SP by James Clark, itself based on SGMLS by James Clark, which was based on ARCSGML by Charles Goldfarb) is maintained by the OpenJade project. It contains a number of related utilities: an SGML or XML parser, a normaliser and, of particular interest to this group, a utility for converting SGML documents to XML. This utility, called osx, has recently been enhanced by Jessica Hekman to include some features that are particularly useful for TEI legacy conversion. The software is distributed in source form or as Windows binaries from SourceForge.

tei2tei.xsl

An XSLT stylesheet written by Sebastian Rahtz, tei2tei.xsl is specifically designed to clean up the results of an SGML-to-XML transformation performed with sx/osx. It restores TEI element names to their proper mixed case and removes attributes that carry only their default values. The DOCTYPE declaration and DTD subset must still be replaced by hand.
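The two clean-up steps described above can be illustrated outside XSLT. The following is a minimal Python sketch, not a port of tei2tei.xsl: the MIXED_CASE and DEFAULT_ATTRS tables are hypothetical and deliberately tiny, the function names are invented for this example, and the sketch assumes that names arrive folded to a single case. A real conversion would need tables covering the whole DTD in use.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: case-folded name -> TEI mixed-case name.
# Only a handful of names are listed, purely for illustration.
MIXED_CASE = {
    "tei.2": "TEI.2",
    "teiheader": "teiHeader",
    "filedesc": "fileDesc",
    "titlestmt": "titleStmt",
    "teiform": "TEIform",
}

# Hypothetical (element, attribute) -> default value pairs.
DEFAULT_ATTRS = {
    ("div", "org"): "uniform",
    ("div", "sample"): "complete",
}


def restore_name(name):
    """Return the mixed-case form of a name, or its lower-case form if unknown."""
    folded = name.lower()
    return MIXED_CASE.get(folded, folded)


def clean(root):
    """Rename elements and attributes in place and drop default-valued attributes."""
    for elem in root.iter():
        elem.tag = restore_name(elem.tag)
        old_attrs = dict(elem.attrib)
        elem.attrib.clear()
        for name, value in old_attrs.items():
            name = restore_name(name)
            # Keep the attribute unless it merely repeats the assumed default.
            if DEFAULT_ATTRS.get((elem.tag, name)) != value:
                elem.set(name, value)
    return root
```

Applied to a tree parsed from upper-case output, clean() would leave a DIV element with ORG="uniform" and TYPE="chapter" as a div carrying only type="chapter". Like the stylesheet, this sketch does nothing about the DOCTYPE declaration.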

convert.bat

A sample Unix batch script that uses sx and the Saxon XSLT processor to transform SGML documents into XML. The sed command preserves entity references through sx processing (note that the current version of osx provides command-line options to control entity handling). DOCTYPE declarations and the DTD subset must be replaced by hand.
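The sed expression used in convert.bat is not reproduced on this page. One common way to keep entity references intact through a parser that would otherwise expand them is to disguise each reference before parsing and restore it afterwards. The Python sketch below shows that general idea only; the placeholder delimiters and function names are hypothetical, and it handles only references terminated by a semicolon.

```python
import re

# A general entity reference such as &eacute; (numeric forms like &#233; are skipped).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")

# Hypothetical placeholder delimiters; choose strings that cannot occur in the data.
OPEN, CLOSE = "[[ENT:", "]]"
PLACEHOLDER = re.compile(
    re.escape(OPEN) + r"([A-Za-z][A-Za-z0-9._-]*)" + re.escape(CLOSE)
)


def protect_entities(sgml_text):
    """Replace each &name; with a placeholder the parser will treat as plain text."""
    return ENTITY_REF.sub(lambda m: OPEN + m.group(1) + CLOSE, sgml_text)


def restore_entities(xml_text):
    """Turn the placeholders back into &name; references after conversion."""
    return PLACEHOLDER.sub(lambda m: "&" + m.group(1) + ";", xml_text)
```

In a pipeline, protect_entities would run on the SGML source before sx and restore_entities on the XML output afterwards. With the entity-handling options of current osx versions, this workaround may be unnecessary.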

wwp-store_sgml2xml.perl

A Perl script provided by Syd Bauman for converting files that conform to the Brown University Women Writers Project SGML DTD to XML. While not intended for general-purpose use, this program may work well in certain circumstances. Be sure to read the ‘known bugs’ and ‘limitations’ sections of the header comment.

xmlify

This shell script, provided by Lou Burnard, converts the SGML files of the British National Corpus. It runs each file through osx (preserving internal and external entity references) and then through an XSLT transformation that pretty-prints the output and replaces character entity references with numeric character references.
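The last step, replacing named character entities with numeric references, is done in XSLT by xmlify. As a rough illustration of the same substitution, here is a Python sketch that uses the HTML 4 entity table from the standard library as a stand-in. That table is an assumption made for this example: it overlaps with, but is not identical to, the ISO entity sets an SGML corpus is likely to declare, and the function name is invented here.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Rewrite known named character entities as hexadecimal numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Unknown names are kept as they are rather than guessed at.
            return match.group(0)
        return "&#x{:X};".format(name2codepoint[name])

    return ENTITY_REF.sub(replace, text)
```

Leaving unknown names unchanged means that any entity missing from the table still needs a declaration in the target document, which is usually preferable to silently dropping it.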


Last recorded change to this page: 2007-12-13  •  For corrections or updates, contact webmaster AT tei-c DOT org