1. The British National Corpus

This subdirectory contains some files demonstrating a particularly tricky SGML to XML conversion problem! It has been prepared for the use of the TEI Workgroup on SGML/XML Migration only; files herein are copyright and may not be re-distributed without permission. See the BNC website cited below for licensing conditions.

The British National Corpus (BNC) is a 100 million word corpus of modern British English, in which each distinct word has a part-of-speech (POS) code attached to it, as well as all the usual TEI flummery. This makes it Big. The DTD for the current release of the BNC is TEI compliant, inasmuch as

The documentation is included here in the original XML source and may also, more easily perhaps, be read at the BNC website starting from The BNC online user guide; in both cases there is a minor error, which I leave the discerning reader to discover.

This archive also contains:

On my system, in this directory, typing either of the following lines

nsgmls -s sgmldecl driver.sgm
nsgmls -s -c TEIcatalog sgmldecl driver.sgm
   
produces, satisfyingly, no messages other than a report of successful compilation. Your mileage may, as they say, vary.
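
For anyone who would rather script the check than type it, the two invocations above can be wrapped roughly as follows. This is only a sketch, not part of the archive: the helper names build_command and validate are hypothetical, it assumes nsgmls is on your PATH and that it is run from this directory, and it assumes that a non-zero exit status or anything written to standard error signals a problem.

```python
import shutil
import subprocess


def build_command(catalog=None):
    """Return the nsgmls argument list for the invocations shown above."""
    cmd = ["nsgmls", "-s"]  # -s: suppress normal output, report errors only
    if catalog is not None:
        cmd += ["-c", catalog]  # -c: use the named catalog file
    return cmd + ["sgmldecl", "driver.sgm"]


def validate(catalog=None):
    """Run nsgmls and return (ok, messages).

    Assumption: a non-zero exit status or any text on stderr means trouble.
    """
    if shutil.which("nsgmls") is None:
        return False, "nsgmls not found on PATH"
    result = subprocess.run(
        build_command(catalog), capture_output=True, text=True
    )
    ok = result.returncode == 0 and not result.stderr.strip()
    return ok, result.stderr


if __name__ == "__main__":
    # Try both forms: without a catalog, then with TEIcatalog.
    for catalog in (None, "TEIcatalog"):
        ok, messages = validate(catalog)
        print(" ".join(build_command(catalog)), "->", "clean" if ok else "problems")
        if messages:
            print(messages)
```

If either run reports problems, the messages printed are whatever nsgmls itself wrote, unaltered.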

Lou Burnard, on Guy Fawkes Day, 2002
