ED A35: Overview of TEI Main DTD Files

C. M. Sperberg-McQueen, 6 March 1993
rev. 16 March 1993

We have the following main DTD files (or classes of files); 'main' DTDs are used for texts, and are distinguished from 'auxiliary' DTDs, which are used for meta-information (writing system, feature system, structured header, tag set).

  tei2.dtd     - main DTD file, used by all TEI documents
  teigis2.ent  - defines parameter entities for TEI GIs
  teiclas2.ent - defines element classes and content models
  teiXX2.ent   - defines parameter entities for one tag set XX
                 (so: teipros2.ent, teidict2.ent, teilink2.ent ...)
  teiXX2.dtd   - defines elements and attlists for one tag set XX
                 (so: teipros2.dtd, teidict2.dtd, teilink2.dtd ...)

In the following discussion, I often use the name 'tagset' as a variable for

  'prose' | 'verse' | 'drama' | ... | 'textcrit' | 'physical'

TEI2.DTD has the following structure:

  I. Preliminaries
     declare TEI keywords (now: %INHERITED, %ISOdate)
     embed %TEI.elementNames (= system 'teigis2.ent'), which declares all the parameter entities for the GIs
     declare all tag-set selection entities (TEI.prose, TEI.verse, ...) as 'IGNORE'

  II. Define element classes
     embed %TEI.elementClasses (= system 'teiclas2.ent'), on which see below

  III. Define TEI.2, TEI.corpus.2 (or TEI.2.corpus, or whatever it is now named)

  IV. Embed local extensions, core tag sets, base tag set, and additional tag sets
     (embedding local extensions here provides a more convenient interface for those adding new tags: if they define them in entity %TEI.extensions, they are embedded here and can therefore use our content-model parameter entities; based on experience with Odd.dtd and P2X.dtd, I think this simplifies life a lot for the modifier.)

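The four-part structure of TEI2.DTD can be sketched roughly as follows. This is a minimal illustration and not the text of the actual file: the replacement texts given for the keywords, the entity name TEI.prose.dtd, and the empty default for TEI.extensions are assumptions made here for the example; only the section outline, the entity names TEI.elementNames, TEI.elementClasses, TEI.extensions, TEI.prose and TEI.verse, and the file names come from the description above.

```sgml
<!-- I. Preliminaries: keywords (replacement texts are assumed values) -->
<!ENTITY % INHERITED '#IMPLIED' >
<!ENTITY % ISOdate   'CDATA' >

<!-- Embed the file of GI entities -->
<!ENTITY % TEI.elementNames SYSTEM 'teigis2.ent' >
%TEI.elementNames;

<!-- Every tag-set selection entity defaults to IGNORE -->
<!ENTITY % TEI.prose 'IGNORE' >
<!ENTITY % TEI.verse 'IGNORE' >

<!-- II. Element classes -->
<!ENTITY % TEI.elementClasses SYSTEM 'teiclas2.ent' >
%TEI.elementClasses;

<!-- III. Declarations for TEI.2 and the corpus element go here -->

<!-- IV. Local extensions first, so that they can use the
     content-model entities declared in section II.
     The empty default is an assumption of this sketch. -->
<!ENTITY % TEI.extensions '' >
%TEI.extensions;

<!-- One marked section per tag set; TEI.prose.dtd is a
     hypothetical entity name used only in this sketch -->
<![ %TEI.prose; [
<!ENTITY % TEI.prose.dtd SYSTEM 'teipros2.dtd' >
%TEI.prose.dtd;
]]>
```

Because SGML keeps the first declaration of an entity and ignores later ones, a document could select the prose tag set under this sketch by declaring TEI.prose as 'INCLUDE' in its DTD subset, which is read before the declarations above.
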
TEIgis2.ent contains a long list of entity declarations, one for each GI; if we substitute a different file, we can translate all GIs into another language (but not attribute names, attribute values, or parameter entity names).

TEIclas2.ent defines the major parameter entities used in content models, including the entire class system of the core elements:

  I. Low-level classes (as in published fascicle of CO, definitions of hqinter, hqphrase, etc., to seg)

  II. High-level classes (more or less as in CO)

  III. Element classes marked for specific bases
     embed the %TEI.tagset.ent files, as appropriate. N.B. only one base may be declared at a time, unless the general or mixed base is used. The ent files are embedded thus:

       %TEI.dictionary.ent
       ]]>

     These files define the class ch.tagset and its model (e.g. %m.ch.dictionary), which contains all the chunk- and inter-level elements defined in the base (or: unique to the base -- if GI sets are disjoint, this is the same thing). More below on the contents of these files. Note that the general and mixed bases must embed all of these; it's not clear yet whether they will automatically embed all, or whether we can allow the user to specify exactly which bases should be included in a general file. Either way, these files must not define chunk.seq -- instead they will define tagset.seq (e.g. prose.seq, dictionary.seq), like this:

  IV. Declarations of common content models

       ]]>
       ]]>

TEIdict2.ent and other base.ent files declare any and all useful parameter entities, including at least:

  m.ch.dictionary (more generally m.ch.tagset, model group of the class ch.tagset).
    This includes all the chunk and inter-level elements which are unique to this base.

  dictionary.seq (will become the definition of chunk.seq -- this level of indirection is optional; a colleague persuaded me it would make the pattern of chunk.seq definitions in TEIclas2.ent more obvious, and it also helps ensure that each fascicle and dtd file will define what it treats as a chunk.seq)

  a.dictionary (attribute def of class dictionary) -- used in defining a.global (should not overlap with a.global; if it does, rules of inheritance will cause the a.global defs to win). If there are no global atts in a base, this is ''

TEItc2.ent and other additional.ent files similarly define useful parameter entities, but no m.ch.tagset or tagset.seq. They *do* declare

  a.textcrit (attributes of class textcrit) - used in a.global

Tags defined by any tag set (base or additional) will all be in one of the following groups, and will become part of the document grammar as indicated.

  phrases : phrase-level elements should be declared as members of one of the existing phrase-level classes (data, edit, hqphrase, loc, seg), OR declared as members of a new class, which is a member of class PHRASE. The elements will then be picked up in the entity m.phrase, and will be legal in phrase.seq and paraContent.

  chunks : should be declared as members of m.ch.tagset; they will then be picked up as part of tagset.seq, and thus of chunk.seq.

  inter-level elements : should be declared *both* as members of m.ch.tagset, which makes them part of tagset.seq and thus of chunk.seq; *and* as members of an existing inter-level class (bibl, hqinter, lists, notes) or of a new inter-level class, which will make them part of m.inter and thus of paraContent.

  others : elements which are none of the above may exist; they might be specialized subelements allowed only in certain crystals, etc. At some level, one of their legal ancestors must be a chunk or inter-level element (or allowed as an inclusion exception on TEXT or TEI.2).
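
The route by which a chunk reaches the document grammar (m.ch.tagset, then tagset.seq, then chunk.seq) might look something like the sketch below for the dictionary base. It is an illustration under stated assumptions, not the actual declarations: the element names entry and superEntry, the shape of the dictionary.seq model, and the entity name TEI.dictionary.ent as declared here are all hypothetical; only the names m.ch.dictionary, dictionary.seq, a.dictionary, chunk.seq and TEI.dictionary, and the file name teidict2.ent, come from the text.

```sgml
<!-- In the base's entity file (teidict2.ent in this sketch) -->

<!-- Chunk and inter-level elements unique to this base;
     the two element names are illustrative only -->
<!ENTITY % m.ch.dictionary 'entry | superEntry' >

<!-- The base defines its own sequence, never chunk.seq itself;
     the repetition shown is an assumed model -->
<!ENTITY % dictionary.seq '(%m.ch.dictionary;)+' >

<!-- This base contributes no global attributes -->
<!ENTITY % a.dictionary '' >


<!-- In the class file, inside the section for specific bases -->

<![ %TEI.dictionary; [
<!ENTITY % TEI.dictionary.ent SYSTEM 'teidict2.ent' >
%TEI.dictionary.ent;
<!-- Only here does the base's sequence become chunk.seq -->
<!ENTITY % chunk.seq '%dictionary.seq;' >
]]>
```

Keeping the chunk.seq declaration in the class file, one per base, is what makes the pattern visible in a single place: each marked section differs only in the name of the tagset.seq entity it refers to.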