29 Modifying the TEI DTD

Part 6

Technical Topics

These Guidelines provide an encoding scheme suitable for encoding a wide range of texts, and capable of being extended in well-defined and documented ways for texts that cannot be conveniently or appropriately encoded using what is provided. This chapter describes how the TEI encoding scheme may be modified and extended.

The formal mechanisms provided by these Guidelines are syntactic mechanisms for encoding information in electronic texts. The semantics associated with these syntactic mechanisms are described only informally in the accompanying text. Although the descriptions have been written with care, there will inevitably be cases where the intention of the contributors has not been conveyed with sufficient clarity to prevent users of the Guidelines from "extending" them in the sense of attaching slightly variant semantics to them.

Beyond this unintentional semantic extension, some of the elements provided can intentionally be used in a variety of ways; for example, the element <note> has an attribute type that can take on arbitrary string values, depending on how it is used in a document. A new type of annotation, therefore, requires no extension to the TEI document type definition.

Furthermore, there are several ways of combining and extending the existing syntactic mechanisms themselves. Earlier chapters have identified these:

This chapter explains how the supplied tag sets can be modified by suppressing elements, renaming elements, extending syntactic classes, and adding elements. The different techniques described here have different effects on the level of TEI conformance to be ascribed to a text, as described in chapter 28.

The TEI DTD is designed to support modification of the tag sets in a documented way that can be validated by an SGML parser. Those wishing to modify the tag sets must do so using only the mechanisms described in this chapter if the resulting documents are to be TEI conformant.

This approach to modification implies the existence of two versions of any DTD derived from the TEI tag sets. The first version is a fixed human-readable version. The TEI DTD fragments that have appeared earlier in these Guidelines are all presented in this version, save in the few cases where forward pointers to this chapter appear. This is the publication version of the DTD. Applications that do not use any of the modification mechanisms described in this chapter should use these DTD fragments.

The second version of a TEI DTD is automatically derived from the first by the introduction of parameter entities. These parameter entities are given values that contain small parts of the definition of the DTD. If these parameter entities are not modified, their default expansion (equivalent to the publication version) is used within the DTD. However, their definitions can be easily changed for a specific application. This is the modifiable version of the DTD. It is also directly usable by an SGML parser.

In the absence of any modification, the TEI core DTD and the additions to it embodied in the base and additional tag sets behave as follows:

  1. All the elements described in the relevant sections of these Guidelines are defined.
  2. The names of these elements are as given by these Guidelines.
  3. The content model of each element is as given by these Guidelines.
  4. The attributes of each element and the names and types of these attributes are as given in these Guidelines.
  5. The membership of element classes, and the resultant inheritance of attributes, are as given in these Guidelines.

The modification mechanisms allow the user to override these defaults in the following ways, while retaining conformance to the TEI Guidelines:

  1. The definition of elements may be suppressed, thereby deleting the associated elements from the modified DTD.
  2. The name (generic identifier) associated with an element may be changed, while preserving the semantics of the element. Note, however, that the new name may not clash with the default name of any element defined in these Guidelines.
  3. The global TEIform attribute should be used to specify the TEI name for such renamed elements.
  4. Those parts of the content model of an element which are specified by classes may be extended by adding members to the classes.
  5. Further attributes, together with their names and types, may be specified for an individual element, and existing attributes for an individual element may be renamed. Note that the new names may not clash with names of existing attributes for the element.
  6. Further attributes, together with their names and types, may be specified for the elements in a particular class and those inheriting characteristics from that class.

The modification mechanisms presented in this chapter are quite general, and may be used to make all the types of changes just listed. They can also be used to make more complete modifications of the DTD; if changes other than those listed above are made, however, the resulting text will no longer be TEI conformant.

These Guidelines provide for user modification of the TEI DTD largely by using SGML parameter entities at appropriate points in the DTD. It is not absolutely essential to understand them in detail to modify the TEI DTD (the examples later in this chapter can be followed cookbook fashion), but it will probably prove helpful.

Parameter entities are an SGML mechanism for allowing string substitution within markup declarations; they can thus be used to effect changes in declarations. A parameter entity, unlike an element, may be declared more than once in an SGML document type definition; if more than one declaration is given, the parser uses the first one it encounters. Since the declaration subset within the document is read before the external file containing the predefined DTD, an entity declaration in the DTD subset will take precedence over one in the external file. In the TEI DTD, the literal string which defines the model group for some elements, say <p>, is made the value of a parameter entity; the actual element declaration for <p> contains not the literal string itself, but a reference to the parameter entity (in this case, paraContent). If the document's DTD subset contains a declaration for paraContent, this will be used in preference to the standard definition within the external TEI DTD files. The redefinability of parameter entities accounts both for the TEI's use of parameter entities as the vehicle for effecting extensions, and for the separation of entity definitions from other material (to be defined below) that might be needed for certain modifications of the TEI DTD.
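
The precedence rule can be seen in a minimal sketch. The replacement texts shown here are placeholders chosen for illustration, not the actual TEI definitions, and the declaration of <p> is simplified; only the order of the declarations matters.

```sgml
<!-- In the document's DTD subset, which the parser reads first. -->
<!-- The replacement text is a hypothetical local model.         -->
<!ENTITY % paraContent '(#PCDATA | note)*' >

<!-- Later, in the external DTD file. The parser has already     -->
<!-- seen a declaration for paraContent, so this one is ignored. -->
<!ENTITY % paraContent '(#PCDATA)' >

<!-- The element declaration refers to the entity and therefore  -->
<!-- receives the replacement text given in the DTD subset.      -->
<!ELEMENT p             - -  (%paraContent;) >
```

Whether a particular redefinition leaves the resulting documents TEI conformant is a separate question, governed by the list of permitted modifications given above.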

Local modifications are most conveniently grouped into two files, one containing modifications to the TEI parameter entities, and the other containing new or changed declarations of elements and their attributes. Names for these files should be specified by the parameter entities TEI.extensions.ent and TEI.extensions.dtd, by declarations such as the following:

<!ENTITY % TEI.extensions.ent system 'project.ent' >
<!ENTITY % TEI.extensions.dtd system 'project.dtd' >
These declarations must be given in the document's DTD subset. The first will be needed for all of the modifications discussed below. The second is only required for the last type of extension described. The parameter entities thus defined are then referenced at an appropriate point in the compiling of the TEI DTD, as further described in section 3.3.
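
A document prolog using both extension files might look roughly like the following sketch. The document type name TEI.2, the system identifier tei2.dtd, and the TEI.prose base tag set selection are assumptions about the surrounding setup that this chapter does not itself specify; project.ent and project.dtd are the file names from the example above.

```sgml
<!DOCTYPE TEI.2 system 'tei2.dtd' [
  <!-- assumed: selection of a base tag set                   -->
  <!ENTITY % TEI.prose 'INCLUDE' >
  <!-- local modifications to the TEI parameter entities      -->
  <!ENTITY % TEI.extensions.ent system 'project.ent' >
  <!-- new or changed element and attribute declarations      -->
  <!ENTITY % TEI.extensions.dtd system 'project.dtd' >
]>
```

Because the DTD subset is read before the external DTD, these declarations are already in effect when the TEI DTD reaches the points at which the two entities are referenced.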

29.1 Kinds of Modification

The TEI DTD can be modified in four ways:

  1. deletion of elements,
  2. renaming of elements,
  3. extension of classes, and
  4. modification of content models or attribute lists.

These are described in the remaining sections of this chapter.

Each kind of modification changes the set of documents that can be parsed using the DTD. Each combination of the original TEI DTD fragments may be thought of as defining a certain set of documents. Each DTD resulting from a modified set of TEI DTD fragments allows a different set of documents to be parsed. The set of documents parsed by the original DTD may be properly contained in the set of documents parsed by a modified DTD, or vice versa. Modifications that have either of these two results are called clean modifications in the remainder of the chapter. Alternatively, the set of documents parsed by the original DTD might overlap the set of documents parsed by the modified DTD with neither being properly contained in the other. Modifications that have this result are called unclean modifications in the remainder of the chapter.

29.1.1 Suppressing Elements

The simplest way to modify the supplied tag sets is to suppress one or more of the supplied elements. In the modifiable version of the DTDs, every element declaration is enclosed by a marked section. The marked section is governed by one of the keywords IGNORE or INCLUDE, which is provided indirectly using a parameter entity. This parameter entity has the same name as the generic identifier of the element. Thus, the declaration for the paragraph element, <p>, occurs within a marked section "guarded" by the parameter entity p:

<![ %p; [
<!-- element and attlist declaration for p goes here   -->
]]>
The declaration given for these guarding entities in the modifiable version is INCLUDE in all cases. The construct above is interpreted thus: the first of the three lines is the opening of a marked section; when the parser encounters the section and sees the keyword INCLUDE as its guard (more precisely, sees a parameter entity the value of which is the keyword INCLUDE), the content of the marked section is parsed; the second line of the three is the content of the marked section; and the third line of the three is the closing of a marked section. If the guard is changed to IGNORE, the SGML parser will ignore the content of the marked section.

To delete the declaration of a generic identifier, and so suppress the element entirely, the entity that provides the guard on the marked section containing the element declaration must simply be set to IGNORE. For example, if the <note> element is not to be used in a particular application, the line

<!ENTITY % note 'IGNORE'>
must be inserted in the file which has its name given by the entity TEI.extensions.ent.
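
Put together, the override and the guarded declaration interact roughly as in the following sketch. The content of the marked section is abbreviated to a comment, as in the example above, and the default declaration of the guard is shown as the modifiable version is described as providing it.

```sgml
<!-- In the file named by TEI.extensions.ent, read first:        -->
<!ENTITY % note 'IGNORE' >

<!-- Later, in the modifiable TEI DTD. This default is not used, -->
<!-- because the entity note has already been declared.          -->
<!ENTITY % note 'INCLUDE' >

<!-- The guard now expands to IGNORE, so the parser skips the    -->
<!-- whole section and the element note is never declared.       -->
<![ %note; [
<!-- element and attlist declaration for note goes here -->
]]>
```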

Two different cases of deleting one or more elements from the TEI DTD can be identified. The first case involves deleting only elements that are optional wherever they appear in TEI documents. Deleting these is clean in the sense that documents that are parseable with the modified DTD can also be parsed according to the original TEI DTD. To say this another way, the set of documents matching the new DTD is contained in the set of documents matching the original DTD.

The second case involves deleting elements that are required in one or more of their appearances in TEI documents. Deleting these is unclean in that some documents that can be parsed according to the new DTD could not be parsed according to the original TEI DTD. To say this another way, the set of documents matching the new DTD neither contains nor is contained in the set of documents matching the original DTD.

29.1.2 Renaming Elements

In the modifiable version of the TEI DTD, elements are not referred to directly by their generic identifiers; instead, the modifiable version of the DTD makes use of parameter entities that expand to the standard generic identifiers. This allows renaming of elements by redefining the appropriate parameter entities. The names of parameter entities used for naming are formed by taking the standard generic identifier of the element and attaching the string "n." (for "name") as a prefix. Thus, the standard generic identifiers for paragraphs, notes and quotations, <p>, <note>, and <q>, are defined by declarations of the following form:

<!ENTITY % n.p    'p'>
<!ENTITY % n.note 'note'>
<!ENTITY % n.q    'q'>
These parameter entities are all contained within a file (teigis.ent) which is embedded during the compilation of a TEI DTD. To change the name of an element, therefore, all that is needed is to provide an overriding declaration for the appropriate parameter entity. For instance, the following declaration converts <note> to <annotation>:

<!ENTITY % n.note 'annotation'>
This declaration must be inserted in the file which has its name given by the entity TEI.extensions.ent.

Two different cases of renaming can be identified. The first case involves replacing existing names with names that are otherwise unused in the TEI name space. (This can be easily checked by looking in the index of the Guidelines.) Such a modification is clean in that the new DTD still accepts any document accepted by the publication DTD (given the appropriate renaming of elements). The new name cannot possibly conflict with the generic identifier of any other element, since there can be no other occurrences. To say this another way, the set of documents matching the new DTD is isomorphic to the set of documents matching the old DTD. The example given results in a clean modification because there is no element <annotation> specified in these Guidelines. It is also true that any document not using the renamed element which parses under the unmodified DTD will also parse under the modified DTD.

The second case involves introducing a name already used somewhere in a TEI tag set. This is unclean in that it changes what an existing generic identifier means. The name in question could not be declared by any tag set that is used in the document, as it is syntactically invalid to provide two declarations for the same element. The new generic identifier might, however, occur in some TEI tag set not currently included in the DTD used to parse the document. For example, if in some setting the element <note> were assigned the new name <fs> (because, say, notes are used in some technical document to record functional specifications) there might be no immediate problem. If, however, it was later decided to add the feature structure analysis tag set to the DTD used to parse the document, a name clash would occur. There would also be problems in interchanging the resulting documents without confusion.

As a special case, consider translating all of the generic identifiers for all elements into some other language, L. It may be, for example, that the word for "paragraph" in language L begins with the letter "s" and that the paragraph element is therefore renamed as <s>. By the definition just given, this would be an unclean modification because an element <s> already exists in the TEI DTD. However, this is clearly not a problem so long as all of the names are redefined at once and no collisions occur in the new name space: that is, provided that the TEI element <s> is renamed as some other string. This can be done by a total replacement of the file that contains the entity declarations for the names of the elements. This systematic replacement of names in the DTD must be followed by a systematic use (or replacement) of the new names in the document. To think about this in the terms used earlier, the set of documents matching the new DTD (with all names systematically changed in both the DTD and the documents) is isomorphic to the set of documents matching the original DTD with no names changed (in either the DTD or the documents).

The formal declarations of the parameter entities used for generic identifiers are contained in the file teigis.ent; since their names and replacement texts are fully predictable, these parameter entities are not individually documented in the reference section of these Guidelines. The parameter entity tei.elementNames is used to embed the file teigis.ent in the DTD. A re-declaration for this parameter entity may therefore be used to embed a different version of this file:

<!ENTITY % tei.elementNames system 'OTAgis.ent' >
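
A replacement file of this kind would contain one declaration for every TEI element. The fragment below is only a sketch: the new names are invented for illustration, and a real file would have to cover the whole name space so that no two elements end up with the same generic identifier.

```sgml
<!-- Excerpt from a hypothetical replacement for teigis.ent.     -->
<!-- The paragraph element takes over the name 's',              -->
<!ENTITY % n.p    's' >
<!-- so the TEI element s must be given some other name.         -->
<!ENTITY % n.s    'sunit' >
<!ENTITY % n.note 'annotation' >
<!-- Declarations for all remaining elements would follow.       -->
```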

If an element is renamed using the techniques described here, its declaration for the global TEIform attribute will be left undisturbed; the default value will therefore still be the standard TEI name for the element. TEI-aware application programs can thus process TEI-conformant documents which rename TEI elements, since by consulting the TEIform attribute value the application can learn the standard name for the element and process it accordingly.

In the normal course of events, the value of this attribute will never be specified in a TEI-conformant document; all occurrences will have the default value. In some special circumstances, it can be useful to specify a non-default value on some instances of an element; this allows application programs to process correctly a locally defined element which usually corresponds to one TEI element (which would be expressed by the default value) but sometimes to another TEI element (which would be expressed by explicit values attached to the element instance).
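
As an illustration, suppose <note> has been renamed <annotation> as in the example above. The first instance below relies on the default value; the second specifies a non-default one. The choice of bibl as the alternative TEI element, and the content shown, are hypothetical.

```sgml
<!-- TEIform defaults to 'note': nothing need be specified      -->
<annotation>The reading follows the second edition.</annotation>

<!-- explicit value: this instance is to be processed as a bibl -->
<annotation TEIform='bibl'>Second edition, 1851.</annotation>
```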

29.1.3 Class Extension

Section 3.7.2 introduced the concept of a class of elements that can appear in the same kinds of structural locations in a document. Each class has an associated entity, whose name is formed by prefixing the string "m." (for "model") to the name of the class. For example, the value of the parameter entity m.bibl is a list of the members of the class bibl.

In the modifiable version of the TEI DTD, an additional entity is defined for each model class. This additional entity also takes the name of the class, this time prefixed by the string "x." (for "extension"). The default value of these x-dot entities is always the empty string. A reference to the corresponding x-dot entity is always included within the replacement string for each m-dot entity. This enables an encoder to add new members to a class simply by declaring a new value for its associated x-dot entity.

For example, the class bibl has the three members <bibl>, <biblFull>, and <biblStruct>. Its content-model entity is defined thus:

<!ENTITY % x.bibl '' >
<!ENTITY % m.bibl '%x.bibl; bibl | biblFull | biblStruct' >
With the default value of the x-dot entity, this is the same as defining m.bibl with the replacement text bibl | biblFull | biblStruct.

An encoder can add an element to the class by providing a new declaration for the x-dot entity. For example, to add a new element called <myBibl>, this definition would be used:

<!ENTITY % x.bibl 'myBibl |' >
Note that the specification of an x-dot entity must always end with the vertical bar character (for alternation). The definition would be inserted at the appropriate place in the file associated with the entity TEI.extensions.ent. This changes the replacement text of m.bibl from its default value to myBibl | bibl | biblFull | biblStruct. If more than one element is to be added to a class, the x-dot entity for the class should be redefined as a list of the new generic identifiers, each one (including the last) followed by a vertical bar.
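
For instance, adding two elements to the class might be sketched as follows; myCitation is a hypothetical second element introduced only for this illustration.

```sgml
<!-- In the file named by TEI.extensions.ent: every new name, -->
<!-- including the last, is followed by a vertical bar.       -->
<!ENTITY % x.bibl 'myBibl | myCitation |' >

<!-- The replacement text of m.bibl then becomes:             -->
<!--   myBibl | myCitation | bibl | biblFull | biblStruct     -->
```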

Class extension is always clean in that the set of documents matching the DTD containing the extended class contains all of the documents matching the original DTD. A class can be extended either by adding an existing element to it, or by defining a new element (as described in the next section) and adding that element to the class.

29.1.4 New Content Models

Encoders can modify the content models that specify what is contained in an element or set of elements defined by the TEI DTD, modify the attributes of existing elements, or add new elements to the DTD.

Content models or attributes for existing elements are modified in two stages. First, the existing definition of the element must be deleted in the manner described in section 29.1.1. Second, a new definition of the element is given. This new definition must be inserted in the file associated with the entity TEI.extensions.dtd.

For example, suppose that symbolic designations to be marked with the element <term> can always be associated with a particular source. While the content model of the publication version of the TEI DTD is acceptable, the attribute list needs to be extended. To perform this modification, the following steps must be taken. The declaration

<!ENTITY % term 'IGNORE' >
must be inserted into the file associated with the entity TEI.extensions.ent. Then a new definition must be inserted into the file associated with the entity TEI.extensions.dtd. In this example, the definition will be the same as that given in 6.3.4, save for the addition of a new attribute, source.

<!ELEMENT term          - -  (%phrase.seq;) >
<!ATTLIST term               %a.global;
          type               CDATA               #IMPLIED
          source             CDATA               #IMPLIED       >

New elements are defined by inserting their definitions into the file associated with the entity TEI.extensions.dtd. To be usable, they must somehow be included in the model for some existing element. This can be done either by class extension (which can now be seen to be a restricted, special case of the process defined here) or by redefining the element(s) within which the new element is to be included.
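
Continuing the class extension example, the new element <myBibl> could be declared along these lines. The content model is a placeholder assumed for the sketch; %a.global; is used in the same way as in the declaration of <term> above.

```sgml
<!-- In the file named by TEI.extensions.ent: add the element -->
<!-- to the class bibl so that some content model allows it.  -->
<!ENTITY % x.bibl 'myBibl |' >

<!-- In the file named by TEI.extensions.dtd: declare it.     -->
<!ELEMENT myBibl        - -  (#PCDATA) >
<!ATTLIST myBibl             %a.global; >
```

Without the first declaration the element would be defined but could not appear anywhere in a document, since no existing content model would mention it.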

The set of documents matched by the modified DTD and the set of documents matched by the original DTD may be related in several different ways. The former may properly include the latter, or vice versa; either of these is a clean modification, because the set to be matched has become strictly larger or strictly smaller.

It is also possible that neither set contains the other: the two sets may have some documents in common, or none at all. Either of these is an unclean modification.

Radical revision is possible. It would be possible to remodel so that the <teiHeader> is not required, or that it is required but the minimal components described in chapter 5 are no longer required, or that no <text> element is required. In fact, the mechanism, if used in an extreme way, permits deletion of the entire set of TEI definitions and their replacement by an entirely different DTD! Such revisions would result in documents that are not TEI conformant in even the broadest sense, and it is not intended that encoders use the mechanism in this way.

29.2 Documenting the Modifications

When the modification mechanisms are used, their use must be documented. There are two ways in which information about the modifications is recorded.

The first record of the modifications is in the use of the extension files. The file associated with the entity TEI.extensions.ent contains the specifications of the parameter entities that are redefined to accomplish the modifications. This file should be structured in such a way that readers can easily identify any modifications that have been made. The following structure is recommended.

<!--  The following elements are deleted     -->
<!--  The following elements are renamed     -->
<!--  The following classes are extended     -->
<!--  The following elements are revised     -->
The appropriate parameter entity specifications should be entered after each comment in this file. The order of the comments corresponds to the order of the discussion in this chapter (roughly, from simple to complex). Setting the appropriate entity to IGNORE for each revised element is done in the last section of the file.
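
Filled in with the examples used in this chapter, such a file might look like the sketch below. The choice of <biblFull> as the deleted element is hypothetical and made only so that each part has an entry; whether deleting it is a clean modification is not considered here.

```sgml
<!--  The following elements are deleted     -->
<!ENTITY % biblFull 'IGNORE' >

<!--  The following elements are renamed     -->
<!ENTITY % n.note   'annotation' >

<!--  The following classes are extended     -->
<!ENTITY % x.bibl   'myBibl |' >

<!--  The following elements are revised     -->
<!ENTITY % term     'IGNORE' >
```

The new declarations for <myBibl> and the revised declarations for <term> would then go in the file associated with TEI.extensions.dtd.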

The file associated with the entity TEI.extensions.dtd should contain the DTD fragments for new and changed element definitions. The following structure is recommended.

<!--  The following declarations define new extensions    -->
<!--    (element and attlist specifications for new tags  -->
<!--    introduced in part 3 of the ent file go here)     -->
<!--  ...                                                 -->
<!--  The following declarations define revised tags      -->
<!--    (element and attlist specifications for tags      -->
<!--    mentioned in part 4 of the ent file go here)      -->
<!--  ...                                                 -->

These files give an SGML parser sufficient information to implement the modifications and are also useful in providing human readers with some indication of the changes made in the TEI DTD. Full documentation of any additional or modified elements should also be provided, using the ancillary tag set described in chapter 27.

