Guidelines

The TEI Infrastructure

This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented. It assumes some familiarity with XML and XML schemas (see chapter ) but is intended to be accessible to any user of these Guidelines. Other chapters supply further technical details, in particular chapter which describes the XML schema used to express the Guidelines themselves, and chapter which combines a discussion of modification and conformance issues with a description of the intended behaviour of an ODD processor; these chapters should be read by anyone intending to implement a new TEI-based system.

The TEI encoding scheme consists of a number of modules, each of which declares particular XML elements and their attributes. Part of an element's declaration includes its assignment to one or more element classes. Another part defines its possible content and attributes with reference to these classes. This indirection gives the TEI system much of its strength and its flexibility. Elements may be combined more or less freely to form a schema appropriate to a particular set of requirements. It is equally easy to add to a schema new elements which reference existing classes or elements, or to exclude from it some of the elements provided by any module it includes.

In principle, a TEI schema may be constructed using any combination of modules. However, certain TEI modules are of particular importance, and should always be included in all but exceptional circumstances: the module tei described in the present chapter is of this kind because it defines classes, macros, and datatypes which are used by all other modules. The core module, defined in chapter contains declarations for elements and attributes which are likely to be needed in almost any kind of document, and is therefore recommended for global use. The header module defined in chapter provides declarations for the metadata elements and attributes constituting the TEI Header, a component which is required for TEI conformance, while the textstructure module defined in chapter declares basic structural elements needed for the encoding of most book-like objects. Most schemas will therefore need to include these four modules.

The specification for a TEI schema is itself a TEI document, using elements from the module described in chapter : we refer to such a document informally as an ODD document, from the design goal originally formulated for the system: One Document Does it all. The TEI maintains stylesheets for the maintenance and processing of ODD documents, and these Guidelines are themselves maintained as such a document. As further discussed in , an ODD document can be processed to generate a schema expressed using any of the three schema languages currently in wide use (the XML DTD language, the ISO RELAX NG language, or the W3C Schema language), as well as to generate documentation such as the Guidelines and their associated web site.

The bulk of this chapter describes the TEI infrastructure module itself. Although it may be skipped at a first reading, an understanding of the topics addressed here is essential for anyone planning to take full advantage of the TEI customization techniques described in chapter .

The chapter begins by briefly characterizing each of the modules available in the TEI scheme. Section describes in general terms the method of constructing a TEI schema in a specific schema language such as XML DTD language, RELAX NG, or W3C Schema.

The next and largest part of the chapter introduces the attribute and element classes used to define groups of elements and their characteristics (section ).

Finally, section introduces the concept of macros, which are used to express some commonly used content models, and lists the datatypes used to constrain the range of legal values for TEI attributes (section ).

TEI Modules

These Guidelines define several hundred elements and attributes for marking up documents of any kind. Each definition has the following components: a prose description; a formal declaration, expressed using a special-purpose XML vocabulary defined by these Guidelines in combination with elements taken from the ISO schema language RELAX NG; and usage examples.

Each chapter of the Guidelines presents a group of related elements, and also defines a corresponding set of declarations, which we call a module. All the definitions are collected together in the reference sections provided as an appendix. Formal declarations for a given chapter are collected together within the corresponding module. For convenience, each element is assigned to a single module, typically for use in some specific application area, or to support a particular kind of usage. A module is thus simply a convenient way of grouping together a number of associated element declarations. In the simplest case, a TEI schema is made by combining a small number of modules, as further described in section below.

The following table lists the modules defined by the current release of the Guidelines:

Module name     Formal public identifier
analysis        Analysis and Interpretation
certainty       Certainty and Uncertainty
core            Common Core
corpus          Metadata for Language Corpora
dictionaries    Print Dictionaries
drama           Performance Texts
figures         Tables, Formulae, Figures
gaiji           Character and Glyph Documentation
header          Common Metadata
iso-fs          Feature Structures
linking         Linking, Segmentation, and Alignment
msdescription   Manuscript Description
namesdates      Names, Dates, People, and Places
nets            Graphs, Networks, and Trees
spoken          Transcribed Speech
tagdocs         Documentation Elements
tei             TEI Infrastructure
textcrit        Text Criticism
textstructure   Default Text Structure
transcr         Transcription of Primary Sources
verse           Verse

For each module listed above, the corresponding chapter gives a full description of the classes, elements, and macros which it makes available when it is included in a schema. Other chapters of these Guidelines explore other aspects of using the TEI scheme.

Defining a TEI Schema

To determine that an XML document is valid (as opposed to merely well-formed), its structure must be checked against a schema, as discussed in chapter . For a valid TEI document, this schema must be a conformant TEI schema, as further defined in chapter . Local systems may allow their schema to be implicit, but for interchange purposes the schema associated with a document must be made explicit. The method recommended by these Guidelines is to provide, explicitly or by reference, a TEI schema specification against which the document may be validated.

A TEI-conformant schema is a specific combination of TEI modules, possibly also including additional declarations that modify the element and attribute declarations contained by each module, for example to suppress or rename some elements. The TEI provides an application-independent way of specifying a TEI schema by means of the schemaSpec element defined in chapter . The same system may also be used to specify a schema which extends the TEI by adding new elements explicitly, or by reference to other XML vocabularies. In either case, the specification may be processed to generate a formal schema, expressed in a variety of specific schema languages, such as XML DTD language, RELAX NG, or W3C Schema. These output schemas can then be used by an XML processor such as a validator or editor to validate or otherwise process documents. Further information about the processing of a TEI formal specification is given in chapter .

A Simple Customization

The simplest customization of the TEI scheme combines just the four recommended modules mentioned above. In ODD format, this schema specification takes this form:
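A minimal sketch of such a specification, consistent with the description in the next paragraph, is given here. The value TEI for the start attribute is an assumption, chosen because TEI is the usual root element of a TEI document:

```xml
<!-- Sketch: the four recommended modules; start="TEI" is assumed -->
<schemaSpec ident="TEI-minimal" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```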

This schema specification contains references to each of four modules, identified by the key attribute on the moduleRef element. The schema specification itself is also given an identifier (TEI-minimal). An ODD processor will generate an appropriate schema from this set of declarations, expressed using the XML DTD language, the ISO RELAX NG language, the W3C Schema language, or in principle any other adequately powerful schema language. The resulting schema may then be associated with the document instance by one of a number of different mechanisms, as further described in chapter . The start point (or root element) of document instances to be validated against the schema is specified by means of the start attribute. Further information about the processing of an ODD specification is given in .

A Larger Customization

These Guidelines introduce each of the modules making up the TEI scheme one by one, and therefore, for clarity of exposition, each chapter focusses on elements drawn from a single module. In reality, of course, the markup of a text will draw on elements taken from many different modules, partly because texts are heterogeneous objects, and partly because encoders have different goals. Some examples of this heterogeneity include: a text may be a collection of other texts of different types, for example an anthology of prose, verse, and drama; a text may contain other smaller, embedded texts, for example a poem or song included in a prose narrative; some sections of a text may be written in one form, and others in a different form, for example a novel where some chapters are in prose, others take the form of dictionary entries, and still others the form of scenes in a play; an encoded text may include detailed analytic annotation, for example of rhetorical or linguistic features; an encoded text may combine a literal transcription with a diplomatic edition of the same or different sources; and the description of a text may require additional specialised metadata elements, for example when describing manuscript material in detail.

The TEI provides mechanisms to support all of these and many other use cases. The architecture permits elements and attributes from any combination of modules to co-exist within a single schema. Within particular modules, elements and attributes are provided to support differing views of the granularity of a text, for example: a definition of a corpus or collection as a series of TEI documents, sharing a common TEI header (see chapter ); a definition of composite texts which combine optional front- and back-matter with a group of collected texts, themselves possibly composite (see section ); and an element for the representation of embedded texts, where one narrative appears to float within another (see section ).

Subsequent chapters of these Guidelines describe in detail markup constructs appropriate for these and many other possible features of interest. The markup constructs can be combined as needed for any given project or set of applications.

For example, a project aiming to produce an ambitious digital edition of a collection of manuscript materials, including detailed metadata about each source, digital images of its content, a detailed transcription of each source, and a supporting biographical and geographical database, might need a schema combining several modules, as follows:
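The following sketch shows one plausible selection. The identifier ms-edition is hypothetical, and the choice of msdescription, transcr (which, as we understand it, provides the facsimile elements used to associate page images with a transcription), and namesdates is a reading of the project description above, not a prescribed list:

```xml
<!-- Hypothetical module selection for a manuscript edition project -->
<schemaSpec ident="ms-edition" start="TEI">
  <!-- the four modules recommended for almost every schema -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- detailed metadata about each manuscript source -->
  <moduleRef key="msdescription"/>
  <!-- transcription of primary sources, including page images -->
  <moduleRef key="transcr"/>
  <!-- supporting biographical and geographical data -->
  <moduleRef key="namesdates"/>
</schemaSpec>
```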

Alternatively, a simpler schema might be used for a part of such a project: those preparing the transcriptions, for example, might need only elements from the core, textstructure, and transcr modules, and might therefore prefer to use a simpler schema such as that generated by the following:
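A sketch of such a reduced specification follows. The identifier is hypothetical; the tei and header modules are retained as well, on the assumption that the transcribers still need the shared class definitions and a conformant TEI Header:

```xml
<!-- Hypothetical schema for the transcription team only -->
<schemaSpec ident="ms-transcription" start="TEI">
  <!-- infrastructure and metadata, recommended for all schemas -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <!-- the elements the transcribers actually use -->
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="transcr"/>
</schemaSpec>
```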

The TEI architecture also supports more detailed customization beyond the simple selection of modules. A schema may suppress elements from a module, suppress some of their attributes, change their names, or even add new elements and attributes. Detailed discussion of the kind of modification possible in this way is provided in and conformance rules relating to their application are discussed in . These facilities are available for any schema language (though some features may not be available in all languages). The ODD language also makes it possible to combine TEI and non-TEI modules into a single schema, provided that the non-TEI module is expressed using the RELAX NG schema language (see further ).
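As a hedged illustration of these mechanisms, the following sketch deletes one element, renames another, and removes one attribute. The particular targets (said, p, and the type attribute of list) and the new name para are arbitrary examples; the elementSpec, altIdent, and attDef constructs, with mode values of delete and change, are those of the ODD language:

```xml
<schemaSpec ident="TEI-custom" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- suppress an element supplied by the core module -->
  <elementSpec ident="said" module="core" mode="delete"/>
  <!-- rename an element: documents use <para> in place of TEI <p> -->
  <elementSpec ident="p" module="core" mode="change">
    <altIdent>para</altIdent>
  </elementSpec>
  <!-- suppress one attribute of an element -->
  <elementSpec ident="list" module="core" mode="change">
    <attList>
      <attDef ident="type" mode="delete"/>
    </attList>
  </elementSpec>
</schemaSpec>
```

Deletion narrows what the schema accepts. Renaming, by contrast, produces documents using non-TEI element names, which has consequences for conformance.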

The TEI Class System

The TEI scheme distinguishes about five hundred different elements. To aid comprehension, modularity, and modification, the majority of these elements are formally classified in some way. Classes are used to express two distinct kinds of commonality among elements. The elements of a class may share some set of attributes, or they may appear in the same locations in a content model. A class is known as an attribute class if its members share attributes, and as a model class if its members appear in the same locations. In either case, an element is said to inherit properties from any classes of which it is a member.

Classes (and therefore elements which are members of those classes) may also inherit properties from other classes. For example, supposing that class A is a member (or a subclass) of class B, any element which is a member of class A will inherit not only the properties defined by class A, but also those defined by class B. In such a situation, we also say that class B is a superclass of class A. The properties of a superclass are inherited by all members of its subclasses.
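In an ODD specification, such a relationship is declared on the subclass, which names its superclass in a memberOf element. The following sketch uses two hypothetical classes, att.example.A and att.example.B, and hypothetical attributes scope and level. Any element declared a member of att.example.A would carry both attributes:

```xml
<!-- Hypothetical superclass B: supplies the attribute level -->
<classSpec type="atts" ident="att.example.B" mode="add">
  <desc>hypothetical superclass</desc>
  <attList>
    <attDef ident="level" usage="opt">
      <desc>hypothetical attribute defined by class B</desc>
    </attDef>
  </attList>
</classSpec>

<!-- Hypothetical subclass A: declares itself a member of B, so its
     members inherit level as well as the scope attribute defined here -->
<classSpec type="atts" ident="att.example.A" mode="add">
  <desc>hypothetical subclass</desc>
  <classes>
    <memberOf key="att.example.B"/>
  </classes>
  <attList>
    <attDef ident="scope" usage="opt">
      <desc>hypothetical attribute defined by class A</desc>
    </attDef>
  </attList>
</classSpec>
```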

A basic understanding of the classes into which the TEI scheme is organized is strongly recommended and is essential for any successful customization of the system.

Attribute Classes

An attribute class groups together elements which share some set of common attributes. Attribute classes are given names beginning att. and are usually adjectival. For example, the members of the class att.canonical have in common a key and a ref attribute, both of which are inherited from their membership in the class rather than individually defined for each element. These attributes are said to be defined by (or inherited from) the att.canonical class. If another element were to be added to the TEI scheme for which these attributes were considered useful, the simplest way to provide them would be to make the new element a member of the att.canonical class. Note also that this method ensures that the attributes in question are always defined in the same way, taking the same default values etc., no matter which element they are attached to.
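The following sketch shows how such a new element might be declared in an ODD specification. The element name shipName, its namespace URI, and its content model are hypothetical. The content is expressed with the macroRef element of recent P5 releases; older specifications use RELAX NG patterns for the same purpose:

```xml
<!-- Hypothetical element: key and ref come from att.canonical,
     not from any attDef of its own -->
<elementSpec ident="shipName" mode="add" ns="http://example.org/ns/ships">
  <desc>(hypothetical) the name of a ship</desc>
  <classes>
    <memberOf key="att.canonical"/>
    <!-- joining a model class lets the element appear where names may -->
    <memberOf key="model.nameLike"/>
  </classes>
  <content>
    <macroRef key="macro.phraseSeq"/>
  </content>
</elementSpec>

<!-- In a document, the inherited ref attribute points to a canonical record -->
<s:shipName xmlns:s="http://example.org/ns/ships" ref="#beagle">HMS Beagle</s:shipName>
```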

Some attribute classes are defined within the tei infrastructural module and are thus globally available. Other attribute classes are specific to particular modules and thus defined in other chapters. Attributes defined by such classes will not be available unless the module concerned is included in a schema.

The attributes provided by an attribute class are those specified by the class itself, either directly, or by inheritance from another class. For example, the attribute class att.pointing.group provides attributes domains and targFunc to all of its members. This class is however a subclass of the att.pointing class, from which its members also inherit the attributes type and evaluate. Members of the class att.pointing will thus have these two attributes, while members of the class att.pointing.group will have all four.

Note that some modules define superclasses of an existing infrastructural class. For example, the global attribute class att.divLike makes attributes org, part, and sample available, while the att.metrical class, which is specific to the verse module, provides attributes met, real, and rhyme. Because att.metrical is defined as a superclass of att.divLike, all six of these attributes are available to members of att.divLike when the verse module is included in a schema: the declaration for att.metrical adds its three attributes to the three already defined by att.divLike. If, however, this module is not included in a schema, then att.divLike supplies only the three attributes first mentioned.

Attributes specific to particular modules are documented along with the relevant module rather than in the present chapter. One particular attribute class, known as att.global, is common to all modules, and is therefore described in some detail in the next section. A full list of all attribute classes is given in below.

Global Attributes

The following attributes are defined for every TEI element.

These attributes are optionally available for any TEI element; none of them is required.

Element Identifiers and Labels

The value supplied for the xml:id attribute must be a legal name, as defined in the World Wide Web Consortium's XML Recommendation. This means that it must begin with a letter, or the underscore character (_), and contain no characters other than letters, digits, hyphens, underscores, full stops, and certain combining and extension characters. (The colon is also by default a valid name character; however, it has a specific purpose in XML, to indicate namespace prefixes, and may not therefore be used in any other way within a name.)

In XML names (and thus the values of xml:id in an XML TEI document) uppercase and lowercase letters are distinguished, and thus partTime and parttime are two distinctly different names, and could (though perhaps unwisely) be used to denote two different element occurrences.
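For instance, the following two identifiers are distinct and may legitimately coexist in one document, although the practice is hardly advisable:

```xml
<!-- Valid: partTime and parttime differ in case and so are different names -->
<p xml:id="partTime">...</p>
<p xml:id="parttime">...</p>
```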

If two elements are given the same identifier, a validating XML parser will report a validity error. The following example, therefore, is not valid:

<p xml:id="PAGE1"><q>What's it going to be then, eh?</q></p>
<p xml:id="PAGE1">There was me, that is Alex, and my three droogs, that is Pete, Georgie, and Dim, ... </p>

For a discussion of methods of providing unique identifiers for elements, see section .

The n attribute also provides an identifying name or number for an element, but in this case the information need not be a legal xml:id value. Its value may be any string of characters; typically it is a number or other similar enumerator or label. For example, the numbers given to the items of a numbered list may be recorded with the n attribute; this would make it possible to record errors in the numeration of the original, as in a list of chapters (About These Guidelines, A Gentle Introduction to SGML, Verse, Drama, Spoken Materials, Print Dictionaries) transcribed from a faulty original in which the number 10 is used twice, and 11 is omitted.

The n attribute may also be used to record non-unique names associated with elements in a text, possibly together with a unique identifier as in the following example:
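A sketch of both usages follows. In the list, only the repetition of 10 and the absence of 11 come from the description above; the remaining numbers and the elided items are illustrative. In the second fragment, the label Notes and the identifiers are hypothetical:

```xml
<!-- Numbers recorded as printed: 10 appears twice and 11 never appears -->
<list type="ordered">
  <item n="1">About These Guidelines</item>
  <item n="2">A Gentle Introduction to SGML</item>
  <!-- ... -->
  <item n="9">Verse</item>
  <item n="10">Drama</item>
  <item n="10">Spoken Materials</item>
  <item n="12">Print Dictionaries</item>
</list>

<!-- n holds a non-unique label; xml:id keeps each element addressable -->
<div type="section" n="Notes" xml:id="ch3-notes">...</div>
<div type="section" n="Notes" xml:id="ch7-notes">...</div>
```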

As noted above there is no requirement to record a value for either the xml:id or the n attribute. Any XML processor can identify the sequential position of one element within another in an XML document without any additional tagging. An encoding in which each line of a long poem is explicitly labelled with its numerical sequence such as the following is therefore probably redundant.
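For illustration, an encoding of this kind is sketched below, using the opening stanza of Thomas Gray's Elegy Written in a Country Churchyard as sample text:

```xml
<!-- Probably redundant: each n merely restates the line's position,
     which any XML processor can compute from document order -->
<lg type="stanza">
  <l n="1">The curfew tolls the knell of parting day,</l>
  <l n="2">The lowing herd wind slowly o'er the lea,</l>
  <l n="3">The plowman homeward plods his weary way,</l>
  <l n="4">And leaves the world to darkness and to me.</l>
</lg>
```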

Language Indicators

The xml:lang attribute indicates the natural language and writing system applicable to the content of a given element. If it is not specified, the value is inherited from that of the immediately enclosing element. As a rule, therefore, it is simplest to specify the base language of the text on the TEI element, and allow most elements to take the default value for xml:lang; the language of an element then need be explicitly specified only for elements in languages other than the base language. It is strongly recommended that all language shifts in the source be explicitly identified by use of the xml:lang attribute, as further described in chapter .

The values used for the xml:lang attribute must be constructed in a particular way, using values from standard lists. See further .

The following two encodings convey the same information about the language of the text. In the first, the xml:lang attributes on the emph elements specify the same value as that on the parent p element, while in the second they inherit that value without specifying it.

... Both parties deprecated war, but one of them would make war rather than let the nation survive, and the other would accept war rather than let it perish, and the war came.
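The two encodings might look like the following sketch. The choice of which words are marked with emph is an assumption made for illustration:

```xml
<!-- First encoding: each emph repeats the parent's xml:lang value -->
<p xml:lang="en">... Both parties deprecated war, but one of them would
<emph xml:lang="en">make</emph> war rather than let the nation survive,
and the other would <emph xml:lang="en">accept</emph> war rather than
let it perish, and the war came.</p>

<!-- Second encoding: each emph inherits xml:lang="en" from the p element -->
<p xml:lang="en">... Both parties deprecated war, but one of them would
<emph>make</emph> war rather than let the nation survive, and the other
would <emph>accept</emph> war rather than let it perish, and the war came.</p>
```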

In the following example, by contrast, the xml:lang attribute on the term element must be given if we wish to record the fact that the technical terms used are Latin rather than English; no xml:lang attribute is needed on the q element, because it is in the same language as its parent.

The constitution declares that no bill of attainder or ex post facto law shall be passed. ...
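A sketch of the markup described, in which only ex post facto is treated as a Latin term (an assumption for illustration):

```xml
<!-- term needs xml:lang="la" because it differs from the English context;
     q inherits xml:lang="en" from p and needs no attribute -->
<p xml:lang="en">The constitution declares that <q>no bill of attainder
or <term xml:lang="la">ex post facto</term> law shall be passed</q>. ...</p>
```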

Note that additional information about a particular language may be supplied in the language element within the header (see section ).

Rendition Indicators

The rend attribute is used to give information about the physical presentation of the text in the source. In the following example, it is used to indicate that both the emphasized word and the proper name are printed in italics:

... Their motives might be pure and pious; but he was equally alarmed by his knowledge of the ambitious Bohemond, and his ignorance of the Transalpine chiefs: ...
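Sketched in markup, with the emphasized word assumed for illustration to be pure:

```xml
<!-- rend describes how the source printed each element -->
<p>... Their motives might be <emph rend="italic">pure</emph> and pious;
but he was equally alarmed by his knowledge of the ambitious
<name rend="italic">Bohemond</name>, and his ignorance of the
Transalpine chiefs: ...</p>
```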

If all or most emph and name elements are rendered in the text by italics, it will be more convenient to register that fact in the TEI header once and for all (using the rendition element discussed below) and specify a rend value only for any elements which deviate from the stated rendition.

Although the contents of the rend attribute are free text, in any given project, encoders are advised to adopt a standard vocabulary with which to describe typographic or manuscript rendition of the text.

The rendition element defined in may be used to hold such descriptions, expressed in free text, or using a formal language. A rendition element can then be associated with any element, either by default, or by means of the global rendition attribute. For example, a rendition element might contain the CSS declaration font-style: italic.
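A sketch of both kinds of association follows. The identifier rend-italic is hypothetical. The rendition element, with scheme set to css, sits within tagsDecl in the encoding description of the TEI header, where the render attribute of tagUsage can associate it by default with every occurrence of a given element:

```xml
<!-- In the TEI header (encodingDesc/tagsDecl) -->
<tagsDecl>
  <rendition xml:id="rend-italic" scheme="css">font-style: italic</rendition>
  <!-- Default association: every TEI emph element was rendered in italics -->
  <namespace name="http://www.tei-c.org/ns/1.0">
    <tagUsage gi="emph" render="#rend-italic"/>
  </namespace>
</tagsDecl>

<!-- In the text: explicit association via the global rendition attribute -->
<name rendition="#rend-italic">Bohemond</name>
```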

The rendition attribute always points to one or more rendition elements, each of which defines some aspect of the rendering or appearance of the text in its original form. These details may be described using a formal language, such as CSS () or XSL-FO (); in some other formal language developed for a specific project; or informally in running prose. Although languages such as CSS and XSL-FO are generally used to describe document output to screen or print, they nonetheless provide formal and precise mechanisms for describing the appearance of many source documents, especially print documents, but also many aspects of manuscript documents. For example, both CSS and XSL-FO provide mechanisms for describing typefaces, weight, and styles; character and line spacing; and so on.

If both rendition and rend attributes are provided for a given element, the latter always takes precedence. The rendition attribute is analogous to the X/HTML class attribute, which references style declarations in a Cascading Style Sheet. The rend attribute is analogous to the XHTML or HTML style attribute, which provides a mechanism for embedding inline rendition information at the point of use within a document. Note that, in either case, the TEI attributes describe the rendition or appearance of the source document, not intended output renditions, although often the two may be closely related.

Attribute classes: att.ascribed, att.canonical, att.dimensions, att.damaged, att.datable.w3c, att.datable, att.declarable, att.declaring, att.divLike, att.duration.w3c, att.duration, att.editLike, att.global, att.handFeatures, att.internetMedia, att.interpLike, att.measurement, att.naming, att.placement, att.segLike, att.sourced, att.spanning, att.tableDecoration, att.timed, att.transcriptional, att.translatable, att.typed, att.xmlspace.

Model Classes

As noted above, the members of a given TEI model class share the property that they can all appear in the same location within a document. Wherever possible, the content model of a TEI element is expressed not directly in terms of specific elements, but indirectly in terms of particular model classes. This makes content models simpler and more consistent; it also makes them much easier to understand and to modify.
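As a hedged illustration, the sketch below declares a hypothetical element exampleCaption whose mixed content is defined entirely by reference to text and model classes, using the pure ODD content elements of recent P5 releases:

```xml
<!-- Hypothetical element: its content model names model classes,
     not individual elements -->
<elementSpec ident="exampleCaption" mode="add" ns="http://example.org/ns/hypothetical">
  <desc>(hypothetical) a short caption containing phrase-level material</desc>
  <content>
    <!-- zero or more text nodes, g-like elements, phrase-level elements,
         or globally available elements, in any order -->
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <classRef key="model.gLike"/>
      <classRef key="model.phrase"/>
      <classRef key="model.global"/>
    </alternate>
  </content>
</elementSpec>
```

Any element later added to model.phrase, by whichever module, would thereby become available inside this element without any change to its declaration.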

Like attribute classes, model classes may have subclasses or superclasses. Just as elements inherit from a class the ability to appear in certain locations of a document (wherever the class can appear), so all members of a subclass inherit the ability to appear wherever any superclass can appear. To some extent, the class system thus provides a way of reducing the whole TEI galaxy of elements to a tidy hierarchy. This is, however, not entirely the case.

In fact, the nature of a given class of elements can be considered along two dimensions: as noted, it defines a set of places where the class members are permitted within the document hierarchy; it also implies a semantic grouping of some kind. For example, the very large class of elements which can appear within a paragraph comprises a number of other classes, all of which have the same structural property, but which differ in their field of application. Some are related to highlighting, while others relate to names or places, and so on. In some cases, the set of places where class members are permitted is very constrained: it may just be within one specific element, or one class of element, for example. In other cases, elements may be permitted to appear in very many places, or in more than one such set of places.

These factors are reflected in the way that model classes are named. If a model class has a name containing part, such as model.divPart or model.biblPart then it is primarily defined in terms of its structural location. For example, those elements (or classes of element) which appear as content of a div constitute the model.divPart class; those which appear as content of a bibl constitute the model.biblPart class. If, however, a model class has a name containing like, such as model.biblLike or model.nameLike, the implication is that its members all have some additional semantic property in common, for example containing a bibliographic description, or containing some form of name, respectively. These semantically-motivated classes often provide a useful way of dividing up large structurally-motivated classes: for example, the very general structural class model.pPart.data (data elements that form part of a paragraph) has four semantically-motivated member classes (model.addressLike, model.dateLike, model.measureLike, and model.nameLike), the last of these being itself a superclass with three members.

Although most classes are defined by the tei infrastructure module, a class cannot be populated unless some other specific module is included in a schema, since element declarations are contained by modules. Classes are not declared top down, but instead gain their members as a consequence of individual elements' declaration of their membership. The same class may therefore contain different members, depending on which modules are active. Consequently, the content model of a given element (being expressed in terms of model classes) may differ depending on which modules are active.

Some classes contain only a single member, even when all modules are loaded. One reason for declaring such a class is to make it easier for a customization to add new member elements in a specific place, particularly in areas where the TEI does not make fully elaborated proposals. For example, the TEI class model.rdgLike, initially empty, is expanded by the textcrit module to include just the TEI rdg element. A project wishing to add an alternative way of structuring text-critical information could do so by defining their own elements and adding them to this class.
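Such an addition might be sketched as follows. The element name variantReading, its namespace, and its content model are hypothetical. Membership in model.rdgLike is what would allow it to appear wherever the class is referenced, such as within the app element of the textcrit module:

```xml
<!-- Hypothetical project element joining model.rdgLike alongside rdg -->
<elementSpec ident="variantReading" mode="add" ns="http://example.org/ns/textcrit">
  <desc>(hypothetical) an alternative structure for a textual variant</desc>
  <classes>
    <memberOf key="model.rdgLike"/>
  </classes>
  <content>
    <macroRef key="macro.paraContent"/>
  </content>
</elementSpec>
```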

Another reason for declaring single-member classes is where the class members are not needed in all documents, but appear in the same place as elements which are very frequently required. For example, the specialised element g used to represent a non-Unicode character or glyph is provided as the only member of the model.gLike class when the gaiji module is added to a schema. References to this class are included in almost every content model, since, if it is used at all, the g element must be available wherever text is available; however, these references have no effect unless the gaiji module is loaded.

At the other end of the scale, a few of the classes predefined by the tei module are subsequently populated with very many members. For example, the class model.pPart groups all the classes of element which can appear within a p or paragraph element. The core module alone adds more than fifty elements to this class; the namesdates module adds another twenty, as does the tagdocs module. Since the p element is one of the basic building blocks of a TEI document, it is not surprising that each module needs to add elements to this class. The class system here provides a very convenient way of controlling the resulting complexity. Typically, elements are not added directly to these very general classes, but via some intermediate semantically-motivated class.

Just as there are a few classes which have a single member, so there are some classes which are used only once in the TEI architecture. These classes, which have no superclass and therefore do not fit into the class hierarchy defined here, are a convenient way of maintaining elements which are highly structured internally, but which appear from the outside to be uniform objects like others at the same level. (In former editions of these Guidelines, such elements were known metaphorically as crystals.) Members of such classes can only ever appear within one element, or one class of elements. For example, the class model.addrPart is used only to express the content model for the element address; it references some other classes of elements, which can appear elsewhere, and also some elements which can only appear inside an address.

Basic Model Classes

The TEI class system makes the following threefold division of elements:

divisions: high-level, possibly self-nesting, major divisions of texts. These elements populate the classes model.divLike, model.div1Like, etc.

chunks: elements such as paragraphs and other paragraph-level elements, which can appear directly within texts or within such divisions, but not within other chunks. These elements populate the class model.divPart, either directly or by means of other classes such as model.pLike (paragraph-like elements), model.entryLike, etc.

phrase-level elements: elements such as highlighted phrases, book titles, or editorial corrections, which can occur only within chunks (paragraphs or paragraph-level elements), but not between them (and thus cannot appear directly within a division). These elements populate the class model.phrase. (Note that in this context, phrase means any string of characters, and can apply to individual words, parts of words, and groups of words indifferently; it does not refer only to linguistically-motivated phrasal units. This may cause confusion for readers accustomed to applying the word in a more restrictive sense.)

The TEI identifies the following fundamental groupings derived from these three:

inter-level elements: elements such as lists, notes, quotations, etc., which can appear either between chunks (as children of a div) or within them; these elements populate the class model.inter. Note that this class is not a superset of the model.phrase and model.chunk classes but rather the group of elements which are both chunk-like and phrase-like; the classes model.phrase, model.pLike, and model.inter are all disjoint.

components: elements which can appear directly within texts or text divisions; this is a combination of the inter- and chunk-level elements defined above. These elements populate the class model.common, which is defined as a superset of the classes model.divPart, model.inter, and (when the dictionaries module is included in a schema) model.entryLike. Broadly speaking, the front, body, and back of a text each comprise a series of components, optionally grouped into divisions.

As noted above, some elements and element classes belong to none of these groupings; however, over two-thirds of the 500+ elements defined in the present edition of these Guidelines are classified in this way. Future editions of these recommendations will extend and develop this classification scheme.

A complete alphabetical list of all model classes is provided in .

Model classes: model.nameLike.agent, model.segLike, model.hiLike, model.emphLike, model.highlighted, model.dateLike, model.measureLike, model.egLike, model.graphicLike, model.offsetLike, model.pPart.msdesc, model.pPart.editorial, model.pPart.transcriptional, model.pPart.edit, model.ptrLike, model.lPart, model.global.meta, model.milestoneLike, model.gLike, model.oddDecl, model.oddRef, model.phrase.xml, model.specDescLike, model.biblLike, model.handDescPart, model.headLike, model.labelLike, model.listLike, model.noteLike, model.lLike, model.pLike, model.stageLike, model.featureVal.complex, model.featureVal.single, model.entryPart, model.entryPart.top, model.global.edit, model.global.spoken, model.divPart, model.persTraitLike, model.persStateLike, model.persEventLike, model.personLike, model.personPart, model.placeTraitLike, model.placeNamePart, model.placeStateLike, model.placeEventLike, model.publicationStmtPart, model.glossLike, model.quoteLike, model.qLike, model.rdgLike, model.respLike, model.divWrapper, model.divTopPart, model.divTop, model.frontPart.drama, model.pLike.front, model.divBottomPart, model.divBottom, model.titlepagePart, model.msItemPart, model.choicePart, model.recordingPart, model.imprintPart, model.catDescPart, model.settingPart, model.textDescPart, model.castItemPart, model.physDescPart, model.addressLike, model.nameLike, model.global, model.featureVal, model.biblPart, model.frontPart, model.addrPart, model.pPart.data, model.inter, model.common, model.phrase, model.limitedPhrase, model.divLike, model.divGenLike, model.div1Like, model.div2Like, model.div3Like, model.div4Like, model.div5Like, model.div6Like, model.div7Like.

Macros

The infrastructure module defined by this chapter also declares a number of macros, or shortcut names for frequently occurring parts of other declarations. Macros are used in two ways in the TEI scheme: to stand for frequently-encountered content models, or parts of content models (); and to stand for attribute datatypes ().

Standard Content Models

As far as possible, the TEI schemas use the following set of frequently-encountered content models to help achieve consistency among different elements.

The present version of the TEI Guidelines includes some 500 different elements. The following table shows the seven most commonly used content models, in descending order of frequency.

Content model | Number of elements using it | Description
macro.phraseSeq | 83 | any combination of text with elements from the model.gLike, model.global, or model.phrase classes
macro.paraContent | 49 | macro.phraseSeq with the addition of model.inter
empty | 39 | elements that have no content
macro.specialPara | 24 | macro.paraContent with the addition of model.divPart
macro.phraseSeq.limited | 24 | a subset of macro.phraseSeq appropriate for use in non-transcriptional contexts
text | 21 | plain untagged text
macro.xtext | 19 | any combination of text with elements from the model.gLike class

Content model macros: macro.paraContent, macro.limitedContent, macro.phraseSeq, macro.phraseSeq.limited, macro.specialPara, macro.xtext.
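Several macros in the table are defined by extending one another. The sketch below records only what the table's descriptions state. Each content model is treated as a set of permitted items, with "#text" standing for character data. The Python names are illustrative. macro.limitedContent and macro.phraseSeq.limited are omitted because their members are not spelled out here.

```python
from typing import Dict, FrozenSet

TEXT = "#text"  # stands for character data in these sketches

# Content models as sets of permitted items, per the table's descriptions.
MACROS: Dict[str, FrozenSet[str]] = {}
MACROS["macro.xtext"] = frozenset({TEXT, "model.gLike"})
MACROS["macro.phraseSeq"] = frozenset({TEXT, "model.gLike", "model.global", "model.phrase"})
MACROS["macro.paraContent"] = MACROS["macro.phraseSeq"] | {"model.inter"}
MACROS["macro.specialPara"] = MACROS["macro.paraContent"] | {"model.divPart"}

# Number of elements using each content model, as given in the table.
USAGE: Dict[str, int] = {
    "macro.phraseSeq": 83,
    "macro.paraContent": 49,
    "empty": 39,
    "macro.specialPara": 24,
    "macro.phraseSeq.limited": 24,
    "text": 21,
    "macro.xtext": 19,
}


def extends(wider: str, narrower: str) -> bool:
    """True if every item allowed by `narrower` is also allowed by `wider`."""
    return MACROS[narrower] <= MACROS[wider]
```

For example, extends("macro.specialPara", "macro.xtext") holds. Any element whose content is macro.xtext could therefore be given macro.specialPara without invalidating existing documents.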

Datatype Macros

The values which attributes may take in a TEI schema are defined, for the most part, by reference to a TEI datatype. Each such datatype is defined in terms of other primitive datatypes, derived mostly from W3C Schema Datatypes, literal values, or other datatypes. This indirection makes it possible for a TEI application to set constraints either globally or in individual cases, by redefining the datatype definition or the reference to it respectively. In some cases, the TEI datatype includes additional usage constraints which cannot be enforced by existing schema languages, although a TEI-compliant processor should attempt to validate them (see further discussion in chapter ).
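The following Python sketch illustrates the effect of this indirection. It assumes a simplified check for data.count and a hypothetical datatype name, data.smallCount. Attributes refer to datatypes only by name. Redefining a datatype therefore constrains every attribute that uses it, while re-pointing a single attribute constrains only that attribute.

```python
import re
from typing import Callable, Dict

Check = Callable[[str], bool]


def count_check(limit: float = float("inf")) -> Check:
    """Non-negative whole number, optionally capped (the cap is a customisation)."""
    return lambda v: re.fullmatch(r"[0-9]+", v) is not None and int(v) <= limit


class Schema:
    """Attributes refer to datatypes by name; datatypes map names to checks."""

    def __init__(self) -> None:
        self.datatypes: Dict[str, Check] = {"data.count": count_check()}
        self.attributes: Dict[str, str] = {
            "table/@cols": "data.count",
            "cell/@cols": "data.count",
        }

    def validate(self, attribute: str, value: str) -> bool:
        # The datatype is looked up at validation time, so redefinitions apply.
        return self.datatypes[self.attributes[attribute]](value)

    def redefine_datatype(self, name: str, check: Check) -> None:
        """Global change: affects every attribute that uses this datatype."""
        self.datatypes[name] = check

    def rebind_attribute(self, attribute: str, datatype: str, check: Check) -> None:
        """Local change: point a single attribute at a different datatype."""
        self.datatypes[datatype] = check
        self.attributes[attribute] = datatype


if __name__ == "__main__":
    s = Schema()
    s.redefine_datatype("data.count", count_check(99))
    print(s.validate("table/@cols", "120"), s.validate("cell/@cols", "120"))  # False False
    s.rebind_attribute("cell/@cols", "data.smallCount", count_check(9))
    print(s.validate("cell/@cols", "12"), s.validate("table/@cols", "12"))    # False True
```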

Where literal values or name tokens are used in a datatype definition, an associated value list supplies definitions for the significance of suggested or (in the case of closed lists) all possible values.

TEI-defined datatypes may be grouped into those which define normalised values for numeric quantities, probabilities, or temporal expressions, those which define various kinds of shorthand codes or keys, and those which define pointers or links.

The following datatypes are used for attributes which are intended to hold normalized values of various kinds. First, expressions of quantity or probability: data.certainty, data.probability, data.numeric, and data.count.

Examples of attributes using the data.probability datatype include degree on damage or certainty; examples of data.numeric include quantity on members of the att.measurement class or value on numeric; examples of data.count include cols on cell and table.
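The following sketch shows what checking these four datatypes might involve. Its definitions come from our reading of the TEI specifications rather than from the text above, and should be treated as assumptions:

- data.certainty is the closed list high, medium, low, unknown.
- data.probability is a number from 0 to 1.
- data.numeric also admits fractions such as 3/4.
- data.count is a non-negative integer.

The number syntax is simplified and does not accept INF or NaN.

```python
import re

# Simplified decimal / floating-point syntax (assumption: no INF or NaN).
_NUMBER = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
_FRACTION = re.compile(r"-?[0-9]+/-?[0-9]+")


def is_certainty(value: str) -> bool:
    """data.certainty, assumed to be the closed list high/medium/low/unknown."""
    return value in {"high", "medium", "low", "unknown"}


def is_probability(value: str) -> bool:
    """data.probability: a number between 0 and 1 inclusive."""
    return _NUMBER.fullmatch(value) is not None and 0.0 <= float(value) <= 1.0


def is_numeric(value: str) -> bool:
    """data.numeric: a decimal or floating-point number, or a fraction like 3/4."""
    return _NUMBER.fullmatch(value) is not None or _FRACTION.fullmatch(value) is not None


def is_count(value: str) -> bool:
    """data.count: a non-negative whole number."""
    return re.fullmatch(r"[0-9]+", value) is not None
```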

Next, the datatypes used for attributes which are intended to hold normalized dates or times, durations, truth values, languages, or sex: data.temporal.w3c, data.duration.w3c, data.truthValue, data.xTruthValue, data.language, and data.sex.

Note that in each of these cases the values used are those recommended by existing international standards: ISO 8601 as profiled by XML Schema Part 2: Datatypes Second Edition in the case of durations, times, and dates; W3C Schema datatypes in the case of truth values; BCP 47 in the case of languages; and ISO 5218 in the case of sex.
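The sketch below illustrates values drawn from these standards. Truth values and durations are checked against the lexical forms defined in XML Schema Part 2 (xsd:boolean and xsd:duration). Sex codes are checked against ISO 5218 (0 not known, 1 male, 2 female, 9 not applicable). It assumes that data.xTruthValue adds the values unknown and inapplicable. Dates, times, and BCP 47 language tags are not covered.

```python
import re

# xsd:boolean lexical forms.
_BOOLEAN = {"true", "false", "1", "0"}

# xsd:duration: "P", optional Y/M/D, optional "T" followed by H/M/S. At least
# one component is required, and "T" must be followed by a time component.
_DURATION = re.compile(
    r"-?P(?=[0-9]|T[0-9])"
    r"(?:[0-9]+Y)?(?:[0-9]+M)?(?:[0-9]+D)?"
    r"(?:T(?=[0-9])(?:[0-9]+H)?(?:[0-9]+M)?(?:[0-9]+(?:\.[0-9]+)?S)?)?"
)


def is_truth_value(value: str) -> bool:
    return value in _BOOLEAN


def is_x_truth_value(value: str) -> bool:
    # Assumed extension of data.truthValue with two additional values.
    return is_truth_value(value) or value in {"unknown", "inapplicable"}


def is_duration(value: str) -> bool:
    return _DURATION.fullmatch(value) is not None


def is_iso5218_sex(value: str) -> bool:
    return value in {"0", "1", "2", "9"}
```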

The following datatypes have more specialised uses: data.namespace, data.outputMeasurement, data.pattern, and data.pointer.

By far the largest number of TEI attributes take values which are coded values or names of some kind. These values may be constrained or defined in a number of different ways, each of which is given a different name, as follows:

The attribute key provided by the att.canonical class is currently the only attribute of type data.key. It is used to supply an externally defined identifier, such as a database key or filename. Because such identifiers are externally defined, no constraints are placed on their possible values: any string of Unicode characters may be used. Any constraints on their values, such as the rules for constructing a valid database key in a particular system, may be documented by a tagUsage element in the TEI Header, but are not enforced by the datatype as defined here. Such system-specific constraints may however be added to a TEI schema by using the customisation techniques described in .

Attributes of type data.word, such as age on person, are used to supply an identifier expressed as any kind of single token or word. The TEI places a few constraints on the characters which may be used for this purpose: only Unicode characters classified as letters, digits, punctuation characters, or symbols can appear in an attribute value of this kind. Note in particular that such values cannot include whitespace characters. Legal values include cholmondeley, été, 1234, _content, or xml:id, but not grand wazoo. Attributes of this kind are sometimes used to associate (by co-reference) elements of different types.
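Python's unicodedata module can express the character constraint described above. This sketch follows the prose description only (letters, decimal digits, punctuation, symbols). It may differ in detail from the pattern used in the formal TEI definition.

```python
import unicodedata


def is_tei_word(value: str) -> bool:
    """Sketch of data.word as described above: a non-empty run of characters,
    each a Unicode letter (L*), decimal digit (Nd), punctuation (P*) or
    symbol (S*). Whitespace (Z*) and control characters (C*) are rejected."""
    if not value:
        return False
    for ch in value:
        category = unicodedata.category(ch)
        if not (category[0] in "LPS" or category == "Nd"):
            return False
    return True
```

Combining marks (category M) are not among the listed categories. This sketch therefore rejects a decomposed é (e followed by U+0301) while accepting the precomposed U+00E9.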

Attributes of type data.name are also words in this sense, but they have the additional constraint that they must be legal XML names, as defined by the XML 1.0 specification or its successors. As such, they may not begin with a digit or with punctuation characters other than the underscore or the colon. Legal values include cholmondeley, été, e_content, or xml:id, but not grand wazoo or 1234. Attributes of this kind are typically used to represent XML element or attribute names.
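A data.name value can be checked against the Name production of XML 1.0 (Fifth Edition), whose character ranges are reproduced in the sketch below. This production permits a leading underscore or colon, so _content is a legal name. A leading digit, hyphen, or full stop is not permitted.

```python
import re

# NameStartChar and NameChar ranges from the XML 1.0 (Fifth Edition) Name
# production, embedded directly into regular-expression character classes.
_NAME_START = (
    "A-Z_a-z:\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def is_tei_name(value: str) -> bool:
    """Sketch of data.name: the value must match the XML Name production."""
    return _XML_NAME.fullmatch(value) is not None
```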

Attributes of type data.enumerated, such as new on shift or evidence supplied by att.editLike, have the same definition as data.word above, with the added constraint that the word supplied is taken from a specific list of possibilities. In each case, the element or class specification which includes the definition for the attribute will also contain a list of possible values, together with a prose description of their intended significance. This list may be open (in which case the list is advisory), or closed (in which case it determines the range of legal values). In this latter case, the datatype will not be data.enumerated, but an explicit list of the possible values.
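The following sketch shows the difference between open and closed value lists. The ValList type and the check_enumerated function are hypothetical. The suggested values shown for evidence (internal, external, conjecture) are assumed rather than taken from the text above. As noted, an attribute with a closed list is not strictly of type data.enumerated, but the same check covers both cases.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ValList:
    """Values documented for an attribute. An open list is advisory;
    a closed list defines the only legal values."""
    values: FrozenSet[str]
    closed: bool


def check_enumerated(value: str, vallist: ValList) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, warning) for a single enumerated attribute value."""
    # Same base constraint as data.word: a single, non-empty token.
    if not value or any(ch.isspace() for ch in value):
        return False, None
    if value in vallist.values:
        return True, None
    if vallist.closed:
        return False, None
    return True, f"{value!r} is not one of the suggested values"


# Assumed suggested values for evidence (att.editLike); check the current spec.
EVIDENCE = ValList(frozenset({"internal", "external", "conjecture"}), closed=False)
```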

Attributes of type data.code are similar in function, in that they also supply encoded names for values which are defined in more detail elsewhere. In this case, however, the full definition is supplied as content of another XML element, typically but not necessarily in the same document, and it is referenced by means of a pointer.
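Resolving such a pointer amounts to finding the element whose xml:id matches it. The following sketch uses Python's xml.etree.ElementTree on a small hypothetical document. The element names codeList, codeDef, and item, and the attribute code, are invented for illustration. Only same-document pointers of the form #id are handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The xml:id attribute as ElementTree reports it (the xml prefix is predefined).
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# A hypothetical document: the code "#rom" is defined once, in another element.
DOC = """<doc>
  <codeList>
    <codeDef xml:id="rom">Roman numerals</codeDef>
    <codeDef xml:id="ara">Arabic numerals</codeDef>
  </codeList>
  <item code="#rom">XIV</item>
</doc>"""


def index_ids(root: ET.Element) -> Dict[str, ET.Element]:
    """Map every xml:id value in the tree to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}


def resolve_code(pointer: str, ids: Dict[str, ET.Element]) -> ET.Element:
    """Follow a same-document pointer of the form '#id' to its definition."""
    if not pointer.startswith("#"):
        raise ValueError(f"only same-document pointers are handled here: {pointer!r}")
    try:
        return ids[pointer[1:]]
    except KeyError:
        raise LookupError(f"no element with xml:id {pointer[1:]!r}") from None
```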

Datatypes for coded values and names: data.key, data.word, data.code, data.name, data.enumerated.

An attribute may, of course, take more than one value of a given type, for example a list of pointer values, or a list of words. In the TEI scheme, this information is regarded as a property of the datatype element used to document the attribute in question rather than as a distinct datatype. See further .
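Multiplicity is a property of the attribute's documentation rather than a separate datatype. A validator therefore applies one per-item check to each whitespace-separated token and checks the number of tokens separately. The sketch below assumes occurrence bounds of the kind we understand TEI ODD to express with minOccurs and maxOccurs on the datatype element. The pointer check is hypothetical.

```python
import math
from typing import Callable

UNBOUNDED = math.inf  # stands in for maxOccurs="unbounded"


def check_list(value: str, item_ok: Callable[[str], bool],
               min_occurs: int = 1, max_occurs: float = 1) -> bool:
    """Validate a whitespace-separated attribute value against a per-item
    check and minimum/maximum occurrence bounds."""
    tokens = value.split()
    if not min_occurs <= len(tokens) <= max_occurs:
        return False
    return all(item_ok(token) for token in tokens)


def is_local_pointer(token: str) -> bool:
    """Hypothetical per-item check: a same-document pointer such as '#p1'."""
    return token.startswith("#") and len(token) > 1
```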

The TEI Infrastructure Module

The tei module defined by this chapter is a required component of any TEI schema. It provides declarations for all datatypes, and initial declarations for the attribute classes, model classes, and macros used by other modules in the TEI scheme. Its components are listed below in alphabetical order:

Module tei (TEI Infrastructure): declarations for classes, datatypes, and macros available to all TEI modules.

The order in which declarations are made within the infrastructure module is critical, since several class declarations refer to others, which must therefore precede them. Other constraints on the order of declarations derive from the way in which the modularity of the TEI scheme is implemented in different schema languages. The XML DTD fragment implementing this TEI module makes extensive use of parameter entities and marked sections to effect a kind of conditional construction; the RELAX NG schema fragment similarly predeclares a number of patterns with null (notAllowed) values. These issues are further discussed in chapter .
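Requiring each class to be declared after the classes it refers to is a topological-ordering problem. The sketch below applies Python's graphlib module (Python 3.9 or later) to a small, illustrative subset of class memberships. It models only the ordering constraint, not the parameter-entity or notAllowed mechanisms described above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each class maps to the classes its declaration refers to (its member classes).
# A small illustrative subset of TEI memberships, not the full set.
REFERS_TO: Dict[str, Set[str]] = {
    "model.highlighted": {"model.hiLike", "model.emphLike"},
    "model.phrase": {"model.highlighted"},
    "model.divPart": {"model.pLike"},
    "model.common": {"model.divPart", "model.inter"},
}


def declaration_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Order classes so that each is declared after every class it refers to.
    Raises graphlib.CycleError if the references are circular."""
    return list(TopologicalSorter(graph).static_order())
```

No such ordering exists if the references are circular; graphlib signals this by raising CycleError.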