[1] Not all members listed were able to serve throughout the development of the Guidelines.

[2] This Workgroup was jointly sponsored by the Association for History and Computing.

[3] TEI documents bear identifying numbers which indicate the provenance of the document (here simply `TEI', in other cases the TEI work group number, e.g. `TEI AI5'), the type of document (here `U' and `T', meaning users' guide or users' manual and sample text(s)), and a sequential number. The TEI document number of the document in hand is TEI P3 (for TEI public proposal number 3).

[4] International Organization for Standardization, ISO 8879: Information processing---Text and office systems---Standard Generalized Markup Language (SGML), ([Geneva]: ISO, 1986).

[5] Work is currently going on in the standards community to create (using SGML syntax) a definition of a standard ``document style semantics and specification language'' or DSSSL.

[6] The actual characters used for the delimiting characters (the angle brackets, exclamation mark and solidus) may be redefined, but it is conventional to use the characters used in this description.

[7] The example is taken from William Blake's Songs of innocence and experience (1794). The markup is designed for illustrative purposes and is not TEI-conformant.

[8] Note that this simple example has not addressed the problem of marking elements such as sentences explicitly; the implications of this are discussed below in section 2.5.2.

[9] Like the delimiters, these are assigned formal names by the standard and may be redefined with an appropriate SGML declaration.

[10] What are here called ``group connectors'' are referred to by the SGML standard simply as ``connectors''; the longer term is preferred here to stress the fact that these connectors are used only in SGML model groups and name groups. Like the delimiters and the occurrence indicators, group connectors are assigned formal names by the standard and may be redefined with an appropriate SGML declaration.

[11] It will not have escaped the astute reader that the fact that verse paragraphs need not start on a line boundary seriously complicates the issue; see further section 2.5.2.

[12] By convention case is significant in entity names, unlike element names.

[13] Strictly speaking, SGML does not require system identifiers to be operating-system file names, nor need external entities be files at all; they can in principle be any data source available to the SGML processor: files, results of database queries, results of calls to system functions, Web pages --- anything at all. System identifiers can be any method of naming the entity which the SGML parser's interface to its operating environment can use to elicit the data from the environment. It is simpler, however, when first learning SGML, to think of external entities as referring to files, and this discussion therefore ignores the other possibilities. All existing SGML processors do support the use of external entities to refer to files; fewer support the other possible uses of external entities.

[14] Such entity declarations might be used in extending the TEI base tag set for prose using the declarations found in mystuff.dtd.

[15] This is so because the declarations in the DTD subset are read before those in the external DTD file, and the first declaration of a given entity is the one which counts. This was described briefly in section 2.7.
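
The rule that the first declaration wins can be illustrated with a minimal Python sketch. It is not an SGML parser: the function name `resolve_entities`, the representation of declarations as (name, replacement text) pairs, and the sample values are assumptions made only for illustration.

```python
def resolve_entities(dtd_subset, external_dtd):
    """Return the effective entity table when the DTD subset is read
    before the external DTD file and the first declaration counts."""
    effective = {}
    # The DTD subset is processed first, then the external DTD file.
    for name, value in list(dtd_subset) + list(external_dtd):
        # A later declaration of an already-declared entity is ignored.
        effective.setdefault(name, value)
    return effective


# Hypothetical declarations, written as (name, replacement text) pairs.
subset = [("TEI.prose", "INCLUDE")]
external = [("TEI.prose", "IGNORE"), ("TEI.verse", "IGNORE")]
effective = resolve_entities(subset, external)
print(effective)
```

Here the value given for `TEI.prose` in the subset overrides the one in the external file, while `TEI.verse`, declared only in the external file, keeps its value.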

[16] The SGML Open catalog format described here is documented in SGML Open Technical Resolution 9401:1997, Entity Management, which is available from the SGML Open Web site at http://www.sgmlopen.org.

[17] A parameter entity is an SGML entity used only in markup declarations; references to parameter entities are delimited by a percent sign and a semicolon rather than the ampersand and semicolon used for general entity references. The entity TEI.core.ent, for example, would be referred to using the string %TEI.core.ent;. Parameter entities can also be used to control the inclusion or exclusion of marked sections of the document or DTD; the TEI DTD uses marked sections to handle the selection of different base and additional tag sets.
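
As a rough illustration of the two reference forms, the following Python sketch picks out parameter entity references (percent sign, name, semicolon) and general entity references (ampersand, name, semicolon) from a string. The regular expressions assume the default delimiters, names made of letters, digits, full stops and hyphens, and a closing semicolon that is always present; the sample string and the name `find_references` are hypothetical.

```python
import re

# Assumed name syntax: a letter followed by letters, digits, '.' or '-'.
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.\-]*);")  # e.g. %TEI.core.ent;
GE_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")  # e.g. &eacute;


def find_references(text):
    """Return (parameter entity names, general entity names) found in text."""
    return PE_REF.findall(text), GE_REF.findall(text)


pe, ge = find_references("%TEI.core.ent; caf&eacute;")
print(pe, ge)
```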

[18] More exactly, these are the attributes of the element class global, to which all elements belong; for further discussion of attribute classes and ways in which attributes may be inherited and over-ridden, see section 3.7.1.

[19] A dummy element class TEIform is defined in the reference section, solely for documentary purposes.

[20] SGML validation checks that all IDREF values exist as id values on elements somewhere in the current SGML document. It is a requirement of the TEI scheme, not of SGML, that the lang attribute point to a <language> element.

[21] The TEIform attribute is based on the notion of architectural forms developed for HyTime (ISO 10744).

[22] Because the details of their pointing mechanism differ, the members of the pointer class do not, however, share their pointing attributes.

[23] Note that in this context, phrase means any string of characters, and can apply to individual words, parts of words, and groups of words indifferently; it does not refer only to linguistically motivated phrasal units. This may cause confusion for readers accustomed to applying the word in a more restrictive sense.

[24] It is expected that after completion of the full text of these Guidelines, the TEI will prepare alternate sets of generic identifiers in languages other than English. It should be noted, however, that in the interests of simplicity parameter entities are used only for generic identifiers; attribute names, standard attribute values, and parameter entity names are less easily modified.

[25] Defined by ISO 8601: 1988, Data elements and interchange formats --- Information interchange --- Representation of dates and times ([Geneva]: International Organization for Standardization, 1988).

[26] The most widely used such entity set is to be found in Annex D to ISO 8879; it is also reproduced or summarized in most SGML textbooks, notably Charles F. Goldfarb, The SGML Handbook (Oxford: Clarendon Press, 1990). A list of some frequently used standard entity names may be found in chapter 37. Extensive entity sets are being developed by the TEI and others are being documented in the fascicles of ISO/TR 9573: Technical Report: Information processing --- SGML support facilities --- Techniques for using SGML ([Geneva]: ISO, 1988 et seq.).

[27] Strictly speaking the semicolon is not always required; for details see any full treatment of SGML.

[28] Reversible transliteration schemes are defined by national and international standards too numerous to list here; another useful source is ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts, approved by the Library of Congress and the American Library Association, tables compiled and edited by Randall K. Barry (Washington: Library of Congress, 1991). Many of these schemes, however, use diacritics which are themselves not always available in standard electronic character sets and thus may require careful adaptation for use in electronic work.

[29] Thesaurus Linguæ Græcæ, Beta Manual (Irvine: TLG, [1988]). See also Luci Berkowitz and Karl A. Squitier, Thesaurus Linguæ Græcæ Canon of Greek Authors and Works, 2nd edition (Oxford: Oxford University Press, 1986).

[30] ISO 639: 1988, Code for the representation of names of languages ([Geneva]: International Organization for Standardization, 1988). The most recent version of this standard supplies three-letter codes as well as the earlier system of two-letter codes. Either may be used in TEI documents.

[31] Standard methods for character-set shifting are defined in ISO 2022: 1986, Information processing --- ISO 7-bit and 8-bit coded character sets --- Code extension techniques, 3d ed. ([Geneva]: International Organization for Standardization, 1986). A standard set of control functions, including methods of specifying script direction (left-right, right-left, top-down, etc.), is defined by ISO 6429: 1992, Information processing --- Control functions for 7-bit and 8-bit coded character sets ([Geneva]: ISO, 1992). These and related standards are recommended to the notice of those designing systems to handle multiple character sets.

[32] ISO (International Organization for Standardization). ISO 646: 1991. Information processing --- ISO 7-bit coded character set for information interchange. ([Geneva]: International Organization for Standardization, 1991.)

[33] ISO (International Organization for Standardization), and IEC (International Electrotechnical Commission), ISO/IEC 10646-1: 1993. Information technology --- Universal Multiple-Octet Coded Character Set (UCS) --- Part 1: Architecture and Basic Multilingual Plane. ([Geneva]: International Organization for Standardization, 1993.) The Basic Multilingual Plane of ISO 10646 is identical to the sixteen-bit character set known as Unicode: The Unicode Consortium, The Unicode Standard: Worldwide Character Encoding Version 1.0, Volume 1 (Reading, Mass.: Addison-Wesley, 1991). Version 1.0 of Unicode has been superseded by later work incorporated into ISO/IEC 10646; these changes are reflected in the official published standard, and will be documented in version 1.1 of Unicode.

[34] On the relation between the TEI proposals and other standards for bibliographic description, see further section 5.7.

[35] Michael Gorman and Paul W. Winkler, eds., Anglo-American Cataloguing Rules, Second Edition (Chicago: American Library Association; London: Library Association; Ottawa: Canadian Library Association, 1978).

[36] Agencies compiling catalogues of machine-readable files are recommended to use available authority lists, such as the Library of Congress Name Authority List, for all common personal names.

[37] Arabic numerals separated by punctuation are recommended for this purpose (e.g., 6.19.33, not VI/xix:33).

[38] This implies that the <tagsDecl> will normally only appear in the header of individual texts in a <teiCorpus>.

[39] On the milestone tag itself, what are here referred to as `variables' are identified by the combination of the ed and unit attributes.

[40] The SUBDOC keyword indicates to an SGML parser that this entity contains SGML data which must be parsed using some other DTD than the current one: in this case, the DTD defined in chapter 26 rather than the view of the TEI DTD defined for the document itself.

[41] Although the way in which a spoken text is performed (for example, the voice quality, loudness, etc.) might be regarded as analogous to `highlighting' in this sense, these Guidelines recommend distinct elements for the encoding of such `highlighting' in spoken texts. See further section 11.2.6.

[42] The Oxford English Dictionary documents the phrase `to come down' in the sense `to bring or put down; esp. to lay down money; to make a disbursement' as being in use, mostly in colloquial or humorous contexts, from at least 1700 to the latter half of the 19th century.

[43] See, for example, Sociolinguistics/Soziolinguistik (An international handbook of the science of language and society. Ein internationales Handbuch zur Wissenschaft von Sprache und Gesellschaft) (Berlin, New York: De Gruyter, 1988), I, pp. 271 and 274.

[44] In some contexts, the term `regularization' has a narrower and more specific significance than that proposed here: the <reg> element may be used for any kind of regularization, including normalization, standardization, and modernization.

[45] See chapter 14 for a discussion of these elements and the extended syntax they provide for `hypertext' links.

[46] Many encoders find it convenient to retain the line breaks of the original during data entry, to simplify proof-reading, but this may be done without inserting a tag for each line break of the original.

[47] The legal form of identifiers depends in part on the SGML declaration. With appropriate modifications in the declaration, other characters may be made legal in identifiers; this is allowed though not encouraged in TEI-conformant documents.

[48] For example, to distinguish `London' as an author's name from `London' as a place of publication or as a component of a title.

[49] Among the bibliographic software systems and subsystems consulted in the design of the <biblStruct> structure were BibTeX, Scribe, and ProCite. The distinctions made by all three may be preserved in <biblStruct> structures, though the nature of their design prevents a simple one-to-one mapping from their data elements to TEI elements. For further information, see section 6.10.4.

[50] American National Standard for Bibliographic References, ANSI Z39.29-1977 (New York: American National Standards Institute, 1977), p. 34 (sec. A.2.2.1).

[51] The analysis is not wholly unproblematic: as the text of the standard points out, the first subordinate title is subordinate only to the parallel title in French, while the second is subordinate to both the English main title and the French parallel title, without this relationship being made clear, either in the markup given in the example or in the reference structure offered by the standard.

[52] The BibTeX scheme is intentionally compatible with that of Scribe, although it omits some fields used by Scribe. Hence only one list of fields is given here.

[53] This convention (corresponding with the idea that a type-set document may begin either with a ``level 0'' or a ``level 1'' heading) is provided for convenience and compatibility with some widely used formatting systems.

[54] This decision should be recorded in the <sampling> element of the header.

[55] As with all lists of `suggested values' for attributes, it is recommended that software written to handle TEI-conformant texts be prepared to recognize and handle these values when they occur, without limiting the user to the values in this list.

[56] Definition of such a tag set remains a work item for the TEI; such tag sets for contemporary printed matter already exist or are being created within the publishing industry, for example the Majour (Modular Application for Journals) Project of the European Workgroup on SGML. See for example MAJOUR: Modular Application for Journals: DTD for Article Headers ([n.p.]: EWS, 1991).

[57] For discussion of other attributes of this class, see 7.1.4.

[58] Another way of avoiding this problem would be to use un-numbered <div> elements; see further 7.1.

[59] It even has a name: anapaestic substitution.

[60] For a discussion of several of these see J. A. Edwards and M. D. Lampert, eds., Talking Language: Transcription and Coding of Spoken Discourse (Hillsdale, N.J.: Lawrence Erlbaum Associates, 1993); Stig Johansson, Encoding a Corpus in Machine-Readable Form, in Computational Approaches to the Lexicon: An Overview, ed. B. T. S. Atkins et al. (Oxford: Oxford University Press, forthcoming); and Stig Johansson et al. Working Paper on Spoken Texts, document TEI AI2 W1, 1991.

[61] The original is a conversation between two children and their parents, recorded in 1987, and discussed in Brian MacWhinney, CHAT Manual ([Pittsburgh]: Dept of Psychology, Carnegie-Mellon University, 1988), pp. 87ff.

[62] For the most part, the examples in this chapter use no sentence punctuation except to mark the rising intonation often found in interrogative statements; for further discussion, see section 11.3.3.

[63] For details see S. Boase, London-Lund Corpus: Example Text and Transcription Guide (London: Survey of English Usage, University College London, 1990).

[64] The term was apparently first proposed by Bengt Loman and Nils Jørgensen, in Manual for analys och beskrivning av makrosyntagmer (Lund: Studentlitteratur, 1971), where it is defined as follows: ``A text can be analysed as a sequence of segments which are internally connected by a network of syntactic relations and externally delimited by the absence of such relations with respect to neighbouring segments. Such a segment is a syntactic unit called a macrosyntagm'' (trans. S. Johansson).

[65] Laura Gavioli and Gillian Mansfield, The Pixi Corpora (Bologna: Cooperativa Libraria Universitaria Editrice, 1990), p. 74.

[66] We refer the reader to previous and current discussions of a common format for encoding dictionaries. For example, Robert A. Amsler and Frank W. Tompa, An SGML-Based Standard for English Monolingual Dictionaries, in Information in Text: Fourth Annual Conference of the U[niversity of] W[aterloo] Centre for the New Oxford English Dictionary October 26-28, 1988, Waterloo, Canada, pp. 61-79; Nicoletta Calzolari et al., Computational Model of the Dictionary Entry: Preliminary Report, Acquilex: Esprit Basic Research Action No. 3030, Six-Month Deliverable, Pisa, April 1990; John Fought and Carol Van Ess-Dykema, Toward an SGML Document Type Definition for Bilingual Dictionaries, TEI working paper TEI AIW20 (available from the TEI); Nancy Ide and Jean Veronis, Encoding Print Dictionaries, Computers and the Humanities (special TEI issue --- to appear); Nancy Ide, Jacques Le Maitre, and Jean Veronis, Outline of a Model for Lexical Databases, (Information Processing and Management, 29, 2, 159-186, 1993); Nancy Ide, Jean Veronis, Susan Warwick-Armstrong, Nicoletta Calzolari, Principles for Encoding machine readable dictionaries, Proceedings of the Fifth EURALEX International Congress, EURALEX'92 (to appear), University of Tampere, Finland; and The DANLEX Group, Descriptive tools for electronic processing of dictionary data, in Lexicographica, Series Maior (Tübingen: Niemeyer, 1987).

[67] It is unlikely that many conventional dictionaries will require smaller divisions, but all the usual division elements <div2> through <div7> may be used.

[68] Each example taken from a real dictionary indicates its source using the following abbreviations for dictionary names:

To simplify the electronic presentation of this document on systems with limited character sets, many of the pronunciations are presented using the transliteration found in the electronic edition of the Oxford Advanced Learner's Dictionary. Also, the middle dot in quoted entries is rendered with a full stop, while within the sample transcriptions hyphenation and syllabification points are indicated with |, regardless of their rendition in the source text.

[69] Complications of sequence caused by marginal or interlinear insertions and deletions, which are frequent in manuscripts, or by unconventional page layouts, as in concrete poetry, magazines with imaginative graphic designers, and texts about the nature of typography as a medium, typically do not occur in dictionaries, and so are not discussed here.

[70] This is a slight oversimplification. Even in conservative transcriptions, it is common to omit page numbers, signatures of gatherings, running titles and the like. The simple description above also elides, for the sake of simplicity, the difficulties of assigning a meaning to the phrase ``original sequence'' when it is applied to the printed characters of a source text; the ``original sequence'' retained or recovered from a conservative transcription of the editorial view is, of course, the one established during the transcription by the encoder.

[71] The omission of rendition text is particularly common in systems for document production; it is considered good practice there, since automatic generation of rendition text is more reliable and more consistent than attempting to maintain it manually in the electronic text.

[72] This document is reprinted in TermNet News, no 40, 1993, pp 5-64; copies are also available from Infoterm, z.Hd. Herrn Dr. Gerhard Budin, Heinestraße 38, Postfach No. 130, A-1021 Vienna, Austria.

[73] In this example, as in the others, white space has been liberally used for the sake of legibility; in practice most actual encodings would use less white space.

[74] ISO 10241, Preparation and layout of international terminology standards, 1993.

[75] We use the term alignment as a special case of the more general notion of correspondence. Let A stand for an element with the attribute id=A, and suppose elements A1, A2 and A3 occur in that order and form one group, while elements B1, B2 and B3 occur in that order and form another group. Then a relation in which A1 corresponds to B1, A2 corresponds to B2 and A3 corresponds to B3 is an alignment. On the other hand, a relation in which A1 corresponds to B2, B1 to C2, and C1 to A2 is not an alignment.
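
The distinction can be sketched in Python under one stated assumption: an alignment is taken here to be a correspondence whose pairs each link a member of the first group to a member of the second, and which preserves the order of both groups. This reading goes beyond what the note states explicitly, and the function name `is_alignment` is hypothetical.

```python
def is_alignment(group_a, group_b, pairs):
    """Return True if pairs is an order-preserving correspondence
    between the ordered groups group_a and group_b (assumed definition)."""
    pos_a = {ident: i for i, ident in enumerate(group_a)}
    pos_b = {ident: i for i, ident in enumerate(group_b)}
    try:
        # Sort the pairs by the position of their first member in group_a.
        positions = sorted((pos_a[a], pos_b[b]) for a, b in pairs)
    except KeyError:
        # A pair mentions an identifier outside the two groups.
        return False
    # Positions must increase strictly in both groups from pair to pair.
    return all(p[0] < q[0] and p[1] < q[1]
               for p, q in zip(positions, positions[1:]))


group_a = ["A1", "A2", "A3"]
group_b = ["B1", "B2", "B3"]
aligned = is_alignment(group_a, group_b,
                       [("A1", "B1"), ("A2", "B2"), ("A3", "B3")])
crossed = is_alignment(group_a, group_b, [("A1", "B2"), ("A2", "B1")])
mixed = is_alignment(group_a, group_b,
                     [("A1", "B2"), ("B1", "C2"), ("C1", "A2")])
print(aligned, crossed, mixed)
```

The first relation is the alignment given in the note; the second is an added illustration of a crossing correspondence; the third is the note's counter-example, which this sketch rejects because its pairs do not all run from the first group to the second.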

[76] The type attribute on the note is used to classify the notes using the typology established in the Advertisement to the work: ``The Imitations of the Ancients are added, to gratify those who either never read, or may have forgotten them; together with some of the Parodies, and Allusions to the most excellent of the Moderns.'' In the source text, the text of the poem shares the page with two sets of notes, one headed ``Remarks'' and the other ``Imitations''.

[77] No special element is provided for this purpose at present: the information should be supplied as a series of paragraphs at the end of the <encodingDesc> element described in section 5.3.

[78] HyTime is an international standard (ISO 10744) built on SGML. It provides facilities for representing both static and dynamic information for processing and interchange by hypertext and multimedia applications. See ISO/IEC 10744 Information Technology --- Hypermedia/Time-based Structuring Language (HyTime) ([Geneva]: International Organization for Standardization, 1992).

[79] The notation used for this formal grammar is that defined in chapter 39.

[80] Strictly speaking, |n| (absolute value of n) children.

[81] See section 15.3, where the text from which this fragment is taken is analyzed.

[82] The corresp attribute is thus distinct from the target attribute in that it is understood to create a double, rather than a single, link. It is also distinct from the targets attribute in that the latter lists all the identifiers of the elements that are doubly linked, whereas the corresp doubly links the element that bears the attribute with the element(s) that make up the value of the attribute.

[83] See William A. Gale and Kenneth W. Church, Program for aligning sentences in bilingual corpora, Computational Linguistics 19 (1993): 75-102, from which the example in the text is taken.

[84] Our example uses the English translation of Charles Hoole (1659), and is taken from John E. Sadler, ed., John Amos Comenius Orbis Pictus: a facsimile of the first English edition of 1659 (Oxford: Oxford University Press, 1968) (The Juvenile Library).

[85] This sample is taken from a conversation collected and transcribed for the British National Corpus.

[86] See section 15.1 for discussion of the <w> and <c> tags that can be used in the following examples instead of the <seg type=word> and <seg type=character> tags.

[87] An alternative way of representing this problem is discussed in chapter 17.

[88] In this example, we have placed the <link> next to the tags that represent the alternants. It could also have been placed elsewhere in the document, perhaps within a <linkGrp>.

[89] The variant readings are found in the commercial sheet music, the performance score, and the Broadway cast recording.

[90] Or, as they are widely known, attribute-value pairs; this term should not be confused, however, with SGML attributes and their values, which are similar in concept but distinct in their formal definitions.

[91] The rule marks spaces left for the missing name in the manuscript.

[92] See G. N. Leech and R. G. Garside, Running a Grammar Factory, in English Computer Corpora: Selected Papers and Research Guide, ed. S. Johansson and A.-B. Stenström (Berlin: de Gruyter; New York: Mouton, 1991), pp. 15-32. This sentence and its analysis are reproduced by kind permission of the University of Lancaster's Unit for Computer Research on the English Language.

[93] For the word-class tagging method used by Claws see I. Marshall, Choice of Grammatical Word Class without Global Syntactic Analysis: Tagging Words in the LOB Corpus, in Computers and the Humanities 17 (1983): 139-50. For an overview of the system see R. G. Garside, G. N. Leech, and G. R. Sampson, The Computational Analysis of English: a Corpus-Based Approach (Oxford: Oxford University Press, 1991).

[94] We have replaced the Claws code $ for the `'s' morpheme by GEN, as in the tag set used by the British National Corpus (see 16.10), and the code . for the final full stop by PUN.

[95] Feature-structure, rather than feature-value, libraries should be used for housing collections of feature structures.

[96] This is not as hard as it sounds. The embedding of a list within a list is illustrated in the second example below.

[97] Unless the value is the <null> element; see below.

[98] We say that one range is less than or equal to another if both the value and valueTo attributes of the first are less than or equal to the corresponding attributes of the second.
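
This comparison can be written out as a short Python sketch. The class `Range` and the function `range_le` are hypothetical names, and the sketch assumes the two attribute values are numeric; the field `value_to` stands for the valueTo attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    value: float      # lower bound (the value attribute)
    value_to: float   # upper bound (the valueTo attribute)


def range_le(first, second):
    """True if both bounds of first are <= the corresponding bounds of second."""
    return first.value <= second.value and first.value_to <= second.value_to


print(range_le(Range(1, 5), Range(2, 6)))  # both bounds are smaller
print(range_le(Range(1, 7), Range(2, 6)))  # upper bound is larger
print(range_le(Range(2, 6), Range(1, 7)))  # lower bound is larger
```

Under this definition two ranges need not be comparable at all: for the ranges 1 to 7 and 2 to 6, neither is less than or equal to the other.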

[99] Typically, there will be no need to use an encoding like this one as the value of a feature, since the <any> element is available for that purpose. However, in setting up the feature declaration for that feature, it may be necessary to use such an encoding, precisely so as to provide an interpretation for the use of the <any> element as the value of that feature.

[100] From ``The Manere of Good Lyuynge'', fol 126v of Bodleian MS Laud Misc. 517, plate 8(ii) in English Cursive Book Hands 1250-1500 by M. B. Parkes (Clarendon Press: Oxford, 1969).

[101] On fol 65v of Bodleian MS. Rawlinson Poetry 32; in Parkes 12(ii).

[102] De Nutrimento et Nutribili, Tractatus 1, fol 217r col b of Merton College Oxford MS O.2.1 (Parkes pl. 16).

[103] In Pierpont Morgan MA 3421, from British Literary Manuscripts/Series II: from 1800 to 1914, by V. Klinkenborg, H. Cahoon and C. Ryskamp (New York: Pierpont Morgan Library, 1981).

[104] De moribus et actis primorum Normannie ducum, in fol 4v of British Library MS Harley 3742, Parkes pl 6(i).

[105] In Pierpont Morgan MA 3391 (Klinkenborg 123).

[106] In Pierpont Morgan MA 310 (Klinkenborg 23).

[107] The manuscript contains several other substitutions, ignored here for the sake of clarity.

[108] From the Wiltshire Record Office, Dean of Sarum Churchwardens' presentments, 1731, Hurst; the transcription was provided by Donald A. Spaeth.

[109] From folio 52 recto of the Holkham manuscript of Chaucer's Canterbury Tales.

[110] Codex Regius, ed. L. F. A. Wimmer and F. Jónsson (Copenhagen 1891).

[111] In Pierpont Morgan MA 412 (Klinkenborg 15).

[112] For the sake of legibility in the example, long marks over vowels are omitted. The non-standard entities `ae', `d' and `th' refer to a-e ligature, eth, and thorn respectively.

[113] Strictly, a suitable value such as figurative should be added to the two place names which are presented periphrastically in the second example here, in order to preserve the distinction indicated by the choice of <rs> rather than <name> to encode them in the first version.

[114] The treatment here is largely based on the characterizations of graph types in Gary Chartrand and Linda Lesniak, Graphs and Digraphs (Menlo Park, CA: Wadsworth, 1986).

[115] That is, the three syntactic interpretations of the clause are mutually exclusive. The notion that the pertinents are in Argyll is clearly not inconsistent with the notion that both the land in Gallachalzie and the pertinents are in Argyll. The graph given here describes the possible interpretations of the clause itself, not the sets of inferences derivable from each syntactic interpretation, for which it would be convenient to use the facilities described in chapter 16.

[116] R. Jackendoff, X-Bar Syntax, 1977.

[117] The symbols e and t denote special theoretical constructs (empty category and trace respectively), which need not concern us here.

[118] Reference needed! FIX THIS NOTE BEFORE PUBLICATION.

[119] In this case additional redefinitions may also be needed to avoid name clashes with existing TEI elements. For further details see chapter 29.

[120] No special-purpose element is provided for this by the current version of the Guidelines. The information should be provided as one or more distinct paragraphs at the end of the <encodingDesc> element described in section 5.3.

[121] Schemes similar to that proposed here were developed in the 1960s and 1970s by researchers such as Hymes, Halliday, and Crystal and Davy, but have rarely been implemented; one notable exception being the pioneering work on the Helsinki Diachronic Corpus of English, on which see M. Kytö and M. Rissanen, The Helsinki Corpus of English Texts, in Corpus Linguistics: hard and soft, ed. M. Kytö, O. Ihalainen, and M. Rissanen (Amsterdam: Rodopi, 1988).

[122] It is particularly useful to define participants in a dramatic text in this way, since it enables the who attribute to be used to link <sp> elements to definitions for their speakers; see further section 10.2.2.

[123] The present proposals do not support the encoding of different settings for the same participant. This is a subject for further work.

[124] See in particular chapters 14, 15, and 16.

[125] For more information on UNIMARC, see Brian P. Holt, UNIMARC Manual (London, U.K.: IFLA Universal Bibliographic Control and International MARC Programme, British Library, 1987). For USMARC, see Walt Crawford, MARC for library use: understanding USMARC (Boston: G.K. Hall, 1989), USMARC format for bibliographic data, including content designation (Washington, D.C.: Library of Congress, 1987), and Deborah J. Byrne, MARC manual: understanding and using MARC records (Englewood, Colo.: Libraries Unlimited, Inc., 1991).

[126] The primary function of the MARC record when it was first designed in the mid-1960s was to allow for the electronic distribution of cataloguing records in support of card production. See Henriette Avram, The MARC Pilot Project (Washington D.C.: Library of Congress, 1968), p. 3. For discussion of the relationship between the MARC record and the catalogue card, see Michael Gorman, "After AACR2R: The Future of the Anglo-American Cataloging Rules," in Richard Smiraglia, ed., Origins, Content and Future of AACR2 Revised (Chicago: American Library Association, 1992).

[127] Dizionario di Abbreviature latine ed italiane per cura di Adriano Cappelli, 6th ed. (Milan: Ulrico Hoepli, 1979). This work on Latin abbreviations might be less convenient for the purpose than one concentrating on Old French, but it is more widely used than any other.

[128] For a fuller discussion of the reasoning behind FSDs and for another complete example, see ``A rationale for the TEI recommendations for feature-structure markup,'' by D. Terence Langendoen and Gary F. Simons (to appear in the special TEI issue of Computers and the Humanities).

[129] Fernando C. N. Pereira, Grammars and logics of partial information, SRI International Technical Note 420 (Menlo Park, CA: SRI International, 1987), and Stuart Shieber, An Introduction to Unification-based Approaches to Grammar, CSLI Lecture Notes 4 (Palo Alto, CA: Center for the Study of Language and Information, 1986).

[130] This is one of several abbreviations allowed by the SHORTTAG feature; the others (omission of attribute names under certain circumstances and omission of non-required attribute values) are allowed.

[131] Some will regard such simplifications as useful ways of making it easier to develop software which accepts TEI-conformant documents; others will deplore the failure of such software to accept all TEI-conformant documents including those which extend the TEI DTD. In providing the notion of DTD extension for describing what documents are and are not accepted by such software, the TEI acts in the belief that such software will in fact be developed; it neither endorses nor deplores its construction or use.

[132] See document TEI PC P1 ``The Preparation of Text Encoding Guidelines.''

[133] Although the scripts run in opposite directions, they write numbers in the same direction; the usual view is that the numbers in Hebrew and Arabic run left to right, like those in Latin script, but it is also possible to claim that the numbers in Latin scripts run right to left, like those in Arabic and Hebrew. There is no single satisfactory answer to this question.

[134] Tana de Gámez, ed., Simon and Schuster's International Dictionary (New York: Simon and Schuster, 1973).