Outline of TEI P2

C. M. Sperberg-McQueen
Lou Burnard

TEI ED W23
July 21, 1992 (18:16:41)

Draft July 21, 1992 (18:16:41)

ABSTRACT

This document describes the content projected for document TEI P2, the second public draft of the TEI Guidelines. The overall structure of the guidelines is not expected to change dramatically between the preparation of TEI P2 and the final version of the guidelines to be submitted to the Advisory Board for endorsement at the completion of the current development cycle.

Chapter 1
OVERALL

1.1 Types of Content

The work products of the first and second cycles must contain or be accompanied by:

* Tutorial introduction to markup concepts, computer representation of texts, SGML, and document processing (similar to chapter 2 of TEI P1, to ED W21, and to various other introductory documents, mostly by LB)

* Prose specification of the TEI encoding scheme arranged by topic (similar to chapters 3-8 of TEI P1, without some of the more discursive sections). This prose should be more abrupt than that of TEI P1 (more like that of the presentation of tags in ED W18 and ED W21).

* Prose commentary on the problems raised by tag sets for particular areas (similar to the discursive digressions in TEI P1)

* Alphabetically arranged reference section listing each tag, attribute, and special attribute value, with ample cross references

* Formal DTDs or DTD fragments for the TEI scheme

* Case studies, containing extended examples, with commentary, of the use of the TEI scheme, preferably on real data

1.2 Overall Organization

We propose to organize these into documents as follows:

TEI P2 (later TEI P3): The Guidelines proper: an authoritative statement of the TEI tagging scheme, containing the prose specification, the alphabetical reference section, and the DTDs.

TEI U1 et seq.: a set of discipline-specific tutorials.

TEI T1 et seq.: a case book (or series of case books) including the extended examples and prose commentary if any.

Chapter 2
TEI P2: THE GUIDELINES

TEI P2 should contain:

* a prose specification of the Guidelines

* a reference section documenting each tag

* full text of the TEI DTDs

2.1 Table of Contents

The prose specification portion of the Guidelines should have the organization given in the following list. After each chapter title the list gives:

1. identifier (value of the ID attribute in the DIV) -- a two-character code in most cases, a longer code if the chapter is expected to be short and not to require many subsections

2. the expected filename ('P2' concatenated with the chapter ID)

3. person with initial responsibility for the section, and editor with initial responsibility

A second parenthesis gives the id, filename, etc. assigned by previous versions of this document.

Subsections will append characters or digits to the ID and file name of their containing section: subsections 1, 2, and 3 of chapter 22 (p2ch) may have IDs p2ch1, p2ch2, p2ch3. Note that because of the number of chapters, parts are numbered as well as chapters. Within any unit, ID and file numbers are numbered 1-9, A-Z. No unit may have more than 35 subdivisions; any unit with more should be reorganized.

As for TEI P1, files should generally be distinct for each distinct section of a chapter, but not for smaller units.
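
The numbering rule just described (suffixes 1-9, then A-Z, at most 35 subdivisions per unit, file name formed by prefixing 'P2') can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not a tool used by the editors; the function names subdivision_id and file_name are hypothetical, and the sketch makes no attempt to normalize letter case, which varies in the listing below.

```python
import string

# The 35 permitted suffix characters, in the order the outline gives:
# the digits 1-9 followed by the letters A-Z.
SUFFIXES = "123456789" + string.ascii_uppercase


def subdivision_id(parent_id: str, position: int) -> str:
    """Return the ID of the position-th (1-based) subdivision of parent_id.

    Hypothetical helper: appends one suffix character to the parent ID,
    and rejects positions beyond the 35-subdivision limit.
    """
    if not 1 <= position <= len(SUFFIXES):
        raise ValueError(
            "a unit may have at most %d subdivisions; reorganize it"
            % len(SUFFIXES)
        )
    return parent_id + SUFFIXES[position - 1]


def file_name(unit_id: str) -> str:
    """Return the expected file name: 'P2' concatenated with the unit ID."""
    return "P2" + unit_id


if __name__ == "__main__":
    # Subsections 1, 2, 3 of the unit whose ID is p2ch, as in the text above.
    print([subdivision_id("p2ch", n) for n in (1, 2, 3)])
    # The tenth subdivision takes the letter A rather than the number 10.
    print(subdivision_id("p2ch", 10))
    print(file_name("CH"))
```

The same 1-9, A-Z sequence explains why the section after II.3.9 in the listing below is numbered II.3.A.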

Front matter

   Preface
      (id=preface, file=p2pref, draft=SH, ed=MSM)
      (was: id=p2.01, file=P201)
   Summary
      (id=summary, file=p2summ, draft=eds)
      (was: id=p2.02, file=P202)
   Summary of Changes in Version 2
      (id=changes, file=p2change, draft=MSM)
      (was: id=p2.03, file=P203)

Part I: Introduction
   Introduction, discussion of structure; status of scheme; notions of 'core', 'base', and 'topping'

   I.1: About These Guidelines (TEI P1 1)
      (id=AB, file=p2AB, draft=SH, NI, ed=MSM)
      (was: id=p2.11, file=P211)
      I.1.1: Texts and Their Electronic Representation
      I.1.2: Intended Applications
      I.1.3: Origin and Development
      I.1.4: Design Principles
      I.1.5: Structure of This Document
      I.1.6: Status of This Draft
      I.1.7: Future Development of the Guidelines
   I.2: Concise Summary of SGML
      (id=SGML, file=p2SGML, draft=DTB, ed=LB)
      (was: id=p2.12, file=P211)
   I.3: Structure of the TEI Document Type Declarations (P1 1)
      (id=ST, file=p2st, draft=MSM)
      (was: id=p2.13, file=P213)
      I.3.1: Main and Auxiliary DTDs
      I.3.2: Base Tag Sets and Additional Tag Sets
      I.3.3: Global Attributes
      I.3.4: Element Classes and Other Parameter Entities
      I.3.5: Invocation of TEI DTDs
      I.3.6: Combining TEI DTD Fragments

Part II: Core Tags and General Rules
   Chapters on the TEI Core and the default base for prose. Material relevant for all users, viz: the header, character sets, entity sets, basic text structure, issues of reference systems, common low-level structural elements, phrase-level elements.

   II.1: Characters and Character Sets (P1 3)
      (id=CH, file=P2ch, draft=HG, ed=LB)
      (was: id=p2.21, file=P221)
      II.1.1: Local Character Sets
      II.1.2: Shifting among Character Sets
      II.1.3: Character Set Problems and Interchange
      II.1.4: Writing System Declaration
   II.2: The TEI Header (P1 4)
      (id=HD, file=P2hd, draft=LB)
      (was: id=p2.22, file=P222)
      II.2.1: Structure of the TEI Header
      II.2.2: The File Description
      II.2.3: The Source Description
      II.2.4: The Encoding Description
      II.2.5: The Profile Description
      II.2.6: The Revision Description
      II.2.7: Minimal and Recommended Headers
      II.2.8: Note for Library Cataloguers
   II.3: Tags Available in All TEI DTDs
      (id=CO, file=P2co, mixed)
      (was: id=p2.23, file=P223)
      II.3.1: Paragraphs (P1 5.3.1)
         (id=copara, file=P2copara, ed=MSM)
         (was: id=p2.231, file=P2231)
      II.3.2: Ambiguous Punctuation
         (id=copunc, file=P2copunc, ed=MSM)
      II.3.3: Highlighting and Related Features (P1 5.3.2, 5.3.4)
         (id=cohigh, file=P2cohigh, ed=LB)
         (was: id=p2.236, file=P2236)
      II.3.4: Material in Quotation Marks (P1 5.3.3)
         (id=coquot, file=P2coquot, ed=LB)
         (was: id=p2.237, file=P2237)
      II.3.5: Terms, Cited Words, and Glosses (P1 5.3.5)
         (id=cocite, file=P2cocite, ed=LB)
         (was: id=p2.238, file=P2238)
      II.3.6: Names (P1 5.3.6)
         (id=coname, file=P2coname, ed=MSM)
         (was: id=p2.239, file=P2239)
      II.3.7: Numbers (P1 5.3.11)
         (id=conums, file=P2conums, ed=MSM)
         (was: id=p2.23A, file=P223A)
      II.3.8: Dates (P1 5.3.11)
         (id=codate, file=P2codate, ed=MSM)
         (was: id=p2.23C, file=P223C)
      II.3.9: Abbreviations (P1 5.3.7)
         (id=coabbr, file=P2coabbr, ed=MSM)
         (was: id=p2.23B, file=P223B)
      II.3.A: Simple Editorial Changes
         (id=coedit, file=P2coedit, ed=MSM)
         (was in chapter on text criticism)
      II.3.B: Simple Cross References (TR3)
         (id=coxref, file=P2coxref, ed=LB)
         (was: id=p2.41, file=P241)
      II.3.C: Lists (P1 5.3.8)
         (id=colist, file=P2colist, ed=MSM)
         (was: id=p2.233, file=P2233)
      II.3.D: Notes (P1 5.3.9)
         (id=conote, file=P2conote, ed=MSM)
         (was: id=p2.234, file=P2234)
      II.3.E: Reference Systems (P1 5.6)
         (id=corsys, file=P2corsys, ed=MSM)
         (was: id=p2.232, file=P2232)
      II.3.F: Bibliographic Citations (P1 5.5)
         (id=cobibl, file=P2cobibl, ed=LB)
         (was: id=p2.235, file=P2235)

Part III: Base Tag Sets
   Chapters on alternate bases. Bases define the basic structure of the document (all bound and structured elements from the root to the soup, as well as some floating elements); each may have toppings unique to it.

   III.1: Base Tag Set for Prose
      (id=PR, file=P2pr, draft=LB)
      (was: id=p2.31, file=P231)
   III.2: Base Tag Set for Verse (TR10)
      (id=VS, file=P2vs, draft=DR, ed=LB)
      (was: id=p2.32, file=P232)
   III.3: Base Tag Set for Drama (TR 11)
      (id=DR, file=P2dr, draft=EM, ed=LB)
      (was: id=p2.33, file=P233)
   III.4: Base Tag Set for Transcriptions of Spoken Texts (AI2)
      (id=SP, file=P2sp, ed=LB)
      (was: id=p2.34, file=P234)
   III.5: Base Tag Set for Letters and Memos (?)
      (id=LM, file=P2LM, draft=DG, DSpaeth, ed=MSM)
      (was: id=p2.35, file=P235)
   III.6: Base Tag Set for Printed Dictionaries (AI5)
      (id=DI, file=P2di, draft=NI, ed=MSM)
      (was: id=p2.36, file=P236)
   III.7: Base Tag Set for Terminological Data (AI7)
      (id=TD, file=P2td, ed=MSM)
      (was: id=p2.38, file=P238)
   III.8: Base Tag Set for Language Corpora and Collections (TR6)
      (id=CC, file=P2cc, draft=LB)
      (was: id=p2.39, file=P239)
   III.9: User-defined Base Tag Sets (AI4)
      (id=UD, file=P2ud, draft=DGr)
      (was: id=p2.3a, file=P23a)

Part IV: Additional Tag Sets
   Chapters on toppings. Toppings may be applicable to all bases or to some subset. In general, toppings define floating elements: chunks or flavors in the soup or primal matter.

   IV.1: Tags for Analysis and Interpretation
      (id=AI, file=P2ai, draft=DTL, ed=MSM)
      (was: id=p2.44, file=P244)
   IV.2: Applications of Analytic Tools
      IV.2.1: Literary Interpretation and Analysis (TR10,11,12)
         (id=LT, file=P2lt, draft=DRobey, DRoss, ed=LB)
         (was: id=p2.451, file=P2451)
      IV.2.2: Historical Interpretation and Analysis (AI4)
         (id=hi, file=P2hi, draft=DG, ed=LB)
         (was: id=p2.453, file=P2453)
      IV.2.3: Linguistic Analysis (AI1)
         (id=LG, file=P2Lg, draft=DTL, ed=LB)
         (was: id=p2.452, file=P2452)
   IV.3: Certainty
      (id=cert, file=P2cert, ed=MSM)
      (new)
   IV.4: Hypermedia (TR3)
      (id=HY, file=P2hy, draft=SJD, ed=LB)
      (was: id=p2.42, file=P242)
   IV.5: Additional Tags for Names and Dates
      (id=ND, file=P2nd, draft=DGr, ed=??)
      (was in 23 on core tags)
   IV.6: Text Criticism and Apparatus (TR2)
      (id=TC, file=P2tc, draft=PR, ed=MSM)
      (was: id=p2.47, file=P247)
   IV.7: Formulae and Tables (TR4)
      (id=FT, file=P2ft, draft=AR, DD, ed=LB)
      (was: id=p2.43, file=P243)
   IV.8: Physical Characteristics of the Copy Text (TR 9, 8)
      (id=PH, file=P2ph, ed=LB)
      (was: id=p2.46, file=P246)
   IV.9: Additional Tags for TEI Header (partic.desc, setting.desc, text.desc)
      (id=AH, file=P2ah, ed=MSM)
      (was: id=xxx, file=P2xxx)

Part V: Auxiliary Document Types
   Documentation of specialized tag sets for encoding information, etc.

   V.1: Structured Header
      (id=shdr, file=P2shdr, draft=RG, ed=LB)
      (was: id=p2.51, file=P251)
   V.2: Writing System Declaration
      (id=wsd, file=P2wsd, draft=HG, ed=MSM)
      (was: id=p2.52, file=P252)
   V.3: Feature System Declaration
      (id=fsd, file=P2fsd, draft=GS, ed=MSM)
      (was: id=p2.53, file=P253)
   V.4: Tag Set Declaration
      (id=tsd, file=P2tsd, ed=LB)
      (was: id=p2.54, file=P254)

Part VI: Technical Topics
   Discussions of specialized problems at a technical level

   VI.1: TEI Conformance
      (id=conf, file=P2conf, ed=LB)
      (was: id=p2.61, file=P261)
   VI.2: Modifying TEI DTDs
      (id=mods, file=P2mods, ed=MSM)
      (was: id=p2.62, file=P262)
   VI.3: Local Installation and Support of TEI Markup
      (id=loc, file=P2loc, ed=MSM)
      (was: id=p2.63, file=P263)
   VI.4: Use of TEI Encoding Scheme in Interchange
      (id=int, file=P2int, ed=LB)
      (was: id=p2.64, file=P264)
   VI.5: Treatment of Non-Hierarchical Information (CONCUR, fs, linked lists, faking it)
      (id=hier, file=P2hier, ed=LB)
   VI.6: Algorithm for Recognizing Canonical References
      (id=cref, file=P2cref, ed=LB)

Part VII: Alphabetical Reference List of Tags and Attributes
   Reference entries for every tag and entity declaration

Part VIII: Reference Material
   Miscellaneous reference matter

   VIII.1: Full TEI Document Type Declarations
      (id=DT, file=P2dt, ed=eds)
      (was: id=p2.81, file=P281)
   VIII.2: Standard Writing System Declarations
      (id=XW, file=P2xw, draft=HG, ed=LB)
      (was: id=p2.82, file=P282)
   VIII.3: Feature Structure Declaration for Basic Grammatical Annotation
      (id=XF, file=P2xf, draft=TL, GS, ed=MSM)
      (was: id=p2.83, file=P283)
   VIII.4: Sample Tag Set Declaration
      (id=XT, file=P2xt, draft=LB)
      (was: id=p2.84, file=P284)
   VIII.5: Formal Grammar for the TEI-Interchange Format Subset of SGML
      (id=GRAM, file=P2gram, draft=MSM)
      (was: id=p2.85, file=P285)

Back Matter

2.2 Prose Specifications

Each section on a set of tags should, in order:

1. list the generic identifiers of the tags declared or described at any length in the section

2. define/describe the set of textual features covered in the section, in a paragraph or two

3. give, in list form:

   * the tags used to mark the features (as in 5.3.2 and many other sections of P1)

   * following each tag, its non-global attributes if any (as in 5.3.9 et al. of P1)

   * following each attribute description, any special values (as in 5.3.3 and 5.6.4 of P1, inter alia) if the attribute has a semi-closed domain

4. give one, two, or three examples of at most a paragraph in length. In rare cases, these may include an image of the relevant page of the copy text(s).

5. give one or more DTD fragments defining the tags and attributes.

2.3 Reference Section

This section, modeled on the manuals for Formex and Majour, will have entries for each tag and attribute modeled on the specification in ED W5, plus information on what set of tags the tag fits into (core, prose base, text criticism, ...) and whether it is required, recommended, or optional. A short example should be provided for each tag; the same example may be reused for several tags.

2.4 DTDs

These will be presented both (a) much as in the current appendix B, and (b) with alternation of prose commentary and DTD, roughly as in Donald Knuth's publications of Web, TeX and Metafont. It has proved feasible to make a version of Knuth's Web system work with SGML documents and DTDs (see TEI ED W29, the documentation for the ODD system), and that system is used to generate both the DTD fragments and the full DTDs.

Topics covered should include

* notion of element and attribute classes (aka parameter entities)

* major structure; minor structure; floats; element content; empties (soup, broth, etc.)

* dtd construction kit (base+toppings)

* indirection methods (redefinition of parameter entities, etc.)

Chapter 3
TEI U1 ET SEQ.: TUTORIALS

Each tutorial should cover the following:

* Summary of basics of SGML, containing essentially the same material as chapter 3 of P1, but shorter, and with examples related to a specific discipline.

* Brief description of how SGML can be used, again with applications drawn from a specific discipline, and some account of existing software

* Brief summary of TEI philosophy, overall structure of the Guidelines, the TEI core tagset and any specialist tagset of particular importance to the disciplines

* A single worked example throughout each text

Total length: not to exceed 30 pages. May be automatically generated, for the most part, from existing documents (e.g. ED W21, ED W24).

It should be possible for a user to read this material alone and follow its advice, without referring to the rest of the book and without being too seriously misled.

At least one generic tutorial should also be available by June 1992. Other disciplines we hope to cover are: literature, history, linguistics and lexicography. Co-operation from specialists in these fields would obviously be very useful.

Chapter 4
TEI T1 ET SEQ.: CASE BOOKS

These will resemble those in appendix C of P1, but should be accompanied by prose commentary. The samples prepared by the affiliated projects are good candidates for inclusion as extended examples, as are the transduction papers being produced by ML.