From: CBS%UK.AC.EARN-RELAY::EARN.UICVM::TEI-REP,CBS%UK.AC.EARN-RELAY::NO.NAC::USE.UIO.UNINETT::H_JOHANSSON 25-JAN-1990 15:58:07.42
To: LOU,susan
CC:
Subj: outline of sections on text representation

Via: UK.AC.EARN-RELAY; Thu, 25 Jan 90 15:57 GMT
Received: from UKACRL by UK.AC.RL.IB (Mailer X1.25) with BSMTP id 6838; Thu, 25 Jan 90 13:58:57 GM
Received: from UICVM by UKACRL.BITNET (Mailer X1.25) with BSMTP id 3981; Thu, 25 Jan 90 13:58:45 G
Received: by UICVM (Mailer R2.03B) id 5062; Wed, 24 Jan 90 21:32:06 CST
Date: Wed, 24 Jan 90 15:28:00 +0100
Reply-To: Text Encoding Initiative - Text Representation Committee list <TEI-REP@EARN.UICVM>, Stig Johansson
Sender: Text Encoding Initiative - Text Representation Committee list <TEI-REP@EARN.UICVM>
From: Stig Johansson
Subject: outline of sections on text representation
To: Lou Burnard, Susan Hockey

TEI TRR4

Outline of sections on text representation

4 Characters and Character Sets - Steven DeRose, comments by Wilhelm Ott (especially)

4.1 Principles and Definitions
4.1.1 Character, character repertoire, (coded) character set
      Character sets
      - standard (ASCII, ISO 646, ISO 8859, 6937, ISO 10646?)
      - vendor-specific (EBCDIC, IBMPC, Mac, Postscript...)
      - character set registration
4.1.2 Practical Constraints: readability & data interchange
4.1.3 Marking Character Set and Language Shifts
4.1.4 SGML Character Set Support
      - definition of character sets
      - switching between character sets (ISO 2022) with and without
        suppressing markup recognition
      - entity references (general entities, character-entities)
      - support for transliterations (SHORTREF)
      - definition of new entities
      - definition of new transliterations

4.2 Recommendations
4.2.1 Latin Scripts for European Languages
      (Eastern Europe may be incomplete in June 1990.)
      - character sets (ISO, Registered, Vendor) & declarations for using them
      - entity sets & declarations for using them
      - transliterations
4.2.2 Cyrillic Script
      (subdivided as above; may be incomplete in June 1990)
4.2.3 Greek Script
4.2.4 Hebrew
4.2.5 Phonetic Alphabets (PhonASCII, Arpabet?, IPA)
4.2.6 Non-alphabetic symbols
      (Entity sets defined by ISO 8879. Character sets registered with ECMA.)

Reference Section:
      Coded Character Sets
      Entity Sets
      Transliteration Schemes

6 Features Common to Many Text Types

6.1 Principles and Definitions - Stig Johansson
      - Focus on features with conventional typographic/ms realization
        - structural divisions
        - attributes of phrases in running text (emphasis, quotation, etc)
        - annotation, including variant readings
        - index entries
        - illustrations, figures, tables
        - parallels among texts
      - Not same as objective vs interpretive distinction

6.2 Core Structural Features - Mark Liberman, David Chesnutt
6.2.1 Front Matter (title page - reference to ch 5)
      (Dedication, Preface, Tables of Contents/Figures/Illustrations/
      Abbreviations)
6.2.2 Body
      (Part, Chapter, Section, Sub-section, etc. H0-H6. Paragraph.)
6.2.3 Back Matter
      (Appendices, Index?, Bibliography, Colophon)

6.3 Other Core Tags (Basic Non-structural Features) - Susan Hockey, David Chesnutt
      - emphasis, technical terms, marked terms and passages
      - glossing, cited terms
      - quotation, direct discourse
      - distancing (by quotes)
      - annotation (notes, footnotes, endnotes, marginalia)
      - font shifts and special layout
        (Prefer to mark underlying feature. Mark presentation only if the
        feature is uncertain.)
      - page breaks, column breaks, volume breaks
      - words/phrases/quotations in foreign languages
      - names?, abbreviations?
      - lists
      - index entries
      - illustrations and artwork

6.4 Figures, Tables, and Diagrams - Roberto Cencioni

6.5 Bibliographic References - Lou Burnard & MSM
      - simple tags (biblist, bibitem, article.title, work.title)
      - elaborations for information retrieval

6.6 Editorial Comment - Stig Johansson
      - SIC and suggested correction
      - editorially emended and original form
      - misc editorial comments
        (Use SGML comment, but sparingly. Add a tag, if consistently marking
        some kinds of explication or giving some information in comment.)

6.7 Reference Systems - Elli Mylonas
      - using source page and lines
      - using traditional reference numbers (lineation, verse numbers,
        Stephanus...)
      - adding reference numbers
        - tree-path method
        - segment-numbering using orth sentences & other bottom-level tags
          - nested elements get own number
          - nested elements get no number

6.8 Treatment of Ambiguous Punctuation - Stig Johansson
      (period, quotation marks, inverted comma, apostrophe, comma;
      declaring treatment)

6.9 Critical Apparatus - Robin Cover, input & comments from Wilhelm Ott
      - running commentaries
        multi-level annotations
        Example: Glossa ordinaria
      - variant readings
      - annotation method, Kraft method, Thaller method, Wujastyk method,
        CoSyTraWMa method, MSM method

6.10 Parallel Texts - Robin Cover
      - multiple versions in same data stream
      - synchronizing versions in separate data streams
      - simultaneous actions in single text (overlap)

6.11 Cross-reference and Text Links - Steven DeRose
      - simple internal cross-reference to sections, notes, figures,
        tables, etc
      - external reference
      - hypertext: typing the link ...

7 Features in Specific Text Types

7.1 Principles and Definitions

7.2 Recommended Features and Tags
7.2.1 Language corpora and other collections - Stig Johansson
      - types of corpora and collections (documenting which type is being
        constructed)
      - normalization practices & documenting them (general disc, ref to ch 5)
        - normalization of spelling
        - omission of material
        - treatment of idiosyncrasies: typos, spelling, dialect...
      - reference systems
        - orth sent for printed matter
        - tone unit for spoken material (or other units)
7.2.2 Literary Text - Elli Mylonas
      - prose (use core tags), distinguishing dialogue and narrative
        (tag for direct speech vs quotation?)
      - verse: basic structural units (stanza, line, half-line; caesura)
               specialized structural units (tragic poetry, odes, etc)
      - drama: basic structural units
               specialized structural units
      - defining new structural units
7.2.3 Technical and Scientific Texts - Roberto Cencioni
      - types of technical and scientific texts
      - special structures
      - special non-structural features
      - equations and formulae

Reference Section: formal tag descriptions corresponding to Chapters 6 and 7.

In drafting the text, do not change the order and number of sections. Feel
free to reorder and add (and possibly subtract) within the numbered sections.

24 January 1990
Stig Johansson
University of Oslo