Summary of Substantive and Rhetorical Points and Queries in AI3W5, with draft replies.
CMSMcQ, 15 Feb 91

1. Guidelines need a theoretical introduction which defines 'text', 'tag', 'hierarchy', etc.
R. See section 2.1; for 'hierarchy' esp. 2.1.5.2.

2. Question: are multiple hierarchic structures (physical, formal, grammatical, semantic, actantial, narrative, psychological, etc.) (a) all definable as hierarchies in SGML, (b) taggable in the TEI scheme?
R. Yes to both; see 2.1.5.2 on tagging multiple structures, 5.6.1 on physical structures, 7.3.3.1 on (nested) narrative structures, and chapter 6 on grammatical annotations at any level. Structures for semantic, actantial, and psychological tagging are not now provided as part of the Guidelines but may be defined by the user either using the analytic structures defined in chapter 6 or by defining a concurrent tag set using the mechanisms defined in chapters 2 and 8.

3. Can SGML handle richness of expression and multiple levels of meaning?
R. Chapter 6 shows, we think, that it can.

4. Discussion of highlighting and font shifts (pp. 78, 124) seems to imply reliance on authorial intention; such a reliance ceased being intellectually respectable about 1940.
R. Authorial intention may certainly be used as a criterion for tagging font shifts by anyone who believes it important and recoverable, but the tagging of underlying features is no more linked to the theory of authorial intention than is the act of reading a text for comprehension. That authors are responsible for the font shifts and other accidentals of printed texts is in any case widely doubted by analytic bibliographers for the period 1450-1900. See Philip Gaskell, A New Introduction to Bibliography (Oxford, 1972), part I.

5. Literary work requires that electronic texts be stable and not subject to change.
R. ? This seems non-obvious; many projects aim at the enrichment of texts in electronic form, which involves at least some change in the files. See for example document TEI AI3 W4 (Literature Needs Survey Results), respondent 36 on items E, F, and G. The stability of a text in a given environment is wholly in the control of the owner of the electronic representation, who may ensure its stability in whatever way is found appropriate.

6. P. 4 alludes to macros and parsers but gives no examples. If they exist, examples should be listed here.
R. Macros suitable for entering TEI tags with single keystrokes may be made on the fly in any word processor or editor with a macro facility; it is planned to provide examples in tutorial documentation. The TEI does not attempt to define, recommend, or constrain the methods to be used for data capture (see p. 3); whether macros are used or not, and what form they should take, is an individual choice not prescribed by the TEI.

7. Section 2.1.4 recommends embedding an interpretation of a text into its DTD; this should be changed.
R. Not so; section 2.1.4 observes that, since any encoding of a text is necessarily interpretive, a fortiori the document type declaration which defines a class of syntactically permitted encodings constitutes an interpretation of the class of documents. This state of affairs is described as inescapable; no judgement is offered as to its felicity.

8. The guidelines should make explicit the distinction between interpretive markup (e.g. tags for emphatic phrases and foreign words) and non-interpretive markup (e.g. tags for font shifts).
R. Agreement as to which features are 'interpretive' and which are not has not thus far been attained. Typographic and paleographic features are subject to disagreements as sharp as linguistic and rhetorical ones; no attempt to classify features as interpretive or non-interpretive (or objective and subjective) has come close to achieving consensus among the participants in the TEI. There has been somewhat more sentiment in favor of the view that all tags are interpretive; see e.g. section 5.1 (p. 71) and 5.11.1 (p. 124).

9. Markup minimization should be explicitly encouraged.
R. Minimization in data capture and local processing is at the discretion of the individual user; see sec. 1.1.3-1.1.4 and 2.2.2 (pp. 2-3 and 34).

10. Text structure should be made clear by format of the file, not by explicit tags for text structure.
R. The use of file format to convey structural information may be attempted locally in data capture of restricted sets of documents, by means of the SGML SHORTREF feature; see sec. 2.2.2 (p. 34). Effective use of this feature exploits structural characteristics of particular classes of documents and is hard to reconcile with a general-purpose encoding scheme.

11. Explicit coding of text structure will be error-prone and hard to verify. Implicit coding by means of file format should be preferred.
R. Explicit coding allows automatic verification (e.g. by SGML software) of a document's structural validity; see sec. 2.1.1.2 (pp. 10-11). Recognition of structure on the basis of page layout and similar clues is a non-trivial application which occupies a growing branch of artificial intelligence research.

12. Pagination and lineation frequently vary with the printing, not just the edition, of a text; printer and date of printing should be required in the TEI header.
R. By definition, the term 'edition' applies to the set of volumes produced from one setting of a printed text; except for stop-press repagination of the text, pagination and lineation can thus never vary within an edition. 'Edition' is also often used of reissues at a later date, even when the same plates are used; different 'editions' in this sense may have the same pagination, but the same 'edition' in either sense will not vary in pagination. See Gaskell, pp. 313-316; AACR 2, pp. 28-29, 59-60; OED s.v. 'edition'; and TEI P1, 4.3.2 (p. 59).

13. Section 5.1 (p. 71) definition of 'text' is incorrect. Not all texts are extended and spoken discourse is not 'text' until written down.
R. Section 5.1 does not define 'text' in the abstract, but attempts to describe the usage of TEI P1. 'Extended' is a relative term; TEI P1 does not now address issues of encoding texts shorter than a paragraph or a sentence, and there have been thus far no suggestions that such issues need addressing.

14. Line numbers are important methods of locating specific passages of text and should be recommended for general use.
R. Tags for line numbering are provided in section 5.6, which also discusses the importance of reference schemes in general.

15. The word 'colophon' is not one everyone can be expected to know.
R. True. Being the standard term, it is briefer and clearer than any alternative thus far found; it is defined in Webster's Seventh Collegiate for non-literary scholars.

16. In Pleiade editions, the colophon appears in the front matter. [The 'colophon' tag should therefore be allowed in the front matter.]
R. Colophons appear by definition at the end of the book; see Webster's 7th and OED, s.v. 'colophon'. When the information appears in the front matter, it should be encoded there using the title page or front.part tags.

17. Line breaks should be mentioned as a possible constituent of paragraphs in section 5.3.1.
R. Line breaks belong to a different hierarchy; see section 5.6 and section 1.B of the document being commented on.

18. The Literary Needs Survey made abundantly clear that line numbering is required by literary scholars and should be recommended in all cases.
R. ? Only three respondents to item B of the survey mention line breaks explicitly, as "not always important", "less helpful", and "not important unless the first edition is a printed edition". Three respondents specify general approval of the list which includes line breaks; one indicates general disapproval.

19. Normal practice in literary study of prose texts is to refer to page and line numbers.
R. Not so; the usual practice documented by style sheets is (for prose) page reference alone; the MLA style sheet discourages citation by lines from prose texts. See Chicago Manual of Style, p. 000; MLA Handbook, p. 000; Turabian, p. 000. See also Fortier, Voyage, p. 000; Potter, Blort, p. 000.

20. Section 7.3.1.1 requires (p. 177) the specification of the METER attribute in every line; this should not be so.
R. Section 7.3.1.1 observes that when meter is irregular a single specification of meter at the stanza or canto level cannot capture the entire metrical pattern. Like all attributes not described as required, the METER attribute is optional; see p. ix and p. 3 on the formulation of requirements, recommendations, and neutral observations in TEI P1.

21. The prescription for rendering rhyme pattern made in section 7.3.1.2 is too prescriptive and should be loosened.
R. Not so; section 7.3.1.2 suggests giving the rhyme pattern as an unrestricted string of characters, since it is not usually possible to make a closed list of possible values.

22. In cases where an older and more authoritative method of identifying specific passages exists, as in the Bible, it should be used in preference to page and line numbers from the source text.
R. Yes; see section 5.6 on reference systems. It would be wrong, though, to describe Biblical versification either as 'authoritative' or as 'older' than printed Bibles, since it postdates Gutenberg's work and was instituted by a publisher, not by any Biblical author or any (secular or religious) authority.

23. Explicit lists should be given of required and optional tags.
R. Excellent suggestion; this will be done.

24. Short tags should be explicitly recommended for local processing, expanding on the recommendation to that effect in section 1.1.2.
R. No such recommendation is made in section 1.1.2; the current draft makes no recommendations at all as to the specific codes used for local processing; see sections 1.1.3 and 1.1.4.

25. Exclamation point, pound sign, and square brackets should be allowed in interchange. SGML should not take precedence over the needs of scholars.
R. Hear, hear. The exclusion of these characters from the ISO 646 Subset has nothing to do with SGML; they are excluded because they vary from country to country, or because for historical reasons their use results in data corruption in transit. Their use in local processing is not forbidden. See section 3.1.5.

26. Names of data entry personnel should be recorded in the TEI header.
R. Section 4.3.1 describes the conventions for recording responsibility defined by relevant practices and standards for publishing, bibliographic work, and library work; like typesetters and pressmen, data entry personnel are typically not listed as intellectually responsible for a work, despite their obvious importance in its production. See the International Standard Bibliographic Description (Computer Files).

27. The tags for names and abbreviations (sections 5.3.6 and 5.3.7) must be optional.
R. They are, hence the use of 'may' in their introduction; see pp. ix and 3 on the syntactic forms used for required and optional tags.

28. List handling tags are too wordy and take too much for granted.
R. ?

29. Section 5.4 applies only to post-input markup.
R. Whether tags for editorial interventions are supplied during data capture or later is a choice for individual users and is not constrained by TEI P1; it is certainly possible to apply these tags (like any other) either during or after data capture.

30. The example given for critical apparatus is trivial.
R. True; all the real examples available were too hard to comprehend and none had all the structural features which needed to be exhibited. The example was purposely kept trivial to allow the user to concentrate on the problems of intellectual structure in the description of textual variations.

31. Lack of variants should not be recorded explicitly.
R. What is meant by 'lack of variants'?

32. Experts in text criticism should be consulted for the tags for critical apparatus.
R. We have done so and will continue to do so. Those who helped develop one or more of the methods presented here have been involved, at a conservative estimate, as editors or as technical advisors in the production of several score of critical editions and three major software packages which deal (inter alia) with critical apparatus and multi-versioned text.

33. Direct quotation, indirect quotation, indirect discourse, free indirect discourse, authorial comment, description, and narration cannot reliably be distinguished from each other and should not be tagged.
R. Where these features are not clear, it is unlikely that anyone would want to tag them; see sections 5.1, 5.3.2, and 5.3.3 (pp. 71, 78, and 81). Scholars who use these concepts, however, are apt to feel that they have the expertise to distinguish them reliably in some cases. Later users of an encoding are not required to believe every tag in the file.

34. The bibliographic tagging in section 5.3.7 is too cumbersome, especially for use in data capture.
R. There is no bibliographic tagging in section 5.3.7; it contains an example of optional tags for marking types of abbreviations. No suggestion is made as to when such a tag might be applied; a researcher wishing to tag the feature will presumably be in a position to choose a convenient time. (During the development work, interest in having tags available for abbreviations was expressed primarily in connection with machine translation and similar applications, often pursued with large staffs and substantial computational resources. It seems objectively unlikely that many literary scholars will feel the need to mark abbreviations in this way, though there are examples of encoding schemes for literary work which do mark examples.)

35. Section 5.8.1 proposes a tag for sentence boundaries, which assumes that sentence boundaries can be known.
R. Not so; section 5.8.1 proposes a tag for arbitrary segmentation of a text into units convenient for analysis, and describes its possible application not to 'sentences' (a linguistic unit) but to orthographic sentences (an orthographic unit), which are by definition marked explicitly in the copy text. See p. 103. See also the work of Rosanne Potter for a good example of the use of such division of a text into orthographic sentence units or utterances.

36. Consistent use of presentational markup would avoid the problems that arise when descriptive markup is not feasible for some reason.
R. True; it would also avoid the advantages of descriptive markup. See Coombs et al., 1988, and DeRose et al., 1990, for lucid explanations of the problems with presentational markup.

37. The example from Richardson's Clarissa in section 5.11.1 does not identify the copy text or give page and line numbers.
R. True; neither is germane to the point being illustrated.

38. In the example from Richardson, the word 'Anglice' is marked as Latin, but it is not found in Lewis and Short; is it really Latin?
R. 'Anglice' is a regular formation (of late Antiquity or more likely of the middle ages) from 'Angli'; since the English language postdates the classical period, it is unsurprising to find the word missing from Lewis and Short. See OED s.v. 'Anglic' and 'Anglice'; see also Webster's Second International, which marks the word as non-English.

39. In the Richardson example, it is unclear whether the italics mark quotation, emphasis, or irony.
R. The example given reflects this uncertainty by using the tag 'highlighted' for italics which the encoder felt unable to disambiguate; the possibility of disagreement on the interpretation of other italics is preserved by the explicit recording in each case of the typographic feature being interpreted.

40. The use of tags like DIV0 and DIV1 will frighten literary scholars. Blank lines should be used instead.
R. Literary scholars using GML, Script, TeX, LaTeX, Scribe, troff, and the scores of other text-handling programs which use structural tags generically similar to DIV0 have not been notably frightened by the experience.

41. The second example date in section 5.3.11 should end the tagged date after 'seventy-seven', not after 'Eighty-Sixth', to be consistent with the interpreted value.
R. Not so; 1977 was also the 201st year of the Republic and the 86th of the University from whose diploma the date was transcribed. The date simply gives the year three separate names; like regnal years and indictions these redundant designations are here interpreted as parts of the same date.

42. What is the meaning of the 'unit=absent' attribute-value pair for the MILESTONE tag? What is there to mark if the text is not present?
R. The value 'absent' is intended for cases when MILESTONE is used to mark multiple canonical reference schemes in the same text. (E.g. the pagination of two commonly used editions, to allow the electronic text to be compared conveniently with either.) Whenever the electronic text contains material for which the canonical scheme provides no number at all, the MILESTONE mechanism needs a way to mark the end of the text properly referred to with the last preceding name or number. 'unit=absent' will be needed, among other cases, whenever the creator of a canonical scheme excluded as spurious some passage included in the copy text of the electronic version.

43. Section 7.1 uses the term 'narrative' in the sense 'prose'.
R. ? As indicated by the coexistence of 'verse' and 'drama', the subdivisions of section 7.3 are not and cannot be interpreted as non-overlapping. The substantive remarks of 7.3.3 apply (or are intended to apply) to all narrative in whatever form; they have no application to non-narrative prose.

44. Section 7.3.2.1 engages in overkill by specifying Cordelia as the speaker both in an element and in an attribute. Why?
R. Not so. The copy text gives the speaker as 'Cor.', not 'Cordelia'. The encoder in this case is careful to retain the ambiguity of the copy text (Cornwall is also a character in the play and might be the speaker here) since it can dramatically affect the interpretation of the scene, while registering the opinion that the speaker is in fact Cordelia. The SPEAKER tag is optional because such cases are not common.

45. Section 7.3.2.1 does not contribute to the problem of attaching to each sentence or word of a play the identity of its speaker.
R. Not so; the SP attribute performs just such an attachment in a way understood by any SGML processor, as does (in a different way) the SPEAKER tag.

46. Cast list should also include date and location of first performance.
R. Thank you.

47. The confusion between '1' and '2' and 'Francisco' and 'Barnardo' is messy.
R. Yes it is; with the exception of some pedagogically motivated simplifications, however, this encoding faithfully represents the confusion of the copy text, where the speeches are labeled with '1' and '2'.

48. The DTD for drama is unusable.
R. Why?