Summary of Substantive and Rhetorical Points and Queries in AI3W5,
with draft replies.

CMSMcQ, 15 Feb 91

1.  Guidelines need a theoretical introduction which defines 'text',
'tag', 'hierarchy', etc.

R.  See section 2.1; for 'hierarchy' esp. 2.1.5.2.

2.  Question:  are multiple hierarchic structures (physical, formal,
grammatical, semantic, actantial, narrative, psychological, etc.) (a)
all definable as hierarchies in SGML, (b) taggable in the TEI scheme?

R.  Yes to both; see 2.1.5.2 on tagging multiple structures, 5.6.1 on
physical structures, 7.3.3.1 on (nested) narrative structures, and
chapter 6 on grammatical annotations at any level.  Structures for
semantic, actantial, and psychological tagging are not now provided as
part of the Guidelines but may be defined by the user either using the
analytic structures defined in chapter 6 or by defining a concurrent tag
set using the mechanisms defined in chapters 2 and 8.

3.  Can SGML handle richness of expression and multiple levels of
meaning?

R.  Chapter 6 shows, we think, that it can.

4.  Discussion of highlighting and font shifts pp. 78, 124 seems to
imply reliance on authorial intention; such a reliance ceased being
intellectually respectable about 1940.

R.  Authorial intention may certainly be used as a criterion for tagging
font shifts by anyone who believes it important and recoverable, but the
tagging of underlying features is no more linked to the theory of
authorial intention than is the act of reading a text for comprehension.
That authors are responsible for the font shifts and other accidentals
of printed texts is in any case widely doubted by analytic
bibliographers for the period 1450-1900.  See Philip Gaskell, A New
Introduction to Bibliography (Oxford, 1972), part I.

5.  Literary work requires that electronic texts be stable and not
subject to change.

R.  ?  This seems non-obvious; many projects aim at the enrichment of
texts in electronic form, which involves at least some change in the
files.  See for example document TEI AI3 W4 (Literature Needs Survey
Results), respondent 36 on items E, F, and G.  The stability of a text
in a given environment is wholly in the control of the owner of the
electronic representation, who may ensure its stability in whatever way
is found appropriate.

6.  P. 4 alludes to macros and parsers but gives no examples.  If they
exist, examples should be listed here.

R.  Macros suitable for entering TEI tags with single keystrokes may be
made on the fly in any word processor or editor with a macro facility;
it is planned to provide examples in tutorial documentation.  The TEI
does not attempt to define, recommend, or constrain the methods to be
used for data capture (see p. 3); whether macros are used or not, and
what form they should take, is an individual choice not prescribed by
the TEI.

7.  Section 2.1.4 recommends embedding an interpretation of a text into
its DTD; this should be changed.

R.  Not so; section 2.1.4 observes that any encoding of a text is
necessarily interpretive; a fortiori, the document type declaration
which defines a class of syntactically permitted encodings constitutes
an interpretation of the class of documents.  This state of affairs is
described as inescapable; no judgement is offered as to its felicity.

8.  The guidelines should make explicit the distinction between
interpretive markup (e.g. tags for emphatic phrases and foreign words)
and non-interpretive markup (e.g. tags for font shifts).

R.  Agreement as to which features are 'interpretive' and which are not
has not thus far been attained.  Typographic and paleographic features
are subject to disagreements as sharp as linguistic and rhetorical ones;
no attempt to classify features as interpretive or non-interpretive (or
objective and subjective) has come close to achieving consensus among
the participants in the TEI.  There has been somewhat more sentiment in
favor of the view that all tags are interpretive; see e.g. section 5.1
(p. 71) and 5.11.1 (p. 124).

9.  Markup minimization should be explicitly encouraged.

R.  Minimization in data capture and local processing is at the
discretion of the individual user; see secs. 1.1.3-1.1.4 and 2.2.2 (pp.
2-3 and 34).

10.  Text structure should be made clear by format of the file, not by
explicit tags for text structure.

R.  The use of file format to convey structural information may be
attempted locally in data capture of restricted sets of documents, by
means of the SGML SHORTREF feature; see sec. 2.2.2 (p. 34).  Effective
use of this feature exploits structural characteristics of particular
classes of documents and is hard to reconcile with a general-purpose
encoding scheme.
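
As a rough illustration of the kind of local convention meant here, the
following Python sketch treats blank lines as paragraph boundaries and
emits explicit tags.  It is not SGML SHORTREF itself and is not part of
TEI P1; the function name and the tag name 'p' are assumptions made for
the example.

```python
def blank_lines_to_paragraphs(text, tag="p"):
    """Wrap each blank-line-separated block of text in explicit tags."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Non-blank line: it belongs to the paragraph being collected.
            current.append(line.strip())
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(f"<{tag}>{p}</{tag}>" for p in paragraphs)


print(blank_lines_to_paragraphs("first line\nsecond line\n\nnew block\n"))
```

The convention works only while every blank line in the input can be
trusted to mean 'new paragraph'; in verse or drama the same blank line
may mean something else, which is the difficulty noted above.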

11.  Explicit coding of text structure will be error-prone and hard to
verify.  Implicit coding by means of file format should be preferred.

R.  Explicit coding allows automatic verification (e.g. by SGML
software) of a document's structural validity; see sec. 2.1.1.2 (pp.
10-11).  Recognition of structure on the basis of page layout and
similar clues is a non-trivial application which occupies a growing
branch of artificial intelligence research.
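
The point about automatic verification can be suggested with a minimal
Python sketch that checks only whether explicit start and end tags nest
properly.  It is far less than an SGML parser does (no DTD, no content
models, no tag minimization), and the function name and the tag names
in the usage line are assumptions made for the example.

```python
import re

# Matches a start tag or end tag; attributes, if any, are skipped over.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)[^>]*>")


def check_nesting(text):
    """Return a list of error messages; an empty list means tags nest."""
    errors = []
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            stack.append(name)
        elif not stack:
            errors.append(f"end tag </{name}> with no open element")
        elif stack[-1] != name:
            errors.append(f"end tag </{name}> found while <{stack[-1]}> is open")
        else:
            stack.pop()
    # Anything left on the stack was opened but never closed.
    errors.extend(f"<{name}> never closed" for name in stack)
    return errors


print(check_nesting("<div1><p>Some text.</p></div1>"))
```

No comparable mechanical check is available when structure is signalled
only by layout, since a misplaced blank line is still a valid blank
line.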

12.  Pagination and lineation frequently vary with the printing, not
just the edition, of a text; printer and date of printing should be
required in the TEI header.

R.  By definition, the term 'edition' applies to the set of volumes
produced from one setting of a printed text; except for stop-press
repagination of the text, pagination and lineation can thus never vary
within an edition.  'Edition' is also often used of reissues at a later
date, even when the same plates are used; different 'editions' in this
sense may have the same pagination, but the same 'edition' in either
sense will not vary in pagination.  See Gaskell, pp. 313-316; AACR 2,
pp. 28-29, 59-60; OED s.v. 'edition'; and TEI P1, 4.3.2 (p. 59).

13.  Section 5.1 (p. 71) definition of 'text' is incorrect.  Not all
texts are extended and spoken discourse is not 'text' until written
down.

R.  Section 5.1 does not define 'text' in the abstract, but attempts to
describe the usage of TEI P1.  'Extended' is a relative term; TEI P1
does not now address issues of encoding texts shorter than a paragraph
or a sentence, and there have thus far been no suggestions that such
issues need addressing.

14.  Line numbers are an important method of locating specific passages
of text and should be recommended for general use.

R.  Tags for line numbering are provided in section 5.6, which also
discusses the importance of reference schemes in general.

15.  The word 'colophon' is not one everyone can be expected to know.

R.  True.  As the standard term, however, it is briefer and clearer
than any alternative thus far found; for non-literary scholars, it is
defined in Webster's Seventh Collegiate.

16.  In Pleiade editions, the colophon appears in the front matter.
[The 'colophon' tag should therefore be allowed in the front matter.]

R.  Colophons appear by definition at the end of the book; see Webster's
7th and OED, s.v. 'colophon'.  When the information appears in the front
matter, it should be encoded there using the title page or front.part
tags.

17.  Line breaks should be mentioned as a possible constituent of
paragraphs in section 5.3.1.

R.  Line breaks belong to a different hierarchy; see section 5.6 and
section 1.B of the document being commented on.

18.  The Literary Needs Survey made abundantly clear that line
numbering is required by literary scholars and should be recommended in
all cases.

R.  ?  Only three respondents to item B of the survey mention line
breaks explicitly, as "not always important", "less helpful", and "not
important unless the first edition is a printed edition".  Three
respondents specify general approval of the list which includes line
breaks; one indicates general disapproval.

19.  Normal practice in literary study of prose texts is to refer to
page and line numbers.

R.  Not so; the usual practice documented by style sheets is (for prose)
page reference alone; the MLA style sheet discourages citation by lines
from prose texts.  See Chicago Manual of Style, p. 000; MLA Handbook, p.
000; Turabian, p. 000.  See also Fortier, Voyage, p. 000; Potter, Blort,
p. 000.

20.  Section 7.3.1.1 requires (p. 177) the specification of the METER
attribute in every line; this should not be so.

R.  Section 7.3.1.1 observes that when meter is irregular a single
specification of meter at the stanza or canto level cannot capture the
entire metrical pattern.  Like all attributes not described as required,
the METER attribute is optional; see p. ix and p. 3 on the formulation
of requirements, recommendations, and neutral observations in TEI P1.

21.  The prescription for rendering rhyme pattern made in section 7.3.1.2
is too prescriptive and should be loosened.

R.  Not so; section 7.3.1.2 suggests giving the rhyme pattern as an
unrestricted string of characters, since it is not usually possible to
make a closed list of possible values.

22.  In cases where an older and more authoritative method of
identifying specific passages exists, as in the Bible, it should be used
in preference to page and line numbers from the source text.

R.  Yes; see section 5.6 on reference systems.  It would be wrong,
though, to describe Biblical versification either as 'authoritative' or
as 'older' than printed Bibles, since it postdates Gutenberg's work and
was instituted by a publisher, not by any Biblical author or any
(secular or religious) authority.

23.  Explicit lists should be given of required and optional tags.

R.  Excellent suggestion; this will be done.

24.  Short tags should be explicitly recommended for local processing,
expanding on the recommendation to that effect in section 1.1.2.

R.  No such recommendation is made in section 1.1.2; the current draft
makes no recommendations at all as to the specific codes used for local
processing; see sections 1.1.3 and 1.1.4.

25.  Exclamation point, pound sign, and square brackets should be
allowed in interchange.  SGML should not take precedence over the needs
of scholars.

R.  Hear, hear.  The exclusion of these characters from the ISO 646
Subset has nothing to do with SGML; they are excluded because they vary
from country to country, or because for historical reasons their use
results in data corruption in transit.  Their use in local processing is
not forbidden.  See section 3.1.5.

26.  Names of data entry personnel should be recorded in the TEI header.

R.  Section 4.3.1 describes the conventions for recording responsibility
defined by relevant practices and standards for publishing,
bibliographic work, and library work; like typesetters and pressmen,
data entry personnel are typically not listed as intellectually
responsible for a work, despite their obvious importance in its
production.  See the International Standard Bibliographic Description
(Computer Files).

27.  The tags for names and abbreviations (sections 5.3.6 and 5.3.7)
must be optional.

R.  They are, hence the use of 'may' in their introduction; see pp. ix
and 3 on the syntactic forms used for required and optional tags.

28.  List handling tags are too wordy and take too much for granted.

R.  ?

29.  Section 5.4 applies only to post-input markup.

R.  Whether tags for editorial interventions are supplied during data
capture or later is a choice for individual users and is not constrained
by TEI P1; it is certainly possible to apply these tags (like any other)
either during or after data capture.

30.  The example given for critical apparatus is trivial.

R.  True; all the real examples available were too hard to comprehend
and none had all the structural features which needed to be exhibited.
The example was purposely kept trivial to allow the user to concentrate
on the problems of intellectual structure in the description of textual
variations.

31.  Lack of variants should not be recorded explicitly.

R.  What is meant by 'lack of variants'?

32.  Experts in text criticism should be consulted for the tags for
critical apparatus.

R.  We have done so and will continue to do so.  Those who helped
develop one or more of the methods presented here have been involved, at
a conservative estimate, as editors or as technical advisors in the
production of several score of critical editions and three major
software packages which deal (inter alia) with critical apparatus and
multi-versioned text.

33.  Direct quotation, indirect quotation, indirect discourse,
free indirect discourse, authorial comment, description, and narration
cannot reliably be distinguished from each other and should not be
tagged.

R.  Where these features are not clear, it is unlikely that anyone would
want to tag them; see sections 5.1, 5.3.2, and 5.3.3 (pp. 71, 78, and
81).  Scholars who use these concepts, however, are apt to feel that
they have the expertise to distinguish them reliably in some cases.
Later users of an encoding are not required to believe every tag in the
file.

34.  The bibliographic tagging in section 5.3.7 is too cumbersome,
especially for use in data capture.

R.  There is no bibliographic tagging in section 5.3.7; it contains an
example of optional tags for marking types of abbreviations.  No
suggestion is made as to when such a tag might be applied; a researcher
wishing to tag the feature will presumably be in a position to choose a
convenient time.

(During the development work, interest in having tags available for
abbreviations was expressed primarily in connection with machine
translation and similar applications, often pursued with large staffs
and substantial computational resources.  It seems objectively unlikely
that many literary scholars will feel the need to mark abbreviations in
this way, though there are examples of encoding schemes for literary
work which do mark them.)

35.  Section 5.8.1 proposes a tag for sentence boundaries, which
assumes that sentence boundaries can be known.

R.  Not so; section 5.8.1 proposes a tag for arbitrary segmentation of a
text into units convenient for analysis, and describes its possible
application not to 'sentences' (a linguistic unit) but to orthographic
sentences (an orthographic unit), which are by definition marked
explicitly in the copy text.  See p. 103.  See also the work of Rosanne
Potter for a good example of the use of such division of a text into
orthographic sentence units or utterances.

36.  Consistent use of presentational markup would avoid the problems
that arise when descriptive markup is not feasible for some reason.

R.  True; it would also avoid the advantages of descriptive markup.  See
Coombs et al., 1988, and DeRose et al., 1990, for lucid explanations of
the problems with presentational markup.

37.  The example from Richardson's Clarissa in section 5.11.1 does
not identify the copy text or give page and line numbers.

R.  True; neither is germane to the point being illustrated.

38.  In the example from Richardson, the word 'Anglice' is marked as
Latin, but it is not found in Lewis and Short; is it really Latin?

R.  'Anglice' is a regular formation (of late Antiquity or more likely
of the Middle Ages) from 'Angli'; since the English language postdates
the classical period, it is unsurprising to find the word missing from
Lewis and Short.  See OED s.v. 'Anglic' and 'Anglice'; see also
Webster's Second International, which marks the word as non-English.

39.  In the Richardson example, it is unclear whether the italics mark
quotation, emphasis, or irony.

R.  The example given reflects this uncertainty by using the tag
'highlighted' for italics which the encoder felt unable to disambiguate;
the possibility of disagreement on the interpretation of other italics
is preserved by the explicit recording in each case of the typographic
feature being interpreted.

40.  The use of tags like DIV0 and DIV1 will frighten literary scholars.
Blank lines should be used instead.

R.  Literary scholars using GML, Script, TeX, LaTeX, Scribe, troff, and
the scores of other text-handling programs which use structural tags
generically similar to DIV0 have not been notably frightened by the
experience.

41.  The second example date in section 5.3.11 should end the tagged
date after 'seventy-seven', not after 'Eighty-Sixth', to be consistent
with the interpreted value.

R.  Not so; 1977 was also the 201st year of the Republic and the 86th of
the University from whose diploma the date was transcribed.  The date
simply gives the year three separate names; like regnal years and
indictions these redundant designations are here interpreted as parts of
the same date.

42.  What is the meaning of the 'unit=absent' attribute-value pair for
the MILESTONE tag?  What is there to mark if the text is not present?

R.  The value 'absent' is intended for cases when MILESTONE is used to
mark multiple canonical reference schemes in the same text.  (E.g. the
pagination of two commonly used editions, to allow the electronic text
to be compared conveniently with either.)  Whenever the electronic text
contains material for which the canonical scheme provides no number at
all, the MILESTONE mechanism needs a way to mark the end of the text
properly referred to with the last preceding name or number.
'unit=absent' will be needed, among other cases, whenever the creator of
a canonical scheme excluded as spurious some passage included in the
copy text of the electronic version.
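
The behaviour described can be sketched in Python as follows.  The data
model is an assumption made purely for illustration (a list mixing plain
words with milestone markers held as dictionaries with 'unit' and 'n'
keys); it is not the TEI syntax, and the sample page numbers are
invented.

```python
def label_words(stream):
    """Pair each word with the reference in force at that point, or None.

    `stream` mixes plain words (str) with milestone markers (dict).
    A marker whose unit is "absent" ends the scope of the preceding
    reference without starting a new one.
    """
    current = None
    labelled = []
    for item in stream:
        if isinstance(item, dict):
            if item["unit"] == "absent":
                current = None
            else:
                current = (item["unit"], item["n"])
        else:
            labelled.append((item, current))
    return labelled


# Hypothetical text in which the canonical edition omits one passage.
stream = [
    {"unit": "page", "n": "12"}, "genuine", "text",
    {"unit": "absent"}, "spurious", "passage",
    {"unit": "page", "n": "13"}, "more", "text",
]

for word, ref in label_words(stream):
    print(word, ref)
```

Without the 'absent' marker, the two words of the excluded passage would
wrongly inherit the reference of the page before them.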

43.  Section 7.1 uses the term 'narrative' in the sense 'prose'.

R.  ?  As indicated by the coexistence of 'verse' and 'drama', the
subdivisions of section 7.3 are not, and cannot be interpreted as,
non-overlapping.  The substantive remarks of 7.3.3 apply (or are
intended to apply) to all narrative in whatever form; they have no
application to non-narrative prose.

44.  Section 7.3.2.1 engages in overkill by specifying Cordelia as
the speaker both in an element and in an attribute.  Why?

R.  Not so.  The copy text gives the speaker as 'Cor.', not 'Cordelia'.
The encoder in this case is careful to retain the ambiguity of the copy
text (Cornwall is also a character in the play and might be the speaker
here) since it can dramatically affect the interpretation of the scene,
while registering the opinion that the speaker is in fact Cordelia.  The
SPEAKER tag is optional because such cases are not common.

45.  Section 7.3.2.1 does not contribute to the problem of attaching
to each sentence or word of a play the identity of its speaker.

R.  Not so; the SP attribute performs just such an attachment in a way
understood by any SGML processor, as does (in a different way) the
SPEAKER tag.

46.  Cast list should also include date and location of first performance.

R.  Thank you.

47.  The confusion between '1' and '2' and 'Francisco' and 'Barnardo'
is messy.

R.  Yes it is; with the exception of some pedagogically motivated
simplifications, however, this encoding faithfully represents the
confusion of the copy text, where the speeches are labeled with '1' and
'2'.

48.  The DTD for drama is unusable.

R.  Why?