The TEI Guidelines (Version 1.1 10/90): A Critique
TEI Literary Studies Work Group (AI3)
18 October 1991
This critique of version P1.1 of the TEI Guidelines was
drafted by the five members of the Literature Texts Work Group.
These people work with texts in four natural languages, several
literary genres and periods from the Middle Ages to the present.
Among them they have recorded several million words of text,
directed the development of several software systems, and
published several dozen articles and half a dozen books based on
computer analyses of texts; the methodology of these publications
varies from traditional literary history to advanced statistical
analyses.
Much of the following critique is based on the Survey of the
needs of scholars in literature carried out by the Work Group, to
which some forty interdisciplinary producing scholars responded.
A copy of the results of this Survey is available from the TEI
Project. A preliminary version of this critique was circulated
to the Editors of the project, and Michael Sperberg-McQueen's
responses to it have been extremely helpful in arriving at this
final version.
The Work Group is impressed by the finished character of the
current version of the Guidelines document, and the almost total
absence of typographic errors. As people who work with and
generate texts on a daily basis, we recognize the amount of
effort which such an achievement represents. We wish to begin by
expressing high praise for the current Guidelines as the result
of concentrated and efficacious work on a difficult problem.
Michael and Lou should be particularly singled out for this
praise.
The comments which follow are offered in a spirit of friendly
collaboration in the hope that they will make an impressive
document even better and will bring it more closely into
conformity with the needs and perspectives of scholars working
with literature.
The Work Group understands that the TEI is proposing a coding
system for interchange, not for entry of texts. We realize also
that many things are suggested as options, not as requirements.
It must however also be recognized that simple considerations of
efficiency -- it is practical to have a locally standard code as
close as possible to the interchange code -- will tend to foster
the use of TEI codes at the local level. ASCII was originally
proposed as an interchange code; it is now a standard for
alphanumeric representation.
The very polished and comprehensive nature of the present
Guidelines, also, means that there will be a tendency for them to
become standards, both for interchange and local processing, and
even data entry; this possibility must be faced and taken into
account as they are drafted. By a similar process optional
codes, in the absence of clear distinction between the optional
and the required, will tend to be considered as recommended or
required, in spite of occasional or implicit indications to the
contrary.
Three of the Poughkeepsie principles bear on this matter.
- 2. The Guidelines are also intended to suggest principles for the
encoding of texts in the same format.
- 5. The Guidelines should include a minimal set of conventions for
encoding new texts in the format.
- 9. Conversion of existing machine-readable texts to the new
format involves the translation of their conventions into the
syntax of the new format. No requirements will be made for the
addition of information not already coded in the texts.
It is our opinion that these three principles are of particular
importance to scholars in literature, and that they are not
sufficiently reflected in the current version of the Guidelines.
Our reasons for this opinion will become clear in the rest of
this report.
Like most practitioners of an intellectual discipline,
Literature Scholars are accustomed to working from a
methodological perspective. The Guidelines would profit greatly
from a theoretical introduction, making clear what is meant by
such terms as "text", "tag", "hierarchy", etc. The fragments of
discussion of this topic found here and there in the Guidelines
(e.g. p. 71) are not adequate for this purpose. We realise that
generating such definitions will not be an easy task given that
in a printed text titles, footnotes, and variants are clearly
tags to the text, but in a TEI text they are treated as text.
How the nature of text and tag changes as a result of a change in
medium is not at all clear.
Similarly, we in literature recognize in a single text a
plethora of structures: physical (page and line breaks), formal
(parts, chapters, paragraphs), grammatical, semantic, actantial,
narrative, psychological, and so on. Each can be deemed
hierarchical from certain perspectives. Do the Guidelines permit
all of these structures to be defined as hierarchies? Do they
require such definition for their manipulation? Do they allow
them to be handled simultaneously so that their interrelations
can be examined? The suggestions for treating parallel texts in
5.10.12 (pp. 122-3) and elsewhere are not very clear on these
matters.
Literary texts usually aim at richness of expression and
multiplicity of levels of possible meaning. Can SGML-based
Guidelines integrate this basic characteristic of literature, or
do they attempt to abolish it?
We realise that these are vexed questions, recalcitrant to
simple answers, particularly when one accepts -- as we do with
high praise -- the principle enunciated by the linguists (p. 130)
that all theoretical positions must be welcomed by the
Guidelines, but no one must be given pride of place. On the
other hand, we consider it crucial for the acceptance of the
Guidelines by our constituency that a thoughtful discussion of
these matters be found at the beginning of the Guidelines
document. For instance, the discussion of highlighting on pp. 78
and 124 would seem, in the absence of such a discussion, to be
based on the premise that authorial intention is discernible from
the text; such a premise ceased being intellectually respectable
in our field about fifty years ago.
The pragmatics of work on literature texts is also a source of
concern in a number of areas.
The scholar in literature typically works with large amounts
of data, since computer processing is used mainly when it is not
practical to commit a text to memory.
These scholars are concerned mainly with inputting texts as
rapidly and at as reasonable a cost as possible, verifying them
as effectively and cheaply as possible, and getting on as quickly
as possible with the analytic work which was their reason for
working with the machine.
Except when they are generating a canonical text, literature
scholars work with a specific edition of a text which is
considered canonical in the sense that it is the one which is
cited and quoted in serious professional work. Depending on the
situation, this specific edition will be a critical edition, a
prestigious edition, or a trade edition. They will want to refer
easily to pages and lines in this text. That the electronic
version of this be stable and not subject to change other than to
correct errors is also a requirement. This perspective is made
perfectly clear in the responses to the Survey and in the
practices of the great repositories of machine-readable texts,
like the Tresor de la langue francaise.
Literature scholars are not interested in, in fact many object
vehemently to, the prospect of obtaining texts which already
contain - explicitly or implicitly - literary interpretations.
The responses and comments elicited by the Survey bear eloquent
witness to this.
For these reasons we recommend that the Guidelines clearly
distinguish between a minimal set of required tags and a wide
range of optional tags to be used at the discretion of the text
preparer.
The present version of the Guidelines is not in harmony with
our perspective. Some examples:
- p. 1 (1.1.1) The statement is made that the Guidelines "are also
intended to provide both guidance to the scholar embarking on the
creation of an electronic text, both as to what textual features
should be captured and as to how they should be represented". We
do not find such a claim appropriate in what is clearly becoming
a technical manual, not a user's guide. We consider that such a
claim constitutes a dangerous trap for the neophyte. It should
be removed.
- p. 4 para 3. States that full tags need not be entered by hand,
and allusion is made to macros or parsers; no examples are
furnished, no names or references are furnished. Here again we
are concerned about the effect on the neophyte. If macros
and parsers exist, examples of both should be provided here and
at least half of the examples in the rest of the document should
show their use.
- p. 15 (2.1.4) Recommends embedding a given interpretation into
mark up at the time of data capture or conversion in the form of
a DTD. The Survey clearly indicates that most scholars of
literature strongly oppose finding interpretation already in
texts which they receive. To recommend embodying such
interpretation in an interchange format is paradoxical to say the
least.
- It is recognized that all coding can be seen as a kind of
interpretation but a fundamental distinction must be made here.
A certain character is or is not in italic; once the way of
representing italic has been decided, a simple either-or decision
carrying very little intellectual content will resolve the
matter. Why a word is italicised is open to a number of
interpretations; scholars legitimately may not agree on which one
or ones are valid. This is interpretation in the usual sense,
and is the domain of the scholar working on the completed text,
not that of the coder inputting or converting the text.
Recommendations overlooking this distinction will alienate the
vast majority of literature people working with computers. The
Survey has made this clear.
- p. 16 (2.1.4.2) Minimisation rules are a good idea. Examples
(note the plural) should be provided.
- p. 23, the example. The coding is much too wordy; the poem,
which is tiny, disappears under the mass of the codes. Responses to
the Survey and discussions on TEI-LIST have made clear the dismay
of the scholarly community with this wordiness. Minimisation
will have to be carried much further, and software will have to
be developed with a feature similar to the reveal codes/hide
codes function on many word processors. This is not a minor
problem but points to an underlying reality. If structural
features are indicated by format, this indication suffices.
Those features which require explicit coding will be more
complex, more prone to error, more difficult to enter
consistently, more difficult to verify and proofread. Scholars
are not likely to undertake such onerous tasks whose results will
be so fragile.
- It should be recalled that in the final analysis the success
of the TEI standards will depend on their acceptance and use by
the scholarly community.
- In general, the very wordy nature of the tags recalls an
archaic period in computing, when the user was expected to
specify everything to the machine. A more contemporary and user-
friendly mode of tagging is expected by current users and must be
sought, since few users can be expected to put up with such
wordiness any more.
- p. 28 (2.1.7) Entity Reference (string substitution). This is
excellent. It must be stressed more, alluded to more, and shown
frequently in examples.
- p. 55 (4.1.4) Since most scholarly work in literature is based
on a canonical text, in which pagination and lineation frequently
vary with the PRINTING, not just the edition, it is essential to
identify the date of printing and the print shop in the header
material of a machine readable file of a text based on a printed
edition. Reference back to the original, verification and
proofreading are impossible without them.
- p. 62. We suggest putting print shop and date of printing between
the information on the publication and that of the distribution.
This would also be the appropriate place to identify the location
and shelf mark for manuscripts and incunabula.
- p. 65 (4.5) The encoding declarations are of course the ideal
place to put allusions to and/or explanations of the local coding
conventions. Please stress this fact here. In fact, we
recommend making it a condition of conformity to TEI standards
that local coding for features not available on the keyboard
used (font changes, accented letters, etc.) be documented in a
header record.
- p. 71 (5.1), para 1. The definition of text, "an extended
stretch of natural discourse, whether written or spoken", is not
correct. Not all texts are extended. Spoken natural discourse
is not text until transcribed in written form.
- p. 71 (5.1), para 5. Again, the ability to point to a unique
place in the text of the original printed document is essential
to the needs of literature scholars. This must be stressed here
and shown in the examples. The Survey is eloquent on this
matter.
- p. 77 (5.2.5) Colophon -- not a term everyone can be expected to
know. Note that the Pleiade edition shows this as front matter.
Given the practical importance of printing date and print shop
information included here, we recommend that it be put at the
beginning of the file, right after the publisher identification.
- p. 77 (5.3.1) Given their importance for locating a quoted or
identified passage, line breaks should be mentioned here and
their importance stressed. The Survey made this abundantly
clear.
- p. 93 (5.6) A strong recommendation to code page breaks:
EXCELLENT. Please put in as strong or a stronger recommendation
to code line breaks, i.e. always put in unless there is a
compelling reason not to do so, even in prose texts. To do
otherwise would be to ignore the contribution of the scholars who
participated in the Survey.
- pp. 125-6 (5.11.2) Information about the layout of the edition
input (i.e. page and line breaks), which permits reference back
to the original text being studied, is crucial to the needs of
most literature scholars. To state that the "line-break" tag "is
intended only for cases where lineation of a prose text is
considered of importance in its own right" (p. 126), suggests
that such reference is rare, whereas it is THE NORM. It MUST NOT
be downplayed in this fashion.
- Our judgement, confirmed by the Survey, is that most scholars
use electronic text in a fashion that requires the ability to
make unambiguous reference back to a precise place in the canonical
printed text on which it is based. Thus lineation of a prose
text is always considered important a priori, unless, as in cases
like the Bible, a clear case can be made for coding in a
different fashion. In short, the suggestion that lineation can
somehow not be important in a text runs counter to the needs and
practices of scholars of literature.
- p. 177 (7.3.1.1) It is not necessary to specify the metre
attribute in every line. That is the work of the analyst not the
archivist or the scanner corrector.
- p. 178 (7.3.1.2) Even for rhyme of type "aa" French prosody
recognizes at least three types: rime suffisante (not necessarily
the same as assonance), rime pauvre, and rime riche. Perhaps
this should also be taken into account. BETTER, given the range
of languages to which the Guidelines are to apply and the large
number of prosodic systems in question, perhaps the Guidelines
should not be so prescriptive. The Work Group expects to work on
optional codes for such things, once more pressing requirements
of literature scholars have been attended to.
- p. 200 Putting tags, entities and redefinitions in a separate
file for calling up by many texts is an excellent idea.
Unfortunately the example is not at all clear, and makes this
seem much more complex and confusing than it is or need be.
- pp. 207-09. It is a trap for the unwary and an irritation to the
experienced to show the suppression of typographical information
(line breaks) in an extended example like this. The
justification that the edition used wasn't very good - "the
edition being used is of little editorial interest in itself"
(208) - makes things worse; poor editions should not be converted
to machine-readable form!
- pp. 219-33 (A.6) We agree that in the case of the Bible the
older and more authoritative method of identifying passages
should prevail.
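The requirement stressed in several of the comments above, that a
passage in the electronic text can be traced back to a page and line
of the printed edition, can be illustrated with a minimal sketch. It
assumes, purely for illustration, that page breaks are recorded as
form-feed characters and line breaks as newlines. The function name
`locate` and the sample text are invented for the example and come
from neither the Guidelines nor the Survey.

```python
def locate(text, word, first_page=1):
    """Return a (page, line) pair for every line on which `word` occurs.

    Assumption: "\f" marks a page break and "\n" a line break of the
    printed edition; line numbers restart at 1 on each page.
    """
    hits = []
    for page_offset, page in enumerate(text.split("\f")):
        for line_no, line in enumerate(page.split("\n"), start=1):
            # Strip surrounding punctuation so "rain." matches "rain".
            tokens = [t.strip(".,;:!?\"'()") for t in line.split()]
            if word in tokens:
                hits.append((first_page + page_offset, line_no))
    return hits


# Invented two-page sample, two printed lines per page.
sample = "It was a dark night.\nThe rain fell.\fThe rain stopped\nat dawn."
print(locate(sample, "rain"))  # [(1, 2), (2, 1)]
```

If the line breaks were discarded at input, no such lookup would be
possible without going back to the printed book.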
The Guidelines recommend three levels of coding:
- Required in any TEI conformant document (e.g. Title, author,
etc.)
- Required for interchange, but a more succinct local code is
recommended (e.g. accented letters, non-roman alphabetics).
- Optional e.g. <Word in italics because of irony, unless the
author really meant just to try to represent the intonation of
the speaker> really </word in italics because of irony, unless the
author really meant just to try to represent the intonation of
the speaker>. It is not always easy to tell which is which from
the present version of the document. This distinction must be
made clear.
We recommend a very small number of required codes: just what
is necessary to identify fully the edition and printing used and
to find a given passage in it in terms of pages and lines,
divisions into chapters, acts and scenes, cantos, or books, etc.,
the character set used, and the representation used for features
in the text but not in the character set (i.e. accented letters,
font changes). All other codes must be optional. Examples of
optional codes should be furnished. We repeat that the
distinction between the two types must be made abundantly clear
even to the uninformed, casual or negligent reader.
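As an illustration of what a documented representation for characters
outside the character set might involve, here is a minimal sketch. It
assumes that the local file holds accented letters directly and that
the interchange form uses SGML entity references with the ISO 8879
Latin 1 names (eacute, ccedil, and so on). The table is deliberately
incomplete, and the function names are invented for the example.

```python
# Partial, illustrative table: accented letter -> ISO 8879 entity name.
ENTITIES = {
    "é": "eacute",
    "è": "egrave",
    "à": "agrave",
    "ç": "ccedil",
    "ê": "ecirc",
}


def to_entities(text):
    """Replace each accented letter in the table by an entity reference."""
    return "".join(f"&{ENTITIES[c]};" if c in ENTITIES else c for c in text)


def from_entities(text):
    """Inverse conversion, from entity references back to local letters."""
    for char, name in ENTITIES.items():
        text = text.replace(f"&{name};", char)
    return text


print(to_entities("français"))  # fran&ccedil;ais
```

Because the table works in both directions, the scholar can keep the
readable local form and produce the interchange form only on demand.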
In our view, a possible method would be to separate out each
type and group them as required or optional. An alternate
method would be to tag each heading with a parenthetical
indication of which class each tag or tag type belongs to. The
optimum method would be to do both.
Further comments on coding levels follow:
- p. 1 (1.1.2) The Guidelines recommend the use of simpler and less
wordy codes in a local environment, which codes are to be
translated into full TEI coding for interchange. EXCELLENT!!!
BRAVO!!! PLEASE DO MORE OF THIS! It should be made very clear
that this is the RECOMMENDED approach. Examples of existing
coding schemes upgradable to TEI level taken from existing
archives should be given. Other examples (made up for the
purpose) should be given. It must be made clear to the user that
clean, clear and easy codes are to be the NORM for local use, and
that the full TEI codes are for interchange and possibly archive
purposes only.
- p. 4 para 5. Interchange format does not allow any tag
reduction. This is legitimate. But it MUST be made clearer that
local minimization is encouraged, as long as automatic upgrading
to full TEI codes is possible from the local code.
- pp. 13-14 The examples are the perfect place to show a local
code first, then the full TEI code.
- pp. 45-52 (3.2) Character Sets. It MUST be made clear that this
applies to interchange only. Local codes MUST be recommended
and SHOWN which are easy to input and easy to use on a screen and
printer of MAC, DOS and Mainframe machines (at least 2 sets of
examples for each of the three). Preferably get some from
existing databases and some from the various forms of 8859.
- The exclusion of such an important punctuation mark as the
exclamation mark puts a needless coding burden on scholars. This
exclusion should be removed. SGML should not take precedence
over the needs of scholars. Similar arguments can be made in
favour of the pound sign and square brackets.
- p. 58 (para 4) The exclusion of recording the names of the
person or persons who actually did the recording work reveals an
inappropriate class and/or gender bias. Please delete this
paragraph.
- pp. 58-59 The examples provide an excellent opportunity to show
both local codes and TEI codes.
- p. 59 (4.3.2) para 5. The changes listed "corrections of mis-
spellings of data, changes in the arrangements of the contents,
changes in the output format", are not in fact minor. This
paragraph contradicts p. 55 (4.1.6). Please clarify, or better
still, choose.
- pp. 82-83 (5.3.6, 5.3.7) It MUST be made clear
that these very wordy and error-prone features are optional.
Please try to cut down their length. It is essential to warn the
potential user of their complexity and of the difficulty of
coding them accurately in a text of any size. Their optional
nature MUST be made more clear. In their present state they are
counterproductive, both because of their wordiness and because
of the technical naivete which such wordiness embodies.
- pp. 84-6 (5.3.8) List handling is excessively wordy and takes too
much for granted. There must be an example of a simplified local
code as well as the full TEI code here.
- pp. 86-89 (5.3.11) Numbers: a perfect example here of a trap for
the unwary. Only "may" on p. 87 shows that this extremely wordy
coding is optional.
- pp. 89-90 (5.4) This is a good idea but for a post-input markup.
This fact must be made clear and encouraged. Mention that this
is a relatively rare occurrence.
- p. 93 (5.6.1) It is absolutely necessary to have an example here
and to show both local and TEI formats.
- p. 94 (5.6.1) It is absolutely necessary here to have an example
and to show both local and TEI coding. It is very doubtful that
any scholar or critic will ever use this kind of coding.
Something more straightforward and user friendly is required.
- p. 97 (5.6.4) Seems to suggest only fully explicit coding in
milestones. You really need to show brief local codes here, PLUS
their expansion into TEI codes.
- p. 103 (5.8.1) Explicit tagging of sentences. This is overkill.
This must be clearly indicated as optional and another part needs
to be added suggesting how to set up a local code permitting
automatic conversion to this level of coding.
- pp. 110 ff (5.10.3) The examples from pp. 110 through 117 are
prime candidates for examples of both local and full TEI codes.
The Critical Edition example is particularly weak. The example
is trivial. The only clear presentation is the uncoded one. The
explicit and wordy recording of the lack of variants, and the use
of "&zero.var" for omissions, are bizarre in the extreme and fly
in the face of a millennium of scholarly practice. This attempt
to reduce three parallel texts to a single linearly expressed
notation is clearly defective. The text has been destroyed and
converted into an unreadable list of real and potential variants.
- The prime function of any text is to be read. This conversion
has destroyed the text as text. Reference must be made to
experts in this domain and their advice must be followed. Here
again we hope to work on this, once more fundamental questions
have been resolved.
- p. 170 (7.2.1) The encoding declarations are an EXCELLENT idea
and to be encouraged, indeed made required. They also foster the
definition of local standards which can be converted
automatically into TEI format.
- pp. 207-09 A perfect place for a two-step example, the first part
showing local code, the second showing TEI code.
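The kind of automatic upgrading from a brief local code to the full
interchange code requested throughout these comments can be sketched
as follows. This is a minimal illustration, not a definitive
converter. The pairing of "ital" with "highlighted rendition=italic"
is taken from our comment on pp. 78-9; the angle-bracket delimiters,
the form of the end-tag, and the names `LOCAL_TO_FULL` and `expand`
are assumptions made for the example.

```python
import re

# Hypothetical table: short local tag -> (full start-tag, full end-tag).
LOCAL_TO_FULL = {
    "ital": ("<highlighted rendition=italic>", "</highlighted>"),
}

# Matches a simple start-tag or end-tag without attributes, e.g. <ital> or </ital>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def expand(text):
    """Rewrite every known local tag in its full form; leave others alone."""
    def replace(match):
        closing, name = match.group(1), match.group(2)
        if name not in LOCAL_TO_FULL:
            return match.group(0)
        start, end = LOCAL_TO_FULL[name]
        return end if closing else start
    return TAG.sub(replace, text)


print(expand("a <ital>really</ital> good idea"))
# a <highlighted rendition=italic>really</highlighted> good idea
```

The table is the only part that needs documenting in a header record;
the text itself stays short and legible during entry and proofreading.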
Here are discussed the two types of coding: Presentational
(capital letters, line breaks, italics, etc.) and Descriptive
(Proper noun, italics showing irony, stress or a foreign word,
etc.).
Our perspective is that coding (inputting or converting text)
is not the same as interpreting. Descriptive coding as presented
in the Guidelines is squarely in the domain of interpretation.
Most scholars do not want interpreted texts; they expect to do
that job themselves. They made this abundantly clear in the
Survey; we must not ignore them. When possible, scholars hire
assistants to input texts, and do not expect these assistants to
do the interpretation. This whole aspect needs to be brought
into conformity with scholarly practice, otherwise the TEI
standards will not be respected.
To repeat: one-to-one conversion of typographical features is
not controversial; it should be done as faithfully as possible.
It must be a requirement in a TEI conformant text. Coding or
interpretation in the sense of description of authorial
"intention" or the choice among several alternatives on the basis
of judgement is a different matter, which is designated
descriptive coding. It can be allowed but never recommended.
The Guidelines are quite unclear on this matter, and seem to make
conflicting suggestions in different places.
Descriptive mark up can at the limit be made an option for
those who feel they must do it. But it must be made clear that
such tagging is OPTIONAL and NOT REQUIRED.
Comments on details follow:
- p. 12 (2.1.2) Direct quotation, indirect quotation, indirect
discourse, free indirect discourse, authorial comment,
description or narration -- all of these aspects of a text can
blend one into another. Which is which is open to interpretation
and debate. It is ludicrous to tag them as if such distinctions
could be made once and for all. Not only must the optional nature
of such tagging be stressed, but potential users must be
cautioned to exercise prudence in such coding, to define
categories carefully, to test them by hand on small samples and
shake them down on larger samples of electronic text, before
undertaking the tagging of a full text.
- p. 71 (5.1) Presentational mark up is allowed here, as well as
descriptive. NO! Presentational mark up should be recommended,
with descriptive at most recognized as possible if one wants to
use it, but with warnings against it. The examples will have to
be revised.
- pp. 77-78, 88, etc. The concept of crystals (or the choice of
term) is not made clear, and the examples are difficult to follow.
Revision seems in order.
- pp. 78-9 (5.3.2) This section is presented primarily in terms of
descriptive mark up, which is wrong. The presentational should
be recommended, if only because it avoids the excessive wordiness
of the descriptive approach. The wordiness of the so-called
presentational mark up must be reduced, for example "highlighted
rendition=italic" can be replaced with "ital" without any loss of
information. In fact, the longer form is more descriptive than
presentational. The earlier examples of handling of the
underlying features of italics require so subjective an
interpretation that any scientific rigour in a text coded using
them would be destroyed.
- pp. 79-81 (5.3.3) Do NOT recommend tagging of underlying
features, just the opposite. Stick with the <q> </q> for open
and close quotes, suggest something else for block quotes, e.g.
<bq>. Remind the user that she can use open and close quotes or
guillemets (other things for embedded quotes) for a local code
and have a conversion program take care of the rest.
- "Guillemets" by the way is used in the plural. There is no
such thing as a single guillemet. What you show as such are
greater than and less than signs. What is the use of 66U, etc.
when character set tables are in the appendix?
- The recommendation to use "rendition = unmarked" (p. 80) with
"q" is bizarre in the extreme. Many readers, and some of the
better software, can be expected to identify an item as unmarked
without the aid of a specific tag.
- pp. 81-82 (5.3.4, 5.3.5) Perfect traps for the unwary. This is
interpretation and dependent on time; it adds unnecessary work,
confusion and possibility for error. Particularly true in the
case of "croissant" (p. 81) and in the example on p. 82.
- p. 83 (5.3.7) If anyone in our community sees the bibliographic
tagging on p. 83, the TEI is a dead letter. The issues of how to
handle names, abbreviations in names, etc. are important and not
easy for programmers to deal with, but if this level of coding has
to be done at the capture or transmission stage, we assure you,
no one will use TEI. (Sorry, archivists and programmers might, but
no one who is putting text into machine readable form in order to
do anything critical or scholarly with it will ever do this kind
of hiding of information in layers and layers of codes.)
- p. 103 (5.8.1) Explicit tagging of sentences. This takes for
granted that such can be known, which is not the case for
numerous poets, and even novelists since the 1930's, cf. Celine,
Simon, etc. in French. Here is an excellent example of why
descriptive coding is wrong.
- p. 105 (last para) It is most questionable whether one should
EVER remove an interpretable feature from a text and replace it
by an interpretation. Not only does this make impossible
verification of the data (it has to be re-interpreted not
proofread) but it also involves the coder usurping the role of
the scholar who does the interpretation.
- p. 123 (5.11) Here presentational mark up is described as
exceptional and extraordinary, earlier it was presented as a
valid alternative; consistent standards never hurt.
- More important, presentational mark up should BE the standard,
with descriptive only an option which is allowed with cautions.
- p. 123 (5.11) Use of "descriptive" in line one and of
"presentation" in line 4 shows the problem presented by the SGML
approach. If presentational markup had been used from the start
as the sine qua non -- none of this would be a problem.
- p. 124 (5.11.1) The example. What edition was used? What are
the page and line boundaries? Or was this all made up too?
- This example is a perfect demonstration of the weakness of
descriptive mark up: "Anglice" is not found in the standard
Latin dictionary (Lewis and Short). What are we dealing with
here? Are the italics quotes, emphasis or ironic? Let the coder
code and leave the interpretation to the scholar.
- p. 176 (7.3) First, according to certain schools of
interpretation texts can and should be regarded in isolation, and
it is not the place of the TEI to pass judgement on this question
of literary theory.
- Second, presentational mark up is essential because the
Guidelines deal with coding a text, not its interpretation. The
role of a given textual feature is ALWAYS open to interpretation,
so the function of a good coding scheme is to facilitate
interpretation, not pre-empt it.
- p. 214 (bottom) The Hamlet example. The stage type describes
only the first half of the stage direction; this is the problem
with descriptive tagging.
- Someone should try to reduce the wordiness of this tagging,
particularly in the case of the speaker distinctions.
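The conversion program mentioned in the comment on pp. 79-81, which
lets the user type ordinary quotation marks or guillemets locally and
obtain <q> </q> for interchange, might look like the following. This
is a minimal sketch under stated assumptions: only two pairs of marks
are handled, embedded quotations are simply nested, and the names
`PAIRS` and `quotes_to_tags` are invented for the example.

```python
# Opening mark -> matching closing mark (illustrative subset only).
PAIRS = {"«": "»", "“": "”"}


def quotes_to_tags(text):
    """Replace paired quotation marks by <q> and </q>.

    A stack remembers which closing mark is expected, so that a
    quotation embedded in another one is converted correctly.
    """
    out = []
    expected = []
    for ch in text:
        if ch in PAIRS:
            expected.append(PAIRS[ch])
            out.append("<q>")
        elif expected and ch == expected[-1]:
            expected.pop()
            out.append("</q>")
        else:
            out.append(ch)
    if expected:
        raise ValueError("unclosed quotation mark")
    return "".join(out)


print(quotes_to_tags("Il dit: «Venez» et sortit."))
# Il dit: <q>Venez</q> et sortit.
```

Nothing is interpreted here: the program records only that quotation
marks were present, which is a presentational fact of the edition.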
This section contains comments that do not fit easily into the
categories used above.
- pp. 75-76 (5.2.4) Why use <div0> etc.? The names given to the
sections by the author are the text. If the author chooses to use
a number "I" or "2" surrounded by blank space that is what SGML
should do. If it cannot code blank lines and blanks, then we are
in rather serious trouble as literature scholars. We will be
forced to describe, when presentation is what we want to do.
This whole section is really designed for programmers, not for
people in our area -- this type of material will only frighten
users away from the Guidelines; it is virtually incomprehensible
and in the long run not even true. There are alternatives other
than the one listed, using the facts of the text, rather than any
imposed divisions: large or small.
- p. 76 (5.2.4) The distinction between legal and illegal forms is
not clear. In any case the legalistic terminology is not
appropriate.
- p. 79 (line 7). The "second" sentence. TYPO. It is the only
sentence in the example unless the TEI standards have subtleties
which escaped the committee.
- p. 88 (example 1) TYPO. </date> must go after "seventy-seven" if
you care to be consistent with the date coded earlier as
1977-06-12.
- p. 90 (example after <del>) "Dumb clucks": Belittling the reader
in this fashion is not amusing; it is offensive. Remove it and
find a real example from a real text.
- p. 95 Assumes exactly what we do not want to assume: "text has
been entered without preserving pagination". No need for
artificial reference scheme; one already exists (the page numbers
and carriage returns at the end of the lines).
- p. 96 (4.6.2) What can it mean to mark as "absent" a piece of
text that is not present? What exactly is there to be marked?
- p. 105 (5.8.2) Soft hyphens EXIST in source texts. Please
suggest more clearly how to handle them when they occur.
- pp. 110 ff. (5.10.3). Find a real text for a real example here.
The imaginary and "humorous" one trivialises what is being done.
- p. 129 (6.1) para 2. Trying to define forms with no reference to
content is a mug's game. The whole concept of structure shows
that form determines content and content determines form, in
varying degrees according to the context, example, and
interpretative perspective, of course. In other words, you must
create unanimity among the community of scholars BEFORE you can
define the forms they can use. Not a practicable enterprise.
- p. 130 (6.1) The principle for linguistics (welcome all
theoretical positions, favour none) is EXCELLENT. We recommend
the same thing for literature; this is the basic premise of most
of the preceding comments.
- pp. 140-44 (6.2.4) Incredibly wordy and unreadable coding for
linguistic features. If the linguists consider this a good idea,
more power to them. We recommend not getting into this for
literature texts.
- p. 169 (7.1) "verse, drama and narrative". Narrative used in the
sense of prose. Not all prose is narrative (cf. cookbooks, or
the TEI Guidelines), not even all literary prose is narrative
(some is descriptive). If you are going to try to dictate, or
even make suggestions, to scholars in literature, you must get
the technical language right, and "sermons guidebooks, recipe
books, etc." (p. 176) are NOT narratives, formal or otherwise, in
any accepted sense of the word.
- p. 180 (7.3.2.1) Overkill if both speaker and speech tell that the
speaker is Cordelia -- why not just say so once by recognizing
the abbreviation of the speaker's name that is in the text to be
the "tag" that it is? The real problem in dealing with speech in
plays is that the speaker's tag needs to appear with each
sentence (or all the words) of long speeches. Identifying "Cor"
as <speaker> Cor. </speaker> does not contribute to solving this
problem.
- p. 180 (7.3.2.2) Excellent example of giving the simple tag,
mentioning that some investigators may want to also encode this,
that and the other, but not giving prescriptive examples.
- p. 181 (7.3.2.4) French texts of plays also show the date and
place of the first production as well as the names of the actors.
You should provide for this.
- p. 181 (7.3.3) Use PROSE not narrative, to include the essay and
free form creations (cf. Butor's works).
- p. 207 -- the original of this example is a printed document, not
scanner output. Please begin by showing the original not an
intermediary stage of processing.
- p. 215 The idea of removing speaker tags, then identifying them as
speaker 1 and speaker 2, but then actually giving them names in
the speech tag that follows, is to say the least messy. Either
the speaker is tagged in the text or is not.
- p. 215 Note "Mar.Marc" -- clearly a leftover fragment of a
redundant tag.
- p. 270 Alternate Base for DTD for drama -- If this goes out to
any public other than programmers, then the TEI standards will not
be used. Give us one reason why anyone would want to.
- Place of insertion to be chosen: Concern was expressed in the
Work Group about the integrity of electronic texts. Simply
counting the size of a file in bytes does not guarantee that one
can recognize modifications in it. Shareware exists which
generates a unique number for a text, a number which will change
if any modifications are made to it. Please look into the
possibility of recommending such software or, better, recommending
that it and the number generated by it be included with archived
or shared texts.
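The idea can be sketched as follows. The shareware alluded to above is
not identified, so this example substitutes a standard cryptographic
hash (SHA-256, from Python's hashlib module); that choice, the
function name `fingerprint`, and the sample strings are assumptions
made for illustration. Strictly speaking such a number is not
guaranteed to be unique, but an accidental match between two different
texts is extremely unlikely.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hexadecimal digest that changes if any byte of the text changes."""
    return hashlib.sha256(data).hexdigest()


# Two invented samples of identical size that differ in one character.
original = b"To be, or not to be"
altered = b"To be, or not to bo"

print(len(original) == len(altered))                   # True: byte count sees no change
print(fingerprint(original) == fingerprint(altered))   # False: the digest does
```

The digest of the archived file could be recorded in the header or
distributed alongside the text, so that a recipient can recompute it
and compare.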