The TEI Guidelines (Version 1.1 10/90): A Critique

TEI Literary Studies Work Group (AI3)

18 October 1991

1 Background

This critique of version P1.1 of the TEI Guidelines was drafted by the five members of the Literature Texts Work Group. These people work with texts in four natural languages, several literary genres and periods from the Middle Ages to the present. Among them they have recorded several million words of text, directed the development of several software systems, and published several dozen articles and half a dozen books based on computer analyses of texts; the methodology of these publications varies from traditional literary history to advanced statistical analyses.

Much of the following critique is based on the Survey of the needs of scholars in literature carried out by the Work Group, to which some forty interdisciplinary producing scholars responded. A copy of the results of this Survey is available from the TEI Project. A preliminary version of this critique was circulated to the Editors of the project, and Michael Sperberg-McQueen's responses to it have been extremely helpful in arriving at this final version.

1.1 Perspective

The Work Group is impressed by the finished character of the current version of the Guidelines document, and the almost total absence of typographic errors. As people who work with and generate texts on a daily basis, we recognize the amount of effort which such an achievement represents. We wish to begin by expressing high praise for the current Guidelines as the result of concentrated and efficacious work on a difficult problem. Michael and Lou should be particularly singled out for this praise.

The comments which follow are offered in a spirit of friendly collaboration in the hope that they will make an impressive document even better and will bring it more closely into conformity with the needs and perspectives of scholars working with literature.

The Work Group understands that the TEI is proposing a coding system for interchange, not for entry of texts. We realize also that many things are suggested as options, not as requirements. It must however also be recognized that simple considerations of efficiency -- it is practical to have a locally standard code as close as possible to the interchange code -- will tend to foster the use of TEI codes at the local level; ASCII was originally proposed as an interchange code; it is now a standard for alphanumeric representation.

The very polished and comprehensive nature of the present Guidelines, also, means that there will be a tendency for them to become standards, both for interchange and local processing, and even data entry; this possibility must be faced and taken into account as they are drafted. By a similar process optional codes, in the absence of clear distinction between the optional and the required, will tend to be considered as recommended or required, in spite of occasional or implicit indications to the contrary.

Three of the Poughkeepsie principles bear on this matter.

1.1.1 The Poughkeepsie Principles

2. The Guidelines are also intended to suggest principles for the encoding of texts in the same format.
5. The Guidelines should include a minimal set of conventions for encoding new texts in the format.
9. Conversion of existing machine-readable texts to the new format involves the translation of their conventions into the syntax of the new format. No requirements will be made for the addition of information not already coded in the texts.

It is our opinion that these three principles are of particular importance to scholars in literature, and that they are not sufficiently reflected in the current version of the Guidelines. Our reasons for this opinion will become clear in the rest of this report.

1.1.2 The Perspective of the Literature Scholar

Like most practitioners of an intellectual discipline, Literature Scholars are accustomed to working from a methodological perspective. The Guidelines would profit greatly from a theoretical introduction, making clear what is meant by such terms as "text", "tag", "hierarchy", etc. The fragments of discussion of this topic found here and there in the Guidelines (e.g. p. 71) are not adequate for this purpose. We realise that generating such definitions will not be an easy task given that in a printed text titles, footnotes, and variants are clearly tags to the text, but in a TEI text they are treated as text. How the nature of text and tag changes as a result of a change in medium is not at all clear.

Similarly, we in literature recognize in a single text a plethora of structures: physical (page and line breaks), formal (parts, chapters, paragraphs), grammatical, semantic, actantial, narrative, psychological, and so on. Each can be deemed hierarchical from certain perspectives. Do the Guidelines permit all of these structures to be defined as hierarchies? Do they require such definition for their manipulation? Do they allow them to be handled simultaneously so that their interrelations can be examined? The suggestions for treating parallel texts in 5.10.12 (pp. 122-3) and elsewhere are not very clear on these matters.

Literary texts usually aim at richness of expression and multiplicity of levels of possible meaning. Can SGML-based Guidelines integrate this basic characteristic of literature, or do they attempt to abolish it?

We realise that these are vexed questions, recalcitrant to simple answers, particularly when one accepts -- as we do with high praise -- the principle enunciated by the linguists (p. 130) that all theoretical positions must be welcomed by the Guidelines, but no one must be given pride of place. On the other hand, we consider it crucial for the acceptance of the Guidelines by our constituency that a thoughtful discussion of these matters be found at the beginning of the Guidelines document. For instance, the discussion of highlighting on pp. 78 and 124 would seem, in the absence of such a discussion, to be based on the premise that authorial intention is discernible from the text; such a premise ceased being intellectually respectable in our field about fifty years ago.

The pragmatics of work on literature texts is also a source of concern in a number of areas.

The scholar in literature typically works with large amounts of data, since computer processing is used mainly when it is not practical to commit a text to memory.

These scholars are concerned mainly with inputting texts as rapidly and with as reasonable a cost as possible, verifying it as effectively and cheaply as possible, and getting on as quickly as possible with the analytic work which was their reason for working with the machine.

Except when they are generating a canonical text, literature scholars work with a specific edition of a text which is considered canonical in the sense that it is the one which is cited and quoted in serious professional work. Depending on the situation, this specific edition will be a critical edition, a prestigious edition, or a trade edition. They will want to refer easily to pages and lines in this text. That the electronic version of this text be stable and not subject to change other than to correct errors is also a requirement. This perspective is made perfectly clear in the responses to the Survey and in the practices of the great repositories of machine-readable texts, like the Tresor de la langue francaise.

Literature scholars are not interested in, and in fact many object vehemently to, the prospect of obtaining texts which already contain - explicitly or implicitly - literary interpretations. The responses and comments elicited by the Survey bear eloquent witness to this.

For these reasons we recommend that the Guidelines clearly distinguish between a minimal set of required tags and a wide range of optional tags to be used at the discretion of the text preparer.

The present version of the Guidelines is not in harmony with our perspective. Some examples:

p. 1 (1.1.1) The statement is made that the Guidelines "are also intended to provide both guidance to the scholar embarking on the creation of an electronic text, both as to what textual features should be captured and as to how they should be represented". We do not find such a claim appropriate in what is clearly becoming a technical manual, not a user's guide. We consider that such a claim constitutes a dangerous trap for the neophyte. It should be removed.
p. 4 para 3. States that full tags need not be entered by hand, and allusion is made to macros or parsers; no examples are furnished, no names or references are furnished. Here again we are concerned about the effect on the neophyte. If macros and parsers exist, examples of both should be provided here and at least half of the examples in the rest of the document should show their use.
p. 15 (2.1.4) Recommends embedding a given interpretation into mark up at the time of data capture or conversion in the form of a DTD. The Survey clearly indicates that most scholars of literature strongly oppose finding interpretation already in texts which they receive. To recommend embodying such interpretation in an interchange format is paradoxical to say the least.
- It is recognized that all coding can be seen as a kind of interpretation but a fundamental distinction must be made here. A certain character is or is not in italic; once the way of representing italic has been decided, a simple either-or decision carrying very little intellectual content will resolve the matter. Why a word is italicised is open to a number of interpretations; scholars legitimately may not agree on which one or ones are valid. This is interpretation in the usual sense, and is the domain of the scholar working on the completed text, not that of the coder inputting or converting the text. Recommendations overlooking this distinction will alienate the vast majority of literature people working with computer. The Survey has made this clear.
p. 16 (2.1.4.2) Minimisation rules are a good idea. Examples (note the plural) should be provided.
p. 23, the example. The coding is much too wordy; the poem, which is tiny, disappears under the mass of codes. Responses to the Survey and discussions on TEI-LIST have made clear the dismay of the scholarly community with this wordiness. Minimisation will have to be carried much further, and software will have to be developed with a feature similar to the reveal codes/hide codes function on many word processors. This is not a minor problem but points to an underlying reality. If structural features are indicated by format, this indication suffices. Those features which require explicit coding will be more complex, more prone to error, more difficult to enter consistently, more difficult to verify and proofread. Scholars are not likely to undertake such onerous tasks whose results will be so fragile.
- It should be recalled that in the final analysis the success of the TEI standards will depend on their acceptance and use by the scholarly community.
- In general, the very wordy nature of the tags recalls an archaic period in computing, when the user was expected to specify everything to the machine. A more contemporary and user-friendly mode of tagging is expected by current users and must be sought, since few users can be expected to put up with such wordiness any more.
p. 28 (2.1.7) Entity Reference (string substitution). This is excellent. It must be stressed more, alluded to more, and shown frequently in examples.
p. 55 (4.1.4) Since most scholarly work in literature is based on a canonical text, in which pagination and lineation frequently vary with the PRINTING, not just the edition, it is essential to identify the date of printing and the print shop in the header material of a machine readable file of a text based on a printed edition. Reference back to the original, verification and proofreading are impossible without them.
p. 62. We suggest putting print shop and date of printing between the information on the publication and that of the distribution. This would also be the appropriate place to identify the location and shelf mark for manuscripts and incunabula.
p. 65 (4.5) The encoding declarations are of course the ideal place to put allusions to and/or explanations of the local coding conventions. Please stress this fact here. In fact, we recommend making it a condition of conformity to TEI standards that local coding for features not available on the key-board used (font changes, accented letters, etc.), be documented in a header record.
p. 71 (5.1), para 1. The definition of text, "an extended stretch of natural discourse, whether written or spoken", is not correct. Not all texts are extended. Spoken natural discourse is not text until transcribed in written form.
p. 71 (5.1), para 5. Again, the ability to point to a unique place in the text of the original printed document is essential to the needs of literature scholars. This must be stressed here and shown in the examples. The Survey is eloquent on this matter.
p. 77 (5.2.5) Colophon -- not a term everyone can be expected to know. Note that the Pleiade edition shows this as front matter. Given the practical importance of printing date and print shop information included here, we recommend that it be put at the beginning of the file, right after the publisher identification.
p. 77 (5.3.1) Given their importance for locating a quoted or identified passage, line breaks should be mentioned here and their importance stressed. The Survey made this abundantly clear.
p. 93 (5.6) A strong recommendation to code page breaks: EXCELLENT. Please put in as strong or a stronger recommendation to code line breaks, i.e. always put in unless there is a compelling reason not to do so, even in prose texts. To do otherwise would be to ignore the contribution of the scholars who participated in the Survey.
pp. 125-6 (5.11.2) Information about the layout of the edition input (i.e. page and line breaks), which permits reference back to the original text being studied, is crucial to the needs of most literature scholars. To state that the "line-break" tag "is intended only for cases where lineation of a prose text is considered of importance in its own right" (p. 126), suggests that such reference is rare, whereas it is THE NORM. It MUST NOT be downplayed in this fashion.
- Our judgement, confirmed by the Survey, is that most scholars use electronic text in a fashion that requires the ability to make unambiguous reference back to a precise place in the canonical printed text on which it is based. Thus lineation of a prose text is always considered important a priori, unless, as in cases like the Bible, a clear case can be made for coding in a different fashion. In short, the suggestion that lineation can somehow not be important in a text runs counter to the needs and practices of scholars of literature.
p. 177 (7.3.1.1) It is not necessary to specify the metre attribute in every line. That is the work of the analyst not the archivist or the scanner corrector.
p. 178 (7.3.1.2) Even for rhyme of type "aa" French prosody recognizes at least three types: rime suffisante (not necessarily the same as assonance), rime pauvre, and rime riche. Perhaps this should also be taken into account. BETTER, given the range of languages to which the Guidelines are to apply and the large number of prosodic systems in question, perhaps the Guidelines should not be so prescriptive. The Work Group expects to work on optional codes for such things, once more pressing requirements of literature scholars have been attended to.
p. 200 Putting tags, entities and redefinitions in a separate file for calling up by many texts is an excellent idea. Unfortunately the example is not at all clear, and makes this seem much more complex and confusing than it is or need be.
pp. 207-09. It is a trap for the unwary and an irritation to the experienced to show the suppression of typographical information (line breaks) in an extended example like this. The justification that the edition used wasn't very good - "the edition being used is of little editorial interest in itself" (208) - makes things worse; poor editions should not be converted to machine-readable form!
pp. 219-33 (A.6) We agree that in the case of the Bible the older and more authoritative method of identifying passages should prevail.
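
The requirement stressed throughout these comments, that any word of the electronic text be traceable to a page and line of the printed source, can be illustrated with a minimal sketch. It assumes a purely hypothetical local convention that is not taken from the Guidelines: each line of the file reproduces one printed line, and a line of the form [[page N]] opens printed page N. The function names and the sample passage are likewise invented for illustration.

```python
import re

# Hypothetical local convention (an assumption, not a TEI tag):
# a line consisting of "[[page N]]" opens printed page N; every other
# line of the file reproduces one printed line of that page.
PAGE_MARK = re.compile(r"^\[\[page (\d+)\]\]$")


def index_words(transcription):
    """Return a list of (word, page, line) tuples for a transcription."""
    page = None
    line_no = 0
    index = []
    for raw in transcription.splitlines():
        mark = PAGE_MARK.match(raw.strip())
        if mark:
            page = int(mark.group(1))
            line_no = 0  # line numbering restarts on each page
            continue
        line_no += 1
        for word in raw.split():
            # Strip surrounding punctuation so "harbour," matches "harbour".
            index.append((word.strip(".,;:!?\"'"), page, line_no))
    return index


def find(index, word):
    """List the (page, line) references at which a word occurs."""
    return [(p, ln) for w, p, ln in index if w.lower() == word.lower()]


# Invented sample passage, used only to exercise the sketch.
sample = """[[page 12]]
The rain fell on the harbour,
and the harbour lay silent.
[[page 13]]
No ship left the harbour."""

print(find(index_words(sample), "harbour"))
```

Because page and line breaks are kept in the file, a reference back to the printed source can be produced without any descriptive tagging of the text itself.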

1.2 Coding Levels

The Guidelines recommend three levels of coding:

Required in any TEI conformant document (e.g. Title, author, etc.)
Required for interchange, but a more succinct local code is recommended (e.g. accented letters, non-roman alphabetics).
Optional e.g. <Word in italics because of irony, unless the author really meant just to try to represent the intonation of the speaker> really </word in italics because of irony, unless the author really meant just to try to represent the intonation of the speaker>.

It is not always easy to tell which is which from the present version of the document. This distinction must be made clear.

We recommend a very small number of required codes: just what is necessary to identify fully the edition and printing used and to find a given passage in it in terms of pages and lines, divisions into chapters, acts and scenes, cantos, or books, etc., the character set used, and the representation used for features in the text but not in the character set (i.e. accented letters, font changes). All other codes must be optional. Examples of optional codes should be furnished. We repeat that the distinction between the two types must be made abundantly clear even to the uninformed, casual or negligent reader.

In our view, a possible method would be to separate out each type and group them as required, or optional. An alternate method would be to tag each heading with a parenthetical indication of which class each tag or tag type belongs to. The optimum method would be to do both.

Further comments on coding levels follow:

p. 1 (1.1.2) The Guidelines recommend the use of simpler and less wordy codes in a local environment, which codes are to be translated into full TEI coding for interchange. EXCELLENT!!! BRAVO!!! PLEASE DO MORE OF THIS! It should be made very clear that this is the RECOMMENDED approach. Examples of existing coding schemes upgradable to TEI level taken from existing archives should be given. Other examples (made up for the purpose) should be given. It must be made clear to the user that clean, clear and easy codes are to be the NORM for local use, and that the full TEI codes are for interchange and possibly archive purposes only.
p. 4 para 5. Interchange format does not allow any tag reduction. This is legitimate. But it MUST be made clearer that local minimization is encouraged, as long as automatic upgrading to full TEI codes is possible from the local code.
pp. 13-14 The examples are the perfect place to show a local code first, then the full TEI code.
pp. 45-52 (3.2) Character Sets. It MUST be made clear that this applies to interchange only. Local codes MUST be recommended and SHOWN which are easy to input and easy to use on a screen and printer of MAC, DOS and Mainframe machines (at least 2 sets of examples for each of the three). Preferably get some from existing databases and some from the various forms of 8859.
- The exclusion of such an important punctuation mark as the exclamation mark puts a needless coding burden on scholars. This exclusion should be removed. SGML should not take precedence over the needs of scholars. Similar arguments can be made in favour of the pound sign and square brackets.
p. 58 (para 4) Excluding the names of the person or persons who actually did the recording work reveals an inappropriate class and/or gender bias. Please delete this paragraph.
pp. 58-59 The examples provide an excellent opportunity to show both local codes and TEI codes.
p. 59 (4.3.2) para 5. The changes listed "corrections of mis-spellings of data, changes in the arrangements of the contents, changes in the output format", are not in fact minor. This paragraph contradicts p. 55 (4.1.6). Please clarify, or better still, choose.
pp. 82-83 (5.3.6, 5.3.7) It MUST be made clear that these very wordy and error-prone features are optional. Please try to cut down their length. It is essential to warn the potential user of their complexity and of the difficulty of coding them accurately in a text of any size. Their optional nature MUST be made more clear. In their present state they are counter productive, both because of their wordiness and because of the technical naivete which such wordiness embodies.
pp. 84-6 (5.3.8) List handling is excessively wordy and takes too much for granted. There must be an example of a simplified local code as well as the full TEI code here.
pp. 86-89 (5.3.11) Numbers: a perfect example here of a trap for the unwary. Only "may" on p. 87 shows that this extremely wordy coding is optional.
pp. 89-90 (5.4) This is a good idea but for a post-input markup. This fact must be made clear and encouraged. Mention that this is a relatively rare occurrence.
p. 93 (5.6.1) It is absolutely necessary to have an example here and to show both local and TEI formats.
p. 94 (5.6.1) It is absolutely necessary here to have an example and to show both local and TEI coding. It is very doubtful that any scholar or critic will ever use this kind of coding. Something more straightforward and user friendly is required.
p. 97 (5.6.4) Seems to suggest only fully explicit coding in milestones. You really need to show brief local codes here, PLUS their expansion into TEI codes.
p. 103 (5.8.1) Explicit tagging of sentences. This is overkill. This must be clearly indicated as optional and another part needs to be added suggesting how to set up a local code permitting automatic conversion to this level of coding.
pp. 110 ff (5.10.3) The examples from pp. 110 through 117 are prime candidates for examples of both local and full TEI codes. The Critical Edition example is particularly weak. The example is trivial. The only clear presentation is the uncoded one. The explicit and wordy recording of the lack of variants, and the use of "&zero.var" for omissions, are bizarre in the extreme and fly in the face of a millennium of scholarly practice. This attempt to reduce three parallel texts to a single linearly expressed notation is clearly defective. The text has been destroyed and converted into an unreadable list of real and potential variants.
- The prime function of any text is to be read. This conversion has destroyed the text as text. Reference must be made to experts in this domain and their advice must be followed. Here again we hope to work on this, once more fundamental questions have been resolved.
p. 170 (7.2.1) The encoding declarations are an EXCELLENT idea and to be encouraged, indeed made required. They also foster the definition of local standards which can be converted automatically into TEI format.
pp. 207-09 A perfect place for a two-step example the first part showing local code, the second showing TEI code.
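
The two-step approach recommended in these comments, a terse local code that a program upgrades automatically to the full interchange form, might be sketched as follows. The local markers (paired underscores for italics, a backslash and accent mark typed before a letter) are invented for this illustration. The expanded tag only imitates the verbose "highlighted rendition=italic" style quoted in this critique and should not be read as the Guidelines' exact syntax.

```python
import re

# Hypothetical local conventions (assumptions made for this sketch):
#   _word_    italic passage
#   \'e  \`a  acute / grave accent typed before the letter
# The expanded forms imitate the verbose style quoted in the critique;
# they are not copied from the Guidelines.
ACCENTS = {"'": "acute", "`": "grave"}


def expand(local_text):
    """Expand terse local codes into a verbose interchange form."""
    # Accented letters become entity references such as &eacute;
    text = re.sub(
        r"\\(['`])([A-Za-z])",
        lambda m: "&" + m.group(2) + ACCENTS[m.group(1)] + ";",
        local_text,
    )
    # Paired underscores become an explicit start-tag and end-tag.
    text = re.sub(
        r"_([^_]+)_",
        r"<highlighted rendition=italic>\1</highlighted>",
        text,
    )
    return text


print(expand(r"Le _caf\'e_ est ferm\'e."))
```

The person entering the text types only the short form; the long form exists solely in the file produced for interchange.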

1.3 Coding Types

Here are discussed the two types of coding: Presentational (capital letters, line breaks, italics, etc.) and Descriptive (proper noun, italics showing irony, stress or a foreign word, etc.).

Our perspective is that coding (inputting or converting text) is not the same as interpreting. Descriptive coding as presented in the Guidelines is squarely in the domain of interpretation. Most scholars do not want interpreted texts; they expect to do that job themselves. They made this abundantly clear in the Survey; we must not ignore them. When possible scholars hire assistants to input texts, and do not expect these assistants to do the interpretation. This whole aspect needs to be brought into conformity with scholarly practice, otherwise the TEI standards will not be respected.

To repeat: one-to-one conversion of typographical features is not controversial; it should be done as faithfully as possible. It must be a requirement in a TEI conformant text. Coding or interpretation in the sense of description of authorial "intention" or the choice among several alternatives on the basis of judgement is a different matter, which is designated descriptive coding. It can be allowed but never recommended. The Guidelines are quite unclear on this matter, and seem to make conflicting suggestions in different places.

Descriptive mark up can at the limit be made an option for those who feel they must do it. But it must be made clear that such tagging is OPTIONAL and NOT REQUIRED.

Comments on details follow:

p. 12 (2.1.2) Direct quotation, indirect quotation, indirect discourse, free indirect discourse, authorial comment, description or narration -- all of these aspects of a text can blend one into another. Which is which is open to interpretation and debate. It is ludicrous to tag them as if such distinctions could be made once and for all. Not only must the optional nature of such tagging be stressed, but potential users must be cautioned to exercise prudence in such coding, to define categories carefully, to test them by hand on small samples and shake them down on larger samples of electronic text, before undertaking the tagging of a full text.
p. 71 (5.1) Presentational mark up is allowed here, as well as descriptive. NO! Presentational mark up should be recommended, with descriptive at most recognized as possible if one wants to use it, but with warnings against it. The examples will have to be revised.
pp. 77-78, 88, etc. The concept of crystals (or the choice of term) is not made clear, the examples are difficult to follow. Revision seems in order.
pp. 78-9 (5.3.2) This section is presented primarily in terms of descriptive mark up, which is wrong. The presentational should be recommended, if only because it avoids the excessive wordiness of the descriptive approach. The wordiness of the so-called presentational mark up must be reduced, for example "highlighted rendition=italic" can be replaced with "ital" without any loss of information. In fact, the longer form is more descriptive than presentational. The earlier examples of handling of the underlying features of italics, require so subjective an interpretation that any scientific rigour in a text coded using them would be destroyed.
pp. 79-81 (5.3.3) Do NOT recommend tagging of underlying features, just the opposite. Stick with the <q> </q> for open and close quotes, suggest something else for block quotes, e.g. <bq>. Remind the user that she can use open and close quotes or guillemets (other things for embedded quotes) for a local code and have a conversion program take care of the rest.
- "Guillemets" by the way is used in the plural. There is no such thing as a single guillemet. What you show as such are greater than and less than signs. What is the use of 66U, etc. when character set tables are in the appendix?
- The recommendation to use "rendition = unmarked" (p. 80) with "q" is bizarre in the extreme. Many readers, and some of the better software, can be expected to identify an item as unmarked without the aid of a specific tag.
pp. 81-82 (5.3.4, 5.3.5) Perfect traps for the unwary. This is interpretation and dependent on time; it adds unnecessary work, confusion and possibility for error. Particularly true in the case of "croissant" (p. 81) and in the example on p. 82.
p. 83 (5.3.7) If anyone in our community sees the bibliographic tagging on 83, the TEI is a dead letter. The issue of how to handle names, abbreviations in names, etc. is important and not easy for programmers to deal with, but if this level of coding has to be done at the capture or transmission stage, we assure you, no one will use TEI. (Sorry, archivists and programmers might, but no one who is putting text into machine readable form in order to do anything critical or scholarly with it will ever do this kind of hiding of information in layers and layers of codes.)
p. 103 (5.8.1) Explicit tagging of sentences. This takes for granted that such can be known, which is not the case for numerous poets, and even novelists since the 1930's (cf. Celine, Simon, etc. in French). Here is an excellent example of why descriptive coding is wrong.
p. 105 (last para) It is most questionable whether one should EVER remove an interpretable feature from a text and replace it by an interpretation. Not only does this make impossible verification of the data (it has to be re-interpreted not proofread) but it also involves the coder usurping the role of the scholar who does the interpretation.
p. 123 (5.11) Here presentational mark up is described as exceptional and extraordinary, earlier it was presented as a valid alternative; consistent standards never hurt.
- More important, presentational mark up should BE the standard, with descriptive only an option which is allowed with cautions.
p. 123 (5.11) Use of "descriptive" in line one and of "presentation" in line 4 shows the problem presented by the SGML approach. If presentational markup had been used from the start as the sine qua non -- none of this would be a problem.
p. 124 (5.11.1) The example. What edition was used? What are the page and line boundaries? Or was this all made up too?
- This example is a perfect demonstration of the weakness of descriptive mark up: "Anglice" is not found in the standard Latin dictionary (Lewis and Short). What are we dealing with here? Are the italics quotes, emphasis or ironic? Let the coder code and leave the interpretation to the scholar.
p. 176 (7.3) First, according to certain schools of interpretation texts can and should be regarded in isolation, and it is not the place of the TEI to pass judgement on this question of literary theory.
Second, presentational mark up is essential because the Guidelines deal with coding a text, not its interpretation. The role of a given textual feature is ALWAYS open to interpretation, so the function of a good coding scheme is to facilitate interpretation, not pre-empt it.
p. 214 (bottom) The Hamlet example. The stage type describes only the first half of the stage direction; this is the problem with descriptive tagging.
Someone should try to reduce the wordiness of this tagging, particularly in the case of the speaker distinctions.

1.4 Other

This section contains comments that do not fit easily into the categories used above.

pp. 75-76 (5.2.4) Why use <div0> etc.? The names given to the sections by the author are the text. If the author chooses to use a number "I" or "2" surrounded by blank space that is what SGML should do. If it cannot code blank lines and blanks, then we are in rather serious trouble as literature scholars. We will be forced to describe, when presentation is what we want to do. This whole section is really designed for programmers, not for people in our area -- this type of material will only frighten users away from the Guidelines; it is virtually incomprehensible and in the long run not even true. There are alternatives other than the one listed, using the facts of the text, rather than any imposed divisions: large or small.
p. 76 (5.2.4) The distinction between legal and illegal forms is not clear. In any case the legalistic terminology is not appropriate.
p. 79 (line 7). The "second" sentence. TYPO. It is the only sentence in the example unless the TEI standards have subtleties which escaped the committee.
p. 88 (example 1) TYPO. </date> must go after "seventy-seven" if you care to be consistent with the date coded earlier as 1977-06-12.
p. 90 (example after <del>) "Dumb clucks": Belittling the reader in this fashion is not amusing; it is offensive. Remove it and find a real example from a real text.
p. 95 Assumes exactly what we do not want to assume: "text has been entered without preserving pagination". No need for artificial reference scheme; one already exists (the page numbers and carriage returns at the end of the lines).
p. 96 (4.6.2) What can it mean to mark as "absent" a piece of text that is not present? What exactly is there to be marked?
p. 105 (5.8.2) Soft hyphens EXIST in source texts. Please suggest more clearly how to handle them when they occur.
pp. 110 ff. (5.10.3). Find a real text for a real example here. The imaginary and "humorous" one trivialises what is being done.
p. 129 (6.1) para 2. Trying to define forms with no reference to content is a mug's game. The whole concept of structure shows that form determines content and content determines form, in varying degrees according to the context, example, and interpretative perspective, of course. In other words, you must create unanimity among the community of scholars BEFORE you can define the forms they can use. Not a practicable enterprise.
p. 130 (6.1) The principle for linguistics (welcome all theoretical positions, favour none) is EXCELLENT. We recommend the same thing for literature; this is the basic premise of most of the preceding comments.
pp. 140-44 (6.2.4) Incredibly wordy and unreadable coding for linguistic features. If the linguists consider this a good idea, more power to them. We recommend not getting into this for literature texts.
p. 169 (7.1) "verse, drama and narrative". Narrative used in the sense of prose. Not all prose is narrative (cf. Cook books, or the TEI Guidelines), not even all literary prose is narrative (some is descriptive). If you are going to try to dictate, or even make suggestions, to scholars in literature, you must get the technical language right, and "sermons guidebooks, recipe books, etc." (p. 176) are NOT narratives, formal or otherwise, in any accepted sense of the word.
p. 180 (7.3.2.1) Overkill if both speaker and speech tell that the speaker is Cordelia -- why not just say so once by recognizing the abbreviation of the speaker's name that is in the text to be the "tag" that it is. The real problem in dealing with speech in plays is that the speaker's tag needs to appear with each sentence (or all the words) of long speeches. Identifying "Cor" as <speaker> Cor. </speaker> does not contribute to solving this problem.
p. 180 (7.3.2.2) Excellent example of giving the simple tag, mentioning that some investigators may want to also encode this, that and the other, but not giving prescriptive examples.
p. 181 (7.3.2.4) French texts of plays also show the date and place of the first production as well as the names of the actors. You should provide for this.
p. 181 (7.3.3) Use PROSE not narrative, to include the essay and free form creations (cf. Butor's works).
p. 207 -- the original of this example is a printed document, not scanner output. Please begin by showing the original not an intermediary stage of processing.
p. 215 The idea of removing speaker tags, then identifying the speakers as speaker 1 and speaker 2, and then actually giving them names in the speech tag that follows, is to say the least messy. Either the speaker is tagged in the text or is not.
p. 215 Note "Mar.Marc" -- clearly a leftover fragment of a redundant tag.
p. 270 Alternate Base for DTD for drama. If this goes out to any public other than programmers, then the TEI standards will not be used. Give us one reason why anyone would want to.
- Place of insertion to be chosen: Concern was expressed in the Work Group about the integrity of electronic texts. Simply counting the size of a file in bytes does not guarantee that one can recognize modifications in it. Shareware exists which generates a unique number for a text; a number which will change if any modifications are made to it. Please look into the possibility of recommending such software, or better recommending that it, and the number generated by it be included with archived or shared texts.
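
The kind of integrity check described here can be sketched with a message digest. SHA-256 from Python's standard hashlib module is used below purely as a present-day stand-in; it is not the shareware the Work Group had in mind, and the function name and sample strings are illustrative. A digest is not strictly unique to a text, but an accidental or casual modification is overwhelmingly likely to change it.

```python
import hashlib


def text_fingerprint(data: bytes) -> str:
    """Return a hexadecimal SHA-256 digest of the bytes of a text file."""
    return hashlib.sha256(data).hexdigest()


# Invented sample data: two versions differing in a single character.
original = b"The text as archived.\n"
altered = b"The text as archivec.\n"

# Both versions have exactly the same size in bytes,
# so a byte count alone cannot tell them apart.
print(len(original) == len(altered))

# Their fingerprints differ, so the modification is detectable.
print(text_fingerprint(original) == text_fingerprint(altered))
```

Recording such a fingerprint alongside an archived or shared text would let a recipient confirm that the file received is the file that was deposited.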