The TEI Guidelines (Version 1.1 10/90): A Critique


TEI Literary Studies Work Group (AI3)

18 October 1991



1 Background

This critique of version P1.1 of the TEI Guidelines was drafted by the five members of the Literature Texts Work Group. They work with texts in four natural languages and in several literary genres and periods, from the Middle Ages to the present. Among them they have recorded several million words of text, directed the development of several software systems, and published several dozen articles and half a dozen books based on computer analyses of texts; the methodology of these publications ranges from traditional literary history to advanced statistical analysis.

Much of the following critique is based on the Survey of the needs of scholars in literature carried out by the Work Group, to which some forty active scholars from a range of disciplines responded. A copy of the results of this Survey is available from the TEI Project. A preliminary version of this critique was circulated to the Editors of the project, and Michael Sperberg-McQueen's responses to it have been extremely helpful in arriving at this final version.

1.1 Perspective

The Work Group is impressed by the finished character of the current version of the Guidelines document, and the almost total absence of typographic errors. As people who work with and generate texts on a daily basis, we recognize the amount of effort which such an achievement represents. We wish to begin by expressing high praise for the current Guidelines as the result of concentrated and efficacious work on a difficult problem. Michael and Lou should be particularly singled out for this praise.

The comments which follow are offered in a spirit of friendly collaboration, in the hope that they will make an impressive document even better and will bring it more closely into conformity with the needs and perspectives of scholars working with literature.

The Work Group understands that the TEI is proposing a coding system for interchange, not for entry of texts. We realize also that many things are suggested as options, not as requirements. It must also be recognized, however, that simple considerations of efficiency (it is practical to keep a local code as close as possible to the interchange code) will tend to encourage the use of TEI codes at the local level. ASCII, after all, was originally proposed as an interchange code; it is now the standard for alphanumeric representation.

The very polished and comprehensive nature of the present Guidelines also means that there will be a tendency for them to become standards, both for interchange and for local processing, and even for data entry; this possibility must be faced and taken into account as they are drafted. By a similar process, in the absence of a clear distinction between the optional and the required, optional codes will tend to be treated as recommended or required, in spite of occasional or implicit indications to the contrary.

Three of the Poughkeepsie principles bear on this matter.

1.1.1 The Poughkeepsie Principles

It is our opinion that these three principles are of particular importance to scholars in literature, and that they are not sufficiently reflected in the current version of the Guidelines. Our reasons for this opinion will become clear in the rest of this report.

1.1.2 The Perspective of the Literature Scholar

Like most practitioners of an intellectual discipline, literature scholars are accustomed to working from a methodological perspective. The Guidelines would profit greatly from a theoretical introduction making clear what is meant by such terms as "text", "tag", "hierarchy", etc. The fragments of discussion of this topic found here and there in the Guidelines (e.g. p. 71) are not adequate for this purpose. We realize that generating such definitions will not be easy, given that in a printed text titles, footnotes, and variants are clearly tags to the text, whereas in a TEI text they are treated as text. How the nature of text and tag changes as a result of a change in medium is not at all clear.

Similarly, we in literature recognize in a single text a plethora of structures: physical (page and line breaks), formal (parts, chapters, paragraphs), grammatical, semantic, actantial, narrative, psychological, and so on. Each can be deemed hierarchical from certain perspectives. Do the Guidelines permit all of these structures to be defined as hierarchies? Do they require such definition for their manipulation? Do they allow the structures to be handled simultaneously, so that their interrelations can be examined? The suggestions for treating parallel texts in section 5.10.12 (pp. 122-3) and elsewhere are not very clear on these matters.

Literary texts usually aim at richness of expression and multiplicity of levels of possible meaning. Can SGML-based Guidelines integrate this basic characteristic of literature, or do they attempt to abolish it?

We realize that these are vexed questions, recalcitrant to simple answers, particularly when one accepts, as we do with high praise, the principle enunciated by the linguists (p. 130) that the Guidelines must welcome all theoretical positions but give none of them pride of place. On the other hand, we consider it crucial for the acceptance of the Guidelines by our constituency that a thoughtful discussion of these matters be found at the beginning of the Guidelines document. For instance, in the absence of such a discussion, the treatment of highlighting on pp. 78 and 124 would seem to rest on the premise that authorial intention is discernible from the text; that premise ceased to be intellectually respectable in our field about fifty years ago.

The pragmatics of work on literature texts is also a source of concern in a number of areas.

The scholar in literature typically works with large amounts of data, since computer processing is used mainly when it is not practical to commit a text to memory.

These scholars are concerned mainly with inputting texts as rapidly and as cheaply as possible, verifying them as effectively and cheaply as possible, and getting on as quickly as possible with the analytic work which was their reason for working with the machine.

Except when they are generating a canonical text, literature scholars work with a specific edition of a text which is considered canonical in the sense that it is the one cited and quoted in serious professional work. Depending on the situation, this edition may be a critical edition, a prestigious edition, or a trade edition. Scholars will want to refer easily to pages and lines in this text. It is also a requirement that the electronic version of this edition be stable and not subject to change other than to correct errors. This perspective is made perfectly clear in the responses to the Survey and in the practices of the great repositories of machine-readable texts, such as the Trésor de la langue française.

Literature scholars are not interested in (indeed, many object vehemently to) the prospect of obtaining texts which already contain, explicitly or implicitly, literary interpretations. The responses and comments elicited by the Survey bear eloquent witness to this.

For these reasons we recommend that the Guidelines clearly distinguish between a minimal set of required tags and a wide range of optional tags to be used at the discretion of the text preparer.

The present version of the Guidelines is not in harmony with our perspective. Some examples follow.

1.2 Coding Levels

The Guidelines recommend three levels of coding:

  1. Required in any TEI-conformant document (e.g. title, author, etc.).
  2. Required for interchange, but a more succinct local code is recommended (e.g. accented letters, non-roman alphabetics).
  3. Optional (e.g. <Word in italics because of irony, unless the author really meant just to try to represent the intonation of the speaker> really </word in italics because of irony, unless the author really meant just to try to represent the intonation of the speaker>).

It is not always easy to tell from the present version of the document which level a given code belongs to. This distinction must be made clear.

We recommend a very small number of required codes: just what is necessary to identify fully the edition and printing used; to locate a given passage in it by page and line and by its divisions into chapters, acts and scenes, cantos, books, etc.; to identify the character set used; and to record the representation used for features present in the text but not in the character set (e.g. accented letters, font changes). All other codes must be optional. Examples of optional codes should be furnished. We repeat that the distinction between the two types must be made abundantly clear even to the uninformed, casual or negligent reader.

In our view, one possible method would be to separate the codes by type and group them as required or optional. Another would be to mark each heading with a parenthetical indication of the class to which each tag or tag type belongs. The best method would be to do both.

Further comments on coding levels follow:

1.3 Coding Types

This section discusses the two types of coding: presentational (capital letters, line breaks, italics, etc.) and descriptive (proper nouns; italics showing irony, stress, or a foreign word; etc.).

Our perspective is that coding (inputting or converting text) is not the same as interpreting. Descriptive coding as presented in the Guidelines is squarely in the domain of interpretation. Most scholars do not want interpreted texts; they expect to do that job themselves. They made this abundantly clear in the Survey, and we must not ignore them. When possible, scholars hire assistants to input texts, and do not expect these assistants to do the interpretation. This whole aspect needs to be brought into conformity with scholarly practice; otherwise the TEI standards will not be respected.

To repeat: one-to-one conversion of typographical features is not controversial and should be done as faithfully as possible. It must be a requirement in a TEI-conformant text. Coding in the sense of interpretation, that is, describing authorial "intention" or choosing among several alternatives on the basis of judgement, is a different matter; this is what is designated descriptive coding. It can be allowed but should never be recommended. The Guidelines are quite unclear on this matter, and seem to make conflicting suggestions in different places.

Descriptive markup can, if need be, be made an option for those who feel they must use it. But it must be made clear that such tagging is OPTIONAL and NOT REQUIRED.

Comments on details follow:

1.4 Other

This section contains comments that do not fit easily into the categories used above.