Electronic Textual Editing: When not to use TEI [John Lavagnino]

The Text Encoding Initiative Guidelines describe an approach to transcribing texts for digital representation; if the principles underlying that approach seem appropriate for your scholarly project, then the next question is whether your texts are really the right kind of texts. Earlier chapters of this book discuss a number of common genres that are specifically covered, but this is not an exhaustive list; the TEI did not intend to rule out any of the many sorts of text that interest scholars. It is true that there are no chapters in this book or in the Guidelines on encoding cookbooks, newspapers, guidebooks, quotation dictionaries, instruction manuals, commonplace books, or mail-order catalogues, but it is not because they have been considered and found wanting; there just hasn't been space or occasion to discuss them specifically. For the most part these particular genres can be readily handled using the TEI's provisions for encoding prose texts, with the addition of some other elements for their distinctive features. If you are a scholarly editor of texts then the TEI is applicable to your texts.

But scholarly editions involve the creation of new writing as well as work on existing texts: editions usually include introductions and commentary in some form, and may extend to such things as analytical essays, catalogues of sources or witnesses, and bibliographies. The TEI Guidelines apply just as well to creating new texts as to transcribing old ones, but other approaches may be a better choice for some highly-structured collections of information: for example, the Encoded Archival Description (EAD) DTD is designed for archival finding aids, of the sort that an edition might create in the course of its work, and a finding aid in this form would be more useful to a library than a TEI-encoded description. Similarly, the MASTER DTD is specifically adapted for encoding descriptions of manuscripts; it is also based on the TEI DTD, and indeed is now under consideration for incorporation into a future version of the TEI Guidelines. A project that involves the creation of new software might choose to use the DocBook DTD for that software's documentation, as it is a DTD designed for that purpose and there are existing tools for using such information in ways that users of software need (Walsh and Muellner). In cases of this sort, there's good reason to adopt a practice that was developed for a specific kind of writing or scholarship and that produces encoded information that is as well-documented and robust as TEI data; following such standard practices is likely to increase the utility of these specialized kinds of information.

If you've chosen the TEI for your project, that actually isn't the end of choice; there are further questions about exactly how you use it. In normal use of TEI it remains important to decide which components of the whole thing to use: choosing TEI doesn't mean choosing to include every possible element in your documents. The existence of the <date> element does not imply the obligation or recommendation that every date in a text be tagged as such; some TEI elements are required in certain contexts, but a great many of them are described as optional, and it is intended that their use be left to the scholar's judgment. Extra markup is costly, and it is essential that a project decide just which features need to be marked in order to serve its scholarly ends. It is tempting to add markup for which no specific use is intended, but which might be of possible interest to someone in the future; but only the especially well-funded project can afford this. And apart from the expense, it is worth considering just how useful the encoded information will be to other scholars who may see the phenomenon in question differently and would want to develop their own encoding. Personal names, for example, may seem at first a straightforward category that requires little extra time to tag; but scholars who have worked on encoding personal names have found them to be hard to define and delimit (see Butler et al., Flanders et al., Neuman, McCarty, and Mikheev et al.; for more on the general point, see Matthew Driscoll, “Levels of Transcription,” earlier in this book).

Given a decision on features to be encoded, you will also want to choose among the many ways that the TEI DTD allows you to encode them: textual errors and corrections may be encoded using <sic>, <corr>, or <app>, for example. The work of encoding is simpler if such things don't need to be reconsidered every time the feature comes up. Some scholarly communities have developed their own guidelines for using the TEI Guidelines, in which they specify a preferred way for handling things they often see or that are distinctive to their materials; if there is such a group in your area of work it's a good idea to consider following their lead. (See for instance Anne Mahoney's chapter on inscriptions in this book.)
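One way to keep such a decision from being reopened at every occurrence is to check files against it mechanically. What follows is only a minimal sketch in Python, not a procedure the Guidelines prescribe: the sample sentence, the function names, and the idea of a single "preferred" element are all assumptions made for illustration, and the sample uses the attribute form of <sic> and <corr> from the P4-era Guidelines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: the same kind of feature encoded in two different ways.
SAMPLE = """<p>I shall <sic corr="receive">recieve</sic> it on
<corr sic="teh">the</corr> first, and <sic corr="their">thier</sic> reply later.</p>"""

# The elements named above as alternative ways to record errors and corrections.
ALTERNATIVES = ("sic", "corr", "app")


def count_alternatives(xml_text):
    """Count how often each alternative element occurs in a fragment."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in ALTERNATIVES)


def off_policy(xml_text, preferred):
    """Return counts for the alternatives that are NOT the project's choice."""
    counts = count_alternatives(xml_text)
    return {tag: n for tag, n in counts.items() if tag != preferred}


counts = count_alternatives(SAMPLE)
strays = off_policy(SAMPLE, preferred="sic")  # "sic" is an assumed project policy
print(dict(counts))
print(strays)
```

A report of this kind only shows where practice has drifted from the project's stated choice; deciding which encoding is right for the material remains an editorial matter.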

The choice of elements needs to be based on their definitions in the TEI Guidelines and not just on their names or brief descriptions: so the <l> element is for a line of verse (which might be displayed on several typographical lines) and not for a single typographical line on a page; <add> and <supplied> sound very similar but are for different things (the first for additions present in the documents you're working with, the second for additions by editors and encoders). Getting such things wrong amounts to misdescribing the text. And if there isn't already a tag that can be used for what you need to describe, don't force an existing one into the role. Most projects will run into textual features that matter in their work but that aren't covered by the existing Guidelines: P4 section 1.3.2, “Future Developments,” talks about some specific areas that are known to be incomplete, but generally there's plenty of territory that hasn't been covered. Extending the DTD to handle such features is the orthodox proceeding, and you should expect to have to do it. When a feature appears with some frequency, working out the details of an extension is not generally too awful, as considering the array of examples helps to reveal the general structure of the feature; the odd feature (such as a complex diagram) that only appears once or twice is harder to deal with, as generalization is harder. Depending on the needs of the project, it may be desirable to represent such unusual things using images rather than relying solely on transcription and tagging.
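The distinction between a verse line and a typographical line can be made concrete with a small example. The sketch below, in Python, assumes an invented two-line stanza in which the first verse line is turned over onto a second typographical line; the break is recorded with the TEI <lb/> element inside a single <l>, rather than by splitting the verse line into two <l> elements. The sample text and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stanza: the first verse line runs over onto a second
# typographical line in the source, marked here with <lb/>.
STANZA = """<lg type="stanza">
  <l>A long verse line that the printer had to turn over<lb/>
     onto a second typographical line</l>
  <l>A short verse line</l>
</lg>"""

root = ET.fromstring(STANZA)

# <l> counts lines of verse, whatever their layout on the page.
verse_lines = root.findall("l")

# <lb/> records where the page started a new typographical line mid-verse.
line_breaks = root.findall(".//lb")

# Each verse line occupies one typographical line plus one per internal break.
typographical_lines = len(verse_lines) + len(line_breaks)

print(len(verse_lines), len(line_breaks), typographical_lines)
```

Encoding the turned-over portion as a separate <l> would make the two counts agree, but only by claiming that the poem has three lines of verse, which is the kind of misdescription at issue here.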

All of these considerations have had to do with the form to be taken by the ultimate products of a project, the final encoded files. But there are reasons for not using TEI at particular points during the lifetime of a project, even if a TEI product is still the result.

The appropriate scholarly tools for the early exploratory stages of a project may be pen and paper, or chalk and a large blackboard, or a word processor; some will find that the precision and formality required for TEI-encoded texts is not helpful at a stage when you may be entertaining many conflicting ideas about what sort of information will be in your edition and how it will be structured. Some may also find it most productive to start by thinking about ways in which the edition will be presented to its readers, and not in terms of the information structures needed to achieve that. Experience has shown that electronic texts that are closely tied to one mode of presentation tend to be short-lived; but thinking about an actual presentation to readers is still an effective way of working out what an edition is going to do, and in a later stage the design may be adapted for TEI encoding. The TEI Guidelines certainly aren't about particular texts or the scholarly goals behind publishing them, but working out our thinking on those points is one essential stage in making an edition.

Scholars new to XML may also find that devising a tagging system from scratch for their texts is instructive: you will understand a standard DTD in a different way after going through the intellectual labor of trying to create one. These explorations of an edition's form, or of its encoding system, can't go on for very long, since any text that is generated will typically be hard to convert to TEI form: they'll need to be seen as trials that can be thrown away.

During the main phase of work on a project, there may be reasons to consider creating texts in a form somewhat different from the final form they're destined to take. Some projects have found that the very generic names of some TEI elements are a minor problem: though the existing elements are appropriate for the purpose, it may still be easier to tell staff members to enter a <stanza> element rather than an <lg type="stanza"> element. This is especially likely if the generic range of the texts in question is circumscribed, so that one encounters a restricted set of features. Devising a customized DTD for document creation (possibly just using the standard TEI customization mechanism) and converting the documents to a more standard markup at a later stage is a reasonable approach, if the conversion is one that is readily automated. (At one time a version of this approach was often used in which no TEI markup was used directly, but instead very specific ad hoc markup tailored to particular texts was invented: a system in which %, ∗, #, and other nonalphabetic characters had special meaning and were later expanded into proper markup. But circumstances have changed enough that this isn't often a good choice; such ad hoc systems always run into problems expressing any but the simplest structures, and plenty of XML editors are now available that face no such limitation.)
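A conversion of this kind is readily automated when the local element maps directly onto a standard one. The following is a minimal sketch in Python under stated assumptions: the mapping table, the function name, and the sample input are hypothetical, and a real project might equally use XSLT or another tool for the same step.

```python
import xml.etree.ElementTree as ET

# Hypothetical project-local shorthand mapped to (standard element, attributes).
SHORTHAND = {"stanza": ("lg", {"type": "stanza"})}


def to_standard(xml_text):
    """Rename local shorthand elements to their standard equivalents."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag in SHORTHAND:
            tag, attrs = SHORTHAND[el.tag]
            el.tag = tag  # renaming in place keeps children and text intact
            for name, value in attrs.items():
                el.set(name, value)
    return ET.tostring(root, encoding="unicode")


# Invented input as a staff member might key it with the customized DTD.
LOCAL = "<div><stanza><l>First line</l><l>Second line</l></stanza></div>"
converted = to_standard(LOCAL)
print(converted)
```

The approach stays safe only while each shorthand element corresponds to exactly one standard encoding; once a local element needs context to be interpreted, the conversion is no longer the readily automated kind described above.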

The project well along in its lifetime that was never TEI-based faces a difficult choice. Certainly there may be advantages to the switch, both intellectual and political, if the TEI approach is appropriate; but switching is always time-consuming and costly, and will typically require or cause some changes in thinking about the project's editorial approach. Projects that get completed are more valuable to the scholarly community than unfinished projects with more perfect methodologies.

This chapter has assumed so far the appropriateness of the TEI approach for your editorial project; but it is possible that the approach does not fit, and in that case it should not be used. Two requirements of this approach can be especially problematic: first, you need to understand your texts; and second, you need to believe in the integrity and utility of selective transcriptions.

You need to understand your texts in order to translate them from stone or paper versions to digital versions. It is no doubt evident that transcribing anything written in a script you don't understand is hard to do well; but unfamiliar conventions of layout raise problems as well. (See, for example, Cloud on headings in George Herbert's Temple, and Hirst on little-known conventions of layout and markup that Mark Twain used.) There is less room in a digital edition for evading interpretive questions by printing something with an ambiguous appearance; in order to make the edition work as intended it is generally necessary to interpret features and not merely reproduce their appearance.

In order to use the TEI approach you need also to believe in transcription. It is impossible for a transcription to reproduce the original object; it is always a selection of features from that object: the words but not their size on the page or the depth of the chisel marks, major changes in type style but not variations in the ink's darkness from page to page or over time. Any such features that do seem essential for a particular transcription can be encoded; what's impossible is notating every observable feature. And it may be that the creation of a digital description of such features has little value for analysis: what you really want may just be the opportunity to see an image of the original (assuming that the different selection of features involved in imaging is more acceptable). There are two common cases in which a transcription might be regarded more as an index of words in page images than as a reasonable working representation of the text: works intended as mixtures of words and images, and very complex draft manuscripts in which the sequence of text or inscription is difficult to make out.

These two considerations about the appropriateness of the TEI approach apply to most systems of electronic transcription that an edition might consider: as scholarly editors we need to make specific claims about what the text is and communicate them clearly to others, and we are engaged in analyzing texts and creating new representations of them, not in creating indistinguishable replicas. But it is still a real question whether that is the right thing to do for any given project; it's essential to recognize that an editorial project must take a particular view of the texts in question and choose particular scholarly goals, and those decisions determine whether an edition based on transcription can be made.

Last recorded change to this page: 2007-10-31