Editors' List of Potential Work Items

C. M. Sperberg-McQueen

Lou Burnard

5 June 1996

[Draft; for the information of the TRC]
[This document is a working paper prepared by the editors of the Text Encoding Initiative for the information of the TEI Technical Review Committee, and for its use in choosing new work items and chartering work groups. It represents the current views of the authors, but it certainly should not be taken as representing official policy of the TEI. Omission of an item from this list does not imply the authors are opposed to its adoption as a work item.]

1 Types of Work
2 Continuing Work Items
3 New Tag Sets
4 Extension and Revision of P3 Materials
5 Harmonization with Other DTDs
6 Theoretical Questions and Informational Working Papers
7 Liaison with Other Bodies
8 Evaluation

This document lists a variety of technical work which, in our opinion, should be addressed by the Text Encoding Initiative during the ongoing maintenance and development of the TEI. The items listed are, for the most part, derived from notes made over the last few years, from suggestions made by users of the TEI and participants in TEI work groups, and from our own experience in applying the Guidelines in a variety of areas. They have been grouped fairly loosely into related topic areas, but with no claim that the topics mentioned are the only ones in which work is needed, nor that the items listed represent an exhaustive summary in any given area. The level of detail associated with each item varies widely, but this should not be interpreted as indicating anything much about the importance or urgency of carrying out the associated work.

The purpose of this document is to suggest the kinds of work which need to be done, and to propose some specific items which, in our view, need to be addressed. It is not intended to provide a detailed blueprint for the next phase of TEI development. It is, however, hoped that such a blueprint will emerge out of the discussion which this document is intended to provoke, and we have therefore tried to be relatively concrete in our suggestions.

1 Types of Work

We have found it helpful to distinguish the following kinds of activities in describing the work items listed below:

correction
completion
extension
harmonization
alignment
application

By correction, we mean the emendation of parts of the existing scheme which are agreed to be faulty, but for which no obvious emendation has been identified by the editors. (See documents TEI ED W57 on error correction policy, and TEI ED W67 for the current list of `known errors').

By completion, we mean the expansion or conclusion of work only adumbrated or inadequately addressed in TEI P3. Some parts of the existing scheme, whether or not they are regarded as faulty, may be seen as needing either completion or correction: the two terms overlap somewhat.

By extension, we mean the definition of a new base or additional tag set for elements not already addressed by the Guidelines, the addition of new element definitions to an existing tag set, or the further elaboration of the analysis underlying the design of a tag set.

By harmonization, we mean systematic clarification of the similarities and differences between the TEI DTD and some other DTD in sufficient detail to guide automatic and hand-assisted translation between them, and where appropriate the modification of one or both so as to make possible automatic cross-translation without information loss. The purpose of harmonization is to aid migration between the TEI and other schemes, and to make clear their relative strengths and weaknesses.

By alignment of the TEI scheme with some other standard or encoding scheme (not typically a DTD), we mean clarification of the ways in which the two standards may be used together (including specification of any limits imposed by using them together), description of their mutual relation, and where appropriate the modification of one or the other to achieve a more useful relation between them. For example, an alignment of TEI P3 with ISO 10646 (the 32-bit Universal Character Set standard) would explain how users of TEI P3 must declare their use of that character set to their SGML-conformant and TEI-aware software, and explain how the TEI writing system declaration and the character/glyph model underlying ISO 10646 relate to each other. It might also suggest revising the Writing System Declaration to make it more obviously compatible with ISO 10646.

By application of the TEI scheme, we mean (in this context) identification and presentation of good practice in applying the TEI scheme to some well-defined subject area, recommendations concerning how general mechanisms defined by the TEI may most profitably be applied in given problem areas, and the like. The result of such work might take the form of a standard TEI <encodingDesc> describing the encoding and editorial practices recommended for the area.

Some work items involve more than one of these activities, which is why we have not used them as the basis for classification below.

We have not yet attempted to prioritize work areas. The topics into which we have grouped work items are intended for convenience and should not be given excessive weight.

2 Continuing Work Items

This section lists items which at some stage were proposed for inclusion in the Guidelines, but which were not completed in time. Some of them may no longer be necessary; others could be added with some extension; yet others require definition of entirely new tag sets.

a tag set for letters and memoranda, specifically historical letters
a chapter explaining in more detail how and why to combine base tag sets
a chapter explaining in more detail how to construct a user-defined base tag set
a chapter on local installation and support of TEI-aware and SGML-conformant software
a chapter containing an extended example of a feature system declaration
a chapter discussing the relationship of the TEI encoding scheme to other standards, including
- ISO 12083 (the current revised version of the DTD first developed for the Electronic Manuscript Project of the Association of American Publishers, previously ANSI Z39.59)
- ISO 10646
- various standards on bibliographic citation
- ISBD (the International Standard Bibliographic Description)
- AACR2 (the Anglo-American Cataloguing Rules, 2d edition)
- standards for construction of thesauri and term lists
a list of useful entity names
Writing System Declarations for all public entity sets in
- ISO 8879 (Information processing - Text and office systems - Standard Generalized Markup Language (SGML))
- ISO TR 9573 (Information processing - SGML support facilities - Techniques for using SGML)
- all parts of ISO 8859 (Information processing - 8-bit single-byte coded graphic character sets)

3 New Tag Sets

Many possibilities exist for new tag sets. Those listed here are simply ones which have been proposed to us, or which we have ourselves identified as desirable.

basic mathematics based on semantics rather than typography
legal documents (legislation, judicial opinions, briefs, etc.), both ancient and modern
management of office documents (along the lines of the chapter on this topic in TEI P1)
technical documentation, including documentation of SGML markup schemes (as sketched out in TEI U5 and other TEI-compatible tag sets)
encoding of layout and other physical information. (This may imply revision of existing tag sets for manuscripts, analytical bibliography, and text criticism as well as definition of new elements)
analytic bibliography
codicology
description of museum and art-historical data, in collaboration with CIMI (Computer Interchange of Museum Information [?])
detailed description of manuscript cataloguing data
encyclopaedias and reference works
newspapers and newspaper corpora

4 Extension and Revision of P3 Materials

We list here a number of areas in which we are conscious that the currently published text of P3 offers opportunities for significant improvement, whether because the published text is simply inadequate or because the preparatory work prior to publication of P3 showed that there were substantial interesting topics still to address. In some cases, we have specified more exactly what we think needs to be added; in others, we are not sure what needs to be added, but we know that we would like a revised text more than the current one.

Conformance issues: the current chapter needs to explain more concretely what is meant by TEI conformance and how it can be determined, and should also discuss the implications of conformance for DTD documentation.
The rules for blind and other interchange require more examples and more detailed discussion
The chapter on DTD modification needs at least one fully worked-out example and a closer discussion of the implications of TEI modification for conformance. Ways of more tightly constraining the TEI model in particular applications should be discussed and exemplified.
The discussion of the <rendition> element in the header needs revision in the light of the published DSSSL standard. (The Document Style Semantics and Specification Language is a new ISO standard for document rendering and processing in general.)
The chapter on formulas and tables needs to consider scientific formulae and to discuss ways of harmonizing the TEI table model with others (notably CALS, SoftQuad, and ISO 12083).
TEI keywords should be introduced systematically for all semi-closed attribute value lists, and for all attributes (e.g. type) whose values a project might want to close.
The optionality status (i.e. required, mandatory when applicable, recommended, or optional) of all elements and attributes needs systematic review.
The applicability of the dictionary tag set to older dictionaries, dictionaries of dead languages (esp. historical dictionaries), and dictionaries recording field-linguistic work should be verified and possibly extended
Examples of applications of feature structures in literary and other analyses
Additional work items relating to text criticism:
- generation of examples from existing critical editions
- utility of the existing tag set for extremely dense traditions and extremely selective apparatus (e.g. Biblical materials), including the question of back-translation
- systematic review of R. Cover's voluminous comments and suggestions
- clarification of the relation between text-critical apparatus elements and the simple mirror elements for editorial intervention in the core
- harmonization of the TEI tag set with existing systems for recording apparatus, including:
  - CATSS (Computer-Assisted Tools for Septuagint Studies) vertical format
  - EDMAC format (John Lavagnino and Dominik Wujastyk)
- alignment of the TEI tag set with existing collation systems, including:
  - Collate (Peter Robinson)
  - UNITE (Francisco Marcos-Marin?)
  - CASE (Peter Shillingsburg)
  - Tustep (Wilhelm Ott)
Further work on the TEI Header, specifically:
- expansion to cover archival descriptions and harmonization with, e.g., the Encoded Archival Description (EAD)
- definition of recommended practice for application in electronic text libraries
- generalization of the header to handle metadata for arbitrary electronic and non-electronic resources
- extension of the header to handle geo-spatial metadata (specification of the geographic and temporal `footprint' of the data)
- work on harmonization with MARC, Z39.50, and IAFA templates
- further discussion and exemplification of the decls mechanism for complex documents and corpora
- discussion of implementation issues, management of Independent Header Sets (IHS) databases, etc.
Further work on alignment and linking, specifically on the encoding of multimedia documents integrating text and images
Tagging for scholarly editions, both historical / documentary editions and literary / critical editions

5 Harmonization with Other DTDs

This section lists a number of areas in which harmonization activities seem to us particularly crucial.

introduction of ICADD architectural form attributes into TEI DTDs
linkage of TEI elements with HyTime architectural forms (by attributes or by free-standing link-process definition)
working paper clarifying relationship of TEI extended pointer mechanism, HyTime pointers, and DSSSL queries
harmonization of TEI main DTD with IBM IDDOC architecture
harmonization of TEI main DTD with Davenport Group's DocBook DTD
alignment of the TEI writing system declaration with other character representation and documentation methods, notably ISO 10646
harmonization of the TEI tag set for terminological data with the DTD of ISO 12200 (machine-readable terminology interchange format, MARTIF)

6 Theoretical Questions and Informational Working Papers

Some of the topics listed here may result in the definition of new tag sets or modifications to existing ones. Most of them, however, are concerned with specifying good practice and discussing the implications of TEI practice.

clarification of issues relative to architectural forms:
- should the main TEI DTD be recast just as a set of architectural forms? should all TEI DTDs?
- what exactly does (or should) the teiForm attribute mean?
working paper on the expression of constraints on documents beyond those expressible in a DTD: is a separate constraint specification language desirable? feasible? Can DSSSL and HyTime be used?
working paper (for information of TEI users) on SGML query languages and data manipulation languages or transformers
formal specification (in Z, the Vienna Development Method [VDM], or another formalism) of SGML document structure, for use in later specifications of SGML-aware software
DSSSL tutorial for TEI participants and users
description, for information of software implementors, of implications of TEI markup for useful software behavior:
- proximity searching (e.g. across app elements)
- word indexing (when does a TEI tag imply a word boundary? when must a word indexer ignore the TEI tag? when might one reasonably index more than one form of a word?)
- searching through annotations (how are searchable features represented in fs, note, interp and similar elements?)
- implications of the join element for structural searching
- TEI methods of text synchronization and alignment: their applications in multilingual corpora, mixed media publications, etc.

7 Liaison with Other Bodies

This section lists a number of ISO and other work groups with whom closer liaison should be sought by the TEI, formally or informally, to ensure interoperability and reduce duplication of effort:

ISO/IEC JTC 1/SC 18 (SGML, DSSSL, ...)
ISO/IEC JTC 1/SC 2 (character sets)
ISO TC 46 (libraries, information systems, publishing)
ISO TC 37 (terminological data) (?)
IETF (Internet Engineering Task Force) work group for HTML revision
SGML Open
decide whether to seek adoption of the TEI Guidelines as a Federal Information Processing Standard (FIPS)

8 Evaluation

A systematic evaluation of the TEI Guidelines should be undertaken. This would involve:

planning evaluation
evaluation by outside readers from user communities
evaluation by outside SGML experts
testing by systematic random sampling from populations of documents which should be encodable by the TEI Guidelines
