Editors' List of
Potential Work Items
C. M. Sperberg-McQueen
Lou Burnard
5 June 1996
[Draft; for the information of the TRC]
[This document is a working paper prepared by the editors of the Text
Encoding Initiative for the information of the TEI Technical Review
Committee, and for its use in choosing new work items and chartering
work groups. It represents the current views of the authors, but it
certainly should not be taken as representing official policy of the
TEI. Omission of an item from this list does not imply the authors are
opposed to its adoption as a work item.]
This document lists a variety of technical work which, in our
opinion, should be addressed by the Text Encoding Initiative during
the ongoing maintenance and development
of the TEI. The items listed are, for the
most part, derived from notes made over the last few years, from
suggestions made by users of the TEI and participants in
TEI work groups, and from our own experience in applying the
Guidelines in a variety of areas. They have been grouped fairly
loosely into related topic areas, but with no claim that the topics
mentioned are the only ones in which work is needed, nor that the
items listed represent an exhaustive summary in any given area. The
level of detail associated with each item varies widely, but this
should not be interpreted as indicating anything much about the
importance or urgency of carrying out the associated work.
The purpose of this document is to suggest the kinds of work which
need to be done, and to propose some specific items which, in our
view, need to be addressed, not to provide a detailed blueprint for
the next phase of TEI development. It is, however, hoped that such a
blueprint will emerge out of the discussion which this document is
intended to provoke, and we have therefore tried to be relatively
concrete in our suggestions.
We have found it helpful to distinguish the following kinds
of activities in describing the work items listed below:
- correction
- completion
- extension
- harmonization
- alignment
- application
By correction, we mean
the emendation of parts of the existing scheme which are agreed to
be faulty, but for which no obvious emendation has been
identified by the editors. (See documents
TEI ED W57 on
error correction policy, and
TEI ED W67 for the current
list of `known errors').
By completion, we mean the expansion or conclusion of
work only adumbrated or inadequately addressed in TEI P3. Parts of the
existing scheme, whether or not they are regarded as faulty, may be
seen as needing either completion or correction: the two terms overlap
somewhat.
By extension, we mean the definition of a new base or
additional tag set for elements not already addressed by the
Guidelines, or addition of new element definitions to an existing
tag set, or the further elaboration of the analysis underlying
the design of a tag set.
By harmonization, we mean systematic clarification of
the similarities and differences between the TEI DTD and some other
DTD in sufficient detail to guide automatic and hand-assisted
translation between them, and where appropriate the modification of
one or both so as to make possible automatic cross-translation without
information loss. The purpose of harmonization is to aid migration
between the TEI and other schemes, and to make clear their relative
strengths and weaknesses.
By alignment of the TEI scheme with some other
standard or encoding scheme (not typically a DTD), we mean
clarification of the ways in which the two standards may be used
together (including specification of any limits imposed by using them
together), description of their mutual relation, and where appropriate
the modification of one or the other to achieve a more useful relation
between them. For example, an alignment of TEI P3 with ISO 10646
(the 32-bit Universal Character Set standard)
would explain how users of TEI P3 must declare their use of that
character set to their SGML-conformant and TEI-aware software, and
explain how the TEI writing system declaration and the character/glyph
model underlying ISO 10646 relate to each other. It might also
suggest revising the Writing System Declaration to make it more
obviously compatible with ISO 10646.
By application of the TEI scheme, we mean (in this
context) identification and presentation of good practice in applying
the TEI scheme to some well-defined subject area, recommendations
concerning how general mechanisms defined by the TEI may most
profitably be applied in given problem areas, and the like. The
result of such work might take the form of a standard
TEI <encodingDesc> describing the encoding and
editorial practices recommended for the area.
Some work items involve more than one of these activities, which is
why we have not used them as the basis for classification below.
We have not yet attempted to prioritize work areas. The topics into
which we have grouped work items are intended for
convenience and should not be given excessive weight.
This section lists items which at some stage were proposed for
inclusion in the Guidelines, but which were not completed in
time. Some of them may no longer be necessary; others could be added
with some extension; yet others require definition of entirely new
tag sets.
- a tag set for letters and memoranda, specifically historical
letters
- a chapter explaining in more detail how and why to combine
base tag sets
- a chapter explaining in more detail how to construct a
user-defined base tag set
- a chapter on local installation and support of TEI-aware and
SGML-conformant software
- a chapter containing an extended example of a feature system
declaration
- a chapter discussing the relationship of the TEI encoding scheme to
other standards, including
- ISO 12083 (the current revised version of the DTD first
developed for the Electronic Manuscript Project of the Association of
American Publishers, previously ANSI Z39.59)
- ISO 10646
- various standards on bibliographic citation
- ISBD (the International Standard Bibliographic Description)
- AACR2 (the Anglo-American Cataloguing Rules,
2d edition)
- standards for construction of thesauri and term lists
- a list of useful entity names
- Writing System Declarations for all public entity sets in
- ISO 8879 (Information processing - Text and office
systems - Standard Generalized Markup Language (SGML))
- ISO TR 9573 (Information processing - SGML support
facilities - Techniques for using SGML)
- all parts of ISO 8859 (Information processing - 8-bit
single-byte coded graphic character sets)
Many possibilities exist for new tag sets. Those listed here are
simply ones which have been proposed to us, or which we have ourselves
identified as desirable.
- basic mathematics based on semantics rather than typography
- legal documents (legislation, judicial opinions, briefs etc.)
both ancient and modern.
- management of office documents (along the
lines of the chapter on this topic in TEI P1)
- technical documentation, including documentation of
SGML markup schemes (as sketched out in TEI U5 and
other TEI-compatible tag sets)
- encoding of layout and other
physical information. (This may imply revision of existing tag sets
for manuscripts, analytical bibliography, and text criticism as well
as definition of new elements)
- analytic bibliography
- codicology
- description of museum and art historical data, in collaboration
with CIMI (Computer Interchange of Museum Information [?])
- detailed description of manuscript cataloguing data
- encyclopaedias and reference works
- newspapers and newspaper corpora
We list here a number of areas in which we are conscious that the
currently published text of P3 offers opportunity for significant
improvement, whether because the published text is simply inadequate or
because the preparatory work prior to publication of P3 showed that
there were substantial interesting topics still to address.
In some cases, we
have specified more exactly what we think
needs to be added; in others, we are
not sure what needs to be added, but we know that we would like a
revised text more than the current text.
- Conformance issues: the current chapter needs to explain more
concretely what is meant by TEI conformance and how it can be
determined; also the implications for DTD documentation.
- The rules for blind and other
interchange require more examples and more detailed discussion
- The chapter on DTD modification needs at least one
fully worked-out example and some closer discussion of the
implications for conformance of TEI modification. Ways of more tightly
constraining the TEI model in particular applications should be
discussed and exemplified.
- The discussion of the <rendition> element in the header
needs revision in the light of the published DSSSL standard. (The
Document Style Semantics and Specification Language is a new ISO
standard for document rendering and processing in general.)
- The chapter on formulas and tables needs to consider scientific
formulae and to discuss ways of harmonizing the TEI table model with
others (notably CALS, SoftQuad, and ISO 12083.)
- TEI keywords should be introduced systematically for all
semi-closed attribute lists and all attribute values which a project
might want to close, e.g. type.
- The optionality status (i.e. required, mandatory when applicable,
recommended, or optional) of all elements and attributes
needs systematic review.
- The applicability of the dictionary tag set to older
dictionaries, dictionaries of dead languages (esp. historical
dictionaries), and dictionaries recording field-linguistic work should
be verified, and the tag set extended where necessary
- Examples of applications of feature structures in literary
and other analyses
- Additional work items relating to text criticism:
- generation of examples from existing critical editions
- utility of the existing tag set for extremely dense traditions and
extremely selective apparatus (e.g. Biblical materials), including
question of back-translation
- systematic review of R. Cover's voluminous comments and
suggestions
- clarification of relation between text-critical apparatus elements
and the simple mirror elements for editorial intervention in the
core
- harmonization of the TEI tag set with existing systems for
recording apparatus, including:
- CATSS (Computer-Assisted Tools for Septuagint Studies)
vertical format
- Lavagnino/Wujastyk, edmacs format
- alignment of the TEI tag set with existing collation systems,
including:
- Collate (Peter Robinson)
- UNITE (Francisco Marcos-Marin?)
- CASE (Peter Shillingsburg)
- Tustep (Wilhelm Ott)
- Further work on the TEI Header, specifically:
- expansion to cover archival descriptions and harmonization with
e.g. the Encoded Archival Description
- definition of recommended practice for application in electronic
text libraries
- generalization of the header to handle metadata for
arbitrary electronic and non-electronic resources
- extension of the header to handle geo-spatial
metadata (specification of the geographic and temporal
`footprint' of the data)
- work on harmonization with MARC, Z39.50, and IAFA
templates
- further discussion and exemplification of the
decls mechanism for complex documents and
corpora
- discussion of implementation issues, management of Independent
Header Sets (IHS) databases, etc.
- Further work on alignment and linking, specifically on encoding
of multimedia integrating text and images
- Tagging for scholarly editions, both historical / documentary
editions and literary / critical editions
This section lists a number of areas in which harmonization
activities seem to us particularly crucial.
- introduction of ICADD architectural form attributes into TEI
DTDs
- linkage of TEI elements with HyTime architectural forms (by
attributes or by free-standing link-process definition)
- working paper clarifying relationship of TEI extended pointer
mechanism, HyTime pointers, and DSSSL queries
- harmonization of TEI main DTD with IBM IDDOC architecture
- harmonization of TEI main DTD with Davenport Group's DocBook
DTD
- alignment of TEI writing system with other character
representation and documentation methods, notably ISO 10646
- harmonization of the TEI tag set for terminological data with
the DTD of ISO 12200 (electronic terminology interchange
format)
Some of the topics listed here may result in definition of new tag
sets or modifications to existing ones. Most of them however are
concerned with specifying good practice and discussing the
implications of TEI practice.
- clarification of issues relative to architectural forms:
- should the main TEI DTD be recast just as a set of architectural
forms? should all TEI DTDs?
- what exactly does (or should) the teiForm attribute
mean?
- working paper on the expression of constraints on documents beyond
those expressible in a DTD: is a separate constraint specification
language desirable? feasible? Can DSSSL and HyTime be used?
- working paper (for information of TEI users) on SGML query
languages and data manipulation languages or transformers
- formal specification (in Z, the Vienna Development Method [VDM],
or other formalism) of SGML
document structure, for use in later specifications of SGML-aware
software
- DSSSL tutorial for TEI participants and users
- description, for information of software implementors, of
implications of TEI markup for useful software behavior:
- proximity searching (e.g. across app elements)
- word indexing (when does a TEI tag imply a word boundary? when
must a word indexer ignore the TEI tag? when might one reasonably index
more than one form of a word?)
- searching through annotations (how are searchable features
represented in fs, note, interp and similar
elements?)
- implications of the join element for structure
searching
- TEI methods of text synchronization and alignment: their
applications in multilingual corpora, mixed media publications, etc.
This section lists a number of ISO and other work groups with whom
closer liaison should be sought by the TEI, formally or informally, to
ensure interoperability and reduce duplication of effort:
- ISO/IEC JTC 1/SC 18 (SGML, DSSSL, ...)
- ISO/IEC JTC 1/SC 2 (character sets)
- ISO TC 46 (libraries, information systems,
publishing)
- ISO TC 37 (terminological data) (?)
- IETF (Internet Engineering Task Force) work group for HTML
revision
- SGML Open
- decide whether to seek adoption of the TEI as a Federal Information
Processing Standard
A systematic evaluation of the TEI Guidelines
should be undertaken. This would imply:
- planning the evaluation
- evaluation by outside readers from user communities
- evaluation by outside SGML experts
- testing by systematic random sampling from populations of
documents which should be encodable by the TEI
Guidelines
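The last item above, systematic random sampling, might be sketched as
follows. This is only an illustration, on the assumption that the
population of documents is available as an ordered list of
identifiers; the function name systematic_sample and the file names
are hypothetical and are not part of any TEI specification. A random
starting point is chosen, and documents are then taken at a fixed
interval through the list.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Return sample_size items taken at a regular interval from population.

    A random offset fixes the starting point; integer arithmetic keeps
    every computed index inside the list and all indices distinct.
    """
    n = len(population)
    if not 0 < sample_size <= n:
        raise ValueError("sample_size must be between 1 and len(population)")
    rng = random.Random(seed)
    # offset / sample_size is the random start within the first interval
    offset = rng.randrange(n)
    indices = [(offset + i * n) // sample_size for i in range(sample_size)]
    return [population[i] for i in indices]


# Hypothetical population: 100 document identifiers (assumed names).
documents = [f"doc{i:03d}.sgm" for i in range(100)]
sample = systematic_sample(documents, 5, seed=1996)
```

With 100 documents and a sample of five, the documents chosen lie
exactly twenty positions apart, starting somewhere in the first
twenty. Whether the ordering of a real population of documents is
suitable for such a procedure would need to be decided when planning
the evaluation.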