Organization and Work of Text Documentation Committee

TEI TD R1


C. M. Sperberg-McQueen

University of Illinois at Chicago

3 May 1989


1 Focus

The Text Documentation committee will provide tags for:

  1. identifying the encoding itself with information such as: responsible parties, date and place of origin, title, distributor(s), copyright status, extent [size], internal organization, revision history, contents if an anthology, etc.
  2. minimal bibliographic identification[1] of the source(s) encoded: for printed copy texts, bibliographic information on the copy text(s); for unprinted texts, normal identifying information; for all texts, copyright status, notes on portions copied and omitted, level of fidelity to copy text, alterations made silently, alterations marked in the encoding[2]
  3. additional characterization of the information encoded, for borrowing and interchange (level of encoding, tag set(s) used, analytic results included; also descriptions of the data that go beyond catalog information: e.g. socio-linguistic descriptions of speakers in a transcript of spoken texts, or dialect identifications for anthologies or corpora)
  4. formal declaration of encoding options (e.g. tag-meaning declarations in tag sets with controlled semantics; modifications of the standard document type definition; additional tags) -- this responsibility is to be discharged in cooperation with committees ML (for the syntax), TR and AI (for the substance of these declarations)

The characterization of the data and the formal declaration of encoding options may be collapsible into a single set of tags, or it may be better to keep them separate (accepting a certain controlled redundancy).
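As an illustration only, the four areas of responsibility listed above could be modelled as four groups of fields within a single header record. The Python sketch below is not a proposal of the committee: every class, field, and method name in it (EncodingDescription, SourceDescription, DataCharacterization, EncodingDeclarations, Header, overlap) is a hypothetical stand-in chosen for this example. The overlap method shows one way the "controlled redundancy" between data characterization and formal declarations might be inspected if the two are kept separate.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EncodingDescription:
    """Item 1: identifies the encoding itself (hypothetical field names)."""
    title: str
    responsible_parties: List[str]
    date_of_origin: str = ""
    copyright_status: str = ""
    revision_history: List[str] = field(default_factory=list)


@dataclass
class SourceDescription:
    """Item 2: minimal bibliographic identification of a source encoded."""
    citation: str
    copyright_status: str = ""
    portions_omitted: List[str] = field(default_factory=list)
    silent_alterations: List[str] = field(default_factory=list)


@dataclass
class DataCharacterization:
    """Item 3: additional characterization for borrowing and interchange."""
    level_of_encoding: str = ""
    tag_sets_used: List[str] = field(default_factory=list)
    # Descriptor name -> value, e.g. a dialect identification (assumed layout).
    descriptors: Dict[str, str] = field(default_factory=dict)


@dataclass
class EncodingDeclarations:
    """Item 4: formal declaration of encoding options."""
    # Tag or feature name -> declared meaning (assumed layout).
    tag_meanings: Dict[str, str] = field(default_factory=dict)
    additional_tags: List[str] = field(default_factory=list)


@dataclass
class Header:
    """One header record holding all four groups of information."""
    encoding: EncodingDescription
    sources: List[SourceDescription]
    characterization: DataCharacterization
    declarations: EncodingDeclarations

    def overlap(self) -> List[str]:
        """Names recorded both as data descriptors and as formal declarations."""
        shared = set(self.characterization.descriptors) & set(self.declarations.tag_meanings)
        return sorted(shared)
```

If the two groups were collapsed into a single set of tags, the overlap would by construction be empty; keeping them separate means that any name reported by overlap has to be kept consistent in both places.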

2 Required Expertise

The committee must be familiar with library cataloguing theory and practice for printed matter and machine-readable data files; with enumerative bibliography; and with data-archive management (both internal procedures and user assistance).

3 Work Plan, Organization and Subcommittees

3.1 Work Plan

The committee must first formulate the requirements for the tasks listed above:

  1. identification of the encoding (MRDF cataloguing information)
  2. identification of the text (cataloguing of printed matter and of oral texts)
  3. data description and declarations

In formulating the requirements, the subcommittees should first survey the relevant practice of the field, notably the International Standard Bibliographic Description rules, the Anglo-American Cataloguing Rules (current version), the American National Standard for Bibliographic References (ANSI Z39.29), the MARC format for library cataloguing records, and existing data-archive catalogues and cataloguing schemes (ICPSR, Essex SSRC, the Standard Study Description as implemented at various European archives, and the catalogues in Oxford, Mannheim, Nancy, Pisa, Louvain, Bergen, and Oslo). From this survey, a coherent set of requirements should be formulated.

The second task of the committee is to combine and reconcile the three sets of requirements and to formalize them in an SGML tag set. Substantive characterization of an encoding overlaps so much with the formal declarations of that encoding that a formalization in this area may have to wait until the other committees have finished specifying declaration form and content.
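The combining step might be pictured as follows. This Python sketch is a hypothetical illustration, not a procedure adopted by the committee: it assumes each requirements survey can be reduced to a mapping from a feature name to a short statement of the requirement, and the function name reconcile and the sample data in the usage note are invented for the example. Features on which the surveys disagree are set aside for discussion rather than resolved automatically.

```python
from typing import Dict, List, Tuple

# Feature name -> short statement of the requirement (assumed representation).
Requirements = Dict[str, str]


def reconcile(*surveys: Requirements) -> Tuple[Requirements, Dict[str, List[str]]]:
    """Merge requirement sets and collect features whose statements disagree."""
    merged: Requirements = {}
    conflicts: Dict[str, List[str]] = {}
    for survey in surveys:
        for feature, statement in survey.items():
            if feature not in merged:
                merged[feature] = statement
            elif merged[feature] != statement:
                # Keep the first statement, but record each differing one for review.
                seen = conflicts.setdefault(feature, [merged[feature]])
                if statement not in seen:
                    seen.append(statement)
    return merged, conflicts
```

Called as reconcile(encoding_survey, text_survey, data_survey), the function returns the merged set together with the features that still need harmonization, which corresponds roughly to the agenda for the first committee meeting.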

The two meetings of this committee should be used (1) to present to each other the results of the requirements surveys and to begin the process of harmonization, and (2) to review the specific tag-and-attribute formulations of the feature set proposed by the subcommittees or by individual committee members, and either to adopt them with modifications or to suggest further development (in which case final approval must be by mail ballot).

3.2 Subcommittees

Three subcommittees are required, though if the committee is small there may be little point in constituting them formally:

  1. bibliographic description of the encoding
  2. bibliographic description of the text
  3. data description and declaration (including documentation of the editorial interventions made during the encoding process)

4 Coordination and Inter-committee Communication

4.1 Overlaps with Other Committees

Bibliographic description overlaps with the charge of committee TR (Text Representation), which must handle encoding of printed bibliographies; the internal header prepared by committee TD (Committee on Text Documentation) should be a superset of the bibliographical tags specified by committee TR. It seems likely that the TD subcommittee on bibliographic description of the text should serve simultaneously as a subcommittee of committee TR.

Data description must inherently serve two constituencies: the borrowers who need it in order to select their data, and the archivists who must maintain the descriptions. Committee TD, comprising mostly archivists, must actively solicit suggestions from committees TR, AI (Analysis and Interpretation), and ML (Metalanguage and Syntax Issues) concerning useful data descriptors; the other committees must effectively represent the archive borrower community. (Notable example: sociolinguistic descriptors for spoken corpora must be handled by the data description tags, but it is sociolinguists, not data archivists, who are most likely to know what descriptors will be worth recommending. This particular set of descriptors poses the additional problem that spoken texts will not be explicitly addressed until after the first drafting cycle, while committee TD is not expected to continue past the first cycle.)

The formal declarations fall within the responsibility of this committee in some sense, because they clearly belong in the encoding's internal documentation (header) section, and because they overlap so heavily in function with the data description tags. But the other committees must formulate the syntax and the required content for declarations. Committee TD needs to be aware of work on the declarations because it affects the data description tags, but the other committees should expect to bear primary responsibility for working out the details of declarations.

4.2 Communication with Other Committees

Committee TD should formally transmit preliminary accounts of its requirements survey for data description and declarations to the other committees for their information and for comment. It should transmit to committee TR both the requirements survey for bibliographic identification of texts and a fully worked out set of tags for marking in-text bibliographic citations.

The other committees should formally transmit to committee TD their decisions regarding form and content of declarations and desiderata for tags functioning as data descriptors.

5 Membership

Membership should include representatives of the library community and the data archive community, with special attention both to practitioners and to those active in standardization efforts.

Notes

[1] By specifying minimal bibliographic identification we do not mean to limit the bibliographic section of an encoding to primitive bibliographic information, but only to convey that not every encoding will contain, or need contain, more information than is necessary to locate the copy text: standard practice for bibliographic citation is as relevant as library cataloguing practice.

[2] The committee must provide declarations for the types of alteration and normalization commonly performed during or after transcription of texts, and should provide guidance for users seeking to decide when such alterations constitute a new version or edition of the encoding or of the work.