Date:     Thu, 27 Apr 89 09:29 EST
From:     <IDE@VASSAR>
To:       u35395@uicvm

Brief Project Description for Circulation to AB Members

The first meeting of the Text Encoding Initiative's Advisory Board
brought together seventeen representatives from key professional and
learned societies representing academic disciplines across the spectrum
from hard core computer science to lexicography, literary studies and
anthropology as well as the professional interests of librarians and
publishers. The purpose of the event, hosted by the University of
Illinois at Chicago, was to seek the views of the newly constituted
Advisory Board concerning the structure and proposed strategy of the
Text Encoding Initiative (TEI), to explain its relevance to the
interests of the societies and to encourage active participation in the
work of the Initiative by the societies' members.

History and Structure of TEI

In the fall of 1987, the Association for Computers and the
Humanities (ACH) under the directorship of Nancy M. Ide
organized a conference at Vassar College from
which emerged a set of resolutions on the necessity and feasibility of
defining a set of guidelines to facilitate both the interchange
of existing encoded texts and the creation of newly encoded
texts. The guidelines would specify both <em>what<\em> features
should be encoded and also <em>how<\em> they should be encoded,
as well as suggesting ways of describing the resulting encoding
scheme and its relationship to other existing schemes.

In the intervening period, ACH, together with the Association for
Literary and Linguistic Computing (ALLC) and the Association for
Computational Linguistics (ACL), has defined a four year work plan to
achieve these goals, which was presented at the Chicago Meeting. Funding
for the work plan has been provided by a substantial grant from the
American National Endowment for the Humanities, which will cover the
bulk of the costs of American participation in the Initiative for the
first phase of the project, due to end in June 1990. Equivalent funding for
European participation has been obtained from the European Economic
Community, and it is also hoped to secure further support from industry
and government.

Committee Structure

The work plan will be co-ordinated by a six-member steering
committee and two Editors, one American and one British. It calls
initially for the setting up of four Working Committees, each
responsible for a distinct part of the work plan.

Committee 1, the Committee for Text Documentation, with a
membership drawn largely from the library and archive management
communities, will deal with issues concerning the cataloguing and
identification of key features of encoded texts. It will draw on
work already done in this field for social science data, for
example in the establishment of the Standard Study Description.
All the Committees will be expected to work within established
frameworks where these are available; the relevance here of work
already done in establishing Anglo-American Cataloguing Rules for
machine-readable sources is apparent.

Committee 2, for Text Representation, is concerned with the
encoding of such features as layout and character sets. It will
aim to provide precise recommendations covering all the features
of continuous discourse for which a convention already exists in
printed or written sources. This will involve a consideration of
the character sets of all alphabetic scripts currently used in
computer-based research. Explicit consideration of non-alphabetic
scripts, though not excluded, has been deferred; transcriptions
of spoken language will however be included. It will also
recommend ways of representing the structural divisions of a text
(book, chapter, paragraph etc.) and all other features
conventionally signalled in printed or written texts, such as
emphasis, quotation, critical apparatus etc.

Committee 3, the Committee for Text Analysis and Interpretation, has the
largest and most open-ended set of responsibilities of the four. It will
aim to provide discipline-specific sets of tags appropriate to the
analytic procedures favored by that discipline, but in such a way as to
permit their extension and generalization to other disciplines using
analogous procedures. Because this is a very large task, committee 3
will focus initially on a single discipline (linguistics), chosen
primarily because of its clear relevance to all other text-based types
of analysis. As work proceeds, the focus of committee 3 will shift toward
literary analysis and other humanities disciplines.

Committees 1, 2, and 3, with an average membership of ten, will set up
sub-committees which will do the preliminary design work for tag sets
within specialized areas. Committee 3 already has one subcommittee,
concerned with tag sets for dictionary markup, which has already
produced a set of preliminary guidelines for monolingual dictionaries.
A subcommittee of committee 2 is also being formed, concerned with the
tagging of historical sources, to take advantage of the substantial
progress already made in this area by the informal network of European
scholars collaborating on the <it>Kleio<\it> project at the Max Planck
Institut fuer Geschichte in Goettingen, FRG, at Graz University in Austria,
and elsewhere.

Committee 4 was charged at the Poughkeepsie meeting with the definition
of a "metalanguage" - a language capable of specifying and describing
markup languages. The emergence of an ISO standard (ISO 8879, Standard
Generalized Markup Language) and its increasing acceptance within both
government and publishing communities have removed that burden, but
replaced it with that of assessing the extent of compatibility possible
between the tag sets proposed by the other three committees and the SGML
standard. The Guidelines will work within the syntactic framework of
SGML, departing from it if (and only if) it proves inadequate to the
needs of research. So far no areas of divergence have been identified,
though there has been considerable discussion within the Committee
(which began work last month) on the extent to which all features of
SGML can be recommended. The committee's main task will be to validate
and test the Guidelines as they emerge, to arbitrate on matters of
SGML-conformance and also to propose ways of mapping existing encoding
standards to the Guidelines.

The Chicago Meeting

In addition to the three sponsoring
organizations, the following associations are currently represented on
the Advisory Board:
<it> American Anthropological Association;  American Historical
Association; American Philological Association; American Society for
Information Science; Association for Computing Machinery; Association
for Documentary Editing; Association for History and Computing;
Association Internationale Bible et Informatique; Canadian Linguistic
Association; Dictionary Society of North America; Electronic Publishing
SIG; International Federation of Library Associations and Institutions;
Linguistic Society of America; Modern Language Association.<\it>

After an initial presentation about the history, background, objectives
and structure of the TEI, delegates were invited to comment on their own
interest and the constituencies they served.  A series of presentations
concerning the implications of the TEI for Humanities Research, for
Computational Linguistics and for the Language Industries followed. The
goals and responsibilities of each of the working committees were then
described, as outlined above. The second full day of the meeting began
with a very brief tutorial on SGML and a longer description of the
design principles, scope and end products of the Guidelines.  After a
wide-ranging and useful discussion, in which some constructively
critical reactions were voiced, members of the Advisory Board
expressed approval of the objectives, organisational structure and
design goals of the Initiative, as they had been presented at the
meeting. The Board also noted the draft work plans submitted by the Heads of
Committees to the meeting, with the understanding that these would be
revised in accordance with experience and the suggestions made by the
Board.

If you would like more information about the TEI, please contact
<insert name of association representative>
<\doc>
\nopagenumbers
\centerline{\bf Sponsoring and Advisory Organizations}
\vskip .25in
\parindent=0pt
\centerline{Sponsoring Organizations and Representatives}
\bigskip
{\obeylines

Association for Computers and the Humanities
Nancy M. Ide
Vassar College
\medskip
Association for Computational Linguistics
Donald E. Walker
Bell Communications Research
\medskip

Association for Literary and Linguistic Computing
Susan Hockey
Oxford University Computing Service

\bigskip

\centerline{Advisory Board Organizations and Representatives}
\bigskip
American Anthropological Association
Chad K. MacDaniel
University of Maryland
\medskip

American Historical Association
Elizabeth A. R. Brown
Brooklyn College
\medskip

American Philological Association
Jocelyn Penny Small
Rutgers University
\medskip

American Society for Information Science
Clifford A. Lynch
University of California
\medskip

Association for Computing Machinery/Special Interest Group for Information Retrieval
Scott Deerwester
University of Chicago
\medskip

Association for Documentary Editing
David Chestnutt
University of South Carolina
\medskip

Association for History and Computing
Dr. Manfred Thaller
Max-Planck Institut f\"ur Geschichte
\medskip

Association Internationale Bible et Informatique
Wilhelm Ott
Universit\"at T\"ubingen
\medskip

Canadian Linguistic Association
Anne-Marie di Sciullo
Universit\'e du Qu\'ebec \`a Montr\'eal
\medskip

Dictionary Society of North America
Thomas Cresswell
\vfill\eject

Electronic Publishers Special Interest Group
Betsy Kiser
Online Computer Library Center (OCLC)
\medskip

International Federation of Library Associations and Institutions
Dr J. D. Byrum Jr.
The Library of Congress
\medskip

Linguistic Society of America
Stephen Anderson
Johns Hopkins University
\medskip

Modern Language Association
Randy Jones
Brigham Young University
\medskip

}
\bye