<!DOCTYPE TEI.2 PUBLIC '-//TEI//DTD TEI Lite 1.0//EN'
    "file:///d:/SGML/Public/TEI/Derived/xmllite.dtd" 
 [ 
<!--*     "http://www.hcu.ox.ac.uk/TEI/Lite/DTD/teixlite.dtd"> *-->

<!ENTITY lt '&#38;#60;'>

<!--* XML *-->
<!ENTITY mdash '&#x2014;' >
<!ENTITY uuml '&#252;'>
<!ENTITY ouml '&#246;'>
<!ENTITY auml '&#228;'>
<!ENTITY eacute '&#233;'>
<!ENTITY agrave '&#224;'>

<!--* SGML *-->
<!--* 
<!ENTITY mdash  SDATA '[mdash ]' >
<!ENTITY uuml   SDATA '[uuml  ]'>
<!ENTITY ouml   SDATA '[ouml  ]'>
<!ENTITY auml   SDATA '[auml  ]'>
<!ENTITY eacute SDATA '[eacute]'>
<!ENTITY agrave SDATA '[agrave]'>
*-->

]>
<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt>
<title>Proposal for Funding for
An Initiative to Formulate Guidelines for
the Encoding and Interchange of Machine-Readable Texts</title>
</titleStmt>
<publicationStmt>
<authority>TEI Consortium</authority>
<pubPlace>Bergen, Charlottesville, Oxford, Providence</pubPlace>
<date>2001</date>
<idno type="TEI">TEI SC G2</idno>
</publicationStmt>
<sourceDesc>
<bibl>Ide, Nancy, et al.
<title>Proposal for Funding for
An Initiative to Formulate Guidelines for
the Encoding and Interchange of Machine-Readable Texts</title>
February 1988.
<note>Translated into TEI Lite for historical purposes
by C. M. Sperberg-McQueen, 17 June 2001.</note>
</bibl>
</sourceDesc>
</fileDesc>
<revisionDesc>
<list>
<item>2001-06-17 : CMSMcQ : put together from an archival copy of the
relevant Script files, partly as a way of trying to ensure that I get
all the relevant Script files extracted from the backup.</item>
</list>
</revisionDesc>
</teiHeader>
<text>
<!--* 
.im gmlmla
.ch /`/&oquote(1)/
.tr # 40
.dc gml <
.dc mcs >
.* Document proper begins.
*-->
<front>
<titlePage>
<docTitle n="Text Encoding Initiative">
<titlePart>Proposal for Funding for</titlePart>
<titlePart>An Initiative to Formulate Guidelines for</titlePart>
<titlePart>the Encoding and Interchange of Machine-Readable Texts</titlePart>
</docTitle>

<titlePart>
[Copy of NEH Proposal, February, 1988.
Please note that specific details of the timetable and funding
are tentative and subject to ongoing revision.
Detailed budget information has been omitted from this copy of the proposal.]
</titlePart>

<!--* <author>A Joint Steering Committee Representing *-->
<docAuthor>The Association for Computers and the Humanities<lb/>
Nancy Ide, Vassar College<lb/>
C. M. Sperberg-McQueen, University of Illinois at Chicago</docAuthor>

<docAuthor>The Association for Computational Linguistics<lb/>
Robert Amsler, Bell Communications Research<lb/>
Donald Walker, Bell Communications Research</docAuthor>

<docAuthor>The Association for Literary and Linguistic Computing<lb/>
Susan Hockey, Oxford University<lb/>
Antonio Zampolli, University of Pisa</docAuthor>

<titlePart>[TEI SC G2]</titlePart>
<!--* N.B. Date supplied from external evidence.
    * CMSMcQ 2001-06-17 
    *-->
<docDate>February 1988</docDate>
<titlePart>[This version of this document was created for
public distribution in April 1988.]</titlePart>
</titlePage>
<div1 type="abstract">
<p>The Association for Computers and the Humanities, the Association
for Computational Linguistics, and the Association for
Literary and Linguistic Computing propose to continue a five-phase
project to develop and promote guidelines for the preparation
of machine-readable texts for scholarly research and for the
interchange of such texts among research sites.</p>
<p>
Phase 1 of the
project (Planning and High-Level Design) was funded by the National
Endowment for the Humanities with support from
Vassar College; it is now complete.
During that phase, a planning conference on Text Encoding Practices
convened in November, 1987, at Vassar College to determine the
feasibility and desirability of the proposed guidelines and
to formulate aims and technical specifications for them.
The sponsoring and other participating organizations also
agreed upon an organizational structure to continue the work.</p>
<p>
This proposal covers phases 2 through 4 (Detailed Design and
Drafting; Revision; Review and Approval) of the project.
During these phases, four working committees of approximately ten
members each will study the problems associated with
<list type="bullets">
<item>the documentation of encoded texts,</item>
<item>the representation of texts at the typographic
level,</item>
<item>the representation of scholarly analysis and interpretation
in an encoded text, and</item>
<item>formal descriptions of the syntax of this and other encoding
schemes.</item>
</list>
An editor in chief and a consulting or associate editor will
coordinate the work of these committees and ensure the coherence
and clarity of the guidelines as a whole.  A steering committee
comprising representatives of the sponsoring organizations will
supervise the project as a whole, and the results will be
submitted to an advisory board of representatives selected by
a larger group of participating organizations.</p>
<p>
Funding is requested (1) to support the editorial staff and the heads of
the committees, and (2) to subsidize meetings of the working committees,
steering committee, and advisory board by paying travel and subsistence
expenses for American participants.  European funds will be sought
to subsidize the participation of European scholars as committee heads
and members of the working committees.</p>
<p>
Following successful completion of this work, the guidelines will
be published (phase 5 of the project) and the sponsoring organizations
will undertake to maintain them by revision and expansion as necessary.
No funding is requested for phase 5 in this proposal.
</p>
</div1>
<divGen type="toc"/>

</front>
<body>

<!--* include file=nh22ratn *-->
<div1><head>Rationale</head>
<div2><head>The Need for a Common Text-Encoding Scheme</head>
<p>Before they can be studied with the aid of
computers, texts must be
<emph>encoded</emph> in machine-readable form.  Standard data-processing
practice furnishes convenient solutions for basic text-representation
problems, but many texts of interest to scholarly research present
difficulties not resolved by the industrial standards.
Over the years, scholars have developed many different methods for:
<list type="ordered">
<item>representing individual characters of a text not foreseen
in industry practice (e.g. accented characters, special symbols,
non-Roman alphabets);
<note place="foot"><p>Or even the distinction between upper and lower case letters, which
was at one time not preserved by standard data-processing industry
practice.</p></note></item>
<item>reducing texts with footnotes, marginalia, text-critical
apparatus or other complications into the single linear sequence
assumed by most computer file systems;</item>
<item>encoding the logical divisions of the text (e.g. book, chapter,
verse);</item>
<item>representing analytic or interpretive information relevant to
the text (e.g. syntactic, morphological, or semantic analysis);</item>
<item>documenting the source of an encoded text and the nature of
the recording.</item>
</list>
The collection of rules, techniques, or conventions used to solve these
problems for a given text or set of texts is an <q>encoding scheme</q>
or <q>format</q>.
</p>
<p> 
Over the decades, scores&mdash;probably hundreds, in fact&mdash;of such
encoding schemes have been developed from scratch or adapted from
existing schemes and used to encode thousands of texts.
Some were created to serve the needs of large-scale projects
such as the Thesaurus Linguae Graecae at Irvine or the Responsa Project
at Bar-Ilan (Israel);
some were created as specifications for text input to text-analytic
software such as COCOA (<q>word COunt and COncordance program on
Atlas</q>), the Oxford Concordance Program, or the Waterloo Concordance
Program (WatCon);
and still others
were developed to regularize large text archives such as those of the
Treasury of the French Language at Nancy (TLF) and Chicago (ARTFL),
the Institute of the German Language (IDS) at Mannheim, and the Institute
for Computational Linguistics (ILC) at Pisa, each containing millions
of words of texts.
A great many schemes, however, were developed for individual
projects working with smaller bodies of text.  These texts may or may
not have been deposited with a text archive, and their encoding scheme
may or may not have been documented&mdash;so that even if they are
accessible in machine-readable form they are not necessarily usable as
they exist.
Finally, there are the schemes developed not for research projects
or text-analysis software but by computer software developers for
commercial or academic word-processing (e.g. Runoff, Script, Scribe,
nroff, troff, TeX).
</p>
<p> 
Because of this multiplicity of encoding schemes, one might find the
same information (e.g. a chapter division in a novel) encoded in any
of the following ways, as well as in countless others:
<note place="foot"><p> The various examples given include&mdash;but not in this order,
since some forms apply to more than one program, and vice
versa&mdash;references usable for processing with
the ARchival Retrieval and Analysis Program (A<hi rend="scap">RRAS</hi>),
C<hi rend="scap">OCOA</hi> (Word COunt and COncordance on Atlas),
IBM S<hi rend="scap">CRIPT</hi>/DCF,
the T<hi rend="scap">E</hi>X macro package L<hi rend="scap">A</hi>T<hi rend="scap">E</hi>X,
the Oxford Concordance Program,
software designed for the Thesaurus Linguae Graecae,
Waterloo S<hi rend="scap">CRIPT</hi>,
Waterloo GML, and
Electronic Text Corporation's W<hi rend="scap">ORD</hi> C<hi rend="scap">RUNCHER</hi>.
</p></note>
<eg>
    |chap 1
    &lt;C 1>Loomings
    \chapter
    \chapter[1]{Loomings}
    :h1.1.  Loomings
    MOBY001001Loomings
    |c1
    .chapter Loomings
    .cp;.sp 6 a;.ce .bd 1.  Loomings
    ~x
</eg>
</p>
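<p>
A descriptive scheme of the kind envisioned here would, by contrast,
mark the logical function of the division rather than its appearance
or its position in a particular processing system.  With purely
illustrative tag names, the same chapter opening might be encoded as:
<eg>
    &lt;chapter n="1">&lt;title>Loomings&lt;/title>
</eg>
Any program aware of the scheme could then locate chapter divisions
without knowledge of the formatting conventions of any one
text-processing system.
</p>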
<p> 
Scholars have realized for years that the multiplication of incompatible
text formats wastes time, effort, and money.  Because there is no
common, generally recognized format, scholars must choose among the
variety of mutually incompatible schemes, or develop new ones from
scratch.  Because existing schemes often reflect the research interests
of their originators and the peculiarities of the texts they studied,
scholars beginning new projects often find them ill-suited to their
own needs and elect to develop new schemes,
at great cost of time and effort.  Because many texts are encoded for
specific investigations and are made available to other researchers only
as an afterthought, if at all, the practice followed in encoding them is
often documented poorly or not at all.  Later users of these texts must
then decipher the encoding or inquire for more details from the
originator, instead of proceeding to their own analysis.  Even more
important in the long run, the lack of a common encoding scheme causes
wasteful duplication of effort:  when scholars using one type of
computer or one set of software are unable to use&mdash;or sometimes even
to read&mdash;texts encoded with other software on other machines, they
must keyboard or scan them again.
</p>
<p> 
A common text encoding scheme developed for the needs of scholarly
research could eliminate or minimize many of these problems.  Scholars
would not have to develop personal encoding schemes; documentation of
the encoded text would be simpler; duplication of effort would be
reduced.  Wide acceptance of a common format would encourage software
developers to accommodate that format, thus making it possible to use
the same text with many different software systems&mdash;not possible now
without substantial changes to the tagging within a text.  Obviously, in
a world of limited resources for textual research, both large text
archives and individual scholars stand to gain from a standard or
normalized practice.
</p>
<p> 
No existing encoding scheme is likely to gain acceptance as a standard.
Though existing research-oriented encoding formats serve the needs of
the projects for which they were developed, none is sufficiently
flexible or generalizable to apply to the encoding of textual materials
across the full spectrum of applications and research interests.  Some
are outdated schemes, based on 80-column card input or other obsolete
technologies.  They rely on the sequence numbers of punched cards to
provide structural information, or depend upon the characteristics of
specific hardware devices.  Other schemes are too intimately connected
with the peculiarities of the text corpus they were designed for to be
readily applicable to other texts or text types.  They may assume, for
example, that no structural hierarchy has more than three levels (so
that <hi>book, chapter, verse</hi> may be tagged easily, but not
<hi>poem, canto, stanza, verse</hi>), or that no encoding will require
more than twenty different types of tags.  Encoding schemes developed by
the data-processing industry tend to lack both portability and the
generality required for a research-oriented text format.  They may make
no provision for structural hierarchies at all, or place the burden for
such niceties upon the user, who must write extensions to the software
to produce the desired effects.  Quite frequently, commercial schemes
encode only the layout of the text on the page, forcibly confusing (for
example) words italicized for emphasis, words in a foreign language, and
words italicized as part of a bibliographic citation.  And it must be
admitted that some existing schemes, both commercial and
academic, are obscure in operation, difficult to learn, and
rebarbative in appearance.
</p>
<p> 
Textual computing needs a single, easy-to-use, and flexible scheme,
suitable for encoding all types of textual materials and accommodating
a wide variety of scholarly research interests.  No one scheme among
those now existing, however, can be found to unite all these attributes.
If textual scholarship is to have a single common format for
machine-readable texts, it must be developed.</p></div2>
<div2><head>Previous Efforts</head>
<p>Scholars have tried in the past to create a standard form of text
encoding, or a common format in which to exchange existing texts.  Ten
years ago, with the support of the National Endowment for the
Humanities, a conference of North American experts convened to discuss
text encoding standards and related issues in San Diego.  In 1980, the
European Science Foundation sponsored a meeting on the same issues in
Pisa, in conjunction with a conference on computerized lexicography.
Neither of these earlier efforts led to any substantial agreement, still
less to a common format for texts in computer-readable form.  At the San
Diego meeting, many participants agreed in principle on the need for a
common encoding scheme, but dissension concerning the details of the
encoding scheme was so great that the project was ultimately abandoned.
At the Pisa meeting, the participants concurred on the need to work for
<q>normalization</q> of practice in lexical databases and text corpora,
but were unable to agree to the term <q>standardization</q>, largely
because some feared the loss of local decision-making responsibility.
Some cooperative efforts among European centers have resulted from the
Pisa meeting, but no common text format is being developed.</p></div2>
<div2><head>The Current Situation</head>
<p>The situation is more promising now.  Earlier efforts underscored
the need for a common encoding scheme, but failed to generate sufficient
consensus on first principles.  At the planning conference for the
current effort, by contrast, over thirty representatives of
universities, professional organizations and text archives agreed not
just on the need for common practice, but also upon basic principles to
govern the guidelines for encoding and exchange, and upon an
organizational structure to continue the work.  This consensus is the
result of several key factors:
first, as time passes, more is known about the problems of text encoding
and basic principles become clearer.
Second, even though funding limitations and a compressed timetable made
it impossible to invite everyone with a major potential contribution to
the effort, nevertheless the Vassar conference succeeded in
bringing together more representatives of key organizations and active
research centers than had ever met in one place before to discuss these
problems.
Third, the recently developed Standard Generalized Markup Language
(SGML), defined by the international standard ISO 8879, appears to
provide an invaluable tool for developing a simple, flexible, extensible
encoding scheme capable of satisfying the widely varying needs of
textual researchers.
And finally, the newly achieved consensus also reflects the growing
urgency of the need.  At the San Diego and Pisa meetings, it was
predicted that if the humanities computing community did not adopt
common practices for text encoding, chaos would ensue.  At the Vassar
meeting, no one needed to predict chaos:  it is, as several speakers
observed, the status quo.
</p>
<p> 
As more and more new practitioners enter the field of
computer-assisted textual study, the chaos grows ever wilder.  With the
declining price of optical scanners for reading large volumes of text
into electronic form, the number of research texts available
electronically will grow geometrically.  The increasing availability of
CD-ROM for mass storage and the boom in projects to create CD-ROM for
wide distribution of textual data will complicate matters further, since
the texts on CD-ROM are frozen and cannot be reformatted to suit a
different text-analysis program.
<note place="foot"><p>It should be noted that CD-ROM are at present usually encoded in
proprietary formats unreadable by any program but the maker's; this
reflects both the lack of any industry standard for physical
organization of CD-ROM and the desire of vendors to prevent unauthorized
copying.  In this environment, an encoding scheme of the type we propose
has no function, since the data format is already known to the only
program intended to read it.  As efforts to develop a standard physical
organization for CD-ROM progress, however, and the data on CD-ROM, WORM
disks, and other optical storage devices become accessible to generic
disk-reading routines, the need for a standard, well designed, and well
documented encoding scheme will re-surface.</p></note>
Several projects are already underway to encode and distribute massive
data bases of texts via CD-ROM, each proceeding or planning to proceed
with its own encoding scheme.  Together, the scanner and the CD-ROM
promise to aggravate the problem of anarchic encoding practices by
several orders of magnitude within a very few years unless action can
be taken soon.
</p>
<p> 
The consensus on goals and methods achieved at the Vassar
planning conference indicates that the text-computing community is
sufficiently alarmed by the current and expected situation to set aside
past differences and begin work immediately, in earnest, towards the
establishment of common ground for text encoding and text interchange.
The costs of failure are all around us, and grow higher as time passes;
we believe the humanities research community must act as
quickly as possible to build a consensus on the basis of the
discussions and agreements made at Vassar, and to formulate a
concrete set of guidelines for text encoding and text interchange, in
the context of scholarly research and teaching.</p></div2>
<div2><head>Impact</head>
<p>The impact of this project on humanities computing will be substantial.
At this time several projects to put massive amounts of text
(including literary texts,
bibliographies and dictionaries), into machine-readable form
are in the planning stages.
<note place="foot"><p>Among the projects now in prospect are the Oxford project to create
a computer database of pre-Restoration English drama, the Queens
University project to create an electronic library of Canadian
literature, and the effort among computational linguists to organize a
cooperative encoding of the classic <title>Century Dictionary</title>.
Further along is a project organized by scholars in Toronto and Otago,
New Zealand, to create a Tudor Textbase containing a range of written
works from the sixteenth century.  This corpus will form the basis of a
glossary or short dictionary of early Tudor English. The work of
entering the texts is already in progress at Otago and Toronto; Oxford
University Press have expressed an interest in publishing the text base
and lexicon in electronic form.  The organizers have expressed interest
in testing the proposed common text encoding format on their corpus.
</p></note>
Many are intended to be made widely available on CD-ROM.  Most of these
projects are developing encoding schemes for their data, completely in
isolation from one another.  Thus it is assured that without guidelines
to suggest a common encoding scheme, each project will develop a scheme
quite different from, and very likely incompatible with, those
developed in all the other projects.  This will be true even if all of
these projects base their work on SGML, since SGML defines no specific
encoding scheme but only a framework within which encoding schemes
(<q>tag sets</q>) may be developed.  Existing SGML applications like IBM
GML,  Waterloo GML, or the Association of American Publishers Electronic
Manuscript Project tag set, do not provide an adequate basis for
normalization of practice as they restrict themselves to the most
common textual structures and do not cover the encoding needs of most
texts intended for scholarly research.
</p>
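<p>
The point may be illustrated by two fragments, each conforming to
SGML syntax but built on a different tag set (both sets here are
purely hypothetical); without shared guidelines, software written
for the one cannot interpret the other:
<eg>
    &lt;div type="chapter" n="1">&lt;hd>Loomings&lt;/hd>
    &lt;chap num="1" title="Loomings">
</eg>
</p>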
<p> 
If a common encoding scheme existed, the effort of creating an
encoding format for specific projects would be minimized.  Furthermore,
the materials created by these projects would be in a uniform format,
comprehensible to anyone familiar with the single, accepted encoding
scheme.  Even more important, we can assume that the existence of a
common format will prompt software developers to accommodate this format.
<note place="foot"><p>The developers of the Oxford Concordance Program have already
announced their intention to support any common format created by this
project.  As noted below, it is also expected that other software
compatible with the new scheme will be created, as offshoots of this
project, as byproducts of the development work on this project, or
independently.</p></note>
Therefore, the materials created by projects over the next decade
could serve as input to as-yet undeveloped software designed for any
number of text analytic tasks.  If both the creators of textual
scholarly materials and software developers utilize a common encoding
format, the texts may be used with any software package.  Thus the texts
will be widely usable with no modifications, save the addition of tags
for specialized purposes, with any software package accommodating the
common scheme.
<note place="foot"><p>The prospect of compatibility of one text format with a wide
variety of processing programs is one strong reason, apart from the
intrinsic merits of its syntax, to explore SGML very carefully as
a possible basis for the new encoding scheme.  Because of the variety
of existing and projected programs supporting SGML applications,
SGML offers the prospect of uniting common word processing functions
with research-oriented text encoding by means of a relatively simple
common syntax of <q>markup tags</q> used to describe the logical
structure of a document.  Word processing programs can generate
correctly formatted printed copies, in a variety of styles, of a text
marked with such tags.  Analytic programs built to handle SGML syntax
could produce concordances, lists of lemmata, syntactic analysis or
summaries of syntactic data, or other research tools from the same file.
</p></note>
The common scheme, which will provide a syntax as well as a tag set for
almost all applications, will also facilitate specialized modifications
when they are necessary.
</p>
<p> 
In short, the development of a common encoding scheme for textual data
will have profound effects on humanities computing, both by eliminating
the need to create encoding schemes within specific projects, and by
nurturing an environment in which machine-readable
texts will be usable with, and distributable for use with, software
that performs a wide variety of formatting and analytic tasks.
</p>
</div2>
</div1>

<!--* <include file=nh22hist> *-->
<div1><head>History and Current Status of the Project</head>
<p>The current project began in 1987 on the initiative of the Association
for Computers and the Humanities, which proposed a five-phase project
to prepare guidelines for text encoding and text interchange.  (See
Appendix A for description.)  The first phase called
for an international planning conference to discuss issues of content
and feasibility and to work out organizational arrangements among the
groups interested in participating in the effort.  The later phases
(discussed in more detail below) called for the drafting, circulation
and revision, approval, and publication of the guidelines.
</p>
<p> 
The planning conference, funded by the National Endowment for the
Humanities, was held at Vassar College in Poughkeepsie, New York,
on November 12 and 13, 1987.
Thirty-one experts from universities, learned societies, and text
archives from North America, Europe, Israel, and Japan, met for
intensive discussion of the desirability, feasibility, and basic
principles of a common set of guidelines for machine-readable text
encoding.  (A list of participants is in Appendix B.)
The firm consensus was that the textual computing community emphatically
needs a common format for the interchange of existing data, and that
individual scholars and projects alike need recommendations for minimal
text encoding practices, as well as the facility to extend those minimal
practices to cover special problems of interest to individual
researchers.  The group further agreed that such a common framework was
not only necessary but feasible, and agreed, after vigorous discussion,
on several basic principles to govern the scope and the organization of
a set of guidelines for encoding textual materials.</p>
<div2><head>Recommendations of the Planning Conference</head>
<p>After lengthy discussions of the basic purposes and functions
of the guidelines, the syntax of the scheme they should propose, and
the organization of the drafting process, the participants at the
planning conference recorded their consensus in the following
closing statement:
<q rend="block">
<list type="ordered">
<item>The guidelines are intended to provide a standard format for data
interchange in humanities research.</item>
<item>The guidelines are also intended to suggest principles for the
encoding of texts in the same format.</item>
<item>The guidelines should
<list type="ordered">
<item>define a recommended syntax for the format,</item>
<item>define a metalanguage for the description of text-encoding schemes,</item>
<item>describe the new format and representative existing schemes both
in that metalanguage and in prose.</item>
</list></item>
<item>The guidelines should propose sets of coding conventions suited for
various applications.</item>
<item>The guidelines should include a minimal set of conventions for
encoding new texts in the format.</item>
<item>The guidelines are to be drafted by committees on
<list type="ordered">
<item>text documentation</item>
<item>text representation</item>
<item>text interpretation and analysis</item>
<item>metalanguage definition and description of existing and proposed
schemes,</item>
</list>
coordinated by a steering committee of representatives of the principal
sponsoring organizations.</item>
<item>Compatibility with existing standards will be maintained as far as
possible.</item>
<item>A number of large text archives have agreed in principle to support
the guidelines in their function as an interchange format.  We encourage
funding agencies to support development of tools to facilitate this
interchange.</item>
<item>Conversion of existing machine-readable texts to the new format
involves the translation of their conventions into the syntax of the new
format.  No requirements will be made for the addition of information not
already coded in the texts.</item>
</list></q>
Three overall goals were thus defined for the guidelines:
<list type="bullets">
<item>To specify a common interchange format for machine readable texts.</item>
<item>To provide a set of recommendations for encoding new textual
materials.</item>
<item>To document the major existing encoding schemes, and develop a
metalanguage in which to describe them.</item>
</list>
Consensus was also reached on a possible syntactic basis for the
guidelines and the organization of the work.</p></div2>
<div2><head>Common Interchange Format for Machine-Readable Texts</head>
<p>Because some archives already hold large quantities of data, and even
more because many have made substantial investments in locally developed
analytic software keyed to the specific formats they have developed for
their data, archivists generally have neither desire nor motive to
convert their existing holdings to a new format for internal storage.
Existing archives will not necessarily be able to use the new scheme
even for new texts, which they will need to encode for consistency with
their existing holdings.  The text archivists at the Vassar
conference nevertheless saw a pressing need for a common format for data
interchange.  Such a format would allow each archive to write software
with which to convert its present data formats into the common
interchange format before sending texts to other users and convert
incoming texts from the common interchange format into the form around
which the data archive and its specialized local software are built.
This procedure, requiring translation to and from only one lingua
franca, would constitute a dramatic improvement over the Babel of
current practice, which requires translation to and from many external
formats.</p></div2>
<div2><head>Recommendations for Encoding New Textual Materials</head>
<p>The guidelines must also provide recommendations to those who are
encoding texts for the first time and are not required to conform to an
existing scheme, to assist them in deciding what textual features to
encode and how to encode them.  Inflexible requirements cannot be
formulated because the varieties of both textual materials and research
interests defy exhaustive classification or prescription.  Still,
recommended practices reflecting the consensus of informed scholarship
can improve data compatibility and help ensure higher data quality in
the future.  Such recommendations, as noted already, will not in any
sense be binding upon the practice of existing archives or commit them
to the slow, costly conversion of their holdings into the new format.
But as one participant in the planning conference observed, the texts
already available in machine-readable form&mdash;hundreds or perhaps
thousands of millions of words, all told&mdash;will represent only a drop
in the ocean of texts that will be encoded within the next ten to
fifteen years.  Since many of those texts will be encoded by scholars or
new research centers without an investment in any existing scheme, the
new guidelines should explicitly recommend the encoding of specific
minimal textual features about which the community of users can achieve
consensus.
<note place="foot"><p>The recommendations will specify both <emph>what</emph> features are
to be encoded (e.g. chapter divisions, title of text, etc.) and
<emph>how</emph> those features are to be represented (by stipulating the
use of specific tags or markup conventions).</p></note>
</p>
<p> 
Desirable as standardization is, the requirements of textual research
vary with the researcher and the text.  No single set of absolute
requirements can apply to all texts or purposes, and the idea of such
requirements was eventually set aside as reductive and simplistic.  The
guidelines will recommend encoding the
textual features commonly found useful in more than one kind of
analysis, but even these commonly useful features will not be
<emph>requirements</emph> of the scheme.  With the scheme, as without it,
individual scholars will be faced with the inescapable necessity of
taking responsibility for the useful encoding of their own texts.
</p>
<p> 
In addition to the minimal recommended basic encoding, the guidelines
will define sets of textual features relevant to specific disciplines
or text types, and provide techniques for representing them.  These
extensions to the basic encoding will be strictly optional and no
recommendation that they be universally encoded will be implied.  For
scholars interested in a given type of textual problem, however (e.g.
lexical analysis, textual criticism, thematic study, etc.),
these extensions will provide a basis for compatible encoding of
similar information by different projects, and thus for easy interchange
of the encoded texts.
</p>
<p> 
For most types of textual study, the extensions will present a
formal syntax for representing salient textual features or analytic
categories, without further complications.  For example, the extension
for metrical study would provide methods for encoding the number of
syllables in a given line, the scansion(s) suggested for the line,
the metrically significant features (e.g. stress, vowel length) of
syllables, and so on.  For manuscript criticism and the encoding of
variant readings, methods must be developed
for representing variants, relating them to the lemma in the text,
classifying the variation, indicating the manuscripts, and so forth.
For such extensions to be successful, they must be based on a thorough
understanding of the intellectual problems involved, and their
preparation is expected to involve significant research.
</p>
<p> 
In many types of textual analysis, of course, the textual features
studied vary widely depending upon the theoretical orientation of the
researcher.
<note place="foot"><p>Thus one metrist may need to encode vowel length but be indifferent
to the number of syllables in a line; another may be interested in
lexicality of individual words and syllables as a possible metrical
determinant but not care about vowel length.
Theoretical divergences are even more obvious, and more profound,
in syntactic study and lexicology.</p></note>
For the most part, the guidelines we propose to develop are not the
appropriate forum for airing theoretical issues, and the guidelines
will perforce take an eclectic position, providing conventions for
encoding whatever textual features the individual scholar does wish to
consider.  For a few types of analysis, those most frequently performed
in literary and linguistic computing, it will be worthwhile to attempt
more than eclecticism.  In these cases an effort will be made to
establish a dialogue among representatives of various schools and work
for agreement on some recommended minimum content and a common
polytheoretical basis for encoding, at least to the extent of making it
easier to exchange data between projects working on different
theoretical lines.  (Such harmonization of diverse theoretical bases is
of course not guaranteed to be successful, and will be attempted only
where adjudged wise.)</p></div2>
<div2><head>Metalanguage and Documentation of Major Existing Encoding Schemes</head>
<p>A third important task is to document, in a single place and a single
vocabulary, the major existing schemes for text encoding.  Information
of this type will have immense value both for those attempting to use
and interpret texts encoded in one of these existing schemes, and for
those familiar with existing schemes who want a concise account of their
differences from the new format.  Reliable knowledge of existing
encoding schemes is essential, of course, for planning any interchange
format, and will also assist in the formulation of recommended encoding
practices.
</p>
<p> 
Moreover, the computational linguists at the planning conference
observed that if existing encoding schemes can be described not only in
prose but in a formal language, it will be possible to generate, from
the formal description of an encoding scheme, a program to translate
texts from that scheme into the common interchange format.  Because of
the advantages of such automatically generated translation programs, the
group as a whole agreed that a special committee should undertake to
develop a <q>metalanguage</q> for the description of encoding schemes
and to formulate descriptions in that metalanguage of the major existing
schemes.
</p>
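<p>The approach can be sketched in miniature (the dot-command scheme and
the bracketed interchange tokens below are invented for illustration;
the actual metalanguage remains to be designed): given a table-like
formal description of an encoding scheme, a translator into the
interchange format is generated mechanically rather than written by hand.

```python
# Miniature sketch of generating a translator from a formal description
# of an encoding scheme.  The dot-command scheme and the bracketed
# interchange tokens are invented for this illustration.

SCHEME = {
    ".h1": "[head]",
    ".pp": "[p]",
}

def make_translator(description):
    """Build a line-by-line translator from a token-mapping description."""
    def translate(lines):
        result = []
        for line in lines:
            parts = line.split(None, 1)
            if parts and parts[0] in description:
                rest = parts[1] if len(parts) == 2 else ""
                result.append((description[parts[0]] + " " + rest).strip())
            else:
                result.append(line)
        return result
    return translate

to_interchange = make_translator(SCHEME)
print(to_interchange([".h1 Plan of Work", ".pp The committees will meet."]))
```

A richer metalanguage would of course describe structure and semantics,
not merely token-for-token substitution.</p>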
<p> 
To develop the metalanguage, it will be necessary both to perform an
extensive survey of current and past practices in scholarship and
in computerized typesetting,
<note place="foot"><p>The automatic or semi-automated conversion of typesetting tapes to a
more generally useful form is regarded as a key benefit of the
metalanguage development effort.  At present, many publishers routinely
destroy the computer tapes used to set new books; even if the tapes are
retained, they often do not contain the last typographic corrections
(done by hand).  And when publishers are willing to share the tapes with
researchers, the researchers typically face a daunting task in
understanding the encoding and rendering it useful.</p></note>
and to explore the formal properties of the markup conventions they use.
After isolating, in an analytic phase, the minimal set of syntactic
features required to describe the various existing text formats, the
working party responsible for metalanguage development must synthesize a
formal language to describe the syntactic and semantic features of the
existing schemes and ensure that the syntax of the new encoding scheme
is capable of expressing the distinctions and regularities expressed by
the older schemes without information loss.  Both of these phases will
require significant knowledge of formal language theory; the task of
creating a formal language with sufficient descriptive power may prove
to be challenging and is expected to constitute an interesting research
problem in this field.</p></div2>
<div2><head>Committee Organization</head>
<p>Subsequent discussions at the planning conference addressed the overall
plan of work required to develop the guidelines.
</p>
<p> 
The problems and aims of text encoding vary with the type of text
encoded and with the discipline of the encoder, while many problems of a
formal text-representation scheme cut across disciplinary borders; the
division of labor for drafting a scheme therefore cannot perfectly
mirror the logical organization of the scheme itself.  After clarifying these
points, the planners agreed to divide the work on pragmatic grounds
among four committees:
<list type="bullets">
<item>committee on text documentation</item>
<item>committee on text representation</item>
<item>committee on text analysis and interpretation</item>
<item>committee on metalanguage issues</item>
</list></p></div2>
<div2><head>Syntax of the Scheme</head>
<p>No final decision about the syntactic basis for the new text encoding
scheme was made at the planning conference.  All present agreed that if
possible the syntax should be borrowed from some existing scheme, rather
than created out of whole cloth.  The syntax must be relatively simple,
capable of expressing the fine distinctions and occasionally complex
overlapping hierarchical structures required by textual material, and
must allow for user-defined extensions to the pre-defined set of tags.
</p>
<p> 
The most obvious candidate is the Standard Generalized Markup Language.
SGML forms the basis for the recently developed markup scheme
of the Association of American Publishers Electronic Manuscript Project,
and a survey of encoding problems recently performed at Queen's University
concluded that SGML offered a better basis for research-oriented text
encoding than anything else currently available.
<note place="foot"><p>
<bibl>Cheryl A. Fraser, <title level="a">An Encoding Standard for 
Literary Documents,</title>
M.S. Thesis (Queen's University, Ontario), 1986.</bibl>  The work was
performed under the direction of Prof. David T. Barnard.</p></note>
SGML and markup languages built in conformity with it have already
proven flexible and powerful; while existing SGML applications do not
provide markup tags for many textual features needed for research, most
of what they do provide is also needed for research work.  It would be
pointless to develop again from scratch what has already been developed
during years of study and testing.  Some participants at the planning
conference asked whether SGML, with its reliance on simple hierarchies
of text structures, can handle the multiple conflicting hierarchies of
textual research (e.g. <hi>canto, stanza, line</hi> and <hi>poem,
sentence, word</hi>).  Those versed in SGML believed it would meet the
needs of the case, and it was decided to begin work on the syntax with
SGML as a model, and abandon SGML only if it proved inadequate for the
needs of research.
</p>
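<p>The difficulty can be made concrete with a small invented example:
when one stream of words is divided both into verse lines and into
sentences, a sentence may straddle a line boundary, so neither division
nests cleanly inside the other.

```python
# Invented illustration of two conflicting hierarchies over one word
# stream: verse lines and sentences need not nest inside one another.

line_ends = [3, 7, 10]      # index of the last word in each verse line
sentence_ends = [7, 10]     # index of the last word in each sentence

def segments(ends):
    """Turn a list of end indices into (start, end) pairs of word indices."""
    pairs, start = [], 0
    for e in ends:
        pairs.append((start, e))
        start = e + 1
    return pairs

def straddles(seg, boundaries):
    """True if some boundary falls inside the segment before its final word."""
    start, end = seg
    return any(b in range(start, end) for b in boundaries)

for sentence in segments(sentence_ends):
    print(sentence, straddles(sentence, line_ends))
```

Here the first sentence, words 0 through 7, crosses the line boundary
after word 3; a markup syntax built on a single strict hierarchy must
handle such cases specially.</p>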
<p> 
The closing statement expresses this commitment as part of the general
goal of compatibility with existing standards.</p></div2>
<div2><head>Sponsorship and Participation</head>
<p>During preparations for the Vassar conference, three major international
organizations dedicated to the exploitation of modern technology in
humanistic or textual disciplines tentatively agreed to sponsor the
initiative jointly.  These are:
<list type="bullets"><item>the Association for Computers and the Humanities (ACH)</item>
<item>the Association for Computational Linguistics (ACL)</item>
<item>the Association for Literary and Linguistic Computing (ALLC)</item>
</list>
At the Vassar meeting,
a temporary steering committee was formed, consisting of representatives
of each of these organizations, to oversee the project until a permanent
steering committee could be named by the executive councils of the
sponsoring associations.
</p>
<p> 
Provisional agreements have also been reached for a number of other
organizations to participate in the work of the initiative.  At this
writing (January, 1988), these organizations include:
<list type="bullets">
<item>the American Historical Association (AHA)</item>
<item>the American Philological Association (APA)</item>
<item>the Association for Computing Machinery, Special Interest
Group on Information Retrieval (ACM/SIGIR)</item>
<item>the Association for Documentary Editing (ADE)</item>
<item>the Association of American Publishers (AAP)</item>
<item>the Linguistic Society of America (LSA)</item>
<item>the Modern Language Association of America (MLA)</item>
</list>
These organizations are mostly North American in their base, although
several are international in membership.  Equivalent European societies
will also be invited to participate.
</p>
<p> 
Appended to this proposal (as Appendix D) are draft
memoranda of understanding describing the commitments of the sponsoring
and participating organizations; these will be submitted to the
governing bodies of the organizations, a list of which is
attached in Appendix E.
</p></div2></div1>

<!--* <include file=nh22plan> *-->
<div1><head>Plan of Work</head>
<p>The further work of the project corresponds to phases 2 through 5 of
the original plan:
<list type="bullets"><item>Phase 2:  Detailed Design and Drafting
(lower-level overall design, followed by drafting)</item>
<item>Phase 3:  Revision
(public circulation of drafts, public comment, and revision)</item>
<item>Phase 4:  Review and Approval</item>
<item>Phase 5:  Publication and Maintenance</item>
</list>
The tasks of each phase will be undertaken as appropriate by a
steering committee of representatives from the sponsoring organizations,
an advisory board of representatives from each participating
organization, editors appointed by the steering committee, and
drafting committees appointed by the steering committee upon the
nomination of participating organizations.</p>
<div2><head>Organization</head>
<div3><head>Sponsoring Organizations</head>
<p>The sponsoring organizations (those applying for this grant) will
undertake the project of developing and disseminating the guidelines for
text encoding and text interchange.  The terms of their cooperation in
the project are defined by a <q>Memorandum of Understanding</q>
(reproduced in appendix D), which has been prepared by
the steering committee and submitted by them to the sponsoring
organizations for formal approval.  The sponsoring organizations will
exercise their responsibility for the project through the steering
committee, which is composed of two representatives from each sponsoring
organization.</p></div3>
<div3><head>Participating Organizations</head>
<p>The participating organizations include major organizations for
literary, linguistic, and humanities research, computer science, and
data processing.  They endorse the idea of a common encoding scheme
and interchange format and participate in an advisory capacity as to the
content of the guidelines and their suitability for the needs of the
organizations' members.  The participating organizations will influence
the guidelines both through the work of their members on drafting
committees and subcommittees and through the participation of an
official representative on the project's advisory board.</p></div3>
<div3><head>Steering Committee</head>
<p>The steering committee is appointed by the sponsoring organizations,
each represented by two members.  It will oversee the project on behalf
of the sponsoring groups, appoint and supervise the editors and
committee heads, and appoint voting members of the working committees
(who may be nominated by participating organizations or by the steering
committee itself).
 
The steering committee will meet eight times during the project, in
addition to meeting twice with the advisory board.  The editors will
also attend these meetings, at which issues of design, strategy, and
(where appropriate) funding will be considered.  The steering committee
has thus far met once, at the Istituto di Linguistica Computazionale in
Pisa, in December of 1987.  A second meeting is planned for March 12-13,
1988, in Morristown, New Jersey.
 
The Association for Computers and the Humanities is represented on the
steering committee by Prof. Nancy Ide of Vassar College and Dr. C. M.
Sperberg-McQueen of the University of Illinois at Chicago.
 
The Association for Computational Linguistics is represented by Dr.
Donald Walker and Dr. Robert Amsler, both of Bell Communications
Research, Morristown, New Jersey.
 
The Association for Literary and Linguistic Computing is represented by
Ms. Susan Hockey of Oxford University and Prof. Antonio Zampolli of the
University of Pisa.
 
Abbreviated vitae of the steering committee members are attached to this
document as Appendix F.</p></div3>
<div3><head>Advisory Board</head>
<p>The advisory board will comprise the steering committee, the editors,
and one representative of each participating organization.  It will
consider questions of basic principle and ensure that the interests of
the participating organizations and their members are adequately
addressed.  Individual members of the advisory board will act as
links between their organizations and the working committees and
circulate progress reports and interim drafts within their organizations.
 
Members of the advisory board will be named by the participating
organizations in the manner each chooses.</p></div3>
<div3><head>Editors</head>
<p>The editor in chief, appointed by the steering committee and serving
half time, will coordinate the day-to-day work of the project and act as
its administrative head.  The steering committee expects also to appoint
an associate or consulting editor to share the editorial tasks.
However, such an appointment depends upon the availability of qualified
individuals, which cannot be determined at this time.  The budget
included in this proposal requests funding for the consulting editor at
one quarter time.
 
The editors will:
<list type="ordered">
<item>ensure the proper circulation of documents among those working on
the project, and perform other duties of a scientific secretariat;</item>
<item>coordinate the work of the four working committees and their
subcommittees, serving as liaison among the committees and between the
committees and the steering committee;</item>
<item>draft the documents describing the organization of committee work,
in conjunction with the steering committee;</item>
<item>draft the basic charge or list of responsibilities for each
committee;</item>
<item>receive from the committees the results of their work with all the
relevant information, and ensure their compatibility each with the other;</item>
<item>review and edit the final document, integrating the work of the
various committees into a single coherent whole and rewriting portions
of the document as necessary so as to ensure consistency; subject to the
guidance of the steering committee, the editors are responsible for the
wording of the final documents;</item>
<item>administer the paperwork of the initiative, and
distribute travel funds and subsidies; and</item>
<item>coordinate the effort with members of the advisory committee and the
associations they represent, to ensure widespread participation and
publicity.</item>
</list>
In all of these tasks the editors will work under the supervision of
the steering committee, and with its help.
 
The editor in chief will be Dr. C. M. Sperberg-McQueen of the University
of Illinois at Chicago.  Dr. Sperberg-McQueen was unanimously selected
as editor in chief by the other members of the steering committee.
Centrally involved in the preparation of the initial proposal to NEH to
fund the Vassar conference, he was also the primary author of two
documents prepared and distributed as background for the meeting, one of
which formed the basis of the design for the guidelines ultimately
accepted by the participants.  His leadership skills, exhibited in
preparation for the meeting and in the discussions there, promise well
for his coordination of the project.  Since November, he has served as
secretary of the steering committee.
 
With a doctorate in comparative literature and years of experience in
academic computing, Dr. Sperberg-McQueen is one of a still-small group
of scholars in this country with substantial backgrounds in both a
humanities discipline and computing.  This combination led to his being
invited, in January, 1985, to become Princeton University's first
consultant in humanities computing.  While at Princeton, he worked on
(among other things) the collection of textual data for users, the
analysis of texts in French, German, and English, and the installation
and use of concordance and text-analysis software.  He provided
technical support for an ambitious long-term project to encode the
thousands of historical texts of the Cairo Geniza&mdash;a complex
corpus of medieval and modern texts and fragments which documents
centuries of life in the Hebrew and Islamic communities of Cairo.  For
the Geniza project Dr. Sperberg-McQueen helped develop software
for Hebrew-Arabic-English text editing, and procedures for encoding and
manipulating these complex texts.
 
Dr. Sperberg-McQueen is currently a systems programmer at the University
of Illinois at Chicago, where he supports the library automation system.
He also provides technical support for faculty research projects,
including ones on Greek epigraphy and Roman law.  He is a member of the
Executive Council of the Association for Computers and the Humanities
and serves on the ACH Committee for Text Encoding Practices.
A vita is attached in Appendix F.</p></div3>
<div3><head>Committee Heads</head>
<p>The steering committee will appoint heads for the four working
committees, who will organize the work and meetings of the committees
and help ensure the intellectual soundness of the committees' work.
The committee heads will report regularly to the editors and
steering committee, cooperate with the editors and other committee
heads, and organize ad hoc subcommittees, as necessary, to deal
with topics too specialized to be handled by the committee as a
whole.  They will be responsible for arranging meetings of their
committees, drafting summary documents representing the work of the
committee, and ensuring that the work of the committee meets the
project schedule.
 
The committee heads occupy a key position within this project:  they
oversee the development of the standard within a specified area, they
draft the documents reflecting committee decisions, and they are in
large part responsible for ensuring that the committee's work continues
in a timely fashion.  These activities will demand a substantial
commitment of time to the project.  Therefore, funds permitting,
quarter-time release from their other duties will be obtained for the
committee heads.</p></div3>
<div3 id="drafcom"><head>Drafting Committees</head>
<p>Voting members of the four committees called for by the Vassar planning
conference will be nominated by the participating organizations or the
steering committee and appointed by the steering committee.  The
committees will consider specific problem areas and recommend specific
text encoding practices to solve them.  The deliberations and analyses
of these committees constitute the crucial work of this project, and the
organizational structure is set up to assist in that work as far as
possible.
 
The four committees and their areas of responsibility will be:
<list type="bullets">
<item><p><label>Committee on Text Documentation</label>
</p><p>
This committee will address problems of labeling an encoding so that its
source text and other identifying characteristics are well documented.
The needs of library cataloguing, archive documentation, end users, and
processing programs will all be considered.  The computational
documentation of the file (in the form of declarations, etc.) may also
be considered by this committee, but the major responsibility for the
content of declarations will be borne by the committees on text
representation and analysis and, for the syntax of the declarations, by
the committee on metalanguage issues.
</p></item>
<item><p><label>Committee on Text Representation</label></p><p>
This committee will address the problems of representing in
machine-readable form (1) the physical aspects of a copy source, (2) all
information explicitly present in the copy text on the physical or
graphetic level, and (3) all the textual features (e.g. emphasis, words
in other languages, basic text structure) conventionally represented by
the typography of a printed edition, whether present in the copy text or
added by the encoder or later analysts.
</p><p>
Topics within the purview of this committee thus include the marking
or encoding of:
<list type="bullets">
<item>quotations</item>
<item>mathematical formulas</item>
<item>figures, tables, and illustrations and their captions</item>
<item>hyphenations (including declaration of how hyphenation is treated)</item>
<item>punctuation</item>
<item>diacritics and <q>special</q> character sets</item>
<item>change of language or alphabets</item>
<item>the conventional use, in a given encoding, of characters as
alphabetics, punctuation, diacritics, or separators</item>
<item>topography or layout of the text</item>
<item>recto and verso, color of page, etc.</item>
<item>logical structure of a text (chapters, paragraphs, etc.)</item>
<item>conventional reference numbers for a text</item>
<item>lineation (on page, in column, in logical subdivision, etc.)</item>
<item>editorial additions, deletions, or corrections</item>
<item>editorial apparatus (apparatus criticus)</item>
<item>special problems of numismatic, epigraphic, or paleographic material</item>
<item>special problems posed by the physical realization of a genre
(e.g. comic strips).</item>
</list>
Unlike those of other specific genres, the special problems of spoken
texts and of dictionaries will be taken up not here but in the committee
on text analysis and interpretation.  It should be noted that the
distinction between this committee and the next is not that between
objective and subjective or interpretive information, since the
typographic level of the text addressed by this committee can indicate
specific editorial interpretations of the text by means of font, layout,
or special punctuation.
</p><p>
As noted, the voting members of the working committees will be formally
appointed by the steering committee after the first meeting of the
advisory board.  At the Vassar planning conference, however, the
following participants expressed an interest in working on the problems
assigned to this committee:
<list type="simple">
<item>Lou Burnard, Oxford University</item>
<item>David Chesnutt, University of South Carolina</item>
<item>Yaacov Choueka, Bar-Ilan University</item>
<item>Jacques Dendien, National Institute of the French Language</item>
<item>Paul Fortier, University of Manitoba</item>
<item>Randall Jones, Brigham Young University</item>
<item>Terence Langendoen, Graduate Center, City University of New York</item>
<item>Junichi Nakamura, Kyoto University</item>
<item>Wilhelm Ott, University of T&uuml;bingen</item>
<item>Eugenio Picchi, Institute for Computational Linguistics, Pisa</item>
<item>Jean Schumacher, CETEDOC, Louvain-la-Neuve</item>
<item>Paul Tombeur, CETEDOC, Louvain-la-Neuve</item>
</list>
</p></item>
<item><p><label>Committee on Text Analysis and Interpretation</label></p><p>
This committee will address problems of representing, in
machine-readable form, the results of interpretive and analytic work by
scholars.  A full list of the possible areas of application that fall
under this heading will be developed in the early stages of this
project; a short list of examples includes
<list type="bullets">
<item>phonology</item>
<item>morphology</item>
<item>syntax</item>
<item>stylistics</item>
<item>metrics</item>
<item>information retrieval</item>
<item>thematic study</item>
<item>semantics</item>
<item>content analysis</item>
<item>lexicography.</item>
</list>
For pragmatic reasons, problems peculiar to some specific text types
will also be handled here, wherever texts of that type are typically
encoded by scholars interested in a specific type of analysis.
Most notably, transcripts of oral speech, dictionaries, and glossaries,
which are of paramount interest to computational linguists and
lexicographers, will be treated by this committee rather than the
committee on text representation.
</p><p>
The porous borderline between representation and interpretation of
texts will require careful coordination between this committee and the
preceding one on issues of content.  The presentation of statistical
summaries and certain kinds of textual-critical analysis are two
of the more obvious borderline cases that must receive special
attention.
</p><p>
At the Vassar planning conference, the following participants expressed
an interest in working on the problems assigned to this committee:
<list type="simple">
<item>Lou Burnard, Oxford University</item>
<item>Roy Byrd, IBM Research</item>
<item>Nicoletta Calzolari, University of Pisa</item>
<item>David Chesnutt, University of South Carolina</item>
<item>Yaacov Choueka, Bar-Ilan University</item>
<item>Paul Fortier, University of Manitoba</item>
<item>Robert Kraft, University of Pennsylvania</item>
<item>Stig Johansson, University of Oslo</item>
<item>Ian Lancashire, University of Toronto</item>
<item>Terence Langendoen, Graduate Center, City University of New York</item>
<item>Penny Small, Rutgers University</item>
<item>Paul Tombeur, CETEDOC, Louvain-la-Neuve</item>
</list></p></item>
<item><p><label>Committee for Metalanguage Issues</label></p><p>
This committee is charged with several tasks:
<list type="bullets">
<item>They will examine existing encoding schemes, and in particular SGML
and the Association of American Publishers' standard for electronic
manuscript markup, to determine the degree to which the syntax of the
markup scheme recommended by the guidelines can be compatible with these
schemes, or can be an extension of these schemes.  Compatibility with
the existing international standard is a major desideratum of this
project; clearly, there is no need for our project to duplicate the work
already done on SGML, the design of which has been carefully developed
over the past several years.</item>
<item>They will develop a formal metalanguage for the description of
encoding schemes, and formulate in it adequate descriptions of the major
existing schemes.  The purposes of this metalanguage are (1) to ensure
that the encoding scheme proposed by the guidelines is compatible with
existing schemes, in the sense that anything expressed in an existing
scheme is translatable into the new scheme; and (2) to provide, through
the metalanguage, a formal mechanism to simplify the design of programs
to translate from existing schemes to the new scheme.</item>
<item>Because this group will have special skills in formal language
theory, they will also be active in formulating specifics of syntax for
the new encoding scheme (for example specifying the form to be taken by
the declaration of new tags or multiple character sets) and ensuring
its notational extensibility.  As noted already, SGML and the
Association of American Publishers tag set will be carefully considered
as models for the syntax of the new scheme.</item>
</list>
</p><p>
At the Vassar planning conference, the following participants expressed
an interest in working on the problems assigned to this committee:
<list type="simple">
<item>David Barnard, Queen's University</item>
<item>Lou Burnard, Oxford University</item>
<item>Paul Fortier, University of Manitoba</item>
<item>Eugenio Picchi, Institute for Computational Linguistics, Pisa</item>
<item>Jean Schumacher, CETEDOC, Louvain-la-Neuve</item>
</list></p></item>
</list>
 
The four committees are expected to meet six times each during the
project, with the exception of the committee on text documentation,
which will meet only twice owing to its more narrowly
circumscribed duties.
 
Several face-to-face meetings are imperative in order for the work of
the committees to move forward:  the discussions they make possible
are essential to ensure that the guidelines address the needs of
specific applications adequately and appropriately.  Wide, vigorous
discussion also helps ensure widespread acceptance of the guidelines
after their completion.  While we expect that between meetings much of
the work of the committees will take place by means of electronic mail,
the need for face-to-face exchanges cannot be overemphasized.  The work
of the Vassar meeting could not have been accomplished without the
give-and-take of vigorous discussion, which typically involved the
majority of the participants commenting on and responding to any given
point.  The decisions facing the committees will often be difficult,
indeed possibly controversial; broad-scale discussion is the surest way
to ensure that they accurately reflect the needs of the research
community.
 
Substantial subsidies for the work of these committees are not feasible.
Much of the work will have to be voluntary and unremunerated.  Where
possible, meetings will be held in conjunction with major conferences in
the field, so that travel expenses will be minimized.
 
In order to encourage proper professional credit for the work performed
in the committees, the steering committee will arrange, where possible,
for the publication of working papers from the committees in the
journals of the field.  Thus far the editors of several scholarly
journals associated with the sponsoring societies and their
representatives (<title>Computers and the Humanities</title>,
<title>Literary and Linguistic Computing</title>, <title>Computational
Linguistics</title>, and <title>Linguistica Computazionale</title>) have
expressed a willingness to publish such papers, where appropriate.  For
other papers, the steering committee will issue a series of working
papers, to be published in the manner of technical reports.</p></div3>
<div3><head>Subcommittees</head>
<p>Where the charge to a committee is broad enough to suggest or require
it, the committee head may organize subcommittees to consider special
problems, e.g. those of a specific discipline, or those of texts in a
specific language or script.  The membership of these subcommittees will
be open to all volunteers; there is no expectation that subcommittee
members will be voting members of the parent committee.</p></div3></div2>
<div2><head>Phases of the Work</head>
<div3><head>Lower-level Design</head>
<p>The first task in the actual preparation of the guidelines will be to
extend the principles enunciated at the Vassar conference, and
to prepare a more detailed design for the encoding scheme.  The editors
will prepare, in cooperation with the steering committee, a document
describing the design goals and principles for the scheme; this basic
design document will then be circulated to all participating
organizations and other interested groups or individuals for comment,
after which the steering committee and editors will revise it.
 
While the basic design document is circulating for public comment, the
committee on metalanguage issues will develop a preliminary syntax for
the encoding scheme, in order to define the syntactic framework within
which the content-oriented committees on text documentation, text
representation, and text analysis will work.  As stated above (sec. 2.6),
it is expected that the syntax of the encoding scheme will conform in
general with the international standard SGML, except where deviations
may be necessary to accommodate the particular requirements of
historical, linguistic, and literary textual analysis.</p></div3>
<div3><head>First Meeting of the Advisory Board</head>
<p>When the preliminary syntax has been completed, and the basic design
document has been revised, the steering committee will convene an
inaugural meeting of the Advisory Board.  The basic design principles
and syntax will be presented, discussed, and modified as necessary prior
to their approval by the advisory board.
 
The advisory board will also discuss the organization of the four
drafting committees, so that the steering committee, in a meeting
immediately following the advisory board's, can proceed with
the constitution of the drafting committees.</p></div3>
<div3><head>Drafting</head>
<p>Once the basic design goals and syntactic framework have been set,
the various drafting committees will begin their work.  The editors
and steering committee will by this time have prepared a formal
charge or brief for each committee, delineating its field of
responsibility and recommending attention to specific problem points.
The committee heads will begin the committee work by drafting an
analysis of the problem area, to serve as a starting point for
discussion and further work.  The detailed organization of the
drafting process is best left to the committee heads, but it is expected
that individual subcommittees will draft working papers on areas of
specific concern, each working paper analyzing the textual features
relevant to the area and proposing a set of SGML tags for encoding
them.  If such working papers are to succeed, a great deal of earlier
work must be reviewed and synthesized within a general theoretical
analysis.
 
Most of the committee work can reasonably be done by mail, whether
conventional or electronic, but several committee meetings will be held,
when possible in conjunction with major academic conferences.  (As noted
above in section 3.1.7, the committees are expected to meet six times
each, except for that on text documentation, which will meet twice.)
The editors will attend whenever possible, in the interests of
coordinating the work of the various groups; since the work of many of
the areas falling under different committees will obviously overlap,
such coordination efforts are essential.  Full reimbursement of
participants' travel costs to committee meetings would be prohibitively
expensive, but this proposal does budget funds for travel subsidies for
members of working committees who would not otherwise be able to attend
the meetings.</p>
<div4><head>Electronic Conferencing</head>
<p>In addition to conventional and electronic mail, electronic conferencing
will also be used to speed the drafting process.  The University of
Illinois at Chicago has agreed to sponsor one or more electronic
conferences for the discussion of issues raised in the text encoding
initiative.  These conferences will be based on the academic network
Bitnet and will be open to all interested parties.  Experience in this
project and others shows that such electronic discussions can materially
aid in the preparation of complex documents.</p></div4></div3>
<div3><head>Public Review and Comment</head>
<p>This phase overlaps with the preceding.  As the working committees
prepare their recommendations, interim drafts will be circulated for
comment among the committee heads and the advisory board.  The advisory
board will be responsible for passing the drafts on to individuals or
committees in the participating organizations who will be competent to
comment on the substance of the draft.  Drafts will also be posted on
the electronic conferencing systems already mentioned, and comments
will be solicited from participants in the conference.
 
When a more nearly final form of the guidelines is ready, it too will
be circulated to the advisory board and participating organizations and
will be posted electronically.  Additionally, the existence of a
complete draft of the guidelines will be publicized by the sponsoring
and participating organizations, and copies sent to all inquirers for
comment.</p></div3>
<div3><head>Revision</head>
<p>The comments on the complete draft of the guidelines will be considered
by the steering committee, editors, and working committees, and the
guidelines will be revised accordingly.</p></div3>
<div3><head>Approval</head>
<p>The revised draft will then be submitted to the advisory board for
discussion, final amendment, and approval.  This final discussion will
take place at a second meeting of the advisory board.</p></div3>
<div3><head>Publication</head>
<p>Following the approval of the guidelines by the advisory board, the
sponsoring organizations will arrange for the publication of the
guidelines, either in journals or in book form.  Funds for this
phase are not sought at the present time.</p></div3>
<div3><head>Maintenance</head>
<p>The sponsoring organizations have agreed to maintain a mechanism for
revising and extending the guidelines after their first publication,
based on experience with the guidelines in practice.  This joint
maintenance mechanism will be responsible for issuing supplemental
interpretations of the guidelines and extensions of the guidelines
to further problem areas.  Revisions of the guidelines as a whole
may also be undertaken as appropriate.
 
It may prove appropriate, after approval and publication of the
guidelines, for the sponsoring organizations to seek adoption of the
guidelines as national or international standards.  On the other hand,
since the guidelines are expected to take the form of a tag set defined
in accordance with the Standard Generalized Markup Language (already an
international standard) and may possibly function as a compatible
extension of the Association of American Publishers tag set,
which is now being proposed as an American National Standard, such
adoption may prove neither appropriate nor necessary.  In any case, a
formal description of the guidelines will be prepared, structured in the
fashion conventional for standards documents.</p></div3></div2>
<div2><head>Timetable</head>
<p>
<table>
<row><cell>June, 1988</cell>
<cell>NEH Grant begins.  Informal discussions at ALLC meeting
in Jerusalem.</cell></row>
<row><cell>June - August, 1988</cell>
<cell>Editors draft detailed design goals, overview
of committee responsibilities, notes on committee procedures, and
charges for the individual committees.  Editors collect documentation
for existing encoding schemes.</cell></row>
<row><cell>August, 1988</cell>
<cell>Steering committee meets, reviews the design goals and
committee organization documents, considers personnel issues, constitutes
committee for preliminary syntax.</cell></row>
<row><cell>August - December, 1988</cell>
<cell>Syntax committee (nucleus of later
metalanguage committee, possibly with additions) develops preliminary
SGML-based syntax for encoding scheme.</cell></row>
<row><cell>December, 1988 - January, 1989</cell>
<cell>Advisory Board meets to approve
detailed design goals and preliminary syntax and discuss committee
organization.  (Meeting in North America.)
Steering committee appoints heads of documentation, representation,
and analysis committees, and makes first appointments of committee
members.
Target date for completion of first collection of documentation for
existing schemes; copies of documentation obtained thus far distributed
to committee heads.</cell></row>
<row><cell>January - March, 1989</cell>
<cell>Committee heads write initial analyses of
their problem areas ('startup papers'), to serve as starting points for
the work of their committees.</cell></row>
<row><cell>March, 1989</cell>
<cell>Steering committee meets to
review progress and plan work ahead.  (Meeting in Europe.)</cell></row>
<row><cell>March - October, 1989</cell>
<cell>Working committees and subcommittees meet
to analyse their problem areas, document and discuss existing practice,
and consider possible recommendations.</cell></row>
<row><cell>June, 1989</cell>
<cell>Steering committee meets with committee heads to review
progress; working committees meet for working sessions at ICCH, Toronto.</cell></row>
<row><cell>October, 1989</cell>
<cell>Steering committee meets to review progress
(meeting in Europe).  Committees report in writing.  Target date for
completion of committees' preliminary analysis of their problem areas.</cell></row>
<row><cell>October, 1989 - February, 1990</cell>
<cell>Working committees meet to develop
draft recommendations from their preliminary analyses.</cell></row>
<row><cell>February, 1990</cell>
<cell>Steering committee and committee heads meet to
review progress and resolve problems (meeting in North America).
First draft of each committee's recommendations expected for this time.</cell></row>
<row><cell>February - March, 1990</cell>
<cell>Editors prepare combined first draft of
guidelines, which circulates to advisory board and other interested
parties for examination and trials on texts.</cell></row>
<row><cell>March - December, 1990</cell>
<cell>Public considers first draft of guidelines
and makes suggestions.  Committees and subcommittees test their draft
recommendations on sample corpora, exchange their examples, revise their
recommendations based on their experience and comments from public.</cell></row>
<row><cell>June, 1990</cell>
<cell>Steering committee meets.  (Meeting in North America.)</cell></row>
<row><cell>October, 1990</cell>
<cell>Steering committee meets (with committee heads) to
review revisions and resolve problems.</cell></row>
<row><cell>January, 1991</cell>
<cell>Revised recommendations sent from committees to
editor.  Editors begin preparation of final document.</cell></row>
<row><cell>March, 1991</cell>
<cell>Steering committee meets to consider editors' draft
of final document, suggest revisions.</cell></row>
<row><cell>April, 1991</cell>
<cell>Editors revise draft in accordance with steering
committee recommendations.  Fair copies are distributed to advisory
board.</cell></row>
<row><cell>May - June, 1991</cell>
<cell>Advisory board meets to approve final version.
Steering committee makes arrangements for publication.  Sponsoring
organizations institute a joint mechanism for maintenance and revision
of the guidelines.
</cell></row>
</table></p></div2>
<div2><head>Concrete Results</head>
<p>Ultimately, this project will produce a single, potentially large
document, which will:
<list type="bullets"><item>define a format for encoded texts, into which texts prepared
using other schemes can be translated,</item>
<item>define a formal metalanguage for the description of encoding schemes,</item>
<item>describe existing schemes (and the new scheme) formally in that
metalanguage and informally in prose,</item>
<item>recommend the encoding of certain textual features as minimal
practice in the encoding of new texts,</item>
<item>provide specific methods for encoding specific textual features
known empirically to be commonly used, and</item>
<item>provide methods for users to encode features not already provided
for, and for the formal definition of these extensions.</item>
</list>
 
In the process of preparing this document, a number of working documents
(mentioned above in the Timetable) will be prepared, to document the
passage from one phase of the project to another.  Some of these will be
of transient interest, important primarily as preliminary records of
decisions about the final guidelines.  Others will be works of analysis
expected to have permanent utility as expositions of the reasoning
behind certain portions of the guidelines.  These latter will be
published either in the journals of the field or in the working papers
of the project.  In addition, it is expected that software of various
types will be developed in connection with this project:  conversion
programs to translate files from other encoding schemes into the new
format, extensions to various common formatting programs to allow them
to process texts encoded in the new tag set, and possibly various
types of analytic software.  These will not be deliverables of this
project in any strict sense, but it is reasonable to suppose that
they will be produced only if this project proceeds.

<!--* <include file=nh22tech> *--></p></div2></div1>
<div1><head>Notes on the Use of Automation Technology</head>
<div2><head>Rationale</head>
<p>Computing machinery will be used in the conduct of this grant because
<list type="bullets">
<item>electronic mail speeds correspondence and the exchange of drafts
among participants,</item>
<item>word processing systems make the production of large documents by
groups of authors faster and more reliable,</item>
<item>electronic conferencing provides one of the best methods of
publicizing this initiative among the interested community, and</item>
<item>some concrete implementation of the encoding scheme is required
in order to test the recommendations of the working committees.</item>
</list>
 
Computers are, in addition, part of the normal working environment of
the project participants and are used in their daily
work.  The computational load
on individual members of the working committees is unpredictable,
and costs are expected to be borne by their host institutions.
Funds are included for the editorial site in Chicago, which will
handle most of the documents involved in the project, many of them
in multiple versions; these funds will be contributed by the
University of Illinois at Chicago.</p></div2>
<div2><head>Hardware</head>
<p>The editor in chief will perform his work on the academic computing
facilities of the University of Illinois at Chicago; at the time of
writing these are an IBM 3081 model K running VM/SP,
and an IBM 3090 model 120 E running MVS/SP.  In addition, the
editor in chief will be provided by UIC with a personal computer,
so that he can develop techniques for adapting microcomputer
word processing software to the encoding scheme, and vice versa.</p></div2>
<div2><head>Software</head>
<p>Text processing software available at UIC and likely to be used for
this project includes the VM System Product Editor (Xedit),
Waterloo Script, Waterloo GML, TeX, and the Oxford Concordance
Program.
 
Although it is anticipated that software may be developed to process
data encoded in the format developed by this project, such software
development will take place in other projects which will have their
own funding requirements.  No formal software development is
anticipated as part of the central activities under this grant.</p></div2>
<div2><head>Costs</head>
<p>Costs for the Chicago editorial site have been estimated from past
billings on the machinery in question for work of about the same
intensity in terms of machine time.  The machine costs at UIC compare
favorably with those at comparable centers at institutions around the
country.

<!--* <include file=nh22fund> *--></p></div2></div1>
<div1><head>Funding</head>
<p>The first phase of this project (Planning and High-Level Design)
was funded by the National Endowment for the Humanities under grant
RT-20880-87, with support also from Vassar College.
 
The current proposal requests partial funding for the major portion of
the project (drafting, revising, and approving the guidelines),
principally for the participation of U.S. citizens.  Funds for European
participation in the meetings of advisory board and working committees
will be sought from European sources.  Because the European involvement
in this project dates only from last December, there is thus far little
concrete progress to report in our search for European
funds.  The British Library is funding a one-year position at the Oxford
University Computing Service to study problems associated with the
encoding and use of texts in the Oxford Text Archive; we hope to draw
upon the results of that study for this project.  Some European research
centers (e.g. the Istituto di Linguistica Computazionale in Pisa) have
expressed a willingness to cooperate in the effort to develop the
guidelines and take part in the necessary fundamental research in the
encoding-related problems of textual analysis.  Some of these centers
expect to be in a position to fund workshops or specific research
projects relevant to the project, within the framework of their
institutional activities and their specified research goals for the
coming years.
Preliminary discussions are now underway with the most likely sources
of broader European funding.
One member of the Standing Committee on Humanities of the European
Science Foundation is also preparing, in due form, a proposal that
the ESF support some of the research needed for this project.

<!--* <include file=nh22bdgt> *--></p></div1>
<div1><head>Budget</head>
<div2><head>Justification</head>
<p>This section follows the budget summary form item by item, indicating
the rationale for each expenditure and the method used to estimate
each cost.</p>
<div3><head>Salaries and Wages</head>
<p>The editors will coordinate the project, attend meetings of the working
groups and of the steering committee, publicize the effort, assist in
the work of the committees, and edit the text of the guidelines
themselves.  The editor in chief will devote 50% of his time to the
project, and have ten hours a week of student clerical help for
administrative and mechanical tasks.  The consulting or associate editor
will work quarter time on the project, attending committee meetings and
helping the editor ensure the intellectual cohesion of the encoding
scheme as a whole.
 
The editor in chief's salary is projected from the current level, with
raises of five percent annually.  The student clerical help is budgeted
for ten hours per week, fifty weeks annually, at $5.00 per hour the
first year, with five percent hourly raises the second and third years.
The hourly rate is set slightly higher than average on the UIC campus in
order to attract better qualified help; the five percent raises are
desired in order to make it easier to retain the same student over the
grant period, if possible.
 
The heads of the working committees on text representation, text
analysis, and metalanguage are asked to make a substantial time
commitment.  They must analyze the problems of their area, organize
subcommittees to report on specialized problems, run the meetings of
their own committees, and take general responsibility for seeing that
their committees perform their analysis work capably and produce useful
recommendations.  The active participation of many scholars in the
committee work will be essential to ensure the lively discussion of
basic principles and widespread support for the results without which
this project must fail.  It is equally essential to secure the vigorous
participation of competent committee heads to channel the discussion and
direct it to useful results.
 
In order to enable the committee heads to devote serious effort to their
responsibilities, it is important to give them time free of their normal
burdens.  Accordingly, we budget for 25% release time for the three
committee heads named, over the central two years of the project.  (The
committee on text documentation has a much smaller task, and its head is
expected to need no release time; if necessary, the editor in chief will
head this committee.)
 
As noted above, the consulting or associate editor and the heads of the
working committees have not yet been named.  Their salaries are
estimated at $37,500 on the assumption that they will probably be, on
average, slightly senior to the editor in chief.  It is anticipated that
the consulting editor and two of the committee heads will be U.S.
citizens; one committee head is expected to be a European funded from
European sources.  (An amended budget will be filed if these
expectations prove incorrect.)</p></div3>
<div3><head>Fringe Benefits</head>
<p>The fringe benefit rate for the editor in chief is 13.126%; that for the
student clerical help is 0.26%.  These are the standard rates at UIC.
 
Fringe benefits for the consulting editor and committee heads are
calculated at 25%, which appears to be the more usual rate.
<note place="foot"><p>Our information is limited, but seems to have a clear pattern.
The rate at Oxford is 25%; at Vassar, 24.5%; at the University
of California at Berkeley, 25% or 30%; at Princeton, 30%.</p></note></p></div3>
<div3><head>Travel</head>
<p>The size of the travel budget (it is the largest single item in the
budget) reflects our conviction that face-to-face discussion of the
issues is necessary if the working committees are to create a consensus
on the difficult technical issues they are expected to address.  The
editors must travel widely and regularly, both to encourage broad
participation by discussing the project at conferences and to
participate in the deliberations of the working committees.  The
steering committee must meet regularly to ensure that the editor and the
working committees are making adequate progress in their work, and to
hear reports from the committee heads and discuss issues with the
committee heads as a group.
 
In order to minimize travel costs overall, meetings will be held where
possible in conjunction with conferences in the field of literary and
linguistic computing; other meetings are expected to take place at or
near important centers for literary and linguistic computing, so that
consultations with scholars at those centers can be combined with the
work of the meeting.
 
Travel costs are listed separately for the steering committee, the
committee heads (for travel to steering committee meetings), the
advisory board, the editors, and the working committees.  Each group's
travel is summarized on a single line, since the individual points of
departure will be known only after the participating organizations have
nominated, and the steering committee has constituted, the working
committees.  Specific destinations will be known as the schedule of
conferences in the field becomes clearer; in what follows, it is assumed
that the committee meetings, like the committee memberships, will be
divided between North America and Europe, with slightly more meetings in
North America, and slightly more participants from North America.
 
Fares and subsistence costs are estimated as follows:
<list type="bullets">
<item>Air travel within North America is given an average cost of $500
round trip.  We are advised that the average fare in the U.S. is more
nearly $750,
<note place="foot"><p>By the Cliff Johnson Travel Agency in Oak Park, Illinois.</p></note>
but since most of these trips will be planned well in advance, the
average fare should be somewhat lower.  At most meetings one or more
participants will be in their home city, which will further reduce the
average travel cost.</item>
<item>For steering committee meetings, air fare within the U.S. is
estimated at $300, since the North American members of the steering
committee are concentrated in the East and Midwest.</item>
<item>For travel within Europe, air travel is given an average fare of
$400.</item>
<item>For travel from North America to Europe, or vice versa, air travel
is given the average cost of $1100.
<note place="foot"><p>Cliff Johnson Travel estimates the average fare at $1500, but as
with the North American travel we assume that special fares will be
available often enough to depress the average.</p></note></item>
<item>Per diem expenses for U.S. travel are estimated at $102 ($80 for
room, $22 for meals); for European travel they are estimated at $100
(room and meals combined), in accordance with the voucher approval
policies of the state of Illinois.</item>
<item>Meetings are expected to last two days; participants crossing the
Atlantic to attend the meeting are estimated to spend one and a half
days each way in transit; other participants are estimated to spend half
a day each way in transit.
<note place="foot"><p>These figures are based on observation of travel patterns at
the Vassar conference and the later meeting of the steering
committee in Pisa.</p></note></item>
</list></p>
<div4><head>Steering Committee</head>
<p>The steering committee will meet as a group eight times over the course
of the project, as well as twice with the advisory board.  Four of the
meetings will be in North America, four in Europe.  They are allocated
to the three years of the grant as follows:
<list type="bullets">
<item>1988-89:  one meeting in the U.S., two in Europe.</item>
<item>1989-90:  two meetings in the U.S., one in Europe.</item>
<item>1990-91:  one meeting in the U.S., one in Europe.</item>
</list></p></div4>
<div4><head>Committee Heads</head>
<p>In order to ensure the smooth and regular work of the project, the
committee heads will report regularly to the steering committee on
the progress of their committee and the problems they are encountering.
Three meetings of the steering committee will be devoted to discussions
with the committee heads; these are allocated one to each year of the
project:
<list type="bullets">
<item>1988-89:  one meeting in the U.S.</item>
<item>1989-90:  one meeting in the U.S.</item>
<item>1990-91:  one meeting in Europe.</item>
</list></p></div4>
<div4><head>Advisory Board</head>
<p>The advisory board will meet twice, both times in North America.
Based on the list of organizations being invited to participate, nine
Americans and five Europeans are expected to constitute the group,
to which are added the steering committee, making thirteen Americans
and seven Europeans.  The first meeting will ratify the design
principles agreed upon at the planning conference at Vassar;
it will take place in the first year.  The second meeting, in the
third year of the grant, will approve the guidelines themselves.</p></div4>
<div4><head>Editors</head>
<p>The editors will attend as many meetings of the working committees
as possible; in addition, they will attend the meetings of the steering
committee, and travel to text archives and other centers of textual
computing in order to confer with practitioners on details of existing
encoding schemes and suggestions for the new scheme.  Many of these
trips will take the editors to conferences in the field, where they
will render accounts of the progress and principles of the project to
interested scholars.
 
On average, the editors are expected to make about one trip per month,
divided evenly between North America and Europe, with an average
length of five days.</p></div4>
<div4><head>Working Committees</head>
<p>While it may be impossible to pay in full for every trip by every
member of every working committee or subcommittee to attend every
meeting of the group, nevertheless it is essential to provide as full
a subsidy as possible.  If the guidelines are to express the consensus
of the textual computing community, they must be discussed very fully
while they are drafted.  While we expect that many scholars will be
willing to contribute some of their own travel costs in order to help
make sure the guidelines reflect the technical needs of their
discipline, it is important to show, by financial support, how
central the working committees are to the success of the project
as a whole.
 
The working committees are scheduled to complete their work in about
two years; over this period the committee on text documentation is
expected to have two meetings, and the committees on representation,
analysis, and metalanguage six meetings each.  (Adjustments will be
made as necessary.)  Of these twenty committee meetings, twelve are
expected to be in North America, eight in Europe.  For purposes of
budgeting, the meetings are estimated as having ten participants each,
six North Americans and four Europeans.  In the budget, the meetings
are allocated as follows to the three years of the grant:
<list type="bullets">
<item>1988-89:  two meetings in the U.S., two in Europe.</item>
<item>1989-90:  six meetings in the U.S., four in Europe.</item>
<item>1990-91:  four meetings in the U.S., two in Europe.</item>
</list></p></div4></div3>
<div3><head>Supplies and Materials</head>
<p>Estimates for the supplies and materials needed for the project are
based on the costs incurred by the editor in chief over the past year
working at the same site.
 
The monthly usage of mainframe computer resources, expressed in
monetary terms, is estimated at about $800.  (Over the past year the editor in
chief has used about $1600 of resources per month.)  Office supplies
are estimated based on the per capita expenditure of the host computer
center.  The cost of a personal computer, to be provided by the
host institution, is that of a typical model now available (IBM PS/2
Model 50) at the institutional price.  The personal computer is
required in order to run common microcomputer software and test its
compatibility with the new encoding scheme, and to examine products
designed for processing documents written in an SGML-compatible
markup language.  (Commercial rental of a PC in Chicago runs
about $250 per month for IBM PC/XT-compatible machines and about $400 per
month for PS/2 machines.  The PC is therefore budgeted as an
outright purchase.  At the conclusion of the grant, ownership of the PC
will be retained by the host institution.)
 
In order to provide a terminal connection for the student assistant,
a modem for the campus network must be installed; for this the
standard on-campus charge is that shown ($800).  (No terminal rental
or purchase is shown, because it is anticipated the student will use
the microcomputer as a terminal.)</p></div3>
<div3><head>Services</head>
<p>Monthly telephone and postal costs are hard to estimate with certainty
but are expected to be rather high, given the need to keep in touch with
as broad a spectrum of interested people as possible, not all of whom
are accessible through electronic mail.  The UIC computer center has
agreed to pay telephone and postage charges as shown ($300 per month
and $100 per month).
 
The standard charge for printing at the host site is four cents per
page (only laser printing is available).  The estimate of 7500 pages
per month is based on the experience of the editor in chief over the
past year, and includes the printing of small runs of the working
papers to be issued by the project as technical reports.</p></div3>
<div3><head>Subcontracts and Indirect Costs</head>
<p>The three applicant organizations will subcontract the administration
of the grant to the University of Illinois at Chicago (UIC) and to other
colleges and universities as appropriate.  Release time for salaries
for the editorial staff and committee heads will be subcontracted to
the appropriate institutions; administration of travel funds and
the central editorial site will be handled by UIC.
 
In addition to the main project budget in this section, a budget for
the subcontract with UIC is attached as Appendix G.  As the committee
heads and consulting editor remain to be named, details of the
subcontracts for their salaries remain to be worked out with their
institutions and no separate budget is included at this time.
 
The <q>Indirect Costs</q> section of the budget shows a rate of
38% on the funds allocated to the subcontract with UIC.  This rate
reflects the Indirect Cost Agreement of April 3, 1987, between
UIC and the Office of Naval Research.
However,
the University of Illinois at Chicago has generously agreed to waive a
portion of its federally negotiated indirect cost recovery rate on
travel funds provided by NEH to this project.  On these funds, a rate
of 10% will be charged to the grant, and the other 28% of the indirect
costs will be contributed by the university.
This represents a waiver
of approximately 60% of the total indirect costs normally recoverable
by the university.
The cost sharing
column also shows UIC's contribution of the indirect costs on services
provided by UIC without charge to the grant.
 
For funds to be subcontracted to other institutions (specifically the
salaries and fringe benefits of the committee heads and the consulting
editor), an indirect cost rate of 45% is shown.  This figure
is obtained by averaging the overhead charges of a number of
institutions
<note place="foot"><p>Specifically:
Oxford University (44%),
Princeton University (67%),
the University of California at Berkeley (47%),
the University of Illinois at Chicago (38%),
the University of North Carolina at Chapel Hill (43.5%),
and
the University of South Carolina (50%).
Vassar College charges 72% but excludes fringe benefits.  On a faculty
salary, this is equivalent to an overhead charge of 58% figured on
salary plus fringe benefits.
The arithmetic average of these figures is 49.6%, but private
institutions with high overhead rates are perhaps slightly
overrepresented in the sample.
</p></note>
and rounding down.  Half of this indirect cost figure is shown in
the cost sharing column, as the other subcontractors will be expected
to make financial contributions roughly proportionate to that made
by UIC.</p></div3></div2>
<div2><head>Budget Summary Sheets</head>
<p>[Note:  Detailed budget information has been omitted from this copy of
the NEH proposal.]</p></div2></div1>

</body>
<back>
<!--* 
.*  GML LS reset 0  - next two lines killed, unneeded.  MSM, 3 May 88
.* sr @GML@LS = 0
.* ls 0
.* The next two appendices get bf appx after their headings, pf at end
*-->
<!--* 
<include file=nh22neh1>
.* NB After chapter heading, this section pushes font set APPX onto
.* font stack, and pops it at the end.  YOU MUST DEFINE FONTSET APPX.
*-->
<div1 id="neh1"><head>Funding Proposal for Phase 1 (Planning Conference)</head>
<head>Extracts</head>
<?WS .bf appx?>
<div2><head>Introduction:  the Need for Text Encoding Standards</head>
<p>The ability of computers to perform mechanical tasks
reliably and quickly can be as useful in the manipulation of texts
as in the manipulation of mathematical quantities.
But before information can be manipulated by a computer,
it must be <emph>represented</emph> in the computer.
Whatever the nature of the information to be processed,
the machine works by
the fast manipulation of symbols which represent that information.
And the quality of the result depends largely upon the
quality of the machine's symbolic representation of the information
to be worked with.</p>
<div3><head>What Text 'Encoding' Is</head>
<p>Scholars working with textual material,
like those working on mathematical, physical, or chemical problems,
must thus find appropriate ways of <emph>representing</emph> or
<emph>encoding</emph> their data (in this case, their texts) in forms
suitable for mechanical manipulation.  Typically, a scheme for
encoding texts must include:
<list type="ordered">
<item>Methods for recording the individual characters of a text.
For European languages written in the Latin alphabet,
the encoding of most characters is given by standard practices of
the data-processing industry, but special encoding conventions may
be needed for:
<list type="ordered">
<item>diacritics, such as those needed for French, German, Spanish,
and many other languages;</item>
<item>the special consonants and vowels needed for other languages,
such as the Scandinavian
languages, Hungarian, Polish, or Turkish, or the medieval forms of
other languages (e.g. Old and Middle English);</item>
<item>punctuation marks conventionally used in some languages, but not
present in modern data-processing standards:  Greek colon, the
various manuscript symbols used for Latin <mentioned>et</mentioned>, etc.;</item>
<item>special symbols like the signs of the zodiac, chemical and
astrological symbols,
or simple line drawings inserted into the text by the author;</item>
<item>characters distinct in Western printed books, but not distinct
in modern data-processing standards (opening and closing quotations,
the various forms of the dash and hyphen, ligatures, etc.), where
the distinctions are important for the research being undertaken.</item>
</list>
Some character representation must also be found for texts in other
alphabets, for which there may be no industry standards:
a character code may be devised from scratch,
or adopted from existing sources, or the text may be represented in
transliteration.
Special problems arise when texts include mixtures of languages and
alphabets, e.g. texts in Hebrew and Greek with footnotes in Latin,
or text in Armenian with Russian footnotes and a Chinese glossary.</item>
<item>Conventions for reducing texts to a single linear sequence
wherever footnotes, text-critical apparatus, parallel columns of
text (as in polyglot texts), or other complications make the linear
sequence problematic.</item>
<item>Methods for recording the logical and physical divisions of the
source text:  e.g. book, chapter, paragraph, sentence; play, act,
scene, line; or volume, page, printed line, etc., so that passages
found by a search can be located in a printed copy of the text.
For analytical bibliography, exact details of type usage in early
printed books must be encoded,
including details of ligatures used, justification of lines,
font use, inverted letters, printers' ornaments, and devices.
For numismatic
or epigraphic collections, information of this type might become
equally complex.</item>
<item>Methods for recording linguistic and literary information
important for scholarship, whether explicitly marked in the text
or extra-textual:
author, date, genre,
sentence boundaries, syntax,
dialect, direct and indirect speech, use of archaisms,
scansion of verse, stylistic features, allusions and quotations, etc.
Here the variety of elements to be
represented is as great as the variety of approaches among those who
use machine-readable texts.</item>
<item>Conventions for delimiting comments and other material in the
machine-readable file which is not strictly part of the text being
encoded.</item>
</list>
The limitations of early keypunches and printers led early on to
elaborate encoding schemes for character data:  asterisks to signal
uppercase letters, numeric renditions of
accents, transliteration conventions for Greek and Cyrillic, and so
forth.  More recently, the production of large text corpora in
a single encoding scheme
<note place="foot"><p>Among pioneering text corpora one may mention
the Thesaurus Linguae Graecae,
the Brown Corpus of modern American English and its various
supplements, and
the Treasury of the French Language
corpus of modern French materials.</p></note>
and the development of standard concordance packages
intended to handle wide varieties of text types
<note place="foot"><p>Early mainframe concordance programs to achieve wide distribution
include the batch-oriented
Oxford Concordance Program and WatCon (the Waterloo Concordance
Program), and more recently the interactive program
ARRAS (the ARchival Retrieval and Analysis System).
More recently still, microcomputer versions of these programs have
been developed, as have
other microcomputer-based concordance programs too numerous to name.
</p></note>
have led to greater concentration on conventions for
representing the logical and
physical divisions of a text, as well as to a greater awareness of
the need for a standard set of practices in
encoding texts for machine-aided analysis.</p></div3>
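<p>The categories enumerated above can be made concrete with a small,
purely hypothetical fragment of an encoded play; the tag and entity
names below are invented for illustration and belong to no existing
scheme.</p>

```
&lt;act n="1">&lt;scene n="2">
&lt;line n="1">Der K&ouml;nig tritt auf&lt;/line>
&lt;line n="2">und spricht.&lt;/line>
&lt;/scene>&lt;/act>
```

<p>Here the tags record the logical divisions of the text (act, scene,
line), while the entity for o with umlaut stands in for a character
absent from standard character sets; a similar entity might equally
encode, say, a manuscript abbreviation for Latin
<mentioned>et</mentioned>.</p>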
<div3><head>The Need for Standard Encoding Practices</head>
<p>For many years, scholars interested in the computer-aided analysis
of texts were a small minority both within computing and within
the humanities, and most projects began and ended in solitude.  Through the
1960s and '70s many researchers developed their own software and,
even more often, their own (incompatible) systems of encoding, driven in
part by the needs of their software and in part by the peculiarities
of the individual texts they worked with.</p>
<p>Gradually, the community of potential users grew.
Communication among those interested in the field improved.
The potential uses of machine-readable texts were seen
to extend beyond the production of printed concordances and to
include literary and linguistic studies performed readily by
machine but not readily or not at all by hand.
As a result, interest grew in preserving machine-readable
texts for later re-use, and the exchange of texts grew more frequent.
This exchange, increasingly, is
institutionalized in the form of large text archives, which hold and
distribute copies of texts for computer-aided analysis, whether
prepared elsewhere or at the archive itself.
<note place="foot"><p>The Oxford Text Archive, one of the first and largest of these
archives, was founded in 1976.  Other archives have appeared in many
European countries, sometimes with specific linguistic or
disciplinary limitations, sometimes without.
While there is still no major general-purpose
archive in the U.S., the American Philological Association does
maintain a Repository of Machine-Readable Classical Texts, and a
number of universities have begun to prepare and collect
machine-readable texts.</p></note></p>
<p>
Text exchange is still hampered, however, by
the many and inconsistent schemes used to
encode the texts in the archives.
Many
machine-readable texts, originally prepared for personal or local use,
lack any formal documentation of their status or organization:
what the codes mean,
who encoded the text,
what edition was used as the source,
etc.
Existing text archives do valiant work
trying to document what they have and
improve what they acquire,
but the rise of computer use among humanists
promises a flood of new machine-readable texts
in the near future
which will overwhelm our current ability to
document and assimilate variations in encoding
practices.
Humanists' use of computers, especially
microcomputers, has increased manyfold over the last
two or three years, and one of the first thoughts of most humanists,
when beginning to move from word processing to computer-aided
research, is to begin typing into the machine the text or texts
they are working on at the moment.</p>
<p>Similar problems in the social sciences have been
successfully addressed by formulating standard practices
for data representation and documentation.
These standards are supported and adhered to by most
major social-science data archives and statistical analysis
programs, so that social scientists who use computers in their work
face a less chaotic world than do their humanist counterparts.</p>
<p>A single lucid, compendious set of
recommendations for encoding textual data
in machine-readable form is essential,
if the coming floods of machine-readable texts are to
be usable by the entire scholarly community.
The experience of the past four decades
has laid an adequate groundwork for standardization:
we know now many of the pitfalls to be avoided;
the rise, in this decade, of new legions of
computer users in the textual disciplines
requires us to formalize that experience
and point out those pitfalls now,
so that those who create new texts&mdash;individual scholars
working alone, and large archives engaged in mammoth
projects, alike&mdash;can turn their minds to
the problems that have not yet been solved,
instead of (as the saying goes) re-inventing the wheel and running,
as always, the risk of putting the axle in the wrong place.</p></div3>
<div3><head>Advantages of a Standard Practice for Text Encoding</head>
<p>It has become clear that in the textual disciplines, just as in
the social sciences, secondary analysis will soon become&mdash;if
it is not already&mdash;far more common than primary analysis.
<note place="foot"><p>Secondary analysis is analysis by scholars or parties other than
those who first entered the material into machine readable form,
and for purposes other than those first envisaged.
Primary analysis is analysis by the original owner of the data,
for the originally conceived purposes.
In textual studies, secondary analysts (e.g. lexicographers)
often need to aggregate a variety of independently encoded shorter
texts and collections into a larger corpus for linguistic research;
this is feasible only if the editorial and encoding practices of
the smaller texts are the same, or can be harmonized.
</p></note>
Clear and complete documentation of the encoding scheme, ease of
data exchange, and accessibility to more than one type of analysis
package, will all become correspondingly important.</p>
<p>Guidelines for encoding textual data
can be expected to benefit a variety of interested parties:
<list type="ordered">
<item>Scholars encoding texts for the first time will be able to find
guidance in choosing what information to encode, and what to omit.
Without guidelines, those who encode texts
must invent their own scheme,
and those doing so for the first time may often overlook
obvious requirements for useful encoding.
A set of guidelines which would remind them of less obvious problems
would thus help ensure a higher level of quality and usefulness
in encoded texts.</item>
<item>Scholars who encode in accordance with the guidelines
will be able to exchange texts with others with
less special effort to document the encoding
scheme.
Instead of describing every tag used and including
a separate document to convey information (such as copy text or
source and date of the encoded version itself)
not contained in the text proper, text preparers would need merely
to refer to the guidelines themselves and perhaps discuss briefly the
peculiarities of the individual text.
This would be a particular benefit in
the administration of large archives of machine-readable texts.</item>
<item>Scholars who acquire texts encoded by others
will enjoy the converse benefit:
the text they obtain will be easier to understand,
because better documented.
(Under current conditions, a great many texts circulate
with no significant documentation at all,
despite the best efforts of the archives to acquire and
distribute documentation for their texts,
and even if the texts use a common encoding scheme like the
COCOA tags adopted for the Oxford Concordance Program
<note place="foot"><p>"COCOA"
stands for "word COunt and COncordance program
on Atlas", the name of an early British program,
which first introduced the flexible and simple
style of tagging still called, in the Oxford documentation,
"COCOA tags".
</p></note>
the user must guess at the significance of the actual tags used.
Standardized text encoding practices would
end this problem.)</item>
<item>Developers of text-analysis software,
who at present usually develop their own special encoding schemes,
will (like the scholars encoding texts)
be reminded of textual features that need to be accommodated
in an encoding scheme.
At present, software-determined encoding schemes have an alarming
tendency to handle only the relatively restricted class of texts
in which the software developers are conscious of an interest:
nineteenth-century novels, for example, or Biblical texts.
So a text encoding scheme could be expected,
if it were followed,
to result in better software development and more
generally useful programs for textual analysis.</item>
<item>Users of texts, who typically may perform a variety
of analyses on a text, involving often a number
of different programs, will have less work to edit the
various texts they acquire into a common scheme, and
will less frequently have to change that scheme
when they move from one program to another
for a different type of analysis.</item>
</list></p></div3></div2>
<div2><head>The Need for Prompt Action</head>
<p>With every passing month,
one hears of
more plans for encoding and
distribution of texts and whole text
archives on CD-ROM,
<note place="foot"><p>At least two CD-ROM versions of the Thesaurus Linguae Graecae
are planned:
one to be distributed by the Brown University
"Isocrates"
project together with the text retrieval software developed by the
Harvard Classics department, and one to be distributed with the
Ibycus microcomputer developed by David Packard.
In addition, the University of Pennsylvania has announced plans for yet
another CD-ROM of ancient texts.
Queen's University, together with several other Ontario
universities, is now negotiating with
a major publisher of Canadian material
with the goal of producing a machine-readable version of
the New Canadian Library (over 200 volumes), and
possibly distributing it via CD-ROM.
Commercial projects are becoming almost too numerous
to keep track of.
</p></note>
and more investments of time and money are made
in the chaotic wilderness of divergent text encoding schemes
that marks the current state of affairs.
Many of those involved in planning and executing
text-encoding and text-distribution projects
express the wish that some standard practice existed,
and some archive directors are delaying major
projects in the hope that some sort of
international agreement on encoding schemes,
or at least on a conceptual framework for the description
of such schemes,
can be reached soon.
<note place="foot"><p>
At Oxford, two major projects have already identified an urgent need
for encoding standards.
One of these will encode all the set texts used by undergraduates for
honors courses in classics and modern languages;
the other, a more ambitious undertaking, will encode all surviving
early texts of pre-Restoration drama in English.
The first project has now been funded and work will begin early in the
spring, while the second is still seeking funding;
they may be regarded as typical of those going on at many UK
universities.
The text archive project in Canadian literature at Queen's University
is also dependent upon encoding standards which are at present
being designed as a prerequisite to scanning the material itself.
</p></note></p>
<p>If guidelines for standard practices can be agreed upon soon,
many of the projects still in the planning
stage can (and, it is to be hoped, will)
encode their texts in accordance with the
guidelines.
If, however, there is no prospect of any guideline for common practice
within the foreseeable future, these projects, and others,
will have no choice but to proceed without guidelines,
and a great chance for standardization
and quality control will have been missed.</p>
<div3><head>Inadequacy of Existing Schemes</head>
<p>Existing text tagging schemes may be divided into two classes:
first, those associated directly with specific concordance software
or large text-corpus projects (COCOA tags, the tagging scheme used
by the Thesaurus Linguae Graecae, or the tags required by WatCon,
or ARRAS); and second, general-purpose schemes
developed within the data-processing industry
for text preparation, exchange, revision, and publication.
Cutting across this division one may also distinguish schemes
which tag the content or structure of the text
and those which specify its appearance on the page (or screen).</p>
<p>It would be clearly advantageous if some common encoding scheme were
usable both for analytic research and for the preparation of printed
versions (editions, typeset concordances, commentaries with extended
quotations, lexica, glossaries, dictionaries) of textual material.
This would make it easier to publish the results of
textual research, make texts prepared for publication
usable for research purposes,
and also ease the learning process for scholars who use the same
computer resources both for writing (word processing) and for
textual analysis.
If one scheme, or two related schemes, could be used for both purposes,
there would be less confusion
about how to mark a given phenomenon in the text, and more
likelihood that both encoding schemes could be learned and used.</p>
<p>No one existing scheme, however, is adequate to both uses.
Many members of the first (research-oriented)
group suffer from inadequate generality,
arbitrary limits on the depth of hierarchical tagging of the logical
or physical structure of the text, or from limits on the maximum
number of distinct tags.  All suffer from the data-processing
industry's ignorance of them:
that is, none of them, even the best
conceived, can be used for any purpose other than concordance-making
or other analytic purposes.
Within the second group (schemes oriented to office automation
and machine-assisted publishing), many schemes require more
technical sophistication than can realistically be expected of
even experienced computer users, and no scheme provides adequate
facilities for encoding all the attributes of a text that may
be required for research.</p>
<p>Within the second group
two recently promulgated schemes for machine-readable text markup
merit special mention here.
<note place="foot"><p>
One could also mention others, such as the
Xerox Interscript encoding, the IBM Document Interchange
Architecture and Document Content Architecture, or the Office
Document Architecture and Office Document Interchange Format
standards promulgated by the International Organization for
Standardization (ISO).
They suffer from the comparison with SGML:
in comparison, they appear to be too opaque in their construction and
too distant from the readable text in their storage formats
to achieve any significant
following among humanists interested in encoding texts,
and they will not be discussed here.
A useful discussion of these schemes from the relevant
point of view may be found in Cheryl A. Fraser,
"An Encoding
Standard for Literary Documents,"
(M.A. Thesis, Queen's University,
1986), x + 212 lvs. </p></note>
The
Standard Generalized Markup Language (SGML) recently promulgated by
the ISO, and the SGML-style tag set recently developed at great
effort and expense by the Association of American Publishers (AAP)
merit closer discussion.</p>
<p>SGML represents an attempt to standardize the syntax of
"text
formatting" or "markup" languages
<note place="foot"><p>Two prominent examples of "markup languages"
are the IBM and
Waterloo Generalized Markup Language (GML) products, based
respectively on IBM's Document Composition Facility (DCF) Script,
and on the genetically related Waterloo Script.  Similar programs,
which may or may not include the same terminology and concepts,
and which thus may or may not be regarded strictly as
"markup languages,"
but which perform much the same function,
include
<ident>TeX</ident> (tau epsilon chi, a public-domain typesetting language
distributed by the American Mathematical Society),
the TOPS-10 program <ident>Runoff,</ident>
the Vax VMS program <ident>Scribe,</ident>
and <ident>ROFF</ident>
(a ubiquitous "runoff" or text-formatting program)
and its descendants,
most prominently the Unix utilities
<ident>nroff</ident> and <ident>troff</ident>.
</p></note>
and to this end it defines both a
"reference concrete syntax" that
prescribes specific characters for specific functions and a
generalized "abstract syntax"
that allows different implementations
of GML to use different characters for the same functions.
One of the greatest strengths of the SGML approach
is thus its flexibility:  it allows implementors to define
their own sets of metacharacters within the same syntactic framework.
For example, a GML tag is preceded, in the reference concrete
syntax, by the '&lt;' sign,
and followed by the '>' sign.  In Waterloo
GML, the tag must also be delimited by specific characters, but by
default GML tags are preceded by colons and followed by either a
space or a period.  The Waterloo implementation diverges from the
concrete syntax but remains compatible with the abstract
syntax.</p>
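<p>The difference between the two concrete syntaxes may be illustrated
schematically with a highlighted phrase; the element name here is
merely illustrative:</p>

```
An &lt;hp1>urgent&lt;/hp1> request.      (reference concrete syntax)
An :hp1.urgent:ehp1. request.        (Waterloo-style delimiters)
```

<p>The element type is the same in both cases; only the delimiter
characters differ, as the abstract syntax permits.</p>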
<p>
SGML does not define a specific set of tags
with which to mark texts up.  This is left to the individual
implementation, and is expected to vary from application to
application,
<note place="foot"><p>Thus an emphasized or italicized phrase may be preceded in one
GML by the tag "&lt;ital>"
and followed by the tag "&lt;/ital>", while
another GML may accomplish the same thing with the tags
":hp1." and
":ehp1."
('highlighted phrase, style 1', and
'end highlighted
phrase, style 1') or "&lt;emph>"
and "&lt;/emph>".
The GMLs may also distinguish such
phrases from the titles of works, which by convention are also
italicized in print, but which are logically distinct in kind and
may be represented by the tags
"&lt;ttl>" ... "&lt;/ttl>"
or ":cit." ... ":ecit."
</p></note>
although at IBM mainframe sites one may find an informal standard
centered around the specific tag sets provided by IBM's Document
Composition Facility GML and the similarly conceived Waterloo GML.</p>
<p>The generality of the SGML abstract syntax and the flexibility of
the scheme by which tag sets may be constructed and extended
recommend SGML as a potential medium for devising a common encoding
scheme for texts used in humanities research and teaching.  It is
not wholly clear whether all levels of a text are equally well
served with an SGML-based syntax:  word-by-word tagging (e.g. of
sentence functions, word classes, and other parsing information), at
least, might be better achieved with a different scheme.  SGML
remains, however, a very promising development and one that should
be carefully considered in developing guidelines
for research-oriented text encoding.</p>
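<p>The reservation about word-by-word tagging may be illustrated with a
hypothetical example: tagging each word of even a short sentence for,
say, word class and lemma multiplies the bulk of the text several times
over.  The tag and attribute names below are invented for
illustration:</p>

```
&lt;s>
 &lt;w class="det"  lemma="the">The&lt;/w>
 &lt;w class="noun" lemma="king">king&lt;/w>
 &lt;w class="verb" lemma="speak">spoke&lt;/w>
&lt;/s>
```

<p>The tagged form of the three-word sentence "The king spoke" is many
times the length of the original, which suggests that a terser notation
might serve better at this level of analysis.</p>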
<p>The specific implementation of SGML by the Association of
American Publishers merits, <foreign>a fortiori</foreign>, even greater
scrutiny than SGML in the abstract.
Efforts are underway to give this tag set the status of an
American national standard, which would make it even more
attractive.
The AAP tag set does solve many of the problems facing anyone encoding
texts for literary, historical, or linguistic analysis.
It allows, for example, for a very fine-grained delineation of the
logical structure of a document.
It also provides detailed
rules for representing characters not present in the coded character
sets (ASCII and EBCDIC) of the data-processing industry, a clear and
comprehensible syntax, and thoroughly worked-out methods of
representing tabular, columnar, or other difficult layouts and even
mathematical and chemical formulas in a linear form.
The format is thus extremely promising.</p>
<p>But although well conceived and well executed, the AAP tag set
does not in its present form
supply the broad range of tagging types needed to handle
the information humanists need to encode with their texts.
In particular, the publishers sought for obvious reasons
to retain the greatest possible freedom for textual revision
and to reserve as much freedom as possible to the book designer
(not the author) in specifying the physical layout of the text.
Thus the AAP tag set, like most formatting languages, provides for
automatic numeration of chapters, pages, lists of numbered points, etc.
Unlike most other formatting languages, it also
severely restricts the degree to which the physical
structure (type face, title formats, page breaks, etc.) in a text
can be specified in tags.
Such decisions are effectively reserved for the book designer,
who specifies in a "style sheet"
how the various logical components
of a manuscript are to be realized in layout and type styles.
These decisions make perfect sense for
the preparation of manuscripts for book or electronic publication,
which is the AAP's primary area of interest.
In texts prepared for research, however, the
numeration of pages, chapters, and numbered lists of points is given
explicitly in the source, and ought to be indicated explicitly in
the encoding.  Errors in transcription will be easier to find, and
location information will be more useful, if it is.  Similarly,
description of the physical layout of the text, because it is less
interpretive and thus less controversial, may be preferable,
or at least easier, for research purposes
than the interpretation of the layout in
terms of a logical hierarchy of structural elements in the text.
And for many types of study (analytic bibliography for the modern
period, codicology, papyrology, numismatics and epigraphy for the
older periods, and textual criticism for all periods) the
description of the physical realization of the text is at least as
important as the logical structuring of the text.</p>
<p>The promise of the AAP tag set for textual analysts, therefore,
lies in its potential as the basis around which a solution might be
constructed, rather than in its providing a solution ready-made.
At the very least, the AAP tag set will require extension for enumerated
textual components and for the physical description of the copy
text, in order to suffice for some scholarly purposes.
In order to provide an encoding scheme broad and general enough
to support tagging across the full range of humanists' interests
and concerns, even broader extensions are necessary.
Such extensions would provide tags for items such as the following:
syntactic, stylistic, metrical, morphological, and semantic features
at the level of syllable, word or phrase;
structural parts of a text such as chapter or act and scene;
discontinuous and recurrent features such as speaker, stage
direction, direct or indirect quotation; and
spans of text that do not begin with a whole word, such as lacunae or
words on the recto and verso of a page
or the lineation of a printed text.
It must also be made easy to mark specific <emph>points</emph>
in a text, as well as <emph>passages</emph> (marked for beginning
and end) and arbitrary spans of text.
Since research by its nature is often concerned with features of texts
not studied before by others (and thus not foreseen in any scheme for
text encoding),
the text-encoding guidelines should also
specify means to include user-definable extensions
that will be understandable and usable by others.
Implementing tags such as these requires consideration of a variety
of issues related to the humanistic analysis of texts,
issues that have not been addressed in SGML or the AAP scheme
simply because those schemes have more restricted concerns.</p>
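<p>The distinction may be illustrated with a brief sketch
(the tag names here are purely hypothetical, offered for
illustration only, and not part of any proposed tag set):
a <emph>point</emph> is marked by a single tag at one position
in the text; a <emph>passage</emph> is delimited by paired
start- and end-tags; and an arbitrary span, which need not respect
element boundaries, may be marked by two point tags sharing
an identifier:
<eg><![CDATA[
<anchor id=a1>                          a point in the text
<quote>direct quotation</quote>         a passage, with start- and end-tags
<spanstart id=s1> ... <spanend ref=s1>  an arbitrary span of text
]]></eg></p>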
<p>We conclude, therefore, that SGML and the AAP's implementation of
SGML, while likely to provide a usable base for a common set of
text-encoding guidelines, do not as presently constituted solve the
problems of text encoding for historical, literary, or linguistic
research.
Because SGML appears to be increasingly accepted as a framework for
representing texts in machine-readable form, and the AAP tag set
seems likely to increase rapidly in importance and utility,
it would be sensible to try to make any set of
text-encoding guidelines compatible with the AAP tag set, or at least
with the SGML syntax.
We propose to make a concerted effort to
make our text-encoding guidelines, where possible, compatible with
appropriate existing schemes, of which SGML and the AAP tag set
appear at present to be the most promising.
The AAP has agreed to attend the planning conference and
participate in the drafting, validation, and approval of
the proposed text-encoding guidelines.</p></div3></div2>
<div2><head>Phases of a Project to Develop a Text Encoding Standard</head>
<p>In order to achieve the advantages mentioned above,
guidelines for text encoding practices
must address the concerns of as many
interests as possible:
ease of migration from existing
encoding schemes,
compatibility with encoding schemes intended for similar uses,
simplicity and ease of application,
adequacy for the most commonly required text-analytic procedures,
and ability to encode the information needed for
less common but no less important
specialized research applications.
Excessive haste in formulating the guidelines
will surely impair our ability to achieve these goals.
We propose, therefore, a five-phase
process for formulating a set of guidelines,
with each phase issuing in written specifications and
drafts which will be refined and expanded
in the later phases.
This proposal requests funding for the first phase;
the later phases will be described in more detail
in other funding proposals.</p>
<p>In the long run, the guidelines developed in this project
should encompass the encoding of any sort of textual material,
in any language, for virtually any scholarly purpose.
For pragmatic reasons, we expect to limit the scope of the
immediate effort in several ways.
First, the guidelines will deal only with the problems of encoding
texts for research or teaching purposes:  preparation of
'electronic manuscripts' for electronic or paper publication
is beyond their scope.  (As noted above, compatibility
with industry standards for manuscript publication is a
desideratum, but only one of several.)
The guidelines will emphatically not be concerned with
physical storage methods for files in any medium.
Further, the immediate effort will be concerned primarily,
perhaps exclusively, with the encoding of texts in Latin-based
alphabets.
The problems of encoding texts in non-Latin alphabets
are not qualitatively different from those of Latinate scripts,
however, so most of the underlying principles of any encoding scheme
would be transferable.
Explicit, wholly adequate guidelines for handling Cyrillic,
Hebrew, Arabic, and other non-Latin alphabets will nevertheless
probably await later extensions of the guidelines,
as will any attempt to deal adequately with non-alphabetic
scripts.
Finally, the needs of specialized disciplines (e.g. numismatics,
epigraphy, paleography, and possibly some of the more arcane
philological sub-disciplines) will be reserved for later
investigation by special groups and handled in extensions to
the guidelines.
For the first version of the guidelines, we anticipate
focusing on the needs of literary and linguistic research as
commonly practiced, and those of text-oriented historical research
(including documentary editing) which are similar in nature.
In setting the basic design of the guidelines,
every effort will be made to ensure that extensions
to texts and textual research of other types
will be feasible and straightforward.</p>
<p>The five phases foreseen for the project are:
<list type="ordered">
<item><label>Planning and High-Level Design,</label>
to be performed by
an international planning conference of text-archive directors,
the formulators of existing schemes,
and other parties that represent interests and
perspectives that must be taken into account in the formulation of
the guidelines.
Also during this phase, the ACH and other cooperating groups
will make arrangements for the working party needed in the next phase.</item>
<item><label>Detailed Design and Drafting,</label>
to be performed by a smaller working party
organized by cooperating groups during and after
the planning conference.
In this phase the actual guidelines will be developed.</item>
<item><label>Revision,</label>
to be performed by the smaller working party
on the basis of comments from members of the planning conference
as well as any and all interested
parties.</item>
<item><label>Review and Approval,</label>
to be performed by representatives of
interested organizations.</item>
<item><label>Publication and Maintenance,</label>
to be performed by
the Association for Computers in the Humanities.
This phase includes the development of eventual extensions
and revisions of the guidelines.</item>
</list></p>
<div3><head>Planning and High-Level Design Phase:  International Planning Conference</head>
<p>In this phase we propose to convene
an international planning conference of
European and North American
text-archive directors
and other interested parties,
to discuss text encoding issues and
suggest the general structure and approach
of common guidelines for text encoding practices.</p>
<p>Also during this phase, the ACH will formally invite a number of
organizations to cooperate in establishing
text-encoding guidelines (specifically to participate in phases
2 through 5 by helping
draft, revise, approve and support the guidelines)
and the cooperating groups
will together appoint a draft committee or working party
to be responsible for the second and third phases of the work.
<note place="foot"><p>Details of the cooperative agreement remain to be worked out
between ACH and the other interested groups,
but are expected to include common representation on the working
party and arrangements for final validation and approval of the
guidelines which may supersede those sketched out later in this document.
</p></note>
Most of these organizations will be represented informally
at the planning conference.
No formal representation of all groups is planned; however, if
an organization not already represented at the planning conference
feels the need of such representation,
then a representative to the planning conference may be
appointed (funds permitting).</p>
<p>The planning conference
is described in more detail in the next section.
Its task is to clarify fundamental issues and provide guidance
for the later phases of the process.
Actually drafting the guidelines is not feasible in so short a time, as
the full text of any set of guidelines
will have to include substantial amounts of technical information
and documentation on existing practices,
so that writing a draft will require
substantial time from a number of individuals,
and correspondence with a large number of individuals
in a position to provide relevant technical information.
The planning conference, accordingly, will not attempt to
specify all the details of the guidelines, but only to
discuss the advantages and problems of existing practices,
clarify the basic issues,
and settle the fundamental policy questions
of scope, structure, and general approach of the guidelines,
leaving details and technical points
to be worked out by a smaller technical committee
or 'working party' organized for that task.</p></div3>
<div3><head>Detailed Design and Drafting Phase:  Working Group</head>
<p>In this phase
a working party
appointed by the cooperating groups
will first publish the architecture or general plan developed at the
planning conference, accept comments from interested parties,
and revise the formal specification of the architecture.</p>
<p>Within the architecture or general shape of the
text encoding guidelines specified by the planning
conference (as revised),
the smaller committee will
work out the details of the guidelines
and formulate a full draft of the guidelines on paper.
While the workings of this committee are best
left to its members to arrange,
it seems clear that much of the work will have to
be done by mail or by electronic network
correspondence.
At least one and probably two face-to-face working meetings
will be necessary to allow for
intensive discussion of technical issues.</p></div3>
<div3><head>Revision Phase:  Publication, Comments and Revision Cycles</head>
<p>This phase comprises one or more
cycles of publication, comment, and revision
in which the draft guidelines are circulated
to interested parties and the public for comments and suggestions.
The working party will be responsible
for considering the suggestions and comments
and incorporating them, or not, into the
draft guidelines as appropriate.</p></div3>
<div3><head>Review and Approval Phase:  Review Conference</head>
<p>The revised draft prepared by the working committee
will be submitted, after public comment and revision,
to the appropriate learned societies and professional
bodies for their formal approval.
The exact form this approval will take is a matter for discussion
within the societies, but it is our view that
an appropriate mechanism to ensure the widest discussion
and hence widest acceptance of the draft proposed standard
would be to convene
a second conference
of representatives from all of these groups
(and perhaps other interested bodies such as the American National
Standards Institute) at a later stage, at which properly constituted
representatives could vote on any controversial elements in the
proposed guidelines.
Funding for this second conference is not however sought at this point.</p></div3>
<div3><head>Publication and Maintenance Phase</head>
<p>After the draft is completed and approved,
the Association for Computers in the Humanities
will publish it and undertake to provide
continuing maintenance and support for it,
including making arrangements for periodic review and revision
as needed.
While we do not anticipate constant change,
which would undermine the advantages of the guidelines,
further experience with the
guidelines could well lead to suggestions for modifications
and improvements.
The advancing sophistication both of our computer systems
and of the textual research performed with their aid,
moreover, will encourage regular extensions of
the guidelines to cover new areas of research.
As the initiator of this effort
the Association for Computers in the Humanities
undertakes to maintain a technical committee
to receive correspondence on the guidelines,
assist interested parties with implementation
and interpretation,
and cooperate with other groups on
future revisions and extensions.</p></div3></div2>
<div2><head>Tasks and Functions of the Planning Conference</head>
<div3><head>Tasks the Conference Should Accomplish</head>
<p>The conference must set the general framework for
further work on the text encoding guidelines.
The conference, therefore, must specify
the essential structure of the final guidelines
clearly and correctly enough
to provide guidance to the working group,
without being prematurely specific in technical details.
Questions the conference must address include:
<list type="ordered">
<item>Fundamental questions:
<list type="ordered"><item>Under what conditions is a standard encoding scheme in fact
feasible and desirable?</item>
<item>Is there a set of basic principles in terms of which all or most
existing encoding schemes can be described?</item>
<item>If so, what are those principles?</item>
<item>Which of those principles should be incorporated into the new
guidelines?</item>
<item>With what existing encoding schemes (e.g., SGML and the AAP scheme)
should the new guidelines
(ideally) be compatible?</item>
</list></item>
<item>Scope of the guidelines:
<list type="ordered">
<item>What types of textual material should encoding guidelines cover?
Continuous texts only, or also structured texts like dictionaries,
glossaries, lexica, word lists, word frequency lists, parsing rules,
corpora of random or non-random samples, commentaries on other texts,
etc.?</item>
<item>What types of analysis should an encoding scheme attempt to support?
Stylistics, textual criticism, analytic bibliography, computational
linguistics, thematic studies, metrics, authorship identification
studies, commentary?</item>
<item>What other activities (data exchange, archival storage, ...)
should be addressed by the guidelines?</item>
</list></item>
<item>Structure of the Guidelines:
<list type="ordered">
<item>What form should the guidelines take?  Specifically,
what should the table of contents of the guidelines look like?</item>
<item>How specific should the guidelines be?
Should they specify or suggest specific tags with which to encode
texts, or only a structural plan for the tags, with the specific
lists of tags to be determined by those who encode the text?
(That is, should the guidelines specify only <emph>how</emph>
texts should be encoded, or also <emph>what</emph> features of the text
ought ideally to be encoded?)</item>
<item>If specific tags are specified, should the list be exhaustive
or provide only a core of listed tags,
with provisions for user-defined special-purpose extensions?</item>
</list></item>
<item>Possible Content of the Guidelines:
<list type="ordered">
<item>How extensive should the list (if any) of required or recommended
tags be, and what types of information should they convey?</item>
<item>What information (if any) about
physical layout and appearance of the text (manuscript,
edition) should be specified?</item>
<item>What information (if any) about linguistic units (parsing,
meaning, dictionary forms of words, etc.) should be tagged?</item>
<item>What interpretive information (stylistic, metrical,
rhetorical, narratological, etc.) should be tagged?</item>
<item>Should the guidelines attempt to specify explicitly
what binary codes are to be used for tags,
require the user to do so,
or specify a normal practice and
allow deviations in individual cases?</item>
<item>Should the guidelines include recommendations on how tags
should be encoded, and how they should be re-defined?</item>
<item>If specific binary codes are to be suggested or required,
how should they be chosen?</item>
</list></item>
<item>Organizational Issues:
<list type="ordered">
<item>Who should be responsible for producing a draft of
the guidelines?</item>
<item>Once the working group has produced a draft,
how should it be criticized, revised, and accepted?
(Note:  one possible outline of the further work
is described elsewhere in this proposal,
but details of the further proceeding are subject
to change.)</item>
<item>Who should be responsible for accepting
the draft and encouraging its use?</item>
<item>Who should be responsible for publishing,
maintaining, and (as need arises) revising the guidelines?</item>
</list></item>
</list></p>
<p>Both in order to illuminate the questions just mentioned
and in order to clarify current practice,
representatives to the planning conference will be asked
to describe any encoding schemes they have devised or use,
indicating both how tags are indicated in the text and
specifically what features of the text are included in the tags.
Descriptions will be requested in writing and distributed
to all participants beforehand.
One immediate product of the conference, therefore, will be a broad
account of the various methods in use now for encoding texts for
humanistic research.
This account may be integrated with the eventual guidelines or
published separately.</p></div3>
<div3><head>Tasks the Conference Should Not Undertake</head>
<p>The first working conference
should not itself attempt to decide all the questions of
text encoding;
the details of the draft guidelines should be left
to the working party.
Among the issues we specifically recommend the conference
leave undecided:
<list type="ordered">
<item>What specific tags should be included,
and at what levels (required, recommended, optional)
of the scheme.</item>
<item>What binary codes should be used in the guidelines.</item>
<item>If the working party is instructed to strive for compatibility
with certain existing standards, what deviations from those
standards might be necessary or allowable.
(This determination is best left to the working party itself.)</item>
</list>
</p></div3></div2></div1>
<?WS .pf?>

<!--* <include file=nh22who> *-->
<!--* .* NB This text uses fontset APPX after the heading. *-->
<div1 id="who">
<head>Participants in the ACH Meeting on Text Encoding Practices</head>
<head>(Vassar Planning Conference)</head>
<?WS .bf appx?>
<note place="block"><emph>NOTE: Network addresses are from Bitnet</emph></note>
<list type="bullets">
<item>Helen Aguera,
National Endowment for the Humanities,
1100 Pennsylvania Avenue NW,
Washington, D.C. 20506, USA</item>
<item>
Robert A. Amsler,
Bell Communications Research,
435 South Street, MRE 2C396,
Morristown, NJ 07960-1961, USA,
AMSLER@FLASH.BELLCORE.COM</item>
<item>
David T. Barnard,
Head, Department of Computing and Information Science,
Queen's University,
Kingston, Ontario,
Canada K7L 3N6,
BARNARD@QUCIS</item>
<item>
Lou Burnard,
Oxford University Computing Service,
13 Banbury Road,
Oxford OX2 6NN,
England,
LOU@VAX.OXFORD.AC.UK</item>
<item>
Roy Byrd,
IBM Research, H1-C14,
P.O. Box 704,
Yorktown Heights, New York 10598, USA,
BYRD@YKTVMH</item>
<item>
Nicoletta Calzolari,
Istituto di Linguistica Computazionale C.N.R.,
Via Della Faggiola 32,
I-56100 Pisa, Italy,
GLOTTOLO@ICNUCEVM</item>
<item>
David Chesnutt,
Department of History,
University of South Carolina,
Columbia,  SC  29208, USA,
N330004@UNIVSCVM
(Association for Documentary Editing, American Historical Association)</item>
<item>
Yaacov Choueka,
Department of Mathematics and Computer Science,
Bar-Ilan University, Ramat-Gan, Israel, 52100,
CHOUEKA@BIMACS</item>
<item>
Jacques Dendien,
Institut National de la Langue Francaise,
Chateau du Montet,
rue du Doyen Roubault,
F-54500 Vandoeuvre,
France,
DENDIEN@FRCIIL71</item>
<item>
Paul A. Fortier,
Dept of Romance Languages,
University of Manitoba,
Winnipeg, Manitoba,
Canada R3T 2N2,
FORTIER@UOFMCC</item>
<item>
Thomas Hickey, Consulting Research Scientist,
Office of Research,
OCLC Online Computer Library Center,
6565 Frantz Road,
Dublin, Ohio 43017, USA,
TH@OCLCRSUN</item>
<item>
Susan Hockey,
Oxford University Computing Service,
13 Banbury Road,
Oxford OX2 6NN,
England,
SUSAN@OX.VAX.AC.UK
(Association for Literary and Linguistic Computing)</item>
<item>
Nancy M. Ide,
Department of Computer Science,
Box 520 Vassar College,
Poughkeepsie, New York 12601, USA,
IDE@VASSAR
(Association for Computers and the Humanities)</item>
<item>
Stig Johansson,
University of Oslo English Department,
P.O. Box 1003/Blindern,
Oslo 3, Norway,
H_JOHANSSON%USE.UIO.UNINETT@CERNVAX</item>
<item>
Randall Jones,
Humanities Research Computing Center,
Brigham Young University,
Provo, UT  84602, USA,
JONES@BYUADMIN
(Modern Language Association)</item>
<item>
Robert Kraft,
Department of Religion,
University of Pennsylvania,
Philadelphia, Pennsylvania 19104, USA,
KRAFT@PENNDRLN</item>
<item>
Ian Lancashire,
Center for Computing in the Humanities,
Robarts Library, 14th Floor,
130 St. George Street,
University of Toronto,
Toronto, Ontario,
Canada,
IAN@UTOREPAS</item>
<item>
D. Terence Langendoen,
Graduate Center,
City University of New York,
33 West 42 Street,
New York, New York 10036, USA,
TERGC@CUNYVM
(Linguistic Society of America)</item>
<item>
Charles (Jack) Meyers,
National Endowment for the Humanities,
1100 Pennsylvania Avenue NW,
Washington, D.C. 20506, USA</item>
<item>
Junichi Nakamura,
Kyoto University,
Japan,
NAKAMURA%NAGAO4.KUEE.KYOTO-U.JUNET%UTOKYO-RELAY.CSNET@RELAY.CS.NET</item>
<item>
Wilhelm Ott,
Universit&auml;t T&uuml;bingen,
ZDV Brunnenstrasse, 27,
D-7400 T&uuml;bingen,
West Germany,
ZRSZOT1@DTUZDV2</item>
<item>
Eugenio Picchi,
Istituto di Linguistica Computazionale C.N.R.,
Via Della Faggiola 32,
I-56100 Pisa, Italy,
GLOTTOLO@ICNUCEVM</item>
<item>
Carol Risher,
Association of American Publishers, Inc.,
2005 Massachusetts Avenue W.,
Washington, D.C. 20036, USA
(Association of American Publishers)</item>
<item>
Jane Rosenberg,
National Endowment for the Humanities,
1100 Pennsylvania Avenue NW,
Washington, D.C. 20506, USA</item>
<item>
Jean Schumacher,
CETEDOC,
Faculte de Philosophie et Lettres,
College Erasme,
Place Blaise Pascal 1,
B-1348 Louvain-la-Neuve,
Belgium</item>
<item>
J. Penny Small,
392 Central Park West,
Apartment 4A,
New York, New York 10025, USA
(American Philological Association)</item>
<item>
C.M. Sperberg-McQueen,
Computer Center (M/C 135),
University of Illinois at Chicago,
Box 6998,
Chicago, Illinois 60680, USA,
U18189@UICVM</item>
<item>
Paul Tombeur, Director,
CETEDOC,
Faculte de Philosophie et Lettres,
College Erasme,
Place Blaise Pascal 1,
B-1348 Louvain-la-Neuve,
Belgium,
THOMDOC@BUCLLN11</item>
<item>
Frank Tompa (New OED Project, Waterloo),
Bell Communications Research,
435 South Street, MRE 2A339,
Morristown, NJ 07960-1961, USA,
TOMPA@FLASH.BELLCORE.COM</item>
<item>
Donald E. Walker,
Bell Communications Research,
445 South Street, MRE 2A379,
Morristown, New Jersey 07960-1961, USA,
WALKER@FLASH.BELLCORE.COM
(Association for Computational Linguistics)</item>
<item>
Antonio Zampolli,
Istituto di Linguistica Computazionale C.N.R.,
Via Della Faggiola 32,
I-56100 Pisa, Italy,
GLOTTOLO@ICNUCEVM</item>
</list>
</div1>
<?WS .pf?>
<!--* 
.* These are set at normal text size
<include file=nh22recs>
*-->
<div1><head>Closing Statement of the Vassar Planning Conference</head>
<head>The Preparation of Text Encoding Guidelines</head>
<opener rend="rightalign">
<dateline><name type="place">Poughkeepsie, New York</name>
<date>13 November 1987</date></dateline>
</opener>
 
<list type="ordered">
<item>The guidelines are intended to provide a standard format for data
interchange in humanities research.</item>
<item>The guidelines are also intended to suggest principles for the
encoding of texts in the same format.</item>
<item>The guidelines should
<list type="ordered">
<item>define a recommended syntax for the format,</item>
<item>define a metalanguage for the description of text-encoding schemes,</item>
<item>describe the new format and representative existing schemes both
in that metalanguage and in prose.</item>
</list></item>
<item>The guidelines should propose sets of coding conventions suited for
various applications.</item>
<item>The guidelines should include a minimal set of conventions for
encoding new texts in the format.</item>
<item>The guidelines are to be drafted by committees on
<list type="ordered">
<item>text documentation</item>
<item>text representation</item>
<item>text interpretation and analysis</item>
<item>metalanguage definition and description of existing and proposed
schemes,</item>
</list>
coordinated by a steering committee of representatives of the principal
sponsoring organizations.</item>
<item>Compatibility with existing standards will be maintained as far as
possible.</item>
<item>A number of large text archives have agreed in principle to support
the guidelines in their function as an interchange format.  We encourage
funding agencies to support development of tools to facilitate this
interchange.</item>
<item>Conversion of existing machine-readable texts to the new format
involves the translation of their conventions into the syntax of the new
format.  No requirements will be made for the addition of information not
already coded in the texts.</item>
</list></div1>

<div1 id="memo"><head>Draft Memoranda of Understanding</head>
<!--* <include file=nh22spon> *-->
<div2><head>Memorandum of Understanding Among the Sponsoring Organizations</head>
 
<p>An agreement entered into by the undersigned, on the dates indicated,
on behalf of the organizations named, to cooperate in developing,
formulating, and disseminating guidelines for the encoding of texts in
machine-readable form for research or teaching and a common interchange
format for the exchange of literary and linguistic data.</p>

<div3><head>Purpose</head>
<p>The organizations entering into this agreement ('sponsoring
organizations') undertake to sponsor, as described herein, an initiative
to draft, publish, and support guidelines for the encoding of texts in
machine-readable form for research or teaching and a common interchange
format for the exchange of literary and linguistic data.
</p>
</div3>
<div3>
<head>Basic Principles</head>
<p>The basic principles governing the initiative and the resulting
format shall be those enunciated in the closing statement of the
conference sponsored by ACH at Vassar College in Poughkeepsie, New York,
on November 12 and 13, 1987 (appended).
</p>
</div3>
<div3>
<head>Organization of the Initiative</head>
<p>
The initiative shall be guided by a steering committee comprising two
voting representatives of each sponsoring organization.  The steering
committee shall appoint one or more editors, as they shall choose, to
coordinate the actual drafting of the guidelines and interchange format.
The steering committee shall also constitute working committees from the
literary and linguistic computing community for the actual drafting and
development work.</p>
<p>The function of the steering committee will be to appoint the editor(s),
to monitor the work of editor(s) and drafting committees, to consider
and settle policy questions with the editors, to present the text to
the advisory board of participating organizations for approval, and
to authorize the publication of the final text.
</p></div3>
<div3>
<head>Commitments of the Sponsoring Organizations</head>
<p>The sponsoring organizations undertake:
<list type="ordered">
<item>to appoint two members to the steering committee of the
initiative.</item>
<item>to encourage their members to contribute their technical
expertise to the success of the initiative by serving on the drafting
committees.</item>
<item>to publicize the initiative in their newsletters and other
organs of communication.</item>
<item>to circulate drafts and partial drafts to their members for
comment.</item>
<item>to encourage publication of working papers of the initiative
in their journals (if appropriate).</item>
<item>to encourage the discussion of text encoding problems at
meetings organized or sponsored by the organization (e.g. in special
sessions).</item>
<item>to contribute to the initiative such administrative services
as are already routinely performed by the association (e.g. provision
of mailing lists or mailing labels).</item>
<item>to endorse the use of the guidelines and interchange format
for the encoding of texts and the exchange of already encoded texts.</item>
<item>to publish, in concert with the other sponsoring organizations,
the final form of the guidelines and interchange format, and encourage
their dissemination among the interested community of teachers and
scholars.</item>
<item>to establish and maintain some mechanism (e.g. a standing
committee or bureau) for the continuing development of the guidelines
and interchange format in cooperation with the other sponsoring
organizations after completion of the first version.  These mechanisms
shall be responsible for accepting comments and suggestions from the
membership of the sponsoring organizations, monitoring the success of
the recommendations, and considering revisions and extensions in the
light of experience.</item>
</list>
</p>
</div3>
<div3>
<p> 
Signed:
</p>
<p rend="trailvspace">
For the Association for Computers and the Humanities:
</p>

<p rend="trailvspace">
For the Association for Computational Linguistics:
</p>

<p rend="trailvspace">
For the Association for Literary and Linguistic Computing:
</p>
</div3>
</div2>

<!--* <include file=nh22porg> *-->
<div2><head>Memorandum of Understanding with Participating Organizations</head>
 
<p>An agreement to define the terms of participation in the cooperative
initiative for text encoding.</p>

<div3><head>Purpose</head>
<p>The organizations entering into this agreement ('participating
organizations') will participate (as described below) in an initiative
to draft, publish, and support guidelines for the encoding of texts in
machine-readable form for research and teaching and a common interchange
format for the exchange of literary and linguistic data.</p></div3>
<div3><head>Basic Principles</head>
<p>The basic principles governing the initiative and the resulting
format shall be those enunciated in the closing statement of the
conference sponsored by ACH at Vassar College in Poughkeepsie, New York,
on November 12 and 13, 1987 (appended).</p></div3>
<div3><head>Organization of the Initiative</head>
<p>The initiative shall be guided by a steering committee composed of
two voting representatives from each of three primary <term>sponsoring
organizations</term>:  the Association for Computers and the Humanities,
the Association for Computational Linguistics, and the Association for
Literary and Linguistic Computing.  The steering committee shall appoint
one or more editors to coordinate the actual drafting of the guidelines
and interchange format.  The steering committee shall also constitute
working committees from the literary and linguistic computing community
for the actual drafting and development work.</p></div3>
<div3><head>Advisory Board</head>
<p>Each participating organization will name one representative to an
advisory board.  The advisory board will provide liaison between the
steering committee and editor(s) and the membership of the participating
organizations, to ensure:
<list type="ordered">
<item>that the needs of the membership for encoding literary and
linguistic data are adequately addressed in the guidelines and
interchange format;</item>
<item>that similar or related efforts within specific areas of the
literary and linguistic computing community are made aware of and, where
appropriate, coordinated with the work within this initiative; and</item>
<item>that the membership of participating organizations is made and
kept aware of the work of this initiative and provided the opportunity
to participate in it.</item>
</list></p></div3>
<div3><head>Commitments of the Participating Organizations</head>
<p>The participating organizations undertake:
<list type="ordered">
<item>to appoint one member to the initiative's advisory board.</item>
<item>to encourage their members to contribute their technical
expertise to the success of the initiative by serving on the drafting
committees.</item>
<item>to publicize the initiative in their newsletters and other
organs of communication.</item>
<item>to circulate drafts and partial drafts to their members for
comment.</item>
</list></p></div3>
<div3><head>Publication</head>
<p>The names of the participating organizations will be listed in the
front matter of the final document describing the guidelines and
interchange format, under the heading "Advisory Board of Participating
Organizations."
</p>
</div3>
<div3>
<p>Signed:</p>
</div3>
</div2>
</div1>
<!--* <include file=nh22orgs> *-->
<div1 id="orgs"><head>Organizations Invited to Participate</head>
<list type="ordered">
<item>the American Historical Association (AHA)</item>
<item>the American Philological Association (APA)</item>
<item>the Association for Computing Machinery, Special Interest
Group for Information Retrieval (ACM/SIGIR)</item>
<item>the Association for Documentary Editing (ADE)</item>
<item>the Association for History and Computing (AHC)</item>
<item>the Association of American Publishers (AAP)</item>
<item>the Dictionary Society of North America (DSNA)</item>
<item>the European Association for Lexicography (Euralex)</item>
<item>the International Linguistic Association (AILA)</item>
<item>the Joint Steering Committee for the Revision of the Anglo-American
Cataloguing Rules (or some other representative of the library
community)</item>
<item>the Linguistic Society of America (LSA)</item>
<item>the Modern Language Association of America (MLA)</item>
<item>the Societas Linguistica Europaea (SLE)</item>
</list>

<!--* .* The vitas are set in fontset VITA, to defeat bold DT tag *--></div1>
<div1 id="vita"><head>Vitas of the Steering Committee</head>
<?WS .bf vita?>

<!--* <include file=amslvita> *-->
<div2><head>Robert A. Amsler</head>
<p><address>
<addrLine>Artificial Intelligence and Information Science Research Group</addrLine>
<addrLine>Bell Communications Research</addrLine>
<addrLine>435 South Street</addrLine>
<addrLine>Morristown, NJ 07960</addrLine>
<addrLine>(201) 829-4278</addrLine>
</address>
</p>

<div3><head>Education</head>
<list rend="overhang">
<item>Ph.D., Computer Sciences/Information Science/Ethnosemantics,
University of Texas, Austin, 1980.</item>
<item>M.S., Computer Sciences/Mathematics, Courant Institute of
Mathematical Sciences, New York University, 1969.</item>
<item>B.A., with honors, Mathematics, Florida Atlantic
University, Boca Raton, Florida, 1967.
</item></list></div3>

<div3><head>Current Employment</head>
<list rend="overhang">
<item>1984-Present:  Research Computer Scientist, Artificial
Intelligence and Information Science Research Group,
Bell Communications Research.</item>
</list>
</div3>
<div3><head>Previous Professional Experience</head>
<list rend="overhang">
<item>1983:  Computer Scientist, Natural-Language and Knowledge-Resource
Systems Group, Advanced Computer Systems Department, SRI International.</item>
<item>1981-1982:  Computer Scientist, Artificial Intelligence Center,
SRI International.</item>
<item>1976-1980:  Computer Programmer, Linguistics Research
Center.</item>
<item>summer/fall, 1976; summer 1974 - fall 1975:  Computer Programmer,
Computation Center, University of Texas at Austin.</item>
<item>summer, 1975:  Research Assistant,
University of Southern
California, Information Sciences Institute, Marina del Rey,
California.</item>
<item>1971-1974:  Research Assistant,
Computer Sciences Dept., University of Texas at Austin.</item>
<item>1969:  Mathematician,
Central Intelligence Agency, Scientific
Applications Division, Washington, D.C.
</item>
</list>
</div3>
<div3><head>Professional Societies</head>
<p> 
Association for Computational Linguistics, American Association
for Artificial Intelligence, Association for Computing Machinery,
Institute of Electrical and Electronics Engineers, American Association
for the Advancement of Science.</p></div3>
 
<div3><head>Grants</head>
<list rend="overhang">
<item>February, 1979:  As
a graduate student (and programming manager of the
Linguistics Research Center) at University of Texas at Austin, wrote
proposal and managed NSF Grant MCS77-01315 <title>Development of a
Computational Methodology for Deriving Natural Language Semantic
Structures from Machine-Readable Dictionaries</title> awarded by proxy to
Winfred P. Lehmann and Robert F. Simmons (graduate students ruled
ineligible to submit grant proposals by University of Texas).</item>
<item>1982:  NSF
<q>New Investigator's Grant</q> awarded to me while at SRI
International's AI Center.
</item>
</list></div3>
 
<div3><head>Professional Publications</head>
<listBibl>
<bibl>
<title level="u">Semantic Space: A Computer Technique for Modeling Connotative
Meaning</title>. M.S. thesis, N.Y.U., February 1969.</bibl>
<bibl>
<title level="a">Modeling dictionary data,</title> (with Robert F. Simmons) in
<title>Directions in Artificial Intelligence, Natural Language
Processing</title>,
(Ed. by Ralph Grishman) Computer Science Report 7,
Courant Institute of Mathematical Sciences,
C.S. Dept., New York University. August, 1975. Pp. 1-26.</bibl>
<bibl>
<title>The Structure of The Merriam-Webster Pocket Dictionary</title>.
Ph.D. dissertation. TR-164. C.S. Dept., University of Texas, Austin,
December, 1980.</bibl>
<bibl>
<title level="a">Inference Nets for Modeling Geoscience Reference Knowledge,</title> (with
Julie H. Bichteler and Jonathan Slocum). <title>Proceedings of the 43rd
ASIS Annual Meeting: Communicating Information</title>,
Anaheim, Calif. Oct. 5-10, 1980. Washington: American
Society for Information Science, 1980.</bibl>
<bibl>
<title>Report to the Faculty Computer Committee from the Textual
Applications Group (TAG).</title> (with D. Richardson) University of Texas
at Austin, September, 1980.</bibl>
<bibl>
<title level="a">A Taxonomy for English Nouns and Verbs,</title>
<title>Proceedings of the 19th Annual
Meeting of the Association for Computational Linguistics</title>,
Stanford, California. June 29-July 1, 1981. Menlo Park,
California: Association for Computational Linguistics, 1981. Pp. 133-138.</bibl>
<bibl>
<title level="a">Computational Lexicology: A Research Program,</title>
in <title>Proceedings of the 1982
National Computer Conference</title>, Houston, Texas. June 7-10, 1982.
Arlington: AFIPS Press. Pp. 657-663.</bibl>
<bibl>
<title level="a">Natural Language Access to Structured Text</title> (with Jerry R.
Hobbs and Donald E. Walker). <title>COLING-82: Proceedings of the Ninth
International Conference on Computational Linguistics,</title>
Prague, July
5-10, 1982. (edited by Jan Horecky). Amsterdam: North-Holland, 1982. Pp.
127-132.</bibl>
<bibl>
<title level="a">Computer-Assisted Compilation of a Nahuatl Dictionary,</title> (with F.
Karttunen) in <title>Computers and the Humanities</title>. 1984.</bibl>
<bibl>
<title level="a">The Use of Machine-Readable Dictionaries in Sublanguage Analysis,</title>
(with D. E. Walker, describing the results of my 1982 NSF new
investigator's grant) in <title>Proceedings of NYU Workshop on Sublanguage
Description and Processing</title>, New York City, January 19-20, 1984.
(Also appeared as chapter in
<title>Analyzing
Language in Restricted Domains:  Sublanguage Description and
Processing,</title> edited by Grishman and Kittredge.)</bibl>
<bibl>
<title level="a">Machine-Readable Dictionaries,</title> chapter in <title>Annual Review of
Information Science and Technology</title>, Vol. 19, ed. by Martha
E. Williams.  White Plains: Knowledge Industries Publications, 1984.</bibl>
<bibl>
<title level="a">Deriving Lexical Knowledge from Existing Machine-Readable Information
Sources.</title>
Bellcore TM-ARH-008761 (02/18/87). Paper presented at the
EEC-sponsored workshop
<title>Automating the Lexicon: Research and Practice
in a Multilingual Environment,</title>
held 19-23 May, 1986 in Marina di
Grosseto, Italy.  It will be published in the proceedings of the
workshop (in progress, 1988).</bibl>
<bibl>
<title level="a">Words and Worlds.</title> Bellcore TM-ARH-009470 (06/09/87). Published in
the <title>Proceedings of the 3rd `Theoretical Issues in Natural
Language Processing'</title>
(TINLAP3) Workshop at New Mexico State University at Las Cruces, NM.
(Presentation available as videotape).</bibl>
<bibl>
<title level="a">How Do I Turn This Book On? - Preparing Text for Access as a
Computational Medium.</title>
Bellcore TM-ARH-010121 (09/10/87). Published in
<title>Proceedings of the 3rd Annual Conference
of the University of Waterloo's
Centre for the New Oxford English Dictionary
(Uses of Large Text Databases)</title>
held on the Univ.  of Waterloo campus in
Waterloo, Ontario, Canada on Nov.
9-10, 1987.
</bibl></listBibl>
</div3>
</div2>

<!--* <include file=hockvita> *-->
<div2><head>Susan Margaret Hockey</head>
<p><address>
<addrLine>Oxford University Computing Service,</addrLine>
<addrLine>13 Banbury Road,</addrLine>
<addrLine>Oxford,</addrLine>
<addrLine>OX2 6NN</addrLine>
<addrLine>England</addrLine>
</address>
<address>
<addrLine>Telephone:  Oxford (0865) 273226</addrLine>
<addrLine>BITNET:  SUSAN@VAX.OXFORD.AC.UK</addrLine>
<addrLine>ARPA etc:  SUSAN%VAX.OXFORD.AC.UK @ UK.AC.UCL.CS.NSS</addrLine>
</address>
</p>
 
<div3><head>Present Position</head>
<p> 
Teaching Officer for Computing in the Arts and Section
Manager, Computing in the Arts,
Oxford University Computing Service
</p>
</div3>
 
<div3><head>Education</head>
<p><table rend="datedlist">
<row><cell>1965-69</cell>
<cell>Lady Margaret Hall, Oxford (Mary Hammill Exhibitioner)</cell></row>
<row><cell>March 1967</cell>
<cell>Honour Moderations in Greek and Latin Literature:
Class II</cell></row>
<row><cell>June 1969</cell>
<cell>Final Honour Schools in Oriental Studies (Egyptian
with Akkadian): Class I
</cell></row></table></p>
</div3>
 
<div3><head>Positions Held</head>
<p><table rend="datedlist">
<row><cell>1969-75</cell>
<cell>Computer programmer at the Atlas Computer Laboratory, Chilton.
(The Atlas Laboratory at that time provided computer facilities for
university applications, where the application was not suitable for the
university's own computer.)</cell></row>
<row><cell>1975-</cell>
<cell>Teaching Officer for Computing in the Arts at Oxford
University Computing Service.
Over the past 12 years the post has developed
into Manager of the Computing in the Arts section with a staff of 10 (of
whom 5 are on permanent appointments and 5 fixed-term).</cell></row>
<row><cell>1979-</cell>
<cell>Fellow (by Special Election) at St Cross College,
Oxford.  Director of Computing in the College.</cell></row>
<row><cell>1984</cell>
<cell>Visiting Distinguished Professor at the University of Alberta,
Edmonton (January-February).
</cell></row></table>
</p></div3>
 
<div3><head>Professional Activities</head>
<p><table rend="datedlist">
<row><cell>1973-</cell><cell>Founder member of the Association for Literary and
Linguistic Computing and committee member for most of
the period since its foundation.</cell></row>
<row><cell>1978</cell>
<cell>Founder member of the Association for Computers and the Humanities</cell></row>
<row><cell>1979-83</cell>
<cell>Member of the Executive Council of the Association for Computers and
the Humanities</cell></row>
<row><cell>1979-83</cell>
<cell>Editor of the Bulletin of the Association for Literary and
Linguistic Computing.</cell></row>
<row><cell>1984-</cell>
<cell>Chairman of the ALLC.</cell></row>
<row><cell>1985-</cell>
<cell>Member of the Editorial Committee for Literary and Linguistic
Computing and of the program committee for the annual ALLC conferences.
</cell></row></table>
</p></div3>
 
 
<div3><head>Conference papers</head>
<p>
Numerous conference papers, invited lectures, and workshops in Great
Britain and abroad.
I have also been a consultant/academic visitor to the D&eacute;partement
d'informatique at the Universit&eacute;
de Montr&eacute;al in 1979 and 1986.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Books</head>
<listBibl>
<bibl>
<title>A Guide to Computer Applications in the Humanities,</title>
London:  Duckworth,
and Baltimore: Johns Hopkins, 1980; reprinted as Johns Hopkins paperback,
1984.</bibl>
<bibl>
<title>SNOBOL Programming for the Humanities,</title> Clarendon Press, 1986.</bibl>
</listBibl>
 
<?WS .cc 7 ?></div3>
<div3><head>Articles in Journals and Books</head>
<listBibl><bibl>
(with R.F. Churchhouse) <title level="a">The Use of an SC4020 for Output of a
Concordance Program</title>, p221-229 in <title>The Computer in Literary and
Linguistic Research</title>,
ed. R.A. Wisbey, Cambridge University Press, 1971.</bibl>
<bibl>
<title level="a">A Concordance to the Poems of Hafiz with Output in Persian
Characters</title>, p291-306 in <title>The Computer and Literary
Studies</title>, eds. A.J. Aitken, R.W.
Bailey and N. Hamilton-Smith, Edinburgh University Press, 1973.</bibl>
<bibl>
<title level="a">Input and Output of Non-standard Character Sets</title>, <title>ALLC
Bulletin</title>, 1, No
2 (1973), 32-37.</bibl>
<bibl>
(with V. Shibayev) <title level="a">The Bilingual Analytical and Linguistic
Concordance - BALCON</title>, <title>ALLC Bulletin</title>, 3 (1975) 133-139.</bibl>
<bibl>
(with Alan Jones and George Mandel) <title level="a">Indexing Hebrew Periodicals with
the Aid of the FAMULUS Documentation System</title>,
p38-46 in <title>The Computer in
Literary and Linguistic Studies (Proceedings of the Third International
Symposium)</title>, eds. Alan Jones and R.F. Churchhouse, Cardiff:
University of Wales Press, 1976.</bibl>
<bibl>
(with Gordon Appleton) <title level="a">A Course on the Use of Computers in Textual
Analysis and Bibliography for the Board of Celtic Studies of the
University of Wales</title>,
<title>UMRCC Journal</title>, 5, no 2 (1978), 25-28.</bibl>
<bibl>
<title level="a">Colloquium on the Use of Computers in Textual Criticism; A Report</title>,
<title>ALLC Bulletin</title>, 6 (1978), 180-181.</bibl>
<bibl>
<title level="a">Computing in the Humanities</title>, <title>ICL Technical Journal</title>,
Vol 1 Issue 3 (1979), 280-291.</bibl>
<bibl>
(with Ian Marriott) <title level="a">The Oxford Concordance Project (OCP)</title>, series
of four articles in <title>ALLC Bulletin</title>, 7 (1979),
35-43, 155-164, 268-275 and 8
(1980), 28-35.</bibl>
<bibl>
(with Ian Marriott) <title level="a">OCP - The Oxford Concordance Project</title>,
p337-345 in <title>Proceedings of the International Conference on
Literary and Linguistic
Computing Israel</title>, ed. Zvi Malachi, Tel-Aviv, 1979.</bibl>
<bibl>
<title level="a">Report on ALLC Symposium at Easter 1980</title>, <title>ALLC
Bulletin</title>, 8 (1981),
268-270.</bibl>
<bibl>
<title level="a">The Oxford Courses for Literary and Linguistic Computing</title>,
p175-182 in <title>Computers in Literary and Linguistic Research:
Proceedings of the VII
ALLC International Symposium</title>, eds. L. Cignoni and C Peters, Pisa:
Giardini Editori e Stampatori, 1983.</bibl>
<bibl>
<title level="a">Report on the Twelfth International ALLC Conference - Nice, 5-8
June 1985</title>,
<title>ALLC Bulletin</title>, 13 (1985), 77-79.</bibl>
<bibl>
<title level="a">Literature and the Computer at Oxford University</title>, p53-75 in
<title>Literary Criticism and the Computer</title>, eds Bernard
Derval and Michel Lenoble,
Montreal, 1985.</bibl>
<bibl>
<title level="a">Some Future Developments for Arts Computing</title>, <title>University
Computing</title>, 7 (1985), 33-37.</bibl>
<bibl>
<title level="a">Letter from Oxford</title>, p16-26 in <title>Today's Research: Tomorrow's
Teaching: Conference on Computing in the Humanities</title>,
conference preprints, ed. Ian
Lancashire, Toronto, April 1986.</bibl>
<bibl>
<title level="a">OCR: The Kurzweil Data Entry Machine</title>, <title>Literary and Linguistic
Computing</title>, 1 (1986), 63-67. (This article is being translated into
Japanese.)</bibl>
<bibl>
<title level="a">Workshop on Teaching Computers and the Humanities Courses</title>,
<title>Literary and Linguistic Computing</title>, 1 (1986), 228-229.</bibl>
<bibl>
<title level="a">Report on the Thirteenth ALLC Conference</title>, <title>Literary and
Linguistic Computing</title>, 1 (1986), 233-235.</bibl>
<bibl>
<title level="a">An Historical Perspective</title>, p20-30 in <title>Information Technology
in the Humanities: Tools, Techniques and Applications</title>,
ed. Sebastian Rahtz,
Ellis Horwood, 1987.</bibl>
<bibl>
<title level="a">SNOBOL in the Humanities: Keynote Address</title>, p1-25 in <title>ICEBOL
86, Proceedings of the 1986 International Conference on the
Applications of
SNOBOL and SPITBOL</title>, ed. Eric Johnson, Dakota State College,
Madison, SD.</bibl>
<bibl>
(with Jeremy Martin), <title level="a">The Oxford Concordance Program Version 2</title>,
<title>Literary and Linguistic Computing</title>, 2 (1987), 125-131.</bibl>
<bibl>
<title level="a">Some Considerations in Providing an Academic Typesetting Service:
Experiences at Oxford University</title>,
p 91-102 in <title>Studies in Honour of
Roberto Busa S.J.</title>, ed. Antonio Zampolli, Pisa: Giardini Editori e
Stampatori, 1987.</bibl>
</listBibl>
 
<?WS .cc 7 ?></div3>
<div3><head>In Press</head>
<listBibl><bibl>
<title level="a">A Survey of Practical Aspects of Computer-Aided Maintenance and
Processing of Natural Language Data</title>,
in <title>Computational Linguistics. Ein internationales Handbuch zur
computergest&uuml;tzten Sprachforschung und ihrer
Anwendung</title>, eds. I. Batori, W. Lenders and W. Putschke, to be
published by Walter de Gruyter.</bibl>
<bibl>
Review of Cynthia Spencer, <title>Programming for the Liberal Arts</title>,
Rowman and Allenheld, 1985, to appear in <title>Computers and
the Humanities</title>.</bibl>
</listBibl></div3>
 
<div3><head>Publications Pending</head>
<listBibl><bibl>
Series editor for Oxford University Press for <title level="s">Oxford Studies on
Computing and the Humanities</title>. This series will consist of
monographs on various applications of computers in humanities research.</bibl>
<bibl>
(with Nancy Ide, Ian Lancashire, and Glyn Holmes)
editor of a volume of state-of-the-art essays on computing
and the humanities, to be published by University of Pennsylvania Press
in 1989.</bibl>
<bibl>
(with Robert Kraft), <title level="a">Computer Analysis of
Ancient Texts: A Historical Survey</title>,
to appear in <title>Aufstieg und
Niedergang der R&ouml;mischen Welt</title>, Band II, 35.</bibl>
<bibl>
<title level="a">Creating and Using Large Text Databases for Scholarly Research in the
Humanities: Some Practical Issues</title>,
to appear in Festschrift for B. Quemada, ed. A. Zampolli.</bibl>
</listBibl>

<!--* <include file=idevita> *--></div3></div2>
<div2><head>Nancy M. Ide</head>
<p><address>
<addrLine>Department of Computer Science</addrLine>
<addrLine>Vassar College</addrLine>
<addrLine>Poughkeepsie, New York 12601</addrLine>
<addrLine>(914) 452-7000, ext. 2478</addrLine>
<addrLine>Bitnet:  IDE@VASSAR</addrLine>
</address></p>
 
<?WS .cc 7 ?>
<div3><head>Education</head>
<list rend="overhang">
<item>B.S. Psychology, The Pennsylvania State University, 1972.</item>
<item>B.A. English, The Pennsylvania State University, 1972.</item>
<item>M.A. English, The Pennsylvania State University, 1976.
Thesis: <title level="u">Thematic Structure in William Blake's `Proverbs of Hell.'</title></item>
<item>Ph.D. English with Computer Science minor (M.S. equivalent),
The Pennsylvania State University, 1982.
Dissertation topic: <q>Patterns of Imagery in William Blake's
<title>The Four Zoas</title>.</q></item>
<item>Visiting Scholar, Linguistics Summer Institute, 1986.</item>
</list>
</div3>

<?WS .cc 7 ?>
<div3><head>Current Employment</head>
<list rend="overhang"><item>
1982-present:  Assistant Professor of Computer Science and member of
the Cognitive Science faculty,
Vassar College.
</item></list>
</div3>
 
<?WS .cc 7 ?>
<div3><head>Books</head>
<listBibl>
<bibl><title>Pascal for the Humanities</title>, Philadelphia:  University of
Pennsylvania Press, 1987.</bibl>
<bibl><title>Methodologies in Humanities Computing,</title> co-edited with S.
Hockey, G. Holmes, I. Lancashire, forthcoming from University of
Pennsylvania Press, 1989.</bibl>
</listBibl>
 
<?WS .cc 7 ?></div3>
<div3><head>Articles</head>
<listBibl>
<bibl><title level="a">Image Patterns and the Structure of William Blake's <title>The Four
Zoas,</title></title> <title>Blake: An Illustrated Quarterly,</title> 20, 4 (Spring,
1987).</bibl>
<bibl><title level="a">The Lexical Database in Semantic Studies,</title> forthcoming in
<title>Linguistica Computazionale: Computational Lexicology and
Lexicography</title>, ed. A. Zampolli, 1988.</bibl>
<bibl><title level="a">Computers and the Humanities Courses: Philosophical Bases and
Approach,</title> <title>Computers and the Humanities,</title>
vol. 21, no. 3, 1987.</bibl>
<bibl><title level="a">Semantic Studies and Computational Linguistics,</title> forthcoming in
<title>Methodologies in Humanities Computing,</title> eds. S. Hockey, G.
Holmes, N. Ide, I. Lancashire, University of Pennsylvania Press, 1989.</bibl>
<bibl><title level="a">Semantic Patterning:  Time Series and Fourier Analysis,</title>
forthcoming in
<title>Festschrift in Honor of Etienne Evrard,</title> ed. C. Delcourt,
Liege, 1988.</bibl>
<bibl><title level="a">The Computational Determination of Meaning
in Literary Texts,</title> in
<title>Computers and the Humanities: Today's Research, Tomorrow's
Teaching,</title> Toronto: University of Toronto, 1986.</bibl>
<bibl><title level="a">Patterns of Imagery in Blake's <title>The Four Zoas</title>,</title>
<title>Proceedings of the Twelfth International ALLC Conference</title>
(selected papers), ed. E. Brunet, Geneva: Slatkine, 1985.</bibl>
<bibl><title level="a">Computers and the Humanities: Considerations
for Course Design and
Content,</title> <title>Proceedings of the Seventh International Conference on
Computers and the Humanities</title> (selected papers), ed. R. Jones,
Dordrecht, The Netherlands: D. Reidel, 1987.</bibl>
<bibl><title level="a">Documentation for the Non-scientist,</title> <title>ACM/SIGUCC
Newsletter</title>, Winter, 1981.</bibl>
</listBibl>
 
<?WS .cc 7 ?></div3>
<div3><head>Papers</head>
<p>Numerous invited papers and other conference contributions in the
fields of humanities computing and computer science education.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Professional Activities</head>
<list rend="overhang">
<item>President, Association for Computers and the Humanities,
1985-present.</item>
<item>Member, Liberal Arts Computer Science Consortium, 1984-present.</item>
<item>International Representative for the United States
to the Association for Literary and Linguistic Computing, 1987-present.</item>
<item>International Representative for the Northeastern United States
to the Association for Literary and Linguistic Computing, 1984-1987.</item>
<item>Editorial Advisory Board, <title>The Humanities Computing
Yearbook,</title> eds. I. Lancashire and W. McCarty, Oxford University
Press.</item>
<item>Organizer and Co-director, Meeting on Text Encoding Practices,
funded by the National Endowment for the Humanities, Vassar College,
November 1987.</item>
<item>Organizer and Director, Workshop on Computers and the Humanities
Courses, sponsored by the Alfred P. Sloan Foundation and the Association
for Computers and the Humanities, Vassar College, July 1986.</item>
<item>Member, Executive Council, Association for Computers and the
Humanities, 1984-1985; nominating committee, 1984, 1985 (chairman).</item>
<item>Chair, Steering Committee for the Scholarly Research Text
Initiative, 1987-present.</item>
<item>Chair, Association for Computers and the Humanities Committee on
Text Encoding Practices, 1987-present.</item>
<item>Member, Steering Committee, National Educational Computer
Conference, 1986-present.</item>
<item>Member, Steering Committee, Eastern Small College Computing
Conference, 1985-87.</item>
</list>
<p>
Member of:
Association for Computational Linguistics, Association for Computers
and the Humanities, Association for Computing Machinery, Association for
Literary and Linguistic Computing, Linguistic Society of America,
Modern Language Association.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Grants and Awards</head>
<list rend="overhang">
<item>National Endowment for the Humanities Grant for the Development of
Text Encoding Guidelines, 1987.</item>
<item>
Alfred P. Sloan Foundation grant for the Workshop on Teaching
Computers and the Humanities Courses, 1985.
</item>
</list></div3>
</div2>

<!--* <include file=cmsmvita> *-->
<div2><head>C. M. Sperberg-McQueen</head>
<!--* 
.sr edh = 'd'
.* In any case, do this.
.tr # 40
   *-->
<p><address>
<addrLine>Computer Center (M/C 135)</addrLine>
<addrLine>University of Illinois at Chicago</addrLine>
<addrLine>Box 6998</addrLine>
<addrLine>Chicago, Illinois 60680</addrLine>
<addrLine>(312) 386-3584 / 996-2477</addrLine>
<addrLine>Bitnet:  U18189@UICVM</addrLine></address>
</p>

<div3><head>Education</head>
<p><table rend="datedlist">
<row><cell>1985</cell>
<cell>Ph.D., Comparative Literature, Stanford University.
Dissertation:  <q>An Analysis of Recent Work on
<title>Nibelungenlied</title>
Poetics.</q></cell></row>
<row><cell>1982-83</cell>
<cell>Georg-August Universit&auml;t zu G&ouml;ttingen</cell></row>
<row><cell>1978-79</cell>
<cell>Universit&eacute; de Paris IV (Sorbonne)</cell></row>
<row><cell>1977</cell>
<cell>A.M., German Studies, Stanford University</cell></row>
<row><cell>1977</cell>
<cell>A.B., German Studies and Comparative Literature, with distinction,
with Honors in Humanities and Honors in German Studies,
Stanford University</cell></row>
<row><cell>1975-76</cell>
<cell>Rheinische Friedrich-Wilhelms-Universit&auml;t Bonn,
Freie Universit&auml;t Berlin
</cell></row>
</table></p>

<?WS .cc 7 ?></div3>
<div3><head>Publications</head>
<p><table rend="datedlist">
<row><cell>In press</cell>
<cell><title level="a">Sigurdharkvidha in skamma</title> and
<title level="a">Brot af Sigurdharkvidhu.</title>
In <title>Encyclopedia of Scandinavia in the
Middle Ages,</title> ed. Phillip Pulsiano.
New York:  Garland, [forthcoming].</cell></row>
<row><cell>In progress</cell>
<cell>(With Prof. Joseph Harris) <title>Bibliography of the Eddas.</title></cell></row>
<row><cell>1987</cell>
<cell>Review of Nancy M. Ide, <title>Pascal for the Humanities</title>
(Philadelphia:  Univ. of Pennsylvania Press, 1987).  <title>Computers
and the Humanities</title> 21 (1987):  261-64.</cell></row>
<row><cell>1985</cell>
<cell><title level="a">The Legendary Form of <title>Sigurdharkvidha in
skamma</title>.</title>
<title>Arkiv f&ouml;r nordisk filologi</title> 100 (1985):  16-40.</cell></row>
<row><cell>1979</cell>
<cell><title>Approaching `Mother Courage,'
or, Who's Afraid of
Bertolt B.?</title>
Stanford Honors Essays in Humanities, 22.
Stanford, Ca., 1979.
</cell></row>
</table>
</p></div3>

<?WS .cc 7 ?>
<div3><head>Talks</head>
<p><table rend="datedlist">
<row><cell>1987</cell>
<cell><title level="a">Issues of Scope and Flexibility in Designing
Guidelines for Machine-Readable Text Encoding.</title>
Association for Computers and the Humanities, National Endowment for
the Humanities:  Meeting on Text Encoding Practices.
Poughkeepsie, New York, 12 November 1987.</cell></row>
<row><cell>1987</cell>
<cell><q>Concordances and Beyond:  Data Base Techniques for
Literary and Linguistic Research</q>
Claremont Colleges, Mellon Foundation:  Computer Applications to the
Curriculum&mdash;Beyond the Quantitative (workshop for selected Claremont
Colleges faculty).
Claremont, California, 1-2 June 1987.</cell></row>
<row><cell>1987</cell>
<cell><q>Providing Centralized Support
for Humanities Computing.</q>
International Conference on Computers and the Humanities.
Columbia, S.C., 10 April 1987.</cell></row>
<row><cell>1987</cell>
<cell><q>What Computers Ought to be Doing for <emph>Us:</emph>
Three Daydreams.</q>
Pomona College.
Claremont, California, 10 February 1987.
</cell></row></table>
</p></div3>
<div3><head>Teaching</head>
<p><table rend="datedlist">
<row><cell>1980</cell>
<cell>Section Leader, Humanities 62 (supervised two discussion sections
meeting weekly for two hours in survey of medieval and Renaissance
thought and literature), Stanford University</cell></row>
<row><cell>1979-80</cell>
<cell>German 1 and 2 (first-year German), Stanford University
</cell></row></table></p></div3>
<div3><head>Honors and Fellowships</head>
<p><table rend="datedlist">
<row><cell>1982-83</cell>
<cell>Deutscher Akademischer Austauschdienst fellowship in
G&ouml;ttingen</cell></row>
<row><cell>1977-81</cell>
<cell>Departmental Fellow, Stanford University</cell></row>
<row><cell>1976</cell>
<cell>Phi Beta Kappa, Stanford University
</cell></row></table>
</p></div3>
<div3><head>Professional Activities</head>
<p><table rend="datedlist">
<row><cell>1987-</cell>
<cell>Member, Executive Council, Association for Computers and the
Humanities</cell></row>
<row><cell>1987</cell>
<cell>Panelist,
Discussion on Instructional Software Development.
IBM Seminar for Deans of Arts and Letters.
Colorado Springs, Colorado, 8 April 1987.</cell></row>
<row><cell>1986</cell>
<cell>Panelist, Final Summary Session.
Association for Computers and the Humanities, Sloan Foundation:
Workshop on Teaching Computers-in-the-Humanities Courses.
Vassar College, Poughkeepsie, New York, August 1986.</cell></row>
<row><cell>1986</cell>
<cell>Organizer and Chair,
Round Table Discussion of Foreign Language Processing.
IBM Advanced Education Projects Conference.
San Diego, California, February 1986.</cell></row>
<row><cell>1986-</cell>
<cell>Member, Association for Computers and the Humanities,
Working Committee on Text Encoding Practices.</cell></row>
<row><cell>1985-86</cell>
<cell>Member, Executive Board, Northeast Association for Computers in the
Humanities</cell></row>
<row><cell>1983</cell>
<cell>Editorial consultant (Nibelungenlied stanza, Middle High German
metrics), revised edition of <title>Princeton Encyclopedia of
Poetics.</title></cell></row>
<row><cell>1982-86</cell>
<cell>Outside reader (German medieval literature), <title>MLN</title>.</cell></row>
<row><cell>1981-82</cell>
<cell>Editorial assistant, <title>MLN</title> 97.3 (1982).</cell></row>
<row><cell>1980-</cell>
<cell>Extensive work developing computer-based tools for humanities
teaching and research; teaching computer use to humanists, and
consulting with humanists on computer applications in research and
teaching.
</cell></row></table>
</p></div3>

<?WS .cc 7 ?>
<div3><head>Employment</head>
<p><table rend="datedlist">
<row><cell>1987-</cell>
<cell>Research Programmer, University of Illinois at Chicago
Computer Center</cell></row>
<row><cell>1985-86</cell>
<cell>Humanities Computing Specialist, Princeton University Computing
Center (Research Services Group)
</cell></row></table>
</p></div3></div2>

<!--* <include file=walkvita> *-->
<div2><head>Donald E. Walker</head>
<!--* 
.* .ll 6.5in  .po 1i
.* .de P      .sp .5    .ft R     .fi  .ps 11    .in .3i
.* .de H      .sp 1     .ft B     .nf  .ps 11    .in 0
.* .de PP     .sp .5    .ft R     .fi  .ps 11    .in 1i    .ti -1i
.* .de BB     .sp .5    .ft R     .fi  .ps 11    .in .3i   .ti -.3i
.* .de RR .nf .sp .5    .ft R          .ps 11    .in 0
.* .sp 1i .ps 12 .vs 12 .ce 2 .ft B
*-->
<p><address>
<addrLine>Bell Communications Research</addrLine>
<addrLine>435 South Street, MRE 2A379</addrLine>
<addrLine>Morristown, NJ 07960</addrLine>
<addrLine>Telephone: 201-829-4312</addrLine>
<addrLine>Arpanet: walker@flash.bellcore.com</addrLine>
<addrLine>Usenet: ucbvax!bellcore!walker</addrLine></address>
</p>
 
<?WS .cc 7 ?>
<div3><head>Specialized Professional Competence</head>
<p>
<emph>Research and Research Management in Computer and Information
Science.</emph>  Focusing primarily on the intersection of computational
linguistics, artificial intelligence, and information science.
Particular interests in the development of natural language and
knowledge resource systems, that is, the design and implementation of
computer-based facilities that support accessing, organizing, and using
knowledge and information.  Extensive experience with natural language
understanding systems (both text and speech), text processing systems,
and personal file management systems with applications in medicine and
law.  Current activities concentrating on the lexicon, machine-readable
dictionaries and other online reference works, and the management of
massive document files.</p>
<p>
Projects administered have been funded in excess of $10,000,000.</p></div3>
 
<div3><head>Professional Appointments</head>
<p><table rend="datedlist">
<row><cell>1984-</cell>
<cell>District Manager, Artificial Intelligence and Information
Science Research, Bell Communications Research, Morristown,
New Jersey.</cell></row>
<row><cell>1983-84</cell>
<cell>Program Manager, Natural-Language and Knowledge-Resource
Systems. SRI International, Menlo Park, California.</cell></row>
<row><cell>1971-83</cell>
<cell>Project Leader and Senior Research Linguist, SRI
International, Menlo Park, California.</cell></row>
<row><cell>1962-71</cell>
<cell>Head, Language and Text Processing, The MITRE Corporation,
Bedford, Massachusetts.</cell></row>
<row><cell>1961-62</cell>
<cell>Technical Staff, The MITRE Corporation.</cell></row>
<row><cell>1961-71</cell>
<cell>Visiting Scholar, Research Affiliate, and Guest, Research
Laboratory of Electronics, Massachusetts Institute of Technology,
Cambridge, Massachusetts.</cell></row>
<row><cell>1960-61</cell>
<cell>Clinical Assistant Professor of Psychology, Department of
Psychiatry, Baylor University College of Medicine, Houston, Texas.</cell></row>
<row><cell>1959-61</cell>
<cell>Research Associate, Department of Psychiatry, Baylor
University College of Medicine, Houston, Texas.  Principal Investigator,
Public Health Service Grant on
Language Variability in Psychiatric Patients.</cell></row>
<row><cell>1957-61</cell>
<cell>Research Psychologist (part time), Veterans Administration
Hospital, Houston, Texas.</cell></row>
<row><cell>1955</cell>
<cell>Visiting Assistant Professor of Linguistics in the
Linguistic Institute and Research Associate in the Office of the
University Examiner (Summer),
University of Chicago, Chicago, Illinois.</cell></row>
<row><cell>1953-1961</cell>
<cell>Assistant Professor of Psychology, Rice University,
Houston, Texas.
</cell></row>
</table></p></div3>
 
<?WS .cc 7 ?>
<div3><head>Education</head>
<p><table rend="datedlist">
<row><cell>1945-1947</cell>
<cell>Deep Springs College, Deep Springs, California</cell></row>
<row><cell>1947-1952</cell>
<cell>University of Chicago, Chicago, Illinois; Psychology
Ph.D. 1955; Thesis: The relation between creativity and selected test
behaviors for mathematicians and chemists</cell></row>
<row><cell>1952-1953</cell>
<cell>Yale University, New Haven, Connecticut; Linguistics</cell></row>
<row><cell>1953</cell>
<cell>Indiana University (Summer), Bloomington, Indiana; Linguistics
</cell></row></table>
</p></div3>
 
<?WS .cc 7 ?>
<div3><head>Honors</head>
<p>Association for Computational Linguistics,
Certificate of Recognition, 1987.
American Association for Artificial Intelligence,
Certificate of Recognition, 1984.
Association for Computing Machinery National Lecturer, 1973-1974.
Social Science Research Council Fellow, 1952-1953.
Phi Beta Kappa, elected 1952.
Sigma Xi, elected 1952.
University Fellow and various scholarships,
University of Chicago, 1947-1952.
Deep Springs Scholarships, Deep Springs College, 1945-1947.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Major Professional Activities</head>
<list rend="overhang"><item>
<hi>Association for Computational
Linguistics:</hi>  Secretary-Treasurer,
1976-present; President, 1968; Vice President, 1967</item>
<item>
<hi>International Joint Conferences on Artificial
Intelligence:</hi>  Secretary-Treasurer of the Board of Trustees
and of the Conferences,
1977-present; Trustee, 1973-1977; Past General Chair, 1973
Conference; General Chair, 1971 Conference; Program Chair, 1969
Conference; Council Member, 1969-1973</item>
<item>
<hi>International Federation for Documentation:
Committee on Linguistics
in Documentation</hi>, Chair, 1973-1980; Vice-Chair, 1981-present</item>
<item>
<hi>American Association for Artificial Intelligence:</hi>
Secretary-Treasurer,
1979-1983; Finance Committee, 1981-1984; member of AAAI organizing
committee</item>
<item>
<hi>American Federation of Information Processing Societies:</hi>
Member, Board of Directors, 1967-1971, 1979-1983; Secretary, 1971-1972</item>
<item>
<hi>American Society for Information Science:</hi>  Chair,
Special Interest
Group on Automated Language Processing, 1970-1972</item>
<item>
<hi>Association for Computing Machinery:</hi>  National
Lecturer, 1973-1974;
Visiting Scientist Program, 1969-1970</item>
<item>
<hi>National Academy of Sciences:</hi>  U.S. National Committee for FID
(International Federation for Documentation), Chair, 1979-1982;
member, 1974-1982; NAS Delegate to the FID 1980 General Assembly;
member, SYSTRAN Panel, Advisory Committee to the Air Force Systems
Command, 1970-1972</item>
<item>
<hi>National Science Foundation:</hi>  Consultant,
Office of Science Information
Service, 1971; panelist and reviewer for proposals for Computer Science,
Linguistics, Information Science and Technology, 1966-present</item>
<item>
<hi>National Institutes of Health:</hi>  Special Study Section, 1979</item>
<item>
<hi>U.S. Air Force Scientific Advisory
Board:</hi>  Member, Ad Hoc Committee
on Machine Translation, 1974</item>
<item>
<hi>Center for Applied Linguistics:</hi>  Advisory Committee, 1972</item>
<item>
<hi>Houston Psychological Association:</hi> Secretary-Treasurer,
1957-1958;
Executive Committee, 1959-1960
</item></list>
 
<?WS .cc 7 ?></div3>
<div3><head>Editorial Boards</head>
<list rend="overhang">
<item>
<title>American Journal of
Computational Linguistics</title>:  Managing Editor,
1976-present; Editorial Board, 1974-1977</item>
<item>
<title>Annual Review of
Information Science and Technology</title>: Advisory Board,
1983-1986</item>
<item>
<title>Artificial Intelligence,
An International Journal</title>:  Editorial Board,
1968-present</item>
<item>
<title>Data and Knowledge
Engineering</title>:  Associate Editor, 1985-present</item>
<item>
<title>Linguistica Computazionale</title>:  Editorial Board, 1983-present</item>
<item>
<title>Computers and the Humanities</title>:  Editorial Board, 1982-present</item>
<item>
<title>Encyclopedia of Artificial
Intelligence</title>:  Editorial Board, 1983-1987</item>
<item>
<title>International Forum on Information and
Documentation</title>: Editorial Board,
1978-1982</item>
<item>
<title level="s">Linguistic Calculation</title>
(Series sponsored by the Kval Institute for
Information Science and published by D. Reidel Publishing Company):
Editorial Board, 1982-present</item>
<item>
<title>Sprache und Datenverarbeitung</title>: Editorial Board, 1977-present</item>
<item>
<title level="s">Studies in Natural Language Processing</title>
(Series sponsored by the
Association for Computational Linguistics and published by Cambridge
University Press):  Editorial Board, 1982-present
</item></list></div3>
 
<?WS .cc 7 ?>
<div3><head>National Professional Societies</head>
<p>
American Association for Artificial Intelligence, 1979-present.
American Psychological Association, 1953-1972.
American Society for Information Science, 1970-present.
Association for Computers and the Humanities, 1979-present.
Association for Computational Linguistics, 1965-present.
Association for Computing Machinery, 1963-present.
Association for Literary and Linguistic Computing, 1985-present.
Dictionary Society of North America, 1985-present.
Euralex, 1985-present.
Linguistic Society of America, 1960-present.
Society for the Study of Artificial Intelligence
and Simulation of Behavior,
1979-present.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Publications and Major Reports</head>
<listBibl><bibl>
<title level="a">The Development of a Technique for Measuring
Stimulus-Bound&mdash;Stimulus-Free
Behavior.</title>  <title>American Psychologist</title>, 6, 365, 1951.</bibl>
<bibl>
<title level="a">Consistent Characteristics in the Behavior of Creative Mathematicians
and Chemists.</title>  <title>American Psychologist</title>, 7, 371, 1952.</bibl>
<bibl>
The Relation Between Creativity and Selected Test Behaviors for
Mathematicians and Chemists.  Ph.D. Thesis, University of Chicago, 1955.</bibl>
<bibl>
<title level="a">Psycholinguistics:  A Survey
of Theory and Research Problems.</title>  Edited by
Charles E. Osgood and Thomas A. Sebeok.
<title>Indiana University Publications
in Anthropology and Linguistics</title>, Memoir 10.  Waverly Press,
Baltimore, 1954.  Also
<title>Journal of Abnormal and Social Psychology</title>, 1954, Supplement.
Reissued by Indiana University Press, Bloomington, Indiana, 1965.</bibl>
<bibl>
<title level="a">Parents of Schizophrenics, Neurotics, and Normals</title>
(with Seymour Fisher,
Ina Boyd, and Diane Sheer).  <title>A.M.A. Archives of General
Psychiatry</title>, 1959, 1, 149-166.</bibl>
<bibl>
<title level="a">The Brain and Behavior.</title>
<title>Rice Institute Pamphlet</title>, 1960, 47, 48-80.</bibl>
<bibl>
<title level="a">Whittaker's `Postulates of Impotence' and Theory in Psychology</title>
(with Trenton W. Wann).  <title>Psychological Record</title>, 1961, 11,
383-393.</bibl>
<bibl>
<title level="a">The Interpretation of Data:  Puberty Rites</title>
(with Edward Norbeck and
Mimi Cohen).  <title>American Anthropologist</title>, 1962, 64, 463-485.</bibl>
<bibl>
<title level="a">The Structure of Languages for Man and Computer</title>
(with James M. Bartlett).
SS-10, The MITRE Corporation, Bedford, Massachusetts, November 1962.
Presented at the First Conference on Information System Sciences,
The Homestead, Virginia, November 1962.</bibl>
<bibl>
<title level="a">The Concept `Idiolect':  Contrasting Conceptualizations in
Linguistics.</title> In <title>Proceedings of the Ninth International
Congress of Linguistics</title>
(Cambridge, Massachusetts, August 27-31, 1962).  Edited by Horace G.
Lunt. Mouton, The Hague, 1964.  Pp. 556-561.</bibl>
<bibl>
<title>Information System Sciences</title> (Ed., with Joseph Spiegel).
Spartan Books, Washington, D.C., 1965.</bibl>
<bibl>
English Preprocessor Manual (Ed.).
SR-132, The MITRE Corporation, Bedford, Massachusetts, December 1964;
revised, May, 1965.</bibl>
<bibl>
<title level="a">The MITRE Syntactic Analysis Procedure for Transformational
Grammars</title> (with Arnold M. Zwicky, Joyce Friedman, and Barbara C.
Hall).
<title>AFIPS Conference Proceedings</title>: <title>Fall Joint Computer
Conference</title>,
1965, 27, 317-326.</bibl>
<bibl>
<title level="a">Recent Developments in the MITRE Analysis Procedure</title> (with Paul G.
Chapin,
Michael L. Geis, and Louis N. Gross).
MTP-11, The MITRE Corporation, Bedford, Massachusetts, June 1966.
Presented at the Annual Meeting of the Association for Machine
Translation and Computational Linguistics, Los Angeles, July 1966.</bibl>
<bibl>
<title>Information System Science and Technology</title> (Ed.).
Thompson Book Company, Washington, D.C., 1967.</bibl>
<bibl>
Critique of papers by Silvio Ceccato and Sidney M. Newman.
(International Symposium on Relational Factors in Classification,
College Park, Maryland, June 1966.)  <title>Information Storage and
Retrieval</title>,
1967, 3, 216-218, 348-350.</bibl>
<bibl>
<title level="a">SAFARI, an On-Line Text-Processing System.</title>
<title>Proceedings of the American Documentation Institute</title>, 1967, 4,
144-147.</bibl>
<bibl>
<title>Proceedings of the International Joint Conference on Artificial
Intelligence</title> (Ed., with Lewis M. Norton).  Washington, D.C., 1969.</bibl>
<bibl>
<title level="a">On-Line Computer Aids for Research in Linguistics</title> (with Louis N.
Gross).
In <title>Information Processing 68</title>.  Edited by A.J.H. Morrell.
North-Holland, Amsterdam, 1969.</bibl>
<bibl>
<title level="a">Computational Linguistic Techniques in an On-Line System for Textual
Analysis.</title>
<title>International Conference on
Computational Linguistics</title>: <title>COLING 1969</title>
(Sanga-Saby, Sweden, 1-4 September 1969).
Preprint No. 63. KVAL, Stockholm, 1969.</bibl>
<bibl>
<title>Interactive Bibliographic
Search</title>:  <title>The User</title>/<title>Computer
Interface</title> (Ed.).
AFIPS Press, Montvale, New Jersey, 1971.</bibl>
<bibl>
<title level="a">Social Implications of Automatic Language Processing.</title>
In <title>Research Trends in Computational
Linguistics</title>.  Center for Applied
Linguistics, Washington, D.C., 1972.  Pp. 78-86.</bibl>
<bibl>
Speech Understanding Research.
Annual Report, Project 1526, Artificial Intelligence Center,
Stanford Research Institute, Menlo Park, California, February 1973.</bibl>
<bibl>
<title level="a">Automated Language Processing.</title>
In <title>Annual Review of Information Science and Technology</title>, Volume 8.  Edited
by Carlos A. Cuadra and Ann W. Luke.  American Society for Information
Science, Washington, D.C., 1973.  Pp. 69-119.</bibl>
<bibl>
<title level="a">Speech Understanding Through Syntactic and Semantic Analysis.</title>
<title>IEEE Transactions on Computers</title>, 1976, C-25, 432-439.
Also in <title>Advance Papers</title>, <title>Third International Joint Conference
on Artificial Intelligence</title> (Stanford, California, 20-23 August
1973).  Stanford Research Institute, Menlo Park, California, 1973,
208-215.</bibl>
<bibl>
<title level="a">Speech Understanding, Computational Linguistics, and Artificial
Intelligence.</title>
In <title>Computational and Mathematical Linguistics</title>, <title>Proceedings of the
International Conference on Computational Linguistics</title>
(Pisa, Italy, 27 August-1 September 1973), Volume I.  Edited by
Antonio Zampolli and Nicoletta Calzolari.  Casa Editrice Leo S. Olschki,
Firenze, 1977.  Pp. 725-740.</bibl>
<bibl>
Speech Understanding Research.
Annual Report, Project 1526, Artificial Intelligence Center,
Stanford Research Institute, Menlo Park, California, May 1974.</bibl>
<bibl>
<title level="a">The SRI Speech Understanding System.</title>
<title>IEEE Transactions on Acoustics</title>, <title>Speech and Signal Processing</title>, 1975,
ASSP-23, 397-416.
Also in <title>Contributed Papers</title>, <title>IEEE Symposium on Speech Recognition</title>
(Carnegie-Mellon University, Pittsburgh, Pennsylvania, 15-19
April 1974).  IEEE, New York, 1974, 32-37.</bibl>
<bibl>
Speech Understanding Research (with William H. Paxton, Jane J. Robinson,
Gary G. Hendrix, and Ann E. Robinson).
Annual Report, Project 3804, Artificial Intelligence Center, Stanford
Research Institute, Menlo Park, California, June 1975.</bibl>
<bibl>
<title level="a">Progress in Speech Understanding Research at SRI.</title>
<title>Proceedings of the Fourth International Congress of Applied
Linguistics</title> (Stuttgart, German Federal Republic, 25-30 August 1975).
Edited by Gerhard Nickel.  Hochschul Verlag, Stuttgart, 1976.</bibl>
<bibl>
<title level="a">Artificial Intelligence and Language Processing:  A Directory of
Research Personnel.</title>
<title>American Journal of Computational Linguistics</title>, 1976, Microfiche 39.</bibl>
<bibl>
Speech Understanding Research.  Semiannual Technical Report, Project 4762,
Artificial Intelligence Center, Stanford Research Institute, Menlo Park,
California, June 1976.</bibl>
<bibl>
Speech Understanding Research (Ed.).  Final Technical Report, Project 4762,
Artificial Intelligence Center, Stanford Research Institute, Menlo Park,
California, October 1976.</bibl>
<bibl>
<title level="a">An Overview of Speech Understanding Research at SRI</title>
(with Barbara J. Grosz, Gary G. Hendrix, William H. Paxton,
Ann E. Robinson, Jane J. Robinson, and Jonathan Slocum).</bibl>
<bibl>
<title level="a">Speech Understanding Systems:  Report of a Steering Committee</title>
(with Mark F. Medress, Franklin S. Cooper, James W. Forgie, C. Cordell Green,
Dennis H. Klatt, Michael H. O'Malley, Edward P. Neuburg, Allen Newell,
D. Raj Reddy, H. Barry Ritea, June E. Shoup-Hummel, and William A. Woods).
<title>Artificial Intelligence</title>, 1977, 9, 307-316; <title>SIGART Newsletter</title>,
April 1977, 62, 4-8.</bibl>
<bibl>
<title level="a">Procedures for Integrating Knowledge in a Speech Understanding System</title>
(with William H. Paxton, Barbara J. Grosz, Gary G. Hendrix, Ann E. Robinson,
Jane J. Robinson, and Jonathan Slocum).  <title>Proceedings of the Fifth
International Joint Conference on Artificial Intelligence</title> (Cambridge,
Massachusetts, 22-25 August 1977).  Carnegie-Mellon University,
Pittsburgh, Pennsylvania, 1977.  Pp. 36-42.</bibl>
<bibl>
<title level="a">Research on Speech Understanding and Related Areas at SRI.</title>
In <title>Proceedings</title>, <title>Voice Technology for Interactive Real</title>-<title>Time
Command</title>/<title>Control Systems Applications</title> (NASA Ames Research
Center, Moffett Field, California, 6-8 December 1977).  Edited by
Robert Breaux, Mike Curran, and Edward M. Huff. NASA Ames Research
Center, Moffett Field, California, 1977.  Pp. 45-63.</bibl>
<bibl>
<title>Natural Language in Information Science</title>:  <title>Perspectives and
Directions for Research</title> (Ed., with Hans Karlgren and Martin Kay).
Skriptor, Stockholm, 1977.</bibl>
<bibl>
<title level="a">Introduction.</title>  In <title>Natural Language in Information Science</title>:
<title>Perspectives and Directions for Research</title>.  Edited by Donald E. Walker,
Hans Karlgren, and Martin Kay.  Skriptor, Stockholm, 1977.  Pp. 3-5.</bibl>
<bibl>
<title level="a">The Workshop and Its Results.</title>  In <title>Natural Language in Information
Science</title>: <title>Perspectives and Directions for Research</title>.
Edited by Donald E. Walker, Hans Karlgren, and Martin Kay.  Skriptor,
Stockholm, 1977.  Pp. 7-18.</bibl>
<bibl>
<title>Understanding Spoken Language</title> (Ed.).  Elsevier North-Holland, New York,
1978.</bibl>
<bibl>
<title level="a">Introduction and Overview.</title>
In <title>Understanding Spoken Language</title>.  Edited by Donald E. Walker.
Elsevier North-Holland, New York, 1978.  Pp. 1-13.</bibl>
<bibl>
<title level="a">Natural Language Access to a Melanoma Data Base</title> (with Martin Epstein).
In <title>Proceedings of the Second Annual Symposium on Computer Applications in
Medical Care</title>, Washington, D.C., November 5-9, 1978.  IEEE, New York,
1978.  Pp. 320-325.</bibl>
<bibl>
<title level="a">Information Retrieval on a Linguistic Basis.</title>
In <title>Aspects of Automatized Text Processing</title>.
Edited by Sture Allen and Janos Petofi.
Helmut Buske, Hamburg, 1979.  Pp. 137-156.</bibl>
<bibl>
<title level="a">SRI Research on Speech Understanding.</title>
In <title>Trends in Speech Recognition</title>.  Edited by Wayne A. Lea.
Prentice-Hall, Englewood Cliffs, New Jersey, 1980.  Pp. 294-315.</bibl>
<bibl>
<title level="a">Economic Impact of Research on Natural Language</title> (with Hans Karlgren).
<title>Proceedings of FID Congress 1980</title>, Copenhagen, Denmark, 25-28 August 1980.
Also <title>International Forum on Information and Documentation</title>, 1981, 6, 7-10.</bibl>
<bibl>
<title level="a">Natural Language Access to Medical Text</title> (with Jerry R. Hobbs).
<title>Proceedings of the Fifth Annual Symposium on Computer Applications
in Medical Care</title>. IEEE, New York, 1981.  Pp. 269-273.</bibl>
<bibl>
<title level="a">The Organization and Use of Information: Contributions of Information
Science, Computational Linguistics and Artificial Intelligence.</title>
<title>Journal of the American Society for Information Science</title>, 1981, 32,
347-363.</bibl>
<bibl>
<title level="a">Organizing and Using Textual Information.</title>
<title>1982 Office Automation Conference Digest</title>.  Arlington, Virginia:
AFIPS Press, 1982.  Pp. 681-686.</bibl>
<bibl>
<title level="a">Reflections on 20 Years of the ACL: An Introduction.</title>
<title>Proceedings of the 20th Annual Meeting of the Association for
Computational Linguistics</title>.  Menlo Park, California: Association for
Computational Linguistics, 1982. Pp. 89-91.</bibl>
<bibl>
<title level="a">A Society in Transition.</title>
<title>Proceedings of the 20th Annual Meeting of the Association for Computational
Linguistics</title>.  Menlo Park, California: Association for Computational
Linguistics, 1982. Pp. 98-99.</bibl>
<bibl>
<title level="a">Natural Language Access Systems and the Organization and Use of
Information.</title>
<title>COLING-82: Proceedings of the Ninth International Conference
on Computational Linguistics</title>, edited by Jan Horecky.  Amsterdam:
North-Holland, 1982.  Pp. 407-412.</bibl>
<bibl>
<title level="a">Natural Language Access to Structured Text</title> (with Jerry R. Hobbs
and Robert A. Amsler).
<title>COLING-82: Proceedings of the Ninth International Conference
on Computational Linguistics</title>, edited by Jan Horecky.  Amsterdam:
North-Holland, 1982.  Pp. 127-132.</bibl>
<bibl>
<title level="a">Sublanguages</title> (with Richard Kittredge, Joan Bachenko, Ralph Grishman, and
Ralph Weischedel). In <title>Applied Computational Linguistics in Perspective:
Proceedings of the Workshop</title>, edited by Carroll Johnson and Joan Bachenko.
<title>American Journal of Computational Linguistics</title>, 1982, 8, 79-82.</bibl>
<bibl>
<title level="a">The Polytext System--A New Design for a Text Retrieval System</title>
(with Hans Karlgren).  In <title>Questions and Answers</title>, edited by Ferenc Kiefer.
Reidel, Dordrecht, Holland, 1983.  Pp. 273-294.</bibl>
<bibl>
<title level="a">Computational Strategies for Analyzing the Organization and Use of
Information.</title>
In <title>Knowledge Structure and Use</title>: <title>Implications for Synthesis
and Interpretation</title>, edited by Spencer A. Ward and Linda J. Reed.
Temple University Press, Philadelphia, Pennsylvania, 1983.  Pp. 229-284.</bibl>
<bibl>
Contributor, Information Science Panel Report, <title>Proceedings of the
Information Technology Workshop</title>, (Leesburg, Virginia, 5-7 January 1983).</bibl>
<bibl>
<title level="a">Automatic Segmentation and Analysis of Digital Data Streams</title> (with Robert
A. Amsler and Bernard Elspas).  Final Report: Project Definition Phase,
Project 5383.  Computer Science and Technology Division, SRI International,
Menlo Park, California, May 1983.</bibl>
<bibl>
<title level="a">The Use of Machine-Readable Dictionaries in Sublanguage Analysis</title> (with
Robert A. Amsler).
In <title>Analyzing Language in Restricted Domains</title>, edited by Ralph
Grishman and Richard Kittredge.  Lawrence Erlbaum Associates,
Hillsdale, New Jersey, 1986. Pp. 69-83.</bibl>
<bibl>
<title level="a">Knowledge Resource Tools for Information Access.</title>
<title>Future Generations Computer Systems</title>, 1986, 2, 161-171.</bibl>
<bibl>
<title level="a">Knowledge Resource Tools for Accessing Large Text Files.</title>
In <title>Machine Translation: Theoretical and Methodological Issues</title>,
edited by Sergei Nirenburg.  Cambridge University Press, Cambridge,
England, 1987.  Pp. 247-261.</bibl>
<bibl>
<title level="a">Distributed Expert-Based Information Systems: an Interdisciplinary
Approach</title> (with Nicholas Belkin and others), <title>Information Processing
and Management</title>, 1987, 23:5, in press.</bibl>
<bibl>
<title>Automating the Lexicon: Research and Practice in a Multilingual
Environment</title> (Ed. with Antonio Zampolli and Nicoletta Calzolari).
Submitted for publication in the Association for Computational
Linguistics series, <title level="s">Studies in Natural Language Processing</title>.</bibl>
<bibl>
<title level="a">Introduction.</title>
In <title>Automating the Lexicon: Research and Practice in a Multilingual
Environment</title>, edited by Donald E. Walker, Antonio Zampolli, and
Nicoletta Calzolari.  Submitted for publication in the Association for
Computational Linguistics series, <title level="s">Studies in Natural Language
Processing</title>.</bibl>
</listBibl>
<closer><dateline><date>
November 1987
</date></dateline></closer>
</div3>

<!--* <include file=zampvita> *--></div2>
<div2><head>Antonio Zampolli</head>
<p><address>
<addrLine>Dipartimento di Linguistica</addrLine>
<addrLine>Universit&agrave; di Pisa</addrLine>
<addrLine>Via S. Maria 36</addrLine>
<addrLine>I-56100 Pisa</addrLine>
<addrLine>Italy</addrLine>
<addrLine>Tel. 050 (24773).</addrLine>
</address>

<address>
<addrLine>Istituto di Linguistica Computazionale</addrLine>
<addrLine>Via della Faggiola 32</addrLine>
<addrLine>I-56100 Pisa</addrLine>
<addrLine>Italy</addrLine>
<addrLine>Tel. 050 (502082).</addrLine>
</address></p>

 
<?WS .cc 7 ?>
<div3><head>Current Position</head>
<p>Full Professor of Mathematical
Linguistics at the University of Pisa and Director of
the Institute for Computational Linguistics of the Italian National
Research Council (CNR).
 
<?WS .cc 7 ?></p></div3>
<div3><head>Teaching Experience</head>
<p><table rend="datedlist">
<row><cell>1970-1976</cell> <cell>Associate Professor of Mathematical Linguistics
(<q>professore incaricato</q>) at the Faculty of Letters and Philosophy
of the University of Pisa, Italy.</cell></row>
<row><cell>1975-1977</cell> <cell>Full Professor of General Linguistics
(<q>professore straordinario</q>) at the University of Genoa, Italy.</cell></row>
<row><cell>1978-</cell> <cell>Full Professor of Mathematical Linguistics
(<q>professore ordinario</q>) at the Faculty of Letters and Philosophy
at the University of Pisa, Italy.</cell></row>
<row><cell>1975-1978</cell> <cell>Professor in Computational and Mathematical Linguistics
at the School for Graduate Studies in Linguistic Sciences
(<q>Scuola di Perfezionamento in Scienze Linguistiche</q>)
at the Faculty of Letters and Philosophy of the University of Pisa,
Italy.</cell></row>
<row><cell>1982-</cell> <cell>Director of the PhD Curriculum in Computational and
Applied Linguistics for the Doctorate of Research (PhD in Linguistics).</cell></row>
<row><cell>1970</cell> <cell>Director of the International Summer School on Mathematical
and Computational Linguistics <q>Linguistic and Literary Electronic Data
Processing</q>.</cell></row>
<row><cell>1972-1977</cell> <cell>Director of the II, III, and IV
International Summer Schools on
Mathematical and Computational Linguistics.</cell></row>
<row><cell>1982-</cell> <cell>Lecturer for the course in Linguistics for the School of
Graduate Studies in Phoniatrics and Logopedics of the Faculty of Medicine
of the University of Pisa.
</cell></row></table>
 
<?WS .cc 7 ?></p></div3>
<div3><head>Scientific Activity in International Committees and Associations</head>
<list rend="overhang">
<item>ICCL (International Committee on Computational Linguistics):
Vice President.</item>
<item>ALLC (Association for Literary and Linguistic Computing):  Elected
to the Directive Committee; ALLC representative for Italy;
<q>Corresponding Editor</q> of the <title>Bulletin</title>; President of the
Specialist Group for Italian texts; Member of the Specialist Group
for International Standards for Data Storage
in Machine-readable Form and for Networks.
From 1983:  President of the ALLC.</item>
<item>CIRPHO (Cercle Internationale de Recherche Philosophique par
ordinateur):  Vice President.</item>
<item>FID (F&eacute;d&eacute;ration Internationale de Documentation):  Member of
<q>Linguistics in documentation</q> ad hoc group.</item>
<item>AILA (Association Internationale de Linguistique
Appliqu&eacute;e):
Vice President; Editor of the Bulletin; Member of the Executive
Committee; Co-president of the Scientific Commission for Applied
Computational Linguistics; Member of the Committee for the
International Summer Schools; Member of the Committee for the
Scientific Commissions; Member of the Financial Committee.</item>
<item>ACH (Association for Computers and the Humanities): Vice President
(pro tempore).
From 1983:  Italian representative of ACH.</item>
<item>EJCSC (European Joint Committee for Scientific Cooperation of the
Council of Europe): Co-president of the group for the coordination of
automatic text processing procedures.</item>
<item>CETIL (<q>Comit&eacute; d'Experts pour
le Transfert de l'Information
entre Langues europ&eacute;ennes</q> of the EC): Vice President; Member of the
group for the Coordination of Computer-aided Translation.</item>
<item>CIDST (Comit&eacute; pour
l'Information et Documentation Scientifique
et Technique of the EC): Vice President of the ad hoc group for the
Automatic Processing of Multilingual Information.</item>
<item>Member of the Scientific Council of the <title>Index
Thomisticus.</title></item>
<item>Member of the Committee of Experts of the Italian Ministry for
Foreign Affairs for the teaching of Italian abroad.
</item>
</list>
<p><table rend="datedlist">
<row><cell>1982-</cell> <cell>Member of the Ad hoc Group for lexicography of the
European Science Foundation.</cell></row>
<row><cell>1982-</cell> <cell>Italian correspondent for INFOTERM (Vienna).</cell></row>
<row><cell>1983-</cell> <cell>Italian representative for ACPM (Committee of
EUROTRA).</cell></row>
<row><cell>1983-</cell> <cell>Member of Executive Committee of EUROLEX.</cell></row>
<row><cell>1985-</cell> <cell>Italian delegate in the Commission of the European
Community CGC 12 (Management and Coordination Advisory Committee -
Linguistic Problems).</cell></row>
<row><cell>1985-</cell> <cell>Chairman of the ad hoc Task Force (for Machine
Translation) of the CGC 12.</cell></row>
<row><cell>1985-</cell> <cell>Co-opted member responsible for the use of computers in
the humanities in Europe for the Standing Committee for the Humanities
of the European Science Foundation.
</cell></row></table></p>
 
<?WS .cc 7 ?></div3>
<div3><head>Organisation of International Meetings</head>
<p><table rend="datedlist">
<row><cell>1968</cell> <cell>Chairman of the Organizing Committee
of the <q>Colloque International sur le
Dictionnaire latin de machine</q>.</cell></row>
<row><cell>1970</cell> <cell>Chairman of the Organizing Committee of the
<q>Colloque International L'&eacute;laboration
&eacute;lectronique en lexicologie et en lexicographie</q> (Pisa).</cell></row>
<row><cell>1972</cell> <cell>Member of the <q>Program Committee</q> for the
<q>5th International Conference on Computational
Linguistics</q>.</cell></row>
<row><cell>1973</cell> <cell>General Coordinator of the 5th International Conference on
Computational Linguistics (Pisa).</cell></row>
<row><cell>1974</cell> <cell>Responsible for the <q>Computational
Linguistics</q> section
of the <q>4th International Congress of Applied Linguistics</q>
(Stuttgart, 1975).</cell></row>
<row><cell>1974</cell> <cell>Member of the <q>Program Committee</q> for the
<q>6th International Conference on Computational
Linguistics</q> (Montreal, 1975).</cell></row>
<row><cell>1974</cell> <cell>Member of the Honorary Committee of the
<q>2nd International Conference
on Computers and the Humanities</q> (Los Angeles, 1975).</cell></row>
<row><cell>1975</cell> <cell>Member of the Scientific Program Committee and Co-president
of the <q>Computational Linguistics and Machine Translation</q> section
of the <q>4th AILA World Congress</q> (Stuttgart).</cell></row>
<row><cell>1976</cell> <cell>Member of the Scientific Program Committee for the section
on <q>Lexicography and Quantitative Linguistics</q> of the
<q>International Conference on Computational Linguistics</q> (Ottawa).</cell></row>
<row><cell>1978</cell> <cell>Co-president of the Scientific Committee for the
<q>Computational Linguistics</q> section
of the <q>5th AILA World Congress</q> (Montreal).</cell></row>
<row><cell>1978</cell> <cell>President of the <q>Scientific Program Committee</q> of the
<q>International Conference on Computational Linguistics</q> (Bergen).</cell></row>
<row><cell>1980</cell> <cell>Member of the <q>Scientific Program Committee</q> of the
<q>International Conference on Computational Linguistics</q> (Tokyo).</cell></row>
<row><cell>1981</cell> <cell>Responsible for the workshop <q>On the Possibilities and
Limits of the Computer in producing and publishing Dictionaries</q>,
organised for the European Science Foundation (Pisa).</cell></row>
<row><cell>1981</cell> <cell>Member of the Organizing Committee of the Round Table on the
Application of the Computer to Spanish (Pisa).</cell></row>
<row><cell>1982</cell> <cell>Member of the Scientific Program Committee of the
<q>International
Conference on Computational Linguistics</q> (Prague).</cell></row>
<row><cell>1982</cell> <cell>Chairman of the Organizing Committee and member of the
Program Committee of the
VII ALLC Symposium <q>Computers in Literary and Linguistic
Research</q> (Pisa).</cell></row>
<row><cell>1983</cell> <cell>Member of the <q>Scientific Programme Committee</q>
of the ICCL (San Francisco).</cell></row>
<row><cell>From 1983</cell> <cell>Co-chairman of the ALLC Symposia and Proceedings.</cell></row>
<row><cell>1984</cell> <cell>Member for computational linguistics of the Scientific
Committee of Eurolex II (Zurich, 1986).
</cell></row></table>
</p></div3>
 
<?WS .cc 7 ?>
<div3><head>Research Work</head>
<p><table rend="datedlist">
<row><cell>1960</cell> <cell>Degree <hi>cum laude</hi> in Classics at
the University of Padua
with a thesis in glottology (Studies in
linguistic statistics performed using an IBM system).</cell></row>
<row><cell>1961</cell> <cell>NATO study grant
at the <q>International Summer Institute on Mechanical
Translation</q>.</cell></row>
<row><cell>1962</cell> <cell>NATO study grant at the
<q>International Summer Institute on Automatic
Documentation</q>.</cell></row>
<row><cell>1960-1965</cell> <cell>Research Director and Assistant to the Director at the
<q>Centro per l'Automazione dell'Analisi Linguistica (CAAL) di
Gallarate</q>.</cell></row>
<row><cell>From 1966</cell> <cell>Consultant to the Accademia della Crusca, Opera del
Vocabolario.</cell></row>
<row><cell>1967-1978</cell> <cell>Director of the <q>Divisione Linguistica</q> of CNUCE
(Institute of the Italian Research Council).</cell></row>
<row><cell>From 1971</cell> <cell>Director of the project <q>Vocabolario Italiano di
macchina</q> (Italian Machine Dictionary).</cell></row>
<row><cell>1974</cell> <cell>Member of the research group for <q>Moderne Tecnologie della
documentazione bibliografica ed emerografica corrente per le
Universit&agrave;
e gli Enti di ricerca in Italia</q> of the Committee for Technological
Research of CNR.</cell></row>
<row><cell>1974</cell> <cell>Representative for Computational
Linguistics of the CNR Scientific
Committee for the project to automate the Documentation
of the House of Commons.</cell></row>
<row><cell>From 1978</cell> <cell>Director of the
<q>Istituto di Linguistica Computazionale
(ILC)</q> of the Italian National Research Council
(Pisa). The Institute has a staff of thirty and the following research
projects are currently in progress:
<lb/>
Procedures and tools for the automation of philological research;
Software and technologies for linguistic data processing;
Processing and management
of textual archives and linguistic multifunctional databases;
Italian Machine Dictionary;
Automatic syntactic analysis of Italian; Studies and textual
analyses of present-day Italian; Morphosyntactic analyses and
lemmatisation of Spanish; Knowledge representation language and semantic
parser.
<lb/>
The Institute also collaborates in more than 50
text-processing projects
underway in Italy and abroad which use the encoding
standards and the generalised procedures of the ILC, and has a
statutory obligation to maintain contacts, collaborations, and exchanges
between Italy and other countries in the field of literary and
linguistic data processing using the computer.
</cell></row></table>
 
<?WS .cc 7 ?></p></div3>
<div3><head>Editorial Activity</head>
<p>Member of the Advisory Council of <title>Computers and the
Humanities</title> (New York).</p>
<p>Member of the Editorial Advisory Board of <title>TA-Information</title>
(Paris).</p>
<p>Member of the Editorial Advisory Board of <title>ITL - Review
of Applied Linguistics</title> (Louvain).</p>
<p>Member of the Editorial Board of
the <title>ALLC Bulletin</title>
(Cambridge, U.K.).</p>
<p>Member of the <q>Comit&eacute; de r&eacute;daction</q> of the
<title>Cahiers de Lexicologie</title> (Besancon).</p>
<p>Editor of the <title>AILA Bulletin</title>.</p>
<p>Member of the Editorial Advisory Board of <title>Multilingua</title> (The Hague).</p>
<p>Director of the journal <title>Linguistica Computazionale</title>.
 
<?WS .cc 7 ?></p></div3>
<div3><head>Conferences and Seminars</head>
<p>Invited speaker at Institutes and Research Centres in:
Amsterdam, Aarhus, Barcelona, Budapest,
Brussels, Chicago, Copenhagen, Liege, Leuven,
London, Ljubljana, Luxembourg, Madrid, Minneapolis,
Moscow, Nancy, New York, Oviedo, Paris,
Prague, Provo, Stanford, Stockholm, Varna, Vienna, etc.
</p></div3></div2></div1>
<?WS .pf?>
<div1><head>Budget for Subcontract with the University of Illinois at Chicago</head>
<!--* 
.* Following line deleted, next added, for distribution copies:  MSM 5/3
.* [See forms inserted after this sheet.]
*-->
<p>
[Note:  Detailed budget information has been omitted from this copy.]
</p>
<!--* 
.* >Letters from the University of Illinois at Chicago
.* tters are reproduced after this sheet.]
*-->
</div1>
</back>
</text>
</TEI.2>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-default-dtd-file:(concat sgmlvol "/SGML/Public/Emacs/teilite.ced")
sgml-omittag:t
sgml-shorttag:t
End:
-->
