<?xml version="1.0" encoding="utf-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>TEI ED P1: Design Principles for Text Encoding Guidelines</title>
        <title>Done into XML</title>
      </titleStmt>
      <publicationStmt>
        <p>Distributed from the TEI Web site</p>
      </publicationStmt>
      <sourceDesc>
        <p>Converted to TEI P5 XML from the HTML version at
	http://cmsmcq.com/1990/edp1.html, itself derived from the
	original versions, in plain text and in GML, available
	from http://www.tei-c.org/Vault/SC/teipcp1.txt and
	http://www.tei-c.org/Vault/SC/teipcp1.gml respectively.
</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <divGen type="toc"/>
     </front>



<body>
<div>
<head>Design Principles
for Text Encoding Guidelines</head>

<head type="sub">TEI ED P1</head>
<head type="date">14 December 1988<lb/>rev. 9 January 1990</head>

<div type="abstract"><head>Abstract</head><p>
This document defines the basic design goals and working principles
for the text encoding guidelines to be created by the Text Encoding
Initiative.
</p><p>
It extends the principles enunciated by the Poughkeepsie Planning
Conference of November 1987 (see TEI document no. TEI PCP1) to
questions of detail not covered there, and provides basic
interpretations of the clauses of the Poughkeepsie Principles.
</p></div>
<divGen type="toc"/>


<div type="div" xml:id="b2b1b3b3b1"> 
<head>Introduction</head><p>
The Text Encoding Initiative is a cooperative undertaking of the textual
research community to formulate and disseminate guidelines for the
encoding and interchange of machine-readable texts intended for
literary, linguistic, historical, or other textual research.  It is
sponsored by the Association for Computers and the Humanities (ACH), the
Association for Computational Linguistics (ACL), and the Association for
Literary and Linguistic Computing (ALLC).  A number of other learned
societies and professional associations support the project by their
participation in the Initiative's Advisory Board.  The project is funded
in part by the U.S. National Endowment for the Humanities.
</p><p>
The primary goal of the Text Encoding Initiative is to provide explicit
guidelines which define a text format suitable for data interchange and
data analysis; the format should be hardware and software independent,
rigorous in its definition of textual objects, easy to use, and
compatible with existing standards.  The Standard Generalized Markup
Language (SGML) is expected to provide an adequate basis for the
guidelines.
</p><p>
This document attempts to set out the fundamental principles upon which
the work of the Text Encoding Initiative is to proceed.  In it, the
guidelines for text encoding and text interchange to be formulated by
the Initiative are referred to simply as “the guidelines”; the
encoding scheme specified by the guidelines is referred to as “the TEI
scheme” to distinguish it from other encoding schemes extant or
prospective.  “Encoding scheme” and “markup scheme” are here
used interchangeably; the term “tag set,” which conveys a different
sense, is sometimes also used since the expectation is that the TEI
markup scheme will consist largely of a set of SGML tags together with
an account of their interrelationships and meanings.
</p></div>


<div xml:id="b2b1b3b3b3">
<head>The Poughkeepsie Principles
<lb/>
Closing Statement of Vassar Conference
<lb/>
The Preparation of Text Encoding Guidelines</head>
<dateline>Poughkeepsie, New York: 
13 November 1987</dateline>
<list type="ordered"><item>The guidelines are intended to provide a standard format for
    data interchange in humanities research.</item>
<item>The guidelines are also intended to suggest principles for
    the encoding of texts in the same format.</item>
<item>The guidelines should
    <list type="ordered"><item>define a recommended syntax for the format,</item>
<item>define a metalanguage for the description of text-encoding
        schemes,</item>
<item>describe the new format and representative existing schemes
        both in that metalanguage and in prose.</item>
</list></item>
<item>The guidelines should propose sets of coding conventions
    suited for various applications.</item>
<item>The guidelines should include a minimal set of conventions
    for encoding new texts in the format.</item>
<item>The guidelines are to be drafted by committees on
    <list type="ordered"><item>text documentation</item>
<item>text representation</item>
<item>text interpretation and analysis</item>
<item>metalanguage definition and description of existing and
        proposed schemes,</item>
</list>
    coordinated by a steering committee of representatives of the
    principal sponsoring organizations.</item>
<item>Compatibility with existing standards will be maintained as
    far as possible.</item>
<item>A number of large text archives have agreed in principle to
    support the guidelines in their function as an interchange
    format.  We encourage funding agencies to support development of
    tools to facilitate this interchange.</item>
<item>Conversion of existing machine-readable texts to the new
    format involves the translation of their conventions into the
    syntax of the new format.  No requirements will be made for the
    addition of information not already coded in the texts.</item>
</list><p>
The principles agreed upon at the Poughkeepsie Planning Conference are
expounded in more detail and supplemented with other material in the
sections which follow.
</p></div>

<div type="div" xml:id="b2b1b3b3b5">
<head>Purpose of the Guidelines</head>
<div type="div" xml:id="b2b1b3b3b5b1">
<head>Guidance for New Encodings</head><p>
Points 2, 4, and 5 of the Poughkeepsie Principles mandate that the
guidelines should simplify the tasks facing projects to encode
new texts in machine-readable form by making it unnecessary for such
projects to design an encoding scheme from scratch.  By recommending a
standard minimum set of textual features commonly found useful, the
guidelines should help raise the quality and ensure the re-usability of
new encodings; sets of special-purpose tags for specific research
disciplines should make it easier for independent projects in those
disciplines to exchange data and results.
</p><p>
To help those who encode new texts, the guidelines must provide
<emph>guidance</emph> for researchers who might otherwise be perplexed
at some of the complications of machine-readable texts and encode
unnecessary textual features at the cost of omitting features which
prove more desirable.  The guidelines should reduce, not increase, the
perplexity of deciding what to encode.</p>
</div></div>
<div type="div" xml:id="b2b1b3b3b5b2">
<head>Common Interchange Format</head><p>
Principles 1, 4, 8, and 9 direct that the guidelines must be suitable
for the interchange of encodings among sites using different schemes.
This should be of assistance to data archives, their borrowers, and even
to software developers who can rely on this interchange format as a
documented interface between their software and its textual data.
</p><p>
For interchange, it must be possible to translate from any existing
scheme for text encoding into the TEI scheme without loss of
information.  All distinctions present in the original encoding must be
preserved.  Additionally, conventions used in the original encoding
should be documentable within the interchange format.
</p><p>
When the TEI scheme is used as an interchange format for pre-existing
encodings, the recommendations for minimum tagging mandated by Principle
5 do not apply.  As Principle 9 makes clear, translation into the TEI
scheme should not be construed as requiring the addition of any new
information not present in the original encoding.
</p></div>
<div type="div" xml:id="b2b1b3b3b5b3">
<head>Documentation of Existing Markup Schemes</head><p>
The guidelines will include descriptions of selected existing markup
schemes contrasted with the TEI scheme.  This will help clarify the new
scheme for those familiar with existing schemes; it should also assist
users confronted with data encoded in existing schemes, and those who
must translate encodings from one scheme to another.
</p><p>
In addition to an informal description in prose, the guidelines will
also provide a formal description of selected existing schemes in a
metalanguage to be prepared for the purpose.  The use of a formal
metalanguage will, it is hoped, encourage rigorous documentation of
existing schemes, and may make possible the automatic production of
software to translate data from formally documented schemes into the TEI
scheme.
</p></div></div>


<div type="div" xml:id="b2b1b3b3b7">
<head>Design Goals</head><p>
The following design goals are to govern the choices to be made by the
working committees in drafting the guidelines.  Higher-ranked goals
should count more than lower-ranked goals.  The guidelines should
<list type="ordered"><item>suffice to represent the textual features needed for research</item>
<item>be simple, clear, and concrete</item>
<item>be easy for researchers to use without special-purpose software</item>
<item>allow the rigorous definition and efficient processing
        of texts</item>
<item>provide for user-defined extensions</item>
<item>conform to existing and emergent standards</item>
</list>
For the most part, the design goals are self-explanatory, but some
commentary is in order.</p>
<p>
The TEI, as an undertaking of the research community, is responsible
primarily to that community for creating encoding practices adequate to
research needs.  Since researchers do use commercial software and
publish their results, the practices and needs of commercial software
developers and publishers must also be considered, but they are not to
outweigh the needs of textual research.<ref
xml:id="ref-to-b2b1b3b3b7b2b1" target="#b2b1b3b3b7b2b1">[1]</ref>
</p><p>
Research work requires above all the ability to define rigorously (i.e.
precisely, unambiguously, and completely) both the textual objects being
encoded and the operations to be performed upon them.  Only a rigorous
scheme can achieve the generality required for research.  A rigorously
defined encoding scheme can also allow many text-management tasks to be
automated.
</p><p>
For a scheme to be adopted by the research community, it must be clear,
concrete, and easy to use.  Otherwise, it will simply be ignored.
</p><p>
Since research necessarily involves the asking of questions that have
not been asked before, a research-oriented encoding scheme must also be
extensible.  Some measure of extensibility, then, is an absolute
requirement for the TEI markup scheme.  As a design goal,
“extensibility” refers not to this absolute requirement (a solution
for which we can take as a given), but to the ease with which various
portions of the scheme can be fitted with extensions and the ease with
which various possible kinds of extensions can be created.
</p><p>
“Compatibility with existing standards and practice” is to be
sought, but (as its rank suggests) not at the expense of the other design
goals.  The standards most relevant to this goal are SGML and existing
applications of SGML, as well as the standards now being developed for
page description and similar applications.<ref
xml:id="ref-to-b2b1b3b3b7b6b3" target="#b2b1b3b3b7b6b3">[2]</ref>
The Text Encoding Initiative will develop a conforming SGML
application, if it can meet the needs of researchers by doing so.
Where research needs require constructs unavailable with SGML, however,
research must take precedence over the standard.
</p><p>
The Initiative is not committed to using the full range of constructs
available with SGML; the metalanguage committee is responsible for
assessing the guidelines' compatibility with commonly available
software.
</p></div>


<div type="div" xml:id="b2b1b3b3b9">
<head>Scope of the Guidelines</head>
<div type="div" xml:id="b2b1b3b3b9b1">
<head>Activities Covered</head><p>
The guidelines should contain explicit recommendations for
<list><item>the encoding of new texts (which textual features should be
    captured, as a minimum, and how they should be marked)</item>
<item>the addition of new information or corrections to existing
    encodings</item>
<item>the interchange of existing encodings</item>
<item>archival documentation of encodings</item>
<item>text and encoding documentation for purposes of bibliographic
    control</item>
<item>documentation of selected markup schemes in terms of the
    recommended scheme</item>
</list>
</p></div>
<div type="div" xml:id="b2b1b3b3b9b2">
<head>Types of Research</head><p>
Ultimately, the guidelines should support work in any discipline based
on textual material.  Pragmatically, this goal exceeds anyone's capacity
right now.  The first published version of the guidelines should provide
explicit guidance for encoding the textual features of interest to those
disciplines most commonly using machine assistance:  the more
computationally oriented branches of linguistics; lexicography;
thematic, metrical, and stylistic studies; historical editing; content
analysis.  Discipline-related work will concentrate first on linguistic
issues.  Later drafting will extend the guidelines to problems specific
to literary studies, historical research, and other textual disciplines.
</p></div>
<div type="div" xml:id="b2b1b3b3b9b3">
<head>Text Types</head><p>
The guidelines should provide explicit guidance for the text types most
commonly encountered in textual research; esoteric genres may be left
for user-defined extensions.  The first drafting cycle will concentrate
on simple forms (simple nonfiction, unillustrated prose narrative,
poetry, plays) and basic reference works (notably monolingual and
multilingual dictionaries).  Some attention will be paid to all text
types represented in the major linguistic corpora.
</p></div>
<div type="div" xml:id="b2b1b3b3b9b4">
<head>Languages and Scripts</head><p>
The goal of the Initiative is to devise and document encoding methods
appropriate to every language used officially or studied
extensively with machine assistance in Europe and North America.  Since
character-encoding problems grow progressively more complex and
progressively more dependent on hardware configurations as the script
diverges from the left-to-right alphabetic pattern of English, the
Initiative will attempt first to address the basic problems of
left-to-right alphabetic languages, progressing from there to
right-to-left and other monodirectional scripts, and on to
multidirectional, multi-script texts and texts in non-alphabetic
languages.</p></div></div>


<div type="div" xml:id="b2b1b3b3c11">
<head>Content of the Guidelines</head>
<div type="div" xml:id="b2b1b3b3c11b1">
<head>Level of detail</head><p>
The guidelines will make specific concrete recommendations for
delimiters, separators, tags for use in encoding and interchange, codes
for “special” characters, methods of declaring extensions, and
methods of describing existing markup schemes.  Additionally, specific
descriptions will be provided for the syntax and semantics of a small
set of existing schemes.
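By way of illustration only, a code for a “special” character
could take the form of an SGML entity reference.  The sketch below
assumes the entity names eacute and iuml from the ISO 8879 Latin 1
entity set; that choice is an assumption made for this sketch, not a
recommendation of the guidelines.
<eg><![CDATA[<!-- hypothetical fragment: accented letters written as entity references -->
<p>caf&eacute; na&iuml;ve</p>]]></eg>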
</p><p>
The recommendations for interchange among users will take into account
the character sets now in use in academic computing environments and the
problems of inter-machine transfer.<ref xml:id="ref-to-b2b1b3b3c11b1b2b1" target="#b2b1b3b3c11b1b2b1">[3]</ref>
At the same time, the guidelines will be device independent and
will neither rely upon nor discuss specific hardware or software.</p></div>
<div type="div" xml:id="b2b1b3b3c11b2">
<head>Form</head><p>
The guidelines will to the extent possible take the form of sets of SGML
tags and attributes.  Examples will be provided.  Formal document
type definitions will be prepared if they promise to make the guidelines
appreciably clearer or more useful.</p></div></div>


<div type="div" xml:id="b2b1b3b3c13">
<head>Structure of the Guidelines</head>
<div type="div" xml:id="b2b1b3b3c13b1">
<head>Kernel / Additions</head><p>
The guidelines will provide for
<list><item>the encoding of the text itself</item>
<item>the documentation of the text's source</item>
<item>the documentation of the encoding itself and its peculiarities</item>
</list>
</p><p>
The provisions of the guidelines can be divided into a central core
or kernel of principles and tags applicable to all texts or to the
great majority of texts (“general-purpose tags”) and various sets
of tags or encoding conventions applicable to texts in specific
languages or scripts, texts of specific text types, or texts encoded for
specific disciplinary purposes (“special-purpose tags”).  Within
each set of tags, distinctions can be made between recommended and
optional practices, but only general-purpose tags will be recommended
for all texts.  Each set of tags devised for a specific language,
script, text type or discipline may itself comprise a kernel of common
tags and one or more sets of optional tags extending the kernel.  When
these sets of tags are used consistently in groups, the encoding
practice of individual texts can be described by listing the tag sets
used (e.g. “encoded with basic set of general-purpose tags plus level
3 of metrical and level B of lexical tags”).</p></div>
<div type="div" xml:id="b2b1b3b3c13b2">
<head>Draft Table of Contents</head><p>
A draft table of contents for the guidelines follows:
<list><item>1  Principles of Text Encoding
    <list><item>1.1  Why Markup Is Necessary at All
             (A brief discussion about functions of descriptive markup,
             why it is not presentational, etc.)</item>
<item>1.2  The Advantages of Standardized Markup</item>
</list>
    </item>
<item>2  About These Guidelines
    <list><item>2.1  Intended Applications
             (Database/retrieval/analysis as well as printing and
             formatting.  Research community rather than commercial.
             Relevance to language industries.)</item>
<item>2.2  Design Principles
             (How features are defined and described in the Guidelines.)</item>
<item>2.3  Structure of the Guidelines
             (Base features, optional features, "boxes" and their
             base and optional features or "levels of description".
             Document prolog and document body.)</item>
</list>
    </item>
<item>3  SGML Markup
    <list><item>3.1  Principles and Definitions
             (Introduction to SGML:  tags, elements, content models,
             document type declarations.  Alternatives to DTDs.
             SGML declarations in general.)</item>
<item>3.2  SGML Declarations for the TEI Guidelines
             (Description of SGML features used in TEI Guidelines,
             text of formal SGML Declaration for TEI texts.)</item>
<item>3.3  Non-SGML Declarations for TEI Texts
             (How to declare what tags you have used.  How to declare
             use of specific levels of description, substitution of tag
             names, etc.  Cross-reference to later sections for details
             of declarations for pre-defined material; cross-reference
             to chapter 9 for full details on declaring modifications
             and extensions.)</item>
</list>
    </item>
<item>4  Characters and Character Sets
    <list><item>4.1  Principles and Definitions
             (Characters, character sets, character repertoires.
             What is ASCII.  Seven-bit and eight-bit ASCII.
             Standard ways of extending ASCII.  Vendor-specific
             non-standard extended ASCIIs.  SGML-supported
             character sets.  EBCDIC.  IBM PC character set.
             Macintosh, Mac extensions by other vendors.  Adobe
             Postscript.  Transliteration schemes.  Entity
             references.)</item>
<item>4.2  Recommendations
             (For character sets, names of ISO sets and EBCDIC code
             pages.  Possibly include recommended transliterations,
             sample USEMAP and CHARSET declarations, etc.?  For
             entities, simply list recommended name, description and
             appearance of various special characters.  Possibly
             relegate detailed lists and code-pages to appendices.)
    <list><item>4.2.1  Recommended Character Sets</item>
<item>4.2.2  Recommended Entity Names</item>
<item>4.2.3  Declaring New Character Sets or Character Entities</item>
</list>
    </item>
</list>
    </item>
<item>5  Bibliographic Control of Electronic Texts
    <list><item>5.1  Principles and Definitions</item>
<item>5.2  Recommended Features and Tags
             (Bibliographic identification of machine-readable text.
             Bibliographic identification of source text(s).
             Documenting changes to the source text during pre-editing
             or data entry.  Documenting changes to the machine-readable
             text.)</item>
<item>5.3  Correspondence between Recommended Tags and MARC fields</item>
</list>
    </item>
<item>6  Features Common to All Texts
    <list><item>6.1  Principles and Definitions</item>
<item>6.2  Recommended Features and Tags
    <list><item>6.2.1  Basic Text Structure
               (Front matter, body, back matter, chapters, sections,
               etc. down to paragraph level.)</item>
<item>6.2.2  Non-structural Text Segments
               (Features below paragraph level, including highlighting,
               emphasis, quotation, index entries, special layout,
               language and script, illustrations ...)</item>
<item>6.2.3  Figures and Tables</item>
<item>6.2.4  Bibliographic References</item>
<item>6.2.5  Critical Apparatus</item>
<item>6.2.6  Parallel Texts</item>
<item>6.2.7  Cross Reference and Textual Links</item>
</list>
    </item>
</list>
    </item>
<item>7  Features for Specific Text Types
    <list><item>7.1  Principles and Definitions</item>
<item>7.2  Recommended Features and Tags
    <list><item>7.2.1  Mixed Corpora</item>
<item>7.2.2  Literary Texts</item>
<item>7.2.3  Technical and Scientific Texts</item>
<item>7.2.4  Historical Documents</item>
<item>7.2.5  Dictionaries and Lexica</item>
<item>7.2.6  Transcripts of Spoken Texts</item>
</list>
    </item>
</list>
    </item>
<item>8  Analytic and Interpretive Features
    <list><item>8.1  Principles and Definitions</item>
<item>8.2  Recommended Features and Tags
    <list><item>8.2.1  Syntactic Features</item>
<item>8.2.2  Morphological Features</item>
<item>8.2.3  Phonological Features</item>
<item>8.2.4  Lexical Features</item>
</list>
    </item>
</list>
    </item>
<item>9  Extending the Guidelines
    <list><item>9.1  Modifying the Guidelines
             (Substituting short forms or different names for tags or
             attributes.  Using a tag with a different meaning.
             Changing legal attribute values.  Restricting where tags
             can occur; allowing tags to occur in new places; doing
             away with syntactic restrictions altogether.)</item>
<item>9.2  Defining Additional Features
             (Adding new tags:  defining where they can occur, what
             can occur inside them, and what they mean.  Defining new
             attributes for old or new tags.)</item>
<item>9.3  Worked Example</item>
</list>
    </item>
<item>10  Full Alphabetical List of Features
            (A summary for each recommended tag, giving
            name, definition, description, associated features and
            brief example of usage)</item>
<item>11  Translation Table
            (shows equivalent name in each EC language for every
            name listed in sections 3.2.2 and 9.  Omit in 1990?)</item>
<item>12  Use of the Guidelines for Document Interchange
            (Portability.  Different needs of document capture, storage,
            processing, and interchange.  Reducing danger of character
            set confusions during document interchange.)</item>
<item>Appendices
<list><item>A  How the Guidelines were Developed
           (brief note about TEI structure and history)</item>
<item>B  Mapping from the Guidelines to Other Encoding Schemes
           (Translating into the Guidelines from existing encoding
           schemes.  Translating back out.)</item>
<item>C  Examples of Tagged Texts</item>
</list>
</item>
</list>
</p><p>
The editors, in consultation with the committee heads and the Steering
Committee, will have primary responsibility for sections 1, 2, 12, and A
(Principles of Text Encoding, About the Guidelines, Document Interchange,
and History of the TEI).  They will assemble section 10 from the
work of the committees.
</p><p>
The Committee on Text Documentation will have primary responsibility for
section 5 (Bibliographic Control) and will consult with the Committee
on Metalanguage and Syntax Issues on the overall organization of
section 3.3 (Non-SGML Declarations).  They will advise the Text
Representation committee on section 6.2.4 (Bibliographic References).
</p><p>
The Committee on Text Representation will have primary responsibility
for sections 4 (Character Sets), 6 (Features Common to All Text Types),
and 7.2.1 through 7.2.4 (Mixed Corpora, Literature, Technical and Scientific
Documents, Historical Documents), as well as any other sections inserted
into section 7 on further text types.  They should consult with the
Committee on Text Documentation on section 6.2.4 (Bibliographic
References) and will partially determine the content of declarations
relevant to their subject domain (section 3.3).
</p><p>
The Committee on Text Analysis and Interpretation will have primary
responsibility for sections 7.2.5 and 7.2.6 (Dictionaries, Spoken Texts) and
section 8 (Analytic and Interpretive Tags).  They will contribute also
to section 3.3 (Non-SGML Declarations).
</p><p>
The Committee on Metalanguage and Syntax Issues will have primary
responsibility for sections 3, 9, and B (SGML, Extensions, and Mapping
to Other Schemes).  They will collaborate with the Committee on Text
Documentation on section 3.3 (Non-SGML Declarations).
</p><p>
All four working committees will contribute to sections 10 (Full List
of Features) and C (Examples).</p></div>
<div type="div" xml:id="b2b1b3b3c13b3">
<head>Prescription and Description</head><p>
Compliance with the guidelines is necessarily a voluntary matter;
use of the term “requirement” in connection with the guidelines
must therefore not be misconstrued.  Within the context of the
guidelines, however, the committees will be able to specify a mix of
requirements, recommended practices, optional features and practices,
required choices among defined alternatives (“electives”), and
possible user-defined extensions.  Provision will also be made for
documentation of user-specified deviations from the recommendations of
the guidelines.</p></div></div>


<div type="div" xml:id="b2b1b3b3c15">
<head>Specific Design Issues</head>
<div type="div" xml:id="b2b1b3b3c15b1">
<head>Nomenclature</head><p>
For clarity, each generic identifier (“tag name”) should be a full
natural-language word or phrase in English, French, or Latin.  Working
committees may if they wish define full sets of tag names in more than
one language.
</p><p>
To avoid collisions, abbreviations must be set centrally; working
committees may recommend abbreviations but should not rely on them.</p></div>
<div type="div" xml:id="b2b1b3b3c15b2">
<head>Attributes vs. Elements</head><p>
Any SGML tag set including elements with attributes can be rewritten as
a tag set including no attributes, only elements (one for each original
element or attribute).  This makes it unnecessary to decide whether a
given textual feature should be expressed as a tag or an attribute, and
simplifies the design process.  It also simplifies the processing of the
text stream.
For these reasons, some SGML applications
(e.g. the AAP tag set and the SGML-like processors of the Centre for the
New Oxford English Dictionary at the University of Waterloo) eschew
attributes entirely.
</p><p>
Attributes, on the other hand, express more clearly than separate
element tags the association of information units in a text.  An element
called “chapter” with attributes of “number” and “title”
clearly displays the one-to-one relationship holding among chapter,
chapter number, and chapter title.  A set of three separate elements, on
the other hand, obscures the one-to-one relationship for the human
reader.  This is true even if the document type definition enforces a
one-to-one relation among the three elements.
The use of attributes also permits a restriction to be made on the
content of a particular feature (i.e. the legal values of an attribute)
which may be useful in some situations.
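As a minimal sketch of the two styles, the chapter example might be
written either way.  The element and attribute names, the sample title,
and the attribute declaration below are illustrative assumptions, not
tags proposed by the Initiative.
<eg><![CDATA[<!-- Attribute form: number and title are carried on the chapter tag itself -->
<chapter number="3" title="Purpose of the Guidelines">
  <p>Text of the chapter ...</p>
</chapter>

<!-- Element-only form: the same information as three separate elements -->
<chapter>
  <number>3</number>
  <title>Purpose of the Guidelines</title>
  <p>Text of the chapter ...</p>
</chapter>

<!-- Hypothetical SGML declaration restricting the legal values of an attribute -->
<!ATTLIST chapter number NUMBER #REQUIRED
                  title  CDATA  #REQUIRED>]]></eg>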
The tradeoffs between simplicity of parsing and clarity of notation
are still being explored; no decision has yet been reached.</p></div>
<div type="div" xml:id="b2b1b3b3c15b3">
<head>Character Sets and Transliteration</head><p>
Standard transliterations will be documented where available.  Standard
character sets (including those registered under ISO registration
procedures) will be documented if they are useful.</p></div>
<div type="div" xml:id="b2b1b3b3c15b4">
<head>Descriptive Markup</head><p>
Descriptive markup will be preferred to procedural markup.  The tags
should typically describe structural or other fundamental textual
features, independently of their representation on the page.  In some
cases, however, the physical appearance of the original text carrier is
the primary object of interest. In others, there may be no consensus as
to the meaning of all aspects of the text's physical appearance, which
must therefore be represented as explicitly as possible.  In neither
case however should the primary purpose of the markup be viewed simply
as the reproduction of the appearance of the original.</p></div></div>


<div><head>Notes</head>
<note xml:id="b2b1b3b3b7b2b1" target="#ref-to-b2b1b3b3b7b2b1">[1] Commercial and
    research interests do not, in any case, always conflict.  Both
    are best served by an intellectually adequate analysis of
    textual problems and their representation.  Very few problems in
    the research area lack analogues in commercial areas, even
    though in research the problems may occur more often and more
    forcefully.</note>
<note xml:id="b2b1b3b3b7b6b3" target="#ref-to-b2b1b3b3b7b6b3">[2] Also relevant, but
    less difficult to accommodate, are the national and
    international standards for data interchange, character sets,
    character names, etc., and the standards governing library
    cataloguing, dataset description, transliterations, etc.</note>
<note xml:id="b2b1b3b3c11b1b2b1" target="#ref-to-b2b1b3b3c11b1b2b1">[3] This means that problems
    of ASCII-EBCDIC translation, and the limitations of the
    ASCII-EBCDIC translations in common use, will be specifically
    addressed; the interchange character set should be the set of
    all characters consistently and reversibly translated by all
    such translation programs.  The recently developed 190-character
    extensions to ASCII and EBCDIC will also be discussed.</note>
</div></body>
  </text>
</TEI>
