Date: Thu, 19 Oct 89 19:29:17 CST
From: "Robin C. Cover"
Subject: ASCII FILE TEI TRW2
To: "Dr. C.M. Sperberg-McQueen"

===================================================================
For anyone who prefers easier reading, here are conversions for some
(American ascii) hi-bit characters & standard print modes:
   a-umlaut = a/"       o-umlaut = o/"       u-umlaut = u/"
   e-acute = e/'        e-circumflex = e//
   u-circumflex = u//   i-circumflex = i//
   bold string          italic string        <>
===================================================================

Draft Copy, Working Paper                      3909 Swiss Avenue
Copyright (c) Robin C. Cover (October 1989)    Dallas, Texas 75204
TEI, Text Representation Committee             zrcc1001@smuvm1.BITNET
TEI TRW2 (Revision .9; October 16 1989)        attctc!utafll!robin.UUCP

SPECIAL CHALLENGES FOR THE ENCODING OF RELIGIOUS TEXTS

INTRODUCTION

The following is a draft copy of a working paper for the TEI Text Representation subcommittee. My assignment was to describe some of the special concerns for encoding of "religious" texts of antiquity and modern times. I employ the terms "encoding" and "markup" as nearly synonymous, but occasionally reserve the former as a broader term: (a) to include the recording of a wide range of information about copy texts; (b) to remain non-committal about the propriety or possibility of using descriptive markup [mnemonic tags stored within the "flat" text file] as the specific means of encoding.<<1>>

SACRED TEXTS

Within the domain of the world's "religious" texts, sacred texts must hold a privileged place. It is my suspicion (though perhaps just an unfounded complaint) that literary religious texts constitute some of the more complex kinds of documents in world literature, especially when we reckon with the phenomenon of "virtual" documents within the sphere of canonical/sacred literature. I will attempt to prove this claim with a survey of the special problems encountered in encoding and/or markup of sacred texts.
I have isolated several categories of interrelated features of sacred texts, and supply descriptions of some implications for encoding and markup.

FACTOR 1. SACRED TEXTS ARE WRITTEN IN ANCIENT LANGUAGES: THE CONCOMITANT DOMINANCE OF NON-ROMAN, NON-ALPHABETIC AND LIGATURED SCRIPTS.

Minutes of the first meeting of the TEI Text Representation Committee (Lou Burnard, TEI TR M 1; University of Toronto, June 6, 1989) indicated that during the first [two-year] phase of the TEI's work, only alphabetic scripts and the nine official languages of the modern European community (Danish, Dutch, English, French, German, Greek, Italian, Portuguese, Spanish) would be embraced, with consideration of Slavic, ancient Greek and Latin as highly desirable.

A brief survey of religious populations in the modern world and their sacred literatures reminds us that this initial agenda of the TEI will not take us very deeply into "religious" texts. The world's three "revealed" religions are Judaism, Christianity and Islam. The historic Jewish scriptures are written in Hebrew and Aramaic; Orthodox Christian scriptures were written in Hebrew and Aramaic (together amounting to about 75% of the Christian Bible) and Greek; the Islamic scriptures of the Qur'an are written in Arabic. Hindu, Buddhist and other Far Eastern scriptures are also largely disqualified under TEI Phase-I restrictions, by language and script.

For reasons discussed later, it would only complicate matters to suggest that English (German, French) translations of sacred texts be targeted for encoding as a first step. The suggestion that translations would be "easier to deal with" would be relevant only if we ignored the fact that the derivative texts are translations, and it would fail to address the real needs of textual scholarship (focus on original-language texts). Thus, the remainder of this paper discusses problems which may be more germane to Phase-II work of the TEI.
Many issues of text encoding for these ancient languages/scripts have yet to be addressed, and in my judgment will require the cooperation of special advisory teams from several relevant professional societies. If the TEI intends to provide guidance for orientalists in the encoding of sacred texts of the distant past (religious literature of the ancient Mediterranean, Middle East, etc.), then cooperation must be sought from scholars who treat cuneiform, hieroglyphic and other forms of ideographic writing. Given the vast corpus of religious literature in Sumerian, Babylonian, Assyrian, Hittite, Elamite, Egyptian, Ugaritic, Aramaic (etc.), it would be disappointing if the guidelines of the TEI were not sufficiently general and extensible to provide guidance for text encoding in these fields. Computers are already being used for textual analysis in the study of literature in these "dead" languages, but there is a desperate need for encoding standards, especially at the lowest strata of the writing systems. Since the ISO documents governing international character sets do not cover most of these ancient languages and scripts, I believe the TEI should accept responsibility for helping coordinate standards efforts within the specialized professional societies.

FACTOR 2. SACRED TEXTS WERE TRANSLATED IN ANCIENT TIMES INTO FOREIGN LANGUAGES: THE COMPLEXITY OF CRITICAL EDITIONS AND INTERLINEAR FORMATS.

Because sacred texts were authoritative, popular, and held a central position in ancient scribal curricula, they were usually translated in antiquity. The fact of translation is inconsequential in itself, but becomes significant for encoding in that critical apparatuses for sacred texts are highly complex, and sometimes force us to deal with ancient interlinear texts. The special typographic and text-geographic problems of interlinear formats are discussed under Factor 7 below.
The complexity of the critical editions is heightened for sacred texts because: (a) surviving scriptures, in whole or part, are sometimes attested only in derivative translation languages which need to be mapped onto each other and onto a presumed/reconstructed original-language text; (b) printing or screen display of variants in the textual apparatus necessitates the use of symbols which imply the method of reconstruction of an eclectic text through retroversion; (c) printing or screen display of the apparatus sometimes requires (for paleographic reasons) that multiple ancient languages be displayed in their native scripts. For example, critical editions of the Hebrew Bible will include witnesses in the languages and scripts of Greek, Ethiopic (syllabic script), Armenian (non-Roman script), Arabic (ligatured script), Coptic (non-Roman script), Syriac (several ligatured scripts and dialects), etc.

If the representation of textual witnesses in these ancient languages/scripts in printed critical editions is a very demanding task, the markup for representing document structure and textual relationships in electronic critical editions will be an order of magnitude more difficult. I refer to two primary issues: (a) the fact that electronic critical editions useful for data analysis should support in the encoding more information than the simple typography of critical editions represents; (b) translating the cryptic syntax of sigla used in printed critical editions into intelligent encoding is itself highly complicated.

The complexity of critical editions as a function of ancient translation can be further elucidated as follows: modern retroversions (back-translations) must be tagged to designate level of confidence, formal versus semantic equivalence, ancient transliterations, etymological (mis)interpretation, homiletical/expositional features of translation variants versus linguistic/exegetical mappings, etc.
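No standard exists for the tagging requirements just listed; purely as a sketch of the kind of information involved, one might model a single retroverted reading along the following lines. The field names and category values are my own illustration, not a TEI proposal, and the sample data is invented:

```python
from dataclasses import dataclass, field

# Hypothetical category values, following the list above: level of
# confidence in the retroversion, and the kind of equivalence involved.
CONFIDENCE = ("certain", "probable", "conjectural")
EQUIVALENCE = ("formal", "semantic")

@dataclass
class RetrovertedReading:
    witness: str            # siglum of the attesting witness
    language: str           # language of the witness, e.g. "Greek", "Syriac"
    surface_text: str       # the reading as it stands in the witness
    retroversion: str       # proposed original-language back-translation
    confidence: str         # one of CONFIDENCE
    equivalence: str        # one of EQUIVALENCE
    notes: list = field(default_factory=list)  # e.g. "ancient transliteration"

# A hypothetical apparatus entry (all data invented for illustration):
reading = RetrovertedReading(
    witness="G", language="Greek",
    surface_text="(a Greek variant)",
    retroversion="(a retroverted Hebrew form)",
    confidence="probable", equivalence="formal",
    notes=["etymological (mis)interpretation suspected"])
assert reading.confidence in CONFIDENCE and reading.equivalence in EQUIVALENCE
```

Whether such records belong in descriptive markup at all, or in a database proper, is exactly the question raised under Factor 2's discussion of SGML below.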
When sacred texts have come to us in interlinear bi-lingual format (as in the popular Sumerian-Akkadian bilingual tradition), the relationship between source and target texts must be evaluated text-critically and appropriately marked-up in the encoding. The genetic filiations of texts ("parent" and "daughter" textual generations) and textual readings within language groups (or across language groups) must be represented in the encoding if the stemmatic relationships or textual affinities are understood. To the extent that encoding of critical editions should support the database and "hypertextual" features alluded to here, markup theory and implementation would appear very demanding. If the complexity of a sacred text's critical apparatus derives from the fact that textual evidence comes from multiple languages across a wide span of time, it is exacerbated by the sheer mass of textual data. It may surprise humanities scholars who have never used critical editions of sacred texts that a printed page of a critical edition may contain 10% "text" and 90% critical apparatus, as measured by character count; the Isaiah volumes from the Hebrew University Bible Project reflect this situation.<<2>> A recent critical edition of the Gospel of Luke contains over 500 pages and a 1:30 ratio of text to apparatus.<<3>> Even if the deepest complexity for encoding of critical texts will be met in the attempt to create tags (with implied syntax) for designating links and relationships of many kinds within the textual apparatus, the surfeit of data will add to the challenge. The standard scholarly edition of the printed Hebrew Bible (Biblia Hebraica Stuttgartensia [BHS]) supplies an ominous example of the complexity of markup for electronic critical editions which implicate multiple languages among their textual witnesses. 
Although the critical apparatus of the BHS is spartan in the extreme, and the main text represents merely a medieval codex (not a diplomatic edition),<<4>> the structure of this Bible is nevertheless very complicated. A 100-page monograph has been written from a generative linguistics perspective just to help students understand the syntax and semantics of the textual apparatus.<<5>> Most critical editions of Bibles contain far more textual data in the critical apparatus than does BHS (e.g., the volumes of critical text for Isaiah in the Hebrew University Bible Project contain approximately ten times more textual data than does BHS); they frequently contain multiple levels in the apparatus (as did the predecessor to BHS, the BHK); they sometimes contain additional syntactic ambiguities by reason of more text-critical sigla; and they are otherwise more complex than BHS. Thus, the critical text of the Hebrew Bible may not represent the most pathological case: encoding critical editions of the Greek "Septuagint" tradition (the Old Greek itself being used as one major witness in reconstructing the textual history of the Hebrew Bible) may be equally challenging in that more precise stemmatic and family relationships can be identified in the relatively greater wealth of textual data.

But suppose the Hebrew Bible were the most pathological case: should it be avoided (in the priorities of the TEI) because it is so complicated? The Hebrew Bible constitutes holy scripture for three primary world religions which account for more than 50% of the earth's religious population. In my judgment, the TEI project ought to be able to embrace encoding of these sacred texts in critical editions and other formats without embarrassment.
On the other hand, the challenges involved in "markup" of critical editions of sacred texts appear immense: my current perspective and understanding of encoding through descriptive markup do not prepare me to visualize how the introduction of inline tags will serve the interests of the text critic. So I question whether critical editions ought to be "marked up" at all. I would defer to the judgment of scholars who have produced print copies of critical editions of sacred texts, and to those who currently employ text-critical databases in research (e.g., the CATSS project at the University of Pennsylvania/Hebrew University, Jerusalem;<<6>> research at the University of Tu/"bingen in the Abteilung fu/"r literarische und dokumentarische Datenverarbeitung, Zentrum fu/"r Datenverarbeitung<<7>>), but several issues are worthy of evaluation. It must be borne in mind that for editing critical editions of sacred texts, not just a few, but hundreds or thousands of manuscripts may be involved.<<8>> I highlight here the three most important issues:

(1) Are standard critical apparatuses in print format the most useful means of viewing textual alliances and textual history? Probably not: they are convenient, but in semantic/syntactic density they are a concession to the limitations of space on the paper medium. Reference was made above to the BHS critical apparatus, which, despite its recent vintage, is regarded as a disappointment by many scholars and most students. The omission of important textual evidence and the extremely cryptic syntax are the most commonly-felt frustrations, but both inadequacies are a function of the need to print the Hebrew Bible in a single volume. Encoding a cryptic apparatus like that of BHS makes sense only if we want to perpetuate that inadequate tradition in the electronic medium.
The advantage of electronic data storage is that it's cheap and portable; the advantage of "hypertext" is that one can dynamically allocate electronic "page space" (screen space) to a critical apparatus as needed. One can "pop up" a perspicuous and more complete critical apparatus which fills 80% of the screen, reserving just a few lines for the display of the running source text. With flexible software, one could generate different kinds of critical apparatuses, each containing different content or composed in different views. Would it not be preferable to dynamically build better critical apparatuses with software rather than encoding deficient ones from print media?

(2) A second concern relates to the data structure under SGML. Of what obvious "database" use is an elaborate critical apparatus "marked up" with codes that will render it completely unreadable to mortal eyes?<<9>> I assume that "tags" used in a critical apparatus must reflect clearer syntax (e.g., kinds of relationships) than do cryptic and ambiguous text-critical sigla -- sigla that even humans sometimes cannot parse without making reference to the actual witnesses. Is this not a contradiction in terms for descriptive markup? More important still: if we wish to query the text-critical database in sophisticated ways, is a "marked up" flat text file the appropriate database format? I doubt it. While critical apparatuses in print format contain useful compilations and summaries of textual evidence, it would seem preferable to import this information into a real database rather than to "mark it up." Perhaps the two operations are not mutually exclusive (generation of a relational database from an SGML-structured text?) but I am skeptical about the utility of the marked-up format.

(3) The markup of critical editions requires that textual annotations be attached not only to single words, phrases, lines (etc.), but also to discontinuous elements within the text stream.
This phenomenon occurs, for instance, when semantic components in the source or ancient translation (target) text are morphologically discontinuous, or where a single "word" in one text contains several "words" from the perspective of another language, as a function of many kinds of imperfect inter-language mappings. For example, the lack of morpheme breaks in Hebrew results in graphic units like u//me'artsam ("and-from-their-land") being written as one word (a graphic unit bounded by white space), while in Greek the phrase would have five "words." The mappings are thus sometimes one-to-many (also many-to-one), with the possibility that "many" represents discontinuous textual elements. It is not clear to me whether the expressive power of SGML is sufficient or optimal to represent these mappings of discontinuous textual elements; two qualified scholars have expressed a similar misgiving about SGML in connection with dictionary markup.<<10>> Typographic conventions are relatively simple to use in this connection: is this another instance where typesetting for human eyes is easier than specifying relationships between textual elements in the syntax of descriptive markup?

FACTOR 3. SACRED TEXTS WERE COPIED FREQUENTLY, MODERNIZED, EDITED, REDACTED AND OTHERWISE ADAPTED TO THE NEEDS OF RELIGIOUS COMMUNITIES: THE NEED TO TAG EVOLUTIONARY STAGES OF TEXTS.

The implication of frequent copying is that texts in rapid transmission tend to accumulate an abundance of textual corruptions. It is a principle of typological science that applies to virtually all human artifacts: rapid succession of generations yields rapid evolution. Early textual corruption of sacred texts (or simply "textual change," if one wishes to view the process neutrally) worked in concert with other dynamics of conscious change to make "the (sacred) text" a moving target from the very beginning.
Though the rate of change in individual cases of sacred texts varied (varies) as a function of many complex factors, the canonical status and long transmission history per se guarantee a high degree of textual evolution in canonical scriptures. Readers will recognize the obvious fact that markup of a document is simpler when we have just one version, harder when we have more than one version. If we feel obligated to show cognizance of alternate and/or successive versions and the relationships of versions in the encoding of the text, encoding becomes much more demanding. Sacred texts show evolution or "versions" in the extreme, as is well known: the versions are the result of both diachronic and synchronic evolutionary processes. Some resultant difficulties for encoding are as follows:

(1) We rarely have a full, extant copy of any complete version of a sacred text (e.g., few full codices from the first millennium B.C.E.; few "complete" Mesopotamian religious texts from tablets of a single scribe or single archive), so that stages in textual evolution are usually the product of scholarly extrapolation. The method and relative certainty of extrapolation need to be represented in the encoding.

(2) Competing versions in antiquity led to conflations and other kinds of textual contamination.

(3) Despite the existence of variant versions (sometimes in the same ancient archive), sacred texts retain a sense of canonical status, so that they are treated by modern and ancient religious communities as the "same" text. Even if versions are radically different in structure/content, they bear the same title and canonical status; the most problematic consequence of this, for markup, is treated below: a standard referencing scheme is typically applied to all variant versions, even though mapping between versions under such a scheme sometimes becomes impossible or nonsensical.
(4) The existence of varying, more-or-less "parallel/synoptic" versions or recensionally variant sacred texts has led to yet another scholarly creation which is problematic for encoding: the printing of diaglott, polyglot, parallel-column and interlinear formats which (presentationally) facilitate the comparison of these versions. From my perspective, the challenge for TEI is to propose an encoding scheme general enough and powerful enough not only to permit markup of elaborate critical editions, but to describe relationships between versions which vary more than just "textually" -- where they are related, but related recensionally, sharing 90%, 75%, 50%, or 25% in content and/or structure. Some textual phenomena in the tradition of the Hebrew Bible will illustrate the problem. The book of Jeremiah in the official rabbinic tradition (the Masoretic Text), also followed by modern Christian Bibles, is one-seventh longer than that of ancient Greek tradition.<<11>> The shorter version (also known from Qumran manuscripts) appears pristine and primitive from a critical point of view, but classical versification (referencing) schemes are based on the later recension. The biblical book of Job, similarly, is one-sixth longer in the Masoretic tradition. In other cases, biblical texts show alternate arrangement of verses, chapters, or groups of chapters. Due in part to the durability of fired clay tablets, Mesopotamian mythological (= religious) texts show similar, and even wider recensional variations. For example, a Ninevite recension of a myth from the late first millennium may coincide in basic structure with earlier Assyrian, Babylonian, Sumerian and Hittite versions, but share precise content (phraseology) only in 20%-80% of its text.<<12>> Along the spectrum from trivial single textual variant to radical recensionally-variant version, text encoding ought to provide for description of the relationships (historical, textual and otherwise) between versions. 
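One can imagine, again only as a sketch, an encoding record that situates two versions on this spectrum. Everything here (the record shape, the relation labels, the idea of quantifying shared content and structure) is my own illustration of the kind of information such a scheme would need to accommodate; the proportions echo the Jeremiah example above (the Masoretic text being about one-seventh longer than the Old Greek), and the structure figure is a placeholder, not a scholarly claim:

```python
from dataclasses import dataclass

# Hypothetical link record between two versions of "the same" sacred text,
# recording the kind of relationship and rough proportions of shared
# content (phraseology) and shared structure (arrangement).
@dataclass
class VersionLink:
    source: str
    target: str
    relation: str            # e.g. "recension-of", "parent-of", "conflation-of"
    shared_content: float    # rough proportion of shared content
    shared_structure: float  # rough proportion of shared arrangement

link = VersionLink(
    source="Jeremiah (Masoretic Text)",
    target="Jeremiah (Old Greek)",
    relation="recension-of",
    shared_content=6 / 7,    # the shorter Greek text against the longer MT
    shared_structure=0.75)   # placeholder figure, for illustration only
```

Separating the relation type from the two quantitative parameters reflects the point made above: versions may share structure but not content, content but not structure, or be partially orthogonal on both.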
Research on document versioning in hypertext applications may be of value to other TEI sub-committees who work on these problems.<<13>>

FACTOR 4. SCHOLARLY EDITIONS OF SACRED TEXTS EMPLOY NON-CONGRUENT, COMPETING REFERENCING SCHEMES FOR THE SAME OR SIMILAR VERSIONS.

Allusion has already been made to the problem of citation and referencing. I think the problem is larger than that faced in the world of classical studies, where scholars of various periods and national schools use(d) references to competing standard editions which employ variant citation schemes. Neither is the problem simply a matter of using concurrent referencing within a text (reflecting multiple overlapping hierarchical structures, logical or physical), nor of differences between reference markers in editio princeps publications and standard edition(s). In the study of sacred texts, these factors and other factors conspire to make the situation more grave for encoding. To summarize:

(1) Different referencing schemes are employed in editio princeps publications and standard editions of antiquity and modern times (the trivial case).
(2) Similar or identical referencing schemes are applied to versions sharing similar, but not exact, content, so that apparently identical references point to different content.
(3) Citations are frequently given in highly cryptic, abbreviated form, so that misleading or incongruous pointers cannot be predicted from the citation format itself.
(4) Editors and authors even in modern times, when conscious of these problems, cannot enforce consistency in citation schemes.<<14>>
(5) Sacred texts and the scholarship around them employ heavy cross-referencing, so that the citation ambiguities/errors are not occasional, but pandemic.

An example from the biblical Psalter will illustrate the point above: this is the simplest case, where content is almost identical but referencing systems are incompatible.
In high antiquity, biblical texts were written with almost no punctuation or reference markers, often in scriptio continua (lacking even explicit word boundaries, though this is disputed in the case of Hebrew). In the late biblical period, minimal forms of punctuation were introduced (some final forms of characters; spaces; blank lines; colophons). The explicit citation systems eventually introduced in the medieval period were uneven in the case of the Psalter: in the Hebrew text, Psalm superscriptions were versified (one or two verses), whereas in Greek and some derivative traditions, superscriptions were accorded no numbers. Furthermore, divergence at Psalm 9/10 led to the Hebrew chapter numbers being staggered by one number (Hebrew/Greek traditions) until an offsetting event merged the chapter enumeration scheme at chapter 148 of the corpus. Into modern times, biblical scholarship continues to cite the Psalter under both systems, sometimes without clear indication of the intended scheme, and always with frustrating inconsistency. In order to mark up texts referencing or cross-referencing the Psalter (if this markup is to be useful for data analysis such as searching, or for "hypertext" functions), the inherent ambiguities must be resolved. If references to the LXX ("Septuagint Bible") are tagged in a commentary, it must be known what "LXX" means in a given instance. The note "cf. Ps 51:5 (LXX)" may mean several things, and the citation cannot be tagged until the correct target text (content) is contextually identified. It might be argued that the TEI project should not inherit the problems biblical scholars have created for themselves on paper: perhaps not. On the other hand, given that these (cross-) referencing problems amount to an extreme case of a general phenomenon experienced elsewhere in literary studies, the TEI would render an invaluable service by proposing a sound linguistic/computational scheme for resolving these problems. 
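To illustrate what even a partial computational resolution might look like: under the simplified description given above (a one-number stagger running from the divergence at Psalm 9/10 until the enumerations merge again at 148), the chapter-level mapping can be stated as a trivial function. This is a sketch of the idea only; the actual Hebrew/Greek correspondence involves further splits and joins, and the differing treatment of superscriptions means verse numbers diverge as well, so a real resolver would need a full concordance table:

```python
def hebrew_to_greek_psalm(n: int) -> int:
    """Map a Hebrew (Masoretic) psalm number to its Greek (LXX) number,
    under the simplified two-boundary description above.  Illustrative
    only: the full correspondence is more complicated than this."""
    if not 1 <= n <= 150:
        raise ValueError("psalm number out of range")
    if n <= 9 or n >= 148:
        return n      # the two enumerations agree outside the staggered span
    return n - 1      # Hebrew 10..147 run one number ahead of the Greek

# Hebrew Psalm 51 is Greek Psalm 50; the enumerations coincide again at 148.
assert hebrew_to_greek_psalm(51) == 50
assert hebrew_to_greek_psalm(148) == 148
```

The point is not this particular function but the principle: once the intended scheme of a citation is contextually identified, the mapping itself is mechanical and can be applied uniformly in the encoding.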
The solution should embrace not only the problem of variant citation systems, but also the problem of designating mappings between texts that share content but not structure/arrangement, between texts that share structure but not exact content, and between texts partially orthogonal on both parameters. Study of versioning has been done in recent hypertext research and in the version-control of software and documentation; the TEI might benefit from consultation with software-versioning and hypertext experts.

FACTOR 5. SACRED TEXTS HAVE BOTH LITURGICAL (SACRED) AND SCHOLARLY (SECULAR) PURPOSES.

The fact that sacred texts have a life both in the religious community and in the world of scholarship lies at the root of several complex issues; in some cases, the two uses of sacred text are at cross purposes. The implications for text encoding are probably minimal, but need to be studied.<<15>> As liturgical texts, scriptures are continually modernized, adapted, excerpted, redacted, re-compiled and rearranged in accordance with the changing demands of the religious community. For example, many poetic passages and prayers of the Hebrew Bible and Jewish Greek scriptures were excerpted, incorporated into derivative Biblical prose texts, and even circulated independently. Sometimes excerpted corpora underwent their own textual development (as in the biblical Odes and modern editions of lectionaries and prayer books). These connections between parallel traditions (kinds of connections, extent of connections) ought to be indicated in the encoding scheme of the derivative texts. Similarly, encoding ought to represent the re-use or quotation of sacred texts within themselves.

FACTOR 6. SACRED TEXTS ARE INTENSELY STUDIED, ANNOTATED AND CROSS-REFERENCED.

The fact that sacred texts are the objects of intensive scrutiny, extensive annotation and elaborate commentary bears relevance to their encoding.
It could probably be proven with bibliographic or other instruments that sacred texts are more intensively studied than other kinds of world literature. The bulk of primary and secondary literature (textual, linguistic, exegetical, homiletical, theological) thus constitutes the richest documentary arena to be found. Several consequences are evident in scholarly religious literature, but I will highlight two which appear especially pertinent to text encoding: cross-referencing (within books and across volume titles) and the phenomenon of composite-genre study editions.

Intense textual focus on religious texts has led to a scholarly and popular tradition in which printed editions of sacred texts contain elaborate conventions for cross-referencing and annotation. Similar phenomena (scholia, commentary, marginal glossing) are known from antiquity as well. For the purposes of encoding/markup, it would seem wise to consider a means of "typing" annotations and cross-references. Explicit cross-referencing and annotation usually involves placement of a "note" marker in the running text (often supra- or sub-linear) and an associated marginal note of some kind.
Translations of sacred texts, for instance, often contain notes offering the following kinds of information:

* cross-reference to a textual locus within the same document, or to another scriptural passage
  ** cross-reference based upon thematic connection
  ** cross-reference based upon synoptic/parallel tradition
  ** cross-reference to tables, maps or appendices within the volume
* citation of secondary literature (reference tool or commentary)
* alternate translation based upon variant etymology
* word-level gloss or more "literal" translation equivalent
* alternate paraphrastic translation
* alternate translation based upon variant textual witness
* commentary on the translation or textual readings
* warning about textual or lexical uncertainty at a given locus

The kinds of notes vary greatly in a given document, and sometimes multiple concurrent note series will be used to separate annotations of different kinds on different regions of a page (cross-references in a central column, textual notes in right or left margins, commentary notes at the bottom, etc.). Other books employ a single note series, lumping together annotations and cross-references of all kinds into a single format. It would seem highly useful for the TEI to propose a means of encoding (a) note types and (b) a syntax for note links. The notion of typed links is already partially understood in hypertext research, and indeed, is supported in some hypertext software. The word processor Nota Bene supports three concurrent note series (for the production of critical editions, for instance), where style-sheets for each note type are under user control. While a full taxonomy of links (link types) remains to be developed and tested for literary documents, a recent paper by Steve DeRose makes important progress.<<16>>

A related scholarly phenomenon germane to encoding of sacred texts is the use of complex-genre or composite-genre study editions.
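A note-typing scheme of the kind proposed here might, as a first sketch, look like the following. The type names are my own labels for the kinds of notes listed above, not an established taxonomy, and the sample data is invented:

```python
from dataclasses import dataclass

# Hypothetical taxonomy of note/link types, one label per kind of note
# in the list above (labels are my own illustration).
NOTE_TYPES = {
    "xref-locus", "xref-thematic", "xref-synoptic", "xref-appendix",
    "secondary-literature", "alt-etymology", "gloss", "alt-paraphrase",
    "alt-textual", "commentary", "uncertainty",
}

@dataclass
class Note:
    anchor: str      # locus of the note marker in the running text
    note_type: str   # one of NOTE_TYPES
    series: str      # which concurrent note series carries the note
    body: str

note = Note(anchor="(some verse reference)", note_type="xref-synoptic",
            series="central-column", body="(a cross-reference)")
assert note.note_type in NOTE_TYPES
```

Typed in this way, a display program could filter notes by kind, or route each series to its own page region, in the manner of the multi-region layouts just described.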
The Babylonian and Palestinian Talmud, and the Miqra'ot Gedolot commentary format, are well-known examples from Jewish religious literature; popular Christian "study Bibles" are examples from the modern movement of religious lay-education. These documents customarily employ the explicit note- and cross-reference schemes discussed above, but add levels of implicit annotation/linking as well. In the Babylonian Talmud, for example, Aramaic and Hebrew commentary text from various sources flows around a central region of (Hebrew) Mishnah text; keywords, explicit citation formulae and other conventions link the commentary traditions together, to the Mishnah text, and sometimes to the biblical texts referenced in the Mishnah.<<17>> While the spatial geometry of the Talmudic page is very important for the rabbinic scholar (and should be presented with fidelity in an electronic edition), the concern for encoding is that implicit cross-references in this milieu should be explicitly represented in markup. The same could be said for study Bibles, which often use the convention of italic type in the study notes: words in italic might be excerpted from the Biblical text loc. cit., and appearance in italic within the study note identifies for the reader that particular word or phrase as the focus of attention in the annotation. Jewish and Christian commentaries employ this typographic feature to allude to the source text (e.g., an official Bible translation, not usually contained in the commentary volume): within the body of the commentary, and sometimes integrated within the running prose, bold (or italic, or upper case) type signifies source text which is the subject of the commentary. There are many variations on this theme of implicit mapping between geographically adjacent text streams, or between real and virtual text streams.
A bi-columnar edition of the Bhagavad Gita<<18>> presents the following page layout: (a) the left column contains successive blocks of text with a line of Sanskrit, beneath which is found a line of transliteration, beneath which is a line (or two) of English translation; (b) the right column maintains morphological parsings, lemmas and lemma glosses in vertical alignment with the left-hand column (so far as alignment is possible, using a smaller point size); (c) the bottom of the page contains a running English translation of the verse (or whatever portion of the verse is presented on the page); (d) a fourth region at the extreme bottom of the page optionally contains a commentary note on words flagged with an asterisk in the transliteration and running translation. The page thus contains several kinds of implicit mappings between text streams, the most interesting of which is the lemma gloss and morphological parsing; the mapping between the transliterated form of the lemma and the transliterated form of the inflected contextual term is usually obvious, but it cannot be perfectly deduced from the alignment alone.

The standard edition of the Masoretic Hebrew Bible (BHS, based upon the 11th-century Leningrad Codex) supplies another example of implicit cross-referencing in its use of masorah. The masorah are marginal notes appearing in various regions of the page.<<19>> The point of interest is the implicit attachment of these notes to points of text or spans of text within the main text region. The note markers in the text which signal the presence of marginal notes are graphically undifferentiated: they appear as small supralinear circlets above words or between words. If spans of text are being annotated, the circlet occurs between words; alternately, additional circlets are used to flag larger spans of text.
The mapping between these note markers and the annotations takes place at line level, and the sequence of marginal notes (separated by dots) corresponds serially to the sequence of regions flagged by the note markers (circlets). For heavily annotated lines, the reader must carefully observe the sequences in both text regions to establish the correct correspondences between note markers and notes. These are representative kinds of commented and annotated sacred texts which employ varying levels of implicit and explicit cross-referencing between text streams -- usually primary and secondary text, or between collateral commentary texts. The encoding of such texts should represent and distinguish these implicit links and cross-references as well as explicit cases. FACTOR 7. THE STUDY OF SACRED TEXTS HAS BEEN ENHANCED BY THE PUBLICATION OF SCHOLARLY DOCUMENTS IN WHICH "PRESENTATION" (TEXT GEOGRAPHY) IS THE CONTROLLING PURPOSE OF THE DOCUMENT STRUCTURE. Because sacred texts appear in multiple languages, in alternate recensions, in synoptic traditions and are so heavily cross-referenced, they have commonly been published in formats which make visual inspection of these similarities, differences and "links" easy for the student or scholar. I make reference here to both student and scholarly editions of diaglotts, polyglots, parallel-column and multi-lingual interlinear formats which (presentationally) facilitate the comparison of variant versions. On the surface, at least, this phenomenon raises a question against the felicity or adequacy of descriptive markup, in which document structure and content are declared to be completely separable issues, and in which concerns for "presentation" (text geography) are anathematized. Interlinear text formats used in the study of sacred texts usually involve two or more languages. 
The scholarly tradition of interlinear format developed in ancient times, and became very popular (for example) in the transmission of bi-lingual Sumerian-Akkadian religious texts.<<20>> The source (or "base") text is presented in a horizontal format which permits printing of the secondary text(s), usually ancient or modern translations, immediately below the source text. Since interlinear translations are customarily fuller (graphically) than source texts, and do not map directly onto the source texts (word-for-word, in exact sequence), the vertical alignment of source and translation is adjusted by hand and typeset to reflect the mapping in optimal fashion. Sometimes two lines of translation are allotted for each line of source to permit regular vertical alignment of the translation under regularly-spaced source text. The critical issues for encoding are these: (a) when the interlinear text is a modern scholarly publication, the semantic mappings usually require human judgment, and typographic alignments must be made by hand; (b) the text geography is the entire point of the modern interlinear document format, or of the particular interlinear region of the document page; (c) in the publication of ancient bi-lingual interlinear texts, character spacing in the electronic "typeset" copy should reflect as faithfully as possible the text geography of the tablet or manuscript. As in the case of other diaglott, polyglot or parallel-column arrangements, it would theoretically be possible to regard the various texts of an interlinear document as independent text streams, encoding each separately. But lines of interlinear text are related by physical geography. The spatial arrangement of the interlinear format does not suggest (to me) an obvious encoding scheme best handled by descriptive markup. 
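For contrast, one can imagine a descriptive-markup treatment which records the word-level mapping explicitly and surrenders the vertical alignment to a formatting program. The sketch below is purely hypothetical (the element names interlinear, line, unit, src and gloss are invented), and the hand-adjusted geometry of the printed page is exactly what such an encoding gives up:

```sgml
<!-- Each unit pairs one source word with its interlinear gloss;
     a formatter, not the encoding, would reconstruct the alignment. -->
<interlinear>
  <line n="1">
    <unit><src lang="akk">i-lu</src><gloss>the gods</gloss></unit>
    <unit><src lang="akk">ra-bu-tu</src><gloss>great</gloss></unit>
  </line>
</interlinear>
```

The open question raised above remains: whether a scheme of this sort can also preserve the spacing of an ancient bi-lingual exemplar with fidelity.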
It may be suggested that scholarly editions of diaglotts, polyglots, parallel-column and multi-lingual interlinear formats should not be "encoded" at all because superior kinds of electronic and printed editions of these genres can be generated by programmatic means. Indeed, new incarnations of software for creating and maintaining complex interlinear text formats (e.g., SIL's IT program available for IBM and Macintosh microcomputers<<21>>) will increasingly make it possible to produce texts with interlinear text geography, as well as to build other important mappings between "source" text and linguistic/literary data. But volumes of printed texts in interlinear formats constitute a major genre (ergo, they should be encoded), and they are only one type of sacred text in which presentation (geography) is critical to the usefulness, accuracy, or completeness of the text representation. Polyglot texts are a logical extension of interlinear and/or parallel-column formats,<<22>> and do not warrant special discussion. They sometimes involve, on facing pages, a dozen or more simultaneous (and cross-referenced) text streams which are very useful.<<23>> While polyglots still thrive in some arenas of study (especially for students), they more commonly belong to previous decades of scholarship, whence they heralded some modern forms of electronic "hypertext." Parallel-column text formats do deserve special mention, for they exploit a presentational feature not commonly used in interlinears. Parallel-column texts are optimal tools for representing synoptic traditions or parallel versions, usually in the same language. In biblical studies, the Synopsis Quattuor Evangeliorum is the best-known volume.<<24>> The parallel texts are synchronized on a line-by-line basis in vertical alignment, in multi-columnar style, where the "pace" of presentation (horizontally and vertically) is controlled by the fullest version at any given locus. 
The vacats (empty spaces) or textual minuses in the alternate versions are readily visible to the scholar from the parallel-column topography: that is, the presentation is optimized to elucidate textual pluses and minuses, and secondarily, to permit ready comparison of similar readings.<<25>> As in the case of interlinear formats, the page geography is typeset based upon human judgments about textual equivalents and textual vacats; it is hardly possible (for me) to visualize bringing this kind of document production under program control. A special case of parallel-column formats, but greatly simplified, is the scholarly production of bilingual handbook traditions in which original texts and translations appear on facing pages, or on the same page in bi-columnar format. In these cases, textual synchronization is not always maintained at the line level, unless the texts are poetic, and the semantic mappings are entirely implicit. Line numbers or other reference markers may be explicitly indicated in both or one text stream: where only one text stream (usually the "source text") contains reference markers, the implicit mapping between text streams must somehow be introduced into the encoding. While this latter document format may pose no obviously insurmountable hurdles in terms of descriptive markup (two text streams referenced to a common citation format), the former case of parallel-column format appears (to me) problematic. A kind of text presentation similar to the interlinear is growing in popularity, and thus deserves mention. In languages which have a stratified system of orthography, or for which multiple layers of transcription are used in conventional scholarly publication, primary or critical text editions are sometimes presented in line-by-line format like interlinear translations. The layout may be called a (composite) "score," or (French) partition or (German) Partitur-Umschrift. 
For each physical line of printed text (usually determined by the genre, or rarely dictated by typesetting constraints), the extant textual witnesses are arranged serially with similar semantic units in vertical alignment. Implicit mappings are sometimes made clearer by the use of fixed-pitch type, adjusted with the introduction of additional spaces to yield greater visual clarity. This layout permits rapid identification of textual variations and examination of a scribe's orthographic practices.<<26>> FACTOR 8. PRIMARY TEXT PUBLICATIONS HOLD A PRIVILEGED PLACE IN THE STUDY OF SACRED TEXTS: THE NECESSITY OF MACHINE-READABLE VERSIONS OF THE EDITIO PRINCEPS ALONG WITH DIGITIZED PHOTOFACSIMILE. Since matters of codicology are targeted for Phase-II of the TEI effort, no complete discussion need be attempted in this paper. Yet, in light of past discussions on the TEI-L electronic forum, I wish to highlight one concern. Primary text publications continue to hold a central place in the study of sacred texts. In areas of religious studies familiar to me, there are no signs of abatement in primary text publication, but rather the promise of significant new text publications and re-publications in the coming decade. It is customary for editio princeps volumes to contain photographic plates or hand copies, and these are essential for certain levels of study. On the other hand, delivery of the editio princeps in machine-readable format is a strong desideratum. In these primary editions, the typographic presentation of text (in transliteration or other forms of "character" encoding) almost always reflects key physical properties of the copy text: horizontal and vertical spacing of text, use of smaller point size for supralinear corrections, overstrike characters for erasures, brackets to indicate margins of the tablet or manuscript, etc. 
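A descriptive-markup rendering of such physical features might look like the following sketch. The element names (line, supralinear, erasure, break) are hypothetical, and the transliterated Akkadian line (a conventional letter-opening formula) is merely illustrative; the unresolved question is whether markup of this kind can also carry the horizontal and vertical spacing which the typeset editio princeps conveys directly:

```sgml
<!-- One line of a cuneiform tablet: a scribal correction written
     above the line, an erased sign, and the right edge of the tablet. -->
<line n="5">
  a-na <supralinear>be-li2</supralinear>-ia
  <erasure>qi2</erasure> qi2-bi2-ma <break type="right-edge"/>
</line>
```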
While it may be difficult to develop encoding standards for these primary publications (in which issues of text geography are important), I think the challenge should be faced squarely. It is no solution simply to propose: "if someone cares about text geography in a publication, just use bitmap images." We need encoding methods which permit text analysis in the traditional sense ("character" encoding) and representation of text geography. If descriptive markup proves inadequate for representation of text geography which is normally typeset in a good editio princeps, then stronger alternative encoding methods should be sought. FACTOR 9. SACRED TEXTS WERE WRITTEN AND TRANSMITTED WITH MINIMAL PUNCTUATION AND STRUCTURAL MARKERS: THE PROBLEMS OF STRATIFIED WRITING SYSTEMS AND VARIANT ORTHOGRAPHIES. From a modern Indo-European point of view, the issues of writing, spelling, punctuation and document form/genre can readily be separated, at least at surface levels: authors use variations on these cultural conventions to communicate in graphic symbols. The encoding of sacred texts, primarily because sacred texts are ancient texts, requires that scholarship deal with texts having more complex stratification, texts which embody fewer writing conventions, and texts having writing conventions that are imperfectly understood. More specifically, encoding of sacred texts requires careful judgment about the form (or forms) of the text to be encoded, and subjective imposition of structure markers upon the unmarked ancient texts. Examples taken from ancient Greek, Hebrew and Akkadian will illustrate the range of issues faced by scholars who set out to encode sacred texts. Ancient texts were often written without explicit word or morpheme boundaries. In order to translate the texts, and to study them in electronic format, scholars will have to make subjective judgments about these boundaries, sometimes deciding between legitimate competing alternatives. 
The purest form of research would probably require software that (selectively) ignored the encoder's word divisions, or methodologically, an outright refusal to introduce word boundaries in the first place. Improved readings in Akkadian and Hebrew texts are regularly proposed by scholars who dare to challenge the decisions of traditional scholarship on word and morpheme boundaries.<<27>> While scriptio continua is proven by induction for Hebrew texts, it is regular in many exemplars and genres of Greek and Akkadian texts. Another decision point for encoding is how to transcribe ambiguous graphs (alphabetic characters, cuneiform signs) in the writing systems. In the cuneiform traditions it is customary to publish various levels of transcription, sometimes distinguished as "transliterations" and "normalizations." Akkadian served as the political and commercial lingua franca of the Fertile Crescent for over a millennium, and thus provides a superior example. Akkadian (including dialects of Assyrian and Babylonian) used the cuneiform writing system of the non-Semitic Sumerians, where most individual signs in the syllabary could have several kinds of values: ideographic value ("logographic"), syllabic value (including any of a dozen or so different syllabic values, depending upon period, dialect and genre), determinative value or phonetic-complement value. The first level of transcription in the publication of an Akkadian text is therefore often just an algebraic representation of the sign from the syllabary, while successive levels of orthography advance interpretive transliterations in the direction of the vocalized Akkadian. In Hebrew, two consonants (si//n and shi//n) were represented by the same symbol, and several consonants functioned as vowel markers (matres lectionis) in certain environments. 
At these levels, decisions must be made to provide for one or more transcription (character encoding) schemes, depending on whether ambiguities are to be resolved at the character level or by other means. The encoding of the Hebrew Bible supplies a vivid example of a stratified orthographic system in which various levels of encoding (all are valuable for scholarship) are implicit in the writing systems, but frustrate the goal of machine analysis. On the one hand, biblical Hebrew manuscripts from the Common Era (including the Qumran manuscripts) were written without any vowels, save erratic use of the matres lectionis. On the other hand, the traditional scholarly Hebrew text used as the standard modern edition includes vowels and a full complement of other diacritics to distinguish various usages of characters (consonantal versus vocalic function; mappiq), doubled consonants (daghesh, or explicit lack, raphe), accentuation, close word juncture (maqqeph -- used irregularly) and syllable structure (metheg). The accentual system in the Hebrew Bible uses special symbols to identify primary and secondary stress in a "word," but the accents also provide verse-level punctuation. In 21 "prose" texts, some 27 different conjunctive and disjunctive accents thus reveal in hierarchical fashion the medieval rabbinic understanding of verse-level semantics (sometimes syntax as well). In the three "poetic" books (Psalms, Proverbs, Job) similar accents are used in a slightly different way. Special classes of accents (pre-positive, post-positive) do not correctly mark word stress at all, which must then be determined on other grounds. Since the use of these accents follows elaborate contextual rules, a single accent may relate to non-adjacent and non-contiguous words in the verse. In order to be useful to scholarly study, encoded texts should reflect these relationships in the markup. 
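One conceivable representation would carry several strata in parallel within a single word element, as in this hypothetical sketch (the element names w, cons, voc and acc are invented; the word is the first word of Genesis, transliterated approximately in the ASCII conventions used in this file):

```sgml
<!-- A multi-stratum word record: consonantal text, vocalized
     (Tiberian) text, and the accentual stratum kept distinct. -->
<w n="Gen.1.1.1">
  <cons>br'shyt</cons>
  <voc>bere'shi//t</voc>
  <acc type="tiphcha" class="disjunctive"></acc>
</w>
```

Such an encoding at least keeps the strata separable for analysis, though it multiplies the bulk of the text several times over.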
Thus, for Hebrew we can probably isolate four or five strata<<28>> in the writing system: which stratum should be encoded? Or should several levels be encoded?<<29>> If we recognize that the vocalization is artificial from the standpoint of the first millennium (B.C.E.), should we reconstruct a pure vocalization (aided by early Latin and Greek transcriptions)? Similarly, should we normalize the frustrating unevenness of the orthography at one or more levels to facilitate linguistic analysis? Hebrew and Akkadian supply examples of orthographic traditions which fluctuate so wildly that linguistic study of the texts, if coded without normalization, is almost impossible.<<30>> It would seem prudent that the TEI encourage relevant scholarly societies to conduct linguistic study of these problems so that encoding and data analysis of such texts are made possible. Does it make sense to mark up "punctuation" for sacred texts when the many ancient exemplars had no punctuation, or almost none? As noted above, medieval codices of the Hebrew Bible had as many as eighteen hierarchical levels of disjunctive "punctuation" marks for use with a verse,<<31>> but even those "verse" divisions do not correspond precisely with modern judgments about "sentences." Given that the typographic symbols for punctuators within a verse or sentence vary in different text corpora (e.g., Greek uses ";" for interrogation and a raised dot for our "colon" and "semicolon"),<<32>> should TEI propose a metalanguage of hierarchically-ordered disjunctive and conjunctive punctuation markers? The designation of biblical chapters (ca. 13th century) is useful and probably permanent for referencing, but "chapters" do not always coincide with modern literary-critical divisions of the text; the older liturgical divisions of sacred texts sometimes appear even less felicitous. 
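If such a metalanguage were attempted, it might normalize corpus-specific symbols to ranked conjunctive/disjunctive markers, roughly as follows (a sketch only; the element name, attribute names and rank values are all invented for illustration):

```sgml
<!-- A Hebrew zaqeph and a Greek raised dot (ano teleia) both
     normalized to a mid-rank disjunctive; the printed symbol of
     the corpus is preserved in an attribute. -->
<punct func="disjunctive" rank="2" symbol="zaqeph">
<punct func="disjunctive" rank="2" symbol="ano-teleia">
```

Whether such rankings can be made commensurable across writing systems is itself a question for the scholarly societies mentioned above.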
These examples lead to the conclusion that it may be necessary to encode multiple systems of punctuation and structural marking to permit study of sacred texts as they were written in antiquity (with little or sometimes infelicitous punctuation) and as they are understood by modern literary criticism. The other "encoding" alternative (to throw all orthographic strata and punctuation systems into the same marked-up text) would seem to present an impossible task for software developers. Does this situation suggest yet another alternative: to abandon the goal of "marking up" these texts as the primary means of encoding? FACTOR 10. MARKUP OF (ANCIENT) SACRED TEXTS IS SUBJECTIVE: COMPLICATED BY THE MIXTURE OF LITERARY GENRES, FORM-CRITICAL UNITS AND STYLES OF DISCOURSE (PROSE/POETRY) WITHIN SINGLE "DOCUMENTS." It will hardly be denied that subjective judgments are involved in almost every aspect of the encoding of sacred texts. The earliest copy texts usually contain (physically) too little information to be of use, while standard editions (like the cantillated Hebrew BHS discussed above) contain either too much, or incorrect, or too little encoding, depending upon the goals of scholarly inquiry. Thus, while the element of subjectivity cannot be removed from encoding, attempts should be made to reckon with it so that scholarship is not hindered in any way. I will survey here a few special concerns which arise in connection with encoding of literary features of sacred texts. Styles and units of discourse are rarely distinguished in the earliest religious texts, although ancient texts exhibit great variation on this point. Some cuneiform tablets of the second and first millennia B.C.E. contain rulings on the tablets, delineating verse and strophic structure, separating the main text from the colophon, etc. The Qumran manuscripts in Hebrew, Aramaic and Greek (third century B.C.E. - second century C.E.) 
only occasionally reflect verse (poetic) structure in the presentation of texts which are universally acknowledged by modern scholarship as poetic. But in other texts, the subjectivity of judgment in identifying "poetry" and in elucidating its structure is evidenced in the fact that standard modern editions and translations do not agree: what one edition or translation presents as verse, another presents as prose; a textual unit presented by one edition as a quatrain is presented as a tristich in another. Two concerns for encoding may be registered here. First, it would be highly desirable to ensure that the tagset employed for designating poetry (or different kinds of poetry) be simple and mnemonically perspicuous: scholars will inevitably wish to change these tags to reflect their own literary judgments. Second, it is important that the encoding scheme be powerful enough to represent poetic structure in the standard ways scholarship has already established to designate poetry. In semitic languages, for instance, metrical systems are still not well understood, but parallelism has been studied intensively.<<33>> For example, the markup ought to be able to express these features: the notion of parallel lines in bicola, tricola and quatrains; symmetrical (e.g., chiastic and palistrophic) and recursive literary structures;<<34>> poetic subdivisions within stanza and strophe; acrostic patterns. In short, the encoding ought to provide for the elucidation of all levels and kinds of literary structure identified in current scholarship. A parallel concern is that encoding provide for the diversity and complexity of literary features within the same "document" or text. Sacred texts constitute a special case, for they are frequently acknowledged to have composite authorship (complicated literary prehistory, reflected in various evolutionary stages of extant texts) and mixed genres. The most obvious examples are the Jewish and Christian "Bibles." 
The "Bible" is of course not one book, but many books. Individual biblical books may be highly composite, as with the Psalter, Proverbs and other well-known examples. Within an individual biblical book, viewed at a synchronic level or at various compositional (diachronic) levels, there will inevitably be several literary genres, styles of discourse, form-critical types and so forth. The encoding scheme should provide for representation of these varying features at the lowest contextual and generic levels, but also with a unified scheme which permits analysis of the whole "Bible," for example, as a single document. OTHER GENRES OF RELIGIOUS TEXTS The universe of "religious" texts is broader than just "sacred" texts, of course. Both in ancient and modern times, religious documents enjoying less prestige or authority than scripture are nevertheless important objects of scholarly research. Most genres I have examined (far fewer than the total) do not yield concerns for encoding beyond the complexity of scriptural texts. Ancient commentaries (including the cuneiform traditions) sometimes involve deep complexity in the use of implicit mappings and ambiguous use of text-critical or cross-reference symbols. If a historically-qualified definition of "religious texts" is to be maintained, it must be acknowledged that several scientific, political and official-public genres of antiquity should be regarded as religious texts: medical texts, chemistry and "recipe" texts, astronomical and astrological texts, mystical-mathematical texts, public ritual texts, some onomastica and bureaucratic texts, funerary texts and grave inscriptions, treaties, chronicles, some legal compositions (including royal grants and decrees, boundary-stone inscriptions), etc. Since all these genres, and others, fall under the purview of TEI concerns, the distinction is immaterial. 
Religious genres which I know will require further study include: manuals of religious ethics, instruction and discipline; incantation and blessing manuals; hemerologies; oracle and omen texts; private ritual prescriptions; festival prescriptions; fables and proverbs; hymns and prayers [including hymnals and prayer books]; votive and mortuary inscriptions; elegies, laments and theodicies; prophecy and dream texts; sacred marriage texts; lectionaries; breviaries; catechisms; sacramentaries; ordinals; rubrics; registers; canon law; council decrees; creeds; expositions; theologies; sermons. This subset of genres of religious texts reflects my own narrow range of experience and study. It is obvious that expert advice must be sought from a wide range of scholars in religion, literature and linguistics if adequate accounting of the full inventory of encoding problems is to be made. RECOMMENDATIONS The following list of recommendations to the Text Representation subcommittee of TEI summarizes general concerns for text encoding in the world of religious texts. They are submitted with full acknowledgment of my personal limitations, and of the inadequacy and superficiality of the survey upon which they are based. (1) I recommend that TEI broaden the base of support and involvement by members of professional societies who can bring to bear their expertise in linguistics, literature and religion. I suspect that religious texts (especially, as ancient and sacred texts) present more serious challenges for encoding than we currently understand, and I feel that the involvement of qualified teams of specialists will be required to help define these problems and to help promote/referee optimal solutions. (2) I hope that vigilance will be maintained in not permitting text encoding to become the handmaid of a bygone era of textual study conducted on paper, nor of the modern electronic publication industry. 
Even if it be justified in some instances to replicate exact page images of paper on the computer screen (Talmud, popular critical editions), the goals of encoding/markup should not be uncritically dominated by methodologies of textual study familiar to us from the "paper past." I do not imply that anyone currently advocates such an agenda, but fear that it might become an attractive fall-back position if the demands of encoding begin to loom too large. In my view, development of encoding standards should be a patient process, dominated by a vision for the potential of textual study made possible by emerging technologies: hypertext, dynamic document versioning, real-time applications (currently possible only on supercomputers). (3) I recommend that TEI carefully (re-) consider the wisdom of making commitments to markup languages and other encoding schemes until it is known that these models are robust. If it is not known that they are sufficiently general, powerful and extensible to work for the "hardest cases" of world literature, would it be wise to make commitments to these solutions? If software (and possibly hardware) are built around a "preliminary" model of text encoding which ultimately proves to be inadequate, what will be the consequences for these inadequately-supported areas of textual scholarship? On the analogy with software design, may we not assume that robust design for encoding is ultimately in the best interests of all arenas of textual scholarship, popular or arcane? If no other justification can be found for embracing the difficulties of (ancient, sacred) religious texts, perhaps this point of appeal may find favor with TEI. Great evils have been done in our world in the name of religion, to be sure. Very great evils. 
Yet if the planet is to survive the atrocities of human greed, exploitation, hybris and jealousy, its salvation probably cannot come just from a study of documents written by lawyers, politicians, economists, historians, physicists, philosophers, linguists and mathematicians. We must also reflect upon the spiritual values embodied in the writings of the spiritual masters of our world religions: the ideas of Moses, Jesus Christ, Muhammed, Buddha, Gandhi. If the immediate followers of these masters sometimes failed through zealotry and spiritual blindness, it is not too much to hope that the human race may now learn from those excesses, revived by a renaissance of interest in the ethical ideals of world scriptures. ================================================================= ENDNOTES <<1>> My understanding of the terms "markup" and "encoding" is informed especially by interaction with written reports of Darrell Raymond and Frank Tompa, both from the University of Waterloo Centre for the New Oxford English Dictionary. Neither should be blamed for my misunderstandings, however. See especially: Frank Wm. Tompa, "What is (tagged) Text." Pp. 81-93 in Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary: Dictionaries in the Electronic Age. Proceedings of the Conference. [18-19 September 1989; St. Catherine's College, Oxford] Waterloo, Ontario, Canada: University of Waterloo, 1988; Frank Wm. Tompa and Darrell R. Raymond, "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989; Darrell R. Raymond, "Reading Between the Tags: An Appraisal of Descriptive Markup." University of Waterloo Centre for the New Oxford English Dictionary. Waterloo, October 1989. [Draft technical report] <<2>> The Book of Isaiah. The Hebrew University Bible. Parts 1-2. Ed. Moshe H. Goshen-Gottstein. Jerusalem: Magnes Press, 1975. <<3>> The Gospel According to St. Luke. Ed. 
The American and British Committees of the International Greek New Testament Project. 2 volumes. Oxford: Clarendon Press, 1984, 1987. <<4>> The BHS text reproduces Codex Leningradensis (B 19a), dated to about 1008 C.E. <<5>> Reinhard Wonneberger, Understanding BHS. A Manual for the Users of Biblia Hebraica Stuttgartensia, trans. Dwight Daniels; Subsidia Biblica 8; Rome: Pontifical Institute Press, 1988. See also by the same author, "Die Apparatsprache der Biblia Hebraica Stuttgartensia. Ein linguistischer Beitrag zur Editionskunde," Biblica 64 (1983) 305-343. A less ambitious attempt to explain the use of the BHS apparatus is found in William Scott's A Simplified Guide to BHS, Berkeley: BIBAL Press, 1987 (VIII + 62 + 22 pages). <<6>> Important progress has been made in the CATSS (Computer Assisted Tools for Septuagint Studies) Project under the direction of Robert Kraft and Emanuel Tov. Variants for the Greek version of Ruth and other books have been encoded, and parallel-aligned editions of Hebrew-Greek have been prepared. See Jack Abercrombie, "Computer-Assisted Alignment of the Greek and Hebrew Biblical Texts -- Programming Background," Textus 11 (1984) 125-139; Robert Kraft and Emanuel Tov (eds.), LXX: Computer Assisted Tools for Septuagint Research. Volume 1, Ruth. Septuagint and Cognate Studies 20. (Atlanta: Scholars Press, 1986); Emanuel Tov (ed.), A Computerized Data Base for Septuagint Studies. The Parallel Aligned Text of the Greek and Hebrew Bible. CATSS Volume 2/JNSL Supplement Series, 1. Stellenbosch, 1986. <<7>> Professor Wilhelm Ott is head of the Department of Literary and Documentary Data Processing, where the TUSTEP (TUebingen System of TExtprocessing Programs) program for text collation and textual editing has been developed. 
See (from among many publications) Wilhelm Ott, "A Text Processing System for the Preparation of Critical Editions," CHUM 13 (1979) 29-35; Wilhelm Ott, "Bibliographie: Computer in der Editionstechnik," [bibliographic essay] ALLC Bulletin 2/1 (1974) 73-80. <<8>> Computers have been used in manuscript collation and production of critical texts for many years. While the creation of text-critical databases is of obvious priority, I question the wisdom of trying to use marked-up critical apparatuses as the optimal database. Surveys of the use of computers in textual criticism and bibliography may be found in the following publications: La pratique des ordinateurs dans la critique des textes. [Actes du Colloque internationale sur "La pratique des ordinateurs dans la critique des textes," organized by the Centre National de la Recherche Scientifique, 29-31 March 1978, Paris] Eds. Jean Irigoin and Gian P. Zarri. Paris: Centre National de la Recherche Scientifique, 1979. ISBN: 2-222-02399-8; Susan Hockey, "Textual Criticism [= Chapter 7, pp. 144-167]," A Guide to Computer Applications in the Humanities. Baltimore/London: Johns Hopkins, 1980; Robert Oakman, "Textual Editing With a Computer [= Chapter 6, pp. 113-138, cf. 214-217]," Computer Methods for Literary Research. 2nd edition. Athens, GA: University of Georgia, 1984; Centre: Informatique et Bible. Verzeichnis (Katalog) der Datenba/"nke. Maredsous: Brepols, 1981 (pages 97-101). More recent bibliography may be found in the indexed bibliographic sections of Literary and Linguistic Computing: LLC 1/2 (1986) 85-92; LLC 1/3 (1986) 173-175; LLC 1/4 (1986) 216-220; LLC 2/2 (1987) 132-140; LLC 3/4 (1988) 255-260. <<9>> Scholars at the University of Waterloo Centre for the New Oxford English Dictionary have developed sophisticated tools for analysis of SGML-style tagged text. Descriptions are published in a number of technical reports and in the proceedings volumes from the Annual NOED Conference. 
See (for example) "PAT, GOEDEL, LECTOR and more: text-dominated database software." Pp. 83-84 in Tools for Humanists, 1989. A Guidebook to the Software and Hardware Fair Held in Conjunction with the Dynamic Text [6-9 June 1989 Toronto]. Toronto, Ontario: Centre for Computing in the Humanities, 1989. This article describes several software tools developed at the Waterloo Centre, including TRUC (an editor for SGML or SGML-style tagged text); Gonnet, Gaston and Frank Wm. Tompa. "Mind your Grammar: A New Approach to Modelling Text." Technical Report OED-87-01, University of Waterloo Centre for the New Oxford English Dictionary, February, 1987; Raymond, Darrell R. "lector -- An Interactive Formatter for Tagged Text." Technical Report, Centre for the New Oxford English Dictionary, University of Waterloo, Waterloo, Ontario, 1989; Tompa, Frank Wm.; Raymond, Darrell R. "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989; Tompa, Frank Wm. "What is (tagged) Text." Pp. 81-93 in Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary: Dictionaries in the Electronic Age. Proceedings of the Conference. [18-19 September 1989; St. Catherine's College, Oxford] Waterloo, Ontario, Canada: University of Waterloo, 1989. It is not clear to me, however, that these or similar software tools are designed to extract and analyze text-critical data marked up in complex "flat file" format. <<10>> Tompa, Frank Wm.; Raymond, Darrell R. "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989. On page 13 the authors briefly remark on the deficiency of SGML as being "unable to mark aggregates of elements that do not occur contiguously in a text (e.g., the sets of rhyming lines in a poem)." See also Raymond, Darrell R. "Reading Between the Tags: An Appraisal of Descriptive Markup."
University of Waterloo Centre for the New Oxford English Dictionary. Waterloo, October 1989. [Draft technical report] <<11>> See the collection of essays on empirical evidence for biblical criticism in Jeffrey H. Tigay, ed. Empirical Models for Biblical Criticism. Philadelphia: University of Pennsylvania, 1985. For the Jeremiah problem in particular, see Emanuel Tov's essay on pages 211-237 of this volume: "The Literary History of the Book of Jeremiah in Light of its Textual History." Other important contributions include: Tov, Emanuel, "Some Sequence Differences Between the MT and the LXX and their Ramifications for the Literary Criticism of the Bible." JNSL 13 (1987) 151-160; Stulman, L. The Other Text of Jeremiah. A Reconstruction of the Hebrew Text underlying the Greek Version of the Prose Sections of Jeremiah with English Translation. Lanham, MD: University Press of America, 1985; Stulman, L. "Some Theological and Lexical Differences between the Old Greek and the MT of the Jeremiah Prose Discourses." Hebrew Studies 25 (1984) 18-23. <<12>> See, for example, Jeffrey H. Tigay, The Evolution of the Gilgamesh Epic. Philadelphia: University of Pennsylvania, 1982. <<13>> One approach to versioning involves a semantic network formalism of nodes connected by typed links; see Randall Trigg and Mark Weiser, "TEXTNET: A Network-Based Approach to Text Handling," ACM Transactions on Office Information Systems 4/1 (January 1986) 1-23. Within the arena of TEI's Text Representation subcommittee, Steve DeRose and David Durand (at least) have done research on problems of versioning. <<14>> The markup of several biblical commentaries and lexica in the CDWord project forced us to reckon with great unevenness in volumes which avowed consistent citation practices in the volume preface. Editors and authors who claimed to have followed a specified referencing system frequently failed -- as the hypertext links revealed.
CDWord is a hypertext and text-retrieval program developed at Dallas Seminary, and includes: (a) digitized Greek scriptures (New Testament, Septuagint) which can be searched by morphological description and lemma; (b) digitized and minimally marked-up Greek lexica (Intermediate Liddell & Scott Greek-English Lexicon; Bauer-Arndt-Gingrich-Danker Greek Lexicon of the New Testament); (c) English Bible translations, Bible dictionaries, commentaries. The data is currently (October 1989) being mastered on CD-ROM for beta test. <<15>> It is ironic that ancient standards movements (desire to fix authoritative texts of canonical scripture) were to blame on several occasions for the suppression and eventual loss of invaluable textual data. <<16>> See Steve DeRose "Expanding the Notion of Links" [Conference Paper accepted for Hypertext '89]. <<17>> See in a facing-page format (Aramaic/Hebrew - English) the Hebrew-English Edition of the Babylonian Talmud. New York: Soncino, 1960-. <<18>> Winthrop Sargeant. The Bhagavad Gita. An interlinear translation from the Sanskrit, with word-for-word transliteration and translation, and complete grammatical commentary, as well as a readable prose translation, and page-by-page vocabularies. New York: Doubleday, 1979. ISBN: 0-385-63690-5. I have no opinion or means of independently judging the scholarly worth of the volume, but the format is highly interesting and appears quite useful as a student edition. <<19>> Collections of medieval masorah are available in separate tomes, and critical study of them continues today with the assistance of electronic databases. For example, Philippe Cassuto of the CATAB project in Lyon has recently released a detailed publication of kethiv/qere readings in the Leningrad codex (Qere-Ketib et listes massoretiques dans le manuscrit B 19a); CATAB = Centre d'analyse et de traitement automatique de la Bible et des traditions e/'crites.
There is a vast bibliography on the masorah: see a convenient discussion and bibliography in E. J. Revell, Introduction to the Tiberian Masorah. Masoretic Studies 5. Missoula, MT: Scholars Press, 1980. <<20>> See for example the following text publications: Maul, Stefan M. 'Herzberuhigungsklagen.' Die sumerisch-akkadischen Ersahunga-Gebete. Wiesbaden: Harrassowitz, 1988; von Weiher, Egbert. Spa/"tbabylonische Texte aus Uruk. Teil III. Ausgrabungen der Deutschen Forschungsgemeinschaft in Uruk-Warka, 12. Berlin: Gebr. Mann, 1988; Cohen, Mark. The Canonical Lamentations of Ancient Mesopotamia. 2 vols. Potomac, MD: Capital Decisions Limited, 1988 (pp. 536-603 and passim). Interlinear Akkadian translations are known for most literary genres; for a catalogue listing of myths and epics, see R. Borger, Handbuch der Keilschriftliteratur. Band III: Inhaltliche Ordnung der sumerischen und akkadischen Texte. Berlin/New York: Walter de Gruyter, 1975. <<21>> See John J. Hughes, Bits, Bytes and Biblical Studies. Grand Rapids: Zondervan, 1987 (pp. 275-276). A newer DOS version and superior Macintosh version are now available. IT assists in the process of automatic glossing of interlinear text fields by maintaining lexical mappings in database files; the program also maintains proper alignment of interlinear text fields with on-screen formatting (proportionally spaced fonts, correct word wrap, etc.). <<22>> A polyglot version available to me contains at least twelve text streams on the facing pages of the Pentateuch, including two interlinear formats and several Latin translations of daughter versions. It is the Biblia Sacra Polyglotta. (Tome I) Ed. Brianus Waltonus. London: Thomas Roycraft, 1657. Other Tomes of lexica and New Testament text contain varying combinations of versions and translations with complex implicit mappings.
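The "complex implicit mappings" among parallel text streams mentioned in note 22 can be made explicit in an encoding by stand-off alignment: each stream is kept as its own segmented sequence, while a separate table records which segments correspond. The following minimal sketch (in Python; the segment identifiers and sample readings are invented for illustration, not drawn from any real edition) shows the idea:

```python
# Stand-off alignment between parallel text streams, as one might use
# for a polyglot edition. All identifiers and sample data below are
# hypothetical illustrations, not a real encoding scheme.

# Each stream is an ordered list of (segment_id, text) pairs.
streams = {
    "hebrew": [("h1", "bereshit"), ("h2", "bara"), ("h3", "elohim")],
    "greek":  [("g1", "en arche"), ("g2", "epoiesen"), ("g3", "ho theos")],
    "latin":  [("l1", "in principio"), ("l2", "creavit"), ("l3", "deus")],
}

# Alignment units pair corresponding segment ids across streams; a unit
# may omit a stream (no counterpart) or group several segments together.
alignments = [
    {"hebrew": ["h1"], "greek": ["g1"], "latin": ["l1"]},
    {"hebrew": ["h2"], "greek": ["g2"], "latin": ["l2"]},
    {"hebrew": ["h3"], "greek": ["g3"], "latin": ["l3"]},
]

def parallels(stream_name, seg_id):
    """Return the corresponding segment texts in the other streams."""
    lookup = {name: dict(pairs) for name, pairs in streams.items()}
    out = {}
    for unit in alignments:
        if seg_id in unit.get(stream_name, []):
            for other, ids in unit.items():
                if other != stream_name:
                    out[other] = [lookup[other][i] for i in ids]
    return out

print(parallels("hebrew", "h2"))
# -> {'greek': ['epoiesen'], 'latin': ['creavit']}
```

Because the alignment table lives outside the streams themselves, a unit can skip a stream that lacks a counterpart, or map several segments against one, which flat inline markup handles poorly.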
<<23>> The earliest complete modern edition of the "Septuagint" (LXX, of Jewish Greek scriptures) was printed in polyglot format, the Complutensian Polyglot (1522 C.E.). Reproduction of a page facsimile may be found in Ernst Wu/"rthwein, The Text of the Old Testament. An Introduction to the Biblia Hebraica. Grand Rapids: Eerdmans, 1979 (pp. 214-215). <<24>> Synopsis Quattuor Evangeliorum. Locis parallelis evangeliorum apocryphorum et patrum adhibitis edidit Kurt Aland. 7th edition. Stuttgart: Wu/"rttembergische Bibelanstalt, 1967. <<25>> A complex edition of this type is found in Libri Synoptici Veteris Testamenti. 3 vols (Rome: Pontifical Institute, 1934). Volume 2 contains the parallel texts of Kings, Chronicles and Isaiah, with three columns of Hebrew on the left page (with critical apparatuses) and three columns of Greek [Septuagint] on the right page, along with their critical apparatuses. <<26>> Recent examples of the "score" text publication format from my own library: Watanabe, Kazuko. Die ade//-Vereidigung anla/"sslich der Thronfolgeregelung Asarhaddons. Baghdader Mitteilungen, Beiheft 3. Berlin: Gebr. Mann, 1987; Dijk, J. van. LUGAL UD ME-LAM-bi NIR-GAL. Le re/'cit e/'pique et didactique des Travaux de Ninurta, du De/'luge et de la Nouvelle Cre/'ation. 3 volumes. Leiden: Brill, 1983; Michalowski, Piotr. The Lamentation over the Destruction of Sumer and Ur. Mesopotamian Civilizations, 1. Winona Lake, IN: Eisenbrauns, 1989 (pp. 109-191); Farber, Walter. Schlaf, Kindchen, Schlaf! Mesopotamische Baby-Beschwo/"rungen und -Rituale. Mesopotamian Civilizations, 2. Winona Lake, IN: Eisenbrauns, 1989. For a text publication of the Old Latin version of the Bible in this format: Vetus Latina. Die Reste der altlateinischen Bibel. 11/1: Sapientia Salomonis. ed. Walter Thiele. Freiburg: Herder, 1977-1988.
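The "score" publication format cited in note 26 lines up every manuscript witness, line by line, beneath a composite text. As a data structure this is simply a mapping from composite lines to per-witness readings, with an explicit marker for witnesses that are broken or absent at a given line. A hedged sketch in Python (the sigla and readings are invented placeholders, not real manuscript data):

```python
# Hypothetical "score"-style representation: for each line of the
# composite text, every witness's reading (or its absence) is recorded
# and rendered witness-under-witness. Sigla and readings are invented.

score = {
    "line 1": {"A": "szul-gi lugal", "B": "szul-gi lugal-e", "C": None},
    "line 2": {"A": "an-ub-da", "B": "an-ub-da limmu", "C": "an-ub-da"},
}

def format_score(score):
    """Render a score as text rows: line label, then one row per witness."""
    rows = []
    for line, witnesses in score.items():
        rows.append(line)
        for siglum, reading in sorted(witnesses.items()):
            # "---" marks a witness broken or unattested at this line
            rows.append(f"  {siglum}: {reading if reading is not None else '---'}")
    return rows

print("\n".join(format_score(score)))
```

The point of the explicit None marker is that a score preserves negative evidence (a witness's gap) as faithfully as positive readings, something an apparatus of selected variants discards.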
<<27>> In Hebrew and other semitic languages written without vowels, the consonantal writing and polyvalence of key morphemes make it possible to offer legitimate alternative interpretations of text in scriptio continua. The brilliant work of the Hebraist Mitchell Dahood (even though [if] incorrect most of the time) demonstrated how radically scholarship could be altered if one challenges the tradition of word and morpheme breaks. The decisive cases of enclitic mem (misunderstood by the Masoretes) and other successes of comparative (Northwest Semitic) philology during the past 50 years vindicate the soundness of challenging subjective scholarly tradition on this point. In Akkadian (mostly syllabic), the polyvalence of signs and some key morphemes makes modern misunderstanding of word boundaries a predictable event. <<28>> An internal publication from the CATAB Center (CATAB = Centre d'analyse et de traitement automatique de la Bible et des traditions e/'crites) identifies five orthographic strata: formes graphiques; de/'coupage du texte; texte alphabe/'tique; vocalisation; cantilation. Other divisions (permutations of graph-types and punctuators) are possible and of theoretical interest to the linguist, masoretic scholar or biblical exegete. See "Tableau re/'capitulatif" on page 7 of CATAB's "Dossier de pre/'sentation. Laboratoire CATAB (Universite/' Lyon-III, U.A. du CNRS)." <<29>> Public domain software developed at the University of Pennsylvania is available for filtering the encoded BHS text to produce bare consonantal text or text with various combinations of orthography and punctuation. Though useful, this approach of holding data in multiple formats does not solve all the complex encoding problems. <<30>> See F. I. Andersen and Dean Forbes, Spelling in the Hebrew Bible. Biblica et Orientalia 41. Rome: Pontifical Biblical Institute, 1986; Aronoff, Mark.
"Orthography and Linguistic Theory: The Syntactic Basis of Masoretic Hebrew Punctuation," Language 61 (1985) 28-72; David N. Freedman, "The Masoretic Text and the Qumran Scrolls: A Study in Orthography," Textus 2 (1962) 87-102. Further bibliography is available in Bruce K. Waltke and Michael O'Connor, An Introduction to Biblical Hebrew Syntax (Winona Lake, IN: Eisenbrauns, 1990), pp. 703-704. <<31>> The accents were used for purposes of cantillation and for marking word stress as well as for semantic [sometimes = modern syntactic] divisions. See William Wickes, Two Treatises on the Accentuation of the Old Testament. [1881, 1887] Prolegomenon by Aron Dotan. [1968] New York: KTAV, 1970; E.J. Revell, Biblical Texts with Palestinian Pointing and their Accents. Masoretic Studies 4. Missoula, MT: Scholars Press, 1977; E. J. Revell, Introduction to the Tiberian Masorah. Masoretic Studies 5. Missoula, MT: Scholars Press, 1980. (pages 157-274). <<32>> For discussion of punctuators in Greek, see (with ample bibliography) E.G. Turner, Greek Manuscripts of the Ancient World. Second (revised, enlarged) edition [P.J. Parsons]. Bulletin Supplement 46. London: Institute of Classical Studies, 1987. <<33>> See the following handbooks, monographs and the bibliography cited. Watson, Wilfred G. Classical Hebrew Poetry. A Guide to its Techniques. JSOT Supplement Series, 26. Sheffield: JSOT Press, 1984; Alonso-Scho/"kel, Luis. A Manual of Hebrew Poetics. Subsidia Biblica, 11. Rome: Pontifical Institute Press, 1988; Alter, Robert. "The Dynamics of Parallelism." Hebrew University Studies in Literature and the Arts 11/1 (1983) 71-101; Berlin, Adele. The Dynamics of Biblical Parallelism. Bloomington: Indiana University Press, 1985; Collins, Terence. Line-Forms in Hebrew Poetry. A Grammatical Approach to the Study of the Hebrew Prophets. Studia Pohl (Series Maior), 7. Rome: Biblical Institute Press, 1978; Geller, Stephen A. Parallelism in Early Biblical Poetry. HSM, 20.
Ann Arbor, MI: Scholars Press, 1979; Kugel, James L. The Idea of Biblical Poetry. Parallelism and its History. New Haven: Yale University Press, 1981; O'Connor, Michael. Hebrew Verse Structure. Winona Lake, IN: Eisenbrauns, 1980. <<34>> On chiasm as one of the dominant literary structures in ancient texts, see John W. Welch (ed.), Chiasmus in Antiquity. Structures, Analyses, Exegesis. Hildesheim: Gerstenberg Verlag, 1981.
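Note 10 above records the observation that SGML-style inline tags cannot mark aggregates of elements that do not occur contiguously in a text; the chiastic structures of note 34 (A B C B' A') are precisely such aggregates, since each pair of corresponding members is separated by intervening text. One stand-off workaround is to record the grouping outside the text stream as sets of line indices, sketched here in Python with an invented example (the line contents and labels are placeholders, not a real chiasm):

```python
# Hedged sketch: representing a chiastic structure (A B C B' A') as
# stand-off annotation over numbered text lines, since inline paired
# tags cannot enclose non-contiguous members. Data is invented.

lines = ["A-member", "B-member", "C-pivot", "B-member'", "A-member'"]

# Each aggregate names the (possibly non-contiguous) line indices
# that belong to it; the text itself carries no markup at all.
chiasm = {
    "A/A'": [0, 4],
    "B/B'": [1, 3],
    "C":    [2],
}

def members(label):
    """Return the texts of all lines grouped under this aggregate."""
    return [lines[i] for i in chiasm[label]]

print(members("A/A'"))
# -> ['A-member', "A-member'"]
```

The same index-set mechanism would serve for rhyming lines, distant parallelisms, or any other discontinuous aggregate that a "flat" tagged file cannot delimit directly.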