Date: Thu, 19 Oct 89 19:29:17 CST
From: "Robin C. Cover"
Subject: ASCII FILE TEI TRW2
To: "Dr. C.M. Sperberg-McQueen"

===================================================================
For anyone who prefers easier reading, here are conversions for some
(American ascii) hi-bit characters & standard print modes:
   a-umlaut = a/"       o-umlaut = o/"       u-umlaut = u/"
   e-acute = e/'        e-circumflex = e//
   u-circumflex = u//   i-circumflex = i//
   bold string          italic string        <>
===================================================================

Draft Copy, Working Paper                      3909 Swiss Avenue
Copyright (c) Robin C. Cover (October 1989)    Dallas, Texas 75204
TEI, Text Representation Committee             zrcc1001@smuvm1.BITNET
TEI TRW2 (Revision .9; October 16 1989)        attctc!utafll!robin.UUCP

SPECIAL CHALLENGES FOR THE ENCODING OF RELIGIOUS TEXTS

INTRODUCTION

The following is a draft copy of a working paper for the TEI Text Representation subcommittee. My assignment was to describe some of the special concerns for encoding of "religious" texts of antiquity and modern times. I employ the terms "encoding" and "markup" as nearly synonymous, but occasionally reserve the former as a broader term: (a) to include the recording of a wide range of information about copy texts; (b) to remain non-committal about the propriety or possibility of using descriptive markup [mnemonic tags stored within the "flat" text file] as the specific means of encoding.<<1>>

SACRED TEXTS

Within the domain of the world's "religious" texts, sacred texts must hold a privileged place. It is my suspicion (though perhaps just an unfounded complaint) that literary religious texts constitute some of the more complex kinds of documents in world literature, especially when we reckon with the phenomenon of "virtual" documents within the sphere of canonical/sacred literature. I will attempt to prove this claim with a survey of the special problems encountered in encoding and/or markup of sacred texts.
I have isolated several categories of interrelated features of sacred texts, and supply descriptions of some implications for encoding and markup.

FACTOR 1. SACRED TEXTS ARE WRITTEN IN ANCIENT LANGUAGES: THE CONCOMITANT DOMINANCE OF NON-ROMAN, NON-ALPHABETIC AND LIGATURED SCRIPTS.

Minutes of the first meeting of the TEI Text Representation Committee (Lou Burnard, TEI TR M 1; University of Toronto, June 6, 1989) indicated that during the first [two-year] phase of the TEI's work, only alphabetic scripts and the nine official languages of the modern European community (Danish, Dutch, English, French, German, Greek, Italian, Portuguese, Spanish) would be embraced, with consideration of Slavic, ancient Greek and Latin as highly desirable.

A brief survey of religious populations in the modern world and their sacred literatures reminds us that this initial agenda of the TEI will not take us very deeply into "religious" texts. The world's three "revealed" religions are Judaism, Christianity and Islam. The historic Jewish scriptures are written in Hebrew and Aramaic; Orthodox Christian scriptures were written in Hebrew and Aramaic (together amounting to about 75% of the Christian Bible) and Greek; the Islamic scriptures of the Qur'an are written in Arabic. Hindu, Buddhist and other Far Eastern scriptures are also largely disqualified under TEI Phase-I restrictions, by language and script.

For reasons discussed later, it would only complicate matters to suggest that English (German, French) translations of sacred texts be targeted for encoding as a first step. The suggestion that translations would be "easier to deal with" would be relevant only if we ignored the fact that the derivative texts are translations, and it would fail to address the real needs of textual scholarship (focus on original-language texts). Thus, the remainder of this paper discusses problems which may be more germane to Phase-II work of the TEI.
Many issues of text encoding for these ancient languages/scripts have yet to be addressed, and in my judgment will require the cooperation of special advisory teams from several relevant professional societies. If the TEI intends to provide guidance for orientalists in the encoding of sacred texts of the distant past (religious literature of the ancient Mediterranean, Middle East, etc.), then cooperation must be sought from scholars who treat cuneiform, hieroglyphic and other forms of ideographic writing. Given the vast corpus of religious literature in Sumerian, Babylonian, Assyrian, Hittite, Elamite, Egyptian, Ugaritic, Aramaic (etc.), it would be disappointing if the guidelines of the TEI were not sufficiently general and extensible to provide guidance for text encoding in these fields. Computers are already being used for textual analysis in the study of literature in these "dead" languages, but there is a desperate need for encoding standards, especially at the lowest strata of the writing systems. Since the ISO documents governing international character sets do not cover most of these ancient languages and scripts, I believe the TEI should accept responsibility for helping coordinate standards efforts within the specialized professional societies.

FACTOR 2. SACRED TEXTS WERE TRANSLATED IN ANCIENT TIMES INTO FOREIGN LANGUAGES: THE COMPLEXITY OF CRITICAL EDITIONS AND INTERLINEAR FORMATS.

Because sacred texts were authoritative, popular, and held a central position in ancient scribal curricula, they were usually translated in antiquity. The fact of translation is inconsequential in itself, but becomes significant for encoding in that critical apparatuses for sacred texts are highly complex, and sometimes force us to deal with ancient interlinear texts. The special typographic and text-geographic problems of interlinear formats are discussed under Factor 7 below.
The complexity of the critical editions is heightened for sacred texts because: (a) surviving scriptures, in whole or part, are sometimes attested only in derivative translation languages which need to be mapped onto each other and onto a presumed/reconstructed original-language text; (b) printing or screen display of variants in the textual apparatus necessitates the use of symbols which imply the method of reconstruction of an eclectic text through retroversion; (c) printing or screen display of the apparatus sometimes requires (for paleographic reasons) that multiple ancient languages be displayed in their native scripts. For example, critical editions of the Hebrew Bible will include witnesses in the languages and scripts of Greek, Ethiopic (syllabic script), Armenian (non-Roman script), Arabic (ligatured script), Coptic (non-Roman script), Syriac (several ligatured scripts and dialects), etc.

If the representation of textual witnesses in these ancient languages/scripts in printed critical editions is a very demanding task, the markup for representing document structure and textual relationships in electronic critical editions will be an order of magnitude more difficult. I refer to two primary issues: (a) the fact that electronic critical editions useful for data analysis should support in the encoding more information than the simple typography of critical editions represents; (b) translating the cryptic syntax of sigla used in printed critical editions into intelligent encoding is itself highly complicated.

The complexity of critical editions as a function of ancient translation can be further elucidated as follows: modern retroversions (back-translations) must be tagged to designate level of confidence, formal versus semantic equivalence, ancient transliterations, etymological (mis)interpretation, homiletical/expositional features of translation variants versus linguistic/exegetical mappings, etc.
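No standard exists for the tagging requirements just listed; purely as a sketch of the kind of information involved, one might model a single retroverted reading along the following lines. The field names and category values are my own illustration, not a TEI proposal, and the sample data is invented:

```python
from dataclasses import dataclass, field

# Hypothetical category values, following the list above: level of
# confidence in the retroversion, and the kind of equivalence involved.
CONFIDENCE = ("certain", "probable", "conjectural")
EQUIVALENCE = ("formal", "semantic")

@dataclass
class RetrovertedReading:
    witness: str            # siglum of the attesting witness
    language: str           # language of the witness, e.g. "Greek", "Syriac"
    surface_text: str       # the reading as it stands in the witness
    retroversion: str       # proposed original-language back-translation
    confidence: str         # one of CONFIDENCE
    equivalence: str        # one of EQUIVALENCE
    notes: list = field(default_factory=list)  # e.g. "ancient transliteration"

# A hypothetical apparatus entry (all data invented for illustration):
reading = RetrovertedReading(
    witness="G", language="Greek",
    surface_text="(a Greek variant)",
    retroversion="(a retroverted Hebrew form)",
    confidence="probable", equivalence="formal",
    notes=["etymological (mis)interpretation suspected"])
assert reading.confidence in CONFIDENCE and reading.equivalence in EQUIVALENCE
```

Whether such records belong in descriptive markup at all, or in a database proper, is exactly the question raised under Factor 2's discussion of SGML below.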
When sacred texts have come to us in interlinear bi-lingual format (as in the popular Sumerian-Akkadian bilingual tradition), the relationship between source and target texts must be evaluated text-critically and appropriately marked-up in the encoding. The genetic filiations of texts ("parent" and "daughter" textual generations) and textual readings within language groups (or across language groups) must be represented in the encoding if the stemmatic relationships or textual affinities are understood. To the extent that encoding of critical editions should support the database and "hypertextual" features alluded to here, markup theory and implementation would appear very demanding. If the complexity of a sacred text's critical apparatus derives from the fact that textual evidence comes from multiple languages across a wide span of time, it is exacerbated by the sheer mass of textual data. It may surprise humanities scholars who have never used critical editions of sacred texts that a printed page of a critical edition may contain 10% "text" and 90% critical apparatus, as measured by character count; the Isaiah volumes from the Hebrew University Bible Project reflect this situation.<<2>> A recent critical edition of the Gospel of Luke contains over 500 pages and a 1:30 ratio of text to apparatus.<<3>> Even if the deepest complexity for encoding of critical texts will be met in the attempt to create tags (with implied syntax) for designating links and relationships of many kinds within the textual apparatus, the surfeit of data will add to the challenge. The standard scholarly edition of the printed Hebrew Bible (Biblia Hebraica Stuttgartensia [BHS]) supplies an ominous example of the complexity of markup for electronic critical editions which implicate multiple languages among their textual witnesses. 
Although the critical apparatus of the BHS is spartan in the extreme, and the main text represents merely a medieval codex (not a diplomatic edition),<<4>> the structure of this Bible is nevertheless very complicated. A 100-page monograph has been written from a generative linguistics perspective just to help students understand the syntax and semantics of the textual apparatus.<<5>> Most critical editions of Bibles contain far more textual data in the critical apparatus than does BHS (e.g., the volumes of critical text for Isaiah in the Hebrew University Bible Project contain approximately ten times more textual data than does BHS); they frequently contain multiple levels in the apparatus (as did the predecessor to BHS, the BHK); they sometimes contain additional syntactic ambiguities by reason of more text-critical sigla; and they are otherwise more complex than BHS. Thus, the critical text of the Hebrew Bible may not represent the most pathological case: encoding critical editions of the Greek "Septuagint" tradition (the Old Greek itself being used as one major witness in reconstructing the textual history of the Hebrew Bible) may be equally challenging in that more precise stemmatic and family relationships can be identified in the relatively greater wealth of textual data.

But suppose the Hebrew Bible were the most pathological case: should it be avoided (in the priorities of the TEI) because it is so complicated? The Hebrew Bible constitutes holy scripture for three primary world religions which account for more than 50% of the earth's religious population. In my judgment, the TEI project ought to be able to embrace encoding of these sacred texts in critical editions and other formats without embarrassment.
On the other hand, the challenges involved in "markup" of critical editions of sacred texts appear immense: my current perspective and understanding of encoding through descriptive markup do not prepare me to visualize how the introduction of inline tags will serve the interests of the text critic. So I question whether critical editions ought to be "marked up" at all. I would defer to the judgment of scholars who have produced print copies of critical editions of sacred texts, and to those who currently employ text-critical databases in research (e.g., the CATSS project at the University of Pennsylvania/Hebrew University, Jerusalem;<<6>> research at the University of Tu/"bingen in the Abteilung fu/"r literarische und dokumentarische Datenverarbeitung, Zentrum fu/"r Datenverarbeitung<<7>>), but several issues are worthy of evaluation. It must be borne in mind that for editing critical editions of sacred texts, not just a few, but hundreds or thousands of manuscripts may be involved.<<8>> I highlight here the three most important issues:

(1) Are standard critical apparatuses in print format the most useful means of viewing textual alliances and textual history? Probably not: they are convenient, but in semantic/syntactic density they are a concession to the limitations of space on the paper medium. Reference was made above to the BHS critical apparatus, which, despite its recent vintage, is regarded as a disappointment by many scholars and most students. The omission of important textual evidence and the extremely cryptic syntax are the most commonly-felt frustrations, but both inadequacies are a function of the need to print the Hebrew Bible in a single volume. Encoding a cryptic apparatus like that of BHS makes sense only if we want to perpetuate that inadequate tradition in the electronic medium.
The advantage of electronic data storage is that it's cheap and portable; the advantage of "hypertext" is that one can dynamically allocate electronic "page space" (screen space) to a critical apparatus as needed. One can "pop up" a perspicuous and more complete critical apparatus which fills 80% of the screen, reserving just a few lines for the display of the running source text. With flexible software, one could generate different kinds of critical apparatuses, each containing different content or composed in different views. Would it not be preferable to dynamically build better critical apparatuses with software rather than encoding deficient ones from print media?

(2) A second concern relates to the data structure under SGML. Of what obvious "database" use is an elaborate critical apparatus "marked up" with codes that will render it completely unreadable to mortal eyes?<<9>> I assume that "tags" used in a critical apparatus must reflect clearer syntax (e.g., kinds of relationships) than do cryptic and ambiguous text-critical sigla -- sigla that even humans sometimes cannot parse without making reference to the actual witnesses. Is this not a contradiction in terms for descriptive markup? More important still: if we wish to query the text-critical database in sophisticated ways, is a "marked up" flat text file the appropriate database format? I doubt it. While critical apparatuses in print format contain useful compilations and summaries of textual evidence, it would seem preferable to import this information into a real database rather than to "mark it up." Perhaps the two operations are not mutually exclusive (generation of a relational database from an SGML-structured text?) but I am skeptical about the utility of the marked-up format.

(3) The markup of critical editions requires that textual annotations be attached not only to single words, phrases, lines (etc.), but also to discontinuous elements within the text stream.
This phenomenon occurs, for instance, when semantic components in the source or ancient translation (target) text are morphologically discontinuous, or where a single "word" in one text contains several "words" from the perspective of another language, as a function of many kinds of imperfect inter-language mappings. For example, the lack of morpheme breaks in Hebrew results in graphic units like u//me'artsam ("and-from-their-land") being written as one word (a graphic unit bounded by white space), while in Greek the phrase would have five "words." The mappings are thus sometimes one-to-many (also many-to-one), with the possibility that "many" represents discontinuous textual elements. It is not clear to me whether the expressive power of SGML is sufficient or optimal to represent these mappings of discontinuous textual elements; two qualified scholars have expressed a similar misgiving about SGML in connection with dictionary markup.<<10>> Typographic conventions are relatively simple to use in this connection: is this another instance where typesetting for human eyes is easier than specifying relationships between textual elements in the syntax of descriptive markup?

FACTOR 3. SACRED TEXTS WERE COPIED FREQUENTLY, MODERNIZED, EDITED, REDACTED AND OTHERWISE ADAPTED TO THE NEEDS OF RELIGIOUS COMMUNITIES: THE NEED TO TAG EVOLUTIONARY STAGES OF TEXTS.

The implication of frequent copying is that texts in rapid transmission tend to accumulate an abundance of textual corruptions. It is a principle of typological science that applies to virtually all human artifacts: rapid succession of generations yields rapid evolution. Early textual corruption of sacred texts (or simply "textual change," if one wishes to view the process neutrally) worked in concert with other dynamics of conscious change to make "the (sacred) text" a moving target from the very beginning.
Though the rate of change in individual cases of sacred texts varied (varies) as a function of many complex factors, the canonical status and long transmission history per se guarantee a high degree of textual evolution in canonical scriptures. Readers will recognize the obvious fact that markup of a document is simpler when we have just one version, harder when we have more than one version. If we feel obligated to show cognizance of alternate and/or successive versions and the relationships of versions in the encoding of the text, encoding becomes much more demanding. Sacred texts show evolution or "versions" in the extreme, as is well known: the versions are the result of both diachronic and synchronic evolutionary processes. Some resultant difficulties for encoding are as follows:

(1) We rarely have a full, extant copy of any complete version of a sacred text (e.g., few full codices from the first millennium B.C.E.; few "complete" Mesopotamian religious texts from tablets of a single scribe or single archive), so that stages in textual evolution are usually the product of scholarly extrapolation. The method and relative certainty of extrapolation need to be represented in the encoding.

(2) Competing versions in antiquity led to conflations and other kinds of textual contamination.

(3) Despite the existence of variant versions (sometimes in the same ancient archive), sacred texts retain a sense of canonical status, so that they are treated by modern and ancient religious communities as the "same" text. Even if versions are radically different in structure/content, they bear the same title and canonical status; the most problematic consequence of this, for markup, is treated below: a standard referencing scheme is typically applied to all variant versions, even though mapping between versions under such a scheme sometimes becomes impossible or nonsensical.
(4) The existence of varying, more-or-less "parallel/synoptic" versions or recensionally variant sacred texts has led to yet another scholarly creation which is problematic for encoding: the printing of diaglott, polyglot, parallel-column and interlinear formats which (presentationally) facilitate the comparison of these versions. From my perspective, the challenge for TEI is to propose an encoding scheme general enough and powerful enough not only to permit markup of elaborate critical editions, but to describe relationships between versions which vary more than just "textually" -- where they are related, but related recensionally, sharing 90%, 75%, 50%, or 25% in content and/or structure. Some textual phenomena in the tradition of the Hebrew Bible will illustrate the problem. The book of Jeremiah in the official rabbinic tradition (the Masoretic Text), also followed by modern Christian Bibles, is one-seventh longer than that of ancient Greek tradition.<<11>> The shorter version (also known from Qumran manuscripts) appears pristine and primitive from a critical point of view, but classical versification (referencing) schemes are based on the later recension. The biblical book of Job, similarly, is one-sixth longer in the Masoretic tradition. In other cases, biblical texts show alternate arrangement of verses, chapters, or groups of chapters. Due in part to the durability of fired clay tablets, Mesopotamian mythological (= religious) texts show similar, and even wider recensional variations. For example, a Ninevite recension of a myth from the late first millennium may coincide in basic structure with earlier Assyrian, Babylonian, Sumerian and Hittite versions, but share precise content (phraseology) only in 20%-80% of its text.<<12>> Along the spectrum from trivial single textual variant to radical recensionally-variant version, text encoding ought to provide for description of the relationships (historical, textual and otherwise) between versions. 
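One can imagine, again only as a sketch, an encoding record that situates two versions on this spectrum. Everything here (the record shape, the relation labels, the idea of quantifying shared content and structure) is my own illustration of the kind of information such a scheme would need to accommodate; the proportions echo the Jeremiah example above (the Masoretic text being about one-seventh longer than the Old Greek), and the structure figure is a placeholder, not a scholarly claim:

```python
from dataclasses import dataclass

# Hypothetical link record between two versions of "the same" sacred text,
# recording the kind of relationship and rough proportions of shared
# content (phraseology) and shared structure (arrangement).
@dataclass
class VersionLink:
    source: str
    target: str
    relation: str            # e.g. "recension-of", "parent-of", "conflation-of"
    shared_content: float    # rough proportion of shared content
    shared_structure: float  # rough proportion of shared arrangement

link = VersionLink(
    source="Jeremiah (Masoretic Text)",
    target="Jeremiah (Old Greek)",
    relation="recension-of",
    shared_content=6 / 7,    # the shorter Greek text against the longer MT
    shared_structure=0.75)   # placeholder figure, for illustration only
```

Separating the relation type from the two quantitative parameters reflects the point made above: versions may share structure but not content, content but not structure, or be partially orthogonal on both.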
Research on document versioning in hypertext applications may be of value to other TEI sub-committees who work on these problems.<<13>>

FACTOR 4. SCHOLARLY EDITIONS OF SACRED TEXTS EMPLOY NON-CONGRUENT, COMPETING REFERENCING SCHEMES FOR THE SAME OR SIMILAR VERSIONS.

Allusion has already been made to the problem of citation and referencing. I think the problem is larger than that faced in the world of classical studies, where scholars of various periods and national schools use(d) references to competing standard editions which employ variant citation schemes. Neither is the problem simply a matter of using concurrent referencing within a text (reflecting multiple overlapping hierarchical structures, logical or physical), nor of differences between reference markers in editio princeps publications and standard edition(s). In the study of sacred texts, these factors and other factors conspire to make the situation more grave for encoding. To summarize:

(1) Different referencing schemes are employed in editio princeps publications and standard editions of antiquity and modern times (the trivial case).
(2) Similar or identical referencing schemes are applied to versions sharing similar, but not exact, content, so that apparently identical references point to different content.
(3) Citations are frequently given in highly cryptic, abbreviated form, so that misleading or incongruous pointers cannot be predicted from the citation format itself.
(4) Editors and authors even in modern times, when conscious of these problems, cannot enforce consistency in citation schemes.<<14>>
(5) Sacred texts and the scholarship around them employ heavy cross-referencing, so that the citation ambiguities/errors are not occasional, but pandemic.

An example from the biblical Psalter will illustrate the point above: this is the simplest case, where content is almost identical but referencing systems are incompatible.
In high antiquity, biblical texts were written with almost no punctuation or reference markers, often in scriptio continua (lacking even explicit word boundaries, though this is disputed in the case of Hebrew). In the late biblical period, minimal forms of punctuation were introduced (some final forms of characters; spaces; blank lines; colophons). The explicit citation systems eventually introduced in the medieval period were uneven in the case of the Psalter: in the Hebrew text, Psalm superscriptions were versified (one or two verses), whereas in Greek and some derivative traditions, superscriptions were accorded no numbers. Furthermore, divergence at Psalm 9/10 led to the Hebrew chapter numbers being staggered by one number (Hebrew/Greek traditions) until an offsetting event merged the chapter enumeration scheme at chapter 148 of the corpus. Into modern times, biblical scholarship continues to cite the Psalter under both systems, sometimes without clear indication of the intended scheme, and always with frustrating inconsistency. In order to mark up texts referencing or cross-referencing the Psalter (if this markup is to be useful for data analysis such as searching, or for "hypertext" functions), the inherent ambiguities must be resolved. If references to the LXX ("Septuagint Bible") are tagged in a commentary, it must be known what "LXX" means in a given instance. The note "cf. Ps 51:5 (LXX)" may mean several things, and the citation cannot be tagged until the correct target text (content) is contextually identified. It might be argued that the TEI project should not inherit the problems biblical scholars have created for themselves on paper: perhaps not. On the other hand, given that these (cross-) referencing problems amount to an extreme case of a general phenomenon experienced elsewhere in literary studies, the TEI would render an invaluable service by proposing a sound linguistic/computational scheme for resolving these problems. 
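To illustrate what even a partial computational resolution might look like: under the simplified description given above (a one-number stagger running from the divergence at Psalm 9/10 until the enumerations merge again at 148), the chapter-level mapping can be stated as a trivial function. This is a sketch of the idea only; the actual Hebrew/Greek correspondence involves further splits and joins, and the differing treatment of superscriptions means verse numbers diverge as well, so a real resolver would need a full concordance table:

```python
def hebrew_to_greek_psalm(n: int) -> int:
    """Map a Hebrew (Masoretic) psalm number to its Greek (LXX) number,
    under the simplified two-boundary description above.  Illustrative
    only: the full correspondence is more complicated than this."""
    if not 1 <= n <= 150:
        raise ValueError("psalm number out of range")
    if n <= 9 or n >= 148:
        return n      # the two enumerations agree outside the staggered span
    return n - 1      # Hebrew 10..147 run one number ahead of the Greek

# Hebrew Psalm 51 is Greek Psalm 50; the enumerations coincide again at 148.
assert hebrew_to_greek_psalm(51) == 50
assert hebrew_to_greek_psalm(148) == 148
```

The point is not this particular function but the principle: once the intended scheme of a citation is contextually identified, the mapping itself is mechanical and can be applied uniformly in the encoding.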
The solution should embrace not only the problem of variant citation systems, but also the problem of designating mappings between texts that share content but not structure/arrangement, between texts that share structure but not exact content, and between texts partially orthogonal on both parameters. Study of versioning has been done in recent hypertext research and in the version-control of software and documentation; the TEI might benefit from consultation with software-versioning and hypertext experts.

FACTOR 5. SACRED TEXTS HAVE BOTH LITURGICAL (SACRED) AND SCHOLARLY (SECULAR) PURPOSES.

The fact that sacred texts have a life both in the religious community and in the world of scholarship lies at the root of several complex issues; in some cases, the two uses of sacred text are at cross purposes. The implications for text encoding are probably minimal, but need to be studied.<<15>> As liturgical texts, scriptures are continually modernized, adapted, excerpted, redacted, re-compiled and rearranged in accordance with the changing demands of the religious community. For example, many poetic passages and prayers of the Hebrew Bible and Jewish Greek scriptures were excerpted, incorporated into derivative Biblical prose texts, and even circulated independently. Sometimes excerpted corpora underwent their own textual development (as in the biblical Odes and modern editions of lectionaries and prayer books). These connections between parallel traditions (kinds of connections, extent of connections) ought to be indicated in the encoding scheme of the derivative texts. Similarly, encoding ought to represent the re-use or quotation of sacred texts within themselves.

FACTOR 6. SACRED TEXTS ARE INTENSELY STUDIED, ANNOTATED AND CROSS-REFERENCED.

The fact that sacred texts are the objects of intensive scrutiny, extensive annotation and elaborate commentary bears relevance to their encoding.
It could probably be proven with bibliographic or other instruments that sacred texts are more intensively studied than other kinds of world literature. The bulk of primary and secondary literature (textual, linguistic, exegetical, homiletical, theological) thus constitutes the richest documentary arena to be found. Several consequences are evident in scholarly religious literature, but I will highlight two which appear especially pertinent to text encoding: cross-referencing (within books and across volume titles) and the phenomenon of composite-genre study editions.

Intense textual focus on religious texts has led to a scholarly and popular tradition in which printed editions of sacred texts contain elaborate conventions for cross-referencing and annotation. Similar phenomena (scholia, commentary, marginal glossing) are known from antiquity as well. For the purposes of encoding/markup, it would seem wise to consider a means of "typing" annotations and cross-references. Explicit cross-referencing and annotation usually involves placement of a "note" marker in the running text (often supra- or sub-linear) and an associated marginal note of some kind.
Translations of sacred texts, for instance, often contain notes offering the following kinds of information:

* cross-reference to a textual locus within the same document, or to another scriptural passage
  ** cross-reference based upon thematic connection
  ** cross-reference based upon synoptic/parallel tradition
  ** cross-reference to tables, maps or appendices within the volume
* citation of secondary literature (reference tool or commentary)
* alternate translation based upon variant etymology
* word-level gloss or more "literal" translation equivalent
* alternate paraphrastic translation
* alternate translation based upon variant textual witness
* commentary on the translation or textual readings
* warning about textual or lexical uncertainty at a given locus

The kinds of notes vary greatly in a given document, and sometimes multiple concurrent note series will be used to separate annotations of different kinds on different regions of a page (cross-references in a central column, textual notes in right or left margins, commentary notes at the bottom, etc.). Other books employ a single note series, lumping together annotations and cross-references of all kinds into a single format. It would seem highly useful for the TEI to propose a means of encoding (a) note types and (b) a syntax for note links. The notion of typed links is already partially understood in hypertext research, and indeed, is supported in some hypertext software. The word processor Nota Bene supports three concurrent note series (for the production of critical editions, for instance), where style-sheets for each note type are under user control. While a full taxonomy of links (link types) remains to be developed and tested for literary documents, a recent paper by Steve DeRose makes important progress.<<16>>

A related scholarly phenomenon germane to encoding of sacred texts is the use of complex-genre or composite-genre study editions.
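A note-typing scheme of the kind proposed here might, as a first sketch, look like the following. The type names are my own labels for the kinds of notes listed above, not an established taxonomy, and the sample data is invented:

```python
from dataclasses import dataclass

# Hypothetical taxonomy of note/link types, one label per kind of note
# in the list above (labels are my own illustration).
NOTE_TYPES = {
    "xref-locus", "xref-thematic", "xref-synoptic", "xref-appendix",
    "secondary-literature", "alt-etymology", "gloss", "alt-paraphrase",
    "alt-textual", "commentary", "uncertainty",
}

@dataclass
class Note:
    anchor: str      # locus of the note marker in the running text
    note_type: str   # one of NOTE_TYPES
    series: str      # which concurrent note series carries the note
    body: str

note = Note(anchor="(some verse reference)", note_type="xref-synoptic",
            series="central-column", body="(a cross-reference)")
assert note.note_type in NOTE_TYPES
```

Typed in this way, a display program could filter notes by kind, or route each series to its own page region, in the manner of the multi-region layouts just described.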
The Babylonian and Palestinian Talmud, and the Miqra'ot Gedolot commentary format, are well-known examples from Jewish religious literature; popular Christian "study Bibles" are examples from the modern movement of religious lay-education. These documents customarily employ the explicit note- and cross-reference schemes discussed above, but add levels of implicit annotation/linking as well. In the Babylonian Talmud, for example, Aramaic and Hebrew commentary text from various sources flows around a central region of (Hebrew) Mishnah text; keywords, explicit citation formulae and other conventions link the commentary traditions together, to the Mishnah text, and sometimes to the biblical texts referenced in the Mishnah.<<17>> While the spatial geometry of the Talmudic page is very important for the rabbinic scholar (and should be presented with fidelity in an electronic edition), the concern for encoding is that implicit cross-references in this milieu should be explicitly represented in markup. The same could be said for study Bibles, which often use the convention of italic type in the study notes: words in italic might be excerpted from the Biblical text loc. cit., and appearance in italic within the study note identifies for the reader that particular word or phrase as the focus of attention in the annotation. Jewish and Christian commentaries employ this typographic feature to allude to the source text (e.g., an official Bible translation, not usually contained in the commentary volume): within the body of the commentary, and sometimes integrated within the running prose, bold (or italic, or upper case) type signifies source text which is the subject of the commentary. There are many variations on this theme of implicit mapping between geographically adjacent text streams, or between real and virtual text streams.
A bi-columnar edition of the Bhagavad Gita<<18>> presents the following page layout: (a) the left column contains successive blocks of text with a line of Sanskrit, beneath which is found a line of transliteration, beneath which is a line (or two) of English translation; (b) the right column maintains morphological parsings, lemmas and lemma glosses in vertical alignment with the left-hand column (so far as alignment is possible, using a smaller point size); (c) the bottom of the page contains a running English translation of the verse (or whatever portion of the verse is presented on the page); (d) a fourth region at the extreme bottom of the page optionally contains a commentary note on words flagged with an asterisk in the transliteration and running translation. The page thus contains several kinds of implicit mappings between text streams, the most interesting of which is the lemma gloss and morphological parsing; the mapping between the transliterated form of the lemma and the transliterated form of the inflected contextual term is usually obvious, but it cannot be perfectly deduced from the alignment alone.

The standard edition of the Masoretic Hebrew Bible (BHS, based upon the 11th-century Leningrad Codex) supplies another example of implicit cross-referencing in its use of masorah. The masorah are marginal notes appearing in various regions of the page.<<19>> The point of interest is the implicit attachment of these notes to points of text or spans of text within the main text region. The note markers in the text which signal the presence of marginal notes are graphically undifferentiated: they appear as small supralinear circlets above words or between words. If spans of text are being annotated, the circlet occurs between words; alternately, additional circlets are used to flag larger spans of text.
The mapping between these note markers and the annotations takes place at line level, and the sequence of marginal notes (separated by dots) corresponds serially to the sequence of regions flagged by the note markers (circlets). For heavily annotated lines, the reader must carefully observe the sequences in both text regions to establish the correct correspondences between note markers and notes. These are representative kinds of commented and annotated sacred texts which employ varying levels of implicit and explicit cross-referencing between text streams -- usually primary and secondary text, or between collateral commentary texts. The encoding of such texts should represent and distinguish these implicit links and cross-references as well as explicit cases. FACTOR 7. THE STUDY OF SACRED TEXTS HAS BEEN ENHANCED BY THE PUBLICATION OF SCHOLARLY DOCUMENTS IN WHICH "PRESENTATION" (TEXT GEOGRAPHY) IS THE CONTROLLING PURPOSE OF THE DOCUMENT STRUCTURE. Because sacred texts appear in multiple languages, in alternate recensions, in synoptic traditions and are so heavily cross-referenced, they have commonly been published in formats which make visual inspection of these similarities, differences and "links" easy for the student or scholar. I make reference here to both student and scholarly editions of diaglotts, polyglots, parallel-column and multi-lingual interlinear formats which (presentationally) facilitate the comparison of variant versions. On the surface, at least, this phenomenon raises a question against the felicity or adequacy of descriptive markup, in which document structure and content are declared to be completely separable issues, and in which concerns for "presentation" (text geography) are anathematized. Interlinear text formats used in the study of sacred texts usually involve two or more languages. 
The scholarly tradition of interlinear format developed in ancient times, and became very popular (for example) in the transmission of bi-lingual Sumerian-Akkadian religious texts.<<20>> The source (or "base") text is presented in a horizontal format which permits printing of the secondary text(s), usually ancient or modern translations, immediately below the source text. Since interlinear translations are customarily fuller (graphically) than source texts, and do not map directly onto the source texts (word-for-word, in exact sequence), the vertical alignment of source and translation is adjusted by hand and typeset to reflect the mapping in optimal fashion. Sometimes two lines of translation are allotted for each line of source to permit regular vertical alignment of the translation under regularly-spaced source text. The critical issues for encoding are these: (a) when the interlinear text is a modern scholarly publication, the semantic mappings usually require human judgment, and typographic alignments must be made by hand; (b) the text geography is the entire point of the modern interlinear document format, or of the particular interlinear region of the document page; (c) in the publication of ancient bi-lingual interlinear texts, character spacing in the electronic "typeset" copy should reflect as faithfully as possible the text geography of the tablet or manuscript. As in the case of other diaglott, polyglot or parallel-column arrangements, it would theoretically be possible to regard the various texts of an interlinear document as independent text streams, encoding each separately. But lines of interlinear text are related by physical geography. The spatial arrangement of the interlinear format does not suggest (to me) an obvious encoding scheme best handled by descriptive markup. 
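For contrast, one can imagine a descriptive-markup treatment which records the word-level mapping explicitly and surrenders the vertical alignment to a formatting program. The sketch below is purely hypothetical (the element names interlinear, line, unit, src and gloss are invented), and the hand-adjusted geometry of the printed page is exactly what such an encoding gives up:

```sgml
<!-- Each unit pairs one source word with its interlinear gloss;
     a formatter, not the encoding, would reconstruct the alignment. -->
<interlinear>
  <line n="1">
    <unit><src lang="akk">i-lu</src><gloss>the gods</gloss></unit>
    <unit><src lang="akk">ra-bu-tu</src><gloss>great</gloss></unit>
  </line>
</interlinear>
```

The open question raised above remains: whether a scheme of this sort can also preserve the spacing of an ancient bi-lingual exemplar with fidelity.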
It may be suggested that scholarly editions of diaglotts, polyglots, parallel-column and multi-lingual interlinear formats should not be "encoded" at all because superior kinds of electronic and printed editions of these genres can be generated by programmatic means. Indeed, new incarnations of software for creating and maintaining complex interlinear text formats (e.g., SIL's IT program available for IBM and Macintosh microcomputers<<21>>) will increasingly make it possible to produce texts with interlinear text geography, as well as to build other important mappings between "source" text and linguistic/literary data. But volumes of printed texts in interlinear formats constitute a major genre (ergo, they should be encoded), and they are only one type of sacred text in which presentation (geography) is critical to the usefulness, accuracy, or completeness of the text representation. Polyglot texts are a logical extension of interlinear and/or parallel-column formats,<<22>> and do not warrant special discussion. They sometimes involve, on facing pages, a dozen or more simultaneous (and cross-referenced) text streams which are very useful.<<23>> While polyglots still thrive in some arenas of study (especially for students), they more commonly belong to previous decades of scholarship, whence they heralded some modern forms of electronic "hypertext." Parallel-column text formats do deserve special mention, for they exploit a presentational feature not commonly used in interlinears. Parallel-column texts are optimal tools for representing synoptic traditions or parallel versions, usually in the same language. In biblical studies, the Synopsis Quattuor Evangeliorum is the best-known volume.<<24>> The parallel texts are synchronized on a line-by-line basis in vertical alignment, in multi-columnar style, where the "pace" of presentation (horizontally and vertically) is controlled by the fullest version at any given locus. 
The vacats (empty spaces) or textual minuses in the alternate versions are readily visible to the scholar from the parallel-column topography: that is, the presentation is optimized to elucidate textual pluses and minuses, and secondarily, to permit ready comparison of similar readings.<<25>> As in the case of interlinear formats, the page geography is typeset based upon human judgments about textual equivalents and textual vacats; it is hardly possible (for me) to visualize bringing this kind of document production under program control. A special case of parallel-column formats, but greatly simplified, is the scholarly production of bilingual handbook traditions in which original texts and translations appear on facing pages, or on the same page in bi-columnar format. In these cases, textual synchronization is not always maintained at the line level, unless the texts are poetic, and the semantic mappings are entirely implicit. Line numbers or other reference markers may be explicitly indicated in both or one text stream: where only one text stream (usually the "source text") contains reference markers, the implicit mapping between text streams must somehow be introduced into the encoding. While this latter document format may pose no obviously insurmountable hurdles in terms of descriptive markup (two text streams referenced to a common citation format), the former case of parallel-column format appears (to me) problematic. A kind of text presentation similar to the interlinear is growing in popularity, and thus deserves mention. In languages which have a stratified system of orthography, or for which multiple layers of transcription are used in conventional scholarly publication, primary or critical text editions are sometimes presented in line-by-line format like interlinear translations. The layout may be called a (composite) "score," or (French) partition or (German) Partitur-Umschrift. 
For each physical line of printed text (usually determined by the genre, or rarely dictated by typesetting constraints), the extant textual witnesses are arranged serially with similar semantic units in vertical alignment. Implicit mappings are sometimes made clearer by the use of fixed-pitch type, adjusted with the introduction of additional spaces to yield greater visual clarity. This layout permits rapid identification of textual variations and examination of a scribe's orthographic practices.<<26>> FACTOR 8. PRIMARY TEXT PUBLICATIONS HOLD A PRIVILEGED PLACE IN THE STUDY OF SACRED TEXTS: THE NECESSITY OF MACHINE-READABLE VERSIONS OF THE EDITIO PRINCEPS ALONG WITH DIGITIZED PHOTOFACSIMILE. Since matters of codicology are targeted for Phase-II of the TEI effort, no complete discussion need be attempted in this paper. Yet, in light of past discussions on the TEI-L electronic forum, I wish to highlight one concern. Primary text publications continue to hold a central place in the study of sacred texts. In areas of religious studies familiar to me, there are no signs of abatement in primary text publication, but rather the promise of significant new text publications and re-publications in the coming decade. It is customary for editio princeps volumes to contain photographic plates or hand copies, and these are essential for certain levels of study. On the other hand, delivery of the editio princeps in machine-readable format is a strong desideratum. In these primary editions, the typographic presentation of text (in transliteration or other forms of "character" encoding) almost always reflects key physical properties of the copy text: horizontal and vertical spacing of text, use of smaller point size for supralinear corrections, overstrike characters for erasures, brackets to indicate margins of the tablet or manuscript, etc. 
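A descriptive-markup rendering of such physical features might look like the following sketch. The element names (line, supralinear, erasure, break) are hypothetical, and the transliterated Akkadian line (a conventional letter-opening formula) is merely illustrative; the unresolved question is whether markup of this kind can also carry the horizontal and vertical spacing which the typeset editio princeps conveys directly:

```sgml
<!-- One line of a cuneiform tablet: a scribal correction written
     above the line, an erased sign, and the right edge of the tablet. -->
<line n="5">
  a-na <supralinear>be-li2</supralinear>-ia
  <erasure>qi2</erasure> qi2-bi2-ma <break type="right-edge"/>
</line>
```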
While it may be difficult to develop encoding standards for these primary publications (in which issues of text geography are important), I think the challenge should be faced squarely. It is no solution simply to propose: "if someone cares about text geography in a publication, just use bitmap images." We need encoding methods which permit text analysis in the traditional sense ("character" encoding) and representation of text geography. If descriptive markup proves inadequate for representation of text geography which is normally typeset in a good editio princeps, then stronger alternative encoding methods should be sought. FACTOR 9. SACRED TEXTS WERE WRITTEN AND TRANSMITTED WITH MINIMAL PUNCTUATION AND STRUCTURAL MARKERS: THE PROBLEMS OF STRATIFIED WRITING SYSTEMS AND VARIANT ORTHOGRAPHIES. From a modern Indo-European point of view, the issues of writing, spelling, punctuation and document form/genre can readily be separated, at least at surface levels: authors use variations on these cultural conventions to communicate in graphic symbols. The encoding of sacred texts, primarily because sacred texts are ancient texts, requires that scholarship deal with texts having more complex stratification, texts which embody fewer writing conventions, and texts having writing conventions that are imperfectly understood. More specifically, encoding of sacred texts requires careful judgment about the form (or forms) of the text to be encoded, and subjective imposition of structure markers upon the unmarked ancient texts. Examples taken from ancient Greek, Hebrew and Akkadian will illustrate the range of issues faced by scholars who set out to encode sacred texts. Ancient texts were often written without explicit word or morpheme boundaries. In order to translate the texts, and to study them in electronic format, scholars will have to make subjective judgments about these boundaries, sometimes deciding between legitimate competing alternatives. 
The purest form of research would probably require software that (selectively) ignored the encoder's word divisions, or methodologically, an outright refusal to introduce word boundaries in the first place. Improved readings in Akkadian and Hebrew texts are regularly proposed by scholars who dare to challenge the decisions of traditional scholarship on word and morpheme boundaries.<<27>> While scriptio continua is proven by induction for Hebrew texts, it is regular in many exemplars and genres of Greek and Akkadian texts. Another decision point for encoding is how to transcribe ambiguous graphs (alphabetic characters, cuneiform signs) in the writing systems. In the cuneiform traditions it is customary to publish various levels of transcription, sometimes distinguished as "transliterations" and "normalizations." Akkadian served as the political and commercial lingua franca of the Fertile Crescent for over a millennium, and thus provides a superior example. Akkadian (including dialects of Assyrian and Babylonian) used the cuneiform writing system of the non-Semitic Sumerians, where most individual signs in the syllabary could have several kinds of values: ideographic value ("logographic"), syllabic value (including any of a dozen or so different syllabic values, depending upon period, dialect and genre), determinative value or phonetic-complement value. The first level of transcription in the publication of an Akkadian text is therefore often just an algebraic representation of the sign from the syllabary, while successive levels of orthography advance interpretive transliterations in the direction of the vocalized Akkadian. In Hebrew, two consonants (si//n and shi//n) were represented by the same symbol, and several consonants functioned as vowel markers (matres lectionis) in certain environments. 
At these levels, decisions must be made to provide for one or more transcription (character encoding) schemes, depending on whether ambiguities are to be resolved at the character level or by other means. The encoding of the Hebrew Bible supplies a vivid example of a stratified orthographic system in which various levels of encoding (all are valuable for scholarship) are implicit in the writing systems, but frustrate the goal of machine analysis. On the one hand, biblical Hebrew manuscripts from the Common Era (including the Qumran manuscripts) were written without any vowels, save erratic use of the matres lectionis. On the other hand, the traditional scholarly Hebrew text used as the standard modern edition includes vowels and a full complement of other diacritics to distinguish various usages of characters (consonantal versus vocalic function; mappiq), doubled consonants (daghesh, or explicit lack, raphe), accentuation, close word juncture (maqqeph -- used irregularly) and syllable structure (metheg). The accentual system in the Hebrew Bible uses special symbols to identify primary and secondary stress in a "word," but the accents also provide verse-level punctuation. In 21 "prose" texts, some 27 different conjunctive and disjunctive accents thus reveal in hierarchical fashion the medieval rabbinic understanding of verse-level semantics (sometimes syntax as well). In the three "poetic" books (Psalms, Proverbs, Job) similar accents are used in a slightly different way. Special classes of accents (pre-positive, post-positive) do not correctly mark word stress at all, which must then be determined on other grounds. Since the use of these accents follows elaborate contextual rules, a single accent may relate to non-adjacent and non-contiguous words in the verse. In order to be useful to scholarly study, encoded texts should reflect these relationships in the markup. 
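One conceivable representation would carry several strata in parallel within a single word element, as in this hypothetical sketch (the element names w, cons, voc and acc are invented; the word is the first word of Genesis, transliterated approximately in the ASCII conventions used in this file):

```sgml
<!-- A multi-stratum word record: consonantal text, vocalized
     (Tiberian) text, and the accentual stratum kept distinct. -->
<w n="Gen.1.1.1">
  <cons>br'shyt</cons>
  <voc>bere'shi//t</voc>
  <acc type="tiphcha" class="disjunctive"></acc>
</w>
```

Such an encoding at least keeps the strata separable for analysis, though it multiplies the bulk of the text several times over.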
Thus, for Hebrew we can probably isolate four or five strata<<28>> in the writing system: which stratum should be encoded? Or should several levels be encoded?<<29>> If we recognize that the vocalization is artificial from the standpoint of the first millennium (B.C.E.), should we reconstruct a pure vocalization (aided by early Latin and Greek transcriptions)? Similarly, should we normalize the frustrating unevenness of the orthography at one or more levels to facilitate linguistic analysis? Hebrew and Akkadian supply examples of orthographic traditions which fluctuate so wildly that linguistic study of the texts, if coded without normalization, is almost impossible.<<30>> It would seem prudent that the TEI encourage relevant scholarly societies to conduct linguistic study of these problems so that encoding and data analysis of such texts are made possible. Does it make sense to mark up "punctuation" for sacred texts when the many ancient exemplars had no punctuation, or almost none? As noted above, medieval codices of the Hebrew Bible had as many as eighteen hierarchical levels of disjunctive "punctuation" marks for use with a verse,<<31>> but even those "verse" divisions do not correspond precisely with modern judgments about "sentences." Given that the typographic symbols for punctuators within a verse or sentence vary in different text corpora (e.g., Greek uses ";" for interrogation and a raised dot for our "colon" and "semicolon"),<<32>> should TEI propose a metalanguage of hierarchically-ordered disjunctive and conjunctive punctuation markers? The designation of biblical chapters (ca. 13th century) is useful and probably permanent for referencing, but "chapters" do not always coincide with modern literary-critical divisions of the text; the older liturgical divisions of sacred texts sometimes appear even less felicitous. 
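If such a metalanguage were attempted, it might normalize corpus-specific symbols to ranked conjunctive/disjunctive markers, roughly as follows (a sketch only; the element name, attribute names and rank values are all invented for illustration):

```sgml
<!-- A Hebrew zaqeph and a Greek raised dot (ano teleia) both
     normalized to a mid-rank disjunctive; the printed symbol of
     the corpus is preserved in an attribute. -->
<punct func="disjunctive" rank="2" symbol="zaqeph">
<punct func="disjunctive" rank="2" symbol="ano-teleia">
```

Whether such rankings can be made commensurable across writing systems is itself a question for the scholarly societies mentioned above.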
These examples lead to the conclusion that it may be necessary to encode multiple systems of punctuation and structural marking to permit study of sacred texts as they were written in antiquity (with little or sometimes infelicitous punctuation) and as they are understood by modern literary criticism. The other "encoding" alternative (to throw all orthographic strata and punctuation systems into the same marked-up text) would seem to present an impossible task for software developers. Does this situation suggest yet another alternative: to abandon the goal of "marking up" these texts as the primary means of encoding? FACTOR 10. MARKUP OF (ANCIENT) SACRED TEXTS IS SUBJECTIVE: COMPLICATED BY THE MIXTURE OF LITERARY GENRES, FORM-CRITICAL UNITS AND STYLES OF DISCOURSE (PROSE/POETRY) WITHIN SINGLE "DOCUMENTS." It will hardly be denied that subjective judgments are involved in almost every aspect of the encoding of sacred texts. The earliest copy texts usually contain (physically) too little information to be of use, while standard editions (like the cantillated Hebrew BHS discussed above) contain either too much, or incorrect, or too little encoding, depending upon the goals of scholarly inquiry. Thus, while the element of subjectivity cannot be removed from encoding, attempts should be made to reckon with it so that scholarship is not hindered in any way. I will survey here a few special concerns which arise in connection with encoding of literary features of sacred texts. Styles and units of discourse are rarely distinguished in the earliest religious texts, although ancient texts exhibit great variation on this point. Some cuneiform tablets of the second and first millennia B.C.E. contain rulings on the tablets, delineating verse and strophic structure, separating the main text from the colophon, etc. The Qumran manuscripts in Hebrew, Aramaic and Greek (third century B.C.E. - second century C.E.) 
only occasionally reflect verse (poetic) structure in the presentation of texts which are universally acknowledged by modern scholarship as poetic. But in other texts, the subjectivity of judgment in identifying "poetry" and in elucidating its structure is evidenced in the fact that standard modern editions and translations do not agree: what one edition or translation presents as verse, another presents as prose; a textual unit presented by one edition as a quatrain is presented as a tristich in another. Two concerns for encoding may be registered here. First, it would be highly desirable to ensure that the tagset employed for designating poetry (or different kinds of poetry) be simple and mnemonically perspicuous: scholars will inevitably wish to change these tags to reflect their own literary judgments. Second, it is important that the encoding scheme be powerful enough to represent poetic structure in the standard ways scholarship has already established to designate poetry. In semitic languages, for instance, metrical systems are still not well understood, but parallelism has been studied intensively.<<33>> For example, the markup ought to be able to express these features: the notion of parallel lines in bicola, tricola and quatrains; symmetrical (e.g., chiastic and palistrophic) and recursive literary structures;<<34>> poetic subdivisions within stanza and strophe; acrostic patterns. In short, the encoding ought to provide for the elucidation of all levels and kinds of literary structure identified in current scholarship. A parallel concern is that encoding provide for the diversity and complexity of literary features within the same "document" or text. Sacred texts constitute a special case, for they are frequently acknowledged to have composite authorship (complicated literary prehistory, reflected in various evolutionary stages of extant texts) and mixed genres. The most obvious examples are the Jewish and Christian "Bibles." 
The "Bible" is of course not one book, but many books. Individual biblical books may be highly composite, as with the Psalter, Proverbs and other well-known examples. Within an individual biblical book, viewed at a synchronic level or at various compositional (diachronic) levels, there will inevitably be several literary genres, styles of discourse, form-critical types and so forth. The encoding scheme should provide for representation of these varying features at the lowest contextual and generic levels, but also with a unified scheme which permits analysis of the whole "Bible," for example, as a single document. OTHER GENRES OF RELIGIOUS TEXTS The universe of "religious" texts is broader than just "sacred" texts, of course. Both in ancient and modern times, religious documents enjoying less prestige or authority than scripture are nevertheless important objects of scholarly research. Most genres I have examined (far fewer than the total) do not yield concerns for encoding beyond the complexity of scriptural texts. Ancient commentaries (including the cuneiform traditions) sometimes involve deep complexity in the use of implicit mappings and ambiguous use of text-critical or cross-reference symbols. If a historically-qualified definition of "religious texts" is to be maintained, it must be acknowledged that several scientific, political and official-public genres of antiquity should be regarded as religious texts: medical texts, chemistry and "recipe" texts, astronomical and astrological texts, mystical-mathematical texts, public ritual texts, some onomastica and bureaucratic texts, funerary texts and grave inscriptions, treaties, chronicles, some legal compositions (including royal grants and decrees, boundary-stone inscriptions), etc. Since all these genres, and others, fall under the purview of TEI concerns, the distinction is immaterial. 
Religious genres which I know will require further study include: manuals of religious ethics, instruction and discipline; incantation and blessing manuals; hemerologies; oracle and omen texts; private ritual prescriptions; festival prescriptions; fables and proverbs; hymns and prayers [including hymnals and prayer books]; votive and mortuary inscriptions; elegies, laments and theodicies; prophecy and dream texts; sacred marriage texts; lectionaries; breviaries; catechisms; sacramentaries; ordinals; rubrics; registers; canon law; council decrees; creeds; expositions; theologies; sermons. This subset of genres of religious texts reflects my own narrow range of experience and study. It is obvious that expert advice must be sought from a wide range of scholars in religion, literature and linguistics if adequate accounting of the full inventory of encoding problems is to be made. RECOMMENDATIONS The following list of recommendations to the Text Representation subcommittee of TEI summarizes general concerns for text encoding in the world of religious texts. They are submitted with full acknowledgment of my personal limitations, and of the inadequacy and superficiality of the survey upon which they are based. (1) I recommend that TEI broaden the base of support and involvement by members of professional societies who can bring to bear their expertise in linguistics, literature and religion. I suspect that religious texts (especially, as ancient and sacred texts) present more serious challenges for encoding than we currently understand, and I feel that the involvement of qualified teams of specialists will be required to help define these problems and to help promote/referee optimal solutions. (2) I hope that vigilance will be maintained in not permitting text encoding to become the handmaid of a bygone era of textual study conducted on paper, nor of the modern electronic publication industry. 
Even if it be justified in some instances to replicate exact page images of paper on the computer screen (Talmud, popular critical editions), the goals of encoding/markup should not be uncritically dominated by methodologies of textual study familiar to us from the "paper past." I do not imply that anyone currently advocates such an agenda, but fear that it might become an attractive fall-back position if the demands of encoding begin to loom too large. In my view, development of encoding standards should be a patient process, dominated by a vision for the potential of textual study made possible by emerging technologies: hypertext, dynamic document versioning, real-time applications (currently possible only on supercomputers). (3) I recommend that TEI carefully (re-) consider the wisdom of making commitments to markup languages and other encoding schemes until it is known that these models are robust. If it is not known that they are sufficiently general, powerful and extensible to work for the "hardest cases" of world literature, would it be wise to make commitments to these solutions? If software (and possibly hardware) are built around a "preliminary" model of text encoding which ultimately proves to be inadequate, what will be the consequences for these inadequately-supported areas of textual scholarship? On the analogy with software design, may we not assume that robust design for encoding is ultimately in the best interests of all arenas of textual scholarship, popular or arcane? If no other justification can be found for embracing the difficulties of (ancient, sacred) religious texts, perhaps this point of appeal may find favor with TEI. Great evils have been done in our world in the name of religion, to be sure. Very great evils. 
Yet if the planet is to survive the atrocities of human greed, exploitation, hybris and jealousy, its salvation probably cannot come just from a study of documents written by lawyers, politicians, economists, historians, physicists, philosophers, linguists and mathematicians. We must also reflect upon the spiritual values embodied in the writings of the spiritual masters of our world religions: the ideas of Moses, Jesus Christ, Muhammed, Buddha, Gandhi. If the immediate followers of these masters sometimes failed through zealotry and spiritual blindness, it is not too much to hope that the human race may now learn from those excesses, revived by a renaissance of interest in the ethical ideals of world scriptures. ================================================================= ENDNOTES <<1>> My understanding of the terms "markup" and "encoding" is informed especially by interaction with written reports of Darrell Raymond and Frank Tompa, both from the University of Waterloo Centre for the New Oxford English Dictionary. Neither should be blamed for my misunderstandings, however. See especially: Frank Wm. Tompa, "What is (tagged) Text." Pp. 81-93 in Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary: Dictionaries in the Electronic Age. Proceedings of the Conference. [18-19 September 1989; St. Catherine's College, Oxford] Waterloo, Ontario, Canada: University of Waterloo, 1988; Frank Wm. Tompa and Darrell R. Raymond, "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989; Darrell R. Raymond, "Reading Between the Tags: An Appraisal of Descriptive Markup." University of Waterloo Centre for the New Oxford English Dictionary. Waterloo, October 1989. [Draft technical report] <<2>> The Book of Isaiah. The Hebrew University Bible. Parts 1-2. Ed. Moshe H. Goshen-Gottstein. Jerusalem: Magnes Press, 1975. <<3>> The Gospel According to St. Luke. Ed. 
The American and British Committees of the International Greek New Testament Project. 2 volumes. Oxford: Clarendon Press, 1984, 1987. <<4>> The BHS text reproduces Codex Leningradensis (B 19a), dated to about 1008 C.E. <<5>> Reinhard Wonneberger, Understanding BHS. A Manual for the Users of Biblia Hebraica Stuttgartensia, trans. Dwight Daniels; Subsidia Biblica 8; Rome: Pontifical Institute Press, 1988. See also by the same author, "Die Apparatsprache der Biblia Hebraica Stuttgartensia. Ein linguistischer Beitrag zur Editionskunde," Biblica 64 (1983) 305-343. A less ambitious attempt to explain the use of the BHS apparatus is found in William Scott's A Simplified Guide to BHS, Berkeley: BIBAL Press, 1987 (VIII + 62 + 22 pages). <<6>> Important progress has been made in the CATSS (Computer Assisted Tools for Septuagint Studies) Project under the direction of Robert Kraft and Emanuel Tov. Variants for the Greek version of Ruth and other books have been encoded, and parallel-aligned editions of Hebrew-Greek have been prepared. See Jack Abercrombie, "Computer-Assisted Alignment of the Greek and Hebrew Biblical Texts -- Programming Background," Textus 11 (1984) 125-139; Robert Kraft and Emanuel Tov (eds.), LXX: Computer Assisted Tools for Septuagint Research. Volume 1, Ruth. Septuagint and Cognate Studies 20. (Atlanta: Scholars Press, 1986); Emanuel Tov (ed.), A Computerized Data Base for Septuagint Studies. The Parallel Aligned Text of the Greek and Hebrew Bible. CATSS Volume 2/JNSL Supplement Series, 1. Stellenbosch, 1986. <<7>> Professor Wilhelm Ott is head of the Department of Literary and Documentary Data Processing, where the TUSTEP (TUebingen System of TExtprocessing Programs) program for text collation and textual editing has been developed. 
See (from among many publications) Wilhelm Ott, "A Text Processing System for the Preparation of Critical Editions," CHUM 13 (1979) 29-35; Wilhelm Ott, "Bibliographie: Computer in der Editionstechnik," [bibliographic essay] ALLC Bulletin 2/1 (1974) 73-80. <<8>> Computers have been used in manuscript collation and production of critical texts for many years. While the creation of text-critical databases is of obvious priority, I question the wisdom of trying to use marked-up critical apparatuses as the optimal database. Surveys of the use of computers in textual criticism and bibliography may be found in the following publications: La pratique des ordinateurs dans la critique des textes. [Actes du Colloque internationale sur "La pratique des ordinateurs dans la critique des textes," organized by the Centre National de la Recherche Scientifique, 29-31 March 1978, Paris] Eds. Jean Irigoin and Gian P. Zarri. Paris: Centre National de la Recherche Scientifique, 1979. ISBN: 2-222-02399-8; Susan Hockey, "Textual Criticism [= Chapter 7, pp. 144-167]," A Guide to Computer Applications in the Humanities. Baltimore/London: Johns Hopkins, 1980; Robert Oakman, "Textual Editing With a Computer [= Chapter 6, pp. 113-138, cf. 214-217]," Computer Methods for Literary Research. 2nd edition. Athens, GA: University of Georgia, 1984; Centre: Informatique et Bible. Verzeichnis (Katalog) der Datenba/"nke. Maredsous: Brepols, 1981 (pages 97-101). More recent bibliography may be found in the indexed bibliographic sections of Literary and Linguistic Computing: LLC 1/2 (1986) 85-92; LLC 1/3 (1986) 173-175; LLC 1/4 (1986) 216-220; LLC 2/2 (1987) 132-140; LLC 3/4 (1988) 255-260. <<9>> Scholars at the University of Waterloo Centre for the New Oxford English Dictionary have developed sophisticated tools for analysis of SGML-style tagged text. Descriptions are published in a number of technical reports and in the proceedings volumes from the Annual NOED Conference. 
See (for example) "PAT, GOEDEL, LECTOR and more: text-dominated database software." Pp. 83-84 in Tools for Humanists, 1989. A Guidebook to the Software and Hardware Fair Held in Conjunction with the Dynamic Text [6-9 June 1989 Toronto]. Toronto, Ontario: Centre for Computing in the Humanities, 1989. This article describes several software tools developed at the Waterloo Centre, including TRUC (an editor for SGML or SGML-style tagged text); Gonnet, Gaston and Frank Wm. Tompa. "Mind your Grammar: A New Approach to Modelling Text." Technical Report OED-87-01, University of Waterloo Centre for the New Oxford English Dictionary, February, 1987; Raymond, Darrell R. "lector -- An Interactive Formatter for Tagged Text." Technical Report, Centre for the New Oxford English Dictionary, University of Waterloo, Waterloo, Ontario, 1989; Tompa, Frank Wm.; Raymond, Darrell R. "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989; Tompa, Frank Wm. "What is (tagged) Text." Pp. 81-93 in Fifth Annual Conference of the UW Centre for the New Oxford English Dictionary: Dictionaries in the Electronic Age. Proceedings of the Conference. [18-19 September 1989; St. Catherine's College, Oxford] Waterloo, Ontario, Canada: University of Waterloo, 1989. It is not clear to me, however, that these or similar software tools are designed to extract and analyze text-critical data marked up in complex "flat file" format. <<10>> Tompa, Frank Wm.; Raymond, Darrell R. "Database Design for a Dynamic Dictionary." Technical Report OED-85-05, University of Waterloo Centre for the New Oxford English Dictionary, June 1989. On page 13 the authors briefly remark on the deficiency of SGML as being "unable to mark aggregates of elements that do not occur contiguously in a text (e.g., the sets of rhyming lines in a poem)." See also Raymond, Darrell R. "Reading Between the Tags: An Appraisal of Descriptive Markup."
University of Waterloo Centre for the New Oxford English Dictionary. Waterloo, October 1989. [Draft technical report] <<11>> See the collection of essays on empirical evidence for biblical criticism in Jeffrey H. Tigay, ed. Empirical Models for Biblical Criticism. Philadelphia: University of Pennsylvania, 1985. For the Jeremiah problem in particular, see Emanuel Tov's essay on pages 211-237 of this volume: "The Literary History of the Book of Jeremiah in Light of its Textual History." Other important contributions include: Tov, Emanuel, "Some Sequence Differences Between the MT and the LXX and their Ramifications for the Literary Criticism of the Bible." JNSL 13 (1987) 151-160; Stulman, L. The Other Text of Jeremiah. A Reconstruction of the Hebrew Text underlying the Greek Version of the Prose Sections of Jeremiah with English Translation. Lanham, MD: University Press of America, 1985; Stulman, L. "Some Theological and Lexical Differences between the Old Greek and the MT of the Jeremiah Prose Discourses." Hebrew Studies 25 (1984) 18-23. <<12>> See, for example, Jeffrey H. Tigay, The Evolution of the Gilgamesh Epic. Philadelphia: University of Pennsylvania, 1982. <<13>> One approach to versioning involves a semantic network formalism of nodes connected by typed links; see Randall Trigg and Mark Weiser, "TEXTNET: A Network-Based Approach to Text Handling," ACM Transactions on Office Information Systems 4/1 (January 1986) 1-23. Within the arena of TEI's Text Representation subcommittee, Steve DeRose and David Durand (at least) have done research on problems of versioning. <<14>> The markup of several biblical commentaries and lexica in the CDWord project forced us to reckon with great unevenness in volumes which avowed consistent citation practices in the volume preface. Editors and authors who claimed to have followed a specified referencing system frequently failed -- as the hypertext links revealed.
CDWord is a hypertext and text-retrieval program developed at Dallas Seminary, and includes: (a) digitized Greek scriptures (New Testament, Septuagint) which can be searched by morphological description and lemma; (b) digitized and minimally marked-up Greek lexica (Intermediate Liddell & Scott Greek-English Lexicon; Bauer-Arndt-Gingrich-Danker Greek Lexicon of the New Testament); (c) English Bible translations, Bible dictionaries, commentaries. The data is currently (October 1989) being mastered on CD-ROM for beta test. <<15>> It is ironic that ancient standards movements (desire to fix authoritative texts of canonical scripture) were to blame on several occasions for the suppression and eventual loss of invaluable textual data. <<16>> See Steve DeRose "Expanding the Notion of Links" [Conference Paper accepted for Hypertext '89]. <<17>> See in a facing-page format (Aramaic/Hebrew - English) the Hebrew-English Edition of the Babylonian Talmud. New York: Soncino, 1960-. <<18>> Winthrop Sargeant. The Bhagavad Gita. An interlinear translation from the Sanskrit, with word-for-word transliteration and translation, and complete grammatical commentary, as well as a readable prose translation, and page-by-page vocabularies. New York: Doubleday, 1979. ISBN: 0-385-63690-5. I have no opinion or means of independently judging the scholarly worth of the volume, but the format is highly interesting and appears quite useful as a student edition. <<19>> Collections of medieval masorah are available in separate tomes, and critical study of them continues today with the assistance of electronic databases. For example, Philippe Cassuto of the CATAB project in Lyon has recently released a detailed publication of kethiv/qere readings in the Leningrad codex (Qere-Ketib et listes massoretiques dans le manuscrit B 19a); CATAB = Centre d'analyse et de traitement automatique de la Bible et des traditions e/'crites.
There is a vast bibliography on the masorah: see a convenient discussion and bibliography in E. J. Revell, Introduction to the Tiberian Masorah. Masoretic Studies 5. Missoula, MT: Scholars Press, 1980. <<20>> See for example the following text publications: Maul, Stefan M. 'Herzberuhigungsklagen.' Die sumerisch-akkadischen Ersahunga-Gebete. Wiesbaden: Harrassowitz, 1988; von Weiher, Egbert. Spa/"tbabylonische Texte aus Uruk. Teil III. Ausgrabungen der Deutschen Forschungsgemeinschaft in Uruk-Warka, 12. Berlin: Gebr. Mann, 1988; Cohen, Mark. The Canonical Lamentations of Ancient Mesopotamia. 2 vols. Potomac, MD: Capital Decisions Limited, 1988 (pp. 536-603 and passim). Interlinear Akkadian translations are known for most literary genres; for a catalogue listing of myths and epics, see R. Borger, Handbuch der Keilschriftliteratur. Band III: Inhaltliche Ordnung der sumerischen und akkadischen Texte. Berlin/New York: Walter de Gruyter, 1975. <<21>> See John J. Hughes, Bits, Bytes and Biblical Studies. Grand Rapids: Zondervan, 1987 (pp. 275-276). A newer DOS version and superior Macintosh version are now available. IT assists in the process of automatic glossing of interlinear text fields by maintaining lexical mappings in database files; the program also maintains proper alignment of interlinear text fields with on-screen formatting (proportionally spaced fonts, correct word wrap, etc.). <<22>> A polyglot version available to me contains at least twelve text streams on the facing pages of the Pentateuch, including two interlinear formats and several Latin translations of daughter versions. It is the Biblia Sacra Polyglotta. (Tome I) Ed. Brianus Waltonus. London: Thomas Roycraft, 1657. Other Tomes of lexica and New Testament text contain varying combinations of versions and translations with complex implicit mappings.
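The "complex implicit mappings" among parallel text streams mentioned in note 22 can be made explicit in an encoding by stand-off alignment: each stream is kept as its own segmented sequence, while a separate table records which segments correspond. The following minimal sketch (in Python; the segment identifiers and sample readings are invented for illustration, not drawn from any real edition) shows the idea:

```python
# Stand-off alignment between parallel text streams, as one might use
# for a polyglot edition. All identifiers and sample data below are
# hypothetical illustrations, not a real encoding scheme.

# Each stream is an ordered list of (segment_id, text) pairs.
streams = {
    "hebrew": [("h1", "bereshit"), ("h2", "bara"), ("h3", "elohim")],
    "greek":  [("g1", "en arche"), ("g2", "epoiesen"), ("g3", "ho theos")],
    "latin":  [("l1", "in principio"), ("l2", "creavit"), ("l3", "deus")],
}

# Alignment units pair corresponding segment ids across streams; a unit
# may omit a stream (no counterpart) or group several segments together.
alignments = [
    {"hebrew": ["h1"], "greek": ["g1"], "latin": ["l1"]},
    {"hebrew": ["h2"], "greek": ["g2"], "latin": ["l2"]},
    {"hebrew": ["h3"], "greek": ["g3"], "latin": ["l3"]},
]

def parallels(stream_name, seg_id):
    """Return the corresponding segment texts in the other streams."""
    lookup = {name: dict(pairs) for name, pairs in streams.items()}
    out = {}
    for unit in alignments:
        if seg_id in unit.get(stream_name, []):
            for other, ids in unit.items():
                if other != stream_name:
                    out[other] = [lookup[other][i] for i in ids]
    return out

print(parallels("hebrew", "h2"))
# -> {'greek': ['epoiesen'], 'latin': ['creavit']}
```

Because the alignment table lives outside the streams themselves, a unit can skip a stream that lacks a counterpart, or map several segments against one, which flat inline markup handles poorly.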
<<23>> The earliest complete modern edition of the "Septuagint" (LXX, of Jewish Greek scriptures) was printed in polyglot format, the Complutensian Polyglot (1522 C.E.). Reproduction of a page facsimile may be found in Ernst Wu/"rthwein, The Text of the Old Testament. An Introduction to the Biblia Hebraica. Grand Rapids: Eerdmans, 1979 (pp. 214-215). <<24>> Synopsis Quattuor Evangeliorum. Locis parallelis evangeliorum apocryphorum et patrum adhibitis edidit Kurt Aland. 7th edition. Stuttgart: Wu/"rttembergische Bibelanstalt, 1967. <<25>> A complex edition of this type is found in Libri Synoptici Veteris Testamenti. 3 vols (Rome: Pontifical Institute, 1934). Volume 2 contains the parallel texts of Kings, Chronicles and Isaiah, with three columns of Hebrew on the left page (with critical apparatuses) and three columns of Greek [Septuagint] on the right page, along with their critical apparatuses. <<26>> Recent examples of the "score" text publication format from my own library: Watanabe, Kazuko. Die ade//-Vereidigung anla/"sslich der Thronfolgeregelung Asarhaddons. Baghdader Mitteilungen, Beiheft 3. Berlin: Gebr. Mann, 1987; Dijk, J. van. LUGAL UD ME-LAM-bi NIR-GAL. Le re/'cit e/'pique et didactique des Travaux de Ninurta, du De/'luge et de la Nouvelle Cre/'ation. 3 volumes. Leiden: Brill, 1983; Michalowski, Piotr. The Lamentation over the Destruction of Sumer and Ur. Mesopotamian Civilizations, 1. Winona Lake, IN: Eisenbrauns, 1989 (pp. 109-191); Farber, Walter. Schlaf, Kindchen, Schlaf! Mesopotamische Baby-Beschwo/"rungen und -Rituale. Mesopotamian Civilizations, 2. Winona Lake, IN: Eisenbrauns, 1989. For a text publication of the Old Latin version of the Bible in this format: Vetus Latina. Die Reste der altlateinischen Bibel. 11/1: Sapientia Salomonis. ed. Walter Thiele. Freiburg: Herder, 1977-1988.
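The "score" publication format cited in note 26 lines up every manuscript witness, line by line, beneath a composite text. As a data structure this is simply a mapping from composite lines to per-witness readings, with an explicit marker for witnesses that are broken or absent at a given line. A hedged sketch in Python (the sigla and readings are invented placeholders, not real manuscript data):

```python
# Hypothetical "score"-style representation: for each line of the
# composite text, every witness's reading (or its absence) is recorded
# and rendered witness-under-witness. Sigla and readings are invented.

score = {
    "line 1": {"A": "szul-gi lugal", "B": "szul-gi lugal-e", "C": None},
    "line 2": {"A": "an-ub-da", "B": "an-ub-da limmu", "C": "an-ub-da"},
}

def format_score(score):
    """Render a score as text rows: line label, then one row per witness."""
    rows = []
    for line, witnesses in score.items():
        rows.append(line)
        for siglum, reading in sorted(witnesses.items()):
            # "---" marks a witness broken or unattested at this line
            rows.append(f"  {siglum}: {reading if reading is not None else '---'}")
    return rows

print("\n".join(format_score(score)))
```

The point of the explicit None marker is that a score preserves negative evidence (a witness's gap) as faithfully as positive readings, something an apparatus of selected variants discards.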
<<27>> In Hebrew and other semitic languages written without vowels, the consonantal writing and polyvalence of key morphemes make it possible to offer legitimate alternative interpretations of text in scriptio continua. The brilliant work of the Hebraist Mitchell Dahood (even though [if] incorrect most of the time) demonstrated how radically scholarship could be altered if one challenges the tradition of word and morpheme breaks. The decisive cases of enclitic mem (misunderstood by the Masoretes) and other successes of comparative (Northwest Semitic) philology during the past 50 years vindicate the soundness of challenging subjective scholarly tradition on this point. In Akkadian (mostly syllabic), the polyvalence of signs and some key morphemes makes modern misunderstanding of word boundaries a predictable event. <<28>> An internal publication from the CATAB Center (CATAB = Centre d'analyse et de traitement automatique de la Bible et des traditions e/'crites) identifies five orthographic strata: formes graphiques; de/'coupage du texte; texte alphabe/'tique; vocalisation; cantilation. Other divisions (permutations of graph-types and punctuators) are possible and of theoretical interest to the linguist, masoretic scholar or biblical exegete. See "Tableau re/'capitulatif" on page 7 of CATAB's "Dossier de pre/'sentation. Laboratoire CATAB (Universite/' Lyon-III, U.A. du CNRS)." <<29>> Public domain software developed at the University of Pennsylvania is available for filtering the encoded BHS text to produce bare consonantal text or text with various combinations of orthography and punctuation. Though useful, this approach of holding data in multiple formats does not solve all the complex encoding problems. <<30>> See F. I. Andersen and Dean Forbes, Spelling in the Hebrew Bible. Biblica et Orientalia 41. Rome: Pontifical Biblical Institute, 1986; Aronoff, Mark.
"Orthography and Linguistic Theory: The Syntactic Basis of Masoretic Hebrew Punctuation," Language 61 (1985) 28-72; David N. Freedman, "The Masoretic Text and the Qumran Scrolls: A Study in Orthography," Textus 2 (1962) 87-102. Further bibliography is available in Bruce K. Waltke and Michael O'Connor, An Introduction to Biblical Hebrew Syntax (Winona Lake, IN: Eisenbrauns, 1990), pp. 703-704. <<31>> The accents were used for purposes of cantillation and for marking word stress as well as for semantic [sometimes = modern syntactic] divisions. See William Wickes, Two Treatises on the Accentuation of the Old Testament. [1881, 1887] Prolegomenon by Aron Dotan. [1968] New York: KTAV, 1970; E.J. Revell, Biblical Texts with Palestinian Pointing and their Accents. Masoretic Studies 4. Missoula, MT: Scholars Press, 1977; E. J. Revell, Introduction to the Tiberian Masorah. Masoretic Studies 5. Missoula, MT: Scholars Press, 1980. (pages 157-274). <<32>> For discussion of punctuators in Greek, see (with ample bibliography) E.G. Turner, Greek Manuscripts of the Ancient World. Second (revised, enlarged) edition [P.J. Parsons]. Bulletin Supplement 46. London: Institute of Classical Studies, 1987. <<33>> See the following handbooks, monographs and the bibliography cited. Watson, Wilfred G. Classical Hebrew Poetry. A Guide to its Techniques. JSOT Supplement Series, 26. Sheffield: JSOT Press, 1984; Alonso-Scho/"kel, Luis. A Manual of Hebrew Poetics. Subsidia Biblica, 11. Rome: Pontifical Institute Press, 1988; Alter, Robert. "The Dynamics of Parallelism." Hebrew University Studies in Literature and the Arts 11/1 (1983) 71-101; Berlin, Adele. The Dynamics of Biblical Parallelism. Bloomington: Indiana University Press, 1985; Collins, Terence. Line-Forms in Hebrew Poetry. A Grammatical Approach to the Study of the Hebrew Prophets. Studia Pohl (Series Maior), 7. Rome: Biblical Institute Press, 1978; Geller, Stephen A. Parallelism in Early Biblical Poetry. HSM, 20.
Ann Arbor, MI: Scholars Press, 1979; Kugel, James L. The Idea of Biblical Poetry. Parallelism and its History. New Haven: Yale University Press, 1981; O'Connor, Michael. Hebrew Verse Structure. Winona Lake, IN: Eisenbrauns, 1980. <<34>> On chiasm as one of the dominant literary structures in ancient texts, see John W. Welch (ed.), Chiasmus in Antiquity. Structures, Analyses, Exegesis. Hildesheim: Gerstenberg Verlag, 1981.
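Note 10 above records the observation that SGML-style inline tags cannot mark aggregates of elements that do not occur contiguously in a text; the chiastic structures of note 34 (A B C B' A') are precisely such aggregates, since each pair of corresponding members is separated by intervening text. One stand-off workaround is to record the grouping outside the text stream as sets of line indices, sketched here in Python with an invented example (the line contents and labels are placeholders, not a real chiasm):

```python
# Hedged sketch: representing a chiastic structure (A B C B' A') as
# stand-off annotation over numbered text lines, since inline paired
# tags cannot enclose non-contiguous members. Data is invented.

lines = ["A-member", "B-member", "C-pivot", "B-member'", "A-member'"]

# Each aggregate names the (possibly non-contiguous) line indices
# that belong to it; the text itself carries no markup at all.
chiasm = {
    "A/A'": [0, 4],
    "B/B'": [1, 3],
    "C":    [2],
}

def members(label):
    """Return the texts of all lines grouped under this aggregate."""
    return [lines[i] for i in chiasm[label]]

print(members("A/A'"))
# -> ['A-member', "A-member'"]
```

The same index-set mechanism would serve for rhyming lines, distant parallelisms, or any other discontinuous aggregate that a "flat" tagged file cannot delimit directly.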