Desiderata for TEI-MS

TEI MS W03

David J. Birnbaum (djbpitt+@pitt.edu)
2004-01-24


This document describes limitations in or problems with the TEI manuscript-description model and proposes solutions. "TEI-MS" here refers to the synthesis of MASTER and the proposal of the original TEI manuscript-description Work Group, as developed by the Manuscript Task Force in Reykjavik in September 2003. See also the original TEI-MMSS Work Group report and the MASTER documentation (both available on line at the TEI web site, although the MASTER documentation is out of sync with the MASTER DTD), as well as two documents prepared by Andrej Bojadžiev, one outlining the modified TEI system used in the Sofia Repertorium project and one contrasting the Repertorium DTD and MASTER.


Paragraphs vs structured content

Problem: Paragraph data is difficult to process automatically. Paragraph typing (using the new "topic" attribute) is helpful, but structured alternatives are still required by some users.

Proposed Solution: Retain (p+) models for subelements, but provide structured alternatives that do not require <p> wrappers.

Constraining where <msDescription> can appear

Problem: TEI-MS, following MASTER, treats <msDescription> as a type of <div>. Andrej notes that this has the undesirable side-effect of permitting <msDescription> inside elements where it probably does not belong: <argument>, <q>, <sic>, <corr>, <add>, <item>, <note>, and <epigraph>. The Repertorium defines <msDescription> (which it calls <msDesc>) as a type of %component;, which permits it in the <body> (only) and does not permit it inside these other elements. The Repertorium DTD modifications do not follow the TEI parameter-entity guidelines, but they do appear to restrict the location of the <msDescription> element in a useful way.

Proposed Solution: TEI-MS should permit <msDescription> inside <sourceDesc> or <body>, and should do so using the TEI guidelines for extension, but should not permit it inside the additional elements listed above.

"form" attribute vs element

Problem: Repertorium defines a "form" attribute on <msDesc> (= TEI-MS <msDescription>) with possible values: (codex | roll | leaf | fragleaf | fragCodex | cutting). This list is probably a reasonably complete inventory of the types of paper and parchment documents one encounters, although it is not applicable to birchbark, wax, wood, stone, etc. The Repertorium attribute is misplaced because the shape of the book is logically part of the physical description, and should be located accordingly.

MASTER proposes a <form> element inside <physDesc> with paragraph content. The Repertorium approach does not permit such MASTER examples as "Three vellum pieces sewn together to make a roll" or "Seven volumes," but one cannot search meaningfully for values like that anyway, and if that exact wording is desirable for rendering purposes in a prose catalog, it might be included in the <head> or a leading <msLooseDesc>.

Proposed Solution: Change the MASTER <form> element to a CDATA attribute on <support> (or some other part of <physDesc>), using values from the Repertorium proposal (except that lower-case "fragleaf" should be replaced by camel-case "fragLeaf"). This attribute value is not meaningful for materials other than parchment and paper, but using fixed values where it is applicable will make it possible to find all types of fragments with simple searches, which is not possible with the MASTER <form> element. The Repertorium "unity" attribute on <msDesc> is not needed because the unity of the manuscript can be determined automatically from the subelements (that is, from whether it has one or more <msPart> elements).
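The kind of search this proposal is meant to enable can be sketched in Python. The sketch below is illustrative only: it assumes a "form" attribute on <support> (the attribute proposed above, not an existing TEI-MS or MASTER construct), a simplified nesting, invented sample records, and hypothetical helper names. It treats any value beginning with "frag" as a fragment, and it reads "one or more <msPart> elements" as meaning that the presence of any <msPart> marks a composite manuscript; both readings are assumptions.

```python
import xml.etree.ElementTree as ET

# Invented sample records. The "form" attribute on <support> is the
# attribute proposed in this section, not part of MASTER or TEI-MS.
SAMPLE = """
<corpus>
  <msDescription id="ms1">
    <physDesc><support form="codex"/></physDesc>
  </msDescription>
  <msDescription id="ms2">
    <physDesc><support form="fragLeaf"/></physDesc>
  </msDescription>
  <msDescription id="ms3">
    <physDesc><support form="fragCodex"/></physDesc>
    <msPart/>
    <msPart/>
  </msDescription>
</corpus>
"""


def find_fragments(root):
    """Return the ids of descriptions whose form value starts with 'frag'."""
    hits = []
    for ms in root.iter("msDescription"):
        support = ms.find("physDesc/support")
        if support is not None and support.get("form", "").startswith("frag"):
            hits.append(ms.get("id"))
    return hits


def is_composite(ms):
    """Assumed rule: any <msPart> child marks a composite manuscript."""
    return ms.find("msPart") is not None


root = ET.fromstring(SAMPLE)
print(find_fragments(root))  # ['ms2', 'ms3']
```

A prose <form> element such as "Three vellum pieces sewn together to make a roll" could not be matched this way without interpreting free text.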

Describing parchment and paper

Problem: TEI-MS provides no structured mechanism for describing parchment (e.g., Gregory Rule), ruling and pricking (e.g., Gilissen, Leroy), format (e.g., a list of fixed attribute values for quarto, etc.), and watermarks. All of these should have structured alternatives, ideally with fixed attribute values (rather than PCDATA element content or CDATA attribute values), not so much for searching or rendering (although they might be useful here) as to enable users to search for correspondences (e.g., are manuscripts with certain ruling patterns correlated with particular scribes or scriptoria?).

Proposed Solution: Introduce attributes with a set of possible values to describe the features mentioned above. Specifically:

The "use" attribute on Repertorium <parchmentDesc> and <paperDesc> is probably superfluous; it can be described in prose (as long as an appropriate element with data content is available), and is unlikely to be used for searching, special rendering, or quantitative analysis.

Documenting the source of the information in the description

Problem: The Repertorium provides a "source" attribute on <catalogueStmt> (the Repertorium counterpart to TEI-MS <msIdentifier>) with values (devisu | microform | description | edition | other). This attribute is misplaced (it is part of the source information about the description, rather than a feature of the manuscript), but it serves to document in a standard way the types of resources used in constructing descriptions. Simplifying retrieval according to this information is helpful for scholars who may need to distinguish in a general way the reliability of the source, perhaps to determine which manuscripts merit reevaluation.

Proposed Solution: Add a comparable attribute to TEI-MS, but locate it on the <source> element inside <recordHist>. Retain the <source> element with prose content to specify the source more precisely, e.g., to provide publication information about an edition. Divide the "edition" attribute value into "typesetEdition" and "facsimileEdition", since the distinction between the two is crucial.
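A minimal Python sketch of the retrieval this would support follows. The attribute name "type" on <source> is an assumption (the proposal does not fix a name), the nesting is simplified to <recordHist>/<source>, and the sample records and helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample records. The "type" attribute name on <source> is an
# assumption; the values follow the Repertorium list with "edition" split
# as proposed above.
SAMPLE = """
<corpus>
  <msDescription id="ms1">
    <recordHist><source type="devisu">Examined in person.</source></recordHist>
  </msDescription>
  <msDescription id="ms2">
    <recordHist><source type="typesetEdition">Described from an edition.</source></recordHist>
  </msDescription>
  <msDescription id="ms3">
    <recordHist><source type="microform">Described from microfilm.</source></recordHist>
  </msDescription>
  <msDescription id="ms4">
    <recordHist><source type="devisu">Examined in person.</source></recordHist>
  </msDescription>
</corpus>
"""


def source_type(ms):
    """Return the source type recorded for one description, or None."""
    source = ms.find("recordHist/source")
    return source.get("type") if source is not None else None


def count_by_source(root):
    """Tally descriptions by the kind of resource they were based on."""
    return Counter(source_type(ms) for ms in root.iter("msDescription"))


def not_seen_in_person(root):
    """Return ids of descriptions not based on direct examination."""
    return [ms.get("id") for ms in root.iter("msDescription")
            if source_type(ms) != "devisu"]


root = ET.fromstring(SAMPLE)
print(not_seen_in_person(root))  # ['ms2', 'ms3']
```

The prose content of <source> remains available for display; only the attribute value is used for selection.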

General and specific titles

Problem: Repertorium contains as part of the <catalogueStmt> (the counterpart to the TEI-MS <msIdentifier>) a <manuscriptName> element, with an attribute "codexType" that has values (general | individual). This is intended to make it possible to identify, for example, a miscellany (general) and a specific miscellany type (individual). The <manuscriptName> element is misplaced because it is part of the intellectual content of the manuscript, rather than a geographic identifier comparable to repository or signature (shelfmark). The TEI-MS <title> element inside <msItem> fulfills this function, but needs an attribute to distinguish different levels of titles. The current attribute "type" with values "uniform" and "supplied" is not adequate for representing uniform or supplied titles of varying specificity.

Proposed Solution: Enhance the attribute values available for <title> inside <msItem> to make it possible to specify both general titles (e.g., "Patericon") and narrower ones (e.g., "Sinai Patericon"). The values discussed on the TEI-MS mailing list, (generic | distinctive), should be suitable for this purpose.

Possible Complication: In some cases more than two levels of titles are required, e.g., Mineja (a genre term for a book involving readings arranged by month), Služebnaja Mineja (a liturgical type of Mineja), and Služebnaja Mineja na sentjabr′ (a Služebnaja Mineja for the month of September). For some purposes, such as searching, the last two of these could both be tagged as "distinctive", although for formatting a print description it might be necessary to distinguish three levels in markup.
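The two-level scheme and its limitation can be illustrated with a short Python sketch. It assumes that the (generic | distinctive) values are carried by the "type" attribute of <title>, which is one possible reading of the proposal; the sample markup and the helper name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample. Placing generic/distinctive in the "type" attribute of
# <title> is an assumption about how the proposal would be realized.
SAMPLE = """
<msContents>
  <msItem id="i1">
    <title type="generic">Mineja</title>
    <title type="distinctive">Služebnaja Mineja</title>
    <title type="distinctive">Služebnaja Mineja na sentjabr′</title>
  </msItem>
  <msItem id="i2">
    <title type="generic">Patericon</title>
    <title type="distinctive">Sinai Patericon</title>
  </msItem>
</msContents>
"""


def titles_by_type(root, wanted):
    """Map each <msItem> id to its titles carrying the wanted type value."""
    result = {}
    for item in root.iter("msItem"):
        result[item.get("id")] = [
            title.text
            for title in item.findall("title")
            if title.get("type") == wanted
        ]
    return result


root = ET.fromstring(SAMPLE)
generic = titles_by_type(root, "generic")
distinctive = titles_by_type(root, "distinctive")
print(generic["i2"])            # ['Patericon']
print(len(distinctive["i1"]))   # 2
```

A search on "distinctive" finds both narrower Mineja titles, which is adequate for retrieval, but nothing in the markup tells a formatter which of the two is the more specific.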

Issues pertaining to <msItem>

Problem: The Repertorium divides the description of the intellectual content of the manuscript into <manuscriptContentDesc>, which pertains to the manuscript as a whole, and <articleContentDesc>, which pertains to constituent articles. Both Repertorium and TEI-MS permit <msItem> (or its equivalent) to nest. This makes <manuscriptContentDesc> unnecessary, since the intellectual content of a manuscript can be described as a parent <msItem>, whose <msItem> children then correspond to the constituent texts.

The content model for <manuscriptContentDesc> in Repertorium is (overview*, numberTexts*, source*, translation*, protograph*, antigraph*, apograph*, litRedaction*, churchCal?, sampleText*, listBibl?, bibl?). TEI-MS now includes a <filiation> element (not originally present in MASTER) with an attribute value capable of distinguishing protographs, antigraphs, and apographs, but not the other subelements. The Repertorium model includes two attributes: "type" (original | compilation | translation) and "style" (narrative | non-narrative). "Style" is probably of limited utility (one could, for example, have chosen hymnographic and non-hymnographic, liturgical and non-liturgical, religious and non-religious, etc.). "Type" may be useful for locating all of the translations quickly, although it may not be needed alongside the <translation> subelement.

Proposed Solution: The "type" attribute should be incorporated (on <msItem>) with the Repertorium values. The Repertorium subelements translation, litRedaction, and churchCal are needed, but they do not pertain to <filiation>. To accommodate them, make two additions:

  1. A <transmission> element with a CDATA attribute "type" with suggested values (translation | litRedaction).
  2. A CDATA "churchCal" attribute (or something similar) on <msItem>. A formalized system for representing the calendar should be proposed as a framework for suggested values.

Marginalia and other additions

Problem: Repertorium uses <noteDesc> to correspond to TEI-MS <additions>. Repertorium <noteDesc> contains <noteItem> elements, which have structured content comparable to the description of other manuscript material (in TEI-MS terms, this structured content pertains to the intellectual content of the manuscript [<msItem>] or to handwriting [<msWriting>]). TEI-MS <additions> contains only paragraphs, which makes it impossible to search for marginalia with the same content-related or writing-related features as regular text. Because one might wish to find correspondences in marginalia (whether based on contents or form), a structured alternative to paragraphs would be useful.

Proposed Solution: Uncertain. Allow <msItem> and <msWriting> inside individual items within <additions>? Document a mechanism for linking between a general <msWriting> element and those items? The TEI-MS strategy of separating notes in the manuscript (<additions>) from editorial notes (<note>) should be retained; what is lacking is a structured way to describe an "addition".

Location and order of constituent texts

Problem: The order of constituent articles in a miscellany or other complex manuscript would appear to be inferable from the order of the <msItem> elements. Such an assumption yields incorrect results in the case of manuscripts that have been rearranged during rebinding. The actual (current) order needs to be included because it is part of the description of the manuscript as it is, but a structured way of representing the original order is also required for automated analysis (for example, the order of the contents when the manuscript was originally created is the only order that matters for identifying a possible protograph or antigraph).

Proposed Solution: Because MASTER (whether wisely or not) chose to privilege current physical codicology over textual description, the order of content items in the manuscript at the time of description, however accidental that order may be, is primary. Accordingly, the order of <msItem> elements should continue to record the order of the constituent texts as they appear in the manuscript currently. <msItem> should have an optional attribute with a name like "originalLocation" and a value that indicates the original ordinal place of the article where that differs from the current place.
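How the original sequence could be recovered from such an attribute is sketched below in Python. The attribute name "originalLocation" follows the suggestion above; the sample markup and the helper names are invented, and the sketch assumes that an item without the attribute kept its current position and that only top-level <msItem> elements are considered.

```python
import xml.etree.ElementTree as ET

# Invented sample: document order is the current (rebound) order, and
# "originalLocation" is given only where the original place differs.
SAMPLE = """
<msContents>
  <msItem originalLocation="3"><title>Text C</title></msItem>
  <msItem><title>Text B</title></msItem>
  <msItem originalLocation="1"><title>Text A</title></msItem>
</msContents>
"""


def current_order(root):
    """Titles in document order, i.e. the order at the time of description."""
    return [item.find("title").text for item in root.findall("msItem")]


def original_order(root):
    """Titles sorted by original ordinal place.

    Assumption: an item without originalLocation kept its current place.
    """
    keyed = []
    for position, item in enumerate(root.findall("msItem"), start=1):
        original = int(item.get("originalLocation", position))
        keyed.append((original, item.find("title").text))
    return [title for _, title in sorted(keyed)]


root = ET.fromstring(SAMPLE)
print(current_order(root))   # ['Text C', 'Text B', 'Text A']
print(original_order(root))  # ['Text A', 'Text B', 'Text C']
```

Comparison with a possible protograph or antigraph would then use the second list, while the markup itself continues to record the first.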

Catchwords, foliation, pagination, and other markings

Problem: TEI-MS provides no structured way of describing markings of this type. Repertorium provides a <markings> element with a "form" attribute and values (catchword | signature | numbers | figure | mixed | other) and %phrase.seq; contents. There is also a "presence" attribute with values (all | some) and an "orientation" attribute with values (horizontal | up | down). In general the marks in question were added after the manuscript was originally composed, which means that similarities in marking patterns between manuscripts do not suggest common origin, but they might suggest, for example, that the manuscripts were rebound in the same location.

Proposed Solution: Introduce an optional and repeatable <markings> element as part of <collation>. The Repertorium attribute values are probably superfluous insofar as they are not needed for searching, rendering, or quantitative analysis. With appropriate typing, this element might be able to subsume the functions of the separate Repertorium <foliation> and <pagination> elements.

Decoration and binding

Problem: Repertorium has richly structured models for the description of decoration and binding. TEI-MS has only loosely structured or unstructured models, and these are not capable of identifying manuscripts in a corpus that share decorative or binding features.

Proposed Solution: Use the Repertorium models as a starting point for creating formalized alternatives to paragraph descriptions of decoration and binding. Elisaveta Musakova might be invited to help develop the decoration model.

Paleography and orthography

Problem: Paleographic and orthographic descriptions are a standard feature of manuscript description and structured representations are needed for searching, rendering, and quantitative analysis. On the other hand, these features are specific to different languages and writing systems, and cannot be standardized across cultural traditions in any general way. TEI-MS contains optional <palaeography>, <orthography>, and <morphology> subelements of <handDesc>; this is the right location (they vary with scribe), but 1) they do not have structured content, and 2) if <morphology> is to be included, <phonology> should probably also be available.

Proposed Solution: Retain the TEI-MS elements (adding <phonology>) with paragraph content. Invite expert constituencies to propose structured alternatives for specific traditions. The Repertorium can supply a structured model for Cyrillic paleography and orthography. The structured alternatives would be part of the TEI DTDs as pizza toppings (or, I suppose, toppings for toppings, since they would be meaningful only if one had selected the manuscript description topping).

The contents of the paleographic description in Repertorium include <letterForm>, <cadels>, <crypto>, and <musicNotation>. The last of these might go elsewhere in TEI-MS, but the first three provide a generic superstructure that might be useful independently of tradition.