TEI TR9, TEI Work Group for Manuscripts

Diplomatic transcription of modern manuscripts, preliminary suggestions and recommendations.

To be submitted to the TEI meeting in Myrdal, November 1991

Claus Huitfeldt, 8 November 1991

Draft: This text has not been reviewed by other members of the group, and should not be taken as an expression of any official stance of TR9

1 Background

The Text Encoding Initiative Work Group for Manuscripts had its first meeting in Louvain-la-Neuve, 26-27 October 1991. Present at this meeting were: D. Buzetti (Bologna), J. Hamesse (Louvain-la-Neuve), C. Huitfeldt (Bergen), M. Sperberg-McQueen (Chicago).

There was consensus at the meeting that although it is probably possible to agree on a recommended set of core tags and rules for the encoding of manuscripts, it is almost certain that no such core set will suffice for the encoding of any particular manuscript within any particular project. The following two distinctions appeared to be crucial:

(1) Diplomatic transcriptions of single witnesses vs. text-critical transcriptions of several witnesses.
(2) Modern manuscripts vs. ancient and medieval manuscripts.

The work group decided to divide its work into three parts: One covering features common to all manuscripts, one concerned with modern manuscripts, and one with ancient and medieval manuscripts. It was also decided to seek cooperation with the work groups on Textual Criticism and Printed Books. (Cf. the minutes from the TEI TR9 meeting 26-27 October 1991.)

2 General remarks

The present text was written with the intention to treat exclusively of what is of particular relevance to the diplomatic transcription of modern manuscripts.
Thus, what such transcriptions have in common with transcription of (i) printed matter, (ii) ancient and medieval manuscripts, (iii) text-critical work, and (iv) analytic or interpretational encoding, will not be covered.

However, I have not been able to live up to this intention in all respects, and for several reasons. First, because I do not know the other types of texts and traditions well enough to know what is common and what is special. Second, because time has neither permitted a coordination with Prof. Hamesse's work, which is to concentrate on core manuscript features and ancient and medieval texts, nor a close study of the other TEI Work Groups' recommendations. Third, because there are some points at which I have become uncertain whether the above-mentioned distinctions are applicable:

Ad (i), Manuscripts vs. printed matter: What about annotations in printed books? What about manuscripts written on printed formulae, schemes, diaries with printed dates, titles, headers, pagination etc.? And what about the abundant typewritten materials more or less heavily annotated by hand? In the following, read "modern manuscripts" for "manuscripts" unless otherwise specified, and take into consideration the possible inclusion of these types of annotated printed or typewritten texts.

Ad (ii), Modern vs. ancient and medieval: Early printed books have many features in common with manuscripts.

Ad (iii), Diplomatic vs. text-critical transcription: The transcription of a single witness is normally assumed to be distinctly different from the (text-critical) collation of several witnesses. Modern manuscripts with insertions, substitutions, revisions, annotations etc. present problems concerning variant readings which, though partly different from those encountered in text-critical work, are also partly similar to them. (Cf. 10 below.) Besides, diplomatic transcription of single witnesses is often a first step on the way to a text-critical version.
And what about the diplomatic transcription of a manuscript which has later emerged in print? etc.

Ad (iv), Diplomatic vs. analytical or interpretational: First, the present writer is skeptical about the distinction as such. E.g., what is it that makes us recognize a straight horizontal line partly below, partly overstriking a line of text in a manuscript as an underlining, and not as a deletion? I will not pursue the subject here, since it brings us far away from the subject matter...

More important, there are certain questions, which are normally considered analytical or interpretational, which appear in different form in manuscripts from other texts, and which should not be neglected even in purely diplomatic transcriptions. I am thinking of matters such as abbreviations, encoding of elements below word level, certain kinds of internal normalization of orthography etc. (Cf. below.)

At the present stage I have not found it possible or desirable to be specific about details such as names of tags, attributes, values, or entity references. Besides, many of the considerations do not necessarily call for new tags and attributes etc., but for different application criteria. (This is for me one of the major problems with TEI P1: a tag set alone can be used for almost anything (cf. Humpty-Dumpty); the real issue is the application criteria. And they are often extremely difficult to decide without becoming either general or specific beyond interest.)

Since there has been no time for a general survey of actual encoding schemes, I have simply described (though in many instances in a modified form) some things I believe to be of general relevance from one particular scheme that I know very well, the Wittgenstein Archives' system MECS-WIT (which is not even SGML). No doubt, generalizations on the basis of one specific scheme must have led me astray on a number of points.
In sum, this is neither an attempt to produce anything final nor to represent a consensus of the work group, but rather an attempt to highlight some items and to suggest some possible recommendations for discussion.

One final remark: The present writer is convinced that diplomatic transcription in the computer age, which is also the age of computerized facsimile, xerox machines, and other reprographic techniques, should not aim at the exact reproduction of all paleographic/typographic/physical/topological (?) features of the copy text. In my eyes, a diplomatic transcription can only be a careful and detailed representation of what is considered as such features, to the extent that they are found significant by scholars concerned, and in a form which supplies sufficiently detailed information about what all scholars agree on, in order for them to make informed decisions on matters about which they may disagree. Therefore, the following recommendations are also made on the basis of general assumptions about what everyone will in general agree on as significant and indisputable - now, and for a long time to come.

2 Pages and pagination

In printed matter there is a certain regularity in the physical order of appearance of the text which is often absent in manuscripts. Manuscript books may contain insertions in running text on recto-pages written on verso-pages; manuscript books may be written beginning on recto-pages continuing on verso-pages, etc. Pagination is often inconsistent; repagination may have resulted in several pagination numbers on each page etc. At the same time, pages and page numbers are often important and well established points of reference.

Recommendation: Texts should be transcribed according to their physical order of appearance, but only to the extent that this does not obviously lead to unnecessary (?) absurdity. Wherever necessitated by requirements of cotextual consistency, this recommendation is overruled.
Therefore, since parts of text may have to be moved from their proper physical to their proper cotextual position, it is of high importance that folios, pages, and pagination are indicated for all parts of text.

3 Lines, vertical and horizontal space

On the printed page there is always at least a minimal linearity in the physical order of appearance of the text. Also in columned or tabular print, and even in print with interlinear text, there are at least clear and systematic rules for the deviations from strict linearity, e.g. there is seldom doubt as to what is interlinear and what not. This is not so in all manuscripts, where interlinear and marginal insertions, annotations, deletions, substitutions and the like both make it unclear what the physically linear order is, and force a cotextual order of representation to deviate from the physically linear order, or even to repeat several times in the encoded text what occurs only once in the copy text.

In such cases, it appears pointless to insist on the encoding of line endings, at least unless commentary has established lines as points of reference. However, there are exceptions - even in such texts, there are certain line endings which are of quite clear significance, e.g. in centered text, in certain kinds of columns, etc. This calls for a distinction between "hard" and "soft" line endings in manuscripts, or rather, for encoding only "hard" line endings.

Indentation, if found to be of significance, should be indicated. In modern manuscripts written in a casual manner there is hardly any point in measuring indentations in millimeters or inches. A more suitable unit of measure may be average character width, percentage of page width, or the like, and the values should be rounded off in suitable intervals. The same goes for other horizontal blank spaces, except that blank space in a line should be sharply distinguished from indentations, which always imply a line break.
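The suggestion to record indentation in coarse, relative units rather than in millimeters can be illustrated by a small sketch. Everything named here is hypothetical and chosen only for illustration: the function `indent_in_char_widths`, its parameters, and the default interval of two character widths are assumptions, not part of any TEI or MECS-WIT specification.

```python
import math


def indent_in_char_widths(indent_mm: float,
                          avg_char_width_mm: float,
                          step: int = 2) -> int:
    """Express a measured indentation in average character widths,
    rounded to the nearest multiple of `step` (an assumed interval)."""
    if avg_char_width_mm <= 0:
        raise ValueError("average character width must be positive")
    if step <= 0:
        raise ValueError("step must be positive")
    widths = indent_mm / avg_char_width_mm
    # Round half up to the nearest interval, so that small differences
    # in casual handwriting collapse into the same recorded value.
    return math.floor(widths / step + 0.5) * step
```

With an assumed average character width of 2.5 mm, measured indentations of 12 mm and 13 mm both come out as 4 and 6 character widths respectively, i.e. the recorded value changes only when the indentation crosses an interval boundary, which is the point of rounding "in suitable intervals".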
In tabular and columned text additional indication of horizontal blank space will normally not be advisable.

Vertical space should be handled with more care, since vertical blank space often indicates textual subdivisions which are unclear and therefore not captured by other codes.

4 Sections and sentences

Manuscripts often do not adhere to the general scheme of division (front matter, main body, and back matter), each part subdivided into its specific proper parts, each of which is finally subdivided into paragraphs at the lowest level of division. The borderlines between such parts of content may be fuzzy, their order of appearance may be switched, etc. Many (most?) texts do not contain front or back matter, parts, chapters and paragraphs at all - just a series of pages filled with text and occasional, more or less arbitrary, vertical spaces.

This poses a problem since there is always a need for a system of reference. Pages or folios supply one such system, but it mostly does not coincide with natural cotextual units, and in addition we have already seen (cf. 2 above) that parts of text may have to be "moved" from or within their proper places within this system.

The div1, div2, div3... tags of TEI P1 are probably well suited, but each project must carefully design its own application criteria. The transcription of Wittgenstein is a lucky case: Wittgenstein was very careful about where he put blank lines, so we define a section as a part of text between blank lines. In this case, a section turns out to be anything between one sentence and several pages, but mostly less than one and very rarely more than five pages.

The TEI concept of orthographic sentence is also applicable to manuscripts. However, one should be aware that unclarity as to whether a piece of text belongs to a sentence or not, or to which sentence, is much more frequent in manuscripts than in other sources, because of the number of insertions, deletions, false starts and the like.
Special tags are recommended for incomplete sentences and elements within sentences which do not form well-formed parts of the sentences within which they occur.

5 Readability

This phenomenon is not different in manuscripts from what it is in other types of text, but since it is obviously more frequent and applies to larger parts of text, it would be convenient to introduce tags to indicate a number of unreadable words or lines.

6 Marks and lines in margins

This feature is probably not peculiar to manuscripts, but it is mentioned here because it is very frequent and obviously significant in certain manuscripts, and not mentioned in TEI P1.

7 Underlining etc.

Underlining is not a feature peculiar to manuscripts. But the fact that underlinings may be canceled, added by another or a later hand, etc., is. Cf. below. Shapes and style may also take other forms than in printed text.

8 Deletions

I know of no good English word for the simple phenomenon I am thinking of here - perhaps "overstriked text" is the right one. (The TEI del-tag (TEI P1 p. 95) concerns editors' "deletions", so an additional tag is required.) This phenomenon probably exists only in manuscripts.

It is sometimes assumed that deleted text is of no interest and that it should simply be left out. This assumption is strongly discouraged. Deleted text should be transcribed along with non-deleted text. This poses some special problems with deleted parts of words etc. - cf. 15 below.

9 Insertions

Again a feature which is not in itself unique to manuscripts - but its frequency and forms are probably unique. It is recommended that insertions are put in their proper cotextual (not physical) positions, and that the following attributes are required: Position, decidability, marked/non-marked.
The first and last attributes are probably self-explanatory, while the second requires some explanation: If the proper linear position of an insertion is explicitly marked (by insertion marks, arrows, or the like) in the copy text, the insertion is decided. If not, then if its proper position may be inferred with certainty on the basis of cotextual considerations, it is decidable. If neither, it is undecidable, and should be placed in a linear position where it disturbs the rest of the text as little as possible, yet close to its corresponding physical location.

10 Substitutions, counterpositions, overwritings

These features are sometimes referred to collectively as alternative readings, variant readings, or the like. Their structure is to a large extent identical to the structure of variant readings dealt with in text-critical studies and/or collation of texts from different witnesses. In manuscripts, however, there are variant readings within one and the same witness, not only resulting from a second or even from a later hand, but also from insertions, deletions, overwritings, counterpositions, and combinations of such.

What is the best way to represent these features syntactically within SGML is problematic or at least complicated. Some methods have been discussed in TEI P1 5.10.3-5. At the Wittgenstein Archives a variety of different methods have been developed over the years. Some of these are at least basically or partly identical to those discussed in TEI, some seem to be widely different (but may also turn out to be basically identical). However, space and time forbid going into questions of syntactical means of representation here, and this aspect of the problem is perhaps outside the scope of the manuscript work group anyway.

However, what IS important at this stage, is that variants or substitutions seem to take on forms in manuscripts in which they do not exist in other types of text.
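The decided / decidable / undecidable classification, stated for insertions in section 9 and taken up again for substitutions in what follows, can be made concrete with a minimal sketch. The names `Decidedness` and `classify_insertion` and the two boolean inputs are assumptions introduced purely for illustration; they do not correspond to any tag or attribute proposed in TEI P1 or used in MECS-WIT.

```python
from enum import Enum


class Decidedness(Enum):
    DECIDED = "decided"          # position explicitly marked in the copy text
    DECIDABLE = "decidable"      # position inferable with certainty from cotext
    UNDECIDABLE = "undecidable"  # neither marked nor inferable


def classify_insertion(position_marked: bool,
                       position_inferable: bool) -> Decidedness:
    """Classify an insertion following the three cases in section 9.

    `position_marked`: an insertion mark, arrow, or the like is present.
    `position_inferable`: the transcriber judges that the position
    follows with certainty from cotextual considerations.
    """
    if position_marked:
        return Decidedness.DECIDED
    if position_inferable:
        return Decidedness.DECIDABLE
    return Decidedness.UNDECIDABLE
```

Note that the second input is itself a scholarly judgment; the sketch only records the outcome of that judgment and does not attempt to automate it.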
Irrespective of the syntactical means of representation, the encoding of substitutions for manuscripts should provide for the indication of such features as: Decidedness, priority, cotextual binding, hand, and status. Features of individual elements in a substitution should be encoded independently of their relation to the other elements.

Decidedness: A substitution is decided if there is a clear and conclusive indication of preference for one element to all others. E.g. two elements deleted, the third not deleted. A substitution is decidable if such preference can be inferred on the basis of cotextual considerations. E.g. two elements not grammatically well-formed parts of the rest of the sentence, the third a grammatically well-formed part. A substitution is undecidable if it is neither decided nor decidable. E.g. none or all elements deleted, none or all elements well-formed parts of the sentence.

Priority: Irrespective of the decidedness of a substitution, with any number of elements higher than 2 the decidedness does not establish exhaustively the order of preference. (E.g. that a substitution-relationship between 3 elements is decided in favor of element number 1, does not establish the order of priority between number 2 and 3.)

Cotextual binding: An element in one substitution may be cotextually bound to (presuppose or exclude) an element from some other substitution.

Hand: A substitution may be the result of interference by a secunda manu, or by the same hand at a later time, even without any particular element being added by any other or later hand.

Status: Substitutions may be canceled, again by another or a later hand, and even without any particular element being canceled.

11 Abbreviations etc

One does find kinds of abbreviations in manuscripts which do not occur often in printed texts, but probably there are none in modern manuscripts which will not also be represented in ancient or medieval texts.
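Returning to the substitution features of section 10: the examples given there for decidedness can be restated as a small data-structure sketch. This is only one possible reading under stated assumptions. The class `Element`, its fields, and the function `decide` are hypothetical names, and the two rules implement nothing more than the text's own examples (all but one element deleted; exactly one element grammatically well-formed).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """One alternative reading within a substitution (hypothetical model)."""
    text: str
    deleted: bool = False      # struck out in the copy text
    well_formed: bool = True   # fits the sentence grammatically
    hand: str = "H1"           # assumed label for the writing hand


def decide(elements: List[Element]) -> Tuple[str, Optional[Element]]:
    """Return the decidedness of a substitution and the preferred element,
    if any, using only the example criteria from section 10."""
    surviving = [e for e in elements if not e.deleted]
    if len(surviving) == 1:
        # Clear indication in the copy text: every other element is deleted.
        return "decided", surviving[0]
    fitting = [e for e in elements if e.well_formed]
    if len(fitting) == 1:
        # Preference inferred from cotext: only one element fits the sentence.
        return "decidable", fitting[0]
    return "undecidable", None
```

As the paragraph on priority points out, a result of "decided" or "decidable" names only the preferred element. The sketch deliberately says nothing about the order of preference among the remaining elements, which would need to be recorded separately.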
12 Emendation and normalization

This headline may seem inappropriate in a recommendation for diplomatic transcription. However, there are certain emendations which are so easily performed, do no harm to the overall goal of diplomatic transcription, and yet so strongly improve the possible usefulness of the transcriptions also for other purposes (word search, lexical analysis), that it would be wrong not to recommend them also for diplomatic transcription.

Most details of this are common ground and do not deserve special mention here (but certainly in a final version of TR9's recommendations). The issue implies the acceptance and in some cases even the requirement of transcriber's additions, substitutions, deletions and counterpositions.

13 Punctuation

Punctuation is generally much more inconsistent in manuscripts than in most other kinds of text. Since punctuation carries so much structural information, which may otherwise be scarce in such texts, the recommendation to encode the underlying feature instead of the punctuation itself does not apply to manuscripts. Both kinds of marking should be applied.

However, a much more discriminative encoding scheme than the one proposed so far must be introduced. E.g., representing all points (full stops) with a full stop does not suffice. Neither would there be any point (ha-ha) in substituting one single entity reference, e.g. &fs; for all points. The function of a point may be to indicate the end of a declarative sentence, an abbreviation, an ordinal number, it may be a logical operator, and so on. There should either be separate entity references for each of these different functions, or one must find a more general way of solving the problem by means of generic encoding. (The latter strategy has been adopted by the Wittgenstein Archives.)

14 Cancellations, second or later hand

The following attributes seem to be applicable to a large number of tags and to be peculiar to manuscripts.
Cancellation: Features like underlining, indentation, deletions, substitutions, counterpositions, and overwritings may all be canceled in various different ways and by different hands. To make things worse, cancellations may themselves be canceled, again in several different ways and by different hands.

Second or later hand: Not only text elements, but also features such as underlining, indentation, deletions, substitutions, counterpositions, and overwritings may be supplied by a different hand, or by the same hand at a later time. The same goes for cancellations, which may be regarded as features of attributes, although it is unclear how this should be represented.

15 Word delimiters, encoding below word-level

My impression is that blanks are regarded quite generally as word delimiters in TEI. Already at the outset, this raises the question whether a special code for typographical blanks should be introduced; or whether blanks should be regarded as insignificant formatting characters, i.e. exclusively as parts of the formatting of the encoded text itself, and a separate encoding of beginning and end of words introduced.

Encoding below word-level, i.e. insertion of tags inside words, must be recommended and permitted without restriction. Encoding below character-level is not permitted (if only because it is difficult to see how this should be possible).

Since the transcription of manuscripts involves the transcription of deleted, substituted, and inserted text and of text by different hands, such transcriptions will contain strings which are incomplete words, or strings which contain parts which do not form proper parts of the word represented by the string. This poses obvious problems for retrieval, lexical analysis, etc. Special tags should be introduced for the marking of incomplete words and strings which occur inside but do not form proper parts of words.

16 Presentation vs. underlying feature

Some words have already been said on this under 13 above.
The general recommendation of TEI P1 is to encode the underlying feature rather than its outer appearance. Concerning diplomatic transcription this must of course be quite generally discouraged. Perhaps the most important reason why this must be discouraged is that SGML does not allow tagging of attribute values, so that representing parts of text as attribute values prohibits a proper representation of them.

As touched upon earlier, the present writer is not at all convinced that the distinction between underlying feature and outer appearance is clear enough to be generally applicable. But this issue is too broad to be pursued here, and has got nothing to do with manuscripts per se.

The End.