
Representation of Primary Sources

This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials. The section on digital facsimiles below provides elements for the encoding of digital facsimiles or images of such materials, while the remainder of the chapter discusses ways of encoding detailed transcriptions of such materials. It is expected that this module will also be useful in the preparation of critical editions, but the module defined here is distinct from that defined in the chapter on the critical apparatus, and may be used independently of it. Detailed metadata relating to primary sources of any kind may be recorded using the elements defined by the manuscript description module, but again the present module may be used independently if such data is not required.

It should be noted that, as elsewhere in these Guidelines, this chapter places more emphasis on the problems of representing the textual components of a document than on those relating to the description of the document's physical characteristics such as the carrier medium or physical construction. These aspects, of particular importance in codicology and the bibliographic study of incunables, are touched on in the chapter on Manuscript Description and also form the subject of ongoing work in the TEI Physical Bibliography workgroup.

Although this chapter discusses manuscript materials more frequently than other forms of written text, most of the recommendations presented are equally applicable mutatis mutandis in the encoding of printed matter or indeed any form of written source, including monumental inscriptions. Similarly, where in the following descriptions terms such as scribe, author, editor, annotator or corrector are used, these may be re-interpreted in terms more appropriate to the medium being transcribed. In printed material, for example, the compositor plays a role analogous to the scribe, while in an authorial manuscript, the author and the scribe are the same person.

Digital Facsimiles

These Guidelines are mostly concerned with the preparation of digital texts, in which a pre-existing text is transcribed or otherwise converted into character form, and marked up in XML. However, it is also very common practice to make a different form of digital text, which is instead composed of digital images of the original source, typically one per page, or other written surface. We call such a resource a digital facsimile. A digital facsimile may, in the simplest case, just consist of a collection of images, with some metadata to identify them and the source materials portrayed. It may sometimes contain a variety of images of the same source pages, for example of different resolutions, or of different kinds. Such a collection may form part of any kind of document, for example a commentary of a codicological or palaeographic nature, where there is a need to align explanatory text with image data. And it may also be complemented by a transcribed or encoded version of the original source, which may be linked to the page images. In this section we present elements designed to support these various possibilities and discuss the associated mechanisms provided by these Guidelines.

When this module is included in a schema, the class att.global is extended to include a new pointer attribute, facs. This attribute may be used to associate any element in a transcribed text with an image of it, by means of the usual URI pointing mechanism.

If a digital text contains one image per page or column (or similar unit), and no more complex mapping between text and image is envisaged, then the facs attribute may be used to point directly to a graphic resource, for example from a pb (pagebreak) element. By convention, this encoding indicates that the image indicated by the facs attribute represents the whole of the text following the pb element, up to the next pb element. Any convenient milestone element could be used in the same way; for example, if the images represent individual columns, the cb element might be used. Though simple, this method has some drawbacks. It does not scale well to more complex cases where, for example, the images do not correspond exactly with transcribed pages, or where the intention is to align specific marked up elements with detailed images, or parts of images. And it makes the management of the information about the images more difficult by scattering references to them through the file. Nevertheless, this solution may be adequate for many straightforward digital library applications.
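The convention just described can be sketched as follows. This is an illustration only: the file names page1.png and page2.png and the paragraph content are hypothetical placeholders.

```xml
<text>
  <body>
    <!-- Hypothetical file names: each image is taken to show all the
         text from its pb up to the next pb -->
    <pb facs="page1.png"/>
    <p>Text transcribed from the first page ...</p>
    <pb facs="page2.png"/>
    <p>Text transcribed from the second page ...</p>
  </body>
</text>
```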

The recommended approach to encoding facsimiles is instead to use the facs attribute in conjunction with the elements facsimile, surface, and zone, which are also provided by this module. These elements make it possible to accommodate multiple images of each page, as well as to record arbitrary planar coordinates of textual elements on any kind of written surface and to link such elements with digital facsimile images of them. Typical applications include the provision of full text search in digital facsimile editions, and ways of annotating graphics, for example so as to identify individuals appearing in a group portrait and link them to data about the person represented.

The following elements are used to represent components of a digital facsimile: facsimile, surface, and zone.

The facsimile element is used to represent a digital facsimile. It appears within a TEI document along with, or instead of, the text element. When this module is selected, a legal TEI document may therefore comprise any of the following: a TEI Header and a text element; a TEI Header and a facsimile element; or a TEI Header, a facsimile element, and a text element.

Like the text element, a facsimile element may also contain an optional front or back element, used in the same way as within a text element.

In the simplest case, a facsimile just contains a series of graphic elements, each of which identifies an image file. If desired, the binaryObject element (or any other element from the model.graphicLike class) can be used instead of a graphic.
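A minimal sketch of such a facsimile, assuming four page images with the hypothetical file names page1.png to page4.png:

```xml
<facsimile>
  <!-- One graphic per page image, in reading order; file names are
       placeholders -->
  <graphic url="page1.png"/>
  <graphic url="page2.png"/>
  <graphic url="page3.png"/>
  <graphic url="page4.png"/>
</facsimile>
```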

In such a simple case, with (say) four page images, the images are understood to represent the complete facsimile, and are to be read in the sequence given. Suppose, however, that the second page of this particular work is available both as an ordinary photograph and as an infra-red image, or in two different resolutions. The surface element may be used to indicate that there are two image files corresponding with the same area of the work.
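For instance, two images of the second page might be grouped as in the following sketch. The file names, including the two page 2 variants, are hypothetical.

```xml
<facsimile>
  <graphic url="page1.png"/>
  <!-- The surface groups two images of the same physical page -->
  <surface>
    <graphic url="page2-highRes.png"/>
    <graphic url="page2-lowRes.png"/>
  </surface>
  <graphic url="page3.png"/>
  <graphic url="page4.png"/>
</facsimile>
```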

The surface element provides a way of indicating that the two images of page2 represent the same physical surface within the source material. A surface might be a sheet of paper or parchment, a face of a monument, a billboard, a membrane of a scroll, or indeed any two-dimensional surface, of any size.

The actual dimensions of the object represented are not documented by the surface element; instead, the surface is located within an abstract coordinate space, which is defined by attributes supplied by the att.coordinated class.

The same coordinate space is used for a surface and for all of its child elements. The coordinate space may be thought of as a grid superimposed on a rectangular space. Rectangular areas of the grid are defined as four numbers a b c d: the first two identify the grid point which is at the upper left corner of the rectangle; the second two give the grid point located at the lower right corner of the rectangle. The grid point a b is understood to be the point which is located a points from the origin along the x (horizontal) axis, and b points from the origin along the y (vertical) axis. It may be most convenient to derive a coordinate space from a digital image of the surface in question such that each pixel in the image corresponds with a whole number of units (typically 1) in the coordinate space. In other cases it may be more convenient to use units such as millimetres; in neither case is any specific mapping to the physical dimensions of the object represented implied.

Each surface can contain one or more zone elements, each of which represents a rectangular region or bounding box defined in terms of the same coordinate space as that of its parent surface element. This provides a unit of analysis which may be used to define any rectangular region of interest, such as a detail or illustration, or some part of the surface which is to be aligned with a particular text element. The att.coordinated attributes listed above are also used to supply the coordinates of a zone.

As we have seen, a surface will usually correspond with the whole of a written surface. A zone, by contrast, defines any arbitrary rectangular area of interest using the same coordinate system. It might be bigger or smaller than its parent surface, or might overlap its boundaries. The only constraint is that it must be defined using the same coordinate system.

When an image of some kind is supplied within either a zone or a surface, the implication is that the whole of the image represents the zone or surface containing it. In the simple case therefore, we might imagine a surface defining a page, within which there is a graphic representing the whole of that page, and a number of zones defining parts of the page, each with its own graphic, each representing a part of the page. If however one of those graphics actually represents an area larger than the page (for example to include a binding or the surface of a desk on which the page rests), then it will be enclosed by a zone with coordinates larger than those of the parent surface.

Note that this mechanism does not provide any way of addressing a non-rectangular area, nor of coping with distortions introduced by perspective or parallax; if this is needed, the more powerful mechanisms provided by the Scalable Vector Graphics (SVG) language should be used to define an overlay.

For example, consider the following figure:

[Figure: Relation between page, surface, and zone]

This is an image of a two page spread from a manuscript in the Badische Landesbibliothek, Karlsruhe. We have no information as to the dimensions of the original object, but the low resolution image displayed here contains 500 pixels horizontally and 321 pixels vertically. For convenience, we might map each pixel to one cell of the coordinate space. The coordinate space used here is based on pixels, but the mapping between pixels and units in the coordinate space need not be one-to-one; it might be convenient to define a more delicate grid, to enable us to address much smaller parts of the image. This can be done simply by supplying appropriate values for the attributes which define the coordinate space; for example, doubling them all would map each pixel to two grid points in the coordinate space.

The coordinates of the surface (that is, the area of the image which represents the written two page spread) can then be specified in terms of this coordinate space, simply by counting pixels in the image. The top left corner of the two page spread appears 50 units from the left of the image and 20 units from the top, while the bottom right corner of the spread appears 400 units from the left of the image, and 280 units from the top. We therefore define the written surface within this image by the coordinates 50,20,400,280. To describe the whole image, we will also need to define a zone of interest which represents an area larger than this surface. Using the same coordinate system as that defined for the surface, its coordinates are 0,0,500,321. This zone of interest can be defined by a zone element, within which we can place the uncropped graphic.
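The surface and zone just described might be encoded along the following lines. This sketch assumes the four corner attributes ulx, uly, lrx, and lry found in current versions of the att.coordinated class (the version of the Guidelines this text derives from may express the four numbers differently), and the file name spread.png is hypothetical.

```xml
<facsimile>
  <!-- The written two page spread: upper left (50,20), lower right (400,280) -->
  <surface ulx="50" uly="20" lrx="400" lry="280">
    <!-- The whole uncropped image, larger than the surface itself -->
    <zone ulx="0" uly="0" lrx="500" lry="321">
      <graphic url="spread.png"/>
    </zone>
  </surface>
</facsimile>
```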

If desired, the binaryObject element (or any other element from the model.graphicLike class) may be used instead of a graphic element.

The desc element may also be used within either surface or zone to provide some further information about the area being defined. For example, since the image in this example contains two pages, it might be preferable to define two distinct surfaces, one for each page, including its illuminated margins. In this case, each surface must specify a bounding box which encloses the appropriate page, as well as defining the zone for the graphic itself, and each may carry a desc identifying it as the left hand page or the right hand page.
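A sketch of the two-surface encoding follows. The split between the two pages at x = 225 is an illustrative guess rather than a measured value, the attribute names are those of current att.coordinated, and spread.png is a hypothetical file name.

```xml
<facsimile>
  <surface ulx="50" uly="20" lrx="225" lry="280">
    <desc>left hand page</desc>
    <!-- The graphic shows the whole spread, so its zone exceeds the surface -->
    <zone ulx="0" uly="0" lrx="500" lry="321">
      <graphic url="spread.png"/>
    </zone>
  </surface>
  <surface ulx="225" uly="20" lrx="400" lry="280">
    <desc>right hand page</desc>
    <zone ulx="0" uly="0" lrx="500" lry="321">
      <graphic url="spread.png"/>
    </zone>
  </surface>
</facsimile>
```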

In addition to acting as a container for graphic elements, zone elements may also be used to select parts of each surface for analytical purposes. For example, a further zone within the surface describing the left hand page might be used to define the written part of that page.
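As a sketch, an analytical zone for the written area could sit alongside the zone holding the graphic. The coordinates of the written area given here are invented for illustration, as is the file name.

```xml
<surface ulx="50" uly="20" lrx="225" lry="280">
  <desc>Left hand page</desc>
  <zone ulx="0" uly="0" lrx="500" lry="321">
    <graphic url="spread.png"/>
  </zone>
  <!-- Hypothetical coordinates for the written area -->
  <zone ulx="90" uly="60" lrx="200" lry="240">
    <desc>Written part of left hand page</desc>
  </zone>
</surface>
```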

In the following example, we discuss a hypothetical digital edition of an early 16th century French work, Charles de Bovelles' Géometrie Pratique. (The image used was digitized from a copy in the Bibliothèque Municipale de Lyon, by whose kind permission it is included here.) In this edition, each page has been digitized as a separate file: for example, recto page 49 is stored in a file called Bovelles-49r.png. In the facsimile element used to contain the whole set of pages, we define a surface element for this page, which we situate within a coordinate scale running from 0 to 200 in the x (horizontal) axis, and 0 to 300 in the y (vertical) axis. The surface element contains a graphic element which represents the whole of this surface. We can now identify distinct zones within the page image using the coordinate scale defined for the surface. The accompanying figure shows the upper part of the page, with boxes indicating four such zones. Each of these will be represented by a zone element, given within the surface element already defined, and specified in terms of the same coordinate system.

[Figure: Zones within a surface]

Each of the four zones identified in the figure is defined by its own zone element; one of them, for example, contains the title. Note that the location of each zone is defined independently but using the same coordinate system, so that they may overlap freely. Zones need not nest within each other; they must however be rectangular, as previously noted. As noted earlier, a zone may fall outside the area of the surface which defines its coordinate space.
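The surface for this page and its four zones might be sketched as follows. Only the 0 to 200 by 0 to 300 coordinate scale and the file name Bovelles-49r.png come from the discussion above; the zone coordinates are illustrative, and attaching the description of the title to the first zone is an assumption.

```xml
<facsimile>
  <surface ulx="0" uly="0" lrx="200" lry="300">
    <graphic url="Bovelles-49r.png"/>
    <!-- Zone coordinates below are invented for illustration -->
    <zone ulx="25" uly="25" lrx="180" lry="60">
      <desc>contains the title</desc>
    </zone>
    <zone ulx="28" uly="75" lrx="175" lry="178"/>
    <!-- This zone overlaps the previous one without being nested in it -->
    <zone ulx="105" uly="76" lrx="175" lry="160"/>
    <zone ulx="45" uly="125" lrx="60" lry="130"/>
  </surface>
</facsimile>
```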

In this example a single graphic element has been associated directly with the surface of the page rather than nesting it within a zone. However, it is also possible to include multiple zone elements which contain a graphic element, if for example a detailed image is available. Since all zone elements use the same coordinate system (that defined by their parent surface), there is no need to demonstrate enclosure of one zone within another by means of nesting. To continue the current example, suppose that we have an additional image called Bovelles49r-detail.png containing a detailed view of the figure in the third zone above; the graphic element for that image would then be placed inside the corresponding zone element.
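A sketch of such a zone, with the same caveat that the coordinates are illustrative:

```xml
<!-- The detail image represents exactly the area of this zone -->
<zone ulx="105" uly="76" lrx="175" lry="160">
  <graphic url="Bovelles49r-detail.png"/>
</zone>
```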

Now suppose that we wish to align a transcription of this page with the zones identified above. The first step is to give each relevant part of the facsimile an identifier. The alignment between transcription and image is then made, as usual, by means of the facs attribute. The transcribed text of the page reads as follows:

De Geometrie 49

DU SON ET ACCORD DES CLOCHES ET des alleures des chevaulx, chariotz & charges, des fontaines:& encyclie du monde, & de la dimension du corps humain. Chapitre septiesme

Le son & accord des cloches pendans en ung mesme axe, est faict en contraires parties.

LEs cloches ont quasi figures de rondes pyramides imperfaictes & irregulieres: & leur accord se fait par reigle geometrique. Comme si les deux cloches C & D sont pendans à ung mesme axe ou essieu A B: je dis que leur accord se fera en contraires parties comme voyez icy figuré. Car quand lune sera en hault, laultre declinera embas. Aultrement si elles declinent toutes deux ensembles en une mesme partie, elles seront discord, & sera leur sonnerie mal plaisante à oyr.
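The alignment might be sketched as below. The identifiers (B49r, B49rHead, B49rPara2), the zone coordinates, and the choice of elements used for the transcription are all assumptions made for illustration; the transcribed text is abridged.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted for brevity -->
  <facsimile>
    <surface xml:id="B49r" ulx="0" uly="0" lrx="200" lry="300">
      <graphic url="Bovelles-49r.png"/>
      <zone xml:id="B49rHead" ulx="25" uly="25" lrx="180" lry="60"/>
      <zone xml:id="B49rPara2" ulx="28" uly="75" lrx="175" lry="178"/>
    </surface>
  </facsimile>
  <text>
    <body>
      <div>
        <!-- facs points from the transcription to the surface or zone -->
        <pb facs="#B49r"/>
        <fw>De Geometrie 49</fw>
        <head facs="#B49rHead">DU SON ET ACCORD DES CLOCHES ... Chapitre septiesme</head>
        <p>Le son &amp; accord des cloches pendans en ung mesme axe, est faict en contraires parties.</p>
        <p facs="#B49rPara2">LEs cloches ont quasi figures de rondes pyramides imperfaictes &amp; irregulieres: ...</p>
      </div>
    </body>
  </text>
</TEI>
```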

Further discussion of the encoding choices made in the above transcription is provided in the remainder of this chapter.

It is also possible to point in the other direction, from a surface or zone to the corresponding text. This is the function of the start attribute, which supplies the identifier of the element containing the transcribed text found within the surface or zone concerned. Thus, another way of linking this page with its transcription would be simply to supply a start attribute on the surface for the page, pointing to the element which contains the transcription beginning De Geometrie 49.
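A sketch of this reverse link, assuming a hypothetical identifier PG49 on a div that holds the transcribed page:

```xml
<!-- In the facsimile: start points to the transcription of this page -->
<surface start="#PG49" ulx="0" uly="0" lrx="200" lry="300">
  <graphic url="Bovelles-49r.png"/>
</surface>

<!-- In the text: the element identified by the start attribute -->
<div xml:id="PG49">
  <fw>De Geometrie 49</fw>
  <!-- remainder of the transcribed page -->
</div>
```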

The elements and classes specified for this section are facsimile, att.global.facs, surface, att.coordinated, zone, and model.resourceLike.

Scope of Transcriptions

When transcribing a primary source, scholars may wish to record information concerning individual readings of letters, words, or larger units, whether the object is simply a neutral transcription or a critical edition. In either case they may also wish to include other editorial material, such as comments on the status or possible origin of particular readings, corrections, or text supplied to fill lacunae. Further, it is customary in transcriptions to register certain features of the source, such as ornamentation, underlining, deletion, areas of damage and lacunae. This chapter provides ways of encoding such information: first, methods of recording editorial or other alterations to the text, such as expansion of abbreviations, corrections, conjectures, etc.; then, methods of describing important extra-linguistic phenomena in the source, such as unusual spaces, lines, page and line breaks, change of manuscript hand, etc.; finally, a method of recording material such as running heads, catch-words, and the like.

These recommendations are not intended to meet every transcriptional circumstance likely to be faced by any scholar. Rather, they should be regarded as a base which can be elaborated if necessary by different scholars in different disciplines.

As a rule, all elements which may be used in the course of a transcription of a single witness may also be used in a critical apparatus, i.e. within the elements proposed in the chapter on the critical apparatus. This can generally be achieved by nesting a particular reading containing tagged elements from a particular witness within the rdg element in an app structure.

Just as a critical apparatus may contain transcriptional elements within its record of variant readings in various witnesses, one may record variant readings in an individual witness by use of the apparatus mechanisms app and rdg. This is discussed elsewhere in these Guidelines.

Altered, Corrected, and Erroneous Texts

In the detailed transcription of any source, it may prove necessary to record various types of actual or potential alteration of the text: expansion of abbreviations, correction of the text (either by author, scribe, or later hand, or by previous or current editors or scholars), addition, deletion, or substitution of material, and the like. The sections below describe how such phenomena may be encoded using either elements defined in the core module or specialized elements available only when the module described in this chapter is available.

Core elements for Transcriptional Work

In transcribing individual sources of any type, encoders may record corrections, normalizations, expansions of abbreviations, additions, and omissions using elements from the core module. Those particularly relevant to this chapter include abbr, expan, choice, sic, corr, add, and del.

Several of these elements bear additional attributes for specifying who is responsible for the interpretation represented by the markup, and the certainty associated with it. In addition, some of them bear an attribute allowing the markup to be categorised by type and source. The specific aspect of the markup described by these attributes differs on different elements; for further discussion, see the relevant sections below.

The following sections describe how the core elements just named may be used in the transcription of primary source materials.

Abbreviation and Expansion

The writing of manuscripts by hand lends itself to the use of abbreviation to shorten scribal labour. Commonly occurring letters, groups of letters, words, or even whole phrases, may be represented by significant marks. This phenomenon of manuscript abbreviation is so widespread and so various that no taxonomy of it is here attempted. Instead, methods are shown which allow abbreviations to be encoded using the core elements mentioned above.

A manuscript abbreviation may be viewed in two ways. One may transcribe it as a particular sequence of letters or marks upon the page: thus, a p with a bar through the descender, a superscript hook, a macron. One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, per, re, n. Both of these views are supported by these Guidelines.

In many cases the glyph found in the manuscript source also exists in the Unicode character set: for example the common Latin brevigraph ⁊, standing for et and often known as the Tironian et, can be directly represented in any XML document as the Unicode character with code point U+204A. In cases where it does not, these Guidelines recommend use of the g element provided by the gaiji module. This module allows the encoder great flexibility both in processing and in documenting non-standard characters or glyphs, including the ability to provide detailed documentation and images for them.

These two methods of coding abbreviation may also be combined. An encoder may record, for any abbreviation, both the sequence of letters or marks which constitutes it, and its sense, that is, the letter or letters for which it is believed to stand. For example, in the fragment euery persone that loketh after heuen hath a place in this ladder, the phrase euery persone is represented in the source by a sequence of characters which may be transcribed directly, using the g element to indicate the two brevigraphs it contains. Note that in each case the g element may contain a suggested replacement for the referenced brevigraph; this is purely advisory, however, and may not be appropriate in all cases. The referenced character definitions may be located elsewhere in this or some other document, typically forming part of a charDecl element.
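Such a direct transcription might look like the following sketch, in which the identifiers b-er and b-per are hypothetical names for character definitions assumed to exist in a charDecl elsewhere:

```xml
<!-- The content of each g is only a suggested replacement for the brevigraph -->
<p>eu<g ref="#b-er">er</g>y <g ref="#b-per">per</g>sone that loketh after
heuen hath a place in this ladder</p>
```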

The transcriber may also wish to indicate that, because of the presence of these particular characters, the two words euery and persone are actually abbreviations, by using the abbr element. Alternatively, the transcriber may choose silently to expand these abbreviations, using the expan element. And, of course, the choice element can be used to show that one encoding is an alternative for the other.
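The three possibilities can be sketched in turn. The references b-er and b-per are again hypothetical identifiers for character definitions.

```xml
<!-- 1. Marking the words as abbreviations -->
<abbr>eu<g ref="#b-er">er</g>y</abbr>
<abbr><g ref="#b-per">per</g>sone</abbr> ...

<!-- 2. Recording only the expanded forms -->
<expan>euery</expan> <expan>persone</expan> ...

<!-- 3. Offering both encodings as alternatives -->
<choice>
  <abbr>eu<g ref="#b-er">er</g>y</abbr>
  <expan>euery</expan>
</choice>
```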

When abbreviated forms such as these are expanded, two processes are carried out: some characters not present in the abbreviation are added (always), and some characters or glyphs present in the abbreviation are omitted or replaced (often). For example, when the abbreviation Dr. is expanded to Doctor, the dot in the abbreviation is removed, and the letters octo are added. Where detailed markup of abbreviated words is required, these two aspects may be marked up explicitly, using the am and ex elements. Using these elements, a transcriber may indicate the status of the individual letters or signs within both the abbreviation and the expansion. The am element surrounds characters or signs such as tittles or tildes, used to indicate the presence of an abbreviation, which are typically removed or replaced by other characters in the expanded form of the abbreviation, while the ex element may be used to indicate those characters within the expansion which are not present in the abbreviated form. The content of the abbr element should usually include the whole of the abbreviated word, while the expan element should include the whole of its expansion. If this is not considered necessary, the am and ex elements may be used within a choice element, without any enclosing abbr or expan.
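Applied to the phrase euery persone, the am and ex elements might be used as in this sketch (the g references are hypothetical identifiers, as before):

```xml
<!-- am marks the abbreviation signs within each abbreviated word -->
<abbr>eu<am><g ref="#b-er"/></am>y</abbr>
<abbr><am><g ref="#b-per"/></am>sone</abbr> ...

<!-- ex marks the letters supplied in each expansion -->
<expan>eu<ex>er</ex>y</expan>
<expan><ex>per</ex>sone</expan> ...

<!-- am and ex paired directly inside choice -->
eu<choice><am><g ref="#b-er"/></am><ex>er</ex></choice>y
<choice><am><g ref="#b-per"/></am><ex>per</ex></choice>sone ...
```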

As implied in the preceding discussion, making decisions about which of these various methods of representing abbreviation to use will form an important part of an encoder's practice. As a rule, the abbr and am elements should be preferred where it is wished to signify that the content of the element is an abbreviation, without necessarily indicating what the abbreviation may stand for. The ex and expan elements should be used where it is wished to signify that the content of the element is not present in the source but has been supplied by the transcriber, without necessarily indicating the abbreviation used in the original. The decision as to which course of action is appropriate may vary from abbreviation to abbreviation; there is no requirement that the one system be used throughout a transcription, although doing so will generally simplify processing. The choice is likely to be a matter of editorial policy. If the highest priority is to transcribe the text literatim, while indicating the presence of abbreviations, the choice will be to use abbr or am throughout. If the highest priority is to present a reading transcription, while indicating that some letters or words are not actually present in the original, the choice will be to use ex or expan throughout.

Further information may be attached to instances of these elements by the note element, and by use of the resp and cert attributes. In this instance from the English Brut, a note is attached to an editorial expansion of the tail on the final d of good to goode, in the line For alle the while that I had goode I was welbeloued. The note reads: The stroke added to the final d could signify the plural ending (-es, -is, -ys) but the singular good was used with the meaning property, wealth, at this time (v. examples quoted in OED, sb. Good, C. 7, b, c, d and 8 spec.).

The editor might declare a degree of certainty for this expansion, based on the OED examples, and state the responsibility for the expansion, by supplying cert and resp attributes on the ex element. The value supplied for the resp attribute should point to the name of the editor responsible for this and possibly other interventions; an appropriate target might therefore be a respStmt element in the header, recording the responsibility (Editorial emendations) and the name (Malcom Parkes). Observe that the cert and resp attributes are used with the ex element only to indicate confidence in the content of the element (i.e. the expansion), and responsibility for suggesting this expansion respectively.
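As a sketch, with the identifier mp and the cert value high chosen here purely for illustration:

```xml
<!-- In the transcription: the expansion, its certainty and its source -->
<p>For alle the while that I had good<ex cert="high" resp="#mp">e</ex>
I was welbeloued</p>

<!-- In the header: the target of the resp pointer -->
<respStmt xml:id="mp">
  <resp>Editorial emendations</resp>
  <name>Malcom Parkes</name>
</respStmt>
```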

The choice element may be used to indicate that the proposed expansion goode is one way of encoding what might equally well be represented as an abbreviation, consisting of good followed by the hooked d (transcribed here as ɽ). If it is desired to express aspects of certainty and responsibility for some other aspect of the use of these elements, then the mechanisms discussed in the chapter on certainty and responsibility should be used.
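One possible shape for this encoding, offered as a sketch rather than as the only valid arrangement of these elements:

```xml
<p>For alle the while that I had
<choice>
  <abbr>good<am>ɽ</am></abbr>
  <expan>good<ex>e</ex></expan>
</choice>
I was welbeloued</p>
```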

If more than one expansion for the same abbreviation is to be recorded, multiple notes may be supplied. It may also be appropriate to use the markup for critical apparatus.

Correction and Conjecture

The sic, corr, and choice elements, defined in the core module, should be used to indicate passages deemed in need of correction, or actually corrected, during the transcription of a source. For example, in the manuscript of William James's A Pluralistic Universe, edited by Fredson Bowers (Cambridge: Harvard University Press, 1977), a sentence first written One must have lived longer with this system, to appreciate its advantages. has been modified by James to begin But One must ..., without the initial capital O having been reduced to lowercase. This non-standard orthography could be recorded by marking One with the sic element; or it could be corrected, by supplying one within a corr element; or the two possibilities might be represented together within a choice element.

Similarly, in this example from Albertus Magnus, both a manuscript error angues and its correction augens are registered within a choice element, in the sentence Nos autem iam ostendimus quod nutrimentum et augens.
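The James and Albertus Magnus cases might be encoded along these lines; the enclosing p elements are an assumption made to keep the fragments self-contained.

```xml
<!-- Recording the apparent error only -->
<p>But <sic>One</sic> must have lived ...</p>

<!-- Recording the correction only -->
<p>But <corr>one</corr> must have lived ...</p>

<!-- Recording both readings as alternatives -->
<p>But <choice><sic>One</sic><corr>one</corr></choice> must have lived ...</p>

<!-- The Albertus Magnus example -->
<p>Nos autem iam ostendimus quod nutrimentum et
<choice><sic>angues</sic><corr>augens</corr></choice>.</p>
```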

Note that the corr element is used to provide a corrected form which is not present in the source; in the case of a correction made in the source itself, whether scribal, authorial, or by some other hand, the add, del, and subst elements should be used.

The sic element is used to mark passages considered by the transcriber to be erroneous; in such cases, the corr element indicates the transcriber's correction of them. Where the transcriber considers that one or more words have been erroneously omitted in the original source and corrects this omission, the supplied element should be used in preference to corr. Thus, in the following sentence from George Moore's draft of additional materials for Memoirs of My Dead Life, the transcriber supplies the word we omitted by the author: You see that I avoid the word create for we create nothing we develope.

As with expan and abbr, the choice as to whether to record simply that there is an apparent error, or simply that a correction has been applied, or to record both possible readings within a choice element is left to the encoder. The decision is likely to be a matter of editorial policy, which might be applied consistently throughout or decided case by case. If the highest priority is to present an uncorrected transcription while noting perceived errors in the original, the choice will typically be to use only sic throughout. If the highest priority is to present a reading transcription, while indicating that perceived errors in the original have been corrected, the choice will be to use only corr throughout.

Further information may be attached to instances of these elements by the note element and resp and cert attributes. Instances of these elements may also be classified according to any convenient typology using the type attribute.

For example, consider an emendation in the Hengwrt manuscript proposed by E. Talbot Donaldson, in which the manuscript reading wight is corrected to wright in the last of these lines:

Telle me also, to what conclusioun
Were membres maad, of generacioun
And of so parfit wis a wright ywroght?

The accompanying note reads: This emendation of the Hengwrt copy text, based on a Latin source and on the reading of three late and usually unauthoritative manuscripts, was proposed by E. Talbot Donaldson in Speculum 40 (1965) 626–33.

The note element may be used to give a more detailed discussion of the motivation for or scope of a correction. If linked by means of a pointer (as in this example) it may be located anywhere convenient within the transcription; typically all detailed notes will be collected together in a separate div element in the back. Alternatively, the pointer may be omitted, and the note placed immediately adjacent to the element being annotated. The advantage of the former solution is that it permits the same annotation to refer to several corrections.

The attribute cert may be used to indicate the degree of confidence ascribed by the encoder to the proposed emendation on a broad scale: high, medium, or low. The attribute resp is used to indicate who is responsible for the proposed emendation. Its value is a pointer, which will typically indicate a respStmt or name element in the header of the transcribed document, but can point anywhere, for example to some online authority file. Using these two attributes, the corr element presented above might usefully be enhanced by adding a cert value and a resp attribute pointing to an element which names E Talbot Donaldson.
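A sketch of the enhanced encoding, together with a note linked by a pointer. The identifiers ETD and corr117 and the cert value high are hypothetical, and the note text is abridged.

```xml
<!-- In the header: the target of the resp pointer -->
<name xml:id="ETD">E Talbot Donaldson</name>

<!-- In the transcription -->
<lg>
  <l>Telle me also, to what conclusioun</l>
  <l>Were membres maad, of generacioun</l>
  <l>And of so parfit wis a
    <choice xml:id="corr117">
      <sic>wight</sic>
      <corr resp="#ETD" cert="high">wright</corr>
    </choice> ywroght?</l>
</lg>

<!-- Anywhere convenient, for example in a div of notes in the back -->
<note target="#corr117">This emendation of the Hengwrt copy text ... was
proposed by E. Talbot Donaldson in Speculum 40 (1965) 626–33.</note>
```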

As remarked above, where the same annotation applies to several corrections, this may be represented by supplying multiple pointers on the note. Consider for example such corrections as the following, in Dudo of S. Quentin. Parkes cites two cases in this manuscript of the same phenomenon: in quamuis iners que nutu dei gesta sunt ... the manuscript reads mens for iners, and in unde esset uiriliter negata it reads uegetata for negata. Both may be described as follows: Substitution of a more familiar word which resembles graphically what the scribe should be copying but which does not make sense in the context.

The target attribute on the note element indicates the choice elements which exemplify this kind of scribal error. This necessitates the addition of an identifier to each choice element. However, if the number of corrections is large and the number of notes is small, it may well be both more practical and more appropriate to regard the collection of annotations as constituting a typology and then use the type attribute. Suppose that the note given above is one of half a dozen possible kinds of corrected phenomena identified in a given text; others might include, say, repetition of a word from the preceding line, etc. The type attribute on the corr element can be used to specify an arbitrary code for the particular kind of correction (or other editorial intervention) identified within it. This code can be chosen freely and is not treated as a pointer. quamuis mensiners que nutu dei gesta sunt ... unde esset uiriliter uegetatanegata Note that this encoding might be extended to include a range of possible corrections: quamuis mensinersinres que nutu dei gesta sunt ... In addition, the conscientious encoder will provide documentation explaining the circumstances in which particular codes are judged appropriate. A suitable location for this might be within the correction element of the encodingDesc of the header, which might include a list such as the following:

The following codes are used to categorise corrections identified in this transcription: graphSubs Substitution of a more familiar word which resembles graphically what the scribe should be copying but which does not make sense in the context.
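The typology approach might be sketched as follows, assuming that graphSubs is the code chosen for this kind of error and that the documentation is placed in a correction element inside editorialDecl. The exact layout of the list is an assumption.

```xml
<!-- In the transcription: the type attribute carries a freely chosen code, not a pointer. -->
quamuis
<choice>
  <sic>mens</sic>
  <corr type="graphSubs">iners</corr>
</choice>
que nutu dei gesta sunt

<!-- In the header: documentation of the codes in use. -->
<encodingDesc>
  <editorialDecl>
    <correction>
      <p>The following codes are used to categorise corrections identified in this transcription:
        <list type="gloss">
          <label>graphSubs</label>
          <item>Substitution of a more familiar word which resembles graphically what the
            scribe should be copying but which does not make sense in the context.</item>
        </list>
      </p>
    </correction>
  </editorialDecl>
</encodingDesc>
```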

A subtype attribute may be used in conjunction with the type for subclassification purposes: the above examples might thus be represented as choice type="substitution" subtype="graphicResemblence" for example.

For a given project, it may well be desirable to limit the possible values for the type or subtype attributes automatically. This is easily done but requires customization of the TEI system using techniques described in , in particular , which should be consulted for further information on this topic.

When making a correction in a source which forms part of a textual tradition attested by many witnesses, a textual editor will sometimes use a reading from one witness to correct the reading of the source text. In the general case, such encoding is best achieved with the mechanisms provided by the module for textual criticism described in chapter . However, for simple cases, the source attribute of the corr element may suffice. In the passage from Chaucer's Wife of Bath's Prologue mentioned above, Parkes proposes to emend the problematic word wight to wyf which is the reading found in the Cambridge manuscript Gg.1. 27. This may be simply represented as follows: And of so parfit wis a wightwyf ywroght? The value of the source attribute here is, like the value of the resp attribute, a pointer, in this case indicating the manuscript used as a witness. Elsewhere in the transcribed text, a list of witnesses used in this text will be given, one of which has an identifier Gg. Each witness will be represented either by a witness element (see ) or more fully by a msDesc element (see ) : Cambridge University Library Gg.1. 27
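A sketch of the simple case described above, in which the source attribute of corr points at a witness declared elsewhere. The identifier Gg comes from the text; the use of listWit as the container is an assumption.

```xml
<l>And of so parfit wis a
  <choice>
    <sic>wight</sic>
    <!-- source points to the witness from which the correcting reading is taken -->
    <corr source="#Gg">wyf</corr>
  </choice>
  ywroght?</l>

<!-- Elsewhere: the list of witnesses used in this text -->
<listWit>
  <witness xml:id="Gg">Cambridge University Library Gg.1. 27</witness>
</listWit>
```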

The app element described in chapter provides a more powerful way of representing all three possible readings in parallel: And of so parfit wis a wight wright wyf

This encoding simply records the three readings found in the various traditions, and gives (by means of the wit attribute) an indication of the witnesses supporting each. If the resp attribute were supplied on the rdg element, it would indicate the person responsible for asserting that the manuscript indicated has this reading, who is not necessarily the same as the person responsible for asserting that this reading should be used to correct the others. Editorial intervention elements such as corr can however be nested within a rdg to provide this additional information: And of so parfit wis a wight wright wyf This encoding asserts that the reading wyf found in Gg is regarded as a correction by Parkes.
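The nested form might look roughly like this. Only Gg is named in the text; the sigla Hg, A, B and C, and the identifier mp for Parkes, are hypothetical placeholders.

```xml
<l>And of so parfit wis a
  <app>
    <rdg wit="#Hg">wight</rdg>
    <!-- placeholder sigla for the three late manuscripts -->
    <rdg wit="#A #B #C">wright</rdg>
    <!-- the corr inside the rdg records who regards this reading as a correction -->
    <rdg wit="#Gg"><corr resp="#mp">wyf</corr></rdg>
  </app>
  ywroght?</l>
```

Here wit states where each reading is attested, while resp on the nested corr states who proposes using it to correct the others.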

Like the resp attribute, the cert attribute may be used with both corr and rdg elements. When used on the rdg element, these attributes indicate confidence in and responsibility for identifying the reading within the sources specified; when used on the corr element they indicate confidence in and responsibility for the use of the reading to correct the base text. If no other source is indicated (either by the source attribute, or by the wit attribute of a parent rdg), the reading supplied within a corr has been provided by the person indicated by the resp attribute.

If it is desired to express aspects of certainty and responsibility for some other aspect of the use of these elements, then the mechanisms discussed in chapter may be found useful. See also for further discussion of the issues of certainty and responsibility in the context of transcription.

Additions and Deletions

Additions and deletions observed in a source text may be described using the following elements: Of these, add and del are included in the core module, while addSpan and delSpan are available only when using the module defined in this chapter. These particular elements are members of the att.spanning class, from which they inherit the following attribute:

Further characteristics of each addition and deletion, such as the hand used, its effect (complete or incomplete, for example), or its position in a sequence of such operations may conveniently be recorded as attributes of these elements, all of which are members of the att.transcriptional class:

As described in section , the add element is used to record any manuscript addition observed in the text, whether it is considered to be authorial or scribal. In the autograph manuscript of Max Beerbohm's The Golden Drugget, the author's addition of do ever may be recorded as follows, with the hand attribute indicating that the addition was Beerbohm's by referencing a handNote element defined elsewhere in the document (see further ): Some things are best at first sight. Others — and here is one of them — do ever improve by recognition .... Max Beerbohm holograph
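A minimal sketch of this addition, assuming the identifier mb for the handNote and abbreviating the surrounding sentence:

```xml
<!-- Declared elsewhere in the document -->
<handNote xml:id="mb">Max Beerbohm holograph</handNote>

<!-- In the transcription: hand points to the handNote above -->
<p>Some things are best at first sight. Others ...
  <add hand="#mb">do ever</add> improve by recognition ....</p>
```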

Similarly, the del element is used to record manuscript deletions. In the autograph manuscript of D. H. Lawrence's Eloi, Eloi, lama sabachthani the author's deletion of my may be recorded as follows. In this case, the hand attribute indicating that the deletion was Lawrence's is complemented by a rend attribute indicating that the deletion was by strike-through: For I hate this my body, which is so dear to me ... D H Lawrence holograph
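The deletion might be sketched as follows. The identifier dhl and the rend value strikethrough are illustrative choices, not prescribed values.

```xml
<!-- Declared elsewhere in the document -->
<handNote xml:id="dhl">D H Lawrence holograph</handNote>

<!-- rend describes how the deletion was effected; hand says who made it -->
<l>For I hate this <del rend="strikethrough" hand="#dhl">my</del> body,
  which is so dear to me ...</l>
```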

If deletions are classified systematically, the type attribute may be useful to indicate the classification; when they are classified by the manner in which they were effected, or by their appearance, however, this will lead to a certain arbitrariness in deciding whether to use the type or the rend attribute to hold the information. In general, it is recommended that the rend attribute be used for description of the appearance or method of deletion, and that the type attribute be reserved for higher level or more abstract classifications.

The place attribute is also available to indicate the location of an addition. For example, consider the following passage from a draft letter by Robert Graves:

At the end of this extract, the writer inserts the word cant, above the line, with a stroke to indicate insertion. Assuming that we have previously defined the identifier RG somewhere, this extract might now be encoded as follows: The O.E.D. is not a dictionary so much as a corpus of precedents ~~in the~~: current, obsolete, cant, cataphretic and nonce-words are all included. A little earlier in the same extract, Graves writes for an abridgement above the line, and then deletes it. This may be encoded similarly: As for 'significant artist.' You quote the O.E.D ~~for an abridgement~~in explanation... Similarly, in the margin, the word Norton has been added and then deleted: You quote the ~~Norton~~ O.E.D... The word O.E.D. in this first sentence has also clearly been the result of some redrafting: it may be that Graves started to write Oxford, and then changed it; it may be that he inserted other punctuation marks between the letters before replacing them with the centre dots used elsewhere to represent this acronym. We do not deal with these possibilities here, and mention them only to indicate that any encoding of manuscript material of this complexity will need to make decisions about what is and is not worth mentioning.
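The three cases from the Graves letter could be sketched as below, using the identifier RG from the text. The place values above and margin are assumed to be the codes in use, and each fragment is abbreviated.

```xml
<!-- 1. A word inserted above the line -->
... current, obsolete, <add hand="#RG" place="above">cant,</add>
cataphretic and nonce-words are all included.

<!-- 2. A phrase written above the line and then deleted: add nested within del -->
You quote the O.E.D
<del hand="#RG"><add hand="#RG" place="above">for an abridgement</add></del>
in explanation...

<!-- 3. A word added in the margin and then deleted -->
You quote the
<del hand="#RG"><add hand="#RG" place="margin">Norton</add></del>
O.E.D...
```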

An encoder may also wish to indicate that an addition replaces a specific deletion, that is to encode a substitution as a single intervention in the text. This may be achieved by grouping the addition and deletion together within a subst element. At the end of the passage illustrated above, Graves first writes It is the expressed..., then deletes It is, and substitutes an uppercase T at the start of the. ... are all included. ~~It is~~ Tthe expressed The use of this element and of the seq attribute to indicate the order in which interventions such as deletions are believed to have occurred are further discussed in section below.
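A sketch of this substitution, assuming that the deletion of It is stands on its own and that only the change from t to T is grouped as a substitution:

```xml
... are all included.
<del hand="#RG">It is</del>
<!-- subst groups the added capital and the deleted lower-case letter as one intervention -->
<subst>
  <add hand="#RG">T</add>
  <del hand="#RG">t</del>
</subst>he expressed
```

No whitespace is left between the end of subst and he, so that the word reads The once the substitution is applied.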

The add and del elements defined in the core module suffice only for the description of additions and deletions which fit within the structure of the text being transcribed, that is, in which each deletion or addition is completely contained by the structural element (paragraph, line, division) within which it occurs. Where this is not the case, for example because an individual addition or deletion involves several distinct structural subdivisions, such as poems or prose items, or otherwise crosses a structural boundary in the text being encoded, special treatment is needed. The addSpan and delSpan elements are provided by this module for that purpose. (For a general discussion of the issue see further ).

In this example of the use of addSpan, the insertion by Helgi Ólafsson of a gathering containing four neo-Eddic poems into Lbs 1562 4to is recorded as follows. A handNote element is first declared, within the header of the document, to associate the identifier heol with Helgi. Each of the added poems is encoded as a distinct div element. In the body of the text, an addSpan element is placed to mark the beginning of the span of added text, and an anchor is used to mark its end. The hand attribute on the addSpan element ascribes responsibility for the addition to the manuscript to Helgi, and the spanTo attribute points to the end of the added text:
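The arrangement described above might be sketched as follows. The identifier heol comes from the text; the anchor identifier addEnd is hypothetical, and the contents of each div are replaced by comments.

```xml
<!-- In the header -->
<handNote xml:id="heol">Helgi Ólafsson</handNote>

<!-- In the body -->
<body>
  <div><!-- text preceding the added gathering --></div>
  <!-- start of the added span; spanTo points to the anchor marking its end -->
  <addSpan hand="#heol" spanTo="#addEnd"/>
  <div><!-- first added poem --></div>
  <div><!-- second added poem --></div>
  <div><!-- third added poem --></div>
  <div><!-- fourth added poem --></div>
  <anchor xml:id="addEnd"/>
  <div><!-- text following the added gathering --></div>
</body>
```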

The delSpan element is used in the same way. An authorial manuscript will often contain several occasions where sequences of whole lines are marked for deletion, either by boxes or by being struck out. If the encoder is marking up individual verse lines with the l element, such deletions are problematic: deletion of two consecutive lines should be regarded as a single deletion, but the del element must be properly nested within a single l element. The delSpan element solves this problem: Flowed up the hill and down King William Street, To where Saint Mary Woolnoth kept the time, With a dead sound on the final stroke of nine. There I saw one I knew, and stopped him, crying "Stetson!...
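A sketch of delSpan applied to these verse lines. Which lines are actually marked for deletion in the manuscript cannot be recovered from the text above, so the placement here (the second and third lines) is assumed purely for illustration, as is the identifier delEnd.

```xml
<l>Flowed up the hill and down King William Street,</l>
<!-- one deletion covering two whole lines -->
<delSpan spanTo="#delEnd"/>
<l>To where Saint Mary Woolnoth kept the time,</l>
<l>With a dead sound on the final stroke of nine.</l>
<anchor xml:id="delEnd"/>
<l>There I saw one I knew, and stopped him, crying "Stetson!...</l>
```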

It is also often the case that deletions and additions may themselves contain other deletions and additions. For example, in Thomas Moore's autograph of the second version of Lalla Rookh two lines are marked for omission by vertical strike-through. Within the first of the two lines, the word upon has also been struck out, and the word over has been added: Tis moonlight ~~upon~~ over Oman's sky Her isles of pearl look lovelily In this case the anchor and delSpan have been placed within the structural elements (the ls) rather than between, as in the previous example. This is to indicate that placement of these empty elements is arbitrary.

The text deleted must be at least partially legible, in order for the encoder to be able to transcribe it. If all or part of it is not legible, the gap element should be used to indicate where text has not been transcribed because it could not be read. The unclear element described in section may be used to indicate areas of text which cannot be read with confidence. See further section and section .

Substitutions

Substitution of one word or phrase for another is perhaps the most common of all phenomena requiring special treatment in transcription of primary textual sources. It may be simply one word overwriting another, or deletion of one word and its replacement by another written above it by the same hand at the one time; the deletion and replacement may be done by different hands at different times; there may be a long chain of substitutions on the one stretch of text, with uncertainty as to the order of substitution and as to which of many possible readings should be preferred.

As we have shown, the simplest method of recording a substitution is simply to record both the addition and the deletion. However, when the module defined by this chapter is in use, an additional element is available to indicate that the encoder believes the addition and the deletion to be part of the same intervention: a substitution. Using this element, the example at the end of the last section might be encoded as follows: Tis moonlight ~~upon~~over Oman's sky Her isles of pearl look lovelily Since the purpose of this element is solely to group its child elements together, the order in which they are presented is not significant. By convention, however, deletion precedes addition. This may be overridden by means of the seq attribute, which is of particular usefulness when a sequence of deletions and additions occurs.

For example, returning to the example from William James, in a passage first written out by James as One must have lived longer with this system, to appreciate its advantages, the word this is first replaced by such a and this is then replaced by a. (The manuscript contains several other substitutions, ignored here for the sake of clarity.) This may be encoded as follows, representing the two changes as a sequence of additions and deletions: One must have lived longer with ~~this~~ ~~such a~~ a system, to appreciate its advantages. Note the nesting of an add element within a del to record text first added, then deleted in the source. The numbers assigned by the seq attribute may be used to identify the order in which the various additions and deletions are believed by the encoder to have been carried out, and thus provide a simple method of supporting the kind of genetic textual criticism typified by (for example) Hans Walter Gabler's work on the reconstruction of the overlay levels implicit in the manuscripts of James Joyce's Ulysses.
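One way of sketching this sequence is shown below. The numbering is an assumption: interventions belonging to the first revision are given seq 1 and those belonging to the second seq 2.

```xml
One must have lived longer with
<subst>
  <!-- first revision: "this" is deleted and "such a" is added -->
  <del seq="1">this</del>
  <!-- second revision: "such a" is deleted in its turn and "a" is added -->
  <del seq="2"><add seq="1">such a</add></del>
  <add seq="2">a</add>
</subst>
system, to appreciate its advantages.
```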

As a more complex example, consider the following passage in one of the manuscripts of Wilfred Owen's Dulce et decorum est:

This passage might be encoded as follows: And towards our distant rest began to trudge, ~~Helping the worst amongst us~~Dragging the worst amongt us, who'd no boots But limped on, blood-shod. All went lame; ~~half-~~all blind; Drunk with fatigue ; deaf even to the hoots Of tired, outstripped ~~fif~~ five-nines that dropped behind. In this representation, the false start fif in the last line is simply marked as a deletion; the other two authorial corrections are marked as substitutions, each combining a deletion and an addition. The authorial slip (amongt for amongst) is retained without comment.

The app element presented in chapter provides similar facilities, by treating each state of the text as a distinct reading. The rdg element has a varSeq attribute which may be used in the same way as the seq attribute to indicate the preferred sequence. The James example above might thus be represented as follows: One must have lived longer with ~~this~~ ~~such a~~ a system, to appreciate its advantages.
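The same passage treated as a set of readings might be sketched like this, with varSeq giving the assumed order of the three states:

```xml
One must have lived longer with
<app>
  <rdg varSeq="1"><del>this</del></rdg>
  <rdg varSeq="2"><del><add>such a</add></del></rdg>
  <rdg varSeq="3"><add>a</add></rdg>
</app>
system, to appreciate its advantages.
```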

Cancellation of Deletions and Other Markings

An author or scribe may mark a word or phrase in some way, and then on reflection decide to cancel the marking. For example, text may be marked for deletion and the deletion then cancelled, thus restoring the deleted text. Such cancellation may be indicated by the restore element:

This element bears the same attributes as the other transcriptional elements. These may be used to supply further information such as the hand in which the restoration is carried out, the type of restoration, and the person responsible for identifying the restoration as such, in the same way as elsewhere.

Presume that Lawrence decided to restore my to the phrase of Eloi, Eloi, lama sabachthani first written For I hate this my body, with the my first deleted then restored by writing stet in the margin. This may be encoded: For I hate this my body
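A sketch of the restored deletion. The identifier dhl and the type value marginalStetNote are illustrative assumptions.

```xml
<l>For I hate this
  <!-- restore wraps the deletion that has been cancelled -->
  <restore hand="#dhl" type="marginalStetNote">
    <del>my</del>
  </restore>
  body</l>
```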

Another feature commonly encountered in manuscripts is the use of circles, lines, or arrows to indicate transposition of material from one point in the text to another. No specific markup for this phenomenon is proposed at this time. Such cases are most simply encoded as additions at the point of insertion and deletions at the point of encirclement or other marking.

Text Omitted from or Supplied in the Transcription

Where text is not transcribed, whether because of damage to the original, or because it is illegible, or for some other reason such as editorial policy, the gap core element should be used to register the omission; where text not present in the source is supplied (whether conjecturally or from other witnesses) to fill an apparent gap in the text, it should be marked using the supplied element provided by the module defined in this chapter.

By its nature, the gap element has no content. It marks a point in the text where nothing at all can be read, whether because of authorial or scribal erasure, physical damage, or any other form of illegibility. Its attributes allow the encoder to specify the amount of text which is illegible in this way at this point, using any convenient units, where this can be determined. For example, in the Beerbohm manuscript of The Golden Drugget cited above, the author has erased a passage amounting to about 10 cm in length by inking over it completely: Others —and here is one of them...

In an autograph letter of Sydney Smith now in the Pierpont Morgan library three words in the signature are quite illegible: I am dr Sr yr Sydney Smith The degree of precision attempted when measuring the size of a gap will vary with the purpose of the encoding and the nature of the material: no particular recommendation is made here.
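The two omissions described above might be sketched as follows. The reason values and the position of each gap within its sentence are assumptions; quantity and unit record the size of what is missing.

```xml
<!-- Beerbohm: about 10 cm of text inked over by the author -->
<gap reason="cancelled" quantity="10" unit="cm"/>

<!-- Sydney Smith: three illegible words in the signature -->
I am dr Sr yr
<gap reason="illegible" quantity="3" unit="word"/>
Sydney Smith
```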

As noted above, the gap element should only be used where text has not been transcribed; if partially legible text has been transcribed, one of the elements damage and unclear should be used instead. These elements are described in section .

If the source text is completely illegible or missing, an encoder may sometimes wish to supply new (conjectural) material to replace it. This conjectural reading is analogous to a correction in that it contains text provided by the encoder and not attested in the source. This is not however a correction, since no error is necessarily present in the original; for that reason a different element supplied should be used. If another (imaginary) copy of the letter above preserved the signature as reading I am dear Sir your very humble Servt Sydney Smith, the text illegible in the autograph might be supplied in the transcription: I am dr Sr yr very humble Servt Sydney Smith Here the source and resp attributes are used, as elsewhere, to indicate respectively the sigil of a manuscript from which the supplied reading has been taken, and the identifier of the person responsible for deciding to supply the text. If the source attribute is not supplied, the implication is that the encoder (or whoever is indicated by the value of the resp attribute) has supplied the missing reading. Both gap and supplied may be used in combination with unclear, damage, and other elements; for discussion, see section .
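A sketch of the supplied reading. The identifiers copy1 (the imaginary second copy) and ed (the person deciding to supply the text) are hypothetical.

```xml
I am dr Sr yr
<!-- source: where the reading was taken from; resp: who decided to supply it -->
<supplied reason="illegible" source="#copy1" resp="#ed">very humble Servt</supplied>
Sydney Smith
```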

Hands and Responsibility

This section discusses in more detail the representation of aspects of responsibility perceived or to be recorded for the writing of a primary source. These include points at which one scribe takes over from another, or at which ink, pen, or other characteristics of the writing change. A discussion of the usage of the hand, resp, and cert attributes is also included.

Document Hands

For many text-critical purposes it is important to signal the person responsible (the hand) for the writing of a whole document, a stretch of text within a document, or a particular feature within the document. A hand, as the name suggests, need not necessarily be identified with a particular known (or unknown) scribe or author; it may simply indicate a particular combination of writing features recognized within one or more documents. The examples given above of the use of the hand attribute with coding of additions and deletions illustrate this.

The handNote element is used to provide information about each hand distinguished within the encoded document.

A handNote element, with an identifier given by its xml:id attribute, may appear in either of two places in the TEI Header, depending on which modules are included in a schema. When the transcr module defined by the present chapter is used, the element handNotes is available, within the profileDesc element of the Header, to hold one or more handNote elements. When the msdescription module defined in chapter is included, the handDesc element described in also becomes available as part of a structured manuscript description. The encoder may choose to place handNote elements identifying individual hands in either location without affecting their accessibility since the element is always addressed by means of its xml:id attribute. The handDesc element may be more appropriate when a full cataloguing of each manuscript is required; the handNotes element if only a brief characterization of each hand is needed. It is also possible to use the two elements together if, for example, the handDesc element contains a single summary describing all the hands discursively, while the handNotes element gives specific details of each. The choice will depend on individual encoders' priorities.

As shown above, the hand attribute is available on several elements to indicate the hand in which the content of the element (usually a deletion or addition) is carried out. The handShift element may also be used within the body of a transcription to indicate where a change of hand is detected for whatever reason.

Both handShift and handNote are members of the att.handFeatures class, and thus share the following attributes:

A single hand may employ different writing styles and inks within a document, or may change character. For example, the writing style might shift from anglicana to secretary, or the ink from blue to brown, or the character of the hand may change. Simple changes of this kind may be indicated by assigning a new value to the appropriate attribute within the handShift element. It is for the encoder to decide whether a change in these properties of the writing style is so marked as to require treatment as a distinct hand.

Where such a change is to be identified, the new attribute is used to indicate the hand applicable to the material following the handShift. This will ordinarily, but not necessarily, be the order in which the material was originally written.

As might be expected, one hand may employ different renditions within the one writing style; for example, medieval scribes often indicate a structural division by emboldening all the words within a line. These should be indicated by use of the rend attribute on an element, in the same manner as underlining, emboldening, font shifts, etc. are represented in transcription of a printed text, rather than by introducing a new handShift element.

In the following example there is a change of ink within the one hand. This is simply indicated by a new value for the medium attribute on the handShift element: When wolde the cat dwelle in his ynne And if the cattes skynne be slyk and gaye
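A sketch of a change of ink within one hand. The medium value greenish-ink and the placement of the handShift between the two lines are assumptions.

```xml
<l>When wolde the cat dwelle in his ynne</l>
<!-- same hand, new ink: only the medium attribute changes -->
<handShift medium="greenish-ink"/>
<l>And if the cattes skynne be slyk and gaye</l>
```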

In the following example, the encoder has identified two distinct hands within the document and given them identifiers h1 and h2, by means of the following declarations included in the document's TEI Header: Carefully written with regular descenders Unschooled scrawl

Then the change of hand is indicated in the text: ... and that good Order Decency and regular worship may be once more introduced and Established in this Parish according to the Rules and Ceremonies of the Church of England and as under a good Consciencious and sober Curate there would and ought to be and for that purpose the parishioners pray
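The declarations and the change of hand might be sketched together as below. The identifiers h1 and h2 come from the text; the point at which the second hand takes over is not recoverable from the passage above and is assumed here for illustration.

```xml
<!-- In the header -->
<handNotes>
  <handNote xml:id="h1">Carefully written with regular descenders</handNote>
  <handNote xml:id="h2">Unschooled scrawl</handNote>
</handNotes>

<!-- In the text: everything after the handShift is in hand h2 -->
<p>... and that good Order Decency and regular worship may be once more introduced
  and Established in this Parish according to the Rules and Ceremonies of the Church
  of England and as under a good Consciencious and sober Curate there would and
  ought to be <handShift new="#h2"/> and for that purpose the parishioners pray</p>
```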

Hand, Responsibility, and Certainty Attributes

The hand and resp attributes have similar, but not identical, meanings. Observe their distinctive uses in the following encoding of the William James passage mentioned above in section . In this example, the But inserted by James is tagged as an add, and the consequent editorial correction of One to one treated separately: But Oneone must have lived ... editorial changes Fredson Bowers authorial changes William James As in this example, hand should be reserved for indicating the hand of any form of marking—here, addition but also deletion, correction, annotation, underlining, etc.—within the primary text being transcribed. The scribal or authorial responsibility for this marking may be inferred from the value of the hand attribute. The value of the hand attribute should be one of the hand identifiers declared in the document header (see section ).

The resp attribute, by contrast, indicates the person responsible for deciding to apply the element carrying it to this part of the text, and hence has a slightly different interpretation. In the case of the add element, for example, the resp attribute will indicate the responsibility for identifying that the addition is indeed an addition, and also (if the hand attribute is supplied) to which hand it should be attributed. In this case, Bowers is credited with identifying the hand as that of William James. In the case of the corr element, the resp attribute indicates who is responsible for supplying the intellectual content of the correction reported in the transcription: here, Bowers' correction of One to one. In the case of a deletion, the resp attribute will similarly indicate who bears responsibility for identifying or categorising the deletion itself, while other attributes (hand most obviously) attribute responsibility for the deletion itself.
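The distinction between hand and resp can be sketched with the William James passage. The identifiers WJ and FB are hypothetical, and declaring James's hand with a handNote and Bowers with a respStmt is an assumption of this sketch.

```xml
<!-- In the header -->
<handNote xml:id="WJ">authorial changes, William James</handNote>
<respStmt xml:id="FB">
  <resp>editorial changes</resp>
  <name>Fredson Bowers</name>
</respStmt>

<!-- In the transcription -->
<!-- hand: who wrote the addition; resp: who identified it and attributed the hand -->
<add hand="#WJ" resp="#FB">But</add>
<choice>
  <sic>One</sic>
  <!-- resp: who supplied the content of the correction -->
  <corr resp="#FB">one</corr>
</choice>
must have lived ...
```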

As these examples show, the field of application of the resp attribute varies from element to element. In some cases, it applies to the content of the element (corr, ex, and supplied); in others it applies to the value of a particular attribute (sic, abbr, del, etc.). In all cases where both the resp and cert attributes are defined for a particular element, the two attributes refer to the same aspect of the markup. The one indicates who is intellectually responsible for some item of information, the other indicates the degree of confidence in the information. Thus, for a correction, the resp attribute signifies the person responsible for supplying the correction, while the cert attribute signifies the degree of editorial confidence felt in that correction. For the expansion of an abbreviation, the resp attribute signifies the person responsible for supplying the expansion and the cert attribute signifies the degree of editorial confidence felt in the expansion.

This close definition of the use of the resp and cert attributes with each element is intended to provide for the most frequent circumstances in which encoders might wish to make unambiguous statements regarding the responsibility for and certainty of aspects of their encoding. The resp and cert attributes, as so defined, give a convenient mechanism for this. However, there will be cases where it is desired to state responsibility for and certainty concerning other aspects of the encoding. For example, one may wish in the case of an apparent addition to state the responsibility for the use of the add element, rather than the responsibility for identifying the hand of the addition. It may also be that one editor may make an electronic transcription of another editor's printed transcription of a manuscript text — here, one will wish to assign layers of responsibility, so as to allow the reader to determine exactly what in the final transcription was the responsibility of each editor. In these complex cases of divided editorial responsibility for and certainty concerning the content, attributes, and application of a particular element, the more general mechanisms for representing certainty and responsibility described in chapter should be used.

It should be noted that the certainty and responsibility mechanisms described in chapter replicate all the functions of the resp and cert attributes on particular elements. For example, the encoding of Donaldson's conjectured emendation of wight to wright in line 117 of Chaucer's Wife of Bath's Prologue (see ) may be encoded as follows using the resp and cert attributes on the corr element: wightwright Exactly the same information could be conveyed using the certainty and responsibility mechanisms, as follows: wrightwight The choice of which mechanism to use is left to the encoder. In transcriptions where only such statements of responsibility and certainty are made as can be accommodated within the resp and cert attributes of particular elements, it will be economical to use the resp and cert attributes of those elements. Where many statements of responsibility and certainty are made which cannot be so accommodated, it may be economical to use the respons and certainty elements throughout.
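The two equivalent encodings might be sketched side by side as follows. The identifiers ETD and c117, the confidence level medium, and the locus value are assumptions for illustration.

```xml
<!-- (a) Using the resp and cert attributes on corr -->
<choice>
  <sic>wight</sic>
  <corr resp="#ETD" cert="medium">wright</corr>
</choice>

<!-- (b) Using the stand-off certainty and respons elements -->
<choice>
  <corr xml:id="c117">wright</corr>
  <sic>wight</sic>
</choice>
<!-- ... elsewhere in the document ... -->
<certainty target="#c117" locus="value" cert="medium"/>
<respons target="#c117" locus="value" resp="#ETD"/>
```

Form (a) is more compact. Form (b) keeps the statements outside the transcription, so further statements about the same element can be added without changing it.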

The above discussion supposes that in each case an encoder is able to specify exactly what it is that one wishes to state responsibility for and certainty about. Situations may arise when an encoder wishes to make a statement concerning certainty or responsibility but is unable or unwilling to specify so precisely the domain of the certainty or responsibility. In these cases, the note element may be used with the type attribute set to cert or resp and the content of the note giving a prose description of the state of affairs.

Damage and Conjecture

The carrier medium of a primary source may often sustain physical damage which makes parts of it hard or impossible to read. In this section we discuss elements which may be used to represent such situations and give recommendations about how these should be used in conjunction with the other related elements introduced previously in this chapter.

Damage, Illegibility, and Supplied Text

The gap and supplied elements described above (section ) should be used with appropriate attributes where the degree of damage or illegibility in a text is such that nothing can be read and the text must be either omitted or supplied conjecturally or from one or more other sources. In many cases, however, despite damage or illegibility, the text may yet be read with reasonable confidence. In these cases, the following elements should be used: As members of the class att.damaged, these elements bear the following attributes: The class att.damaged is a subclass of the class att.dimensions, from which these elements also therefore inherit the following attributes: As a member of the att.spanning class, damageSpan inherits the following additional attribute:

The following examples all refer to the recto of folio 5 of the unique manuscript of the Elder Edda. Here, the manuscript of Vóluspá has been damaged through irregular rubbing so that letters in various places are obscured and in some cases cannot be read at all.

In the first line of this leaf, the transcriber may believe that the last three letters of daga can be read clearly despite the damage: um aldr daga yndisniota
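A minimal sketch of such an encoding is given here, assuming the line is tagged with the l element and that rubbing is recorded as the value of the agent attribute:

```xml
<!-- The damaged letters are transcribed as read; agent names the cause. -->
<l>um aldr d<damage agent="rubbing">aga</damage> yndisniota</l>
```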

If, as is often the case, the damage crosses structural divisions, so that the damage element cannot be nested properly within the containing div elements, the damageSpan element may be used, in the same way as the delSpan and addSpan elements discussed in section .

....

Note that in this example the spanTo attribute points to the next pb element rather than to an inserted anchor element, since the whole of the leaf (the text between the two pb elements) has sustained damage. For other techniques of handling non-nesting information, see chapter .
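The arrangement described might be sketched as follows. The identifier f5v and the page numbers are assumptions chosen for illustration, and the transcribed text is elided.

```xml
<pb n="5r"/>
<!-- spanTo points at the following page break, not at an inserted anchor -->
<damageSpan agent="rubbing" spanTo="#f5v"/>
<p> .... </p>
<pb n="5v" xml:id="f5v"/>
```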

If, as is also likely, the damage affects several disjoint parts of the text, each such part must be marked with a separate damage or damageSpan element. To indicate that each of these is to be regarded as forming part of the same damaged area, the group attribute may be used as in the following example. In this (imaginary) text of FitzGerald's translation from Omar Khayyám, water damage has affected an area covering parts of several lines: The Moving Finger writes; and having writ, Moves on: nor all your Piety nor Wit Shall lure it back to cancel half a Line, Nor all your Tears wash out a Word of it
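One possible encoding of this imaginary case is sketched here. The exact extent of the damage in each line is invented for illustration; what matters is that both damage elements carry the same group value.

```xml
<lg>
  <!-- Two disjoint stretches, marked as one damaged area by group="1" -->
  <l>The Moving Finger wri<damage agent="water" group="1">tes; and</damage> having writ,</l>
  <l>Moves <damage agent="water" group="1">on: nor all your</damage> Piety nor Wit</l>
  <l>Shall lure it back to cancel half a Line,</l>
  <l>Nor all your Tears wash out a Word of it</l>
</lg>
```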

A more general solution to this problem is provided by the join element discussed in which may be used to link together arbitrary elements of any kind in the transcription. Where, as here, several instances of illegibility and conjecture all result from a single cause (rubbing that affects the text at irregular, non-contiguous points), the join element may be used to indicate which tagged features are part of the same physical phenomenon.

If the damage has been so severe as to render parts of the text only imperfectly legible, the unclear element should be used to mark the fact. Returning to the Eddic example above, an encoder less confident in the daga reading may indicate this as follows: um aldr daga yndisniota

If it is desired to supply more information about the kind of damage, it is also possible to nest an unclear element within the damage element: um aldr daga yndisniota
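The two encodings just described, unclear used alone and unclear nested within damage, might be sketched as follows. The attribute values are illustrative assumptions.

```xml
<!-- Reading uncertain; the cause is given on unclear itself -->
<l>um aldr d<unclear reason="rubbing">aga</unclear> yndisniota</l>

<!-- Same reading, with the damage described by an enclosing element -->
<l>um aldr d<damage agent="rubbing"><unclear>aga</unclear></damage> yndisniota</l>
```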

Alternatively, the transcriber may not feel able to read the last three letters of daga but may wish to supply them by conjecture. Note the use of the resp attribute to assign the conjecture to Finnur Jónsson: um aldr daga yndisniota The supplied element may if desired be enclosed within a damage element: um aldr daga yndisniota
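A sketch of both forms follows. The pointer #FJ is a hypothetical reference to a definition of Finnur Jónsson elsewhere in the document, and the reason value is an assumption.

```xml
<!-- Letters supplied by conjecture, credited through resp -->
<l>um aldr d<supplied reason="rubbing" resp="#FJ">aga</supplied> yndisniota</l>

<!-- The same conjecture enclosed within a damage element -->
<l>um aldr d<damage agent="rubbing"><supplied resp="#FJ">aga</supplied></damage> yndisniota</l>
```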

Contrast the use of gap in the next line, where the transcriber believes that four letters cannot be read at all because of the damage: þar komr inn dimmi dreki fliugandi naþr frann neþan As with supplied, this gap might be enclosed by a damage element.
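A hedged sketch of such an encoding is given below. The position of the gap at the end of the line is an assumption, since the text above does not say where the four letters fall.

```xml
<!-- Four unreadable letters: nothing is transcribed, only the loss is recorded -->
<l>þar komr inn dimmi dreki fliugandi naþr frann neþan
  <gap reason="illegible" agent="rubbing" quantity="4" unit="chars"/></l>
```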

Where elements are nested in this way, information about agency, etc. is by default inherited. In the following imaginary example, there is a smoke-damaged part within which two stretches can be read with some difficulty, and a third stretch which cannot be read at all: and the proof of this is margin
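The inheritance described might be sketched as follows. The boundaries of the three stretches are guesses made for illustration; the point is that agent is stated once, on the enclosing damage element.

```xml
<damage agent="smoke">
  <!-- two stretches read with difficulty; smoke is inherited as the cause -->
  <unclear>and the proof of this is</unclear>
  <!-- a stretch that cannot be read at all -->
  <gap/>
  <unclear>margin</unclear>
</damage>
```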

The above examples record imperfect legibility due to damage. When imperfect legibility is due to some other reason (typically because the handwriting is ill-formed), the unclear element should be used without any enclosing damage element. In Robert Southey's autograph of The Life of Cowper the final six letters of attention are difficult to read because of the haste of the writing, though reasonably certain from the context. and from time to time invited in like manner his attention The cert attribute on the unclear element may be used to indicate the level of editorial confidence in the reading contained within it.
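For the Southey passage, a minimal sketch might read as follows. The cert value of high is an assumption reflecting the statement that the reading is reasonably certain from context.

```xml
<!-- No enclosing damage element: the difficulty lies in the handwriting -->
<p>and from time to time invited in like manner his
  att<unclear cert="high">ention</unclear></p>
```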

Use of the gap, del, damage, unclear, and supplied Elements in Combination

The gap, damage, unclear, supplied, and del elements may be closely allied in their use. For example, an area of damage in a primary source might be encoded with any one of the first four of these elements, depending on how far the damage has affected the readability of the text. Further, certain of the elements may nest within one another. The examples given in the last sections illustrate something of how these elements are to be distinguished in use. This may be formulated as follows:
where the text has been rendered completely illegible by deletion or damage and no text is supplied by the editor in place of what is lost: place an empty gap element at the point of deletion or damage. Use the reason attribute to state the cause (damage, deletion, etc.) of the loss of text.
where the text has been rendered completely illegible by deletion or damage and text is supplied by the editor in place of what is lost: surround the text supplied at the point of deletion or damage with the supplied element. Use the reason attribute to state the cause (damage, deletion, etc.) of the loss of text leading to the need to supply the text.
where the text has been rendered partly illegible by deletion or damage so that the text can be read but without perfect confidence: transcribe the text and surround it with the unclear element. Use the reason attribute to state the cause (damage, deletion, etc.) of the uncertainty in transcription and the cert attribute to indicate the confidence in the transcription.
where there is deletion or damage but at least some of the text can be read with perfect confidence: transcribe the text and surround it with the del element (for deletion) or the damage element (for damage). Use appropriate attribute values to indicate the cause and type of deletion or damage. Observe that the degree attribute on the damage element permits the encoding to show that a letter, word, or phrase is not perfectly preserved, though it may be read with confidence.
where there is an area of deletion or damage and parts of the text within that area can be read with perfect confidence, other parts with less confidence, other parts not at all: in transcription, surround the whole area with the del element (for deletion; or the delSpan element where it crosses a structural boundary); or the damage element (for damage). Text within the damaged area which can be read with perfect confidence needs no further tagging. Text within the damaged area which cannot be read with perfect confidence may be surrounded with the unclear element. Places within the damaged area where the text has been rendered completely illegible and no text is supplied by the editor may be marked with the gap element. For each element, one may use appropriate attribute values to indicate the cause and type of deletion or damage and the certainty of the reading.
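The last of these rules, the mixed case, can be illustrated with a sketch. The text is placeholder wording invented for the purpose, and the attribute values are assumptions.

```xml
<del rend="overstrike">
  <!-- read with perfect confidence: no further tagging needed -->
  these words are clear,
  <!-- read with less confidence -->
  <unclear cert="medium">these less so,</unclear>
  <!-- completely illegible, and nothing supplied by the editor -->
  <gap reason="illegible"/>
</del>
```

The same pattern applies to damage, with damage in place of del and an agent attribute in place of rend.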

The rules for combinations of the add and del elements, and for the interpretation of such combinations, are similar:
if one add element (with identifier ADD1) contains another (with identifier ADD2), then the addition ADD1 was first made to the text, and later a second addition (ADD2) was made within that added text: This is the text with some added (interlinear!) material as written.
if one del element contains another, and the seq attribute does not indicate otherwise, it should be assumed that the inner deletion was made before the enclosing one. In the following example, the word redundant was deleted before a second deletion removed the entire passage: ~~This sentence contains some ~~redundant~~ unnecessary verbiage.~~
if a del element contains an add element, the normal interpretation will be that an addition was made within a passage which was later deleted in its entirety: ~~This sentence was deleted originally from the text.~~
if an add element contains a del element, the normal interpretation will be that a deletion was made from a passage which had earlier been added: This sentence was added ~~eventually~~ to the text.
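The four nesting cases can be sketched in markup as follows. The element boundaries are inferred from the prose descriptions and should be read as assumptions, in particular the extent of the outer addition in the first case.

```xml
<!-- add within add: ADD1 was made first, ADD2 later within it -->
<p>This is the text <add xml:id="ADD1">with some added
  <add xml:id="ADD2">(interlinear!)</add> material</add> as written.</p>

<!-- del within del: the inner deletion is assumed to be the earlier one -->
<p><del>This sentence contains some <del>redundant</del> unnecessary verbiage.</del></p>

<!-- add within del: an addition inside a passage later deleted entirely -->
<p><del>This sentence was deleted <add>originally</add> from the text.</del></p>

<!-- del within add: a deletion made from an earlier addition -->
<p><add>This sentence was added <del>eventually</del> to the text.</add></p>
```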

Aspects of Layout

Finally in this chapter we present elements which may be used to capture aspects of the layout of material on a page where this is considered important. Methods for recording page breaks, column breaks, and line breaks in the source are described in section .

Space

The author or scribe may have left space for a word, or for an initial capital, and for some reason the word or capital was never supplied and the space left empty. The presence of significant space in the text being transcribed may be indicated by the space element. Note that this element should not be used to mark normal inter-word space or the like.

In line 694 of Chaucer's Wife of Bath's Prologue in the Holkham manuscript the scribe has left a space for a word where other manuscripts read preestes: By god if wommen had writen storyes As han within her oratoryes The supplied element discussed in the previous section may be used to supply the text presumed missing: By god if wommen had writen storyes As preestes han within her oratoryes Here, the fact of the space within the manuscript is indicated by the value of the reason attribute. The source of the supplied text is shown by the value of the source attribute as the Hengwrt manuscript; the transcriber responsible for supplying the text is ES.
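The two encodings described might be sketched as follows. The pointers #Hg and #ES are hypothetical references to definitions of the Hengwrt manuscript and the transcriber. The size of the space is not recorded here, though the dimension attributes could carry it.

```xml
<!-- The space as found in the manuscript -->
<l>By god if wommen had writen storyes</l>
<l>As <space/> han within her oratoryes</l>

<!-- The missing word supplied from another witness -->
<l>By god if wommen had writen storyes</l>
<l>As <supplied reason="space" source="#Hg" resp="#ES">preestes</supplied> han within her oratoryes</l>
```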

Lines

The most common form of marking of text in manuscripts is by lines written under, beside, or through the text. The lines themselves may be of various types: they may be solid, dashed or dotted, doubled or tripled, wavy or straight, or a combination of these and other renderings. The line may be used for emphasis, or to mark a foreign or technical term, or to signal a quotation or a title, etc.: the elements emph, foreign, term, mentioned, title may be used for these. Frequently, a scholar may judge that a line is used to delete text: the del element is available to indicate this. In all these cases, the rend attribute may be used on these or other elements to indicate that the text is marked by a line and the style of the line. Thus, Lawrence's deletion by strike-through of my in the autograph of Eloi, Eloi, lama sabachthani is noted: For I hate this my body, which is so dear to me
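A sketch of the Lawrence example follows, assuming the value strikethrough is used on rend to describe the line:

```xml
<!-- The line through "my" is interpreted as a deletion -->
<l>For I hate this <del rend="strikethrough">my</del> body, which is so dear to me</l>
```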

There will be instances, however, where a scholar wishes only to register the occurrence of lines in the text, without making any judgement as to what the lines signify. In these cases the hi element may be used, with the rend attribute to mark the style of line. In the manuscript of a letter by Robert Browning to George Moulton-Barrett the underlining of the phrase had obtained all the letters to Mr Boyd may be marked up as follows: I have once — by declaring I would prosecute by law — hindered a man's proceedings who had obtained all the letters to Mr Boyd
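A sketch of this neutral encoding is given here, with the quotation abridged to its final clause and the rend value assumed:

```xml
<!-- hi records the underlining without interpreting it -->
<p>hindered a man's proceedings who
  <hi rend="underline">had obtained all the letters to Mr Boyd</hi></p>
```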

The above examples presume the common case where a single word or phrase is marked by a line, with no doubt as to where the marking begins or ends and with no overlapping of the area of text with other marked areas of text. Where there is doubt, the certainty element may be used to record the doubt. In the Browning example cited above the underlining actually begins half-way under who, and this uncertainty could be remarked as follows: I have once — by declaring I would prosecute by law — hindered a man's proceedings who had obtained all the letters to Mr Boyd may begin with previous word
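The doubt about where the underlining starts might be recorded along the following lines. The identifier ul1 and the degree value are assumptions, and the quotation is again abridged.

```xml
<p>hindered a man's proceedings who
  <hi rend="underline" xml:id="ul1">had obtained all the letters to Mr Boyd</hi>
  <!-- locus="start" says the doubt concerns where the marked span begins -->
  <certainty target="#ul1" locus="start" degree="0.5">
    <desc>may begin with previous word</desc>
  </certainty>
</p>
```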

Where the area of text marked overlaps other areas of text, for example crossing a structural division, one of the spanning mechanisms mentioned above must be used; for example where the line is thought to mark a deletion, the delSpan element may be used. Where it is desired simply to record the marking of a span of text in circumstances where it is not possible to surround the text with a hi element, the span element may be used with the rend or type attribute indicating the style of line-marking.

More work needs to be done on clarifying the treatment of other textual features marked by lines which might so overlap or nest. For example, in many Middle English manuscripts (e.g. the Jesus and Digby verse collections), marginal sidebars may indicate metrical structure: couplets may be linked in pairs, with the pairs themselves linked into stanzas. Or, marginal sidebars may indicate emphasis, or may point out a region of text on which there is some annotation: in many manuscripts of Chaucer's Wife of Bath's Prologue lines 655–8 are marked with nesting parentheses against which the scribe has written nota.

At the lowest level, all such features could be captured by use of the note element, containing a prose description of the manuscript at this point, enhanced by a link to a visual representation (or facsimile) of the feature in question. It is not yet clear how best to mark up such phenomena so as to obtain more usefully structured encodings. For example, in the Chaucer example just cited, one may wish to record that the nota is written in the Hengwrt manuscript in the right margin against a single large left parenthesis bracketing the four lines, with two right parentheses in the right margin bracketing two overlapping pairs of lines: the first and third, the second and fourth. The note element allows us to record that the scribe wrote nota, but is not well-adapted to show that the nota points both at all four lines and at two pairs of lines within the four lines.

Headers, Footers, and Similar Matter

As a rule, matter associated with the page break (signature, catch-word, page number) should be drawn into the pb element as attributes: see section . In text-critical situations where these elements need tagging in their own right (for instance, when the catch-word presents a variant reading, or spacing in the header or footer is significant for compositor identification), the element fw may be used: The name fw is short for forme work. It may be used to encode any of the unchanging portions of a page forme, such as: running heads (whether repeated or changing on every page, or alternating pages); running footers; page numbers; catch-words; and other material repeated from page to page which falls outside the stream of the text. It should not be used for marginal glosses, annotations, or textual variants, which should be tagged using gloss, note, or the text-critical tags described in chapter , respectively.

For example: Poëms. 29 E3 TEMPLE
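One way the four items above might be tagged is sketched below. The type and place values are assumptions about what each item is and where it sits on the page.

```xml
<fw type="header" place="top">Poëms.</fw>
<fw type="pageNum" place="top">29</fw>
<fw type="sig" place="bottom">E3</fw>
<fw type="catch" place="bottom">TEMPLE</fw>
```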

Other Primary Source Features not Covered in these Guidelines

We repeat the advice given at the beginning of this chapter, that these recommendations are not intended to meet every transcriptional circumstance ever likely to be faced by any scholar. They are intended rather as a base to enable encoding of the most common phenomena found in the course of scholarly transcription of primary source materials. In particular, these Guidelines do not address the encoding of physical description of textual witnesses: the materials of the carrier, the medium of the inscribing implement, the organisation of the carrier materials themselves (such as quiring, collation, etc.), authorial instructions or scribal markup, etc., except insofar as these are involved in the broader question of manuscript description, as addressed by the msdescription module described in chapter .

Module for Transcription of Primary Sources

The module described in this chapter makes available the following components: Transcription of Primary Sources. The selection and combination of modules to form a TEI schema is described in .

&addSpan; &damage; &damageSpan; &delSpan; &ex; &fw; &handNotes; &handShift; &am; &restore; &space; &subst; &supplied;