Physical Bibliography - Draft for P5


This module defines elements for encoding the physical structure of books and manuscripts. They may be used either to provide a higher level of bibliographic detail, or a more structured encoding of bibliographic facts, than is allowed by the TEI Header or the Manuscript Description module, or to associate transcribed text or page images with an encoding of the physical structure of the book from which they are taken. Two kinds of tags supplement the standard provisions of the <sourceDesc> section of the TEI header: those that encode bibliographic formulae, that is, standard or project-specific notations for the physical facts of books or manuscripts, such as the "collation formula" refined by Fredson Bowers; and those that encode the physical facts themselves directly. In addition, tags are provided that allow book structure to serve as the primary hierarchy governing the encoding of the text itself. For users who must choose another kind of TEI hierarchy as their primary one in order to capture the textual features of interest to them, but who also wish to encode the physical structure of the source as an aspect of their text encoding, milestone tags and a stand-off markup strategy are provided.

The collation element

The <collation> element appears within the <msDescription> or <bookDescription> element in the <sourceDesc> section of the TEI header. It can contain either a collation formula, together with the other elements that form a full bibliographic description following the Bowers notation (or some other standard or project-specific collation formula), or a paragraph-form description of the structure of a book or manuscript. It can also contain a full formal representation of the structure of the book itself using <codexStructure> and other tags defined below.

The collationFormula element

The <collationFormula> element is designed to encode any of the standard kinds of collation formulae, such as the type specified by Fredson Bowers in his influential book Principles of Bibliographical Description, the kinds used by manuscript cataloguers, and the kind employed in the Gesamtkatalog der Wiegendrucke. It can also be adapted to a project-specific style of collation. It contains the following elements, none of which is obligatory:
  • <gatherings>: contains a listing of the gatherings that make up the book or manuscript, divided into instances of the element <gatheringRange>, plus (optionally) a record of the alphabet used to mark signatures (in <signatureAlphabet>) and (also optionally) a record of the leaves on which signature marks appear (in <signatureLeaves> and <anomSignature>).
  • <totalLeaves>: contains a number specifying the total number of leaves in the book that is being described in the formula.
  • <pagination>: for books with printed or hand-written page numbers, contains a list of the ranges of consecutive numbering, appearing as separate instances of <pageRange>.
Sub-elements of <gatherings>:
  • <gatheringRange>: allows the encoder to specify the start and end gatherings of a range of gatherings that have the same physical construction (using <start> and <end> to enclose the signature letter or the sequential number of the first and last gathering of the sequence, and <added>, <cancelled>, or <missing> to identify anomalies within the sequence).
  • <signatureAlphabet>: used for specifying or describing a particular alphabet used for signatures, such as the 23-letter printers' alphabet used to sign British and some American letterpress-printed books.
  • <signatureLeaves>: used to specify the first (<start>) and last (<end>) leaves normally signed in each gathering.
  • <anomSignature>: specifies a signature that is additional to the norm specified in <signatureLeaves>, or one that would be expected but is not present, using the attribute values type="added" and type="missing" respectively.
Sub-element of <pagination>:
  • <pageRange>: allows the encoder to specify the start and end pages of a sequence of pages that are continuously paginated in regular and uninterrupted succession (using <start> and <end> to enclose the page numbers of the first and last pages of the sequence). In the case of interrupted or repeated sequences of page numbers, multiple instances of <pageRange> may be used to show all continuous sequences or stray individual page numbers.
Example (showing the encoding for the formula for an Aphra Behn book given on page 471 of Fredson Bowers's Principles of Bibliographical Description):
<collation>
  <format>quarto</format>
  <collationFormula>
    <gatherings>
      <signatureAlphabet>23 letter</signatureAlphabet>
      <gatheringRange signed="no">
        <start>A</start> <end>A</end> <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange signed="yes">
        <start>B</start> <end>L</end> <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange>
        <start>M</start> <end>M</end> <leaves>4</leaves>
      </gatheringRange>
      <gatheringRange signed="no">
        <start>N</start> <end>N</end> <leaves>1</leaves>
      </gatheringRange>
      <signatureLeaves>
        <start>1</start> <end>2</end>
      </signatureLeaves>
      <anomSignature type="added">
        <gathering>B</gathering> <leaf>3</leaf>
      </anomSignature>
    </gatherings>
    <totalLeaves>49</totalLeaves>
    <pagination>
      <pageRange type="front matter" numbered="no">
        <start>1</start> <end>8</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>1</start> <end>33</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>26</start> <end>27</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>36</start> <end>37</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>30</start> <end>31</end>
      </pageRange>
      <pageRange numbered="yes">
        <start>40</start> <end>89</end>
      </pageRange>
      <pageRange numbered="no">
        <start>90</start> <end>90</end>
      </pageRange>
      <totalPages>90</totalPages>
      <paginationAppears>in parens centered in hdl.</paginationAppears>
    </pagination>
  </collationFormula>
</collation>
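
The leaf and page counts recorded in a formula of this kind can be checked mechanically. The following Python sketch is an illustration only and forms no part of the proposed module. It assumes that the 23-letter alphabet is the Latin alphabet without J, U and W, and that every gathering is signed with a single letter; the names count_leaves and count_pages are invented for this example, and the attributes of the sample have been omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Assumption: the 23-letter signing alphabet omits J, U and W.
PRINTERS_ALPHABET = "ABCDEFGHIKLMNOPQRSTVXYZ"

# Gathering and page ranges copied from the example above (attributes omitted).
FORMULA = """
<collationFormula>
  <gatherings>
    <gatheringRange><start>A</start><end>A</end><leaves>4</leaves></gatheringRange>
    <gatheringRange><start>B</start><end>L</end><leaves>4</leaves></gatheringRange>
    <gatheringRange><start>M</start><end>M</end><leaves>4</leaves></gatheringRange>
    <gatheringRange><start>N</start><end>N</end><leaves>1</leaves></gatheringRange>
  </gatherings>
  <pagination>
    <pageRange><start>1</start><end>8</end></pageRange>
    <pageRange><start>1</start><end>33</end></pageRange>
    <pageRange><start>26</start><end>27</end></pageRange>
    <pageRange><start>36</start><end>37</end></pageRange>
    <pageRange><start>30</start><end>31</end></pageRange>
    <pageRange><start>40</start><end>89</end></pageRange>
    <pageRange><start>90</start><end>90</end></pageRange>
  </pagination>
</collationFormula>
"""


def count_leaves(formula_xml, alphabet=PRINTERS_ALPHABET):
    """Sum (number of gatherings in the range) x (leaves per gathering)."""
    root = ET.fromstring(formula_xml)
    total = 0
    for rng in root.iter("gatheringRange"):
        first = alphabet.index(rng.findtext("start").strip())
        last = alphabet.index(rng.findtext("end").strip())
        total += (last - first + 1) * int(rng.findtext("leaves"))
    return total


def count_pages(formula_xml):
    """Sum the lengths of all page ranges, numbered or not."""
    root = ET.fromstring(formula_xml)
    return sum(
        int(rng.findtext("end")) - int(rng.findtext("start")) + 1
        for rng in root.iter("pageRange")
    )


print(count_leaves(FORMULA), count_pages(FORMULA))
```

For the sample data the leaf count agrees with the <totalLeaves> value of 49, and the page ranges together account for two pages per leaf.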

The codexStructure element

The <codexStructure> element encloses a complex of elements that together describe the full physical form of a printed or handwritten book, such as <gathering>, <leaf>, and <page>. In the case of multi-volume works, <codexStructure> may be repeated for each volume.

Sub-elements of <codexStructure>:
  • <gathering>: represents a quire or gathering in the source material, that is, a unit consisting of pages constructed by folding and attached (or formerly attached) to the spine of the book. Sub-elements and the relations between them formed by pointer-type attributes (and also by sequence in the XML file) may indicate the many physical and spatial relationships that exist between leaves and pages in the gathering and in the sheet(s) as printed before folding and assembly into the completed gathering.
Sub-elements of <gathering> include:
  • <leaf>: this element represents an individual physical leaf in the source material. The attribute "conjunct" may be used to refer to the xml:id of the <leaf> element representing the leaf that is conjoined to this leaf at the spine of the book. The attribute "signature" may be used to record the signature letter/number printed or written on this leaf. The attribute "sheet" may be used to assign the leaf to one of the sheets folded together, if more than one sheet is involved in the construction of a single gathering.
  • <page>: this element represents an individual physical page (one side of a leaf) in the source material. The attribute "no" may be used to record the page number printed or written on this page. The attribute "cutFromN" may be used to refer to the xml:id of the page that, before the sheet was folded into the gathering and the leaves were cut, was attached to the top (North) end of the current page; similarly "cutFromS", "cutFromE", and "cutFromW". The attribute "W" may be used to refer to the xml:id of the page to which the current page is attached by its leftward edge; similarly "E", and for books with the spine at the top or bottom of the pages, "N" or "S". The attribute "sheetSide" may be used to assign the current page to one or other of the surfaces of the printed sheet before folding into a gathering.
Example (a representation of a gathering of common octavo, folded as in the illustration in Figure 50 of Gaskell's New Introduction to Bibliography, with all relationships explicitly represented in the encoding):
<gathering>
  <leaf xml:id="leaf1" conjunct="#leaf8">
    <page xml:id="p1" sheetSide="1" cutFromN="#p8" W="#p16"/>
    <page xml:id="p2" sheetSide="2" cutFromN="#p7" E="#p15"/>
  </leaf>
  <leaf xml:id="leaf2" conjunct="#leaf7">
    <page xml:id="p3" sheetSide="2" cutFromN="#p6" W="#p14"/>
    <page xml:id="p4" sheetSide="1" cutFromN="#p5" E="#p13"/>
  </leaf>
  <leaf xml:id="leaf3" conjunct="#leaf6">
    <page xml:id="p5" sheetSide="1" cutFromN="#p4" W="#p12"/>
    <page xml:id="p6" sheetSide="2" cutFromN="#p3" E="#p11"/>
  </leaf>
  <leaf xml:id="leaf4" conjunct="#leaf5">
    <page xml:id="p7" sheetSide="2" cutFromN="#p2" W="#p10"/>
    <page xml:id="p8" sheetSide="1" cutFromN="#p1" E="#p9"/>
  </leaf>
  <leaf xml:id="leaf5" conjunct="#leaf4">
    <page xml:id="p9" sheetSide="1" cutFromN="#p16" cutFromE="#p12" W="#p8"/>
    <page xml:id="p10" sheetSide="2" cutFromN="#p15" cutFromW="#p11" E="#p7"/>
  </leaf>
  <leaf xml:id="leaf6" conjunct="#leaf3">
    <page xml:id="p11" sheetSide="2" cutFromN="#p14" cutFromE="#p10" W="#p6"/>
    <page xml:id="p12" sheetSide="1" cutFromN="#p13" cutFromW="#p9" E="#p5"/>
  </leaf>
  <leaf xml:id="leaf7" conjunct="#leaf2">
    <page xml:id="p13" sheetSide="1" cutFromN="#p12" W="#p4" cutFromE="#p16"/>
    <page xml:id="p14" sheetSide="2" cutFromN="#p11" cutFromW="#p15" E="#p3"/>
  </leaf>
  <leaf xml:id="leaf8" conjunct="#leaf1">
    <page xml:id="p15" sheetSide="2" cutFromN="#p10" cutFromE="#p14" W="#p2"/>
    <page xml:id="p16" sheetSide="1" cutFromN="#p9" cutFromW="#p13" E="#p1"/>
  </leaf>
</gathering>
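
Because each of these relationships is recorded from both ends, an encoding of this kind can be checked for internal consistency. The following Python sketch is offered as an illustration only, under stated assumptions: the folio gathering used as sample data is hypothetical (it is not taken from Gaskell), and the function name unreciprocated is invented for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical folio gathering: one sheet folded once, two leaves, four pages.
FOLIO = """
<gathering>
  <leaf xml:id="leaf1" conjunct="#leaf2">
    <page xml:id="p1" sheetSide="1" W="#p4"/>
    <page xml:id="p2" sheetSide="2" E="#p3"/>
  </leaf>
  <leaf xml:id="leaf2" conjunct="#leaf1">
    <page xml:id="p3" sheetSide="2" W="#p2"/>
    <page xml:id="p4" sheetSide="1" E="#p1"/>
  </leaf>
</gathering>
"""


def unreciprocated(gathering_xml, tag, attr, inverse):
    """Return (id, attr, target) triples whose pointer is not mirrored.

    A pointer from X to Y in `attr` is mirrored when Y exists and
    points back to X in `inverse`.
    """
    root = ET.fromstring(gathering_xml)
    by_id = {el.get(XML_ID): el for el in root.iter(tag)}
    problems = []
    for ident, el in by_id.items():
        target = el.get(attr)
        if target is None:
            continue  # this element does not use the attribute
        other = by_id.get(target.lstrip("#"))
        if other is None or other.get(inverse) != "#" + ident:
            problems.append((ident, attr, target))
    return problems


# Each leaf's conjunct should name that leaf as its own conjunct,
# and every W pointer should be answered by an E pointer.
print(unreciprocated(FOLIO, "leaf", "conjunct", "conjunct"))
print(unreciprocated(FOLIO, "page", "W", "E"))
```

The same function could be applied to the other paired attributes, for instance "cutFromE" against "cutFromW".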

"Milestone" tags for book-structure

Note: these tags replace the <pb/>, <cb/>, and <lb/> tags included in previous editions of these Guidelines.

The following "milestone" tags may be used to indicate within a text the points at which the various articulations of the physical source occur:
  • <newGathering/> marks the place in the text where a gathering or quire begins in the source document.
  • <newLeaf/> marks the place in the text where a leaf begins in the source document.
  • <newPage/> marks the place in the text where a page (that is, one of the sides of a leaf) begins in the source document.
  • <newColumn/> marks the place in the text where a written or printed column begins in the source document. Depending on the arrangement of the source document and the wishes of the encoder, <newColumn/> may be used once to mark the beginning of the transcription of each column in the source document, or may be invoked multiple times in order to signal the changes of column within each line of a text arranged in tabular form.
  • <newLine/> marks the place in the text where a physical line of text (as distinct from a conceptual verse line, to be marked with <l>) begins in the source document.
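
Because a milestone is an empty element, it can fall inside a paragraph without disturbing the element hierarchy, and the text belonging to each page can still be recovered by software. The following Python sketch illustrates this under stated assumptions: the transcription fragment is hypothetical, and the function name text_by_page is invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription: the first paragraph crosses a page boundary.
SAMPLE = """
<div>
  <p><newPage/>A paragraph that begins on one page
  <newPage/>and ends on the next.</p>
  <p>A second paragraph.</p>
</div>
"""


def text_by_page(xml_text):
    """Split the running text at each <newPage/> milestone.

    Item 0 of the result holds any text that precedes the first
    milestone; each later item holds the text of one page.
    """
    root = ET.fromstring(xml_text)
    pages = [[]]

    def walk(el):
        if el.tag == "newPage":
            pages.append([])  # a new page starts here
        if el.text:
            pages[-1].append(el.text)
        for child in el:
            walk(child)
            if child.tail:
                pages[-1].append(child.tail)

    walk(root)
    # Join the pieces of each page and normalize white space.
    return [" ".join("".join(chunks).split()) for chunks in pages]


for number, content in enumerate(text_by_page(SAMPLE)):
    print(number, repr(content))
```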

A stand-off markup strategy using milestone tags

The physical structure of a book can be conceptualized as a series of hierarchically organized objects, such as gatherings which contain leaves, and pages which contain lines of text. For some encoders, especially those with strong bibliographic interest and those preparing electronic transcriptions of manuscript or print materials, the physical structure hierarchy will be the primary one, and tags are provided elsewhere in this chapter to facilitate such a choice of primary hierarchy. For many other encoders, the rich resources of these Guidelines for encoding conceptual textual hierarchies such as chapters, sections and paragraphs are important, and a primary hierarchy other than physical book structure must be chosen. Researchers who use another TEI hierarchy as their primary hierarchy so frequently wish to encode the book structure hierarchy in the same file that special provision is made here to facilitate this, in addition to the resources offered in Chapter 31, Multiple Hierarchies.

The mechanism described here creates a kind of within-file "stand-off markup" in which information about the book structure hierarchy is kept separate from the encoded text but is linked to the book-structure milestone tags within the encoded text. Reference from the encoded text to the elaboration of book structure in the <codexStructure> section of <sourceDesc> is made by means of pointer-like references to the xml:id attribute of instances of the <page> element. These references occur in the pageID attribute of the empty milestone element <newPage/>.

The following example shows the use of this strategy:
<teiHeader>
  . . .
  <sourceDesc>
    . . .
    <msDescription>
      <collation>
        <codexStructure>
          . . .
          <leaf xml:id="leaf4">
            <page xml:id="p7"/>
            <page xml:id="p8"/>
          </leaf>
          <leaf xml:id="leaf5">
            <page xml:id="p9"/>
            <page xml:id="p10"/>
          </leaf>
          . . .
        </codexStructure>
      </collation>
    </msDescription>
  </sourceDesc>
  . . .
</teiHeader>
<text>
  . . .
  <newPage pageID="#p7"/>Text from page seven with associated markup.<newPage pageID="#p8"/>Text from page eight.
  . . .
</text>
In this example, <newPage/> tags within the text indicate the places where pages begin in the physical book. The <newPage/> tags are milestone tags that do not contain any text and do not participate in the document hierarchy, so elements that do, such as <div>, <p>, or <hi>, can be used even if the marked sections of text cross page boundaries. The book structure hierarchy itself is specified in the <codexStructure> section of <sourceDesc>, and the <newPage/> tags within <text> are linked to that specification by means of the pageID attribute. In effect, the <newPage/> tags specify the points at which the book structure hierarchy specified in <codexStructure> intersects with the running text in which they are inserted. Note that only the <newPage/> tags need to be inserted into the transcription in this example, since leaves and gatherings, composed of pages, can be fully represented in <codexStructure>. The book structure hierarchy thus "stands off" from the encoded text, since it exists as a hierarchy only in <codexStructure>.
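
The way in which software might follow these references can be sketched in Python. The sketch is an illustration only, under stated assumptions: the sample data adapts the example above, adding a hypothetical third page break and omitting the elided parts, and the function name leaf_for_each_page_break is invented here.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

CODEX = """
<codexStructure>
  <leaf xml:id="leaf4"><page xml:id="p7"/><page xml:id="p8"/></leaf>
  <leaf xml:id="leaf5"><page xml:id="p9"/><page xml:id="p10"/></leaf>
</codexStructure>
"""

# The third page break is a hypothetical addition to the example above.
TEXT = """
<text>
  <newPage pageID="#p7"/>Text from page seven with associated markup.
  <newPage pageID="#p8"/>Text from page eight.
  <newPage pageID="#p9"/>Text from page nine.
</text>
"""


def leaf_for_each_page_break(codex_xml, text_xml):
    """Pair each <newPage/> reference with the leaf that holds that page.

    Returns (pageID, leaf id) tuples in document order; the leaf id is
    None when the reference does not resolve to a <page> element.
    """
    codex = ET.fromstring(codex_xml)
    page_to_leaf = {}
    for leaf in codex.iter("leaf"):
        for page in leaf.findall("page"):
            page_to_leaf[page.get(XML_ID)] = leaf.get(XML_ID)

    text = ET.fromstring(text_xml)
    result = []
    for milestone in text.iter("newPage"):
        ref = milestone.get("pageID")
        result.append((ref, page_to_leaf.get(ref.lstrip("#"))))
    return result


print(leaf_for_each_page_break(CODEX, TEXT))
```

The leaf on which each page falls is recovered from <codexStructure> alone, which is why no leaf or gathering milestones are needed in the transcription.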

Physical structure as the primary hierarchy

Scholars creating book surrogates or electronic transcriptions, or those who have a strong interest in representing bibliographic structures, may wish to make book structure the primary organizing principle of their encoding of a text. The following tags are provided to permit such encoding. Users should note that in most instances the use of a book structure hierarchy will make it necessary to treat the addition of other forms of TEI markup carefully, either by avoiding the creation of a competing hierarchy or by employing one or more of the techniques outlined in Chapter 31, Multiple Hierarchies. To signal this need for caution, the names of the tags that record the physical structure of the source document within the encoded text all begin with the prefix "phys":
  • <physGathering> contains the text that occupies a physical gathering or quire in the source material.
  • <physLeaf> contains the text that occupies a physical leaf in the source material.
  • <physPage> contains the text that occupies a physical page (that is, one side of a leaf) in the source material.
  • <physColumn> contains the text that occupies a physical column in the source material.
  • <physLine> contains the text that occupies a physical line in the source material.
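
The need for caution can be made concrete with a short Python sketch. It is an illustration only, under stated assumptions: both fragments are hypothetical, the function name is_well_formed is invented for this example, and dividing the paragraph in two is merely one way of avoiding a competing hierarchy, not a practice prescribed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a paragraph that crosses a page boundary, with the
# pages encoded as containers. The <p> and <physPage> elements overlap.
OVERLAPPING = (
    "<div>"
    "<physPage><p>Begins on one page</physPage>"
    "<physPage>and ends on the next.</p></physPage>"
    "</div>"
)

# The same text with the paragraph divided so that each part nests
# inside a single page.
NESTED = (
    "<div>"
    "<physPage><p>Begins on one page</p></physPage>"
    "<physPage><p>and ends on the next.</p></physPage>"
    "</div>"
)


def is_well_formed(xml_text):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(OVERLAPPING))
print(is_well_formed(NESTED))
```

An XML parser rejects the first fragment because its elements overlap, and accepts the second at the cost of no longer recording that the two parts form one paragraph.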

Last recorded change to this page: 2007-05-02