Verse

This module is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core module are inadequate.

The tags described in section include elements for the encoding of verse lines and line groups such as stanzas: these are available for any TEI document, irrespective of the module it uses. Like the modules for prose and for drama, the module for verse additionally makes use of the module defined in chapter to define the basic formal structure of a text, in terms of front, body and back elements and the text-division elements into which these may be subdivided.

The module for verse extends the facilities provided by these modules in the following ways:

a special purpose caesura element is provided, to allow for segmentation of the verse line (see section );
a set of attributes is provided for the encoding of rhyme scheme and metrical information (see sections and );
a special purpose rhyme element is provided to support simple analysis of rhyming words (see section ).

Structural Divisions of Verse Texts

Like other kinds of text, texts written in verse may be of widely differing lengths and structures. A complete poem, no matter how short, may be treated as a free-standing text, and encoded in the same way as a distinct prose text. A group of poems functioning as a single unit may be encoded either as a group or as a text, depending on the encoder's view of the text. For further discussion, including an example encoding for a verse anthology, see chapter .

Many poems consist only of ungrouped lines. This short poem by Emily Dickinson is a simple case:

1755

To make a prairie it takes a clover and one bee,
One clover, and a bee,
And revery.
The revery alone will do,
If bees are few.

Often, however, lines are grouped, formally or informally, into stanzas, verse paragraphs, etc. The lg element defined in the core tag set (in section ) may be used for all such groupings. It may thus serve for informal groupings of lines such as those of the following example from Allen Ginsberg:

My Alba

Now that I've wasted
five years in Manhattan
life decaying
talent a blank

talking disconnected
patient and mental
sliderule and number
machine on a desk

It may also be used to mark the verse paragraphs into which longer poems are often divided, as in the following example from Samuel Taylor Coleridge's Frost at Midnight:

The Frost performs its secret ministry,
Unhelped by any wind. ...
Whose puny flaps and freaks the idling Spirit
By its own moods interprets, every where
Echo or mirror seeking of itself,
And makes a toy of Thought.

But O! how oft,
How oft, at school, with most believing mind
Presageful, have I gazed upon the bars,
To watch that fluttering stranger! ...

Dear Babe, that sleepest cradled by my side,

Note, in the above example, the use of the part attribute on the l element, where a verse line is broken between two line groups, as discussed in section .
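The effect of the part attribute can be illustrated with a short processing sketch. It assumes a simplified, namespace-free fragment in which the first portion of the broken line carries part="I" and the last carries part="F"; the markup, the type value para, and the function name metrical_lines are reconstructions for illustration only, not the encoding given in these Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment: the broken line spans two line groups.
SAMPLE = """
<div>
  <lg type="para">
    <l>Echo or mirror seeking of itself,</l>
    <l part="I">And makes a toy of Thought.</l>
  </lg>
  <lg type="para">
    <l part="F">But O! how oft,</l>
    <l>How oft, at school, with most believing mind</l>
  </lg>
</div>
"""

def metrical_lines(root):
    """Return complete metrical lines, joining fragments marked part=I/M/F."""
    lines, pending = [], []
    for l in root.iter("l"):          # document order, across line groups
        text = "".join(l.itertext()).strip()
        part = l.get("part", "N")
        if part in ("I", "M"):        # initial or medial fragment: keep collecting
            pending.append(text)
        elif part == "F":             # final fragment: close the metrical line
            pending.append(text)
            lines.append(" ".join(pending))
            pending = []
        else:                         # an ordinary, complete line
            lines.append(text)
    return lines

print(metrical_lines(ET.fromstring(SAMPLE)))
```

A processor counting metrical lines would thus see three lines here, although the fragment contains four l elements.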

Most typically, however, the lg element is used to mark the highly regular line groups which characterize stanzaic and similar verse forms, as in the following example from Chaucer:

Sire Thopas was a doghty swayn;
White was his face as payndemayn,
His lippes rede as rose;
His rode is lyk scarlet in grayn,
And I yow telle in good certayn,
He hadde a semely nose.

His heer, his ber was lyk saffroun,
That to his girdel raughte adoun;

Like other text-division elements, lg elements may be nested hierarchically. For example, one particularly common English stanzaic form consists of a quatrain or sestet followed by a couplet. The lg element may be used to encode both the stanza and its components, as in the following example from Byron: In the first year of Freedom's second dawn Died George the Third; although no tyrant, one Who shielded tyrants, till each sense withdrawn Left him nor mental nor external sun: A better farmer ne'er brushed dew from lawn, A worse king never left a realm undone! He died — but left his subjects still behind, One half as mad — and t'other no less blind.

Note the use of the type attribute to name the type of unit encoded by the lg element; this attribute is common to all members of the att.divLike class (see section ). For discussion of other attributes of this class, see . Sestet and couplet might conceivably also be used as the values of the rhyme attribute in an analysis of rhyme scheme, for which see below, section . The type attribute is intended solely for conventional names of different classes of text block; the met attribute is intended for systematic metrical analysis.

As a further example, consider the Shakespearean sonnet. This may be divided into two parts: a concluding couplet, and a body of twelve lines, itself subdivided into three quatrains:

My Mistres eyes are nothing like the Sunne,
Currall is farre more red, then her lips red
If snow be white, why then her brests are dun:
If haires be wiers, black wiers grown on her head:

I have seene Roses damaskt, red and white,
But no such Roses see I in her cheekes,
And in some perfumes is there more delight,
Then in the breath that from my Mistres reekes.

I love to heare her speake, yet well I know,
That Musicke hath a farre more pleasing sound:
I graunt I never saw a goddesse goe,
My Mistres when shee walkes treads on the ground.

And yet by heaven I think my love as rare,
As any she beli'd with false compare.

Particularly lengthy poetic texts are often subdivided into units larger than stanzas or paragraphs, which may themselves be subdivided. Spenser's Faerie Queene, for example, consists of twelve books each of which contains a prologue followed by twelve cantos. Each prologue and each canto consists of nine-line stanzas, each of which follows the same regular pattern. Other examples in the same tradition are easy to find.

Large structures of this kind are most conveniently represented by div or div1 elements, as described in section . Thus the start of the Faerie Queene might be encoded as follows:

A Gentle Knight was pricking on the plain
Y cladd in mightie armes and silver shielde,

The encoder must choose at which point in the hierarchy of structural units to introduce lg elements rather than a yet smaller div element: it would (for example) also be possible to encode the above example as follows:

A Gentle Knight was pricking on the plain
Y cladd in mightie armes and silver shielde,

One reason for using div rather than lg elements is that the former may contain non-metrical elements, such as epigraphs or dedications and other members of the model.divTop class, whereas lg elements may contain only headings or metrical lines.

Components of the Verse Line

It is often convenient for various kinds of analysis to encode subdivisions of verse lines. The general purpose seg element defined in the tag set for segmentation and alignment (section ) is provided for this purpose:

To use this element together with the module for verse, the module for segmentation and alignment must also be enabled as further described in section .

In Old and Middle English alliterative verse, individual verse lines are typically split into half lines. The seg element may be used to mark these explicitly, as in the following example from Langland's Piers Plowman:

In a somer seson, whan softe was the sonne,
I shoop me into shroudes as I a sheep were,
In habite as an heremite unholy of werkes,
Went wide in this world wondres to here.

The seg element can be nested hierarchically, in the same way as the lg element, down to whatever level of detailed structure is required. In the following example, the line has been divided into feet, each of which has been further subdivided into syllables. As elsewhere in these Guidelines, this example has been formatted for clarity of exposition rather than correct display. Note in particular that whether an XML processor retains whitespace within the seg element or not (this can be configured by means of the xml:space attribute) this example will still require additional processing, since whitespace should be retained for the lower level seg elements (those of type syll) but not for the higher level ones (those of type foot).

Arma vi rumque ca no Tro iae qui primus ab oris

The seg element may be used to identify any subcomponent of a line which has content; its type attribute may characterize such units in any way appropriate to the needs of the encoder. For the specific case of labeling each foot with its formal type (dactyl, spondee, etc.), and each syllable with its metrical or prosodic status (syllables bearing primary or secondary stress, long syllables, short syllables), however, the specialized attributes met and real are defined, which provide a more systematic framework than the type attribute; see section below.

In classical verse, a hexameter like that above may also be formally divided into two cola or hemistiches. This example provides a typical case, in that the boundary of the first colon falls in the middle of one of the feet (between the syllables no and Tro). If both kinds of segmentation are required, the part attribute might be used to mark the overlapping structure as follows:

Ar ma vi rum que ca no Tro iae qui

Instead of using the part attribute on the seg element, it might be simpler just to mark the point at which the caesura occurs. An additional element is provided for analyses of this kind, in which what is to be marked are points between the words, which have some significance within a verse line:

In classical prosody, the caesura, which occurs within a foot, is distinguished from a diaeresis, which occurs on a foot boundary (not to be confused with the division of a diphthong into two syllables, or the diacritic symbol used to indicate such division, each of which is also termed diaeresis). This distinction is rarely made nowadays, the term caesura being used for any division irrespective of foot boundaries. No special-purpose diaeresis element is therefore provided.

As an example of the caesura element, we refer again to the example from Langland. An encoder might choose simply to record the location of the caesura within each line, rather than encoding each half-line as a segment in its own right, as follows:

In a somer seson, whan softe was the sonne,
I shoop me into shroudes as I a sheep were,
In habite as an heremite unholy of werkes,
Went wide in this world wondres to here.
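Recording only the position of the caesura loses nothing for processing purposes, since the half-lines can be recovered from it. The following is a minimal sketch, assuming a namespace-free l element with an empty caesura element as a direct child and plain text on either side; the fragment and the function name half_lines are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the first Langland line with the caesura marked.
LINE = "<l>In a somer seson, <caesura/>whan softe was the sonne,</l>"

def half_lines(l):
    """Split a verse line at its first caesura; return a 1-tuple if none is marked."""
    c = l.find("caesura")
    if c is None:
        return ("".join(l.itertext()).strip(),)
    before = (l.text or "").strip()   # text preceding the empty caesura element
    after = (c.tail or "").strip()    # text following it
    return (before, after)

print(half_lines(ET.fromstring(LINE)))
```

The sketch does not handle further markup inside the line; a fuller version would need to walk mixed content on both sides of the caesura.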

Logically, the opposite of caesura might be considered to be enjambement. When the verse module is included in a schema, an additional class called att.enjamb is defined as follows:

The following lines demonstrate the use of the enjamb attribute to mark places where there is a discrepancy between the boundaries of the l elements and the syntactic structure of the verse (a discrepancy of some significance in some schools of verse):

Un astrologue, un jour, se laissa choir
Au fond d'un puits.

Rhyme and Metrical Analysis

When the module for verse is in use, the following additional attributes are available to record information about rhyme and metrical form:

These attributes may be attached to the lg element, or to the higher-level text-division elements div, div1, etc. In general, the attributes should be specified at the highest level possible; they may not however be specifiable at the highest level if some of the subdivisions of a text are in prose and others in verse. All these attributes may also be attached to the l and seg elements, but the default notation for the rhyme attribute has no defined meaning when specified on l or seg. The value for these attributes may take any form desired by the encoder, but the nature of the notation used will determine how well the attribute values can be processed by automatic means.

The primary function of the metrical attributes is to encode the conventional metrical or rhyming structure within which the poet is working, rather than the actual prosodic realization of each line; the latter can be recorded using the real attribute, as further discussed below. A simple mechanism is also provided for recording the actual realization of a rhyme pattern; see .

Sample Metrical Analyses

As a simple example of the use of these attributes, consider the following lines from Pope's Essay on Criticism:

'Tis hard to say, if greater Want of Skill
Appear in Writing or in Judging ill;
But, of the two, less dang'rous is th'Offence,
To tire our Patience, than mis-lead our Sense:

This text is written entirely in heroic couplets; each line is an iambic pentameter (which, using a common notation, can be described with the formula -+|-+|-+|-+|-+/, each - denoting a metrically unstressed syllable, each + a metrically stressed one, each | a foot boundary, and the / a line-end), and the couplets rhyme (which can be represented with the conventional formula aa).

Because both rhyme pattern and metrical form are consistent throughout the poem, they may be conveniently specified on the div element; the values given for the attributes will be inherited by any metrical unit contained within the div elements of this poem, and must be interpreted in the appropriate way.

Since the notation used in the met, real, and rhyme attributes is user-defined, no binding description can be given of its details or of how its interpretation must proceed. (A default notation is provided for the rhyme attribute, which however the encoder can replace with another; see section .) It is expected, however, that software should be able to support these attributes in useful ways; the more intelligent the software is, and the more knowledge of metrics is built into it, the better it will be able to support these attributes. In the extract given above, for example, the met and rhyme attribute values specified on the div element are inherited directly by the lg elements nested within it. Since the met value specifies the metrical form of a single verse line, the structure of the lg as a whole is understood to involve as many repetitions of the pattern as there are lines in the verse paragraph. The same attribute value, when inherited in turn by the l element, must be understood not to repeat. With sufficiently sophisticated software, segments within the line might even be understood as inheriting precisely that portion of the formula which applies to the segment in question; this will, however, be easier to accomplish for some languages than for others.
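The inheritance behaviour just described can be made concrete with a small sketch. It assumes a simplified, namespace-free fragment of the Pope extract with met and rhyme given on the div; the helper names inherited and pattern_for are hypothetical, and the repetition rule for lg is one possible reading of the paragraph above rather than a prescribed algorithm.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: metrical attributes are given once, on the div.
SAMPLE = """
<div met="-+|-+|-+|-+|-+/" rhyme="aa">
  <lg>
    <l>'Tis hard to say, if greater Want of Skill</l>
    <l>Appear in Writing or in Judging ill;</l>
  </lg>
</div>
"""

def inherited(elem, name, parents):
    """Return the nearest value of attribute `name` on elem or one of its ancestors."""
    while elem is not None:
        if elem.get(name) is not None:
            return elem.get(name)
        elem = parents.get(elem)
    return None

def pattern_for(elem, parents):
    """Interpret the inherited met value: repeated per line for lg, taken once for l."""
    met = inherited(elem, "met", parents)
    if met is None:
        return None
    if elem.tag == "lg":
        return met * len(elem.findall("l"))
    return met

root = ET.fromstring(SAMPLE)
# ElementTree keeps no parent pointers, so build a child-to-parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
lg = root.find("lg")
print(pattern_for(lg, parents))
print(pattern_for(lg[0], parents))
```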

The rhyme attribute in this example uses the default notation to specify a rhyme scheme applicable only to pairs of lines. As elsewhere, the default notation for the rhyme attribute has no meaning for metrical units at the line level or below. In verse forms where line-internal rhyme is structurally significant, e.g. in some skaldic poetry, the default notation is incapable of expressing the required information, since the rhyme pattern may need to be specified for units smaller than the line. In such cases, a user-specified rhyme notation must be substituted for the default notation, or else the rhyme pattern must be described using some alternative method (e.g. by using the link mechanism described below).

The precise semantics of the met attribute and the inferences which software is expected or able to draw from it, are implementation-dependent; so are the semantics and processing of the rhyme attribute, when user-specified notations are used.

A formal definition of the significance of each component of the pattern given as the value of the met attribute may be provided in the metDecl element within the encodingDesc element in the TEI header (see section ). The encoder is free to invent any notation appropriate to his or her analytic needs, provided that it is adequately documented in this element. The notation may define metrical components using invented or traditional names (such as iamb or hexameter) or in terms of basic units such as codes for stressed or unstressed syllables, or a combination of the two.

The real (for realization) attribute may optionally be specified to indicate any deviation from the pattern defined by the met attribute which the encoder wishes to record. By default, the real attribute has the same value as the met attribute on the same element; it is only necessary to provide an explicit value when the realization differs in some way from the abstract metrical pattern. The tension between conventional metrical pattern and its realization may thus be recorded explicitly. For example, many readers of the above passage would stress the word But at the beginning of the third line rather than the word of following it, as the metrical pattern would normally require. This variation might be encoded as follows:

But, of the two, ...

Where the real attribute is used to over-ride the default or conventional metrical pattern, it applies only to the element on which it is specified. The default pattern for any subsequent lines is unaffected.
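Given a conventional pattern and a realization written in the same notation, the point of deviation can be located mechanically. The sketch below assumes the iambic pentameter formula used for the Pope extract and a realization in which only the first foot is inverted; that real value and the function name deviant_feet are illustrative assumptions.

```python
def deviant_feet(met, real):
    """Return the 1-based numbers of feet whose realization differs from the convention."""
    met_feet = met.rstrip("/").split("|")    # drop the line-end marker, split on foot boundaries
    real_feet = real.rstrip("/").split("|")
    if len(met_feet) != len(real_feet):
        raise ValueError("patterns have different numbers of feet")
    return [n for n, (m, r) in enumerate(zip(met_feet, real_feet), start=1) if m != r]

MET = "-+|-+|-+|-+|-+/"
REAL = "+-|-+|-+|-+|-+/"   # assumed realization: stress on the first syllable of the line
print(deviant_feet(MET, REAL))
```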

As it happens, this particular kind of variation is very common in the English iambic pentameter (it even has a name: trochaic substitution); an encoder might therefore choose to regard this not as an instance of a variant realization, but as an instance of a variant metrical form:

But, of the two, ...

Alternatively, a different metrical notation might be defined, in which this kind of variation was permitted throughout the text.

In choosing whether to over-ride a metrical specification in this way or by using the real attribute, the encoder is required to determine whether the change is a systematic or conventional one (as in this example) or an occasional variation, perhaps for local effect. In the following example, from Goethe's Auf dem See, the variation is a matter of local realization:

Und frische Nahrung, neues Blut
Saug' ich aus freier Welt;
Wie ist Natur so hold und gut,
Die mich am Busen hält!
Die Welle wieget unsern Kahn
Im Rudertakt hinauf,
Und Berge, wolkig himmelan,
Begegnen unserm Lauf.

On the other hand, the famous inserted alexandrine in Pope's Essay on Criticism might be encoded as follows:

A needless alexandrine ends the song,
That, like a wounded snake, drags its slow length along.

Here the met attribute indicates that a different metrical convention (the alexandrine) is in force, while the real attribute indicates that there is a variation from that convention. As with many other aspects of metrical analysis, however, this is of necessity an entirely interpretive judgment.

Segment-Level versus Line-level Tagging

The examples given so far have encoded information about the realization of metrical conventions at the level of the whole verse-line. This has obvious advantages of simplicity, but the disadvantage that any deviation from metrical convention is not marked at its precise point of occurrence in the text. Greater precision may be achieved, but only at the cost of marking deviant metrical units explicitly. This may be done with the seg element, giving the variant realization as the value of the real attribute on that element. Using this method, the example given immediately above might be encoded as follows:

A needless alexandrine ends the song,
That, like a wounded snake, drags its slow length along.

The marking of the foot boundaries with the symbol | in the met attribute value of the l element allows the human reader, or a sufficiently intelligent software program, to isolate the correct portion of that attribute value as the default value for the same attribute on the seg elements for feet, namely -+. It is of course up to the encoder to decide whether or not to include the n attribute of seg here, and whether or not also to tag the feet in the line in which there is no deviation from the metrical convention. The ability of software to infer which foot is being marked, if not all are tagged, will depend heavily on the language of the text and the knowledge of prosody built into the software; the fuller and more explicit the markup, the easier it will be for software to handle it. It may prove useful, however, to mark metrical deviations in the manner shown, even if the available software is not sufficiently intelligent to scan lines without aid from the markup. Human readers who are interested in prosody may well be able to exploit the markup in useful ways even with less sophisticated software.
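Isolating the portion of a line-level met value that applies to a numbered foot is a simple operation when foot boundaries are marked. The following sketch assumes an alexandrine pattern of six iambic feet on the l element and a seg whose n attribute gives the foot number; the real value ++ and the function names are hypothetical.

```python
def default_for_foot(line_met, n):
    """Return the part of a line-level pattern that applies to foot number n (1-based)."""
    feet = line_met.rstrip("/").split("|")
    return feet[n - 1]

def effective_real(seg_real, line_met, n):
    """A foot's realization: its own real value if given, else the inherited convention."""
    return seg_real if seg_real is not None else default_for_foot(line_met, n)

ALEXANDRINE = "-+|-+|-+|-+|-+|-+/"   # assumed met value for the twelve-syllable line
print(default_for_foot(ALEXANDRINE, 4))
print(effective_real("++", ALEXANDRINE, 4))   # hypothetical deviant foot
```

When the n attribute is omitted, the foot number would have to be inferred from the position of the seg within the line, which is possible only if every foot is tagged.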

There are circumstances where it may also be useful to use the met attribute of seg. If we wish to identify the exact location of the different types of foot in the first line of Virgil's Aeneid, the text could be encoded as follows (for simplicity's sake the caesura has been omitted):

Arma vi rumque ca no Tro iae qui primus ab oris

An appropriate value of the met attribute might also be supplied on the enclosing div element, to indicate that each foot may be made up of a dactyl or a spondee, so that the values given here for met at the level of the foot may be considered a series of local variations on this fundamental pattern; in cases like this, of course, the local variations may also be considered aspects of realization rather than of convention, in which case the real attribute may be used instead of met, if desired.

Metrical Analysis of Stanzaic Verse

The method described above may be used to encode quite complex verse forms, for instance various kinds of fixed-form stanzas. Let us take one of Dante's canzoni, in which each stanza except the last has the same combination of eleven-syllable and seven-syllable lines, and the same rhyme scheme:

Doglia mi reca nello core ardire

Here the met attribute specifies a metrical pattern for each of the twenty-one lines making up a stanza of the canzone. Each stanza inherits this definition from the parent div element. The rhyme attribute specifies a rhyme scheme for each stanza, in the same way.

In the metrical notation used here, the letter E represents a line containing nine syllables which may or may not be metrically prominent, a tenth which is prominent and an optional non-prominent eleventh syllable. The letter S is used to represent a line containing five syllables which may or may not be metrically prominent, a sixth which is prominent and an optional non-prominent seventh syllable. A suitable definition for this notation might be given by a metDecl element like the following:

E: xxxxxxxxx+o
S: xxxxx+o
x: metrically prominent or non-prominent
+: metrically prominent
o: optional non-prominent
/: line division
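The relation between the line-level symbols E and S and the syllable-level symbols can be checked with a short sketch. It takes the two definitions from the description above; the stanza pattern E/S used in the example is a shortened, hypothetical value, not the pattern of the canzone, and the function names are illustrative.

```python
# Definitions of the two line-level symbols, as described in the text.
SYMBOLS = {"E": "xxxxxxxxx+o", "S": "xxxxx+o"}

def expand(met):
    """Replace each line-level symbol by its syllable-level definition."""
    return "".join(SYMBOLS.get(ch, ch) for ch in met)

def syllable_range(line_pattern):
    """Return (min, max) syllables: x and + are obligatory, o is optional."""
    required = line_pattern.count("x") + line_pattern.count("+")
    optional = line_pattern.count("o")
    return (required, required + optional)

print(expand("E/S"))                  # hypothetical two-line pattern
print(syllable_range(SYMBOLS["E"]))
print(syllable_range(SYMBOLS["S"]))
```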

As noted above, the metrical pattern specified on the div applies to each lg (stanza) element contained within the div. In fact however, after seven stanzas of this type, there is a final stanza, known as a commiato or envoi, which follows a different metrical and rhyming scheme. The solution to this problem is simply to specify a new met attribute on the eighth stanza itself, which will override the default value inherited from parent div, as follows:

... Canzone, presso di qui è una donna

Note that, in the same way as for the real attribute, over-riding of this kind does not affect subsequent elements at the same hierarchic level. Any lg element following the commiato above would be assumed to use the same metrical and rhyming scheme as the one preceding the commiato. Moreover, although it is quite regular (in the sense that the last stanza of each canzone is a commiato), the over-riding must be specified for each case.

Rhyme

The rhyme attribute is used to specify the rhyme pattern of a verse form. It should not be confused with the rhyme element, which is used to mark the actual rhyming word or words:

Like the met attribute, the rhyme attribute can be used with a user-specified notation documented by the metDecl element in the TEI header. Unlike met, however, the rhyme attribute has a default notation; if this default notation is used, no metDecl element need be given.

The default notation for rhyme offers the ability to record patterns of rhyming lines, using the traditional notation in which distinct letters stand for rhyming lines. For a work in rhyming couplets, like the Pope example above, the rhyme attribute simply specifies aa, indicating that pairs of adjacent lines rhyme with each other. For a slightly more complex scheme, applicable to groups of four lines, in which lines 1 and 3 rhyme, as do lines 2 and 4, this attribute would have the value abab. The traditional Spenserian stanza has the pattern ababbcbcc, indicating that within each nine line stanza, lines 1 and 3 rhyme with each other, as do lines 2, 4, 5 and 7, and lines 6, 8 and 9.
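The default notation lends itself to straightforward processing. The following sketch groups line numbers by rhyme letter; it also skips the hyphen and x, which the default notation reserves for non-rhyming lines. The function name rhyme_groups is an illustrative assumption.

```python
from collections import defaultdict

def rhyme_groups(scheme):
    """Map each rhyme letter to the 1-based numbers of the lines that share it."""
    groups = defaultdict(list)
    for number, letter in enumerate(scheme, start=1):
        if letter in "-x":        # markers for non-rhyming lines
            continue
        groups[letter].append(number)
    return dict(groups)

print(rhyme_groups("ababbcbcc"))   # the Spenserian stanza
```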

Non-rhyming lines within such a group may be represented using a hyphen or an x, as in the following example:

The rhyme element may be used to mark the words (or parts of words) which rhyme according to a predefined pattern:

Outside in the distance a wildcat did growl
Two riders were approaching and the wind began to howl

The label attribute is used to specify which parts of a rhyme scheme a given set of rhyming words represent:

I wander thro' each charter'd street,
Near where the charter'd Thames does flow,
And mark in every face I meet
Marks of weakness, marks of woe.

In every cry of every Man
In every Infant's cry of fear,
In every voice, in every ban,
The mind-forg'd manacles I hear.

Within a given scope, all rhyme elements with the same value for their label attribute are assumed to rhyme with each other: thus, in the above example, the two rhymes labelled a in the first stanza rhyme with each other, but not necessarily with those labelled a in the second stanza. The scope is defined by the nearest ancestor element for which the rhyme attribute has been supplied.
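The scoping rule can be sketched as follows, assuming a simplified, namespace-free encoding of the two stanzas in which each lg carries rhyme="abab" and each rhyming word is wrapped in a rhyme element. The markup is a reconstruction for illustration, and the function assumes that scopes are not nested.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical encoding: each stanza is its own scope because it carries @rhyme.
SAMPLE = """
<div>
  <lg rhyme="abab">
    <l>I wander thro' each charter'd <rhyme label="a">street</rhyme>,</l>
    <l>Near where the charter'd Thames does <rhyme label="b">flow</rhyme>,</l>
    <l>And mark in every face I <rhyme label="a">meet</rhyme></l>
    <l>Marks of weakness, marks of <rhyme label="b">woe</rhyme>.</l>
  </lg>
  <lg rhyme="abab">
    <l>In every cry of every <rhyme label="a">Man</rhyme></l>
    <l>In every Infant's cry of <rhyme label="b">fear</rhyme>,</l>
    <l>In every voice, in every <rhyme label="a">ban</rhyme>,</l>
    <l>The mind-forg'd manacles I <rhyme label="b">hear</rhyme>.</l>
  </lg>
</div>
"""

def rhymes_by_scope(root):
    """For each element carrying @rhyme, group the rhyme words it contains by label."""
    result = []
    for scope in root.iter():
        if scope.get("rhyme") is None:
            continue
        groups = defaultdict(list)
        for r in scope.iter("rhyme"):     # rhyme elements inside this scope only
            groups[r.get("label")].append("".join(r.itertext()))
        result.append(dict(groups))
    return result

print(rhymes_by_scope(ET.fromstring(SAMPLE)))
```

The label a thus names two unrelated rhymes, one per stanza, which is the behaviour the scoping rule is intended to give.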

The rhyme element can appear anywhere within a verse line, and not necessarily around a single word. It can thus be used to mark quite complex internal rhyming schemes, as in the following example:

The sunlight on the garden
Hardens and grows cold,
We cannot cage the minute
Within its nets of gold
When all is told
We cannot beg for pardon.

This mechanism, although reasonably simple for simple cases, may not be appropriate for more complex applications. In general, rhyme may be considered as a special form of correspondence, and hence encoded using the mechanisms defined for that purpose in section . Similar considerations apply to other metrical features such as alliteration or assonance.

To use the correspondence mechanisms to represent the complex rhyming pattern of the above example, each rhyme element must be given a unique identifier, as follows:

The sunlight on the garden
Hardens and grows cold,
We cannot cage the minute
Within its nets of gold
When all is told
We cannot beg for pardon.

Now that each rhyming word, or part-word, has been tagged and allocated an arbitrary identifier, the general purpose link element may be used to indicate which of the rhyme elements share the same rhyme, as follows:

For further discussion of the link and linkGrp elements, see section .
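A processor can turn a set of such links into rhyme classes quite directly. The sketch below assumes that each link carries a target attribute holding space-separated pointers of the form #id; the identifiers A1, B1 and so on are invented for illustration and are not those of the example.

```python
def rhyme_sets(link_targets):
    """Turn each link's target value (space-separated '#id' pointers) into a set of ids."""
    return [frozenset(t.lstrip("#") for t in target.split()) for target in link_targets]

def rhymes_with(rhyme_id, sets):
    """Return the other identifiers linked to rhyme_id, or an empty list if unlinked."""
    for s in sets:
        if rhyme_id in s:
            return sorted(s - {rhyme_id})
    return []

# Hypothetical target values, one per link element.
TARGETS = ["#A1 #A2 #A3", "#B1 #B2 #B3", "#C1 #C2"]
SETS = rhyme_sets(TARGETS)
print(rhymes_with("A2", SETS))
```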

The rhyme and caesura phrase level elements are made available by the model.lPart class when the module defined by this chapter is included in a schema.

Metrical Notation Declaration

When the module defined in this chapter is included in a schema, a specialised element is optionally available in the encodingDesc element of the TEI Header to document the metrical notation used in marking up a text.

As with other components of the header, metrical notation may be specified either formally or informally. In a formal specification, every symbol used in the metrical notation must be documented by a corresponding metSym element; in an informal one, only a brief prose description of the way in which the notation is used need be given. In either case, the optional pattern attribute may be used to supply a regular expression which a processor can use to validate expressions in the intended notation. The following constraints apply:

if pattern is supplied, any notation used which does not conform to it should be regarded as invalid;
if any metSym is defined, then any notation using undefined symbols should be regarded as invalid;
if both pattern and metSym elements are supplied, then every symbol appearing explicitly within pattern must be defined;
symbols which are not matched by pattern may be defined within a metDecl element.

As a simple example, consider the case of the notation in which metrical prominence, metrical feet, and line boundaries are all to be encoded. Legal specifications in this notation may be written for any sequence of metrically prominent or non-prominent features, optionally separated by foot or metrical line boundaries at arbitrary points. Assuming that the symbol 1 is used for metrical prominence, 0 for non-prominence, | for foot boundary and / for line boundary, then the following declaration achieves this object:

1: metrical prominence
0: metrical non-prominence
|: foot boundary
/: metrical line boundary
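The first two validation constraints can be sketched for this notation as follows. The regular expression is an assumed value for the pattern attribute, chosen to match the prose description (runs of 1 and 0, each optionally followed by a foot boundary and a line boundary); the function name is_valid is likewise hypothetical.

```python
import re

DECLARED = set("10|/")                      # the four symbols documented by metSym
PATTERN = re.compile(r"((1|0)+\|?/?)*")     # assumed value of the pattern attribute

def is_valid(value):
    """Reject values using undeclared symbols or not conforming to the pattern."""
    if not set(value) <= DECLARED:          # every symbol used must be declared
        return False
    return PATTERN.fullmatch(value) is not None

print(is_valid("01|01|01|01|01/"))
print(is_valid("01|21"))
```

Note that the symbol check runs first, so that values containing undeclared symbols are rejected without being matched against the expression.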

The same notation might also be specified less formally, as follows:

Metrically prominent syllables are marked '1' and other syllables '0'. Foot divisions are marked by a vertical bar, and line divisions with a solidus.

This notation may be applied to any metrical unit, of any size (including, for example, individual feet as well as groups of lines).

Note that in this case, because the pattern attribute has not been supplied, no processor can validate met attribute values within the text which use this metrical notation.

For more complex cases, it will often be more convenient to define a notation incrementally. The terminal attribute should be used to indicate for a given symbol whether or not it may be re-defined in terms of other symbols used within the same notation. For example, here is a notation for encoding classical metres, in which symbols are provided for the most common types of foot. These symbols are themselves documented within the same notation, in terms of more primitive long and short syllables:

-oo
-o
o-
--
ooo
oo-
o: short syllable
-: long syllable

Note here the use of the global n attribute to supply an additional name for the symbols being documented.
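An incrementally defined notation can be resolved by expanding non-terminal symbols until only terminal ones remain. In the sketch below the symbol names D and S are hypothetical stand-ins for a dactyl and a spondee; only their definitions (-oo and --) and the two terminal symbols come from the description above.

```python
# Each entry: symbol -> (content, terminal). Non-terminal content is itself notation.
NOTATION = {
    "D": ("-oo", False),            # hypothetical symbol for the foot -oo
    "S": ("--", False),             # hypothetical symbol for the foot --
    "o": ("short syllable", True),
    "-": ("long syllable", True),
}

def expand(value, notation):
    """Recursively rewrite non-terminal symbols in terms of terminal ones."""
    out = []
    for ch in value:
        if ch not in notation:
            raise KeyError(f"undefined symbol: {ch!r}")
        content, terminal = notation[ch]
        out.append(ch if terminal else expand(content, notation))
    return "".join(out)

print(expand("DDS", NOTATION))
```

A symbol that is absent from the declaration raises an error, in keeping with the rule that notation using undefined symbols should be regarded as invalid.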

Encoding Procedures for Other Verse Features

A number of procedures that may be of particular concern to encoders of verse texts are dealt with elsewhere in these Guidelines. Some aspects of layout and physical appearance, especially important in the case of free verse, are dealt with in chapter . Some initial recommendations for the encoding of phonetic or prosodic transcripts, which may be helpful in the analysis of sound structures in poetry, are to be found in chapter ; it may also be found convenient to use standard entity names (those proposed for the International Phonetic Alphabet suggest themselves) to mark positions of suprasegmentals such as primary and secondary stress, or other aspects of accentual structure.

As already indicated, chapter contains much which will be found useful for the aligning of multiple levels of commentary and structure within verse analysis. Encoders of verse (as of other types of literary text) will frequently wish to attach identifying labels to portions of text that are not part of a system of hierarchical divisions, may overlap with one another, and/or may be discontinuous; for instance passages associated with particular characters, themes, images, allusions, topoi, styles, or modes of narration. Much of the computerized analysis of verse seems likely to require dividing texts up into blocks in this way. The span element discussed in provides the means for doing this. Finally, the procedures for the tagging of feature structures, described in chapter , provide a powerful means of encoding a wide variety of aspects of verse literature, including not only the metrical structures discussed above, but also such stylistic and rhetorical features as metaphor.

For other features it must for the time being be left to encoders to devise their own terminology. Elements such as <metaphor tenor="..." vehicle="..."> ... </metaphor> might well suggest themselves; but given the problems of definition involved, and the great richness of modern metaphor theory, it is clear that any such format, if predefined by these Guidelines, would have seemed objectionable to some and excessively restrictive to many. Leaving the choice of tagging terminology to individual encoders carries with it one vital corollary, however: the encoder must be utterly explicit, in the TEI header, about the methods of tagging used and the criteria and definitions on which they rest. Where no formal elements are currently proposed, such information may readily be given as simple prose description within the encodingDesc element defined in section .

Module for Verse

The module described in this chapter makes available the following components:

Verse: Verse structures

The selection and combination of modules to form a TEI schema is described in .
