Don Spaeth
Text Encoding Initiative -- History Working Party.
Early Modern History -- DRAFT
<date>March 91
<body>
History is first and foremost a text-based discipline. The majority of sources are textual. Before the invention of the modern pro forma in the 18th century (the English invention, at least), all sources were textual. Although it is a fair comment that many pre-industrial textual documents were in fact lists, often heavily structured, this only confirms the judgment that so-called 'free-format' text is structured. Examples of such structured texts include Exchequer Pipe Rolls, Wardrobe Account Books, Quarter Sessions indictments, manorial surveys and episcopal inductions (i.e. clerical appointments). The Text Encoding Initiative is therefore just as relevant to history as to any other discipline in the humanities.

Historians have themselves been slow to see the connection. Despite the textual nature of their discipline, and despite the example of literary and biblical scholars, some of whom have been using computers to study texts since the 1950s, they have generally despaired of gaining computer assistance in the processing of descriptive sources. They have preferred to reduce sources, where possible, to simple lists, classifying data where necessary. So, in the 1980s, the database management system (dbms) -- paired sometimes with the statistics package -- has been the application par excellence of computer-based historical research. Yet the database has never been an entirely satisfactory solution. The basic problem is that the process of converting data from its original textual form into a database destroys the integrity of the source, so that vital information is inevitably lost. Furthermore, historical data has a habit of being more complex than the database model easily allows.
In recent years, as part of the first stage of the "rediscovery of the word", the increasing capacity of database and statistical packages to cope with alphanumeric data has enabled historians to include both classifications and original descriptions; the relational database has made it easier to match the complexity of historical sources, at the expense of simplicity of analysis. The shortcomings of the dbms have contributed to the debate about whether historians need specialised historical software, or else risk forcing history into the strait-jacket of commercial software.

The second stage of the "rediscovery of the word" is to turn the problem on its head. Rather than converting textual data into lists, why not leave it as text and indicate the implicit structure (invisible to a computer) through tags and flags? This approach can already be found, for example, in the Hull Domesday Project. Unfortunately, there is at present no general purpose software available to analyse such data.

The TEI offers two potential advantages. First, by offering a standard, it holds out the promise of greater exchange of data, a matter of increasing interest to computer-aware historians and, of course, a central goal of the Initiative. Second, by offering a standard, it will encourage the development of software for analysis. Although this may have to be written by academics, it need not necessarily be written specifically for historians.

There are, however, significant disadvantages to the text-based approach, which may make some sources inappropriate for such treatment. Both the nature of the sources and the varying interests of historical researchers may render the exact duplication of a source excessively time-consuming. Taking the latter point first, it is rare that an historian is interested in every word in a source, and in some cases only a small portion may be of interest.
For example, ecclesiastical court records are rich sources offering much to several types of historians. The administrative historian may be interested in every cause listed in an Act Book, the names of presiding officials and proctors, and dates and places of court appearance. The social historian may, on the other hand, treat the source as a "snatch-and-run" exercise, pulling out only causes on one issue -- say, those relating to family affairs and sexual morality, or to religious observance -- skimming over not only other causes but also court appearances when no evidence or description of the principals is given. It would be unreasonable to expect such an historian to make the effort to record countless apparently irrelevant details, although there is nothing to stop him or her from recording selected causes in textual form.

The other obstacle is the heavily formulaic nature of some sources. It has been argued above that all pre-modern sources were textual, but this conceals the fact that in some sources the verbal formulae are used so consistently as to give the words of the formula the effect of lines and boxes drawn on a form. For example, a standard Latin formula is used for every indictment for theft, or for the appointment by a bishop of a cleric to a living. In these cases, there really is no point in duplicating the formula hundreds of times (although both the TEI and editor software provide ways of reproducing the formula many times whilst entering and storing it only once).

This note will now (1) discuss the applications for text-based study, and (2) describe some of the sources used by the pre-modern historian. The first topic raises the issue of what software one might hope would be developed in the future, while the second emphasises problems of representation or markup.

Applications

As the argument so far might suggest, the primary mode of analysis of textual data will be information retrieval.
Historians' need for software which combines the functions of a text retrieval package and a dbms, identified by Lou Burnard in 1986, is still unfulfilled. For example, a courtbook will list names of plaintiffs, defendants and offences. A researcher might wish to do no more than locate all causes involving a particular person or offence, as is done now with a database. The ability to select structured information, to add subject tags to lengthy descriptions, to link such descriptions with classifications, without damaging the integrity of the source, will all be useful. Text handling packages also add methods for normalising non-standard spellings and for matching synonyms.

However, historians will also benefit from being able to study texts as texts, and from linking different portions of a single document. Several examples come to mind.

Social networks: A source or set of sources may enable the analysis of a social network or community. For example, Thomas Smith, an 18th-century gentleman from the English county of Wiltshire, left a diary covering a period of roughly five years, which includes the names of dinner and other companions, colleagues with whom he worked in local government, clergymen he heard preach, etc., all of which enable one to establish his social circle. We might ask such questions as whose names he mentions most frequently, what people he brings together, what activities different people are associated with, and whether Smith acts as a social broker. Ideally, this source would be combined with others, including wills, bonds, vestry lists, etc., to establish a more complete picture of the social network.

Language: Unlike many literary computing scholars, historians tend to be interested in content rather than style. Collocation analysis is therefore likely to be central. Naturally, intellectual, cultural, and ecclesiastical historians study many of the same sources as those in literature, philosophy, and theology.
Historians may be interested in establishing links between different writers, through similarities in word use or phrasing, or in determining the meanings of words. There has been debate, for example, over what English Levellers meant when they advocated "universal manhood suffrage"; did they include servants among men who were to have the right to vote? One might also study the social significance of language -- were gentlemen witnesses more likely to refer to days and months, while labourers thought in terms of saints' days? This is a case where "borrowings" may need to be studied; it may be that the calendrical terms derive from the language used by the clerk in drawing up the charge. Or one might wish to use word use and word pairing as indicators of personal beliefs and opinions, ranging from religious belief to policy statements. For example, references to saints as the key to salvation in late 16th-century England may provide evidence for the persistence of Catholicism after the English Reformation.

Some of the above examples are questions that I would like to answer as part of my own research. Others are issues which historians have addressed through traditional close reading, without computer assistance. Of course, in some cases, textual analysis may only confirm impressionistic conclusions or lead to distortions. It is not proposed that textual analysis will be a substitute for close reading and extensive knowledge of the source.

Description of Sources

There follow descriptions of a number of early modern English sources, which I have used in my own research. Most are ecclesiastical court records, not because they are the only sources surviving but because they are particularly complicated. If a marking system can cope with these sources it should be able to cope with most.
The purpose of this section is to describe the sources so as to show the physical and functional components which must be encoded, to identify research issues which may call for additional coding, and to identify particular problems which may require special treatment.

Depositions (Ecclesiastical Court)

Depositions are testimony given by witnesses in instance (analogous to "civil") causes, i.e., those between conflicting parties. They may be recorded in book form or on individual pages. Testimony is given in response to a succession of queries taken directly from the original libel (or charge), which therefore define its structure. There may also be responses to countercharges (interrogata) made by the defendant. The number of the charge is given, followed by a Latin statement that 'to the nth charge the witness testified...'. The testimony itself is given in English.

The names of the parties are given at the beginning of each deposition, and usually the source of the dispute. Then the witness is identified in a fairly formulaic fashion, including name, occupation, age, village of residence, number of years in that village. The testimony then follows. All testimony on a cause may appear together, in which case witnesses may be numbered, or responses to the libel and interrogata may be intermixed with depositions in other causes. The witness will sign at the end of the statement or give a mark. The date when testimony was recorded may be given.

Modes of Study

1. Literacy, through the witness's social details and signature/mark.
2. Geographical mobility -- the length of time in a village and whether the witness lives in the village of birth.
3. Direct study of the cause, including comparison of testimony by different witnesses to the same query.
4. Language -- (a) the depositions provide a rare opportunity for unlettered witnesses to record their thoughts in their own language. Or do they? Are there consistencies in style which suggest that it is the clerk's language?
Does the wording derive from that in the original libel? (b) Are there differences in the language used by different classes of witness? Age? Change over time?

Depositions survive for the secular courts as well, including Quarter Sessions and Assizes. These records are much simpler, usually only a few lines without structuring into different queries.

Problems of representation

- the structural location of each item of testimony must be marked
- the descriptive information -- name, occupation, etc. -- must be marked.

Churchwardens' presentments (also hundred and jury presentments)

These are stored together in bundles, one for each region, with each presentment on a separate sheet. The presentment may be in the hand of one of the churchwardens, the minister, the visitation clerk, or some unidentified individual. It contains a header giving the names and parishes of presenters, when their presentment was heard and by whom. It consists of a series of answers to questions circulated in advance in book form. The question itself is never given although its number may be, or some statement such as 'As to dissenters...'. Some query books survive, enabling comparison between the wording of questions and presentments. The structure of this questionnaire must be preserved. Presentments may name individuals and their offences, and this information should be marked for easy retrieval. The presentment ends with the names and signatures/marks of those presenting.

Act Books

These are one of the most complicated and interesting of sources. The fully entered and encoded Act Book, particularly one linked to related Court papers such as sentences, excommunications and summonses, would be an invaluable source for an understanding of court procedure. A book lists a series of court days, giving details of the court, presiding judge and date. Court days of different courts may be mixed in the same book. Each day consists of a list of causes.
In each, the names of the parties are given, followed by the action taken on the previous appearance, followed by the action on that day, which may require a particular action by one or both parties. This may go on for weeks, months, or years, until the case is resolved or dropped. The Act Books are a frustrating source to use because causes may simply disappear, without any apparent resolution.

The text is in Latin and usually heavily abbreviated. Abbreviations may be general or local to a source, and may take some time to learn. The student of procedure will almost certainly wish to supplement them with standard tags expressing the stage reached by the cause. Abbreviations (either original or used in a published edition) include:

ad a o = ad articulum objectum
dns = dominus
emt = emanavit
oia = omnia
fatr = confessed

Other judicial documents have their own sets of abbreviations, e.g. Quarter Sessions.

Petitions and Letters

Relatively simple documents, which have the advantage of providing text most likely in the hand of one of the authors. A header and signatures will need to be indicated. It may also be desirable to add, where omitted and known, such information as the name of recipient, date, etc.

Wills

Wills consist of a brief description of the testator including the state of his health, a religious preamble, a list of bequests and recipients, and possibly a signature. The historian may be interested in the social relationships indicated by the bequests and choice of executor, by the information about the property owned or by the religious sentiments expressed in the preamble. In the last case, subtle variations in wording may indicate significant differences in belief. (Spufford, Contrasting Communities.)

Parish Registers

Lists, organised chronologically, giving details of births, deaths and marriages. The date, name, sometimes occupation, and name of husband/father may be given.
There may also be additional information, such as the words spoken on a deathbed, or that the marriage was illegal; or completely extraneous material may be recorded. These registrations are well suited to database analysis.

Parish Accounts

Account books may include the following:

- summary accounts of payments and receipts, tendered each Easter
- detailed accounts
- lists of the rate assessed on each property holder
- vestry resolutions, with the signatures of those assenting
- appointments of new parish officers
- names of poor relief recipients
- names of communicants

With the exception of vestry resolutions, this material is largely in the form of lists and there are few interconnections between items. They may therefore be best suited to a number of databases.

Other documents to cover? Indictments, sermons, inductions, books, Parliamentary diaries.

General problems with early modern sources

Handwriting

Unless taken from modern editions, early modern sources are almost invariably in manuscript. Information about the style and clarity of the hand and when it changes must be preserved, and where possible the hand must be identified. One may also wish to record the neatness of a hand (a clue to education) and shakiness (an indication of age). Specific markers, such as an uncommon form of a particular letter, may be coded. It is inevitable that information will be lost, however. Markup of the form <hand ID=n Name=a> or <scribe ...> is recommended.

Language

The legal language of early modern England is Latin. However, English is used in some documents, usually in those produced by laymen or where the clerk's Latin gives out. This problem should be satisfactorily handled by the TEI's language tags.

Abbreviations

Throughout the medieval and early modern periods, Latin and even English documents were substantially abbreviated.
For example, a horizontal bar indicates that one letter (usually a doubled letter or m/n) has been omitted; a curl at the end indicates an omitted ending; a bar or loop through a p indicates that the word is "pro" or "per". In most cases, the abbreviation is unambiguous, but the problem is how to present it in electronic form. Should it be expanded, thus adding a layer of interpretation? If it is not expanded, codes must be defined to represent the missing letters, and it must be decided where to put the code, since abbreviations may go over several characters.

A number of letters -- including m, n, u, i, and j -- are constructed from the same building block, the minim, and it may not always be easy to determine which letter(s) are intended. One solution is to define a minim character, with the advantage that a computer search of a dictionary may have more luck in resolving the ambiguity than human guesswork.

Non-Textual Material

This includes drawings and symbols. For example, in one diary, each day of the week is identified by a zodiac-like symbol. It may be best to describe each symbol at the beginning of a text (or establish a pointer to a graphical image in paper or electronic format) and use textual abbreviations thereafter. Marks used by those unable to sign their own names present a similar problem. The character used may be a simple "X", an initial, a symbol representing the individual's occupation, or some unidentifiable shape.

Paging

Most documents work from beginning to end, filling recto then verso sides of each sheet, but there are exceptions. The book may work simultaneously from both ends. From either end, every recto side will appear to be written normally, while every verso is upside down. In one parish account book, for example, churchwardens' accounts start from one end and overseers' accounts from the other.
In another, depositions start at the beginning; when the end was reached the book was turned over and worked through in the opposite direction. In both cases, it seems to make most sense to follow the original usage rather than the precise ordering of folios. The latter is preserved by use of folio recto and verso pagination counted from one direction only.

Interlineation, notes on verso

Standard TEI techniques need to be developed for dealing with these problems.

Non-contemporary notes and markings

To be marked as such.

Subjects

Most items within documents will need one or more subject codes to be added at the paragraph level to assist in retrieval.
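
As an illustration only, retrieval by paragraph-level subject codes might be sketched as follows. This note does not fix a tag form for subject codes, so the <p> element, the subj attribute, the code names and the sample presentment text are all invented for the purpose of the example; the function build_subject_index is likewise hypothetical and is not part of any existing TEI software.

```python
import re
from collections import defaultdict

# Hypothetical markup: each paragraph carries a space-separated list of
# subject codes in a "subj" attribute. The element name, attribute name and
# codes are assumptions; the sample text is invented, not transcribed.
SAMPLE = """\
<p subj="dissent tithes">As to dissenters, we present John Browne for refusing to pay his tithes.</p>
<p subj="morality">We present Mary Hill for bearing a child out of wedlock.</p>
<p subj="dissent">We present Thomas Cole for absenting himself from church.</p>
"""

# Capture the subject codes and the paragraph body of every tagged paragraph.
PARAGRAPH = re.compile(r'<p subj="([^"]*)">(.*?)</p>', re.DOTALL)


def build_subject_index(text):
    """Map each subject code to the paragraphs tagged with it, in source order."""
    index = defaultdict(list)
    for codes, body in PARAGRAPH.findall(text):
        for code in codes.split():
            index[code].append(body.strip())
    return index


index = build_subject_index(SAMPLE)

# Retrieve every paragraph coded as concerning dissent.
for paragraph in index["dissent"]:
    print(paragraph)
```

Because the codes sit alongside the text rather than replacing it, the original wording of each item remains available after retrieval, which is the advantage claimed above for the text-based approach over conversion to a database.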