Received: by UICVM (Mailer R2.03B) id 0240; Mon, 22 May 89 06:32:54 CDT
Date: Mon, 22 May 89 11:07:15 DNT
Reply-To: Text Encoding Initiative - Text Documentation Committee list, Hans Joergen Marker
Sender: Text Encoding Initiative - Text Documentation Committee list
From: Hans Joergen Marker
Subject: Re: preliminary discussion
X-To: Text Encoding Initiative - Text Documentation Committee list, "on GEC 4190 Rim-D at UCL Wujastyk"
To: "C. M. Sperberg-McQueen"
In-Reply-To: Message of Thu, 18 May 89 09:58 from

On the Standard Study Description: I shall of course be telling you more about it when we meet in Toronto, and I am looking for relevant literature. In the meantime: yes, it certainly does work; it is in use at a number of European data archives and at the ICPSR in Michigan.

The following is a draft for an article suggesting alterations to the SSD in order to accommodate historical data materials. The article is not going to be published in this form, but it gives a listing of items that can give you an impression of what is in the SSD.

Best wishes,
Hans Joergen Marker

Standard for description of historical data materials.

The aim of this article is to show how the description standards proposed by the "Trinity" at the three consecutive international conferences in Göttingen, Graz and Paris in 1985, '86 and '87 fit into the already established framework of the International Standard Study 1 Description 2 of the social sciences. Readers who are not familiar with the content of the "Trinity" proposal will be able to find detailed information on it in three articles 3.

In short, the "Trinity" is a working group that was formed spontaneously at the Göttingen conference in 1985. The members are Herbert Reinke, Quantum, Köln; Kevin Schurer, Cambridge Group for the History of Population and Social Change, Cambridge; and Hans Jørgen Marker, Danish Data Archives, Odense.
The "Trinity" proposal was started from scratch rather than being based explicitly on the earlier works of Herbert Reinke 4. However, his knowledge of the subject has, of course, subsequently been of great value to the group. Additionally, the proposal was not explicitly aimed at quantitative historical data only, despite the fact that the "Trinity" members confess to being from a quantitative history background. In the earlier versions 5 the proposal was to some degree influenced by the terminology introduced at the Göttingen conference by Lou Burnard from the Oxford Text Archive. The reason for this was that, during the preliminary stages of preparing the proposal, the "Trinity" members felt that historians would be less affronted by the concepts of 'Entities' and 'Attributes' than by the concepts of 'Observations' and 'Variables'. Yet as the proposal has gradually developed, the working group has stuck less rigidly to these notions. Despite this, one important thing remains: the terminology as such gives no specific reference to the kinds of software and methods of analysis to which the proposal is applicable. In practical applications of the proposal some items may therefore be irrelevant, in which case they should of course be left unanswered.

In contrast to the current proposal, not all of the items of the Standard Study Description are relevant to particular studies. The study description scheme detailed below is intended to be generally applicable to the studies that are usually deposited at data archives. Indeed, one of the prime reasons it has become necessary to propose enhancements to the existing schema in order to accommodate historical data materials is that the majority of the data received by archives relates to the contemporary rather than to the historic world.

In the following sections we give recommendations on how a data archive should store information on historical data 6. These recommendations are described in relative technical detail, since they are in essence a proposal for alterations to the already existing Standard Study Description 7. The researcher who happens to have no connection with the more technical side of data archiving may very well ignore this aspect of the proposal. If, however, a researcher wishes to deposit data material at an archive, it would be a great help to the institution in question if Appendix A were studied and as much as possible of the information detailed there were supplied. Equally, if a researcher is simply considering the exchange of data with another researcher without the help of a data archive, he could probably greatly facilitate the secondary user's 8 task by providing with the data set as much as possible of the information set out in Appendix A: The Standard Study Description. The communication of such information may in the long term result in the saving of much time and effort, not only for the secondary user but also for the primary investigator 9.

The remaining section of the proposal is a questionnaire 10 which is designed to be sent by the data archive to any historical researcher wishing to deposit data at the archive. It is our intention that all historians creating machine-readable historical material should fill in such a questionnaire. Even if the researcher does not intend to deposit the machine-readable data with a data archive, he should consider the possibility of such a research questionnaire being appended to any research publication. Such an approach should be considered as elementary an obligation for the researcher as giving references to the sources and scientific works referenced within the publication. If a data archive should come across a historian wanting to deposit data at the data archive but unwilling to provide the information required in the questionnaire in Appendix B, it would probably be most sensible for that data archive to reconsider whether the data offered by the researcher would actually be of such value to possible secondary users that it would be reasonable for the data archive to invest its resources in handling the data material.

1 The concept of study as used here and in the rest of this article means a research project resulting in a data material, and the resulting data material.

2 The International Standard Study Description is a checklist of essential background information needed in order to evaluate a data material deposited at a data archive. The ISSD was developed in the early 70's and employed by the data archives of a number of countries.

3 Reinke, Schurer & Marker: Making sense out of historical documentation, in: Haussmann, Frederik; Härtel, Reinhard; Kropac, Ingo H. and Becker, Peter: Standardization and Exchange of Machine Readable Data in the Historical Disciplines, Graz 1986.
Marker: Towards a Study Description for Historical Data Materials. DDA-Nyt 39, Odense 1986.
Reinke, Schurer & Marker: Information Requirements and Data Description in Historical Social Research. A Proposal, Historical Social Research 42/43, Köln 1987.

4 Reinke: Towards Standard for the Description of Machine-Readable Historical Data, Historical Social Research 18, 1981.
Reinke: Datenbeschreibung und Datendokumentation in der historischen Sozialforschung, Problembeschreibung und Empfehlungen für die Forschung, in: Manfred Thaller (ed.): Datenbanken als Werkzeuge historischer Forschung, Historisch-Sozialwissenschaftliche Forschungen Bd. 20, St. Katharinen 1986.

5 Up to and including the version given in the publication from the Graz conference.

6 Appendix A

7 In the autumn of 1987 and the spring of 1988 we will take steps towards the actual recognition of this standard. It will be adopted immediately by the DDA.

8 Secondary user is a concept of data archiving, meaning a researcher carrying out research on a data set that was not created for the purpose of his present research. From the data archive's point of view this usually means a data material that is already deposited in the data archive.

9 Primary investigator is a concept of data archiving, meaning a researcher who is responsible for the creation of a data material. He usually becomes of interest to the data archive at the time when he intends to deposit his data material there.

10 Appendix B

How to fill in the Questionnaire

The questionnaire is divided into two major parts, A and B. Part A is for information concerning the source used; Part B is for information on the procedures used to construct a data material on the basis of the source. If a data material draws upon several sources, Part A has to be filled in for each source. One common research procedure is to create a number of intermediate data sets, one from each source, and then link these data sets in order to get more detailed information on the research topic investigated. In this case Parts A and B ought to be filled in for all the intermediate data sets, and these ought to be retained for use in secondary research on topics other than the one covered in the original research project. If the files are going to be deposited at a data archive, the intermediate files as well as the file resulting from linking them ought to be deposited. For the final linked file, Part A of the questionnaire is not to be filled in, but references to the source descriptions concerned should be given, and Part B should be filled in.
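The filling-in rules above can be sketched as a small check. This is only an illustration under stated assumptions: the class `DataSet`, the function `check` and the example data sets are hypothetical names that do not occur in the proposal, and the rules are reduced to "Part B always; Part A once per source; a linked file has no Part A of its own but refers to its intermediate data sets".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSet:
    """A data set together with the questionnaire parts filed for it (hypothetical model)."""
    name: str
    part_a: List[str] = field(default_factory=list)  # one Part A description per source
    part_b: Optional[str] = None                      # Part B: how the data set was made
    linked_from: List["DataSet"] = field(default_factory=list)  # intermediate data sets


def check(ds: DataSet) -> List[str]:
    """Return the problems found when applying the filling-in rules to ds."""
    problems = []
    if ds.part_b is None:
        problems.append(f"{ds.name}: Part B is missing")
    if ds.linked_from:
        # Final linked file: no Part A of its own, only references to the
        # intermediate data sets, which carry the source descriptions.
        if ds.part_a:
            problems.append(f"{ds.name}: linked file should not have its own Part A")
        for parent in ds.linked_from:
            problems.extend(check(parent))
    elif not ds.part_a:
        problems.append(f"{ds.name}: Part A is missing")
    return problems


# Invented example: two intermediate data sets linked into a final file.
census = DataSet("census", part_a=["census lists"], part_b="keyed from the lists")
parish = DataSet("parish", part_a=["parish registers"])  # Part B forgotten
linked = DataSet("linked", part_b="record linkage", linked_from=[census, parish])

print(check(linked))  # ['parish: Part B is missing']
```

The check recurses through the intermediate data sets because the proposal asks that these be documented and deposited alongside the linked file.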
Part A

Part A of the questionnaire is divided into four sections, of which the first two describe the creation of the source, the third describes what has happened to the source in the time between its creation and its adoption for the research project in question, and the fourth gives information on the possibilities for getting access to the source and for getting information about it.

Sections A.1 and A.2 distinguish between the normative administration of records and the actual administration of records. This distinction is crucial for the understanding of some (older?) sources and totally irrelevant regarding some other sources. The 'normative' section A.1 describes the regulations governing the creation of the source, while the 'actual' section A.2 gives whatever differences may have occurred between the regulations and actual practice. If no such differences are known to have occurred, section A.1 gives the full picture and section A.2 can be answered very quickly: "A.2.1: Irrelevant. A.2.2: Irrelevant".

In some cases the answers to section A.1 are research topics in themselves 11, and it may never be possible to answer them, as there are no sources giving information on these subjects. In these cases the researcher is urged to give his opinion on these questions, as he, being the one who has actually worked with the source, must have a better founded opinion on what the source was intended for than a possible secondary user. If it is the intention to deposit the data at a data archive, the people there could most certainly not be expected to engage in research projects to form opinions on the origins of certain historical sources.

Sections A.3 and A.4 are, we hope, quite obvious to any researcher. The archival history of a source, A.3, is a prerequisite for forming an opinion of the representativity of a source as regards its content.
The researcher using a particular source will naturally have needed to go into detail with this question to ensure that the sources he was working with were actually what they seemed to be. The knowledge that he has acquired through this process he should pass on to his colleagues in section A.3. In some cases we are lucky enough to have sources with a very peaceful archival history that has left them undisturbed until this time. In these cases A.3 can be answered very quickly. For other sources (e.g. oral history?) the concept of archival history can be irrelevant, and the section may then be left unanswered.

The information asked for in section A.4 is part of the traditional 'videnskabelige apparat' 12, and should naturally be given in no less detail when a computer has been involved as a research instrument.

11 This was pointed out by Jan Oldervoll in the discussion at the Graz conference 1986.

12 Scientific apparatus: a concept of Danish and German source criticism, meaning the essential documentation of the material a particular researcher has used in order to form his opinion on a research question.

Part B

Part B is intended to give information on the procedures that have been applied to the data material that was constructed on the basis of the source described in Part A.

B.1 is simply information on the time when the data material was created. This will be of great interest to a possible secondary user if progress in the field of research in question leads to changes in the understanding of the fundamental terminology etc. involved.

B.2: The software applied. In many cases interfaces exist that will easily transport a data material from one software package to another. In other cases a data material may be accessible only with a particular package running only on some (exotic?) machine.
B.3: The physical properties of a data material may be decisive for a possible secondary user, as they often give an indication of the amount of data processing overhead involved in getting to the information in question.

B.4: The logical properties of the data material give an indication of the actual amount of information to be found in the data material.

B.5: Methods applied gives information on the actual procedures that have been involved in the creation of the data material. These may be of decisive importance for a possible secondary user, as some methods may make a data material useless for some other research topics. E.g. coding and standardization, at least when exercised in the data entry phase, makes a data material useless for the study of ancient terminology.

Appendix A

Alterations to the present Standard Study Description.

All items presently in use remain substantially as they are. But it has to be understood that the items from 231 "Dates of data collection" to 322 "Control operations performed by data archive" are concerned with the transformation of the source into machine-readable form. In order to prevent confusion of terms, the word 'archive' has been changed to 'data archive' where it occurs in the original items. A number of new items, concerned with source description, are introduced in the range from 600 onwards. These items contain the information from the items of the historical source description. Indications of where to put the information from the questionnaire filled in by the researcher are given in parentheses after the relevant item.
General information

001 Status of the study in the data archive
002 Classification of the study in cluster(s)
    For historical studies this item will be 03 history and demography
003 Relevant keywords for the study
004 Language employed in the present study description
005 Abstract of the study description

Identifications and acknowledgements

101 Bibliographical reference
111 Local data archive where the study is stored
112 Data archive where the study was originally stored
121 Depositor (donor)
122 Date of deposit
131 Principal investigator (Research organization)
132 Data collector
141 Research initiator
142 Funding agency
199 Other identifications/Acknowledgements (Specify):

Analysis conditions

201 Research Topic (Abstract)
202 Kind of data
211 Units of observation
212 Number of units (Cases) (B.4.2)
213 Dimensions of the dataset
214 Completeness of the study stored
220 Time period covered (A.1.4)
221 Time dimensions (B.1)
222 Definition of total universe (Universe sampled)
223 Sampling procedures
225 Geographical area covered
231 Dates of data collection
232 Method of data collection (B.2)
233 Type of research instrument
234 Actions to minimize losses (Specify)
235 Data gathering staff
236 Characteristics of data collection situation noted
241 Weighting
299 Other analysis conditions

Reanalysis conditions

301 Present data representation (B.4.1)
302 Applicable analysis packages (B.2)
303 Applicable retrieval systems
304 Information stored in retrieval system
305 Classification of scheme applied
311 Language(s) of written material
321 Control operations performed by original investigator
322 Control operations performed by data archive
331 Accessibility
332 Access directing authority
399 Other reanalysis conditions

References to relevant publications/results/studies

401 to 409 Publications/reports by the primary investigator
411 to 419 Other publications (Secondary analysis)
421 to 429 Unpublished papers/reports of interest
431 Results of analysis (Scales, indices etc.)
441 References to related studies
499 Other references (Specify)

Background variables included

501 Basic characteristics
502 Place of birth
503 Residence
504 Housing situation
511 Household characteristics
512 Characteristics of parental family/household
521 Place of work
522 Occupation
531 Income
541 Education
546 Social class
551 Politics
556 Religion
561 Capital assets
562 Consumption of durables
571 Readership, mass media and 'cultural' exposure
576 Organizational membership
599 Other background variables included (specify)

Source Description

Normative administration of records

601 Purpose of the source. (A.1.1)
602 Scope of the source. (A.1.2)
603 Content of the source. (A.1.3)

Actual administration of records

611 Ways of recording. (A.2.1)
612 Ways of record keeping. (A.2.2)

Archival Procedures/History

621 Archival rearrangements or transformations of the source. (A.3.1)
622 Intentional partial destruction of the source at the archive. (A.3.2)
623 Event resulting in unintentional partial destruction of the source. (A.3.3.1)
624 Consequences of the event described in 623. (A.3.3.2)

Accessibility of the sources

631 Archive where the original of the source is found. (A.4.1.1)
632 Archival reference of the source at that archive. (A.4.1.2)
633 Method of acquiring copies of the source. (A.4.1.3)
634 Bibliographical reference of the publication of the source. (A.4.1.4)
641 Bibliographical reference of the major scientific works concerned with the source. (A.4.2)

Appendix B

Questionnaire to be filled in by the researcher:

A. Description of the source material.
(Please fill in one of these questionnaires for each of the sources involved)

A.1 The normative administration of records.
(Information on the regulation under which the source was supposed to have been produced)

A.1.1 Purpose of the source: Why was the source established?
A.1.2 Scope of the source: To whom was the source applicable?
A.1.3 Content of the source: What was to be recorded?
A.1.4 Time dimension: When was the information in the source to be recorded?

A.2 The actual administration of records.
(The source may not have been produced in accordance with the regulation governing its creation. Any differences should be given here)

A.2.1 Ways of recording: How did the person(s) producing the source actually do it?
A.2.2 Ways of record keeping: How was the source material treated at the time when it was originally produced?

A.3 Archival Procedures/History
(Has anything happened to the source material in the time between its creation and its adoption for this study?)

A.3.1 Rearrangements/transformations: If the source has been subjected to rearrangements of the archive it was part of: How did this affect the source?
A.3.2 Intentional partial destruction: If parts of the source have been scrapped by the archival institution: How do the remaining parts relate to the original entity?
A.3.3 Unintentional partial destruction: If parts of the source have been destroyed by accidents such as fire or flooding:
    A.3.3.1 Describe the accident.
    A.3.3.2 What were the consequences for the source?

A.4 Accessibility of the Sources.
(Essential archival and publication reference)

A.4.1 Archival reference:
    A.4.1.1 Where is the original of the source found?
    A.4.1.2 Under which actual reference?
    A.4.1.3 If the source is available in copy: How?
    A.4.1.4 If the source has been published: Give the bibliographical reference of the publication.
A.4.2 Bibliographical reference: Please give a bibliographical reference of the major scientific works concerned with this source.
B Transformation of the source into machine readable form

B.1 Time for the transformation.

B.1.1 When was this study started?
B.1.2 When was it finished (or when is it intended to finish)?

B.2 Software used
(Give a listing of commercially available software packages or a description of custom produced software applied in the analysis)

B.3 Physical properties of the data set.

B.3.1 On which media is the data set stored? (E.g. disk, data tape, diskette(s))
B.3.2 On which machine is the data set stored?
B.3.3 Under which operating system is the data set stored?
B.3.4 Give size and format of the file(s) in which the data set is stored.

B.4 Logical properties of the data set

B.4.1 How is the data stored? (E.g. flat file, database, running text)
B.4.2 How many units of observation does the data set consist of? (E.g. for a flat file, give the number of variables and the number of observations. For a database, give the number of files, and for each file the number of records and entries. For a running text, give the word count or line count.)

B.5 Methods applied
(If standard procedures have been used, a brief listing will be sufficient. In other cases a more detailed description will be appropriate)

B.5.1 Sampling (If sampling has taken place, give a description of sampling procedures)
B.5.2 Coding and standardization (A list of codings and standardizations carried out should be given)
B.5.3 Linkage (References ought to be given to the data materials from which the present data material was created)
B.5.4 Other (Specify)
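
The cross-references given after individual items in Appendix A can be read as a lookup table from Standard Study Description item numbers to questionnaire items. The following is a minimal sketch of that reading, not part of the proposal. The names `SSD_FROM_QUESTIONNAIRE` and `to_study_description` are hypothetical, the example answers are invented, and it is assumed that answers are keyed by exactly the labels used in Appendix A (so an answer to B.1 as a whole is stored under "B.1").

```python
# Cross-references copied from the notes after the items in Appendix A:
# SSD item number -> questionnaire item that supplies its content.
SSD_FROM_QUESTIONNAIRE = {
    "212": "B.4.2",
    "220": "A.1.4",
    "221": "B.1",
    "232": "B.2",
    "301": "B.4.1",
    "302": "B.2",
    "601": "A.1.1",
    "602": "A.1.2",
    "603": "A.1.3",
    "611": "A.2.1",
    "612": "A.2.2",
    "621": "A.3.1",
    "622": "A.3.2",
    "623": "A.3.3.1",
    "624": "A.3.3.2",
    "631": "A.4.1.1",
    "632": "A.4.1.2",
    "633": "A.4.1.3",
    "634": "A.4.1.4",
    "641": "A.4.2",
}


def to_study_description(answers):
    """Copy questionnaire answers into the SSD items that refer to them.

    `answers` maps questionnaire labels (e.g. "A.1.1") to free text.
    Unanswered questionnaire items are simply left out of the result.
    """
    return {
        ssd_item: answers[question]
        for ssd_item, question in SSD_FROM_QUESTIONNAIRE.items()
        if question in answers
    }


# Invented answers, for illustration only.
example = to_study_description({
    "A.1.1": "Assessment of taxes",
    "B.2": "Custom-written entry program",
})
print(example)
```

Note that one questionnaire answer may feed more than one item: B.2 is referenced by both 232 and 302, so the software description appears twice in the resulting study description.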