Biographical Data in the TEI: report from meeting held in Oxford 27-28 April 2006

TEI Activity on Biographical Data: Summary notes from Experts Meeting held Oxford 27-28 April 2006

These notes have been made by LB on the basis of a fuller rough transcript of the first day's discussion, made by James Cummings.

Attendees

Oxford University: Lou Burnard (LB), Sebastian Rahtz (SR), James Cummings (JC), Elaine Matthews (EM)*; King's College London: John Bradley (JB)*, Gabriel Bodard (GB)*; Ministry of Culture and Heritage, New Zealand: Fiona Oliver (FO); University of Copenhagen: Eva Wedervang-Jensen (EW), Matthew Driscoll (MD -- Chair). Starred names attended only day one.

Organization

The meeting was generously hosted at Oxford University's Classics Centre by Elaine Matthews (Lexicon of Greek Personal Names project), to whom be thanks.

Agenda

The meeting aimed to:

Review the discussion points raised by EW's paper
Review additional requirements for biographical data from the projects around the table
Develop a concrete TEI conformant proposal for markup of biographical data

MD summarised that we were there to discuss the marking up of biographical and prosopographical data in TEI, with the hope of improving on what the TEI currently offers and taking into account the experience of non-TEI projects.

Scope

EM reminded the meeting of the distinction between onomastic (concerned with names) and prosopographic (concerned with people) data. It was agreed that the TEI provides some mechanisms for the former, but very little for the latter, and that addressing this was the primary focus of the meeting. SR noted that the TEI provided no way to record canonical information about a name itself, distinct from both its application to a person, and the person to whom it was applied.

Origins

LB recapped the history of the current TEI proposals, which originated in the need for historians to mark up names as encountered in source materials, and were further modified by the need of corpus linguists to record demographic (etc) information about participants in transcribed dialogues. This led to a review of the current TEI proposals during which the following specific concerns were raised:

Where should person data be stored and managed: as an XML document, within the header, or elsewhere?
The existing attributes @age @sex @role on <person> were questioned: they provided summary shortcut information only, which could not be extended for more complex cases, e.g. multiple roles, or changing age. Consensus was to remove them in favour of more exact child elements.
Was the <person> element also usable for fictional or mythical people? If so (as agreed), might there be a need for an additional @status attribute to indicate fictionality, distinct from @role?
There was probably a need for analogously detailed structures to represent the other things for which TEI provides special naming elements, i.e. organizations, and places, and possibly artefacts such as swords etc. Out of scope for this meeting though.
LB suggested that family or similar groups, if treated as a unit, might be represented as <personGrp>
Most members of model.personPart also needed to be members of att.datable
The existing <relation> element was intended for definite links; both it and <particDesc> should be renamed
The Ovid example in the current spec for <person> is erroneous in its use of xml:lang; also Publius not Publish; and death date should be AD 17
Where a person has different roles (e.g. bookseller and author) it is necessary to distinguish references to them in one or other capacity. Methods discussed include: an attribute on the reference; making the reference link to a subset of the person information; using standoff to associate the reference and the relevant subset.
Some existing child elements of <person> (e.g. <langKnown>, <firstLang>, <nationality>) were too specific and should be generalised
The child elements of <person> might usefully be classified more delicately, and generic instance elements for each class provided. Useful to distinguish life events (e.g. "marriage"), classifications (e.g. "married") and life states (e.g. "married life").
There is a need to indicate relationships between child elements of <person>, e.g. causality; also degree of certainty or where attested (e.g. via bibliographic note)
Most of the categories identified so far could be accommodated, with some stretching of categories (e.g. using "affiliation" to indicate ethnicity). The current scheme provides nothing for recording physical traits such as eye colour or "distinguishing characteristics" though.

Review of PERSW02

EW was complimented on the detail and thoroughness of her report. We worked through the various "Points for Discussion" indicated in it. Particular points noted were:

(Section 3.1) We agreed most of these. Noted that we needed to clarify terminology: what was a lifeEvent? what was an assertion? We discussed the latter with reference to SR's work on the LGPN. All markup asserts something: where does it become necessary to mark up the assertion itself? Doing so enabled grouping of components and also the addition of reliability or attribution information. It was better to have a wrapper element than to risk mixed content or ambiguity (e.g. a <bibl> might be content of its parent rather than commentary on it). We considered what kinds of relationship might need to be represented amongst life events, eventually concluding that these were inevitably project specific.
Laurent had proposed as a general principle that for any meta-element <foo> there should also be a <fooGrp> allowing it to be grouped with a <bibl> or other meta-meta element. SR's <assert> element was a generalisation of this.
(Section 3.2). We reviewed possible varieties of name. EM cited the need to distinguish theophoric names, names with foreign components, and other classifications of name. Should the @reg attribute be re-introduced as a pointer to a separate canonical <name> element? GB suggested using @type of <persName> to distinguish references to persons from references to names. Agreed that existing onomastic elements need major overhaul. To indicate changes in name over time (existDate vs useDate) we propose using naming events: the case of Caracalla was suggested as a good test for this. For translations and transliterations, LB clarified the correct usage of xml:lang e.g. to distinguish real Chinese from Chinese Romanized (Pinyin) and Chinese Romanized (Wade-Giles). Abbreviations and patronymics could be handled as types of persName if necessary.
(Section 3.4.1) We agreed that "Floruit" was as useful as "birth" or "death" and should be regarded as a life state. Relationship of life events to other (historical) events was needed but representing events as such (e.g. volcanic eruptions) was outside our scope: we did not need to replicate CIDOC.
(Section 3.5) A generic lifeEvent element was agreed to be useful: the existing <event> element was very specific to a different sense. SR suggested we could consider simply embedding document fragments using the HEML namespace as an alternative, cf SVG.
(Section 3.11) We re-affirmed our preference for representing general concepts such as "functions and activities" as prose within a more generic element such as <occupation> or <lifeState>. We agreed that there was a need for some kind of <physicalCharacteristics> element. We decided that <environment> did not correspond with the existing <setting> element and could be adequately dealt with as another <lifeState>. We agreed that additional mechanisms for representing language knowledge were needed. We agreed that <education> might need further specific subelements for CV-like purposes, but these could be added using generic mechanisms.
(Section 3.12) We agreed that the <relation> element could be made non-empty to accommodate a more nuanced description of the relationship intended, either as phrase content or (better) using macro.glossSeq.
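
The xml:lang distinction described under Section 3.2 might be sketched as follows. This is an illustration only: the name chosen and the xml:id are hypothetical, and the language tags use the BCP 47 variant subtags "pinyin" and "wadegile" from the IANA Language Subtag Registry, which are not necessarily the values discussed at the meeting.

```xml
<!-- Hypothetical example: one person, the same name in three written forms -->
<person xml:id="p-mao">
  <!-- Chinese written in Chinese script -->
  <persName xml:lang="zh">毛泽东</persName>
  <!-- Chinese romanized using Hanyu Pinyin -->
  <persName xml:lang="zh-Latn-pinyin">Mao Zedong</persName>
  <!-- Chinese romanized using Wade-Giles -->
  <persName xml:lang="zh-Latn-wadegile">Mao Tse-tung</persName>
</person>
```

In all three cases the language is Chinese; only the script and romanization convention differ, which is why the distinction is carried by the language tag and not by a separate element or @type value.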

Principles

MD began the second day by proposing the following general principles:

The intended uses for our work included database-like collections of data about the people who were referenced by a set of documents; the conversion of existing biographical texts; the creation of new biographical or CV-like structured texts for use e.g. in Human Resources. We discussed at some length the implications of this broad scope. The <person> element might contain structured components or prose; we agreed that the structured components might also contain prose, but that this should not recursively include structured components (except for very generic components like dates or names).
We agreed that there was no need for a structured identification element inside <person>: @xml:id was sufficient.
In the "DNB model", the structured <person> element might contain the text of an article, but it was more likely that such elements would be stored in a header, and referenced from conventional prose.
The <person> element contains a series of statements ("factoids", or "assertions") relating to personal characteristics, states, or changes in state (i.e. events). A typical state might be "being a bishop"; a typical characteristic is "sex"; a typical state change might be "getting married".
These assertions are derived from many different sources, possibly contradictory, and define characteristics of the person some of which will change over time, while others do not, though in principle no characteristic is unchangeable. It follows that each such assertion needs to be documented, put into a time frame, and be relatable to other assertions.

On the basis of this agreement, we proceeded to formulate a series of specific recommendations coherent with existing TEI approaches.

Initial recommendations

The TEI already provides the following attribute classes:

att.datable: provides notAfter and notBefore, both of which are data.temporal; maybe should be extended to include value, as per <date>?
att.editLike: provides cert, resp and evidence, all of which are data.enumerated, except resp (data.pointer). Description of its semantics needs change; maybe should be renamed to something like att.asserted?

We could add a new attribute class

att.keyed: provides @key and @scheme, both of which are data.pointer

However, it is probably better to use the existing att.naming class, possibly adding the @scheme attribute to it. We also need to decide whether the datatype for @key should remain data.key or change to data.pointer (or whether we should review the intended semantics of data.key).

As an aid to comprehension, we next proposed the following new model classes:

model.assertable

the class of elements concerning which assertions are made inside person elements; it groups the following three subclasses (plus possibly others from other modules).

model.characteristics: the class of elements describing generally unchanging physical or socially-constructed characteristics of a person, for example hair-colour, ethnicity, sex... These characteristics of an individual are typically independent of their volition or action.
model.state: the class of elements describing changeable characteristics of a person which have a definite duration, for example occupation, residence, name... These characteristics of an individual are typically a consequence of their own action or that of others.
model.stateChange: the class of elements describing specific events in a person's history, for example birth, marriage, appointment... These are not characteristics of an individual, but often cause an individual to gain such characteristics.

model.noteLike

the class of elements used to express comments or annotations within an assertion. This class exists already, with members <note> and <witDetail>; it should also contain model.biblLike.

We propose to populate each class with some specific very widely used elements, and also with a single generic element which will express its semantics by means of a <label> child element and/or a @key attribute (which might reference a point in some <taxonomy> or other classification element).

We propose the following specific elements for the model.characteristics class: <langKnowledge>, <faith>, <nationality>, <socEcStatus>, <sex>. These are all members of att.naming and att.datable and (except for <langKnowledge>) all have a content model of macro.phraseSeq. <langKnowledge> permits a number of <langKnown> elements as alternative content, and may carry a @keys attribute. <langKnown> is a member of att.keyed and carries a @level attribute. <sex> carries a @value attribute to give the ISO normalised value.
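
A minimal sketch of how these specific elements might appear inside <person>, under stated assumptions: the person, the xml:id, the pointer targets given in @key, and the values of @level are all hypothetical, and the @value on <sex> assumes the ISO standard meant is ISO 5218 (1 = male, 2 = female), which the notes do not name.

```xml
<!-- Hypothetical person record using the proposed characteristic elements -->
<person xml:id="p001">
  <!-- value assumed to follow ISO 5218: 2 = female -->
  <sex value="2">female</sex>
  <nationality>New Zealander</nationality>
  <!-- langKnowledge holds langKnown children as its alternative content -->
  <langKnowledge>
    <!-- key points at a hypothetical language definition; level values are invented -->
    <langKnown key="#lang-en" level="native">English</langKnown>
    <langKnown key="#lang-mi" level="fluent">Māori</langKnown>
  </langKnowledge>
</person>
```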

Two generic elements are proposed for this class: <culturalTrait> and <physicalTrait>, the former to contain culturally determined characteristics such as nationality, ethnicity, tribe, caste, gender, socio-economic status, and the latter to hold physical characteristics such as eye colour, distinguishing features etc. These elements both have the content model (label, model.dateLike*, p*) and are also members of the att.datable and att.keyed classes. The child <label> element is used to specify the category of trait (the feature) concerned, and may carry an @ana attribute to link to a definition for the feature concerned. The value of the feature itself is supplied within one or more <p> child elements; if desired, the @key attribute may be supplied on the <culturalTrait> (etc) element to indicate a fuller definition of the combination of feature and value.
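
The content model (label, model.dateLike*, p*) might be realised along the following lines. This is a sketch only: the trait values, the dates, and the pointer targets in @ana and @key are hypothetical.

```xml
<!-- Hypothetical traits for one person -->
<person xml:id="p002">
  <!-- key points at a hypothetical definition of the feature and value combined -->
  <physicalTrait key="#eye-colour-blue">
    <!-- label names the feature; ana links to a hypothetical definition of it -->
    <label ana="#def-eye-colour">eye colour</label>
    <!-- the value of the feature is given as prose -->
    <p>Blue.</p>
  </physicalTrait>
  <!-- notBefore and notAfter come from att.datable -->
  <culturalTrait notBefore="1880" notAfter="1890">
    <label>socio-economic status</label>
    <date>during the 1880s</date>
    <p>Tenant farmer.</p>
  </culturalTrait>
</person>
```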

For the model.state class we propose the following specific elements: <persName>, <relation>, <occupation>, <residence>, <affiliation>, <education>, and a new <floruit> element. The corresponding generic element will be called <lifeState>. Attributes and content models as above.
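
As an illustration, states with a definite duration might be recorded as below. The example assumes that "as above" means the specific elements take phrase content plus the att.datable attributes, and that <lifeState> follows the same (label, model.dateLike*, p*) pattern as the generic trait elements; the person and all dates are hypothetical.

```xml
<!-- Hypothetical person with several dated states -->
<person xml:id="p003">
  <occupation notBefore="1850" notAfter="1860">bookseller</occupation>
  <residence notBefore="1850">London</residence>
  <!-- floruit treated as a state, with assumed phrase content -->
  <floruit notBefore="1845" notAfter="1870">active 1845 to 1870</floruit>
  <!-- generic state, named by its label child -->
  <lifeState notBefore="1861">
    <label>being a bishop</label>
    <p>Held the office from 1861.</p>
  </lifeState>
</person>
```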

For the model.stateChange class, we propose specific elements <birth> and <death> and a new generic <lifeEvent> element. All three are members of the att.datable class; <lifeEvent> is additionally a member of att.keyed, and its content model is (label, model.dateLike, placeName?, relation*, p*).
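
A sketch of the proposed state-change elements, following the content model given for <lifeEvent>. The person, dates, place, and pointer targets are hypothetical; <relation> is shown with phrase content, as suggested in the review of Section 3.12, and <date> uses the @value attribute mentioned in connection with att.datable.

```xml
<!-- Hypothetical person with birth, one generic life event, and death -->
<person xml:id="p004">
  <birth notBefore="1820" notAfter="1822">born about 1821</birth>
  <!-- key points at a hypothetical taxonomy entry for this kind of event -->
  <lifeEvent key="#ev-marriage">
    <label>marriage</label>
    <date value="1845-06-12">12 June 1845</date>
    <placeName>Oxford</placeName>
    <relation>married to the person recorded as p005</relation>
    <p>The ceremony is mentioned in a single letter.</p>
  </lifeEvent>
  <death notBefore="1890" notAfter="1890">died in 1890</death>
</person>
```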

Finally, we propose a new <assert> element which can be used to group together any one model.assertable element with members of the model.annotate class. Collections of such assertions may be combined to form an <assertGroup> element.
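
The grouping role of <assert> might look like the following sketch, which records one person in two capacities and attaches an annotation to each assertion. The content is hypothetical, and the <bibl> is a placeholder, not a real citation.

```xml
<!-- Hypothetical assertions about a person's occupations -->
<assertGroup>
  <!-- each assert wraps exactly one assertable element plus its annotations -->
  <assert>
    <occupation notBefore="1850" notAfter="1860">bookseller</occupation>
    <note>Attested only in a later trade directory.</note>
  </assert>
  <assert>
    <occupation notBefore="1855">author</occupation>
    <bibl>Placeholder for the source of this assertion.</bibl>
  </assert>
</assertGroup>
```

Because the <note> and <bibl> sit beside the <occupation> and not inside it, they read unambiguously as commentary on the assertion, which is the ambiguity the wrapper element was intended to avoid.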

Example files

In the course of defining our recommendations we prepared some (rather silly) imaginary examples. An updated version of these is available in the file persw05.xml; a draft ODD describing the proposals and containing an extended example from the LGPN project is in preparation, and will appear as persw06.xml.

Last recorded change to this page: 2007-07-22