<?xml version="1.0" encoding="utf-8"?>
<!--
Copyright TEI Consortium. 
Licensed under the GNU General Public License. 
See the file COPYING.txt for details.
$Date: 2008-01-31 19:20:41 +0000 (Thu, 31 Jan 2008) $
$Id: CE-CertaintyResponsibility.xml 4344 2008-01-31 19:20:41Z louburnard $
-->
<div xmlns="http://www.tei-c.org/ns/1.0" type="div1" xml:id="CE" n="17"><head>Certainty and Responsibility</head>
<p>Encoders of text often find it useful to indicate that some aspects
of the encoded text are problematic or uncertain, and to indicate who is
responsible for various aspects of the markup of the electronic text.
These Guidelines provide three methods of recording uncertainty about the
text or its markup:
<list type="simple">
<item>the <gi>note</gi> element defined in section <ptr target="#CONO"/> may
be used with a value of <val>certainty</val> for its <att>type</att>
attribute.</item>
<item>the <gi>certainty</gi> element defined in this chapter may be used
to record the nature and degree of the uncertainty in a more structured
way.</item>
<item>the <gi>alt</gi> element defined in the module for
linking and segmentation may be used to provide alternative encodings
for parts of a text, as described in section <ptr target="#SAAT"/>.</item></list>
There are three methods of indicating responsibility for different
aspects of the electronic text:
<list type="simple">
<item>the TEI header records who is responsible for an electronic text
by means of the <gi>respStmt</gi> element and other more specific elements
(<gi>author</gi>, <gi>sponsor</gi>, <gi>funder</gi>, <gi>principal</gi>,
etc.) used within the <gi>titleStmt</gi>, <gi>editionStmt</gi>, and
<gi>revisionDesc</gi> elements.</item>
<item>the <gi>note</gi> element may be used with a value of <val>resp</val>
or <val>responsibility</val> in its <att>type</att> attribute.</item>
<item>the <gi>respons</gi> element defined in this chapter may be used
to record fine-grained structured information about responsibility for
individual tags in the text.</item></list>
No special steps are needed to use the <gi>note</gi> and <gi>respStmt</gi> elements, since they are defined in the core module and header
respectively.  The <gi>alt</gi> element is only available when the
module for linking has been selected, as described in
chapter <ptr target="#SA"/>. To use the <gi>certainty</gi> and
<gi>respons</gi> elements, the module for certainty and
responsibility must be selected. 
</p>
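<p>As a minimal sketch of the simplest of these methods, a
<gi>note</gi> with a <att>type</att> of <val>resp</val> might record
responsibility as follows.  The wording of the note and the use of
initials to identify the encoder are invented for this illustration:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Elizabeth went to <placeName>Essex</placeName>.<note type="resp">The
tagging of <mentioned>Essex</mentioned> as a place name was
proposed by MSM.</note></egXML></p>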
<div type="div2" xml:id="CECERT"><head>Levels of Certainty</head>
<p>Many types of uncertainty may be distinguished.  The
<gi>certainty</gi> element is designed to encode the following sorts:
<list type="simple">
<item>a given tag may or may not correctly apply (e.g. a given word may
be a personal name, or perhaps not)</item>
<item>the precise point at which an element begins or ends is
uncertain</item>
<item>the value to be given for an attribute is uncertain</item>
<item>content supplied by the encoder (such as the expansion of
an abbreviation marked by the <gi>abbr</gi> tag) is
uncertain</item>
<item>the transcription of a source text is uncertain, perhaps
because it is hard to read or hard to hear; this sort of
uncertainty may also be recorded with the <gi>unclear</gi> element
described in section <ptr target="#PHDA"/></item></list></p>
<p>The following types of uncertainty are <emph>not</emph> indicated
with the <gi>certainty</gi> element:
<list type="simple">
<item>a number or date is imprecise</item>
<item>the text is ambiguous, so a given passage has several possible
interpretations</item>
<item>a transcriber, editor, or author wishes to indicate a level of
confidence in a factual assertion made in the text</item>
<item>an author is not sure if the sentence she has chosen to start a
paragraph is really the one she wants to retain in the final version</item></list>
Precision of numbers and dates is discussed in section <ptr target="#CONA"/>;
well-defined ambiguity may be represented by
alternations in feature-structure values, as described in chapter <ptr target="#FS"/>.
Uncertainty about the truth of assertions in the text and other sorts of
authorial and editorial uncertainty about whether the content is
satisfactory are not handled by the <gi>certainty</gi> element,
though they may be expressed using the <gi>note</gi> element.</p>
<div type="div3" xml:id="CECENO"><head>Using Notes to Record Uncertainty</head>
<p>The simplest way of recording uncertainty about markup is to attach a
note to the element or location about which one is unsure.  In the
following (invented) paragraph, for example, an encoder might be
uncertain whether to mark <q>Essex</q> as a place name or a personal
name, since both might be plausible in the given context:
<q rend="display">Elizabeth went to Essex. She had always liked Essex.</q>
Using <gi>note</gi>, the uncertainty here may be recorded quite simply:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><persName>Elizabeth</persName> went to <placeName>Essex</placeName>. She had always liked <placeName>Essex</placeName>.<note type="uncertainty" resp="#MSM">It is not
clear here whether <mentioned>Essex</mentioned>
refers to the place or to the nobleman. -MSM</note></egXML></p>
<p>Using the normal mechanisms, the note may be associated
unambiguously with specific elements of the text, thus:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><persName>Elizabeth</persName> went to <placeName xml:id="CE-p1a">Essex</placeName>.
She had always liked <placeName xml:id="CE-p1b">Essex</placeName>.<note type="uncertainty" resp="#MSM" target="#CE-p1a #CE-p1b">It
is not clear here whether <mentioned>Essex</mentioned>
refers to the place or to the nobleman. If the latter,
it should be tagged as a personal name. -<name xml:id="MSM">Michael</name></note></egXML></p>
<p>The advantage of this technique is its relative simplicity. Its
disadvantage is that the nature and degree of uncertainty are not
conveyed in any systematic way and thus are not susceptible to any sort
of automatic processing.</p></div>
<div type="div3" xml:id="CECECE"><head>Structured Indications of Uncertainty</head>
<p>To record uncertainty in a more structured way, susceptible to at
least simple automatic processing, the <gi>certainty</gi> element may be
used:
<specList><specDesc key="certainty"/></specList></p>
<p>Returning to the example, the <gi>certainty</gi> element may be used to record doubts about
the proper encoding of <q>Essex</q> in several ways of varying
precision.  To record merely that we are not certain that <q>Essex</q>
is in fact a place name, as it is tagged, we use the <att>target</att>
attribute to identify the element in question, and the <att>locus</att>
attribute to indicate what aspect of the markup we are uncertain about
(in this case, whether we have used the correct <q>gi</q>, that is,
element type, to mark it):
<egXML xmlns="http://www.tei-c.org/ns/Examples">Elizabeth went to 
<placeName xml:id="CE-pl1">Essex</placeName>.
<!-- ... elsewhere in the document ... -->
<certainty target="#CE-pl1" locus="gi"><desc>possibly not a placename</desc></certainty></egXML>
Because it is linked to the location of the uncertainty by a reference, the
<gi>certainty</gi> element need not be adjacent to its target, though
it will typically be included in the same document.  It may be placed
next to the target element, or elsewhere in the document.</p>
<p>To record the further information that we estimate, subjectively,
that there is a 60 percent chance of <q>Essex</q> being a place name here, we
can add a value for our <att>degree</att> of confidence (usually a
number between 0 and 1, representing the estimated probability):
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<!-- ... -->
<certainty target="#CE-pl1" locus="gi" degree="0.6"/></egXML>
According to one expert, there is a 60 percent chance of <q>Essex</q>
being a place name here, and a 40 percent chance of its being a
personal name. We can use two <gi>certainty</gi> elements to indicate the
two probabilities independently. Both elements indicate the same location in the
text, but the second provides an alternative choice of generic
identifier (in this case <gi>persName</gi>), which is given as the
value of the <att>assertedValue</att> attribute:
<egXML xmlns="http://www.tei-c.org/ns/Examples">
<!-- ... -->
<certainty target="#CE-pl1" locus="gi" degree="0.6">
   <desc>probably a placename, but possibly not</desc></certainty>
<certainty target="#CE-pl1" locus="gi" degree="0.4" assertedValue="persName">
   <desc>may refer to the Earl of Essex</desc></certainty></egXML></p>
<p>Finally, we may wish to make our probability estimates contingent
on some condition.  In the passage <q>Elizabeth went to Essex; she had
always liked Essex,</q> for example, we may feel there is a 60 percent chance
that the county is meant, and a 40 percent chance that the earl is meant.  But
the two occurrences of the word are not independent:  there is (we may
feel) no chance at all that one occurrence refers to the county and one
to the earl.  We can express this by using the <att>given</att>
attribute to point to the <gi>certainty</gi> elements on which each estimate depends:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Elizabeth went to <placeName xml:id="CE-PL1">Essex</placeName>.
She had always liked <placeName xml:id="CE-PL2">Essex</placeName>.
<!-- ... -->
<!-- 60% chance that P1 is a placename,
     40% chance a personal name. -->
<certainty xml:id="cert-1" target="#CE-PL1" locus="gi" degree="0.6">
   <desc>probably a placename, but possibly not</desc></certainty>
<certainty xml:id="cert-2" target="#CE-PL1" locus="gi" assertedValue="persName" degree="0.4">
   <desc>may refer to the Earl of Essex</desc></certainty>
<!-- 60% chance that P2 is a placename,
     40% chance a personal name.
    100% chance that it agrees with P1. -->
<certainty target="#CE-PL2" locus="gi" given="#cert-1" degree="1.0">
   <desc>if P1 is a placename, P2 certainly is</desc></certainty>
<certainty target="#CE-PL2" locus="gi" assertedValue="persName" degree="1.0" given="#cert-2">
   <desc>if P1 refers to the Earl of Essex, so does P2</desc></certainty></egXML>
When <att>given</att> conditions are listed, the <gi>certainty</gi>
element is interpreted as claiming the stated degree of confidence in a
particular markup on the assumption that the assertions made by the
indicated <gi>certainty</gi> elements hold, that is, <emph>if the markup
described in the indicated <gi>certainty</gi> elements is
correct</emph>.</p>

<p>Conditional confidence may be less than 100 percent:  given the sentence
<q>Ernest went to old Saybrook</q>, we may interpret <q>Saybrook</q> as
a personal name or a place name, assigning a 60 percent probability to the
former. If it is a place name, there may be a 50 percent chance that the
place name actually in question is <q>Old Saybrook</q> rather than
<q>Saybrook</q>, while if it is correctly tagged as a personal name, it
is much more likely (say, 90 percent certain) that the name is <q>Saybrook</q>.
Hence there is uncertainty about the correct location for the markup
as well as about which markup to use.  This state of affairs can be expressed using the <gi>certainty</gi> element thus:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Ernest went to <anchor xml:id="CE-a1"/> old <persName xml:id="CE-p2">Saybrook</persName>.
<certainty xml:id="cert1" target="#CE-p2" locus="gi" degree="0.6"/>
<certainty target="#CE-p2" locus="startLoc" given="#cert1" degree="0.9"/>
<certainty xml:id="cert2" target="#CE-p2" locus="gi" assertedValue="placeName" degree="0.4"/>
<certainty target="#CE-p2" locus="startLoc" given="#cert2" degree="0.5"/>
<certainty xml:id="cert4" target="#CE-p2" locus="startLoc" assertedValue="CE-a1" given="#cert2" degree="0.5"/></egXML>

Note the use of the <att>assertedValue</att> attribute on the <gi>certainty</gi>
element <val>cert4</val> to reference
the <gi>anchor</gi> element placed at the alternative starting
point for the element.</p>

<p>Multiplying the numeric values out, this markup may be interpreted as
assigning specific probabilities to three different ways of
marking up the sentence:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Ernest went to old <persName>Saybrook</persName>.    (0.6 * 0.9, or 0.54)
Ernest went to old <placeName>Saybrook</placeName>.  (0.4 * 0.5, or 0.20)
Ernest went to <placeName>old Saybrook</placeName>.  (0.4 * 0.5, or 0.20)</egXML>
The probabilities do not add up to 1.00 because the markup indicates
that if <q>Saybrook</q> is (part of) a personal name, there is a
10 percent likelihood that the element should start somewhere other than the
place indicated, without however giving an alternative location; there
is thus a 6 percent chance (0.1 × 0.6) that none of the alternatives given is
correct.</p>
<p>If an attribute value is uncertain, the <att>locus</att> attribute
takes as its value the name of the attribute in question.  In this
example, there is only a 50 percent chance that the question was spoken by
participant A:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u xml:id="CE-u1" who="#A">Have you heard the election results?</u>
<certainty target="#CE-u1" locus="att.who" degree="0.5"/></egXML></p>
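<p>An alternative value for the attribute can be proposed in the same
way as an alternative generic identifier, by means of the
<att>assertedValue</att> attribute.  The following is a sketch only:
the participant <val>B</val> and the identifier <val>CE-u2</val> are
invented for the illustration, and the two estimates are assumed to
be exhaustive:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><u xml:id="CE-u2" who="#A">Have you heard the election results?</u>
<!-- even odds that A is the speaker, as tagged -->
<certainty target="#CE-u2" locus="att.who" degree="0.5"/>
<!-- otherwise the speaker is assumed to be the hypothetical participant B -->
<certainty target="#CE-u2" locus="att.who" assertedValue="#B" degree="0.5"/></egXML></p>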
<p>Doubts about whether the transcription is correct may be expressed
by assigning to <att>locus</att> the value
<mentioned>transcribedContent</mentioned>.  For example, if the source is
hard to read and so the transcription is uncertain:
<egXML xmlns="http://www.tei-c.org/ns/Examples">I have a <emph xml:id="CE-p3">gub</emph>.
<certainty target="#CE-p3" locus="transcribedContent" degree="0.5"/></egXML></p>
<p>Degrees of confidence in the proper expansion of abbreviations may
also be expressed, by using the value <mentioned>suppliedContent</mentioned>:
<egXML xmlns="http://www.tei-c.org/ns/Examples">You will want to use
<choice><expan xml:id="CE-e1">Standard
Generalized Markup Language</expan>
<expan xml:id="CE-e4">Some Grandiose Methodology for Losers</expan><abbr>SGML</abbr></choice> ...
<!-- ... -->
<certainty target="#CE-e1" locus="suppliedContent" degree="0.9"/></egXML></p>
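<p>Where a confidence estimate is wanted for each competing
expansion, a second <gi>certainty</gi> element may point at the
other <gi>expan</gi>.  The following is a sketch only: the
identifiers and the degree of 0.1 given to the second expansion are
assumptions made for the illustration, not values taken from any
real project:
<egXML xmlns="http://www.tei-c.org/ns/Examples">You will want to use
<choice><expan xml:id="CE-e5">Standard
Generalized Markup Language</expan>
<expan xml:id="CE-e6">Some Grandiose Methodology for Losers</expan><abbr>SGML</abbr></choice> ...
<!-- ... -->
<!-- the expected expansion, held with high confidence -->
<certainty target="#CE-e5" locus="suppliedContent" degree="0.9"/>
<!-- the facetious alternative, assumed here to take the remaining 0.1 -->
<certainty target="#CE-e6" locus="suppliedContent" degree="0.1"/></egXML></p>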
<p>The <att>assertedValue</att> attribute should be used to provide an
alternative value for whatever aspect of the markup is in doubt:  an
alternative generic identifier or the identifier of an alternative starting or
ending point (both already shown), an alternative attribute value, or
alternative element content, as in this example:
<egXML xmlns="http://www.tei-c.org/ns/Examples">I have a <emph xml:id="CE-P3">gub</emph>.
<certainty target="#CE-P3" locus="transcribedContent" assertedValue="gun" degree="0.8"> 
  <desc>a gun makes more sense in a holdup</desc></certainty></egXML>
Since attribute values have no internal substructure, the
<att>assertedValue</att> attribute is useful for specifying alternative
transcriptions only in relatively restricted circumstances
(specifically, when the alternative reading has no elements nested within
it).  More robust methods of handling uncertainties of transcription are
the <gi>unclear</gi> element and the <gi>app</gi> and <gi>rdg</gi>
elements described in chapter <ptr target="#TC"/>.
 The
<gi>certainty</gi> element allows for indications of uncertainty to
be structured with at least as much detail and clarity as appears to be
currently required in most ongoing text projects.  
It is expected that in the future more adequate systems for expressing
uncertainty will be developed.  These may extend the <gi>certainty</gi>
element or they may make use of the feature-structure encoding
mechanisms described in chapter <ptr target="#FS"/>.</p>
<p>The <gi>certainty</gi> element and the other TEI mechanisms for
indicating uncertainty provide a range of methods of graduated
complexity.  Basic expressions of uncertainty may be made by using the
<gi>note</gi> element.  This approach is convenient, and can
accommodate either a discursive and unstructured indication of uncertainty, or
a complex and structured but probably project-specific expression of uncertainty.  In
general, however, unless special steps are taken, the <gi>note</gi>
element does not provide as much expressive power as the
<gi>certainty</gi> element, and in cases where highly structured
certainty information must be given, it is recommended that the
<gi>certainty</gi> element be used.</p>
<p>The <gi>certainty</gi> element may be used for simple unqualified
indications of uncertainty, in which case only the <att>locus</att>
and <att>target</att> attributes might be specified.  
In more complex cases, the
other attributes may be used to provide fuller information.  While
these attributes may take any string of characters as value, the recommended
values should be used wherever possible; if they are not appropriate
in a given situation, encoders should provide their own controlled
vocabulary and document it in the <gi>encodingDesc</gi> or
<gi>tagUsage</gi> elements of the TEI header.</p>
<specGrp>
&certainty;
</specGrp>
<specGrpRef target="#DCERESP"/></div></div>
<div type="div2" xml:id="CERESP"><head>Attribution of Responsibility</head>
<p>In general, attribution of responsibility for the transcription and
markup of an electronic text is made by <gi>respStmt</gi> elements
within the header: specifically, within the title statement, the
edition statement(s), and the revision history.</p>
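<p>A header of this kind might, as a sketch, contain something like
the following title statement.  The title, the wording of the
<gi>resp</gi> elements, and the use of initials as names are
hypothetical and serve only to illustrate the pattern:
<egXML xmlns="http://www.tei-c.org/ns/Examples"><titleStmt>
  <title>A sample text: an electronic transcription</title>
  <!-- one respStmt per kind of responsibility -->
  <respStmt>
    <resp>transcribed by</resp>
    <name>RC</name>
  </respStmt>
  <respStmt>
    <resp>proper nouns identified by</resp>
    <name>PMWR</name>
  </respStmt>
</titleStmt></egXML></p>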
<p>In some cases, however, more detailed element-by-element information
may be desired. For example, an encoder may wish to distinguish between the
individuals responsible for transcribing the content and those
responsible for determining that a given word or phrase constitutes a
proper noun.  Where such fine-grained attribution of responsibility is
required, the <gi>respons</gi> element can be used:
<specList><specDesc key="respons"/></specList></p>
<p>This element allows one or more aspects of the markup to be
attributed to a given individual.  The <att>target</att> and
<att>locus</att> attributes function as they do on the
<gi>certainty</gi> element described in section <ptr target="#CECERT"/>:
the <att>target</att> attribute points at a particular element (or
set of elements), and <att>locus</att> indicates the particular aspect
of the encoding of those elements for which responsibility is to be
assigned.  The suggested values may be combined as appropriate. For example,  to
indicate that RC is responsible for transcribing a
barely legible word, and that PMWR is responsible for identifying that word
as a proper noun, the text might be encoded thus:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Ernest went to old <persName xml:id="CE-p5">Saybrook</persName>.
<!-- ... -->
<respons target="#CE-p5" locus="transcribedContent" resp="#RC"/>
<respons target="#CE-p5" locus="gi location" resp="#PMWR"/>
<list type="encoders">
<item xml:id="PMWR"/>
<item xml:id="RC"/>
</list></egXML></p>
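<p>Since <att>target</att> may point at a set of elements, a single
<gi>respons</gi> element can cover several of them at once.  The
following sketch assumes two invented identifiers,
<val>CE-p6</val> and <val>CE-p7</val>, and reuses the encoder PMWR
from the example above:
<egXML xmlns="http://www.tei-c.org/ns/Examples">Ernest went from <placeName xml:id="CE-p6">Saybrook</placeName>
to <placeName xml:id="CE-p7">Essex</placeName>.
<!-- ... -->
<!-- one respons element attributing the choice of tag on both targets -->
<respons target="#CE-p6 #CE-p7" locus="gi" resp="#PMWR"/></egXML></p>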
<p>Some elements bear specialized <att>resp</att> or <att>agent</att>
attributes, which have specific meanings that vary from element to
element; the <gi>respons</gi> element should be reserved for the general
aspects of responsibility common to all text transcription and
markup, and should not be confused with the more specific attributes on
individual elements.</p>
<specGrp xml:id="DCERESP" n="Responsibility for markup">
&respons;
</specGrp></div>

<div><head>The Certainty Module</head>
<p>The module described in this chapter makes available the following
additional elements:

<moduleSpec xml:id="DCE" ident="certainty"><altIdent type="FPI">Certainty
and Uncertainty</altIdent><desc>Certainty and uncertainty</desc>
<desc xml:lang="fr">Degré de certitude et responsabilité</desc>
<desc xml:lang="zh-tw">確定程度與不確定程度</desc>
<desc xml:lang="it">Certezza e incertezza</desc><desc xml:lang="pt">Certeza e incerteza</desc><desc xml:lang="ja">確信度モジュール</desc></moduleSpec><!--publicID:  -//TEI P5//ELEMENTS Additional Element Set for Certainty and Responsibility//EN-->

The selection and combination of modules to form a TEI schema is described in
<ptr target="#STIN"/>.

</p>
</div></div>
