--> TEI AI2 M2 : Minutes of the meeting held at Oxford University, 29 Sept and 1 Oct 1991 </front> <present>(throughout meeting): Jane Edwards (JE), Stig Johansson (SJ), And Rosta (AR); (1 Oct only): Lou Burnard (LB), Gavin Burnage (GB), Steve Crowdy (SC); Dominic Dunlop (DD). <div1><head>Day 1 <p>The work group met for six hours on 29 September to review the latest draft of its working paper, previously circulated by SJ. Discussion focused first on the sample texts listed in the appendix, to which it was agreed to add two more, viz Dennis Tedlock's `Spoken word' and a sample from the ESF-funded project on the ecology of adult language acquisition (Wittenburg, Nijmegen). Various corrections were made in Appendix 2. <p> There was a wide-ranging discussion of the proposals in section 7 of the draft working paper. AR's comments, previously circulated by e-mail, were reviewed briefly. AR tabled a new set of points for discussion, together with a description of the ICE file header. Other discussion topics included levels of analytic refinement, and different methods of scripting. <div1><head>Day 2 <div2>Agenda <p>Following a brief summary of the previous day's work (given as section 1 above), the following agenda was agreed for the remainder of the meeting. <ul> <li>Reviewers for the draft report <li>Header information <li>Timeline proposal <li>Section 7 in general <li>Areas for further work <li>Other matters arising from previous minutes </ul> <div2>Reviewers <p>It was agreed that the working paper should be circulated as widely as possible for review. LB had been contacted by Brian MacWhinney and also by Henry Thompson. SJ noted that only transcription schemes which had been fully documented could be included in the report, but that all were useful grist to the mill. It was agreed that the originators of every scheme documented in the Appendix would be approached to review the paper, at least with a view to checking its factual accuracy.
JE provided a list of names and addresses (attached); other names suggested were Henry Thompson, Gerry Knowles (Lancaster), Victor Zue (MIT), Moynihan (Hatfield) and Dafydd Gibbon (Bielefeld). <p> JE expressed concern that the final report might be of such complexity that it would be regarded as unusable. LB described briefly the proposed balance between introductory tutorial guides and full reference manuals in the publication of the TEI final drafts. It was agreed that the cover letter accompanying the work group's report when it was sent out to reviewers should make clear that the report was intended to cover the full complexity of the general solution, and that a guide describing the subset of features most needed in the simple case was in preparation. LB expressed the hope that members of the work group would help in the preparation of this tutorial overview. <div2>The Header <p>SJ drew attention to the figure at the end of the draft report. The only change from the existing TEI header was the inclusion of script, recording and transcription statements. These fitted into the same place in the file.description as the source.description did for written texts; indeed they could replace it. On balance, the work group preferred to embed these statements directly in the file.description element rather than in an intermediate source.description. <p> The work group noted the requirement expressed by TR6 (Corpus Texts) for participant information to be present at the individual text level rather than at the header level; it was felt that this information could be duplicated as needed, but made more sense within the header. <p> Some members of the group felt that more prominence should be given to the originator of a text than was implied by the current ordering of subdivisions in the header.
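<p> The arrangement under discussion might be sketched as follows (the dotted element names below are illustrative only, modelled on file.description and source.description; the exact names remained to be settled in the revised draft): <xmp>
<file.description>
  <title.statement> ... </title.statement>
  <script.statement> ... details of any script used ... </script.statement>
  <recording.statement> ... details of the recording(s) ... </recording.statement>
  <transcription.statement> ... transcription conventions ... </transcription.statement>
</file.description>
</xmp> On this arrangement the three new statements occupy, for spoken texts, the position taken by source.description in the header for written texts.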
LB referred to the Guidelines' reference to AACR2 chap 25: the title of an electronic work might well include the name of the author of the source or sources from which it was derived. <p> It was noted that the revised list of header information had been extended to include all the items on Longman's list of tags but had not yet been collated with the ICE list. <p> Turning to AR's notes, LB queried the need for recording statements to be self-embedding, arguing that an embedding of one recording in another, as in a magazine programme or news broadcast, did not imply any need to embed the description as well. AR asked about voice-overs etc: LB suggested these were a kind of overlap. AR agreed that the ID/IDREF mechanism used to link scripts with participants could be used for recording statements as well. <div2>Discourse structure <p> It was generally agreed that some kind of structure more complex than a simple sequence of utterances should be allowed for. Although the difficulties of identifying the structure of unscripted discourse were clear enough, such structures were the focus of much current work. There were also well recognised semi-scripted discourse types such as debates, news broadcasts etc in which structural subdivisions could easily be identified. It was felt that the general purpose segmentation tag S should be used only for segmentation within utterances (U); for segments composed of utterances, the general purpose DIV tag would be recommended. This might self-nest, but not overlap. It was agreed that much further work needed to be done in defining more precisely different kinds of discourse structure: AR agreed to draft a discussion paper. <div2>Time, space and channel <p> There was a wide-ranging discussion of the problems of encoding setting, e.g. when recording transatlantic telephone conversations. It was agreed that participant ids should be included in the specification of a setting, as different participants might be in different places.
It was also agreed that there were as many channels as there were media between participants, and that a single utterance might be mediated by several channels. <p> It was agreed that the specification for `setting' in AR's notes should be followed, with the following (among other) minor modifications: <ul><li>Information concerning the `communicative.situation' as a way of categorising the text type should be included in the encoding.declarations part of the header. The report should refer to the discussion of situational parameters proposed by WG TR6. <li>For simplicity's sake, it should be possible to provide a contextual description in simple prose as an alternative to the detail of the structured <tag>setting</tag> element. <li>The element `description' was intended to contain a prose description of the setting: the name `surroundings' was agreed to be more precise for this purpose, to distinguish it from the general contextual description mentioned above. <li>The tag <tag>list.of.participants</tag> was simply a grouping tag: there was no particular reason why participants should not be at the same level as setting. The order of siblings was also not important. <li>`nationality' was too specific: a general purpose `other information' tag would be substituted. This could also be used to contain details such as affiliations. <li>The general principle advocated by AR (in the context of dates) -- that wherever parameter values are predictable or can be constrained, they should be represented by attribute values rather than by element content -- was accepted. <li>It was agreed that any discussion of names, dates or times should cross-refer to work carried out in AI3. </ul> <div2> Timelines <p> AR's proposal for linking all components of a transcript to a timeline element was agreed to be a good way of solving the problems of synchrony and overlap which had dominated the previous meeting.
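<p> In miniature, the mechanism declares named points on a single timeline and links utterances to them by id (the ids and text here are invented for illustration; the attribute syntax follows the draft): <xmp>
<timeline>
 <point id=p1 n=1><point id=p2 n=2>
</timeline>
<u sp=A end=p2>I was <point target=p1>saying</u>
<u sp=B start=p1>Sorry, go on</u>
</xmp> Speaker B's utterance begins at point p1, which falls within A's utterance just before the word `saying': the overlap is thus recorded without any need for the utterances themselves to nest or interleave.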
The group concentrated on defining and simplifying a document type definition incorporating this and other proposals, a draft of which follows these minutes. <p> LB asked whether face, point, gesture etc were all necessary, given the strong similarity of the attributes they took and their structural position. A review of the example texts suggested that a useful distinction should be made between actions, vocal communication and nonvocal communication. The term `kinesic' was later substituted for `nonvocal'. It was agreed that this could be substituted for the existing tags face, pointing, gesture and gaze, with the addition of a `type' attribute. There was some discussion of the use of entity references as an alternative way of representing kinesics: for example &grunt; might be automatically replaced in a text by "<kinesic type=grunt>Oink</kinesic>" or by "Oink" or by nothing at all. <p> The following DTD fragment was agreed: <xmp> <!ELEMENT div - - ((u|pause|vocal|kinesic|writing|#PCDATA)*)> <!ELEMENT u - - ((u|vocal|pause|#PCDATA|time.point)*)> </xmp> Refinement of this model continued after lunch. <p> It was noted that utterances could only self-nest when they were scripted, e.g. in a radio broadcast. <p> There was some discussion of alternative reference systems, with a consensus emerging in favour of using the timeline over either the standard TEI id/n attribute attached to each element, or an entirely artificial parallel hierarchy based on notional line numbers in a transcript. It was noted that whichever was chosen, only one should be used and it should be documented in the header. <p> The writing element should be used only for writing produced and visible to participants in the discourse at a specific time: its duration attribute was useful to indicate the time during which it could be read. <p> There was some further discussion as to whether a transcript might ever need to use more than a single timeline.
No very persuasive case was produced, and so for the moment at least, it was agreed that only one should be permitted. <p> The overlap example from the previous minutes was repeated and recoded using a timeline as follows: <xmp> <timeline> <point id=x1 n=1><point id=x2 n=2><point id=x3 n=3><point id=x4 n=4> </timeline> <u sp=A end=x4>This <point target=x1> is <point target=x2> my <point target=x3>turn </u> <u sp=B start=x1 end=x3>Balderd<point target=x2>ash</u> <u sp=C start=x2 end=x4>No it's mine</u> </xmp> LB noted that there was some slight redundancy between the use of start or end attributes on the one hand and the presence of explicit point tags at the start and end of an utterance on the other. The text of 3.2 should make clear what the implications of this were. There was some discussion of the point at which the stack of potential alignment points within a series of utterances could safely be popped (e.g. by a formatting application): it was felt that this would probably only be at the end of the surrounding <tag>div</tag>. <p> LB suggested reference should be made to the HyTime proposals as well as to the work of TR2 on hypertext. <div2>Section 7: further points <p> In 7.9 (paralinguistic features) it was agreed that tempo, loud, etc would be best treated as pairs of milestone tags marking positions of prominence. LB proposed that single tags named tempo (etc).shift would be more appropriate, with the `end' tag of the pair being replaced by a shift to normal. <p> Under 7.8 (prosody) it was noted that pauses contained by an utterance should be marked by an entity reference; pauses between utterances should be marked by the <tag>pause</tag> tag. <p> Various methods of aligning parallel representations were touched on, inconclusively. It was agreed to refer to the issue but not propose a solution in any detail. <p> There was general agreement to AR's proposals for handling uncertainty of transcription.
LB noted that P1 did not currently address any analogous feature in written texts (such as illegibility). SJ noted that 7.4 on speaker attribution was relevant to this issue. <p> Due to shortage of time, it was agreed that AR should circulate electronically a list of further points arising in section 7 of the working paper (this has now been done). <div2>Other further work <p> SJ noted that there was an urgent need for examples recoded according to the work group's proposals for inclusion in Appendix 3. LB & AR undertook to produce these, together with a DTD, in the next ten days. The feasibility of automatic conversion between existing schemes should be investigated briefly, but there would not be enough time to do more than sketch illustrative examples. <p> AR suggested that further work was necessary on the handling of discontinuities and on the possible use of CONCUR. <p> SJ undertook to produce a new draft of the working paper by 16 October. <div1><head>Names and addresses of proposed reviewers <name> Dr Wallace Chafe <address> Linguistics Program Univ of California Santa Barbara, CA 93106 <name> Dr John DuBois <address> Linguistics Program Univ of California Santa Barbara, CA 93106 <name>Prof Dr Konrad Ehlich <address>Univ. Dortmund Fachbereich 15 Emil-Figge-Str. 50 D-4600 Dortmund 50 <name>Dr John Esling <address>Linguistics Dept Univ of Victoria British Columbia Canada <email>vqplot@uvvm.bitnet <name>Helmut Feldweg <address>ESF Second Language Databank Max-Planck-Inst für Psycholinguistik Postbus 310 NL-6500 AH Nijmegen <email>helmut@hnympi51.bitnet <name>Dafydd Gibbon <address>Fak. für Ling u Lit Wiss., Univ.
Bielefeld, P8640, D-4800 Bielefeld 1 <tel>+49 (521) 106 3509 <email>gibbon@lili11.uni-bielefeld.de <name>Prof Sidney Greenbaum <address>Survey of English Usage University College Gower Street London <name>Dr John Gumperz <address>Anthropology Dept Univ of California Berkeley, CA 94720 <email>gumperz@cissus.berkeley.edu <name>Dr William Labov <address>Dept of Linguistics Univ of Pennsylvania Philadelphia PA <name>Dr Brian MacWhinney <address>Psychology Dept Carnegie-Mellon Univ Pittsburgh, PA 15213 </ldoc>