TEI AI2 M1: Workgroup on Spoken Texts: Minutes of meeting held at University of Oslo: 9-10 August 1991 <author>Lou Burnard <date>11 Aug 1991 <present> Lou Burnard (LB), Jane Edwards (JE), Stig Johansson (SJ; chair), And Rosta (AR). </front> <text>
<div1>Day 1: Preliminary
<p> SJ welcomed the workgroup members and recapitulated its work plan as described in document TEI AI2 P1. A working draft which would form the basis of the group's report (TEI AI2 W1) had been circulated previously. SJ noted that the final version should be ready by 1 October. In view of the short time scale for affecting the final version of the TEI Guidelines, and the fact that the group was concerned with an area not previously addressed at all by the TEI, he felt that the aim should be to propose as many tags (etc) as possible, rather than simply to recommend new work groups. Areas for further work would however also be identified, as stated in the current working paper.
<p> LB summarised briefly the methods of claiming expenses and accepted the charge of preparing a record of the meeting. He asked that electronic copies of all working papers be lodged with the TEI secretariat in Chicago as soon as possible. SJ felt that a further revision of the current working paper should be carried out first.
<div1>Review of existing encoding practices
<p> Discussion began by reviewing the list of relevant research communities in the draft charge to the WG (document TEI AI2 P1). SJ noted that only English material and corpus linguistics had been fully covered in his working paper. For natural language recognition, JE mentioned material from the ZUE (?) system; for sociolinguistics, Gumperz et al; for language acquisition, Childes. Phonology was not adequately covered: JE referred to the multi-level analyses being undertaken at Bell Labs by Liberman et al; AR agreed that multi-level analysis was essential for our purposes; LB referred the group to the mechanisms developed by the AI working committee, in particular the unit/level scheme as described in TEI P1. SJ was concerned that we should not spend too much time recapping on work being done in other groups. For anthropology and ethnology, JE agreed to provide examples from Tedlock (<q>spoken word</q>, 1983) and LB from the Oxford Text Archive's holdings of SIL texts. SJ, noting that these would otherwise be the only non-English examples, agreed further to provide some examples from Swedish corpora. On rhetoric, speech, drama & journalism, it was noted that drama was now the subject of a different work group. Doubts were expressed as to the relevance of this kind of material.
<p> The research typology presented in the draft working paper was accepted in preference to that of AI2 P1. Reviewing the topics listed there, it was agreed that further examples were needed as follows:
<gl>
<gt>Lexicography<gd>LB agreed to get examples and assistance from Steve Crowdie (Longmans)
<gt>Sociolinguistics and discourse analysis<gd>Examples from Du Bois and Gumperz were tabled by JE
<gt>Second language studies<gd>JE referred to the <q>guestworkers speech archive</q> at Nijmegen (EALA); SJ to a project in error analysis (<q>PIF</q>) carried out by Claus Faerch.
<gt>Speech recognition<gd>AR agreed to get some material from the SCRIBE project.
</gl>
It was agreed that examples of current practice would form an important part of the draft report but should be included as a separate appendix to the main document.
Their provision, and its analysis in section 6 of the current working paper, constituted the WG's fulfilment of the charge to document existing practice. Section 7 would constitute its fulfilment of the charge to assess the current provisions of TEI P1, and to propose new tags or extensions. As noted above, the charge to propose new work groups was not accepted. The charge to respond to comments on P1 routed to the group was easily met, as none of the comments so far received on P1 was directly relevant to the WG's remit, though LB noted that the omission of spoken texts from P1 had already been commented on adversely by one or two people.
<p> JE proposed that a summary table comparing different ways in which the same features had been encoded would be useful. SJ felt that detailed description of a small number of <q>significant</q> schemes would be preferable. LB preferred a feature-based list. It was agreed that there would be room for both.
<p> Examples from the following schemes would be added to those already present:
<ul>
<li>Roger Brown, Childes (child language)
<li>Jefferson
<li>HIAT (Ehlich)
<li>Gumperz (Santa Barbara)
<li>ESF <note>Don't know what this is - LB</note>
<li>Survey of English Usage, London-Lund, ICE
<li>NatCorp proposals for spoken texts (Crowdie)
</ul>
JE was able to provide suitable examples for almost all of these during the meeting, several being included in her forthcoming book <cit>Talking Language: transcription and coding of spoken discourse</cit>. LB undertook to send copies of a draft of this volume to all members of the WG. AR would provide examples from the Survey texts, and LB from the British National Corpus texts. SJ stressed the urgency of receiving these.
<p> As well as samples of encoded texts, brief descriptions of the markup's meaning and bibliographic references should be provided.
<div1>Review of AI2W1 - I
<p> Following a break, the group proceeded to go through SJ's draft. Major points on which further revision was felt necessary are noted below.
<p> JE was concerned that the manageability, readability and usability of encoding schemes should not be overlooked, as these were highly important for ease of learning and understanding. LB agreed but noted that most of these factors were beyond the remit of an SGML encoding scheme. SJ agreed to cite JE's views on readability in the report. The word <q>tractability</q> was proposed and accepted as a substitute for <q>manipulability</q> in section 2.1.
<p> In section 3, some short statement of the encoding needs of individual groups (as demonstrated by the example texts) should be included.
<p> In section 4, rather than <q>levels</q> of transcription, the group preferred the term <q>dimensions</q>. An alternative typology was proposed, for discussion purposes:
<ul><li>Lexicalisation (i.e. how words, or representations of words, are defined: orthographically, possibly extended by non-standard spellings; phonemically; phonetically; etc)
<li>Temporal aspects (prosody, intonation, pausing etc)
<li>Inter-speaker co-ordination (overlap, truncation, latching, attribution)
<li>Units of analysis (turns, syntagms, tone units etc)
<li>Non-verbal features (anthrophonics, gestures, events etc)
<li>Text documentation (recording details, transcription details etc)
</ul>
<p> Section 5. The need for this further typology was questioned. AR noted that the notion of authorship provided an additional complication. SJ said this would be noted in section 7.1. It was agreed to change <q>conversation</q> on p. 4 into <q>interaction</q> in order to loosen the sense. Problems of oral narrative (story telling etc) should not be ignored, even if no specific proposals were made by the WG. The list was not intended as a typology but as an indication of the variety of sources involved. It was agreed that an item <q>spoken to be written</q> (e.g. dictation) should be added to the list on p. 4.
<p> Tables showing contrastive treatments for the same kind of feature should be added throughout section 6.
<p>In 6.1 the importance of including documentation of the context for spoken texts was felt to need more emphasis. Separating documentation from data makes both less tractable.
<p> Sections 6.2 and 6.3 were discussed together, because the units defined by reference points (<q>text-units</q> in ICE terms) are a special case of the more basic units, differing chiefly in that they carry a reference number. Given that units might be syntax-, intonation- or pause-based, or a mixture, the present text was felt to be rather too evaluative. It should also note that units can be cross-cutting and that some systems (e.g. Du Bois) use more than one kind. 6.2 should be moved to follow 6.3.
<p> In section 6.4, mention should be made of the various means taken to preserve speaker anonymity, to indicate unknown speakers and to document the degree of speaker awareness of being recorded.
<p> A reference to the HIAT and ICE schemes should be added in section 6.5.
<p> The tractability problems raised by the mixing of orthographic and phonetic transcription principles should be highlighted in section 6.6.
<p> LB queried the distinction between features treated in 6.6 and 6.10, noting that in ICE they are grouped together. SJ said that the difference was that the former had conventional representations such as <q>Mmm</q>, while the latter did not, and proposed as an alternative name <q>Non-lexical vocalisation</q>. AR suggested <q>quasi-lexical</q>, which was agreed.
<p> The meeting formally closed at 1800, though discussion continued further.
<div1>Review of AI2W1 - II
<p>Opening the second day of the meeting, SJ proposed that sections 6 and 7 of the document should be discussed in parallel to speed up progress. This was agreed, though in practice the effect was largely to discuss section 7.
<div2>Extensions to the TEI header (7.1)
<p> LB said that proposals for changes to the TEI header would be subject to review by the Text Documentation committee, which would be meeting sometime in the autumn.
<p> Agreed: the title statement should not be optional.
<p> There was some discussion of the difficulties of identifying boundaries of some kinds of text: phone conversations were clearly delimited, but radio broadcasts were not.
<p> The proposed <tag>interaction.type</tag> tag belonged in the <tag>encoding.declaration</tag>s (not discussed in the working paper) rather than the <tag>file.description</tag>.
<p> LB noted the parallel between the notions of <tag>recording.statement</tag> and <tag>transcription.statement</tag> and that of the existing <tag>source.description</tag>. This suggested that recordings in which other recordings were embedded (a problem raised by AR) could be handled by nesting <tag>recording.statement</tag>s.
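<p> For illustration only, such nesting might take the following shape. The descriptive content here is invented, and the internal structure of <tag>recording.statement</tag> was not settled at the meeting:
<xmp>
<recording.statement>
off-air recording of a radio phone-in, June 1991
<recording.statement>
archive interview replayed within the broadcast
</recording.statement>
</recording.statement>
</xmp>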
<p> In discussion of the question of surreptitious recordings, a need for both a general-level mechanism (were all participants aware?) and a low-level mechanism (did this participant know?) was identified.
<p> A need for a grouping tag for participants was identified and <tag>list.of.participants</tag> was proposed. This should be distinguished from the need for a tag identifying a number of participants operating as a group, e.g. the audience of a radio show, for which the tag <tag>participant.group</tag> was proposed. The latter had the same characteristics as <tag>participant</tag>, and an additional attribute <q>size</q>.
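<p> By way of illustration, the two proposals might combine as follows; the assumption that <tag>list.of.participants</tag> directly contains both kinds of element, and the sample content, are illustrative only:
<xmp>
<list.of.participants>
<participant id=A1>interviewer</participant>
<participant.group id=G1 size=200>studio audience</participant.group>
</list.of.participants>
</xmp>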
<p> A need for ways of formally stating relationships between participants was identified. LB suggested that if each participant had a unique id, then their relationships could be expressed by a number of <tag>relation</tag> elements contained within the <tag>participant</tag> element, linked by means of a <q>target</q> attribute. For example:
<xmp>
<participant id=M1>Mary Jones</participant>
<participant id=F1>Fred Jones
<relation target=M1>spouse</relation>
</participant>
</xmp>
There was some discussion of ways in which this could be extended to cater for reflexive, one- or two-way relationships etc.
<p> JE proposed that age and sex of participant would be more economically handled as attributes rather than elements. This was felt to be appropriate for sex but not for age. It would be useful to be able to specify a range, or minima and maxima, for age: this could conveniently be done by allowing for attributes with numeric values. AR proposed that in general, where an exhaustive list of attribute values could be specified, this was preferable to leaving the options open.
<p> It was possible that some or all of the tags under <tag>transcription.statement</tag> belonged in the encoding declarations. For the moment they would remain where they were, though <tag>transcription.type</tag> should definitely move.
<p> AR suggested the tag <tag>channel</tag> should be included within the <tag>setting</tag> element, to include information about the means of delivery of the speech being transcribed, e.g. by telephone, two-way radio etc.
<p> JE questioned the order of components within the header, suggesting that the source should come first. LB commented that this would involve a substantial departure from existing practice in the TEI header.
<p> The group then reviewed the components so far identified for the header and agreed that information under the following headings should be strongly recommended for inclusion wherever possible:
<ul>
<li>title and editor
<li>time or date of recording (normally the same as that of the setting)
<li>participant information
<li>circumstance of data capture (e.g. location, situation, activities)
</ul>
It was noted that information could be presented informally, as running text, rather than formally categorised.
<div2>Units of analysis
<p>LB queried the need for <tag>u</tag> to mark individual utterances. How did this differ from the general-purpose <tag>s</tag>? It was agreed that <tag>u</tag> tags had different attributes (notably, <q>speaker</q>) and should be retained. There was some discussion of the general validity of the back-channel/turn distinction currently emphasised in the draft.
<p> For examples of the use of multiple hierarchies and the concur feature, implied by the need for multiply nested segmentation, LB referred the meeting to the discussion in P1, pages 141-4, and also to a fully worked-out example of multi-level analysis of an Eskimo story provided by Gary Simons, which he agreed to distribute, after checking with its author.
<p> It was agreed that an utterance was defined as a stretch of discourse from a single speaker. If two participants spoke simultaneously, this should be regarded as two utterances. Where speaker attribution was dubious, a list of possible speaker identifiers should be supplied as the value for the speaker attribute. A certainty attribute could be supplied, defaulting to YES in the case of a single speaker, and always having the value NO in the case of multiple speakers.
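<p> A minimal sketch of how this might look, reusing the participant identifiers from the example above and assuming (purely for the sake of the example) that the list of identifiers is given as a space-separated attribute value; the second utterance carries no certainty attribute since the default is YES:
<xmp>
<u speaker="M1 F1" certainty=NO>whose turn is it
<u speaker=M1>mine I think
</xmp>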
<p> There was some discussion of the need to distinguish the role of a speaker as author or participant in the case of scripted material. LB hypothesized the case of an utterance such as <q>Are we rolling Bob? Good evening. Tonight Mr Gorbachow said We will bury you...</q> in which he distinguished (a) the newsreader speaking in propria persona, (b) the newsreader reading from a script prepared by someone else, which happens to quote (c) a third party's speech. It was agreed that the appropriate tag for case (c) was the <tag>q</tag> tag already present in the Guidelines. SJ stated that the role of a speaker with reference to the text should be documented in the header, which should distinguish scripted and unscripted material, the distinction being that scripted material can be departed from.
<p> After further discussion, it was agreed that a <tag>script.statement</tag> should be included in the header to provide information which could be associated with a given utterance by means of an IDREF supplied on a <q>script</q> attribute to the <tag>u</tag> element. (This implies that in the example above, the switch from unscripted to scripted remarks in fact indicates the start of a new, possibly nested, utterance by the same speaker - LB)
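<p> Applied to LB's example, the proposal might yield something along the following lines; the content model of <tag>script.statement</tag>, and the placement of the utterance boundary, are illustrative only:
<xmp>
<script.statement id=S1>prepared news script</script.statement>
...
<u speaker=A>Are we rolling Bob?
<u speaker=A script=S1>Good evening. Tonight Mr Gorbachow said <q>We will bury you...</q>
</xmp>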
<p> The question of back-channelling and interruptions was discussed further. SJ agreed to reconsider the matter. As an example, the group discussed the following
<xmp>
<u>This is <u type=back>uh huh</u> my turn </u>
</xmp>
which was generally felt to be unsatisfactory, as it obscured the fact that <q>This is my turn</q> was a single utterance.
<p> It was noted that truncation was not necessarily associated with interruption, either of segments or words, since it could also be indicated by intonation patterns. The group initially proposed simply an attribute <q>trunc</q> with values Y or N which could be attached to utterance or segment tags. Thus in the following example
<xmp>
You know how they do that, so you can't s- ha- -- you dont have any balance (J&J 1.4.1)
</xmp>
the intonation unit beginning <q>so you can't...</q> is truncated, as are the two partial words with which it ends. This could be rendered as
<xmp>
<s type=IU trunc=y> so you can't <s type=W trunc=y>s</s><s type=W trunc=y>ha</s> </s>
</xmp>
Preference was expressed for a special-purpose <tag>truncated.word</tag>, by analogy with existing tags such as <tag>foreign</tag> or <tag>highlighted</tag>. SJ felt that units should not be marked as segments simply in order to carry a truncation tag. The above example would thus become
<xmp>
<s type=IU trunc=y> so you can't <truncated.word>s</truncated.word> <truncated.word>ha</truncated.word> </s>
</xmp>
<p> An interruption could be regarded as an overlap associated with truncation, or one which coincided with a pause. Returning therefore to the problem of overlapping segments, the group focused again on the example of overlap given above. In the London-Lund corpus this would be marked up as follows:
<xmp>
A   This is
B   uh uh
(a) my turn
</xmp>
LB proposed the following alternative:
<xmp>
<u sp=A>This is <point id=a1>my turn
<u sp=B><point same=a1>uh uh
</xmp>
where the <q>same</q> attribute was used to point from one <tag>point</tag> to another, indicating synchrony of utterance. It was noted that this synchronised only the start of each utterance. Although overlapped segments clearly had extent, using a true SGML element (say, <tag>olap</tag> or even <tag>s type=olap</tag>) would not work. If, for example, A's turn was overlapped by two speakers, a formulation such as
<xmp>
<u sp=A>This <olap id=A1>is <olap id=A2>my</olap> turn</olap>
<u sp=B><olap same=A1>uh uh</olap>
<u sp=C><olap same=A2>No it's mine</olap>
</xmp>
in which segments A1 and A2 are overlapped by different speakers, was ambiguous (and illegal SGML). One alternative would be to define a concurrent hierarchy for each speaker, thus
<xmp>
<u sp=A>This <(b)olap id=A1>is <(c)olap id=A2>my</(b)olap> turn</(c)olap>
<u sp=B><(b)olap same=A1>uh uh</(b)olap>
<u sp=C><(c)olap same=A2>No it's mine</(c)olap>
</xmp>
but this would require as many concurrent views as there were overlapping speakers and would also lead to some processing difficulties with currently available SGML software.
<p> As an alternative, LB proposed that an <tag>end.point</tag> could be used to mark the alignment of places where overlap finished in an utterance. For completeness, this could be linked to its corresponding <tag>point</tag> by a further pointer attribute named <q>start</q>, thus
<xmp>
<u sp=A>This <point id=A1>is <point id=A2>my <end.point id=A3 start=A1> turn<end.point id=A4 start=A2>
<u sp=B><point same=A1 id=B1>uh uh<end.point same=A3 start=B1>
<u sp=C><point same=A2 id=C1>No it's mine<end.point same=A4 start=C1>
</xmp>
<p> This formulation could be automatically derived from the simpler input conventions proposed by JE and others. Summarising the discussion, it was agreed that an utterance is a stretch of spoken language from one speaker, and a segment is anything smaller. A segment may have a type (e.g. macrosyntagm, tone unit, turn, arbitrary text unit etc) and occurrences can be nested. To cope with the SGML prohibition on the crossing of such nested segments, we recommend the use of the milestone tags <tag>point</tag> and <tag>end.point</tag> to mark synchronisation points where overlap begins and ends; these take the following attributes
<gl>
<gt>id<gd>provides an arbitrary identifier<note>In later discussion, it was noted that this could conveniently be derived from a timeline. Alternatively, the values chosen could act as pointers to discrete points on the timeline</note>
<gt>same<gd>identifies a point in another overlapping utterance
<gt>start<gd>identifies a point in the same utterance where overlap begins (used only for <tag>end.point</tag>)
</gl>
<div2>Other features discussed
<p>In section 7.6 it was noted that issues of truncation had already been addressed. It was additionally suggested that causes, both for deletion and truncation, should be specified using an attribute.
<p> It was noted that pauses may occur both within utterances and between them. A need for a <q>units</q> attribute, with values such as <q>seconds</q> or <q>syllables</q>, was identified.
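<p> For illustration, a pause might then be marked along the following lines, both within and between utterances; the attribute name for the amount (<q>dur</q>) is supplied here purely for the sake of the example:
<xmp>
<u speaker=A>well <pause units=seconds dur=2> I suppose so
<pause units=syllables dur=1>
<u speaker=B>you suppose
</xmp>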
<p> There was some discussion as to whether paralinguistic phenomena should be partitioned into vocal non-verbal actions (coughs, umms, sneezes etc) and others (gestures, passing trucks etc). It was suggested that the former might be regarded as utterances, and a tag <tag>action</tag> or <tag>event</tag> used for the latter. Treating coughs etc. as utterances would imply that a cough by speaker A during an utterance by speaker B would have to be regarded as a case of overlap. SJ would prefer to include such involuntary noises within the utterance where they occur, possibly with a speaker attribute. This would preclude their representation by entity reference.
<p> The full ramifications of encoding non-verbal actions were not explored; it was noted that, as well as a description of the event, lists of identifiers for the participants would be needed, together with (probably) an alignment map. LB referred the meeting to the discussion of movements in paper TEI MLW18 for some examples.
<p> In a brief discussion of performative features such as pitch, speed and vocalisation, LB asked if these could not be regarded as analogous to rendition in written texts and treated in a similar way. It was generally felt that it would be better to mark these using milestone tags such as <tag>pitch.change</tag>, <tag>speed.change</tag> etc.
<p> It was noted that the list of kinesic features in section 7.11 was not intended to be exhaustive but just to provide suggestions. Evaluative preferences should not be included in it. It was suggested that an attribute <q>iterated</q> might be useful.
<div1>Conclusions
<p> The group felt that substantial progress had been made, but identified the following topics as needing considerable further work:
<ul>
<li>quasi-vocal things such as laughter
<li>quasi-lexical things such as <q>mm</q>
<li>prosody
<li>parallel and discontinuous segments
<li>uncertainty of transcription, uncertainty in general
</ul>
JE would be out of touch till 27 August. SJ would work on revising the draft and circulate copies of the chosen set of examples as soon as possible. LB would circulate minutes of the meeting before 15 August. It was felt that funding for a second meeting should be sought, perhaps adjacent to the NOED conference in Oxford at the end of September. LB agreed to host the meeting and SJ to seek authorization to hold it. </ldoc>