--> TEI AI2 M2 : Minutes of the meeting held at Oxford University, 29 Sept and 1 Oct 1991 </front> <present>(throughout meeting): Jane Edwards (JE), Stig Johansson (SJ), And Rosta (AR); (1 Oct only): Lou Burnard (LB), Gavin Burnage (GB), Steve Crowdy (SC); Dominic Dunlop (DD). <div1><head>Day 1 <p>The work group met for six hours on 29 September to review the latest draft of its working paper, previously circulated by SJ. Discussion focused first on the sample texts listed in the appendix, to which it was agreed to add two more, viz Dennis Tedlock's `Spoken word' and a sample from the ESF-funded project on the ecology of adult language acquisition (Wittenburg, Nijmegen). Various corrections were made in Appendix 2. <p> There was a wide-ranging discussion of the proposals in section 7 of the draft working paper. AR's comments, previously circulated by e-mail, were reviewed briefly. AR tabled a new set of points for discussion, together with a description of the ICE file header. Other discussion topics included levels of analytic refinement, and different methods of scripting. <div1><head>Day 2 <div2>Agenda <p>Following a brief summary of the previous day's work (given as section 1 above), the following agenda was agreed for the remainder of the meeting. <ul> <li>Reviewers for the draft report <li>Header information <li>Timeline proposal <li>Section 7 in general <li>Areas for further work <li>Other matters arising from previous minutes </ul> <div2>Reviewers <p>It was agreed that the working paper should be circulated as widely as possible for review. LB had been contacted by Brian MacWhinney and also by Henry Thompson. SJ noted that only transcription schemes which had been fully documented could be included in the report, but that all were useful grist to the mill. It was agreed that the originators of every scheme documented in the Appendix would be approached to review the paper, at least with a view to checking its factual accuracy.
JE provided a list of names and addresses (attached); other names suggested were Henry Thompson, Gerry Knowles (Lancaster), Victor Zue (MIT), Moynihan (Hatfield) and Dafydd Gibbon (Bielefeld). <p> JE expressed concern that the final report might be of such complexity that it would be regarded as unusable. LB described briefly the proposed balance between introductory tutorial guides and full reference manuals in the publication of the TEI final drafts. It was agreed that the cover letter accompanying the work group's report when it was sent out to reviewers should make clear that the report was intended to cover the full complexity of the general solution, and that a guide describing the subset of features most needed in the simple case was in preparation. LB expressed the hope that members of the work group would help in the preparation of this tutorial overview. <div2>The Header <p>SJ drew attention to the figure at the end of the draft report. The only change from the existing TEI header was the inclusion of script, recording and transcription statements. These fitted into the same place in the file.description as the source.description did for written texts; indeed they could replace it. On balance, the work group preferred to embed these statements directly in the file.description element rather than in an intermediate source.description. <p> The work group noted the requirement expressed by TR6 (Corpus Texts) for participant information to be present at the individual text level rather than at the header level; it was felt that this information could be duplicated as needed, but made more sense within the header. <p> Some members of the group felt that more prominence should be given to the originator of a text than was implied by the current ordering of subdivisions in the header.
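<p> The arrangement under discussion might be sketched as follows (the dotted element names below are illustrative only, modelled on file.description and source.description; the exact names remained to be settled in the revised draft): <xmp>
<file.description>
  <title.statement> ... </title.statement>
  <script.statement> ... details of any script used ... </script.statement>
  <recording.statement> ... details of the recording(s) ... </recording.statement>
  <transcription.statement> ... transcription conventions ... </transcription.statement>
</file.description>
</xmp> On this arrangement the three new statements occupy, for spoken texts, the position taken by source.description in the header for written texts.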
LB referred to the Guidelines' reference to AACR2 chap 25: the title of an electronic work might well include the name of the author of the source or sources from which it was derived. <p> It was noted that the revised list of header information had been extended to include all the items on Longman's list of tags but had not yet been collated with the ICE list. <p> Turning to AR's notes, LB queried the need for recording statements to be self-embedding, arguing that an embedding of one recording in another, as in a magazine programme or news broadcast, did not imply any need to embed the description as well. AR asked about voice-overs etc: LB suggested these were a kind of overlap. AR agreed that the ID/IDREF mechanism used to link scripts with participants could be used for recording statements as well. <div2>Discourse structure <p> It was generally agreed that some kind of structure more complex than a simple sequence of utterances should be allowed for. Although the difficulties of identifying the structure of unscripted discourse were clear enough, such structures were the focus of much current work. There were also well recognised semi-scripted discourse types such as debates, news broadcasts etc in which structural subdivisions could easily be identified. It was felt that the general purpose segmentation tag S should be used only for segmentation within utterances (U); for segments composed of utterances, the general purpose DIV tag would be recommended. This might self-nest, but not overlap. It was agreed that much further work needed to be done in defining more precisely different kinds of discourse structure: AR agreed to draft a discussion paper. <div2>Time, space and channel <p> There was a wide-ranging discussion of the problems of encoding setting, e.g. when recording transatlantic telephone conversations. It was agreed that participant ids should be included in the specification of a setting, as different participants might be in different places.
It was also agreed that there were as many channels as there were media between participants, and that a single utterance might be mediated by several channels. <p> It was agreed that the specification for `setting' in AR's notes should be followed, with the following (among other) minor modifications: <ul><li>Information concerning the `communicative.situation' as a way of categorising the text type should be included in the encoding.declarations part of the header. The report should refer to the discussion of situational parameters proposed by WG TR6. <li>For simplicity's sake, it should be possible to provide a contextual description in simple prose as an alternative to the detail of the structured <tag>setting</tag> element. <li>The element `description' was intended to contain a prose description of the setting: the name `surroundings' was agreed to be more precise for this purpose, to distinguish it from the general contextual description mentioned above. <li>The tag <tag>list.of.participants</tag> was simply a grouping tag: there was no particular reason why participants should not be at the same level as setting. The order of siblings was also not important. <li>`nationality' was too specific: a general purpose `other information' tag would be substituted. This could also be used to contain details such as affiliations. <li>The general principle advocated by AR (in the context of dates) -- that wherever parameter values are predictable or can be constrained, they should be represented by attribute values rather than by element content -- was accepted. <li>It was agreed that any discussion of names, dates or times should cross-refer to work carried out in AI3. </ul> <div2> Timelines <p> AR's proposal for linking all components of a transcript to a timeline element was agreed to be a good way of solving the problems of synchrony and overlap which had dominated the previous meeting.
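<p> In miniature, the mechanism declares named points on a single timeline and links utterances to them by id (the ids and text here are invented for illustration; the attribute syntax follows the draft): <xmp>
<timeline>
 <point id=p1 n=1><point id=p2 n=2>
</timeline>
<u sp=A end=p2>I was <point target=p1>saying</u>
<u sp=B start=p1>Sorry, go on</u>
</xmp> Speaker B's utterance begins at point p1, which falls within A's utterance just before the word `saying': the overlap is thus recorded without any need for the utterances themselves to nest or interleave.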
The group concentrated on defining and simplifying a document type definition incorporating this and other proposals, a draft of which follows these minutes. <p> LB asked whether face, point, gesture etc were all necessary, given the strong similarity of the attributes they took and their structural position. A review of the example texts suggested that a useful distinction should be made between actions, vocal communication and nonvocal communication. The term `kinesic' was later substituted for `nonvocal'. It was agreed that this could be substituted for the existing tags face, pointing, gesture and gaze, with the addition of a `type' attribute. There was some discussion of the use of entity references as an alternative way of representing kinesics: for example &grunt; might be automatically replaced in a text by "<kinesic type=grunt>Oink</kinesic>" or by "Oink" or by nothing at all. <p> The following DTD fragment was agreed: <xmp> <!ELEMENT div - - ((u|pause|vocal|kinesic|writing|#PCDATA)*)> <!ELEMENT u - - ((u|vocal|pause|#PCDATA|time.point)*)> </xmp> Refinement of this model continued after lunch. <p> It was noted that utterances could only self-nest when they were scripted, e.g. in a radio broadcast. <p> There was some discussion of alternative reference systems, with a consensus emerging in favour of using the timeline over either the standard TEI id/n attribute attached to each element, or an entirely artificial parallel hierarchy based on notional line numbers in a transcript. It was noted that whichever was chosen, only one should be used and it should be documented in the header. <p> The writing element should be used only for writing produced and visible to participants in the discourse at a specific time: its duration attribute was useful to indicate the time during which it could be read. <p> There was some further discussion as to whether a transcript might ever need to use more than a single timeline.
No very persuasive case was produced, and so for the moment at least, it was agreed that only one should be permitted. <p> The overlap example from the previous minutes was repeated and recoded using a timeline as follows: <xmp> <timeline> <point id=x1 n=1><point id=x2 n=2><point id=x3 n=3><point id=x4 n=4> </timeline> <u sp=A end=x4>This <point target=x1> is <point target=x2> my <point target=x3>turn </u> <u sp=B start=x1 end=x3>Balderd<point target=x2>ash</u> <u sp=C start=x2 end=x4>No it's mine</u> </xmp> LB noted that there was some slight redundancy between the use of start or end attributes on the one hand and the presence of explicit point tags at the start and end of an utterance on the other. The text of 3.2 should make clear what the implications of this were. There was some discussion of the point at which the stack of potential alignment points within a series of utterances could safely be popped (e.g. by a formatting application): it was felt that this would probably only be at the end of the surrounding <tag>div</tag>. <p> LB suggested reference should be made to the HyTime proposals as well as to the work of TR2 on hypertext. <div2>Section 7: further points <p> In 7.9 (paralinguistic features) it was agreed that tempo, loud, etc would be best treated as pairs of milestone tags marking positions of prominence. LB proposed that single tags named tempo (etc).shift would be more appropriate, with the `end' tag of the pair being replaced by a shift to normal. <p> Under 7.8 (prosody) it was noted that pauses contained by an utterance should be marked by an entity reference; pauses between utterances should be marked by the <tag>pause</tag> tag. <p> Various methods of aligning parallel representations were touched on, inconclusively. It was agreed to refer to the issue but not propose a solution in any detail. <p> There was general agreement to AR's proposals for handling uncertainty of transcription.
LB noted that P1 did not currently address any analogous feature in written texts (such as illegibility). SJ noted that 7.4 on speaker attribution was relevant to this issue. <p> Due to shortage of time, it was agreed that AR should circulate electronically a list of further points arising in section 7 of the working paper (this has now been done). <div2>Other further work <p> SJ noted that there was an urgent need for examples recoded according to the work group's proposals for inclusion in Appendix 3. LB & AR undertook to produce these, together with a DTD, in the next ten days. The feasibility of automatic conversion between existing schemes should be investigated briefly, but there would not be enough time to do more than sketch illustrative examples. <p> AR suggested that further work was necessary on the handling of discontinuities and on the possible use of CONCUR. <p> SJ undertook to produce a new draft of the working paper by 16 October. <div1><head>Names and addresses of proposed reviewers <name> Dr Wallace Chafe <address> Linguistics Program Univ of California Santa Barbara, CA 93106 <name> Dr John DuBois <address> Linguistics Program Univ of California Santa Barbara, CA 93106 <name>Prof Dr Konrad Ehlich <address>Univ. Dortmund Fachbereich 15 Emil-Figge-Str. 50 D-4600 Dortmund 50 <name>Dr John Esling <address>Linguistics Dept Univ of Victoria British Columbia Canada <email>vqplot@uvvm.bitnet <name>Helmut Feldweg <address>ESF Second Language Databank Max-Planck-Inst für Psycholinguistik Postbus 310 NL-6500 AH Nijmegen <email>helmut@hnympi51.bitnet <name>Dafydd Gibbon <address>Fak. für Ling u Lit Wiss., Univ.
Bielefeld, P8640, D-4800 Bielefeld 1 <tel>+49 (521) 106 3509 <email>gibbon@lili11.uni-bielefeld.de <name>Prof Sidney Greenbaum <address>Survey of English Usage University College Gower Street London <name>Dr John Gumperz <address>Anthropology Dept Univ of California Berkeley, CA 94720 <email>gumperz@cissus.berkeley.edu <name>Dr William Labov <address>Dept of Linguistics Univ of Pennsylvania Philadelphia PA <name>Dr Brian MacWhinney <address>Psychology Dept Carnegie-Mellon Univ Pittsburgh, PA 15213 </ldoc>