Report on Technical Seminar in Lisbon

C. M. Sperberg-McQueen

24 July 1994

The editors of the TEI have just returned from Lisbon, where they gave a two-and-a-half day technical seminar or workshop on SGML and the TEI for a research group at the New University, with interests focused on terminological databases, dictionaries, and corpus-based lexicography and lexicology. For the work group, the workshop represented a chance to learn more about SGML and the Text Encoding Initiative. For the editors, it provided an opportunity to experiment with the design of a multi-day workshop for specialist users.

The first afternoon of the workshop, after organizational questions had been clarified, was devoted to an introduction to SGML and to the overall architecture of the TEI. We began 'medias in res' by examining together two sample SGML documents (a very simple 'Hello, world!' document, which we examined in an ASCII editor, and a more realistic, though still relatively simple, document in TEI markup, containing the proposed syllabus for the workshop, which we examined both in an ASCII editor and in an SGML editor which provided nice on-screen formatting for the document). Following this, LB presented some basic philosophical and practical observations on the nature and necessity of markup, and the design and goals of the TEI document type definitions. MSM then outlined the process of document analysis with which any serious project in electronic text encoding needs to begin, and we ended the day by applying that process to a sample text from the text-base being built in Lisbon for terminological work.

On the second day, we began with a description of corpus construction and TEI facilities for corpus planning and documentation, with a side view on linguistic annotation as practiced by current corpora such as the British National Corpus. The rest of the morning went to a survey of the TEI tag set for terminological databases, which is the basis of current work in ISO Technical Committee 37, and of fairly direct relevance to the work of the host research group. After lunch, we continued by tagging, with TEI markup, a portion of the document we had analysed the previous afternoon, and then the participants in the seminar were turned loose on machines equipped with Author/Editor and a selection of pre-compiled versions of the TEI DTD.

The final day of the workshop was devoted to dictionary encoding, with examples from Portuguese and French dictionaries, to a demonstration of SARA, the SGML-aware interactive concordance software being developed for the British National Corpus, and to more hands-on work. We finished with a plenary discussion of the workshop, in which the participants gave the editors a number of very useful suggestions, which will, we hope, benefit participants in future workshops.

We thank the research group and in particular Prof. Theresa Lino for their invitation and kind hospitality --- and for their patience with our non-existent Portuguese and imperfect French (on average, that is: LB's French is impeccable, MSM's is, well, highly peccable). Thanks are also due to SoftQuad, for a set of temporary licenses for Author/Editor, and to the British National Corpus for authorizing the demonstration of SARA.

Research groups, professional societies, or others interested in organizing workshops on the use of SGML and the TEI in their particular fields should contact the editors. We are planning a workshop for this fall, in which we will train a number of individuals to teach such workshops, and we hope to be in a position to accommodate as many such requests as humanly possible in the next couple of years.