Minutes of the Meeting of the Metalanguage and Syntax Issues Committee
Held at the University of Illinois at Chicago, March 9-11, 1990

Lou Burnard
Document Number: TEI MLM23
22 March 1990

Present: David Barnard, chair (DB); Lou Burnard (LB); David Durand (DD); Jean-Pierre Gaspart (JPG); Nancy Ide (NI); Lynne Price (LAP); Michael Sperberg-McQueen (MSM); Frank Tompa (FT)

Final, April 24, 1990

INTRODUCTORY BUSINESS

DB welcomed the committee and apologized for the hiatus in committee progress consequent on an unanticipated increase in his administrative burdens at Queens. He presented an agenda [MLA21] for the meeting, which was accepted.

1. MINUTES OF PREVIOUS MEETING

These were accepted with minor corrections as marked.

2. DOCUMENT REVIEW

The committee proceeded to a review of the ten documents listed on the agenda.

MLA11 Statement of Work (misnumbered as MLW11 in the agenda)

The introductory sections were accepted. Some rephrasing of the sections on mapping between TEI and exogenous schemes was felt to be necessary. The possibility of information loss when going from TEI into a comparatively impoverished coding should be explicitly stated. FT noted that, given a mapping f into TEI and another mapping g out of TEI, while f(g(x)) would not be equal in information content to x, g(f(x)) always should be. JPG felt that if it was stated that one mapping was a filtering, it should also be stated that the other was a default. DB agreed to draft a few sentences to clarify, which were presented to and accepted by the committee on the following day.

MLW12 List of Encoding Schemes

There was some discussion of the categorization used by the document. Items listed had been chosen both for their typicality and their generality, which needed to be stated more clearly in the text. The document had been intended to suggest likely work items for the subgroup rather than be prescriptive in any sense.
In many cases inclusion in list A rather than list B was entirely arbitrary: if two schemes were effectively identical with respect to the tools needed to translate them, either of them might be in list A. FT suggested that items in B should be related explicitly to the items in A which provided the tools required for them. LAP suggested that a further category D, for schemes which the committee felt it fruitless even to consider, might be useful: this was not agreed. LB suggested that Microsoft RTF might profitably be moved from list B to list A. It was agreed that the list should be regarded solely as suggestions for initial work items for the Translation Subgroup. LB undertook to make the minor changes requested in its wording. If FT and NI chose to revise it substantially, it would be renumbered.

MLW13 Guidelines

Most of this was accepted with minor amendments to the wording and revisions in the layout. It was felt that a four-column table should be presented at the end and that only two indications were needed for each SGML feature listed: whether or not it was recommended for local use and whether or not it was recommended for interchange. A YES recommendation indicated that committees might use the feature, not that it was required. It should be made clearer that different DTDs might be used for interchange and for local use. DB was also requested to remove redundant quotation marks, tighten up the wording and flatten the structure of the document.

There were a number of errors in the draft SGML declaration, of both transcription and fact, which MSM agreed to fix after consulting with JPG.

There was much discussion about the consequences of imposing constraints additional to those supported by an SGML parser. Because the granularity of constraints enforceable by an SGML parser was too coarse, and because some TEI constraints might relate to data content, a TEI validator might be necessary.
FT asked whether such a validator would have access only to the ESIS (element structure information set, as output by the parser) of a document or to the document instance itself. An inconclusive straw vote was taken on the first day of the meeting. On the second day FT argued the case for TEI-conforming (as opposed to validating) applications, which would not need the full power of an SGML parser and could therefore be cheaper and quicker. A TEI-validating application would also need to duplicate much of the functionality of the parser. FT recommended that in considering restrictions additional to those of SGML, we should bear in mind the additional cost of a TEI validator, which should clearly be much reduced by the presence of an SGML parser. Lexical constraints beyond SGML must be accompanied by algorithms for their enforcement; JPG added that such algorithms should not change the semantics or style of SGML.

There was some discussion of character set issues. MSM reported briefly on the proposals made at the Text Representation committee the previous week to recommend use solely of the non-national IRV characters which formed the intersection of the ASCII and EBCDIC sets (the so-called "cosmopolitan" character set). It was agreed for now to recommend the use of the reference concrete syntax only (even though this would imply usage of the non-cosmopolitan ("parochial"?) characters !, [ and ]: exclamation mark and square brackets).

JPG brought to the committee's attention the existence of a recently published ISO Standard for an SGML Document Interchange Format (SDIF, ISO 9069). This was accepted as a new work item. DB read the document overnight and reported that it provided useful organizational principles for packaging the various entities involved in document interchange, and for dealing with presentational problems. There was no software available to support it.
Despite some reservations over the standard's apparent espousal of the escape-sequence mechanism in presentational problems, the committee agreed that a working paper on the use of SDIF or something similar within TEI should be produced.

DB    To draft WP on SDIF    Due: no date

LB opined that money might be forthcoming from the British Library to fund a research project into its applicability.

MLW14 Bibliography

DB circulated a copy of the formatted bibliography. This would be available as a series of SGML tagged files from the LISTSERV in the near future and also as a Queens Technical Report. An abbreviated form was to be published in the journal Literary & Linguistic Computing. Corrections and additions should be sent to DB, who would maintain the database on the LISTSERV.

MLW15 Introductory Guide to SGML

LB had produced an initial draft of some parts of this and had also reviewed other existing introductory materials. He asked for suggestions for topics to be included and undertook to circulate a draft for comment in the near future. JPG suggested that concurrent document structures should be presented as the norm, with singly hierarchic documents and their tagging as a further refinement, rather than the reverse as is normal current practice.

MLW18 SGML Technical Problems

The committee reviewed this discussion paper, and proposed several detailed amendments.

MSM    To revise MLW18    Due: no date

MLW19 Document List

Documents W8, W9 and W16 were retired. W11 was renumbered to A11. The new drafts of W18 and W13, once produced, would be renumbered as P18 and P13 respectively, to indicate their public status.

MLW20 SGML-Aware Software

LAP commented that the goal of this paper was not so much to explain why SGML parsers were necessary as to warn about the difficulties of doing without them.
SGML was not a language for which classic computing approaches worked: it would therefore be useful to warn people of the potential pitfalls they would encounter when developing ad hoc parsing software. The need for such a document was recognized, and JPG and LAP agreed to work on it. It was noted that some restrictions implied by parsing difficulties might additionally be noted in MLW13.

LAP, JPG    To draft Parser Pitfalls Document    Due: eventually

3. OUTSTANDING ACTIONS FROM MINUTES OF PREVIOUS MEETING

Minimization features

DB had been actioned to produce a document giving guidance on the use of minimization. MSM explained that in an early draft of the Committee's report, its previous chair had listed a number of problems resulting from omittag minimization, all of which could be circumvented by some simple guidelines. Although not our main responsibility, provision of these guidelines would probably be useful advice for DTD writers and others. This led to a lengthy discussion: there was disagreement both as to whether other committees should concern themselves with minimization at all (assuming that they were even writing DTDs or parts of them) and as to whether we should concern ourselves with DTDs relating to data capture (or input) only. It was agreed that a suggestion should be passed to the steering committee for a taskforce to consider data capture methods during the second cycle.

After some further discussion of its likely contents, the need for a document providing guidelines for TEI writers was agreed. It should make clear that minimization was not a cause for concern and that data capture was independent of structure. JPG suggested that input of tabular material would form a good exemplary illustration and agreed to produce a short paper.

JPG    To draft W22: minimization and data capture    Due: 24 March 90

W17 DTD style guidelines

The committee reviewed at length DD's initial draft proposals.
It was agreed that guidelines on the layout of tagged document instances were inappropriate and should not be included. JPG noted that if we wished to define a standard format for such a purpose, generating it would be the responsibility of a particular application, a "beautifier". DB noted that most examples to be included in the June report would in any case be irredeemably ugly as a result of their innate complexity.

The committee agreed that where possible all departures from standard TEI DTDs should be implemented by a single entity reference, thus:

    %localmods; ]>

After some discussion, the committee also agreed that declarations of related objects should be physically adjacent in a DTD; FT pointed out that the grouping might preserve information which would be lost by reordering the objects as an alphabetical sequence within type. It was also agreed that indexes of names and tree structure diagrams should be suggested as helpful ways of conveying the structure of a large complex DTD. Comments should be provided on everything.

DD and MSM agreed to provide a detailed set of proposals for naming conventions. To assist them, the following short list of issues to be addressed was compiled:

* structuring the name space, semantically or hierarchically
* names in different name spaces (e.g. attributes)
* case sensitivity
* abbreviation conventions
* length
* Hamming distance (aka confusability)
* natural language
* translatability
* part of speech
* constituent characters (dots, underscore etc.)
* id/idref conventions

There was some debate as to whether any taxonomy of attributes existed and, if so, whether it should be reflected in the name space.

DD, MSM    To propose naming conventions    Due: 30 March 90

4. TAG SET FOR TEI DOCUMENTS

DB and MSM had agreed that the TEI Guidelines and its internal documentation would so far as possible use the tag set described in Annex E of ISO 8879, augmented by the editors.
MSM would convert documents prepared in this way for use with the GML formatter; DB would convert to the LaTeX formatter, for which he had a SED script. NI asked what formatter was available for Macintosh systems: LB suggested that Author/Editor or XGML could be used.

5. SPECIFYING AND IMPLEMENTING TRANSLATIONS

This item was deferred to a smaller meeting (see section 7 below).

6. ML CONTRIBUTIONS TO THE TEI REPORT

MSM gave a brief overview of the structure of the report as presented in section 7.2 of EDP1. The Guidelines would be in three parts: discursive prose, an alphabetic list of tags and attributes, and indexes and illustrative material. The other committees were producing drafts which the editors would need to have complete by 1st April so as to produce a first draft for the steering committee by 1st May.

DB pointed out that producing Appendix 2 on translation could not be done until the target tagset was specified: MSM agreed to forward copies of the currently available feature lists from section 6 immediately to DB, FT, NI and JPG.

There was some discussion of section 3, for which FT proposed the revised title "Data modelling and representation". It was agreed that most of the text for 3.1 would be taken from MLW15 and most of 3.2 from MLW13. The current list of contents for 3.3 was felt to be largely inappropriate, given the statement in MLW13 that SGML would be used as the basis for TEI work.

There was much debate as to the proper place to locate non-SGML declarations such as "this is my morphological analysis based on x y and z as modified by q". It was agreed that such declarations were effectively part of the content of a document, though it might be desirable to distinguish them as a marked section. A tag such as TEI-INFO might be used, with a variety of constituents such as SYNTACTIC-INFO, MORPHOLOGICAL-INFO etc. Specifying these was generally felt to be more within the prerogative of the Documentation committee.
In discussing section 9, it became apparent that there had been no explicit reference in the document draft to the creation of TEI DTDs and their use. It was therefore agreed to rename this section "Using the Guidelines" and to add a new section on "Building a DTD". Several models for ways in which feature lists could be assembled into TEI DTDs were proposed (the Chinese menu, the pizza topping, the table d'hôte...), of which the Pizza Model was preferred. This model has a base into which any number of arbitrarily complex structures may be incorporated as inclusion exceptions. When committee drafts propose aggregates (subtrees, crystals, toppings) they can thus be included as appropriate. A description of the model would be drafted for this section by DB and NI, within a week.

NI, DB    To draft new section 9.1    Due: 17 March 90

7. OTHER BUSINESS

There was no further business, and no date for the next plenary meeting was proposed. It was agreed that the Translation Workgroup would meet to plan its remaining work for Appendix 2 on March 11, 1990.