Minutes of the Metalanguage and Syntax Committee Meeting
Kingston, 4-5 March 1991
Lou Burnard
Document Number: TEI ML M45
31 March 1991
Draft April 26, 1991 (16:48:22)

Present: David Barnard (DTB), chair; David Durand (DD); Frank Tompa (FWT); Lou Burnard (LB); C. M. Sperberg-McQueen (MSM); Doug Hamilton (DH).

Apologies for absence: Nancy Ide and Lynne Price.

1 AGENDA

The agenda provided by DTB was accepted, save that item 3 (the Literary Work Group Critique) was moved to the end.

2 MINUTES OF OXFORD MEETING (ML M33)

Approved as a true record.

3 UPDATE ON OVERALL STATUS OF THE PROJECT

MSM briefly recapped the current status of the various work groups which had been set up since the last meeting, as summarised by a recent posting on TEI-L. Most, though not all, work groups were still quiescent.

DTB asked what further input to the TEI Guidelines was anticipated from the committee. MSM replied that it was clear that a substantial reorganisation of the current P1 would be needed, probably into several publications. DTB suggested that papers on transduction methods, already a work item for the committee, might be helpful. MSM agreed and reminded the committee that the current chapter 8 also needed to be recast, perhaps as a "cookbook" on DTD manipulation.

On software, LB reported that Yard Software had offered a special deal for MarkIt, their PC-based SGML parser. The price of an individual licence was £100, discounted by 50% if bought in bulk.

Note: The TEI Steering Committee has since agreed to release funds to pursue the bulk purchase option.

Martin Bryan of Yard was interested in developing a TEI-specific version of WriteIt, Yard's low-end data entry system, and LB had supplied him with copies of the P1 DTDs to pursue this possibility. FWT asked how such customization would cope with the possibility of extensions to the DTD; LB replied that the extension capability was not yet implemented in the DTDs.

DH reported that some problems had been experienced in installing the beta test version of Electronic Book Technology's DynaText product, though these were believed to have been fixed in the first official release. This release was available to TEI participants at a discounted price of $2500. It was also noted that Software Exoterica's parser was available through GCA for under $100.

4 ACTION ITEMS FROM PREVIOUS MEETING

Parenthesized numbers in this list refer to the numbered points in the minutes of the previous meeting.

Parser Pitfalls Document (3): LAP had sent the required information to Wendy Plotkin.

ACTION: MSM to follow up availability of LAP's notes outside Hewlett Packard
Due: 13 March

Document Review (5): All documents available in electronic form had now been placed on the UICVM server, including W12. Actions on ODA and on liaison with ANSI had been carried out. Discussion of the action on DTB to propose a package of documents to go with W32 was deferred.

SGML declaration revision (8): Deferred, pending further discussion of the implications of disabling both SHORTTAG and OMITTAG.

DTD Manipulator (10): The editors had not yet formulated their requirements in this area; DTB would report on some experiments with indirect DTDs later in the meeting.

Software Assessment (11): LB apologised for having made no progress on this item.

Referral of user comments (12): Only one set of comments (Joan Smith's) had been referred to the committee so far.

5 CURRENT WORKING PAPERS

An updated document register was circulated.

W14 : SGML Bibliography

DTB circulated advance copies of this document, now published as a Queen's University Technical Report, and briefly described how it had been produced. A smaller version was to be published in Literary and Linguistic Computing. There was some discussion as to the usefulness of continuing to add to the document in its current form.
FWT noted that it might be helpful to distinguish items of relevance to TEI from items of general SGML interest. While it was generally felt that several items were of only ephemeral interest, the committee expressed its appreciation for the hard work done by the editors of the bibliography, in particular Robin Cover. It was agreed to review the status of the document as a means of providing information about current SGML publications at the next meeting.

W17 : House style for DTDs

There was some debate as to the need for perpetuating the use of version numbers. The consensus was that such numbers should be used only for internal preliminary drafts: the 1992 TEI publication would not have a version number. DTB noted the absence of discussion of parameter entities in the document and asked what other topics were missing. LB mentioned an article on "The well dressed DTD" in a recent issue of <TAG> and offered to circulate a copy. It was agreed that the current document should be retired.

W18 : SGML Technical Questions

MSM reviewed his revisions to this discussion document. FWT felt it had a potentially wider audience than working groups in need of recommendations for dealing with well-recognised thorny problems such as overlapping segments. DD said that it was a toolbox document not suitable for end users. A number of minor revisions were requested (all attribute values to be lower-cased and all ID values to be quoted; re-use the M1 action "Nino sits at table" and comment on this possibility; refer to Hytime work and any papers from TR3; rename x and y to something less co-ordinate-like such as start and end; check that all examples parse); the document should be closed once these had been carried out.

ACTION: MSM To revise W18
Due: 30 April

W22 : Notes on minimisation

No progress had been reported. The action on JPG continued, with a revised due date of 30 May.

W25 : Parser pitfalls

No progress reported.
Once MSM had confirmed the availability of the existing draft from Hewlett Packard, LAP should be requested to produce a revised draft by 30 June.

W26 : Naming conventions

All names (other than entity names) should use lowercase only, with the possible exception of phrasal names such as "partOfSpeech". By a narrow majority (3:2) the committee found this a less appalling prospect than hyphenating such names; wherever possible, however, phrasal names should be avoided. The last sentence of paragraph 1 on page 3 should be removed. The mention of "cartesian products" needed expansion or removal. It was noted that the current indirection method privileged the English language by allowing for only one set of parameter entity names, and the editors were requested to seek guidance from the SC as to the acceptability of this, given the project's commitment to language independence.

ACTION: LB, MSM To request guidance from SC as to acceptability of monocultural parameter entities
Due: not specified

The general rule for use of abbreviation should be to avoid it wherever possible and to use only abbreviations recognised in the field where it was not. The detailed suggestions in the document should be curtailed. With these minor revisions it was agreed that the document was complete.

ACTION: MSM, DD to revise W26 as specified above
Due: 30 April

W30 : Transduction examples

No progress on a general document had been made. One example (converting LOB to TEI, using SED scripts) existed and had been circulated but not fully documented by Nick Duncan. There was some debate as to the feasibility of producing a general theoretical framework for transduction within the timescale of the project. It was agreed that producing a number of illustrative examples would be a more realistic goal.
These should document a specific transduction in prose, demonstrating the adequacy of TEI to represent a wide variety of schemes for interchange and giving examples of useful general purpose techniques. FWT noted that such examples might not always be easy to produce. A toy (partial translation of a specific text) was easy; a sketch of notes towards a more general solution was more difficult; a generic clean solution was probably close to magic. LB reminded the meeting of the list of target encoding schemes already proposed in ML W12. After some discussion, it was agreed that nine specific transductions should be documented in nine separate working papers, as listed below.

ACTION: DTB To draft W34 documenting the LOB transduction
Due: 30 April

ACTION: JPG To draft W35 documenting transduction of LaTeX and RTF
Due: 30 April

ACTION: NI To draft W36 documenting transduction of dictionary examples
Due: 30 April

ACTION: FWT To draft W37 documenting transduction of Wire service stories from the Ottawa Citizen
Due: 30 April

ACTION: LB To draft W38 documenting simple conversion of SGML to LaTeX with a Spitbol program
Due: 30 April

ACTION: LB To draft W39 documenting conversion program used for the Dictionary of Old English corpus
Due: 30 April

ACTION: LB, Harry Weitenberg To draft W40 documenting progress in translating COCOA formatted dramatic texts using XTRAN
Due: 30 April

ACTION: DD To draft W41 documenting transduction of TLG format texts
Due: 30 April

ACTION: DTB To draft W42 documenting transduction used by Gary Simons for his "Bear goes fishing" example
Due: 30 April

It was agreed that these working examples would provide a more useful basis for a general solution than the proposed topics of W30, which was retired.

W31 : ODA

No progress reported.
ACTION: DD To draft W31 on relevance of ODA to TEI
Due: 30 June

W32 : Revision of ISO 8879

FWT noted that illustrative examples should be included in support of the requirements identified. MSM had more comments to make on the document.

ACTION: MSM To circulate further proposals for inclusion in W32
Due: asap

ACTION: DTB To revise W32
Due: 31 March

6 CONFORMANCE

The issue of what exactly "TEI conformance" might entail had been raised in several quarters and was discussed at some length. A document might use its own simple scheme for representation of elements (a local storage or capture format); this could be mapped to an SGML conformant scheme using the corresponding TEI elements. For interchange a distinct character representation scheme was currently recommended by the Guidelines (using the ISO 646 subset). For transmission a further packing or encoding might be employed. Such discussion of TEI conformance as existed in the current draft did not clearly distinguish between character set conformance and DTD conformance. The character set work group would probably wish to make independent recommendations as to TEI character-set conformance.

FWT asserted, and it was agreed, that the notion of "TEI conformant software" was meaningless: TEI conformance related only to data. Software suppliers could be expected only to specify whether their software accepted or generated TEI-conformant documents of a specified type.

LB asked what characterized a TEI-conformant element structure: was it necessary only that it should be parsable with a TEI DTD? DTB said that there was a required minimal element set (tei.1, tei.header, text etc), and also noted that use of some extension mechanisms might be precluded. LB suggested that documentation of the level of tagging used in a text should also be required. FWT felt that it was unreasonable to require documentation of what had not been done in a text, and that it might prove difficult to specify what had.
There was some inconclusive discussion of how the presence of non-SGML markup affected conformance. It was felt that whatever could be simply converted to a TEI/SGML form (e.g. by using string substitution without look-ahead) should be converted. If a TEI tag existed for some marked-up feature of a text, then that tag should be used. DD proposed as a requirement that any non-SGML markup be documented in the header. FWT and MSM felt that identifying what constituted "non-SGML markup" was not a simple issue.

No consensus was reached on the topic, except that conformance related only to data (rather than to software) and that a more detailed and clearer presentation of the issues involved was necessary. A working paper (W43) was assigned to MSM, who proposed to address the following points in it:

* character set conformance
* changes in the SGML declaration (e.g. to delimiters, or in SGML features or quantities)
* variant DTDs using (or not using) the extension mechanism
* documentation of tag usage and of features present in text
* treatment of non-SGML markup

The editors were also asked to convey to the Steering Committee some sense of the issues involved and to canvass their views.

ACTION: MSM To draft W43 on Conformance
Due: 30 April

ACTION: Editors To ask SC views on what TEI-conformance should entail
Due: asap

7 LITERARY WORKGROUP "Critique"

The content of this document was discussed. It was noted that definitions of technical terms were missing from the current draft Guidelines and should be supplied. DTB felt that the current draft was less biassed in favour of descriptive over presentational markup and was also less prescriptive than the Critique suggested. MSM noted that such bias as existed had arisen out of the TR committee's requirements. FWT suggested that the ability to reconstruct the appearance of an encoded original was a reasonable requirement which should not be overlooked. The committee felt that the usability of e.g. the alignment mechanisms for multiple literary analyses had been overlooked, and agreed that this and other general purpose mechanisms discussed in chapter 6 of P1 needed to be more clearly and accessibly documented.

8 DTD MANIPULATOR

DH presented the work he and DTB had so far done on implementing the indirection mechanisms for DTD manipulation. Rather than a DTD for DTDs they had simply generated an indirect DTD from the existing text. The following supportable facilities and their interactions were discussed:

* rename elements and attributes
* add new elements
* clone an element
* add new attributes to an element
* change content model of an element
* change declared attribute value
* add new global attributes
* change contents of %soup and %broth

DH asked whether comments in the DTD should be preserved. It was agreed that the present chapter eight would need elaboration. The indirected DTDs would not be published in the text, but might be distributed electronically, with a number of tables to create language specific DTDs. The indirection algorithms would be documented and C source code provided.

ACTION: DTB To provide DTD manipulator as outlined above in electronic form
Due: 30 April

9 DATE OF NEXT MEETING

It was agreed that no date should be set for the next meeting until some of the assigned work papers were completed. A meeting in late summer or early autumn was likely.