Minutes of the Meeting
            Of the Metalanguage and Syntax Issues Committee
    Held at the University of Illinois at Chicago, March 9-11, 1990
 
                              Lou Burnard
                      Document Number:  TEI MLM23
                             22 March 1990
 
Present:  David Barnard, chair (DB); Lou Burnard (LB); David Durand
(DD); Jean-Pierre Gaspart (JPG); Nancy Ide (NI); Lynne Price (LAP);
Michael Sperberg-McQueen (MSM); Frank Tompa (FT)
 
                         Final, April 24, 1990
 
 
 
                         INTRODUCTORY BUSINESS
 
   DB welcomed the committee and apologized for the hiatus in committee
progress consequent on an unanticipated increase in his administrative
burdens at Queen's.  He presented an agenda [MLA21] for the meeting,
which was accepted.
 
                    1.  MINUTES OF PREVIOUS MEETING
 
   These were accepted with minor corrections as marked.
 
                          2.  DOCUMENT REVIEW
 
   The committee proceeded to a review of the ten documents listed on
the agenda.
 
MLA11 Statement of Work (misnumbered as MLW11 in the agenda)
 
   The introductory sections were accepted.  Some rephrasing of the
sections on mapping between TEI and exogenous schemes was felt to be
necessary.  The possibility of information loss when going from TEI into a
comparatively impoverished coding should be explicitly stated.  FT noted
that, given a mapping f into TEI and another mapping g out of TEI, while
f(g(x)) would not be equal in information content to x, g(f(x)) always
should.  JPG felt that if it was stated that one mapping was a filter-
ing, it should also be stated that the other was a default.  DB agreed
to draft a few sentences to clarify, which were presented to and accept-
ed by the committee on the following day.
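 
   FT's point about the two mappings can be illustrated with a minimal
sketch.  The record layout, the field names and the "unknown" default
are all hypothetical, chosen only to show a filtering mapping g out of
a richer scheme and a defaulting mapping f into it; they are not taken
from any TEI document.

```python
# Hypothetical illustration: none of these names come from the TEI documents.
# A "rich" record stands in for a TEI encoding; a "poor" record stands in
# for an exogenous scheme that cannot express every TEI distinction.

def f(poor):
    """Map an exogenous record into the rich scheme, filling in a default."""
    return {"text": poor["text"], "lang": "unknown"}

def g(rich):
    """Map a rich record out to the exogenous scheme (a filtering)."""
    return {"text": rich["text"]}

poor_doc = {"text": "Arma virumque cano"}
rich_doc = {"text": "Arma virumque cano", "lang": "la"}

# Exogenous -> TEI -> exogenous loses nothing.
assert g(f(poor_doc)) == poor_doc

# TEI -> exogenous -> TEI cannot recover what the filter dropped.
assert f(g(rich_doc)) != rich_doc
```

   In this sketch the asymmetry arises only because g discards a field
that f can restore solely by default, which is the pairing of filtering
and default that JPG asked to have stated.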
 
 
MLW12 List of Encoding Schemes
 
   There was some discussion of the categorization used by the document.
Items listed had been chosen both for their typicality and their gener-
ality, which needed to be stated more clearly in the text.  The document
had been intended to suggest likely work items for the subgroup rather
than be prescriptive in any sense.  In many cases inclusion in list A
rather than list B was entirely arbitrary -- if two schemes were effec-
tively identical with respect to the tools needed to translate them,
either of them might be in list A.  FT suggested that items in B should
be related explicitly to the items in A which provided the tools
required for them.  LAP suggested that a further category D for schemes
which the committee felt it fruitless even to consider might be useful:
this was not agreed.  LB suggested that Microsoft RTF might profitably
be moved from list B to list A.  It was agreed that the list should be
regarded solely as suggestions for initial work items for the Transla-
tion Subgroup.  LB undertook to make minor changes requested in its
wording.  If FT and NI chose to revise it substantially, it would be
renumbered.
 
MLW13 Guidelines
 
   Most of this was accepted with minor amendments to the wording and
revisions in the layout.  It was felt that a four-column table should be
presented at the end and that only two indications were needed for each
SGML feature listed -- whether or not it was recommended for local use
and whether or not it was recommended for interchange.  A YES recommen-
dation indicated that committees might use the feature, not that it was
required.  It should be made clearer that different DTDs might be used
for interchange and for local use.  DB was also requested to remove
redundant quotation marks, tighten up the wording and flatten the struc-
ture of the document.
 
   There were a number of errors in the draft SGML declaration, of both
transcription and fact, which MSM agreed to fix after consulting with
JPG.
 
   There was much discussion about the consequences of imposing
constraints additional to those supported by an SGML parser.  Both
because the granularity of the constraints enforceable by an SGML parser
was too coarse, and because some TEI constraints might relate to data
content, a TEI validator might be necessary.  FT asked whether such a
validator would have access only to the ESIS (element structure
information set, as output by the parser) of a document or to the
document instance itself.
An inconclusive straw vote was taken on the first day of the meeting.
On the second day FT argued the case for TEI-conforming (as opposed to
validating) applications which would not need the full power of an SGML
parser and could therefore be cheaper and quicker.  A TEI-validating
application would also need to duplicate much of the functionality of
the parser.  FT recommended that in considering restrictions additional
to those of SGML, we should bear in mind the additional cost of a TEI
validator, which should clearly be much reduced by the presence of an
SGML parser.  Lexical constraints beyond SGML must be accompanied by
algorithms for their enforcement; JPG added that such algorithms should
not change the semantics or style of SGML.
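 
   As a rough sketch of what a validator working only from parser
output might look like, the following checks one lexical constraint on
data content.  It assumes that the parser emits one event per line in
the style of the sgmls/nsgmls text output ("(GI" opens an element,
")GI" closes it, "-text" carries character data); the DATE element and
its YYYY-MM-DD rule are hypothetical and are not proposed anywhere in
the committee's papers.

```python
import re

# Assumption: one parser event per line, in the style of sgmls/nsgmls
# output.  The DATE element and its lexical rule are hypothetical.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def check_dates(esis_lines):
    """Return the DATE contents that break the lexical constraint."""
    errors = []
    stack = []    # names of the currently open elements
    buffer = []   # character data collected inside the current DATE
    for line in esis_lines:
        kind, rest = line[:1], line[1:]
        if kind == "(":
            stack.append(rest)
            if rest == "DATE":
                buffer = []
        elif kind == "-" and stack and stack[-1] == "DATE":
            buffer.append(rest)
        elif kind == ")":
            if rest == "DATE":
                content = "".join(buffer)
                if not DATE_PATTERN.fullmatch(content):
                    errors.append(content)
            if stack:
                stack.pop()
    return errors

events = ["(DOC", "(DATE", "-1990-03-09", ")DATE",
          "(DATE", "-9 March 1990", ")DATE", ")DOC"]
print(check_dates(events))
```

   Because structural checking is left to the parser, the sketch needs
only a stack and a pattern match, which is the kind of cost reduction
FT had in mind.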
 
   There was some discussion of character set issues.  MSM reported
briefly on the proposals made at the Text Representation committee of
the previous week to recommend use solely of the non-national IRV char-
acters which formed the intersection of the ASCII and EBCDIC sets (the
so-called "cosmopolitan" character set).  It was agreed for now to
recommend the use of the reference concrete syntax only (even though
this would imply usage of the non-cosmopolitan ("parochial"?) characters
! [ and ] (exclamation mark and square brackets)).
 
   JPG brought to the committee's attention the existence of a recently
published ISO Standard for an SGML Document Interchange Format (SDIF,
ISO 9069).  This was accepted as a new work item.  DB read the document
overnight and reported that it provided useful organizational principles
for packaging the various entities involved in document interchange,
and for dealing with presentational problems.  There was no software
available to support it.  Despite some reservations over the standard's
apparent espousal of the escape-sequence mechanism in presentational
problems, the committee agreed that a working paper on the use of SDIF
or something similar within TEI should be produced.
 
      DB To draft WP on SDIF
      Due:  no date

LB opined that money might be forthcoming from the British Library to
fund a research project into its applicability.
 
MLW14 Bibliography
 
   DB circulated a copy of the formatted bibliography.  This would be
available as a series of SGML tagged files from the LISTSERV in the near
future and also as a Queen's Technical Report.  An abbreviated form was
to be published in the journal Literary & Linguistic Computing.  Correc-
tions and additions should be sent to DB who would maintain the database
on the LISTSERV.
 
MLW15 Introductory Guide to SGML
 
   LB had produced an initial draft of some parts of this and had also
reviewed other existing introductory materials.  He asked for sugges-
tions for topics to be included and undertook to circulate a draft for
comment in the near future.  JPG suggested that concurrent document
structures should be presented as the norm with singly hierarchic docu-
ments and their tagging as a further refinement, rather than the reverse
as is normal current practice.
 
MLW18 SGML Technical Problems
 
   The committee reviewed this discussion paper, and proposed several
detailed amendments.
 
      MSM To revise MLW18
      Due:  no date
 
MLW19 Document List
 
   Documents W8, W9 and W16 were retired.  W11 was renumbered to A11.
The new drafts of W18 and W13, once produced, would be renumbered as P18
and P13 respectively, to indicate their public status.
 
MLW20 SGML-Aware Software
 
   LAP commented that the goal of this paper was not so much to explain
why SGML parsers were necessary as to warn about the difficulties of
doing without them.  SGML was not a language for which classic computing
approaches worked:  it would therefore be useful to warn people of the
potential pitfalls they would encounter when developing ad hoc parsing
software.  The need for such a document was recognized and JPG and LAP
agreed to work on it.  It was noted that some restrictions implied by
parsing difficulties might additionally be noted in MLW13.
 
      LAP, JPG To draft Parser Pitfalls Document
      Due:  eventually
 
        3.  OUTSTANDING ACTIONS FROM MINUTES OF PREVIOUS MEETING
 
Minimization features
 
   DB had been actioned to produce a document giving guidance on the use
of minimization.  MSM explained that in an early draft of the Commit-
tee's report, its previous chair had listed a number of problems result-
ing from omittag minimization, all of which could be circumvented by
some simple guidelines.  Although not our main responsibility, provision
of these guidelines would probably be useful advice for DTD writers and
others.  This led to a lengthy discussion:  there was disagreement both
as to whether other committees should concern themselves with
minimization at all (assuming that they were even writing DTDs or parts
of them) and as to whether we should concern ourselves with DTDs relating
to data capture (or input) only.  It was agreed that a suggestion should
be passed to the steering committee for a taskforce to consider data
capture methods during the second cycle.  After some further discussion
of its likely contents, the need for a document providing guidelines for
TEI writers was agreed.  It should make clear that minimization was not
a cause for concern and that data capture was independent of structure.
JPG suggested input of tabular material would form a good exemplary
illustration and agreed to produce a short paper.
 
      JPG To draft W22:  minimization and data capture
      Due:  24 March 90
 
W17 DTD style guidelines
 
The committee reviewed at length DD's initial draft proposals.  It was
agreed that guidelines on the layout of tagged document instances were
inappropriate and should not be included.  JPG noted that if we wished
to define a standard format for such a purpose, generating it would be
the responsibility of a particular application, a "beautifier".  DB not-
ed that most examples to be included in the June report would in any
case be irredeemably ugly as a result of their innate complexity.  The
committee agreed that where possible all departures from standard TEI
DTDs should be implemented by a single entity reference, thus:
 
         <!DOCTYPE thing PUBLIC "tei-nnn" [
                   <!entity % localmods SYSTEM "c:\project\local.dtd">
         <!--
              local mods standard for the project included in the
              file local.dtd will be implemented by the call at the end
              of this example
 
              any other mods unique to this doc go after this comment
          -->
         <!element p - - (#PCDATA)
              -- redefine p element for this document only --      >
 
         %localmods;
 
         ]>
 
   After some discussion, the committee also agreed that declarations of
related objects should be physically adjacent in a DTD:  FT pointing out
that the grouping might preserve information which would be lost by
reordering the objects as an alphabetical sequence within type.  It was
also agreed that indexes of names and tree structure diagrams should be
suggested as helpful ways of conveying the structure of a large complex
DTD.  Comments should be provided on everything.
 
   DD and MSM agreed to provide a detailed set of proposals for naming
conventions.  To assist them, the following short list of issues to be
addressed was compiled:
 
*   structuring the name space, semantically or hierarchically
*   names in different name spaces (e.g. attributes)
*   case sensitivity
*   abbreviation conventions
*   length
*   Hamming distance (aka confusibility)
*   natural language
*   translatability
*   part of speech
*   constituent characters (dots, underscore etc)
*   id/idref conventions

There was some debate as to whether any taxonomy of attributes existed
and if so whether it should be reflected in the name space.
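 
   The Hamming distance item in the list above lends itself to a
mechanical check.  The following is a minimal sketch only: the tag
names are hypothetical, and padding the shorter name with spaces is an
assumption made here so that names of unequal length can be compared
(Hamming distance proper is defined for strings of equal length).

```python
from itertools import combinations

def hamming(a, b):
    """Count differing positions; the shorter name is padded with spaces."""
    width = max(len(a), len(b))
    return sum(x != y for x, y in zip(a.ljust(width), b.ljust(width)))

def confusable_pairs(names, threshold=1):
    """Return pairs of names whose distance is at or below the threshold."""
    return [(a, b) for a, b in combinations(sorted(names), 2)
            if hamming(a, b) <= threshold]

# Hypothetical tag names, not taken from any TEI draft.
tags = ["note", "name", "node", "para", "head"]
print(confusable_pairs(tags))
```

   A naming convention could then require that no two names in the same
name space fall at or below a chosen threshold.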
 
      DD, MSM To propose naming conventions
      Due:  30 March 90
 
                     4.  TAG SET FOR TEI DOCUMENTS
 
   DB and MSM had agreed that the TEI Guidelines and its internal docu-
mentation would so far as possible use the tag set described in Annex E
of ISO 8879, augmented by the editors.  MSM would convert documents pre-
pared in this way for use with the GML formatter; DB would convert to
the LaTeX formatter, for which he had a SED script.  NI asked what
formatter was available for Macintosh systems:  LB suggested that
Author/Editor or XGML could be used.
 
              5.  SPECIFYING AND IMPLEMENTING TRANSLATIONS
 
   This item was deferred to a smaller meeting (see section 7 below).
 
                 6.  ML CONTRIBUTIONS TO THE TEI REPORT
 
   MSM gave a brief overview of the structure of the report as presented
in section 7.2 of EDP1.  The Guidelines would be in three parts:
discursive prose, an alphabetic list of tags and attributes, and indexes
and illustrative material.  The other committees were producing drafts
which the editors would need to have complete by 1st April so as to
produce a first draft for the steering committee by 1st May.
 
   DB pointed out that producing Appendix 2 on translation could not be
done until the target tagset was specified:  MSM agreed to forward cop-
ies of the currently available feature lists from section 6 immediately
to DB, FT, NI and JPG.
 
   There was some discussion of section 3, for which FT proposed the
revised title "Data modelling and representation".  It was agreed that
most of the text for 3.1 would be taken from MLW15 and most of 3.2 from
MLW13.  The current list of contents for 3.3 was felt to be largely
inappropriate, given the statement in MLW13 that SGML would be used as
the basis for TEI work.  There was much debate as to the proper place to
locate non-SGML declarations such as "this is my morphological analysis
based on x y and z as modified by q".  It was agreed that such declara-
tions were effectively part of the content of a document, though it
might be desirable to distinguish them as a marked section.  A tag such
as TEI-INFO might be used, with a variety of constituents such as
SYNTACTIC-INFO, MORPHOLOGICAL-INFO etc.  Specifying these was generally
felt to be more within the prerogative of the Documentation committee.
 
   In discussing section 9, it became apparent that there had been no
explicit reference in the document draft to the creation of TEI DTDs and
their use.  It was therefore agreed to rename this section "Using the
Guidelines" and to add a new section on Building a DTD.  Several models
for ways in which feature lists could be assembled into TEI DTDs were
proposed (the Chinese menu, the pizza topping, the table d'hôte...), of
which the Pizza Model was preferred.  This model has a base into which
any number of arbitrarily complex structures may be incorporated as
inclusion exceptions.  When committee drafts propose aggregates (sub-
trees, crystals, toppings) they can thus be included as appropriate.  A
description of the model would be drafted for this section by DB and NI,
within a week.
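 
   The Pizza Model as described can be sketched by generating an SGML
element declaration whose inclusion exceptions are the "toppings".
This is an illustration under stated assumptions, not the committee's
DTD: the helper function, the base content model and the topping names
are all hypothetical.

```python
def element_declaration(name, content_model, toppings=()):
    """Build an SGML element declaration with optional inclusion exceptions."""
    declaration = f"<!ELEMENT {name} - - {content_model}"
    if toppings:
        # SGML writes inclusion exceptions as +(a|b|...) after the model.
        declaration += " +(" + "|".join(toppings) + ")"
    return declaration + ">"

# Hypothetical base and toppings; none of these names come from a TEI draft.
base = element_declaration("text", "(front?, body, back?)")
with_toppings = element_declaration(
    "text", "(front?, body, back?)", toppings=["note", "pagebreak"])
print(base)
print(with_toppings)
```

   Each topping would still need its own element declaration elsewhere
in the DTD; the sketch shows only how the base stays unchanged while
aggregates are added to it.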
 
      NI, DB To draft new section 9.1
      Due:  17 March 90
 
                           7.  OTHER BUSINESS
 
   There was no further business, and no date for the next plenary meet-
ing was proposed.  It was agreed that the Translation Workgroup would
meet to plan its remaining work for Appendix 2 on March 11, 1990.
 