[Note: Words noted in Italics in the printed version of the Minutes have
       been enclosed in *...* for this electronic version.- WP,
       TEI]
 
 
TEI AI7M12
==========
 
Meeting II of TEI AI7 Terminology, 1991-08-13/14,
at Oak Ridge National Laboratory in Oak Ridge, Tennessee.
Version 1, 1991-09-01
 
In attendance:
Dr. Alan K. Melby                          Chair
Dr. Gerhard Budin                          Members
Dr. Richard A. Strehlow
Dr. Sue Ellen Wright
Dr. Gregory Shreve                         Co-opted member
Dr. Michael Sperberg-McQueen               TEI Editor
Dr. Leland D. Wright                       Observer
 
Special Guests:
Dr. James Mason
Thomas O. Tallant
 
The committee wishes to express its thanks to Dr. Strehlow and to Oak Ridge
National Laboratory for their hospitality in making space available for these
meetings.
 
Procedural Comment
-------------------
 
Whereas the minutes of the first meeting reflect the ongoing development of
the reasoning of the WG, the minutes of this second meeting will serve to
document the development of a specific set of procedures governing the AI7
TEI.TERM exchange format.  The minutes themselves should be viewed as a kind
of developmental guide to a series of sub-documents that will eventually form
the core of a formal report.  These sub-documents are identified in the
minutes and cross-referenced wherever necessary.  Readers are encouraged to
move back and forth between these text segments in order to formulate an
overall picture of the proposed exchange format.
 
AI7 Sub-documents (included in AI7W13)
------------------
 
0.      Introductory Document
1.      Terminology
2.      Basic Structure of the termEntry
3.      Inter- and Intra-termEntry Links
4.      termEntry Tags, Attributes and Attribute Values
5.      Writing System Declarations and the *lang* Attribute
6.      AI7 DTD
7.      Sample <termEntry>s
 
 
Review of the Minutes of the Cleveland Meeting
----------------------------------------------
 
The first item on the agenda consisted in examining the minutes of the
previous meeting in Cleveland, Ohio (hereinafter called CLE Minutes).  The
discussion items are listed succinctly below cross-referenced by page and
paragraph number as they appear in the revised minutes (Version 4).
 
It was noted that the footnotes included in the revised minutes of 1991-07-29
should be moved to an addendum section in order to distinguish them from the
full text, which represents the actual content of the Cleveland meeting.
 
Shreve noted that "SGML compatible" should read "can be easily convertible to
SGML-conformant."  See Terminology Section and Item 7 below.
 
Page 3, Paragraph 4, l. 3 should read "AI6 on Computational Lexicons."
 
Page 8, Paragraph 3, l. 2 should read "ISO 639, Part II."
 
Page 9, header: the number "10" should be omitted.
 
The minutes were approved as corrected.  A revised version of the minutes will
be provided to the TEI Chicago office.
 
 
Other document-related comments
-------------------------------
 
The request was made that TEI.Term (AI7) documents be specifically forwarded
to the members of the dictionary committee.  Sperberg-McQueen agreed to
facilitate this request.
 
Related Reports
---------------
 
Dr. James Mason, convener of Working Group 8 on Text Description and
Processing Languages, ISO/IEC JTC1, Subcommittee 18 for Text and Office
Systems, reported that the SGML standard is entering the 5-year review
process.  Mason is a member of the Advanced Publications Technology Section of
the Publications Division of Oak Ridge National Laboratory.  He heads an SGML
team of four.  Together with Thomas O. Tallant, leader for SGML development
and application, Mason arranged for a demonstration of SGML parsers in both
the Macintosh and the PC environments.
 
Sperberg-McQueen promised to provide the WG with a discussion paper by Wilson,
"Rendering Hypertext Documents in SGML."
 
Melby provided the WG with a copy of the document "Computational Model of the
Dictionary Entry, Preliminary Report" authored by Calzolari, Peters and
Roventini.  The group expressed great interest in the progress of the
dictionary WG and registered a desire to see concrete examples of dictionary
entries to accompany the abstract templates provided in the cited paper.
 
Primary Discussion Items
------------------------
 
The initial examination of the CLE Minutes produced the following list of
issues for discussion:
 
1)      The basic structure of the <termEntry> as evidenced on pp. 8 & 10 of
        the CLE Minutes
 
2)      Normalization levels (degrees of normalization)
 
3)      <front> ... </front>
 
4)      Possible elimination of *xrefs*
        Inter- and intra-termEntry links
 
5)      Openness vis-a-vis the tag set & procedures for "opening" the set vs.
        declaring most data categories to be attribute values
 
6)      Floating vs. non-floating (unbound vs. bound) data categories
 
7)      "Easily convertible to SGML-conformant"
 
8)      TR1 & lang attribute (ISO 646).
 
9)      The ISO language codes: use of 2 vs 3-letter codes (ISO 639)
 
Item 7
------
The locus of the discussion concerning "easily convertible" related to the
long-standing use in some quarters of "SGML-compatible" to mean "easily
convertible to SGML."  Shreve correctly pointed out that "compatible" in
standard computer parlance does indeed mean "fully conformant."  The group
agreed to deprecate the term "compatible" as ambiguous and to note preference
for "conformant" over "conforming."  No specific term was found for "easily
convertible," which shall remain the approved formulation for describing this
type of format.
 
Items 3 and 9
-------------
 
The questions of front matter and whether to use 2 or 3 letter *lang*
codes were left unresolved during this meeting.  (See Addendum to these
minutes.)
 
Items 1-2 - Structure and Levels of Normalization
--------------------------------------------------
 
Items 1-2 have been incorporated into "AI7 Sub-document 2: Basic Structure of
the <termEntry>".
 
The discussion of the <termEntry> structure and the levels of normalization to
be provided within the TEI environment centered on one basic question: Where
should one posit the major conversion effort, on the side of the importer or
on the side of the exporter?  Sperberg-McQueen argued convincingly that the
interchange format must be highly normalized (which in essence means deeply
nested) in order to facilitate maximum flexibility on the import side.  While
acknowledging this concern, the WG expressed its sociological concerns about
the ultimate acceptance of the interchange format.  It is the consensus of the
group that the interchange format must also offer optimum flexibility on the
export side so that potential users will not be put off by an interchange
format that "doesn't look much like" their current local applications.  As
noted in Sub-document 2, these complementary concerns dictate the necessity
for a multi-level interchange environment, wherein documents created in local
applications can be imported to TEI.TERM in a so-called "flat" form that
closely resembles the format of the source application and then subjected to
iterative conversion passes to reorganize the <termEntry> to conform to the
fully nested TEI.TERM model.
 
Item 4
-------
 
The item "inter- and intra-termEntry links" was originally included under the
discussion of the basic <termEntry> structure, but it has been moved to
combine with Item 4 because the two items address essentially the same
concern.
 
Sub-document 3 outlines the final position reached by the WG on inter- and
intra-termEntry linkage.  Inter-termEntry linkage is solved easily by the
id/xref/target capability built into TEI.  By the same token, linkage within
fully normalized nested <termEntry>s is solved by virtue of the rules of
adjacency and basic embedding mechanisms.  Designing a mechanism for intra-
termEntry linkage in flat <termEntry>s was a more difficult task.
 
Sperberg-McQueen pointed out that of the two <termEntry> types defined by the
committee (normalized and flat), the normalized <termEntry> can be readily
processed in SGML, whereas the flat <termEntry> must utilize pointers that are
currently not a part of SGML.  The pointer method lacks existing names,
constructs and hierarchies within SGML to support it.  In the nested
<termEntry>, "order" within the document is predictable within a certain set
of controlled parameters, whereas in the flat <termEntry>, one only knows that
the order of the elements is some permutation of the total number of elements
included in that <termEntry>.  (See the Addendum to these minutes for further
comment on the order of data categories within the <termEntry>.)
 
The essential factor in this argument is that the more constrained the
interchange format, the more powerful it becomes as an import-export tool.
The group is generally agreed that the flat <termEntry> represents a sort of
"half-way house" solution, i.e. that flat <termEntry>s must be converted to
nested <termEntry>s before they can be used to import data into other
applications.
 
The tagId/xref mechanism suggested in the CLE Minutes is not a viable solution
because the TEI *xref* attribute is not used for internal referencing.  The WG
had intended that an element reference could be unique to the <termEntry>
without being unique within the entire document.  Sperberg-McQueen pointed out
that this was inappropriate, that for linking purposes each element had to be
"globally available and globally unique" within the document.
 
Sperberg-McQueen also stated that the complex interlinking of elements within
a <termEntry> is unique to AI7 and represents a special kind of problem that
could cause difficulties with respect to SGML.
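
The requirement that identifiers be "globally available and globally unique"
can be illustrated with a short sketch.  The Python below is not part of the
AI7 proposal; the entry identifiers, element identifiers and function names
are all hypothetical.  It assumes that element identifiers were assigned
independently inside each <termEntry>, shows that such identifiers can collide
at the document level, and shows one possible remedy, namely prefixing each
identifier with that of its <termEntry>.

```python
from collections import Counter

# Hypothetical document: termEntry identifier -> element identifiers
# that were assigned independently inside each entry.
document = {
    "E1": ["term1", "def1", "src1"],
    "E2": ["term1", "def1"],
}

def duplicates(ids):
    """Return the identifiers that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)

def local_ids(doc):
    """All element identifiers exactly as assigned inside the entries."""
    return [i for ids in doc.values() for i in ids]

def qualified_ids(doc):
    """Element identifiers prefixed with their termEntry identifier."""
    return [f"{entry}.{i}" for entry, ids in doc.items() for i in ids]

clashes_local = duplicates(local_ids(document))
clashes_qualified = duplicates(qualified_ids(document))
```

In this sample the entry-local identifiers clash across the two entries,
whereas the qualified identifiers do not.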
 
Two basic identification principles emerged from the discussion: either all
elements in a terminological information group <tig> (see "Subdocument 1,
Terminology, and Subdocument 2, Basic Structure of the <termEntry>" for a
discussion of the <tig> tag) would have to be identified with some sort of SET
identifier so that discontiguous components of the set can be extracted or
reassembled (EXTRACT principle), or all elements in a <termEntry> must receive
an alpha-numerical identifier that will enable pointing from one location in a
discontiguous <tig> to another (MOVE principle).  It was also recognized that
provision must be made for two types of referencing: pure adjacency
(membership in a <tig>, direct reference to the <term>) and embedding (which
generally implies a second level of association -- information used to modify
an element which in turn is associated with the <term>).
 
The group considered three specific proposals for defining pointers in the
flat <termEntry>:
 
1)      Shreve proposed the use of a number tree method, i.e. a system of
        domain style addressing to identify all elements making up the SET that
        constitutes the <tig>.  The domain address would consist of three
        identifying strings in the pattern x.y.z, where
 
        x = the <termEntry> identifier
        y = the <tig> identifier
        z = the individual element identifier
 
        AI7 would not dictate the specific form that this system might take.
        For instance, the identifier of a given element might consist of the
        <termEntry> identifier, the <tig> identifier, and a sequentially or
        arbitrarily assigned sequence.
 
        Example:
        -------
        *MEDAIDS12345.1.def1* would constitute a unique identifier for the first
        definition associated with <tig>1 for the sample <termEntry> shown in
        Subdocument 3, page 3.
 
2)      Sperberg-McQueen proposed the use of two attribute identifiers that
        would flag data elements according to:
 
        1) the <tig> with which they are associated (<group> - accounts for
           adjacency) (SET concept)
 
        2) the specific tag element with which they are associated (<dep> -
           accounts for embedding) (MOVE concept)
 
        After much discussion, this was the system adopted as a final position.
        See "AI7 Subdocument 3: Links" for a complete discussion of this
        option.
 
3)      Melby proposed a third option, which involved using a single
        coordinate pair to link to data elements.  In this system, each target
        element would have a unique name identified using the n= attribute,
        which could be referenced using two pointer attributes, coord (for
        adjacency) and depend (for embedding).  This system is based on the
        MOVE concept.

The group analyzed the advantages and disadvantages of the three systems:
 
        Advantages                            Disadvantages
        ----------                            -------------

1.      Uses only one attribute               Imposes domain-style number tree

        No restrictions with respect          Requires full normalization to
        to numbers and characters             assign the number tree, which is
                                              not possible in the flat
        Easy to understand                    <termEntry>


2.      No "magic numbers"                    Requires two attributes

        Structurally simpler than             Potentially confusing because
        number tree                           both the SET (*group*) and MOVE
                                              (*depend*) mechanisms exist in
        Easier to teach, therefore            the same TEI.TERM document
        more accessible to potential
        users

        Can actually be applied to the
        non-nested structure without
        preconstructing the nested
        <termEntry>

3.      Uses only one mechanism               Reifies the normalized tree

        No restrictions on the element
        name
 
As noted, the consensus of the group was that there were more advantages to be
gained from solution 2 than from the other options.  In effect, solution 2
represents features of both 1 and 3 because it incorporates the SET (group)
concept to accommodate adjacency and the MOVE (depend) concept to accommodate
embedding.
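
The way solution 2 allows a flat <termEntry> to be reorganized into nested
form can be sketched as follows.  This is an illustration only: the exact
syntax is specified in Subdocument 3, which is not reproduced here, so the
record layout, the key names "group" and "depend", the identifiers and the
sample content are all assumptions.  Each flat element names the <tig> it
belongs to (SET, adjacency) and, optionally, the element it modifies (MOVE,
embedding).

```python
def nest(flat):
    """Rebuild nested <tig>s from a flat list of element records."""
    nodes = {e["id"]: {"tag": e["tag"], "text": e["text"], "children": []}
             for e in flat}
    tigs = {}
    for e in flat:
        node = nodes[e["id"]]
        target = e.get("depend")
        if target is not None:
            # MOVE: embed this element under the element it modifies.
            nodes[target]["children"].append(node)
        else:
            # SET: the element is a direct member of its <tig>.
            tigs.setdefault(e["group"], []).append(node)
    for members in tigs.values():
        # The term introduces the <tig>; the stable sort keeps the
        # remaining members in their original relative order.
        members.sort(key=lambda n: n["tag"] != "term")
    return tigs

# Hypothetical flat entry; the elements arrive in arbitrary order.
flat = [
    {"id": "a3", "tag": "source", "text": "(source of the definition)",
     "group": "t1", "depend": "a2"},
    {"id": "a2", "tag": "definition", "text": "(definition text)",
     "group": "t1"},
    {"id": "b1", "tag": "term", "text": "Festplatte", "group": "t2"},
    {"id": "a1", "tag": "term", "text": "hard disk", "group": "t1"},
]

nested = nest(flat)
```

Because every record carries its own group and depend values, the conversion
does not rely on the order of the elements in the flat <termEntry>.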
 
Items 5 & 6
------------
 
The WG recognizes the concern that the actual tags included in the DTD must be
limited to a basic level of abstraction and that the data categories included
in the empirical study (Wright/Budin, AI7 W-11) should be recognized as
generic identifiers (GIs) that will appear as attribute values of the
*type=x* attribute in TEI tags.  By defining a limited basic set of tags in the DTD and
publishing an open-ended Joint List of recommended attribute values, it will
be possible to add attribute values as needed to satisfy the requirements of
specific applications.
 
The group agreed that the attribute values list would appear in a non-
hierarchical form and that there would be no attempt made to restrict
attribute values to specific tags because many attribute values tend to behave
differently depending upon the structure of the system in which they are used.
 
L. Wright observed that the <termEntry> is composed essentially of three
components: terms, descriptive material related to the terms, and
administrative data.  [Coincidentally, this observation conforms to Sager's
three primary data categories.]  The WG agreed in principle, noting that this
simple list needs some expanding to utilize or accommodate several TEI
characteristics.  The resulting list included:
 
        terms
        otherForms (which are themselves terms)
 
        description (which refers to the term)
        floating description (which refers to some other element in the <tig>)
 
        administrative data
 
        specific TEI tags that can be incorporated into TEI.TERM
 
See "Subdocument 4: TEI Data Categories: <termEntry> Structure, Tags,
Attributes and Attribute Values" for a detailed list of TEI tags and attribute
values.  The committee must prepare a list of recognized attribute values with
their definitions.  The draft level of this list can be generated from the
existing KSU data categories terminology file.  The definitions contained in
that file are strictly ad hoc formulations created for research purposes.  The
WG will have to subject them to a formal review and revision process.
 
Agreement on the meaning of the attribute values is very important for
purposes of interchange.  Although the list of attribute values is open-ended,
the addition of synonymous attribute values is to be discouraged at the
interchange level; local applications remain completely free to use
whatever data category names they deem fit.  Export routines should account
for aliasing to existing attribute values wherever possible.
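
Aliasing in an export routine might look like the following sketch.  The
list excerpt, the local data category names and the function name are
hypothetical; the actual list of recognized attribute values remains to be
drawn up by the committee.  The sketch maps local names to recommended values
where an alias is known and passes other names through unchanged, flagging
them as not yet recognized, which reflects the open-ended nature of the list.

```python
# Hypothetical excerpt of the list of recommended attribute values.
JOINT_LIST = {"definition", "context", "partOfSpeech", "source"}

# Hypothetical local data category names and the values they alias to.
ALIASES = {
    "def": "definition",
    "POS": "partOfSpeech",
    "wordClass": "partOfSpeech",
}

def export_value(local_name):
    """Return (attribute value, recognized?) for a local category name."""
    value = ALIASES.get(local_name, local_name)
    return value, value in JOINT_LIST
```

A value returned with the flag set to False is a candidate for review before
it is added to the list, so that synonyms do not accumulate at the
interchange level.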
 
Review of Assignments List from the Cleveland Meeting
-----------------------------------------------------
 
   Assignments for June-July-August
   --------------------------------
A.      Write sample TEI.TERM documents using tentative        Wright & Budin
        tag set
 
        Budin reported that a few sample TEI <termEntry>s had been
generated and that he and Wright would continue this effort.  Sample
<termEntry>s will be included in "Sub-document 7".
 
B.      Consider character sets and reversible                 Melby
        transliteration
        Contact TR1
 
Item 8
-------
 
        Melby reported that it was currently essential that documents be
restricted to ISO 646 because of the prevalence of systems that use either
exclusively lower ASCII or EBCDIC.  He stressed that:
*       Transliteration must be reversible.
*       SGML % entities will probably be used.
*       UNICODE and ISO 10646 16 bit representation schemes for multilingual
characters are interesting solutions for the future, but that we must cope
with existing systems for some time to come.
        Melby noted that the problem of reversibility appears to be resolved for
non-oriental languages.  Melby will provide a more detailed summary of ISO 646
and the lang attribute, which will become "Sub-document 5".
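
A reversible restriction to a 7-bit character set can be sketched as follows.
This is an illustration under stated assumptions, not the scheme that
Sub-document 5 will specify: it treats ASCII as the ISO 646 repertoire, and
it uses numeric character references of the form &#n; with Unicode code
points rather than named SGML entities.  The function names are hypothetical.
The ampersand itself is escaped so that decoding restores the input exactly.

```python
import re

def to_ascii(text):
    """Encode text using only 7-bit characters, reversibly."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")            # protect the delimiter itself
        elif ord(ch) < 128:
            out.append(ch)                 # already representable
        else:
            out.append("&#%d;" % ord(ch))  # numeric character reference
    return "".join(out)

_REF = re.compile(r"&(amp|#(\d+));")

def from_ascii(text):
    """Invert to_ascii."""
    def restore(match):
        if match.group(1) == "amp":
            return "&"
        return chr(int(match.group(2)))
    return _REF.sub(restore, text)

encoded = to_ascii("Käse & Brot")
decoded = from_ascii(encoded)
```

Reversibility here depends on escaping the ampersand; without that step, text
that already contained a string such as "&#228;" would be altered on decoding.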
 
C.      Write formal DTD using tentative tag set                    Shreve
 
        Shreve reported on the DTD, which does parse.  He will be revising the
DTD to conform to decisions reached during this meeting.  This DTD will become
"Sub-document 6".
 
D.      Test sample TEI.TERM documents and DTD using           Shreve
        an SGML parser.  See above.
 
E.      Get feedback on sample files from many TDBs            Wright & Budin
 
        Budin reported that he and Wright will contact users and operators of
terminological databases (TDBs) to solicit sample <termEntry>s for the purpose
of examining as many different system types as possible.
 
F.      Invite numerous groups to create conversion            Wright & Budin
        software to convert sample files to and from
        TEI.TERM-conformant documents
 
        Ditto above.
 
G.      Refine tag set and DTD                                 Shreve; WG
 
        Shreve will revise the DTD as noted above.  Wright will generate a list
of tags, attributes and attribute values in conjunction with the writing of
the Oak Ridge minutes.  She and Shreve will supervise the creation of a draft
list of tags, attributes and attribute values together with definitions, which
should be ready by mid-October.
 
H.      Write final report                                      Wright; WG
 
        The committee identified the following reports and papers that must
be completed, together with their respective deadlines:
 
1.      Preliminary version of an AI7 status report to
        be circulated in Sub-Committee 3 of ISO                   26 Aug.-
        Technical Committee 37 on Terminology (ISO TC 37/SC 3)    1 Sept.
 
2.      Minutes of Oak Ridge Meeting                              1 Sept.
 
3.      Formal paper to be authored by Melby and Wright           25 Sept.
 
        This paper is for presentation at the International Symposium on
Terminology and Documentation in Specialized Communication to be held in Hull,
Ontario 7-8 Oct (presentation by Wright).  The same paper will form the basis
for Melby's presentation at the ASLIB conference in London in November.  There
will be a book printed as the proceedings of the ASLIB conference, in which
this paper will appear.
 
4.      Section for TEI Guidelines 2, which will be
        basically identical to the Hull/ASLIB paper,
        with minor modifications to reflect venue-
        specific concerns.                                          25 Sept.
 
5.      Preparation of a more polished proposal for TC
        37/SC 3.                                                    15-16 Nov.
 
This proposal can again be based on the ASLIB paper, and preliminary versions
can be provided to important SC 3 members prior to the November SC3 meeting.
The precise form of the SC 3 presentation can be finalized during the November
meeting in Vienna.
 
6.      Formal petition to continue the work of AI7
        within the framework of the third round of TEI
        activities                                                   ??
 
I.      Create TEI stationery header in both MS-DOS and
        Macintosh format                                       Strehlow, Wright
 
        (Purpose: to avoid identification of AI7 with any of the respective
members' parent organizations, which could impair the acceptance of the TEI
format by a broad range of terminologists)
        Strehlow reported on his design and presented the members with disks.
The group registered its collective gratitude for his efforts.
 
Proposed plans for the Vienna meeting 1991-11-15/16
---------------------------------------------------
 
The next meeting of AI7 will take place in Vienna on 1991-11-15/16.  Strehlow
will be unable to attend, and Melby will petition TEI to cover expenses for
Shreve in his stead.  Melby will have to attend the first two days of a four-
day meeting in Bergen, Norway on the official days of the AI7 meeting.  Budin,
Wright and Shreve will use the meeting time to review work on the various
assignments and to prepare presentations for the TC 37/SC 3 meeting to be held
in Vienna on 1991-11-18/19.  Melby will join them on the evening of 11-19.
 
The group strategy for the SC 3 meeting will be to present the TEI project as
an interchange format based on ISO 8879-1986 (SGML).  In this regard, the SGML
solution would exist as a parallel option to the existing MATER standard (ISO
6156), which is based in part on ISO 2709.  The SGML format should not be
touted as a replacement for ISO 6156, but as a second part designed to meet
needs unaddressed by the MATER standard.  There should be no effort to retain
reference to MATER in the name of the SGML solution because there is no
relationship between the actual title of that standard (with its reference to
magnetic tape) and the electronic environment for which TEI.TERM is being
designed.
 
 
Respectfully submitted, Sue Ellen Wright
 
******************************************************************************
Addendum to the Minutes of 1991-08-13/14
 
1.      2 and 3 letter language codes
 
In telephone consultation Wright, Budin and Melby have agreed on the following
position concerning ISO 639, parts 1 and 2.
 
First of all, it must be noted that many systems exist that use language codes
that differ from *either* the two or the three letter code.  It cannot be the
province of AI7 to suggest what local applications may do within the confines
of their own specific working environments.  However, Wright, Budin and Melby
are agreed that exporters should structure their export routines so that the
flat <termEntry>s produced by the initial conversion will reflect either the
two or the three letter code as standardized in ISO 639 Part 1 or Part 2.
Melby and Wright are agreed that AI7 should make a strong recommendation to
TEI Working Group TR1 that all instances of language identification that occur
in TR1 literature or in the WSD format itself should also conform to ISO 639.
The rationale behind this recommendation is that Melby, Wright and Budin
contend that TEI practice should make use of existing standards whenever
possible so long as these standards do not conflict or interfere with the
efficiency of the evolving TEI system.  Because both the two and three letter
codes are so widespread already, the three members of AI7 are of the opinion
that this request can be made of TDB users and developers without inciting
reluctance on their part to comply.
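
An export routine of the kind described above might normalize local language
codes as in the following sketch.  The local codes and the function name are
hypothetical; the two- and three-letter codes shown for English, Spanish and
Italian are those of ISO 639.

```python
# Hypothetical local codes -> (two-letter, three-letter) ISO 639 codes.
LOCAL_TO_ISO = {
    "ENGL": ("en", "eng"),
    "SPAN": ("es", "spa"),
    "ITAL": ("it", "ita"),
}

def export_lang(local_code, letters=2):
    """Return the two- or three-letter ISO 639 code for a local code."""
    if letters not in (2, 3):
        raise ValueError("letters must be 2 or 3")
    try:
        pair = LOCAL_TO_ISO[local_code]
    except KeyError:
        raise ValueError(f"no ISO 639 mapping for {local_code!r}") from None
    return pair[letters - 2]
```

Raising an error for an unmapped code, rather than passing it through, keeps
non-standard codes out of the exported flat <termEntry>s.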
 
2.      Reversibility of <termEntry>s
 
The following statement was included in the first version of the minutes:
 
        Sperberg-McQueen also pointed out that one could convert a flat
        <termEntry> to the universal interchange format (normalized, nested
        form) and back again with no or little loss of data or
        position, whereas even if AI7 designs a mechanism for converting
        a flat <termEntry> to the universal interchange format, it will be
        impossible to automatically reproduce the structure of the flat
        <termEntry>.
 
When Melby and Wright tried to reconstruct what was actually being said here,
they decided that the statement was too inexact to codify in the minutes,
although it does definitely lead to an interesting series of observations.
 
Melby and Wright both doubt that it would be possible to reconstruct the
original <termEntry> per se, even if it did resemble the nested form, because
there is no precise imposition of order within the <tig> beyond the fact that
the term must introduce the <tig> and certain elements have to be embedded in
other elements.
 
Wright went on to point out, however, that on the import side the construction
(or conceivably reconstruction) of prespecified order is absolutely necessary.
She visualizes a generic export routine that would consist of a combination
<termEntry> layout and filter template that would
1)      specify the order that certain data categories (tags, attributes and
        attribute values) would assume in the target local application format
2)      specify which set of data categories from the source system should be
        imported to the target local application format.
 
For instance, if one were exporting from a very rich system that specifies
many specific data categories to a relatively simple one that uses only a
limited number of categories, one would want first to filter out all those
items for which there was no data category within the target system, or one
might want to lump certain data categories together in a "note" or other
"soup" type data category.  Once this initial conversion pass had created a
document that closely parallels the structure of the target local application
format, an actual proprietary conversion routine could be run in order to
convert the data stream to the native mode truly compatible with the target
system.
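
The combined layout and filter template that Wright describes can be sketched
as follows.  The function name, the data category names and the choice of
"note" as the "soup" category are assumptions made for illustration.  The
template lists the data categories the target system supports, in the order
the target expects; everything else is lumped into a single note.

```python
def apply_template(elements, order, soup="note"):
    """Reorder and filter (category, text) pairs for a target system.

    elements: pairs exported from the source system, in any order.
    order:    categories the target supports, in the target's order.
    soup:     category that collects everything the target lacks.
    """
    kept = {category: [] for category in order}
    leftovers = []
    for category, text in elements:
        if category in kept:
            kept[category].append((category, text))
        else:
            leftovers.append(f"{category}: {text}")
    result = [pair for category in order for pair in kept[category]]
    if leftovers:
        result.append((soup, "; ".join(leftovers)))
    return result

# Hypothetical rich source entry and a simpler target layout.
source = [
    ("context", "(context sentence)"),
    ("term", "hard disk"),
    ("etymology", "(etymological note)"),
    ("definition", "(definition text)"),
]
converted = apply_template(source, ["term", "definition", "context"])
```

To filter unsupported categories out instead of lumping them together, the
final step that appends the soup element would simply be omitted.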
 
3.  Language of a "tig" or a Language Section within a <termEntry>
 
Wright observed in a FAX message following the Oak Ridge meeting that although
there had been discussion of the use of the *lang* attribute to identify the
language of a term or even to refer to specific elements within a <tig>, no
decision had been made concerning conventions to identify the language of the
<tig> itself.  Nor has any system been identified for sorting all the
information contained in a <termEntry> that could be defined as constituting a
language section in the <termEntry> (assuming, for instance, that more than
one term in a given language occurs in the <termEntry>).
 
Melby pointed out that although it has been noted that the lang attribute
cannot be used as a target for pointing purposes, it can be used as a sorting
attribute by a conversion routine that would assemble the components of a
<tig> or of a language section.  Nonetheless, it would be necessary for the
sorting routine to "go out" to the WSD to find the actual language associated
with the language code included in the elements of the <termEntry>.  One
expeditious way to simplify this problem might be to require that the first
element in any WSD be either the two or the three letter language code
followed by a .-digit (example: scr.1 for one of the two writing systems used
to represent Serbo-Croatian) to indicate the specific character set used.
Thus little time would be lost in referencing the WSD, but such a scenario
would demand that the WSD in question be resident in the system at the time of
conversion in order to implement such a search.
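
If WSD identifiers took the form suggested above, a conversion routine could
recover the language code without opening the WSD at all, as in this sketch.
The function name is hypothetical, and the pattern assumes lowercase codes of
two or three letters followed by a period and a single digit, as in the
example scr.1.

```python
import re

# Assumed form: two- or three-letter language code, ".", one digit.
_WSD_ID = re.compile(r"([a-z]{2,3})\.(\d)")

def parse_wsd_id(identifier):
    """Split a WSD identifier into (language code, writing system number)."""
    match = _WSD_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a WSD identifier: {identifier!r}")
    return match.group(1), int(match.group(2))

code, variant = parse_wsd_id("scr.1")
```

The language code obtained this way could then serve as the sorting key when
the components of a <tig> or of a language section are assembled.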
 
Another option would be to create a *tigLang* attribute that would be generated
by the import routine at the time <tig>s are assembled in the normalized
<termEntry>.  Although this method necessitates the assignment of an
additional attribute, it has the advantage that all further processing could
take place without having to reference the WSD, which in some cases may not
even reside within the system at the time of further processing.
 
A third option suggested by Melby would be to use the existing
*languageCode* attribute, which is currently one of the WSD
attributes, within the TEI.TERM <termEntry>.  Wright observed that this
might conceivably lead to problems in the event of inconsistencies,
which Melby acknowledged as a potential difficulty, while noting that
the problem of potential inconsistency exists in many other instances
as well.
 
Wright and Melby noted mutually that the inheritance properties that are
native to TEI can be used to generate the *tigLang* value, assuming that the
<term> tag is appropriately identified by a lang attribute and implying by
virtue of inheritance from below that the language of the <tig> is equivalent
to the language of its term unless otherwise indicated.  It is also possible
that the language of an entire monolingual terminology document could be
indicated in front matter and thus implied in every <termEntry> by virtue of
inheritance from above.
 
Any term elements that vary from the stated inheritance principles would have
to be explicitly identified with their own *lang* attributes.
 
Exporters will have to provide for the identification of the *tigLang* either
explicitly or using one of the implicit methods described above.  In fully
normalized <termEntry>s, the *tigLang* attribute would have to be explicit.
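
The implicit methods described above amount to a simple resolution order,
sketched below.  The function and parameter names are hypothetical.  An
explicitly stated language of the <tig> takes precedence; failing that, the
language of the term is used (inheritance from below); failing that, the
language declared for the whole document is used (inheritance from above).

```python
def resolve_tig_lang(explicit=None, term_lang=None, document_lang=None):
    """Return the language of a <tig>, or fail if none can be found."""
    for candidate in (explicit, term_lang, document_lang):
        if candidate:
            return candidate
    raise ValueError("language of the <tig> cannot be determined")
```

An import routine that builds fully normalized <termEntry>s could call such a
function once per <tig> and write the result out as the explicit value.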
 
 
Respectfully submitted, Sue Ellen Wright