TEI AI7M5

Contents:
=========
-- Minutes of Meeting of June 11, 1991 (including Figure 1 and Figure 1 Endnotes)
-- Agenda of Meeting of June 11, 1991
-- Addendum to Minutes of Meeting of June 11, 1991
-- Annex: Sample Term Entries

*****************************************************************************

Revised Minutes Version 4 of the TEI Meetings of 1991-06-11/12/13
1991-08-21

These minutes replace all previous versions! Please discard invalid copies.

In attendance:
Dr. Alan K. Melby, Chair
Dr. Gerhard Budin
Dr. Richard A. Strehlow
Dr. Sue Ellen Wright
Dr. Gregory Shreve

[Note on the membership of the committee: Melby, Budin, Strehlow and Wright are the official members of the committee; Shreve has proven an absolutely essential member of the team and has been recognized as a co-opted member.]

6/10/91
1st Working Meeting of A&I-7, Analysis and Interpretation-7, Terminological Data Working Group of the Text Encoding Initiative

As an introductory statement to the minutes, A&I-7 expresses its appreciation to the following correspondents for their contributions to the activities of the WG:

C. Boitet
Group D'ETUDE POUR LA TRADUCTION AUTOMATIQUE
IMAG-campus, 150, rue de la Chimie
BP 53X, F-38041 GRENOBLE Cedex

H. Hjulstad
Radet for Teknisk terminologi
Riddervolds gate 3
N-0258 Oslo 2
Norway

U. Heid
Inst. f. masch.Sprachverarbeitung
Universität Stuttgart
Keplerstraße 17
D-7000 Stuttgart 1

C. Leonhardt
Promotion and Client Services
Room 4F-40, 15 Eddy St.
Dept. of the Secretary of State of Canada
Ottawa, Ontario K1A-OM5

A. Reichling
Commission of the European Communities
Translation Service, Terminology
L-2920 Luxembourg

The meeting was called to order by Melby. The meeting began with an examination of the agenda (see attached). As a preliminary query addressed in conjunction with the examination of the agenda, Strehlow asked about the availability of SGML parsers.
Melby reported that he had ascertained from Michael Sperberg-McQueen, co-editor of the TEI, that the TEI is acquiring 10 copies of the SEMA MS-DOS parser, one of which can be allotted to A&I-7. Strehlow also noted that he has access through his employer, Martin Marietta Energy Systems, to the AUTHOR-EDITOR parser running on the Macintosh platform. 1. Background Melby proceeded to cover the initial items on the agenda, What is SGML? What is TEI? and What is A&I-7? In discussing the mandate of A&I-7 coupled with the past experience of its members, the group became aware that certain items of terminology required further clarification. It was duly noted that "easily convertible" to SGML, a phrase used by some members to designate the characteristics of the ASTM and MicroMATER mark-up conventions, is used to refer to mark-up formats that are very similar to SGML and can be readily converted to SGML representations. In contrast to the concept of SGML "convertibility", the two similar terms "SGML conformant" and "SGML conforming" were recognized as variants of a single concept used to designate documents that include a reference to an SGML DTD, are encoded in conformance with that DTD, and can be parsed using an SGML parser. SGML "compatible" has essentially the same meaning, but this term should be avoided to prevent possible misunderstanding. The discussion quickly revealed that all members of the group fully understood the differences between generic and specific (i.e., descriptive and presentational) markup. Special note was taken of the distinction between the SGML declaration and the Document Type Definition. As terminologists, the group also noted the need for an effort to harmonize terminological usage within the TEI group and in the SGML community as a whole, although the actual promulgation of such a terminology standard may be premature at this time. 
Melby reported that he had approached TEI at an earlier time to request that a working group for terminology be included within its framework. At that time there was already an A&I WG on lexicography and there was insufficient interest in adding a WG for terminology. In October of 1990, Donald Walker, current chair of the TEI, attended the conference on Terminology and Knowledge Engineering held in Trier, West Germany. During the conference, Strehlow had conversations with Walker concerning relationships between SGML and terminology, and Melby presented a paper underscoring the importance of creating a universal exchange format for terminological information. Realizing the significance of terminology in addition to lexicographical activity and impressed with the importance of terminology work within the framework of knowledge engineering, Walker renewed contact with Melby and laid the groundwork for the formation of a terminology working group within the TEI, a step which was formalized by a vote held at the time of the Tempe conference in March, 1991. Melby noted the potential relationship between A&I-7 and other WGs. For instance, TR1 is charged with the selection of Character Sets. It was noted that this particular WG is of special interest to A&I-7 [and no doubt to multilingual dictionaries as well]. A&I-7 **must categorically** have access to reversible transliteration capability in order to ensure the complete portability of non-standard-European character sets. Melby stated that he would strongly request complete information from TR1 on this topic and would maintain close contact with Sperberg-McQueen concerning progress along these lines. Any failure to provide the necessary capability in this area will doom the attempt to produce a universal exchange format that will be widely usable across linguistic boundaries. Melby will also request information on the current status of the work of TR3 on Hypermedia. 
This concern reflects the growing incidence of hypermedia implementations within terminological database environments, or more specifically, the incidence of terminological database capability within hypermedia-type workstation environments for technical writers, translators, standardizers and documentation specialists. It was duly noted that the solution to problems involving formulas, tables, figures and graphs is of critical concern, particularly to standardizers of technical terminology. Other WGs with special interest for A&I-7 include A&I- 5 for Dictionaries, A&I-6 on Computational Lexicons, and the ML Metalanguage group. [It should be noted that the tags included in the following report are purely conceptual in nature. The group has decided to refrain from even trying to formulate metalinguistically "correct" tags until such time as the ML report is made available to us, at which time we will go back through our examples and harmonize them to conform to the dictates of that WG. At present we are more concerned with logical significance within the DTD than with correct form.] It should also be noted that the recent Copenhagen meeting between ISO TCs 37 and 46 has resolved some outstanding questions concerning the three-letter language code, so progress should be in evidence on that score as soon as possible. 2. Major Discussion Items Before adapting the TEI DTD for terminological documentation, it is necessary that the group designate a primary tagset that will be defined as representing the core information included in a terminological document. The group recognizes the fact that the tagset used for terminological documentation must be open-ended in order to ensure a certain level of acceptability within the terminology community, and yet contain a well-defined core in order to provide support for coherent terminology exchange. 
In the past, efforts to define such a central core have been frustrated by the divergent theoretical positions within the terminology community, a phenomenon that is not, of course, unique to this particular field of study. A common complaint registered by individuals holding opposing views has been that other opinions are "too theoretical." Consequently, A&I-7 is keenly concerned that any position registered by this WG not be identifiable with any one theoretical or doctrinaire position. This consideration applies specifically to the identification of data categories to be included within the primary tagset. As a result of this concern, Budin and Wright felt that it would be inappropriate to select these categories based on the personal experience and preferences of the WG, but rather on an empirical study of the categories actually used in existing terminological database systems, by national and international standards organizations and by existing interchange formats. This survey is represented in the attached charts. [Charts will follow in the next few weeks.] The purpose of the charts is to reveal those data categories that display the highest frequency of use throughout the gamut of existing systems. It must be noted that Budin and Wright are fully cognizant of the fact that the inclusion of a given category, for instance "definition," does not reveal anything about the actual content or quality of the material included under that category within the source system. This consideration is, however, endemic to the nature of SGML exchange. The parsing function with respect to an SGML document can only determine SGML conformance. Only target system users can judge the quality of the actual knowledge units contained within a marked up document. 
It cannot be the function of A&I-7 to assume this function or to accommodate it in a DTD because, in light of the rich variety that prevails in existing applications, to demand any given set of criteria would abrogate the desired universality of the exchange format. Within a given subset of users, however, additional granularity could be introduced for the purpose of representing more detailed information elements, such as discrete components of definitions, etc. Examination of the assembled chart (which the WG, in the spirit of true group participation, attached to the window wall in the meeting room) revealed that the following categories (See Figure 1) should be entertained for discussion as possible tags within a future set of data categories. These categories have been defined according to whether they are non-floating or floating. Non-floating tags represent primary data categories that are either generally unique to an entry (domain) or which may repeat a limited number of times within a term entry (, , , etc.> Floating tags, on the other hand, are freely combinable with non-floating tags, or, as is the case with , with virtually any other tag in the . Within SGML, this means that floating tags can be embedded inside non-floating tags. The group must still examine the master list treated by Wright and Budin to ascertain how to deal with individual tags that were left out of the primary set. It is agreed that the final DTD should allow for the greatest possible degree of freedom on the part of users, hence a number of additional tags must be accommodated one way or another. The precise status of tags and attributes, floating and non-floating tags, etc. will have to be clarified at the next meeting. Furthermore, although the KSU DATCATS list exists, no decisions have been made at this juncture concerning the precise form that the tag codes will take. This must be a topic for discussion at the next meeting. 
As noted above, the WG has decided that, since the metalanguage document is not yet available, it makes sense to work out the precise form of the tagging system later. Hence no attempt has been made to reflect the future form of the tags in the examples included in this report.

06-13-1991

Term Entry Structure:

Deeply embedded, hierarchical structure
-------------------------------------------------------------------------

Going into the meeting, Wright and Melby had proposed that the body of a "TEI.TERM" document consist of a list of "term.entry" () elements embedded within a highly articulated hierarchical structure.

Note: Throughout the following examples, the three-letter language codes taken from the draft standard ISO 639 Part II are used. These codes are based on the most current version of the draft, which is likely to be approved early this fall.

Example: automotive clutches coil spring ... ... ... helical spring ... ... ... ressort hélicoidal ... ... ...

Simplified, "theory neutral" structure
--------------------------------------

Although the deeply embedded structure illustrated in the first example is logically coherent and reflects the mathematical complexity that can be achieved within SGML, it also implies the acceptance of a considerable body of terminology theory regarding the interrelationships between data elements within a term entry. Many of these assumptions can be identified with one or another "ideological" school of terminology and are not necessarily accepted by all working terminologists. At the beginning of the second session Strehlow introduced a proposal for a simplified structure. Budin in particular concurred. After considerable discussion, the WG reached the consensus that a term entry should be as simple as possible in order to avoid association with any specific "school" or "theory."
Wright and Melby had indeed taken into consideration the fact that each system designs the term record differently and had postulated that these differences would be accounted for in the various conversion routines that would have to be written to translate application-dependent term entries into the TEI.TERM format and ~ conversely ~ back to specific application environments. Nonetheless, the concern arose that if TEI.TERM itself resembles any one system or type of system, practitioners using other types of systems might well reject the format out of hand on the assumption that it would not facilitate exchange for "deviant" entry structures. Hence, it was decided, the format should be designed to obviously accommodate the greatest possible flexibility. Furthermore, since great disagreement prevails concerning the essential elements included in a term entry, the decision was made that the element would be the sole mandatory element in any given term entry, and that, given the presence of descriptive material (definitions, foreign language equivalents, etc.) the element could actually be an empty tag in some cases. Given this extreme degree of flexibility, the resulting structure is considerably less articulated ("flatter") than the original Wright-Melby proposal. This approach gained favor when, as noted above, Budin argued convincingly that it would be more readily accepted by a broad range of practicing terminology database administrators since it appears to be more theory-neutral. Example: automotive clutches coil spring ... ... ... helical spring ... ... ... ressort hélicoidal ... ... ... Not only does the "flatter" entry represent a "theory neutral" position, it also occupies considerably less space. Actually this form of the entry is nonetheless roughly equivalent to the Wright-Melby proposal since it is based on the rules of adjacency: o is the only required element in a o Any element that appears in a *before the first* is assumed to apply to the entire . 
(This area of the corresponds to the of the original sample entry, but is more theory-neutral.) o Any element that appears in a *after* a is implicitly associated with that . Thus, each introduces the material associated with a new "term assignment." The basic premise of the simple format assumes that users can employ a fairly straightforward routine to convert entries directly from local applications to the exchange format without resorting to complex sorting routines that would rearrange entries to suit preconceived notions of how a term entry ought to look; however, not all existing terminological database systems arrange all the information pertinent to a term in contiguous format. Instead pointers of some kind are used to link scattered information together. For instance, in such a system, all the terms associated with a concept might appear first, followed by the descriptive material (definitions, contexts, etc.) Although it is common even in such systems to retain source and responsibility codes in close conjunction with the elements to which they pertain, it is even conceivable that these items of information might appear at other points in an entry. As a consequence of the variety that can occur in ordering elements within a term entry, in addition to the default rules of adjacency, the WG proposes a method for linking any elements within a that are "out of order" relative to the rules of adjacency. This method would involve assigning a "tagid" attribute to elements that belong together, but that are separated from each other within the entry. Thus these elements could be linked by "matching up" elements that have the same value for the "tagid" attribute. The WG proposes the attribute "tagid" rather than "ID" since "tagid" need not necessarily be unique throughout the document or even **within** a . [Of course, all elements identified with the same tagid must be related.] Hence it can be characterized as an "intra-term element tagid." 
It only has to be recognizable within the "local" area within which it functions. Often the user will select a language code as the "tagid." For example, if the user opts to list two terms in a , followed by two definitions, each term could have a distinct "tagid" (e.g. 'eng' and 'fra' or 'eng1' and 'eng2'), and the definitions would be matched up with the appropriate term by attaching a "tagid" attribute to . 'televel' could be used as a special value for the "tagid" in order to link an element to the entire in those cases where the information in question does not appear before the first . (See Example 2.2 below.) A mechanism that will provide additional flexibility is the use of ID and to link term entries. For example, suppose a certain user prefers to maintain English and French language information in separate term entries. In such a case, each would have a unique ID and any could be linked to another by placing a element anywhere inside the , but not embedded inside any intermediate element of the element. Association between the s would be achieved by matching up the IDs. This same mechanism could even be employed to link a to several other s (e.g. to a German and a French ) by using several strategically placed tags. The tag introduced as a linking element for functional equivalents in other languages is a type of tag. One more example of flexibility would allow a ... element to appear before the in teiterm1.dtd. This element would contain additional elements such as , which would apply to the entire document, not just to a single . Of course, a document might also contain one or more s that pertain to different domains from the one included in the material. Such s would have to contain an explicit reference, which would override the default. In essence, the WG is proposing a two-level format.
The most logical level would constitute a "normalized" format in which all related terms would be pulled into one entry and all elements in a term entry would be adjacent to the term to which they apply. The normalized form would not generally require linking via an "intra-entry" ID. However, if an exporter wished to export in a straightforward way from a database whose structure did not conform to the normalized model, he/she could use the tagid attributes to link related materials within the . The group also proposed the creation of a normalization routine to sort documents containing tagIds in order to produce normalized documents. Allowing these options would provide varying levels of normalization within the entry. This degree of flexibility is designed to encourage acceptance from users employing a wide variety of entry formats.

Figure 1: Preliminary TEI-Terminology Categories (1)

NF(2)  domain (includes classification systems such as thesauri & indexing notation)
F      source
F      date
F      reliability
F      status
F      project (sometimes called subset)
NF     concept position (within a concept system) (antonyms were recognized as a type of conceptual relation and included in this information type)
ATRIB  language
NF     term
NF     synonym                               2-way (3)
NF     abbreviated form                      2-way
NF     full form                             2-way
NF     [inverted form, term element]         [precise distinction and relationship between these two items to be discussed at a later point]
NF     symbol
F      grammar                               [repeatable only per term, synonym, etc.]
F      pronunciation                         ["]
F      register                              ["]
NF     standard term                         2-way
NF     preferred term                        2-way
NF     admitted term                         2-way
NF     deprecated term                       2-way
NF     superseded term                       2-way
NF     obsolete term                         2-way
NF     neologism                             2-way
NF     international scientific term
NF     legally prescribed term               2-way
NF     variant                               types=spelling, pronunciation
F      geographical (regional) restriction   2-way
F      in-house usage                        2-way
NF     trademark
NF     definition
NF     context
F      note                                  the most fully combinable of all tags (4)
NF     example
NF     table
NF     figure                                2-way
NF     formula                               2-way
ATRIB  number                                [iterative numbers defined by TEI] (5)
F      keyword
NF     phraseological unit
NF     unit                                  [i.e., with respect to dimension]
F      XREF=cross reference
NF     equivalent                            functional equivalent of in n languages; used as a kind of XREF; perhaps:
NF     translation

[Notes appear as endnotes on the next page.]

******************************************************************************

ENDNOTES for Figure 1:

1) The following list represents the major data categories identified in Cleveland as most likely to occur in an entry. It is by no means exhaustive. During the August meeting in Oakridge, the committee will have to take a serious look at the other possible data categories and determine what status they will have in the list.

2) F=floating, NF=non-floating

3) Some data categories can function either as the name or the content of a data element. These items are identified here as "2-way".

4) is also an existing TEI tag, but no effort has been made yet to convert these examples to conform to its conventions. One attribute associated with is "source", which must be studied in detail. The information on is scattered throughout the Guidelines. The information on "source" is also scattered about and is not absolutely clear. It appears that it can be used to cite the source (in some cases rather like a responsibility code) for some kinds of notes.
The "source" data category traditionally used by terminologists will be replaced with the TEI or tags, depending on whether the information content of the tag is a code reference to a bibliographical list or whether it contains the complete bibliographical reference. 5) The significance of has been misconstrued. From wading back and forth through the Guidelines looking up the various references to "num", "enum" and "n", it is apparent that most of our situations require "n", not "num". The latter is a tag used to identify actual numbers in the text, whereas "n" is an attribute used to identify sequential reference for such things as footnotes, and in our case, the repeatability of any given terminological data element. In spite of the impressive looking index in the Guidelines, it is not without its frustrations: o There are listings for pages that simply do not include information on the item queried. o There are instances where the information actually shows up on another nearby page (which may be the case with the first problem as well and I just haven't been diligent enough to find the rogue item.) o There is no way to differentiate between definitive treatments of an element, where it is carefully explained, and casual references to it or instances where it is used in an example without any reference at all in the text. o Potentially confusing items, such as the various ways of treating numerical information, could be grouped together in special discussions, with detailed explanation of the differences among their uses. The same kind of treatment would be useful for the - pair. This is potentially a project for a future edition. o There really needs to be a "Terminology of TEI" including complete aliasing from full forms to tags and back again. One must know to begin with that n is what one needs in order to find the information on it. Looking up "number" will not direct one to the three possible avatars of the concept. 
******************************************************************************

Follow-up Meetings

The next two meetings will be as follows:

August 13 & 14
Venue: Oakridge Tennessee, Oakridge National Laboratories
Attendees: Melby, Budin, Strehlow, Wright, Shreve
Task assignments were allotted according to the appended version of the agenda.

November 15 & 16
Venue: Infoterm, Vienna
Attendees: Melby, Budin, Wright, Shreve

November 18 & 19
Participation in SC 3 of ISO Technical Committee 37 for Terminology Principles and Coordination: The WG that will be meeting at this time will specifically discuss the future version of ISO 6156, the existing ISO standard on the exchange of terminological data (MATER). The other members of the SC3 WG working on terminological exchange are very interested in the work of the TEI and are looking forward to learning how the project is progressing. We will provide them with our reports prior to the meeting.

Clarification on the status of TEI vis à vis TC 37 SC3: As members of the U.S. TC 37 TAG, all members of TEI A&I7 are eligible to request full participation status in the SC3 WG. I have requested that Christian Galinski, who is the Secretary of TC 37, request that the three US members who will be in Vienna all be formally registered as active participants in this WG.

MEETING OF THE A&I-7 WORKING GROUP of the TEI

AGENDA (proposed by Alan Melby)

1. Background
   * What is SGML?
   * What is TEI?
   * What is A&I-7?

2. Major discussion items
   * Survey of existing terminology formats (Wright/Melby ASTM paper, Wright/Budin detailed survey)
   * Selection of tentative set of primary tags for TEI term DTD (Solicit feedback from various groups)

3. Assignments for June-July-August

   A. Write sample TEI term files using               Wright & Budin
      tentative tag set
   B. Consider character sets and reversible          Melby
      transliteration
      Contact TR1
   C. Write formal DTD using tentative tag set        Shreve
   D. Test sample term files and DTD using an         Shreve
      SGML parser
   E. Get feedback on sample files from many          Wright & Budin
      sources
   F. Invite numerous groups to create                Wright & Budin
      conversion software to convert sample
      files to and from TEI.TERM
   G. Refine tag set and DTD                          Shreve; WG
   H. Write final report                              Wright; WG
   I. Create TEI stationery header in both            Strehlow, Wright
      MS.DOS and MAC format
      (Purpose: to avoid identification of A&I-7 with any of the members' respective parent organizations, which could impair the acceptance of the TEI format by a broad range of terminologists)

ADDENDUM:

The following notes represent subsequent commentary on the minutes of TEI.TERM meeting 1.

1. Page 5, Paragraph 2, line 6
Domain: Melby has observed that although domain usually applies to an entire term entry or record and is therefore unique within a record, it could conceivably be iterative and be used to cite multiple domains within a single entry. For instance, in a normalized entry designed to replace redundant entries (for instance, ISO Guide 52), several domains might share a common concept, in which case the domain entry would also have to be repeatable within a given term entry.

2. Page 11, Paragraph 1, line 3
Rules of adjacency: Melby and Wright discussed the first two sentences in this paragraph. It was Wright's contention that the sentences read in accordance with the CLE discussions, but Melby felt that the word "non-floating" should appear before each of the instances of the word "element." Wright noted that this would defeat the real purpose behind using the tagId in most instances because the tagId can link up associated data elements, such as "source" or "responsibility" that may have wandered off to the nether regions of the record the way they do in some systems. She felt that if the system demands that the conversion routine locate these items and move them up into an embedded position in the associated non-floating element, the benefits of the open system are lost.
Melby contends, however, that this would require a third level of adjacency, which Wright admits would introduce undue complexity. Surely these elements can be embedded in the non-floating element *when they occur adjacent to that element*. Her concern is that if the elements must move in this way, the converted entry will begin to look like "someone else's" theoretical position rather than like the original entry. Melby has proposed a compromise: a third level of adjacency might be possible if the conversion routine would create an empty embedded set of tags within the subject non-floating tag, including a tagId, which could then be repeated in the tag for the disconnected, but related floating elements that occur elsewhere in the record.

Annex: Sample Term Entries

The following complex term entry could be arranged in several different formats in different terminological database systems (TDBs). Examples 1 and 2 of the entry represent two different, fairly common arrangements used in various databases. The first is divided by language section, whereby each language resides in a different virtual record within the system, but is linked by a common record identifier. The second example shows an entry in which all the information for a concept is included in a single physical entry. For the purpose of clarity, the representation of this record utilizes some conventions that are similar to TEI-SGML, but it should not be construed as an SGML entry.

1. Term Entry divided by language section (similar to Termium, PC-Term)(1)
-------------------------------------------------------------------------------

1a. Spanish language section

[term typ=abbreviated.form] VEB
[full.form] virus Epstein-Barr
[context] [Durante la fase de infección aguda se observa en los datos de laboratorio] una linfocitosis con presencia de elementos linfoides estimulados o "virocitos" muy similares morfológicamente a los que aparecen en la mononucleosis infecciosa por VEB [virus Epstein-Barr] o CMV [citomegalovirus].
[source] HIan1989
[page] 8
[equivalent lang=eng] EBV
[te.level:domain] MED. AIDS
[entry.type] term entry
[date] 1991/04/15
[responsibility] CAR

1b. English language section

[term typ=abbreviated.form] EBV
[full.form] Epstein-Barr virus
[definition] the herpes virus that causes mononucleosis ("kissing disease")
[context] The high incidence of infection with Epstein-Barr virus (EBV) and cytomegalovirus (CMV), both of which are polyclonal B cell activators, certainly contributes to this phenomenon, [hyperactivity of the B cell limb of the immune response].
[source] HUsc1988
[page] 620
[equivalent lang=esl] VEB
[source] MME1985
[page] 266
[te.level:domain] MED. AIDS
[entry.type] term entry
[date] 1991/04/15 08:03
[responsibility] CAR

____________________
(1) It must be noted that this entry is not in TEI conformant form. It simply represents an entry as it might appear with hard-programmed data elements. To dispel the notion that it ought to conform to TEI embedding rules or the like, no end tags are used.

2. Term Entry grouped according to terminological data category (possible in MTX)
---------------------------------------------------------------------------

Term Entry containing all information pertaining to a concept with multilingual information included in the same virtual record. This information is grouped according to terminological data category, with language section information distributed throughout the record. The original entry uses the KSU DATCATS as tags.

{0} VEB
{1} EBV
{0FFO} virus Epstein-Barr
{1FFO} Epstein-Barr virus
{1DEF} the herpes virus that causes mononucleosis ("kissing disease").
{SRC} MME1985 {PAG} 266 {0CTX} [Durante la fase de infección aguda se observa en los datos de laboratorio] una linfocitosis con presencia de elementos linfoides estimulados o "virocitos" muy similares morfológicamente a los que aparecen en la mononucleosis infecciosa por VEB [virus Epstein-Barr] o CMV [citomegalovirus]. {SRC} HIan1989 {PAG} 8 {1CTX} The high incidence of infection with Epstein-Barr virus (EBV) and cytomegalovirus (CMV), both of which are polyclonal B cell activators, certainly contributes to this phenomenon, [hyperactivity of the B cell limb of the immune response]. {SRC} HUsc1988 {PAG} 620 {RL:DOM} MED. AIDS (1) {RTY} TER {DAT} 1991/04/15 08:03 {RES} CAR _________________ 1) In a fit of consistency, "RL" had been changed to "EL" to conform to the idea all the information concerning a concept constitutes an entry, not just a record. Nevertheless, in any given record -- even if several records can be combined to form a theoretical entry, one is still working with a record. More importantly, "EL" is not a possible code for "RL" as opposed to any given language code because the code for Greek is "EL". Hence the change back to "RL" and "RTY". 2.1.1 Fully Normalized TEI entry for 1 above ---------------------------------------------- This version implies that the two language sections are linked as partial entries in the parent system. It groups the two language sections in the same , but in separate elements. Note that floating tags relating to specific non-floating tags are embedded in those elements. MED. AIDS term entry 1991/04/15 08:03 CAR VEB virus Epstein-Barr [Durante la fase de infección aguda se observa en los datos de laboratorio] una linfocitosis con presencia de elementos linfoides estimulados o "virocitos" muy similares morfológicamente a los que aparecen en la mononucleosis infecciosa por VEB [virus Epstein-Barr] o CMV [citomegalovirus]. 
EBV Epstein-Barr virus the herpes virus that causes mononucleosis ("kissing disease") The high incidence of infection with Epstein-Barr virus (EBV) and cytomegalovirus (CMV), both of which are polyclonal B cell activators, certainly contributes to this phenomenon, [hyperactivity of the B cell limb of the immune response].

2.1.2 Separate s for each main language section
--------------------------------------------------------
The same information could conceivably feed into two s with cross-references, depending on the configuration of the original system and of the conversion routine. In this case, tags would form the cross-reference link from a source to a target . Some systems would not include the equivalents for other languages in the same record with the "source" language information and others would. Both options should be possible. The importer of these data would be free to write his/her import routine to create whatever configuration was needed in the target application.

MED. AIDS term entry 1991/04/15 08:03 CAR VEB virus Epstein-Barr [Durante la fase de infección aguda se observa en los datos de laboratorio] una linfocitosis con presencia de elementos linfoides estimulados o "virocitos" muy similares morfológicamente a los que aparecen en la mononucleosis infecciosa por VEB [virus Epstein-Barr] o CMV [citomegalovirus]. EBV (1) (2)

MED. AIDS term entry 1991/04/15 08:03 CAR EBV Epstein-Barr virus the herpes virus that causes mononucleosis ("kissing disease") The high incidence of infection with Epstein-Barr virus (EBV) and cytomegalovirus (CMV), both of which are polyclonal B cell activators, certainly contributes to this phenomenon, [hyperactivity of the B cell limb of the immune response]. VEB
____________________
1) Note that the equivalent tag in essence includes redundant information that is actually contained in the target record identified by the element.
Melby has argued against allowing the inclusion of this kind of redundant information because it can cause logical problems within the data base. There are two very strong arguments in favor of it, however. First of all, many systems exist that include this kind of information. Secondly, the reason they do is that many users don't want to have to navigate to the target entry unless they want to check the provenance of the equivalent. If they recognize the equivalent or otherwise trust the system, they will accept what they see without chaining.
2) Note that the same kind of format holds true for the id reference in an tag as for its use to identify the "te".

2.2 Mixed order within the term entry
--------------------------------------------
VEB **note type=foot** EBV virus Epstein-Barr Epstein-Barr virus the herpes virus that causes mononucleosis ("kissing disease"). (1) [Durante la fase de infección aguda se observa en los datos de laboratorio] una linfocitosis con presencia de elementos linfoides estimulados o "virocitos" muy similares morfológicamente a los que aparecen en la mononucleosis infecciosa por VEB [virus Epstein-Barr] o CMV [citomegalovirus]. The high incidence of infection with Epstein-Barr virus (EBV) and cytomegalovirus (CMV), both of which are polyclonal B cell activators, certainly contributes to this phenomenon, [hyperactivity of the B cell limb of the immune response]. MED. AIDS TER 1991/04/15 08:03 CAR

1) It should be noted that the attribute cannot be used as part of a tagId. Consequently both the and the attributes are used here, but the content of both and remains the same -- it is the three-letter language code. The reason for this is simple: in most systems, this is the only characteristic that the system supplies to distinguish one term from another. If more than one term occurs in a given language, simple iteration numbers can be used (e.g., , , etc.) (See example on p. 26.)
2) The elements associated with these tags do not have to be identified with tagId because they are embedded in their respective non-floating elements. If the sources were to appear out of order, they would require a tagId.

2-way Tags
----------
It should also be noted that some of the codes listed above may in some instances function as tags and in other instances function as attributes. For instance, in a for the term "diaphragm spring," one might encounter the following sequence:

diaphragm spring conical disk spring ...

A different way to represent essentially the same information would involve leading with the standard term instead of using "diaphragm spring" as the main entry term:

conical disk spring diaphragm spring ...

This ambivalence applies to all codes marked "2-way" in the above list.

4. The Juice Record

MicroMATER record

{RID} juice (1)
{0} juice {POS} n
{1.1} caldos {POS} n {GND} m {NUM} pl (2) {DAT} 1991/03/15 15:36 {RES} KHT
{1.2SYN} zumo {POS} n {GND} m {DAT} 1991/02/19 12:27 {RES} DE
{1.3SYN} jugo {POS} n {GND} m {DAT} 1991/02/13 17:26 {RES} CAR
{2.1} Saft {POS} n {GND} m {DAT} 1991/06/25 19:31 {RES} SEW
{0DEF1} juice: The extractable fluid contents of plant cells or plant structures, consisting of water holding sugar or other substances; as the "juice of grapes"
{1.1DEF} caldos: Jugo extraído de frutos, como el vino, vinagre o aceite: "Los caldos de la Rioja" ("vinos") {SRC} SGgd1985 {PAG} 292 {DAT} 1991/03/23 {RES} SEW
{1.2DEF} zumo: Líquido que se obtiene exprimiendo frutas y otros vegetales. {SRC} SGgd1985 {PAG} 1937 {DAT} 1991/03/23 {RES} SEW
{1.3DEF} jugo: Líquido de un cuerpo orgánico que se puede extraer o que éste segrega: "El jugo de una fruta". {SRC} SGgd1985 {PAG} 1179 {DAT} 1991/03/23 {RES} SEW
{0CTX2} Methods of pruning are of particular importance, and in the better areas vines are cut back very severely each winter, since a small yield of grapes per vine produces juice of significant high quality.
{SRC} WIsi1983 {PAG} 2681 {DAT} 1991/03/15 15:36 {RES} KHT
{0CTX3} The juice inside the grapes is normally more or less colorless... {SRC} WIsi1983-2682
{0CTX4} Grape juice turns into wine by the natural and quite spontaneous action of fermentation... {SRC} WIsi1983 {PAG} 2681 {DAT} 1991/02/17 12:47 {RES} NES
{1.1CTX} Los romanos aromatizaron sus caldos con flores y frutas y, para conservarlos mejor, agregaron a los mismos brea o miel. {SRC} GEri1975 {PAG} 9809 {DAT} 1991/03/15 15:36 {RES} KHT
{1.2CTX} Licor alcohólico que se hace del zumo de las uvas exprimido y fermentado. {SRC} DEvi1978 {PAG} 631 {DAT} 1991/02/19 12:27 {RES} DE
{1.3CTX} Vino es la bebida que se obtiene por la fermentación completa o parcial de la uva fresca o del jugo de ella, también llamado mosto. {SRC} DEvi1974 {PAG} 571 {DAT} 1991/02/13 17:26 {RES} CAR
{2.1DEF} aus reifen Früchten oder Gemüse gewonnene Flüssigkeit {SRC} GWdw1979 {PAG} 1179 {DAT} 1991/03/23 {RES} SEW
{RL:XRF} must {ORG} PDS
{RTY} TER
{DOM} Wine
{DAT} 1991/03/15 15:36 {RES} SEW
_______________________
1) If "RL" and "RTY" are restored, "RID" should be restored as well.
2) Note yet another use of a number with a totally different significance. Note on the next page the use of the attribute . These procedures must be coordinated with the dictionary committee.

4.1.1 TEI Entry (June 27, 1991), structured to conform with Example 2.1.1
-------------------------------------------------------------------------
(1) wine ... SEW WOR (2) juice n ... ... SEW (3) ... ... ... KHT (4) ... ... SEW ... ... NES must (5) caldos n m pl ... KHT ... ... SEW ... ... ... KHT mosto zumo n m sing ... KHT ... ... SEW ... ... DE mosto jugo n m ... ... SEW ... ... ... CAR mosto Saft n m ... ... SEW Most

ENDNOTES:
1) The numbers used as identifiers are arbitrarily assigned. As noted above, they would represent addresses assigned by the parent application (and perhaps invisible in that application) that are accessed by the conversion routine and inserted into the exchange document.
2) According to the first rule of adjacency (see the minutes of the Cleveland meeting), all the information that appears after but before the first term refers to the entire . All information that appears after any given term refers to the term itself.
3) Note that the source, date and responsibility information have been embedded in the ... element. According to the two rules of adjacency stated in the minutes, this is the only way we have of associating these items of information with the term itself as opposed to the entire term entry.
4) XREF is definitely not a type of administrative data. It is a powerful tag already inherent in TEI. We must be able to distinguish between types of XREF, however, because there may be concept system cross-references, such as "must," and there may also be term equivalent cross-references, such as those used in 4b. Note also that there is a question of redundancy here similar to the one that exists for "foreign language" equivalents. The reasons for including this redundancy are the same as those for including foreign language equivalents.
5) Note that in this position unequivocally refers to the term itself. If we allowed the third rule of adjacency for floating tags, notations that appear elsewhere would, unless otherwise identified, refer to the immediately preceding non-floating tag, i.e. def, ctx, etc.

4.1.2 TEI Entry, structured to conform with Example 2.1.2
---------------------------------------------------------
wine ... SEW WOR must juice n ... ... SEW ... ... ... KHT ... ... SEW ... ... NES caldos (1)(2) jugo zumo Saft

WINE12345.5> wine ... SEW WOR mosto caldos n m pl ... KHT ... ... SEW ... ... ... KHT juice jugo zumo Saft

wine ... DE WOR mosto zumo n m sing ... KHT ... ... SEW ... ... DE juice caldos jugo Saft

wine ... CAR WOR mosto jugo n m ... ... SEW El jugo de una fruta. ... ... CAR juice caldos zumo Saft

wine ... SEW WOR Most Saft n m ... ...
SEW juice caldos zumo jugo

ENDNOTES
1) The complex cross-referencing used here represents one piece of theory that is difficult to ignore. Some systems (Termium, for instance) include the information for each language in its own partial record, but it is possible to view parallel records on-screen. Consequently the user doesn't have to move from one to the other and back again. Term-PC also places information related to a given concept in the same master record, but groups language sections separately. I don't recall whether it is possible to view partial records on-screen at the same time, but in the Windows environment that ought to be doable. Based on these examples, it is not necessary to include the equivalent of a term in another language in the source language entry, but in systems that do not allow simultaneous display of more than one entry, the inclusion of the other language equivalents would be important to me. One thing is essential, however: the XREF link is required whether the equivalent actually appears or not.
2) Note that s have not been included in record 4.1.2. This is because the structure of this record does not require cross-referencing within the record itself.

4.2 TEI Entry, structured to conform with Example 2.2

id>WINE12345.1> wine ... SEW WOR juice n caldos n m pl ... KHT zumo n m sing ... KHT jugo n m Saft n m ... ... SEW ... ... ... SEW ... ... ... SEW ... ... ... SEW ... ... SEW ... KHT ... SEW ... NES ... KHT ... DE ... CAR ... SEW must mosto (1) Most

ENDNOTES:
1) I am going out on a limb here: I am guessing that if a particular element refers back to all the elements identified with an iterative attribute, one can indeed refer to all of them by dropping the numbers.

Meeting of the A&I-7 Working Group of the TEI

AGENDA for the August meeting

1. Review the minutes of the June meeting and resolve open issues.
2. Review the assignment list for the August meeting and discuss issues.
3. Plan the agenda for the November meeting.
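
Illustration for the sample term entries: the brace-tagged MicroMATER records shown in sections 2 and 4 could be split mechanically into code/value pairs as a first step toward conversion. The Python sketch below is only one possible approach and is not a procedure adopted by the working group. The names parse_record and split_code are hypothetical, and the reading of a leading number such as "0" or "1.2" as an index separate from the data category is an assumption drawn from the samples, not a rule stated in these minutes.

```python
import re
from typing import List, Tuple

# Assumption: every field is written as "{CODE} value" and neither codes nor
# values contain braces, as in the sample records.
FIELD = re.compile(r"\{([^{}]+)\}\s*([^{}]*)")


def parse_record(text: str) -> List[Tuple[str, str]]:
    """Return the (code, value) pairs of a record in their original order."""
    return [(code.strip(), value.strip()) for code, value in FIELD.findall(text)]


def split_code(code: str) -> Tuple[str, str]:
    """Split a code such as "1.2SYN" into a leading index and the remainder.

    Assumption: a leading number ("0", "1", "1.2") is an index, and whatever
    follows ("FFO", "SYN", "SRC") names the data category.
    """
    match = re.match(r"(\d+(?:\.\d+)?)?(.*)", code)
    return (match.group(1) or "", match.group(2))


if __name__ == "__main__":
    # Excerpt taken from the sample record in section 2.
    sample = "{0} VEB {1} EBV {0FFO} virus Epstein-Barr {1FFO} Epstein-Barr virus"
    for code, value in parse_record(sample):
        print(split_code(code), value)
```

Keeping the pairs in their original order matters here, since the rules of adjacency discussed in the endnotes depend on which field precedes which; any grouping of floating fields under non-floating ones would be a further step built on top of this list.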