Part 3

Base Tag Sets

13 Terminological Databases

Terminological information generally resides in terminology databases (TDBs), but for SGML applications, these collections of data can be viewed as documents. A document containing terminological data is made up of terminological entries. Typically, a terminological entry treats a single concept and contains information on the assignment of single or multi-word terms to this concept. Bilingual and multilingual terminological entries deal with harmonized or very closely related concepts in two or more languages that are treated as functional equivalents in the context of a specific domain or subdomain. Terminological data can take the form of terminological databases (TDBs) or can be used to print hardcopy terminological documents, such as terminological dictionaries, technical vocabularies, or thesauri.

The TEI description of terminological data was originally designed primarily as a terminology interchange format (TIF) to allow users of terminology databases to exchange database records. In this guise it is called the Electronic Terminology Interchange Format (E-TIF). The exchange of database records is especially important in practice because the structure of terminological records varies considerably from TDB to TDB, reflecting differences of design and of user needs. Users of TDBs frequently need to interchange data in order to access expert information and to prevent the duplication of effort, but differences in software, hardware, and methodology complicate interchange. A universal interchange format is a crucial element in making interchange easier.

The tag set defined in this chapter may also be used to mark up documents for the purpose of printing terminological dictionaries and vocabularies, or exchanging them in electronic form. Printed terminological documents differ from terminological databases in that they are frequently divided into sections and subsections and include prose text in introductions, etc. When used for marking up printed documentation, we can speak of the tag set defined here as a Print Terminology Interchange Format (P-TIF).

Because printed terminological dictionaries differ from terminological databases, problems may arise if one attempts to use the same electronic document both for printing and to exchange records among databases. A printed terminological dictionary may contain material not suitably encoded for introduction into database records. Domain and subdomain information may be implied by the arrangement of <termEntry>s rather than by explicit domain specifications within the individual entries.

Other interchange difficulties include differences between term entry styles used in prescriptive and descriptive terminology work and problems arising from differences in the degree of detail used to classify data elements in different databases. (The term data element is used by terminologists to refer to the smallest defined individual items of information, regardless of whether they are represented as SGML elements, SGML attributes, or fields or columns in a database. That is the usage followed here.) Procedures for addressing these various problems are treated in more detail in another document, the TEI / LISA / ISO - TIF --- Terminology Interchange Format --- A Tutorial (1993). [ see note 72 ]

13.1 The Terminological Entry

The basic unit of terminology management is the terminological entry. A terminological entry documents information pertaining to a concept and generally speaking contains at least one term. In addition to the term, various kinds of descriptive and administrative data are recorded concerning the term, the concept to which it is assigned, and relationships to other terms and concepts. Administrative information supports the management of the terminology database or document.

A sample terminological entry consists of a series of entries like the following:

13.2 Tags for Terminological Data

The following sections define elements for use in tagging terminological data. The elements and attributes listed are based on empirical studies. The studies indicated the use of a wide variety of different data element types (data categories or database field types), but this variety can be reduced to a relatively small set of SGML elements and attributes expressing notions common to most, if not all, TDBs. Those elements and attributes are defined here. In addition, the global TEI attributes defined in section 3.5, and the elements and attributes defined in chapter 6, can all be used in terminological applications.

When tagging terminological data, three elements constitute the set of non-floating elements: <term>, <otherForm>, and <descrip>. All other elements function as floating elements, including <admin>, <note>, <gram>, <bibl>, <biblFull>, <date>, <table>, <formula>, <figure>, and the linking elements (<ptr>, <xptr>, <ref>, and <xref>). The rules for combining floating with non-floating elements are spelled out below in sections 13.3.1 and 13.3.2.

As indicated, these elements all possess a type attribute, used to classify the generic elements so as to match the classifications used by TDBs. The type attributes allow specific items of information not defined in the DTD to be tagged as one of the defined elements with an appropriate type value. The possible values of type thus constitute a sizable open list.

At the time of publication, work is under way in ISO Technical Committee 37, Sub-Committee 3, Working Group 1 to compile an official dictionary of data element types (data categories) for use in terminology work, which will eventually provide the core for a complete list of type attribute values. This data element dictionary will appear as ISO 12 620. The attribute values that occur in the examples shown in this chapter represent a subset of those that will be defined in ISO 12 620.

The <ofig> and <otherForm> elements are not necessary if each potential <otherForm> element is recast as a term in its own <tig>. For example, a term could be placed in a <tig type=synonym>.

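When the base tag set described in this chapter is used, the following attributes are added to the set of global attributes: group, depend, grpPtr, and depPtr (declared in the a.terminology parameter entity shown in section 13.4).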

For discussion of the usage of these attributes, see below, section 13.3.2 .

Among the TEI core elements, the following are most likely to be found necessary in encoding terminological data; for fuller descriptions see the appropriate sections in chapter 6. In the case of the <date> element, it should be noted that the ISO format (YYYY-MM-DD) is preferred for terminology entries.

Like all other elements defined in the TEI DTDs, all elements in the base tag set for terminology possess the following global attributes: id, n, lang, and rend (see section 3.5).

Using the tags defined here, the example given above in section 13.1 might be tagged thus: [ see note 73 ]

<!-- Example 2a:  Nested Term Entry -->
<termEntry>
     <admin type='domain'> appearance of materials </admin>
     <tig lang=en>
	  <term> opacity </term>
	  <gram type=pos> n </gram>
	  <descrip type='definition'> degree of obstruction to the
	  transmission of visible light </descrip>
	  <ptr type='bibliographic' target='ASTM.E284'>
	  <admin type='responsibility' resp='ASTM E12'> </admin>
     </tig>
     <tig lang=de>
	  <term> Opazität </term>
	  <gram type=pos> n </gram>
	  <gram type=gen> f </gram>
	  <descrip type='definition'> Maß für die
	  Lichtdurchsichtigkeit </descrip>
	  <ref type='bibliographic' target='HFdn1983'> p. 383 </ref>
	  <admin type='responsibility' resp='DIN TC for paper
	  products'></admin>
     </tig>
     <tig lang=fr>
	  <term> opacité </term>
	  <gram type=pos> n </gram>
	  <gram type=gen> f </gram>
	  <descrip type='definition'> rapport du flux lumineux
	  incident au flux lumineux transmis ou réfléchi
	  par un noircissement photographique </descrip>
	  <ptr type='bibliographic' target='HJdi1986'>
	  <admin type='responsibility' resp='C.I.R.A.D.'> </admin>
     </tig>
</termEntry>

Both the <ptr type='bibliographic' target='ASTM.E284'> and <ref type='bibliographic' target='HFdn1983'> elements in the example indicate links to complete bibliographical entries included in the back matter element of the same document. `HFdn1983' is a source reference code for a book, generated according to ISO/TC 37 WI 18, Coding of Bibliographic References in Terminology Work and Terminography (1991). Its full bibliographic record would be:

<!-- Example 2b:  Full Bibliographic Entry -->
<biblFull>
       <titleStmt id=HFdn1983>
	      <title> Wörterbuch technischer Begriffe mit 4300
	      Definitionen nach DIN </title>
	      <editor> Henry G. Freeman </editor>
       </titleStmt>
       <editionStmt>
	      <edition> III </edition>
       </editionStmt>
       <extent> 703 pp </extent>
       <publicationStmt>
	      <publisher> Beuth Verlag GmbH </publisher>
	      <pubPlace> Berlin and Köln </pubPlace>
	      <date> 1983 </date>
       </publicationStmt>
       <sourceDesc><p>Compiled for the standards of the DIN (Deutsches
	      Institut für Normung).</p>
       </sourceDesc>
</biblFull>

Further examples, including alternate encodings of this term entry, are given below in sections 13.3.2 and 13.3.3.

The formal definition of these elements depends on which style of markup is being used; for discussion of the two styles, see the following section, 13.3. For the formal declarations for the two styles, see sections 13.4.1 and 13.4.2.

13.3 Basic Structure of the Terminological Entry

A terminological entry is identified with the <termEntry> tag and contains one or more terms marked with the tag <term>, which may appear with associated SGML elements. A single term and its associated SGML elements (such as <gram>, <descrip>, <admin>) constitute a term information group, <tig>. A <termEntry> may be made up of one or more <tig>s.

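There are two structural descriptions for <termEntry>s: the nested structure, described in section 13.3.1, and the flat structure, described in sections 13.3.2 and 13.3.3.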

The nested structure is preferred, especially for interchange with unknown partners. The flat structure provides an option that can be used between interchange partners whose systems exhibit fairly similar structures. The flat structure may also be used as an intermediate form for systems making the transition to the nested format.

13.3.1 Nested Term Entries

A nested <termEntry> uses SGML to represent the hierarchical relationships implicit in the terminological entry by utilizing the following principles of embedding and adjacency.

The conversion routine that creates the nested entry infers the language of the <tig> from the language of the <term>, a process that can be construed as `upward inheritance' from <term> to <tig>. Standard TEI `downward inheritance' applies for all the elements embedded in the <tig>: their language is that of the <tig>, unless this default value is overridden by stating a new value.

An example of a nested term entry was given in section 13.2 .
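
The two directions of inheritance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary representation of elements and the function names infer_tig_lang and effective_lang are hypothetical and are not part of the TEI tag set or of any conversion routine defined by these Guidelines.

```python
# Hypothetical in-memory form: each element is a dict with a "tag",
# an optional "lang", and some text. Not a TEI-defined data model.

def infer_tig_lang(term):
    """'Upward inheritance': the tig takes the language of its term."""
    return term.get("lang")


def effective_lang(element, tig_lang):
    """'Downward inheritance': an embedded element has the language of the
    tig unless it overrides the default by stating a value of its own."""
    return element.get("lang") or tig_lang


term = {"tag": "term", "lang": "de", "text": "Opazität"}
tig_lang = infer_tig_lang(term)

members = [
    {"tag": "gram", "type": "gen", "text": "f"},                      # inherits
    {"tag": "note", "lang": "en", "text": "note written in English"}, # overrides
]
langs = [effective_lang(m, tig_lang) for m in members]
print(tig_lang, langs)
```

Here the <gram> element inherits the German language code of its group, while the hypothetical English note keeps the value it states itself.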

13.3.2 Flat Term Entries Using Rules of Adjacency

The flat terminological entry does not use the <tig> element to enclose a term and its associated elements. Instead, it provides other mechanisms to express the relationships that occur within and among entries in a TDB, while at the same time allowing the different types of entries found in different source TDBs to be represented in very natural ways. The difference between the nested and flat terminological entries is that, while both can express the same information, the nested structure represents the logical hierarchy implicit within the entry by embedding elements in one another, while the flat entry does not represent the logical hierarchy within the entry in this way. Since many existing TDBs do not overtly indicate any hierarchical structure such as that represented in a nested entry, the flat entry may be more apt to reflect the organization of data elements within an entry found in the particular source TDB, whereas the nested entry more obviously characterizes an ideal abstract structure of the term entry. In flat entries, terms and their associated elements are grouped by means of the following rules of adjacency:

Encoded using the flat style, the example given in section 13.2 might look like this:

<!-- Example 3:  Flat <termEntry> -->
<termEntry>
     <admin type='domain'> appearance of materials </admin>
     <term lang=en> opacity </term>
     <gram type=pos> n </gram>
     <descrip type='definition'> degree of obstruction to the
     transmission of visible light </descrip>
     <ptr type='bibliographic' target='ASTM.E284'>
     <admin type='responsibility' resp='ASTM E12'></admin>
     <term lang=de> Opazität </term>
     <gram type=pos> n </gram>
     <gram type=gen> f </gram>
     <descrip type='definition'> Maß für die
      Lichtdurchsichtigkeit
     </descrip>
     <ref type='bibliographic' target='HFdn1983'> p. 383 </ref>
     <admin type='responsibility' resp='DIN TC for paper products'>
     </admin>
     <term lang=fr> opacité </term>
     <gram type=pos> n </gram>
     <gram type=gen> f </gram>
     <descrip type='definition'> rapport du flux lumineux
     incident au flux lumineux transmis ou réfléchi
     par un noircissement photographique </descrip>
     <ptr type='bibliographic' target='HJdi1986'>
     <admin type='responsibility' resp='C.I.R.A.D.'> </admin>
</termEntry>
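
How an import routine might apply adjacency to such an entry can be sketched in Python. The sketch rests on assumptions that should be checked against the rules themselves: elements before the first <term> are taken to apply to the whole entry, each <term> is taken to open an implicit group that runs until the next <term>, and a floating element is attached to the nearest preceding non-floating element (<term>, <otherForm>, or <descrip>). The function name group_by_adjacency and the (tag, text) tuple representation are hypothetical.

```python
NON_FLOATING = {"term", "otherForm", "descrip"}


def group_by_adjacency(elements):
    """Split a flat sequence of (tag, text) pairs into entry-level items
    and implicit term information groups (assumed reading of the rules)."""
    entry_level = []  # items seen before the first term
    groups = []       # one dict per term
    anchor = None     # most recent non-floating item in the current group
    for tag, text in elements:
        if tag == "term":
            anchor = {"tag": tag, "text": text, "attached": []}
            groups.append({"term": anchor, "members": []})
        elif not groups:
            entry_level.append((tag, text))
        elif tag in NON_FLOATING:
            anchor = {"tag": tag, "text": text, "attached": []}
            groups[-1]["members"].append(anchor)
        else:
            # Floating element: refers to the preceding non-floating element.
            anchor["attached"].append((tag, text))
    return entry_level, groups


# Abbreviated form of Example 3 (English and German only).
flat = [
    ("admin", "appearance of materials"),
    ("term", "opacity"),
    ("gram", "n"),
    ("descrip", "degree of obstruction to the transmission of visible light"),
    ("ptr", "ASTM.E284"),
    ("term", "Opazität"),
    ("gram", "n"),
    ("gram", "f"),
    ("descrip", "Maß für die Lichtdurchsichtigkeit"),
    ("ref", "p. 383"),
]
entry_level, groups = group_by_adjacency(flat)
```

Under these assumptions the domain statement stays at entry level, the grammatical information attaches to each term, and each bibliographic link attaches to the definition that precedes it.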

13.3.3 Flat Term Entries Using Group and Depend Attributes

In practice, there are term entries where elements are ordered in such a way that the rules of adjacency cannot be used. For instance, in Example 3 the <ptr> and <ref> linking elements refer to the immediately preceding <descrip> element. The <admin type='responsibility'> elements as represented here also refer to the <descrip> element. It may, however, be desirable for the bibliographic reference to refer not only to the quoted material in the descriptive element, but also to the term itself. Because the second rule of adjacency dictates that all floating elements following a non-floating element refer to that non-floating element, a mechanism is required to `point' to the <term> if the floating element depends on the <term> itself.

There are also other exceptions to the adjacency rules: in some term entries elements are associated with a <term> other than the immediately preceding <term>. Such entries may be called discontiguous flat term entries, since the constituents of a term information group may not be adjacent. In such entries, information pertaining to the entire terminological entry may not always appear at the beginning of the entry (i.e., prior to the introduction of a term).

Such an entry might be encoded as follows:

<!-- Example 4:  Discontiguous Flat <termEntry>  -->
<termEntry n=texyz>
     <term lang=en n=1> opacity </term>
     <gram type=pos depend=1> n </gram>
     <term lang=de n=2> Opazität </term>
     <gram type=pos depend=2> n </gram>
     <gram type=gen depend=2> f </gram>
     <term lang=fr n=3> opacité </term>
     <gram type=pos depend=3> n </gram>
     <gram type=gen depend=3> f </gram>

     <descrip type='definition' group=1 n=endes1> degree of
	obstruction to the transmission of visible light  </descrip>
     <descrip type='definition' group=2 n=dedes1> Maß für die
	Lichtdurchsichtigkeit  </descrip>
     <descrip type='definition' group=3 n=frdes1> rapport du
	flux lumineux incident au flux lumineux transmis ou
	réfléchi par un noircissement photographique
	</descrip>
     <ptr type='bibliographic' depend=endes1 target='ASTM.E284'>
     <admin type='responsibility' depend=endes1 resp='ASTM E12'> </admin>
     <ref type='bibliographic' depend=dedes1 target='HFdn1983'>
	p. 383 </ref>
     <admin type='responsibility' depend=dedes1
	resp='DIN.TC.for.paper'></admin>
     <ptr depend=frdes1 type='bibliographic' target='HJdi1986'>
     <admin type='responsibility' depend=frdes1
	resp='C.I.R.A.D.'> </admin>
     <admin type='domain' depend=texyz> appearance of materials
     </admin>
</termEntry>

In the above example, the depend attributes indicate that the material tagged with this attribute is related to the targeted element. The group attributes indicate that the information so marked is part of an implicit <tig>, i.e. that it pertains either to the term or to the entire implicit <tig>. Items linked to other elements by depend do not require the group attribute because they are associated with the group already by virtue of their relation to elements that are themselves associated with the group.

So as to describe appropriate relationships in discontiguous flat <termEntry>s, it is necessary to define a pointing mechanism that allows any non-adjacent element to be related to an implicit term information group and therefore to the <term> with which it is associated or to some other specific element.

Two methods are provided to represent this association. For terminology files in which unique identifiers for all <term> elements cannot be assumed (as will often be the case in interchange), the group and depend attributes should be used. For terminology files in which unique SGML identifiers can be provided, the grpPtr and depPtr attributes should be used. The two pairs of attributes have identical significance as far as the association of elements is concerned.

The group attribute associates an element with a specific term, or with an implicit term information group: its value must be the same as the n attribute on the <term> element being pointed to. During interchange, the group attribute would be used to extract and assemble all the elements related to a specific term information group from a discontiguous flat <termEntry> by matching them to the n attributes on the terms. The group pointer accounts for the kind of relationship represented by the principle of embeddedness within a <tig> in a nested term entry.

The depend attribute associates an element with some other specific element: its value must be the same as the n attribute on the element being pointed to. As shown in the last line of Example 4, the depend attribute can also point to the entire terminological entry by targeting a value of n indicated in the <termEntry> element. If for any reason the grammatical information pertaining to a term does not follow the term immediately, this information must be linked to the term with the depend attribute.

In terms of the extended pointer notation defined in chapter 14 , the specification group=2 is synonymous with HERE ANCESTOR (1 TERMENTRY) DESCENDANT (1 TERM N 2) , and the specification depend=3 is synonymous with HERE ANCESTOR (1 TERMENTRY) DESCENDANT (1 * N 3) .
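
The equivalence can be written out mechanically. The following Python sketch simply reproduces the two expressions given above for an arbitrary attribute value; the function names group_as_xptr and depend_as_xptr are hypothetical, and the sketch makes no claim about how chapter 14 pointers are evaluated.

```python
def group_as_xptr(value):
    """Extended pointer expression equivalent to group=value."""
    return f"HERE ANCESTOR (1 TERMENTRY) DESCENDANT (1 TERM N {value})"


def depend_as_xptr(value):
    """Extended pointer expression equivalent to depend=value."""
    return f"HERE ANCESTOR (1 TERMENTRY) DESCENDANT (1 * N {value})"


print(group_as_xptr(2))
print(depend_as_xptr(3))
```

The only difference between the two is the element type sought among the descendants of the enclosing <termEntry>: group looks for a <term>, whereas depend accepts any element bearing the matching n value.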

To summarize the behavior of group and depend, the group attribute identifies an implicit <tig>, whereas the depend attribute implies relatedness. If there is any ambiguity with respect to the rules of adjacency, one should use depend.

In Example 4, the English term `opacity' is identified as n=1, and all other elements associated with this <tig> are marked as group=1; in German, the term and all its associated elements are identified as n=2 and group=2, respectively; in French, the term and associated elements are identified as n=3 and group=3. Since the bibliographical references are displaced from the descriptive information with which they are associated, the descriptions are identified with n=endes1, n=dedes1, and n=frdes1, respectively. The <ptr> and <ref> elements are then identified with depend attributes that target the appropriate descriptions. Even if the elements in the entry were adjacent to each other in the entry, this convention would be essential if one wanted to indicate that the source applied to the <term> and hence to the entire <tig>, rather than just to the <descrip> element itself.
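
The extraction and assembly step that group and depend make possible can be sketched in Python. This is an illustration only: the dictionary representation, the function name assemble_groups, and the decision to treat an element carrying neither attribute as entry-level are assumptions, not part of the interchange format.

```python
def assemble_groups(entry_n, elements):
    """Return ({term n: [elements]}, [entry-level elements]) for one
    discontiguous flat entry, by matching group and depend to n values."""
    by_n = {e["n"]: e for e in elements if "n" in e}

    def owner(e, seen=()):
        # A term owns itself; group names the term directly; depend is
        # followed from element to element until a term or the entry is hit.
        if e["tag"] == "term":
            return e["n"]
        if "group" in e:
            return e["group"]
        target = e.get("depend")
        if target is None or target == entry_n or target in seen:
            return None
        nxt = by_n.get(target)
        return None if nxt is None else owner(nxt, seen + (target,))

    groups, entry_level = {}, []
    for e in elements:
        key = owner(e)
        if key is None:
            entry_level.append(e)
        else:
            groups.setdefault(key, []).append(e)
    return groups, entry_level


# Abbreviated form of Example 4 (English and German only).
elements = [
    {"tag": "term", "n": "1", "text": "opacity"},
    {"tag": "gram", "depend": "1", "text": "n"},
    {"tag": "term", "n": "2", "text": "Opazität"},
    {"tag": "gram", "depend": "2", "text": "n"},
    {"tag": "gram", "depend": "2", "text": "f"},
    {"tag": "descrip", "group": "1", "n": "endes1"},
    {"tag": "descrip", "group": "2", "n": "dedes1"},
    {"tag": "ptr", "depend": "endes1", "target": "ASTM.E284"},
    {"tag": "ref", "depend": "dedes1", "target": "HFdn1983"},
    {"tag": "admin", "depend": "texyz", "text": "appearance of materials"},
]
groups, entry_level = assemble_groups("texyz", elements)
```

Each bibliographic link reaches its group indirectly, through the description it depends on, which is why items linked by depend do not need a group attribute of their own.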

13.3.4 References between Term Entries

Terminology documents utilize a variety of cross-references between <termEntry>s, for instance to link to bibliographic entries or between equivalents in different languages, synonyms and related terms and concepts. These references are usually implemented using the TEI linking elements <ptr> and <ref>, together with a value of the attribute type. If, as is the case with the reference to ASTM E284, the total bibliographic source description is identified by the target attribute of the linking element, use <ptr>. If, on the other hand, a page number is included, this page number must appear as the content of a <ref> element.

Examples:

 <ptr type='bibliographic' target='ASTM.E284'>
or
 <ref type='bibliographic' target='HFdn1983'> p. 383 </ref>
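
The choice between the two linking elements can be expressed as a small Python helper. This is a sketch, not a prescribed routine: the function name bibliographic_link and its parameters are hypothetical, and the output is SGML in which <ptr> is an empty element without an end-tag, as in the examples of this chapter.

```python
def bibliographic_link(target, pages=None):
    """Return SGML markup for a bibliographic reference: <ptr> when the
    target alone identifies the source, <ref> when a page number is given."""
    if pages is None:
        return f"<ptr type='bibliographic' target='{target}'>"
    return f"<ref type='bibliographic' target='{target}'> {pages} </ref>"


print(bibliographic_link("ASTM.E284"))
print(bibliographic_link("HFdn1983", "p. 383"))
```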

If the full bibliographical citation is included in the <termEntry> itself, linking elements are unnecessary and the citation can be marked using the <bibl>, <biblStruct>, or <biblFull> elements. For further discussion of bibliographic citations and references, see section 6.10.

13.4 Overall Structure of Terminological Documents

To enable the base tag set for terminology, a parameter entity TEI.terminology must be declared within the document type subset, the value of which is INCLUDE , as further described in section 3.3 . A document using this base tag set and no other additional tag sets will thus begin as follows:

 <!DOCTYPE TEI.2 PUBLIC "-//TEI P3//DTD Main Document Type//EN"
                        "tei2.dtd" [
      <!ENTITY % TEI.terminology 'INCLUDE' >
 ]>

This declaration makes available all of the elements described in this chapter, in addition to the core elements described in chapter 6. The default structure for terminological documents is similar to that defined by chapter 7: within the <TEI.2> element they contain a <teiHeader> and a <text>. The <text> element, in turn, contains as usual a <body> element, optionally preceded by a <front> and followed by a <back>. The <body> may contain a series of <termEntry> elements, which may optionally be grouped into sections tagged with the same elements (<div>, <div0>, <div1>, etc.) as defined in section 7.1.

In order to support both the flat and the nested styles of markup, three distinct DTD fragments for terminology are provided.

In file teiterm2.dtd, the top-level elements for the terminology base are defined, and a subordinate parameter entity, termtags, is defined and referred to. By default, this entity refers to file teite2n.dtd, which defines the DTD for nested markup; if the flat style of markup is to be used, the document's DTD subset should define termtags as referring to the file teite2f.dtd, as shown in the examples in section 13.3.2.

<!-- 13.4: TEIterm2.DTD: Base tag set for terminological data -->
<!-- Text Encoding Initiative: Guidelines for Electronic      -->
<!-- Text Encoding and Interchange. Document TEI P3, 1994.    -->

<!-- Copyright (c) 1994 ACH, ACL, ALLC. Permission to copy    -->
<!-- in any form is granted, provided this notice is          -->
<!-- included in all copies.                                  -->

<!-- These materials may not be altered; modifications to     -->
<!-- these DTDs should be performed as specified in the       -->
<!-- Guidelines in chapter "Modifying the TEI DTD."           -->

<!-- These materials subject to revision. Current versions    -->
<!-- are available from the Text Encoding Initiative.         -->
<!-- First, embed the default text structure elements.        -->

<![ %TEI.singleBase [
<!ENTITY % TEI.structure.dtd system 'teistr2.dtd'               >
%TEI.structure.dtd;
]]&nil;>


<!ENTITY % termtags system 'teite2n.dtd'                        >
%termtags;

In file teiterm2.ent, terminology-specific extensions to the TEI element class system are defined, including the classes terminology, comp.terminology, terminologyInclusions, and terminologyMisc.

<!-- 13.4: TEIterm2.ent: Base tag set for terminological data -->
<!-- Text Encoding Initiative: Guidelines for Electronic      -->
<!-- Text Encoding and Interchange. Document TEI P3, 1994.    -->

<!-- Copyright (c) 1994 ACH, ACL, ALLC. Permission to copy    -->
<!-- in any form is granted, provided this notice is          -->
<!-- included in all copies.                                  -->

<!-- These materials may not be altered; modifications to     -->
<!-- these DTDs should be performed as specified in the       -->
<!-- Guidelines in chapter "Modifying the TEI DTD."           -->

<!-- These materials subject to revision. Current versions    -->
<!-- are available from the Text Encoding Initiative.         -->


<!ENTITY % x.comp.terminology ''                                >
<!ENTITY % m.comp.terminology '%x.comp.terminology termEntry'   >
<!ENTITY % seq '(%m.common; | %m.comp.terminology;)* '          >
<!ENTITY % mix.terminology '| %m.comp.terminology'              >

<!ENTITY % x.terminologyInclusions ''                           >
<!ENTITY % m.terminologyInclusions '%x.terminologyInclusions 
           date | dateStruct | note | ptr | ref | xptr | xref'  >
<!ENTITY % x.terminologyMisc ''                                 >
<!ENTITY % m.terminologyMisc '%x.terminologyMisc admin | 
           descrip'                                             >

<!-- Add attributes to the set of global attributes:          -->

<!ENTITY % a.terminology '
          grpPtr             IDREF               #IMPLIED
          depend             CDATA               #IMPLIED
          depPtr             IDREF               #IMPLIED
          group              CDATA               #IMPLIED'      >

13.4.1 DTD Fragment for Nested Style

In file teite2n.dtd the following definitions are found, which define the elements used in the nested markup style:

<!-- 13.4.1:  Elements for nested-style terminological data   -->

<!-- The nested structure is used for data interchange and    -->
<!-- represents a canonical structured form for terminology   -->
<!-- entries, which differs from the less structured forms    -->
<!-- frequently used to store data in terminological          -->
<!-- databases.                                               -->

<!ELEMENT termEntry     - O  ((%m.terminologyMisc)*, tig+) 
                                                  
                                                 +(%m.terminologyInclusions)
                                                                >
<!ATTLIST termEntry          %a.global;
          type               CDATA               #IMPLIED       >
<!-- Notes, descrip(s) and admin(s) are allowed in the        -->
<!-- termEntry to provide documentation that applies to the   -->
<!-- whole entry.                                             -->


<!-- tig='term information group'                             -->

<!-- ofig='otherform information group'                       -->


<!ELEMENT tig           - O  ((%m.terminologyMisc)*, (term, 
                             gram*), (%m.terminologyMisc)*, 
                             ofig*)                             >
<!ATTLIST tig                %a.global;
          type               CDATA               #IMPLIED       >
<!-- Order is significant: term, descrip(s), ofig(s) or       -->
<!-- otherform(s)                                             -->


<!ELEMENT ofig          - O  ((%m.terminologyMisc)*, 
                             (otherForm, gram*), 
                             (%m.terminologyMisc)*)             >
<!ATTLIST ofig               %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT otherForm     - O  (%paraContent;)                    >
<!ATTLIST otherForm          %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT descrip       - O  (%paraContent;)                    >
<!ATTLIST descrip            %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT admin         - O  (%paraContent;)                    >
<!ATTLIST admin              %a.global;
          resp               CDATA               #IMPLIED
          date               CDATA               %ISO-date
          type               CDATA               #IMPLIED       >
<!-- We define a.dictionaries as the empty string, since we   -->
<!-- are not now using the tag set for dictionaries.          -->

<!ENTITY % a.dictionaries ''                                    >
<!ELEMENT gram          - O  (%paraContent;)                    >
<!ATTLIST gram               %a.global;
                             %a.dictionaries;
          type               CDATA               #IMPLIED       >

13.4.2 DTD Fragment for Flat Style

In file teite2f.dtd the following definitions, which provide support for the flat markup style, are found:

<!-- 13.4.2:  Elements for flat-style terminological data     -->
<!-- The flat structure is used to represent a variety of     -->
<!-- terminology documents that occur in practice and which   -->
<!-- do not follow the form of the nested interchange         -->
<!-- format. The flat representation allows for a less rigid  -->
<!-- structure, but provides a rich mechanism for reflecting  -->
<!-- inter-element relations.                                 -->


<!-- The declaration of termEntry enforces appearance of at   -->
<!-- least one term element in a termEntry, which may be      -->
<!-- preceded by descrip, admin, note, otherform, or gram.    -->
<!-- There may be multiple notes, admins, descrips            -->
<!-- otherforms, and grams appearing in any order. xRef,      -->
<!-- date, biblRef can appear in all positions in termEntry.  -->


<!ELEMENT termEntry     - O  ( (%m.terminologyMisc | otherForm 
                             | gram | 
                             %m.terminologyInclusions)*, (term, 
                             (%m.terminologyMisc | otherForm | 
                             gram | %m.terminologyInclusions)* 
                             )+ )                               >
<!ATTLIST termEntry          %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT otherForm     - O  (%paraContent;)                    >
<!ATTLIST otherForm          %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT descrip       - O  (%paraContent;)                    >
<!ATTLIST descrip            %a.global;
          type               CDATA               #IMPLIED       >
<!ELEMENT admin         - O  (%paraContent;)                    >
<!ATTLIST admin              %a.global;
          resp               CDATA               #IMPLIED
          date               CDATA               %ISO-date
          type               CDATA               #IMPLIED       >
<!-- We define a.dictionaries as the empty string, since we   -->
<!-- are not now using the tag set for dictionaries.          -->

<!ENTITY % a.dictionaries ''                                    >
<!ELEMENT gram          - O  (%paraContent;)                    >
<!ATTLIST gram               %a.global;
                             %a.dictionaries;
          type               CDATA               #IMPLIED       >

13.5 Additional Examples of Term Entries

The tag set defined in this chapter is designed to accommodate the variety of structures that occur in TDBs; this section shows how the same information may be encoded in different ways, depending on local convenience or preferences. Example 5 gives an entry from an ISO terminological standard. Example 6 treats this English-French equivalent pair as a single nested terminological entry, whereas Example 7 splits the information into two nested entries with cross-references. Example 8 shows the same data as a flat terminological entry with adjacent elements, whereas Example 9 groups the elements according to element type, which requires the use of pointers in order to reconstruct the implicit terminological information group from discontiguous elements.

The interchange of terminological data between TDBs requires an export routine (to E-TIF) and an import routine (from E-TIF). For interchange between unknown partners, it may be desirable to normalize the encoding method rather than allow all the options presented in this section. The effect of normalization would be that import routines become easier to implement while export routines become more difficult to implement. At the time of this publication, work is under way in ISO Technical Committee 37, Subcommittee 3, Working Group 3 on a normalized version of E-TIF called ISO [DIS] 12 200. Some aspects of normalization under consideration are to use only the nested representation and avoid the use of the following options: divisions within the <body>, the <otherForm> element, the group and depend attributes, elements before the <term> element in a <tig>, inclusion exceptions other than <ptr> and <xptr>, and paragraph content other than #PCDATA in the elements <admin> and <gram>.

13.5.1 Example Term Entry from ISO 472

The following term entry is taken from ISO 472:1988, Plastics --- Vocabulary, Bilingual edition (Geneva: ISO, 1988), p. 84. The original uses typographic characteristics to represent different data element types within the term entry, not all of which have been retained in the reproduction of this sample. As prescribed by ISO layout guidelines, [see note 74] the original text is printed in Helvetica, with English and French information presented in two parallel columns; head terms appear in bold face, notes in a smaller font size than the main text, and terms referred to in the cross references are printed in italics.

13.5.2 The Example Treated as a Single Term Entry in Nested Form

This treatment assumes that both the English and French terms are treated together in the same entry. The elements grouped together at the top of the term entry apply to the entire entry. Only the first of the three cross-referenced terms is included in this example; it is represented by a <ptr> link which targets a term entry (related concept) contained in the same document. The id values used here are purely arbitrary.

<termEntry id=te84.11>
    <admin type='domain'> plastics </admin>
    <ref type='bibliographic' target='ISO.472-1988'> p. 84 </ref>
    <admin type='creation' date='1988'
           resp='ISO/TC 61, Plastics'> </admin>
    <ptr type='relatedTerm' target='te04.06'>

    <tig lang=en>
          <term> thermal degradation </term>
          <gram type=pos> n </gram>
          <descrip type='definition'> The entirety of all
          deleterious chemical modifications of plastic at
          elevated temperature. </descrip>
          <note> It is essential to report the temperature and
          other environmental conditions at which the phenomenon
          is studied.  </note>
     </tig>

     <tig lang=fr>
          <term> décomposition thermique </term>
          <gram type=pos> n </gram>
          <gram type=gen> f </gram>
          <descrip type='definition'> Ensemble de toutes les
          modifications chimiques nuisibles d'un plastique à
          température élevée. </descrip>
          <note> Il est essentiel d'indiquer la température et
          les autres conditions d'environnement dans lesquelles le
          phénomène est étudié. </note>
     </tig>
</termEntry>

<!-- Referenced term entry: -->

<termEntry id=te04.06>
     <tig lang=en>
          <term> ageing </term>
          <!-- ... -->
     </tig>

     <tig lang=fr>
          <term> vieillissement </term>
          <!-- ... -->
     </tig>
</termEntry>

13.5.3 The Example Treated as Two Separate Term Entries in Nested Form

This example takes account of the fact that some TDBs give each term its own <termEntry> instead of grouping all the information for a single concept into a single <termEntry>. The rationale behind this approach is frequently that no two languages truly provide harmonized concepts, although in the case of standardized terminology it can generally be assumed that concepts have been harmonized. The significant difference in encoding that occurs in this type of system is that <ptr> linking elements are required more frequently to link to term equivalents and related terms in other entries in the same document. Since there is only one <tig> in each entry, the <ptr> element could come at the beginning, as shown in the previous example, or inside the <tig> as shown below.

<termEntry id=te84.11.en>

     <admin type='domain'> plastics </admin>
     <ref type='bibliographic' target='ISO.472-1988'> p. 84 </ref>
     <admin type='creation' date='1988' resp='ISO/TC 61, Plastics'></admin>

     <tig lang=en>
          <term> thermal degradation </term>
          <gram type=pos> n </gram>
          <descrip type='definition'> The entirety of all
          deleterious chemical modifications of plastic at
          elevated temperature. </descrip>
          <note> It is essential to report the temperature and
          other environmental conditions at which the phenomenon
          is studied.  </note>
          <ptr type='relatedTerm' target='te04.06.en'>
          <ptr type='equivalent' lang=fr target='te84.11.fr'>
     </tig>

</termEntry>

<termEntry id=te84.11.fr>

      <admin type='domain'> plastics </admin>
      <ref type='bibliographic' target='ISO.472-1988'> p. 84 </ref>
      <admin type='creation' date='1988'
             resp='ISO/TC 61, Plastics'> </admin>

     <tig lang=fr>
          <term> décomposition thermique </term>
          <gram type=pos> n </gram>
          <gram type=gen> f </gram>
          <descrip type='definition'> Ensemble de toutes les
          modifications chimiques nuisibles d'un plastique à
          température élevée. </descrip>
          <note> Il est essentiel d'indiquer la température et
          les autres conditions d'environnement dans lesquelles le
          phénomène est étudié. </note>
          <ptr type='relatedTerm' target='te04.06.fr'>
          <ptr type='equivalent' lang=en target='te84.11.en'>

     </tig>

</termEntry>

<!-- Referenced term entry: -->

<termEntry id=te04.06.en>
     <tig lang=en>
          <term> ageing </term>
          <!-- ... -->
     </tig>
</termEntry>

<termEntry id=te04.06.fr>
     <tig lang=fr>
          <term> vieillissement </term>
          <!-- ... -->
     </tig>
</termEntry>

13.5.4 The Example Treated as a Flat Term Entry Using Adjacency Rules

This version of Example 5 uses a flat style of encoding, following the pattern of many existing TDBs; elements associated with a given term follow it immediately:

<termEntry id=te84.11>
     <admin type='domain'> plastics </admin>
     <ref type='bibliographic' target='ISO.472-1988'> p. 84 </ref>
     <admin type='creation' date='1988'
            resp='ISO/TC 61, Plastics'> </admin>
     <term lang=en> thermal degradation </term>
     <gram type=pos> n </gram>
     <descrip type='definition'> The entirety of all deleterious
     chemical modifications of plastic at elevated temperature.
     </descrip>
     <note> It is essential to report the temperature and other
     environmental conditions at which the phenomenon is studied.
     </note>
     <term lang=fr> décomposition thermique </term>
     <gram type=pos> n </gram>
     <gram type=gen> f </gram>
     <descrip type='definition'> Ensemble de toutes les
     modifications chimiques nuisibles d'un plastique à
     température élevée. </descrip>
     <note> Il est essentiel d'indiquer la température et les
     autres conditions d'environnement dans lesquelles le
     phénomène est étudié. </note>
     <ptr type='relatedTerm' target='te04.06'>
</termEntry>

<!-- Referenced term entry: -->

<termEntry id=te04.06>
     <term lang=en> ageing </term>
     <!-- ... -->
     <term lang=fr> vieillissement </term>
     <!-- ... -->
</termEntry>
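
The grouping implied by adjacency can be sketched in Python as follows. This is a simplified reading rather than a statement of the adjacency rules themselves: the function name group_by_adjacency and the (name, attributes, text) tuples are hypothetical, the entry is assumed to have been parsed already, elements before the first <term> are taken to apply to the whole entry, and every later element is attached to the nearest preceding <term>.

```python
def group_by_adjacency(children):
    """Split a flat termEntry's children into entry-level items and per-term groups.

    `children` is a list of (name, attrs, text) tuples in document order.
    """
    entry_level, groups = [], []
    for name, attrs, text in children:
        if name == "term":
            # Each <term> opens a new implicit group.
            groups.append({"term": text, "lang": attrs.get("lang"), "items": []})
        elif groups:
            # Anything after a <term> belongs to the most recent one.
            groups[-1]["items"].append((name, attrs, text))
        else:
            # Anything before the first <term> applies to the whole entry.
            entry_level.append((name, attrs, text))
    return entry_level, groups


flat_entry = [
    ("admin", {"type": "domain"}, "plastics"),
    ("term", {"lang": "en"}, "thermal degradation"),
    ("gram", {"type": "pos"}, "n"),
    ("term", {"lang": "fr"}, "décomposition thermique"),
    ("gram", {"type": "pos"}, "n"),
    ("gram", {"type": "gen"}, "f"),
]

entry_level, groups = group_by_adjacency(flat_entry)
for g in groups:
    print(g["lang"], g["term"], [text for _, _, text in g["items"]])
# en thermal degradation ['n']
# fr décomposition thermique ['n', 'f']
```

Under this simplified rule, a trailing element such as the <ptr> at the end of the entry above would be attached to the French term; a real import routine would need to decide how such elements are to be treated.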

13.5.5 The Example Treated as a Flat Term Entry Not Using Adjacency Rules

Many translation-oriented terminologists who work with half-screen popup windows prefer the following layout because it enables them to see the various <term> options at the top part of their display window without having to scroll into the body of the <termEntry>. Note in this case that the <ref> element links the bibliographic information to the entire entry.

<termEntry id=te84.11 n=te84.11>
     <term lang=en n=1> thermal degradation </term>
     <gram type=pos depend=1> n </gram>
     <term lang=fr n=2> décomposition thermique
     </term>
     <gram type=pos depend=2> n </gram>
     <gram type=gen depend=2> f </gram>

     <descrip type='definition' group=1> The entirety of all
     deleterious chemical modifications of plastic at elevated
     temperature. </descrip>
     <descrip type='definition' group=2> Ensemble de toutes les
     modifications chimiques nuisibles d'un plastique à
     température élevée. </descrip>
     <note group=1> It is essential to report the temperature and
     other environmental conditions at which the phenomenon is
     studied. </note>
     <note group=2> Il est essentiel d'indiquer la température et
     les autres conditions d'environnement dans lesquelles le
     phénomène est étudié. </note>

     <ptr type='relatedConcept' target='te04.06'>
     <admin depend=te84.11 type='domain'> plastics </admin>
     <ref type='bibliographic' depend=te84.11 target='ISO.472-1988'>
     p. 84 </ref>
     <admin depend=te84.11 type='creation' date='1988'
     resp='ISO/TC 61, Plastics'>
     </admin>
</termEntry>

<!-- Referenced term entry: -->
<termEntry id=te04.06 n=te04.06>
     <term lang=en n=1> ageing </term>
     <!-- ... -->
     <term lang=fr n=2> vieillissement </term>
     <!-- ... -->
</termEntry>
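
Reconstructing the implicit groups from the n, depend, and group values might look like the following Python sketch. The function name resolve_links and the tuple representation are hypothetical, and the entry is assumed to have been parsed already. The sketch treats depend and group simply as links to an n value and ignores any further distinction between the two attributes; an element that points to the n value of the entry itself, or that carries neither attribute, is kept at entry level.

```python
def resolve_links(entry_n, children):
    """Attach elements to terms by matching depend/group values against n values.

    `entry_n` is the n value of the <termEntry>; `children` is a list of
    (name, attrs, text) tuples in document order.
    """
    terms = {
        attrs["n"]: {"term": text, "lang": attrs.get("lang"), "items": []}
        for name, attrs, text in children
        if name == "term"
    }
    entry_level = []
    for name, attrs, text in children:
        if name == "term":
            continue
        key = attrs.get("depend", attrs.get("group"))
        if key in terms:
            terms[key]["items"].append((name, text))
        elif key is None or key == entry_n:
            entry_level.append((name, text))
        else:
            raise ValueError(f"{name} points to unknown n value {key!r}")
    return entry_level, terms


# Placeholder texts stand in for the full definitions of the example entry.
children = [
    ("term", {"lang": "en", "n": "1"}, "thermal degradation"),
    ("gram", {"type": "pos", "depend": "1"}, "n"),
    ("term", {"lang": "fr", "n": "2"}, "décomposition thermique"),
    ("gram", {"type": "pos", "depend": "2"}, "n"),
    ("gram", {"type": "gen", "depend": "2"}, "f"),
    ("descrip", {"type": "definition", "group": "1"}, "definition (en)"),
    ("descrip", {"type": "definition", "group": "2"}, "definition (fr)"),
    ("admin", {"type": "domain", "depend": "te84.11"}, "plastics"),
]

entry_level, terms = resolve_links("te84.11", children)
print(terms["2"]["items"])
# [('gram', 'n'), ('gram', 'f'), ('descrip', 'definition (fr)')]
print(entry_level)
# [('admin', 'plastics')]
```

Raising an error for a value that matches neither a term nor the entry is a design choice of this sketch; an import routine could equally log the element and continue.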

