Problems Assigned to the TEI TRC Core Subcommittee

C. M. Sperberg-McQueen

TEI TRC Core W01

12 Dec 1996

1 From TEI ED W67
2 From TEI EX P01

1 From TEI ED W67

1.1 Errors

Problems from section 1 (errors); editors, are these errors?

1.1.1 1996-03-26 : Carole E. Mah

Now let me reiterate my frustration at <addSpan> not being allowed to cross the boundary between a titlePage and the rest of the document. The whole PURPOSE of <addSpan> is to span boundaries that <add> cannot!

1.1.2 1996-03-26 : Carole E. Mah

Finally, it seems silly that we have to resort to NEXT and PREV to link pieces of the title on a titlepage, to wit:

Many of you have come across <docTitle>s split by <byline>s.

For example

 
   Cats and the Mice They Catch

   by Jane CatWomon
   
   A Disgustingly Thorough Illustrated Manual
   
   Wherein Blood, Guts, and Squeaking Are Extensively Discussed

Syd suggests handling such things using linking (for more information on linking, see the example on p. 289 and chapter 14, "Linking, Segmentation, and Alignment," esp. section 14.7, "Aggregation," in P3):

 
   <titlePage>
   <docTitle id=DT1 next=DT2>
   <titlePart type=main>
   Cats and the Mice They Catch
   </titlePart>
   </docTitle>
   <byline> 
   by <docAuthor>Jane CatWomon</docAuthor>
   </byline>
   <docTitle id=DT2 prev=DT1>
   <titlePart type=sub>
   A Disgustingly Thorough Illustrated Manual
   </titlePart>
   <titlePart type=desc>
   Wherein Blood, Guts, and Squeaking Are Extensively Discussed
   </titlePart>
   </docTitle>
   </titlePage>
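For comparison, section 14.7 of P3 describes a dedicated aggregation element, <join>. A minimal sketch of that alternative follows, assuming the linking tag set (TEI.linking) is selected and that the <join> is placed wherever the DTD permits it; the ids are those of the example above, and the desc title part is elided:

   <titlePage>
   <docTitle id=DT1>
   <titlePart type=main>Cats and the Mice They Catch</titlePart>
   </docTitle>
   <byline>by <docAuthor>Jane CatWomon</docAuthor></byline>
   <docTitle id=DT2>
   <titlePart type=sub>A Disgustingly Thorough Illustrated Manual</titlePart>
   <titlePart type=desc>...</titlePart>
   </docTitle>
   </titlePage>
   ...
   <!-- placement is an assumption: put join where the linking DTD allows it -->
   <join targets="DT1 DT2" result=docTitle>

This still relies on ids, so it does not remove the linking Mah objects to; it only replaces the paired NEXT and PREV pointers with one element that names both fragments and the element they jointly form.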

1.2 Typos

From section 2 (typos) (I think the first, not the second, of these was referred to us, but neither seems like more than a trivial typo):

1.2.1 1995-01-17 : Hans Dybkjaer

1) Chap. 23 "language corpora" states doctype "TEI.corpus.2", tei2.dtd says "teiCorpus.2"

1.2.2 1995-01-17 : Hans Dybkjaer

2) Chap. 11 "Trans. of Speech" states about the "who" attribute of <u>: "...its value is the identifier of a <participant> or <participantGrp> element in the TEI header." However, "participantGrp" does not occur in the TEI DTD files, and <participant> occurs only in teiccgis.dtd:

 
<!ENTITY % n.participant "participant" >

My current solution: none.

1.3 NWIs

1.3.1 1994-12-14 : Peter Robinson

There is a problem with the 'inter' class of elements. The Guidelines divide the world of text neatly into three classes: "chunks", which are basically structural units like paragraphs, etc., and which can contain words, phrases, etc.; "phrase-level elements", which are what sit within paragraphs; and "inter", which can both behave like paragraphs and sit within paragraphs (P3, pp. 64, 66, etc.). Fine: both chunks and phrase-level elements behave just as they should. But the inter elements do not. For example, <table> is a member of the inter class. Accordingly, you should be able to have the following:

 
<div>
<table>this is an inter element</table>
</div>

just as you can have

 
<div>
<p>this is a paragraph</p>
</div>

Well, you can't. The same applies for all the inter elements which don't happen to be members of the 'common' class, etc. I found this out when I wanted <app> to appear directly within divs, and thought I could achieve this just by adding it to the inter class. Indeed, it is actually quite clear from the description of inter in the tagdoc section that it cannot sit directly within div, div0, etc: these are not included in the list of elements in which inter elements may nest. This simply cannot be right. All members of the analogous hqinter class can indeed occur alongside p etc, directly within divs. Am I the first to spot this? Do I get a dollar? How did this mistake happen? I found myself perpetually getting lost in the maze of element classes used to organize the tei entities: we have common, various inters, lots of components and component.seqs, all of which share (allegedly, in the case of inter) the ability to sit both within and beside <p> and analogous elements. No doubt there are good reasons for this apparent overlapping. The behaviour of <stage>, and the definition of the common class (P3 p. 772) seem especially odd: obviously stage directions need to appear both within and between <p> and the like. And so it is defined as one of the inter elements. But it does NOT get its ability to sit between <p>s from this definition, as one might expect. Rather, it gets it from being included, very peculiarly, in the definition for the common class:

 
<!ENTITY % m.common '%x.common %m.bibl; | %m.chunk; | 
           %m.hqinter; | %m.lists; | %m.notes; | stage'         >

This is pretty odd: every other member of this class is a group of elements (bibls, chunks, etc) and <stage> appears rather tacked on. In fact, exactly the same effect could have been achieved just by something like the following:

 
 <!ENTITY % m.common '%x.common %m.bibl; | %m.chunk; |
           %m.hqinter; | %m.lists; | %m.notes; | %m.inter;'         >

Because stage is part of inter, it would then have got exactly the same behaviour as having it appear on its own. And what is more, all the other elements in the inter class -- figure, table, etc. -- would also have got the correct behaviour, or at least the behaviour specified for them in the Guidelines. But when I tried to modify the DTD with this definition (using the TEI-approved method, of course) I got errors elsewhere, so that quick fix at least won't work.

1.3.2 1995-02-06 : Peter Robinson

<list> : the documentation says that a list must have more than one item, and indeed so it is implemented. In fact, there are actually times when you want a list to have only one item in it! For example: we are encoding a glossary as a series of list elements, with each list containing all the items for one letter. But one letter only has one item in it. It would seem odd to have this one letter alone not appear as a list. I don't have any problem with a list with only one item (I often have shopping lists

1.3.3 1995-02-06 : Peter Robinson

<list> in <div>: I want to put a <list> directly in a div after a <head>, thus:

 
   <div>
   <head>title for this list</head>
   <list> the list...
   </div>

TEI won't let me do that: seems I have to have:

 
   <div> 
   <list>
   <head>title for this list </head>
   the items..
   </list>
   </div>

this seems somewhat restrictive to me.

1.3.4 1995-02-17 : Richard Light

Mixed-content model in elements whose content model is specialPara: change the documentation? Change the model? [1995-02-20 Henry Thompson argues it's a bug.]

1.3.5 1995-06-02 : David M. Seaman

The <docTitle> tag requires <titlePart> according to the dtd, and the info in the list of elements (p.948), but there are a number of examples in the Guidelines where the examples have just PCDATA within <docTitle>s -- p.231, 233. It seems to be a good idea to allow just PCDATA in a <docTitle>, as many titles have just one main part.
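The mismatch Seaman describes can be seen by setting the two forms side by side; the title text is borrowed from the example in 1.1.2:

   <!-- accepted by the P3 DTD: docTitle contains titlePart -->
   <docTitle>
   <titlePart type=main>Cats and the Mice They Catch</titlePart>
   </docTitle>
   <!-- the form used in the examples on pp. 231 and 233, which the DTD rejects -->
   <docTitle>Cats and the Mice They Catch</docTitle>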

1.3.6 1995-07-29 : Nancy Ide

So far as we can tell, the <s> tag cannot contain the <q> tag, except indirectly (e.g., <s><hi><q>...). This is clear in the definition of <s> in TEI LITE, and seems also to be true in TEI.2. We have lots of cases where we had intended to use <q> inside <s>. In fact, you can imagine one of the examples in the TEI LITE document could be encoded as

 
<s>Few dictionary makers are likely to forget 
Dr. Johnson's description of the
lexicographer as <q>a harmless drudge</q>.</s>

This seems to be a bug, especially since several elements that can appear in <s> can contain <q>. Is this right?

1.3.7 1995-07-29 : CMSMcQ (in reply to Nancy Ide)

"So far as we can tell, the <s> tag cannot contain the <q> tag, except indirectly (e.g., <s><hi><q>...). This is clear in the definition of <s> in TEI LITE, and seems also to be true in TEI.2."

Yes. TEI Lite does not modify the main DTD in this respect, and neither DTD allows Q or other inter-level elements (such as NOTE and LIST) to occur within elements defined as containing %phrase.seq -- only within elements defined as containing %paraContent.

"We have lots of cases where we had intended to use <q> inside <s>. In fact, you can imagine one of the examples in the TEI LITE document could be encoded as ... This seems to be a bug, especially since several elements that can appear in <s> can contain <q>. Is this right?"

This may or may not be unwise (I am inclined to think your example shows it probably is), but I am not sure it is a 'corrigible error' in the sense of DTD maintenance, because I believe that S, and a large-ish number of other elements, were intentionally defined as containing %phrase.seq in order to keep inter-level elements out of them. In general, I think this is unwise and when I could I argued against it, but having lost the argument I cannot now claim there was no argument and the error is a mere slip.

Which means, for the TEI, that I'll refer your query to the technical review committee for action, along with other similar suggestions for changes. For you, it means that you will need to modify the declaration of S to allow interlevel elements.

To do this, include the following line in your TEI.extensions.ent file:

 
 
  <!ENTITY % s 'IGNORE' >

and the following in your TEI.extensions.dtd file:

 
<!ELEMENT %n.s;         - -  (%paraContent)       -(s)          >
<!ATTLIST %n.s;              %a.global;
                             %a.seg;
          TEIform            CDATA               's'            >

(If you don't know off hand what I mean by 'your TEI.extensions.ent file' and 'your TEI.extensions.dtd file', re-read the discussions of DTD modification in the chapter on the structure of the DTD and on modifying the TEI DTD.)

Or you could take bold action and mess with the declaration of phrase.seq, which would change the behavior of all the elements declared as containing only phrases -- but which is likely to be cumbersome and slightly error-prone, particularly in the presence of other DTD modifications. I'd recommend not modifying phrase.seq unless you do it very carefully and ensure that your modified declaration can pick up any x-dot entity declarations provided in a document instance.

1.3.8 1995-11-01 : Harry Gaylord

Please give us back the <s> tag we can use directly within a <div> without requiring a <p>. We need it for many canonical texts, like Psalm 23 as you will remember.
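The contrast can be sketched as follows, drawing on Gaylord's 7 July 1995 message (2.4 below): in P1 <s> stood level with <p> inside a <div>, while in P3 verses have to become <p>s. The type and n values are illustrative choices, and the verse text is from the Authorized Version:

   <!-- P1 practice: each verse is an s directly within the div -->
   <div type=psalm n=23>
   <s n=1>The LORD is my shepherd; I shall not want.</s>
   <s n=2>He maketh me to lie down in green pastures:
   he leadeth me beside the still waters.</s>
   ...
   </div>
   <!-- P3: s cannot occur directly in div, so each verse becomes a p -->
   <div type=psalm n=23>
   <p n=1>The LORD is my shepherd; I shall not want.</p>
   <p n=2>He maketh me to lie down in green pastures:
   he leadeth me beside the still waters.</p>
   ...
   </div>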

1.3.9 1995-12-14 : Nick Finke

The real reason for this note is a problem I am having which concerns the <note> element. I need to be able to put this in either the <docAuthor> or <byline> elements. You will say this is ridiculous, and it may indeed be so, but it is the universal practice in law journals to footnote the author's name with his/her current position and a list of degrees received with school and date for each. I admit that this is just another example of the consummate snobbery which is endemic in the legal world, but it also happens to exist and I can't mark it up.
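The markup Finke would like to write looks roughly like this; the name, position, and degrees are hypothetical, and, as he reports, the DTD accepts a <note> in neither <docAuthor> nor <byline>:

   <!-- hypothetical author and footnote; rejected by the P3 DTD -->
   <byline>by <docAuthor>A. N. Author<note place=foot n=1>Associate
   Professor of Law, Example University. B.A. 1980, Example College;
   J.D. 1983, Example Law School.</note></docAuthor></byline>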

1.3.10 1996-03-14 : Keith Handley

Why does the <div> element require a sub-element at all?

2 From TEI EX P01

2.1 Syd Bauman, 28 September 1994, on ORNAMENT (EX P01 s.1)

Date: Wed, 28 Sep 1994 21:23:55 CDT
Reply-To: Syd Bauman <SYD@BROWNVM.BITNET>
Sender: Text Encoding Initiative public discussion list <TEI-L@UICVM.BITNET>
From: Syd Bauman <SYD@BROWNVM.BITNET>
Subject: When an ornament cannot be an ORNAMENT

There is no ORNAMENT element in P3 (even though one occurs in an example). Is there something I'm missing for encoding that which we used to call ORNAMENT? What are other folks doing?

-- Syd Bauman, textbase programmer/analyst
Brown University Women Writers Project
Syd_Bauman@Brown.edu

2.2 E.H.M. Van den Hout, 11 May 1995, Linegroups (EX P01 s.8)

Date: Thu, 11 May 1995 13:23:48 CDT
Reply-To: "Van den Hout E.H.M." <s0760412@let.rug.nl>
Sender: Text Encoding Initiative public discussion list <TEI-L@UICVM.BITNET>
From: "Van den Hout E.H.M." <s0760412@let.rug.nl>
Subject: Re: problem with linegroups
To: Multiple recipients of list TEI-L <TEI-L@UICVM>

David Megginson writes:

>> >>>>> "Van" == Van de Hout E H M <s0760412@let.rug.nl> writes:

>>The problem is as follows: I want to have a <q> tag to enclose
>>several lines within a linegroup. The structure is as follows:
 
>>     <lg> <l></l> ...  <l></l> <q> <l></l> ...  <l></l> </q> <l></l> ...
>>     <l></l> </lg>
>>The TEI-DTD does not provide for such a structure. The only way to
>>make this work I could think of is by redefining the entire
>>structure of the linegroup in my document subset. Is there another,
>>more elegant, simple way of doing this?

>Try this:
 
>   <lg>
>     <l>..</l>
>     <l>..</l>
>     <milestone unit=squot>
>     <l>..</l>
>     <l>..</l>
>     <milestone unit=equot>
>     <l>..</l>
>   </lg>
>If you redefine the structure of <lg>, you will still run into
>problems if a quotation (for example) begins in the _middle_ of a
>line. For a detailed discussion, see Chapter 31, "Multiple
>Hierarchies," in the TEI P3 guidelines.

For lines in which the quotation starts in the middle, I have used <seg> and <join>. Therefore, if I cannot use <q>, I will opt for <join> instead of <milestone>. The most pressing reason for me to use <q> is that it makes coding information about the speaker easier. What I have done with the divided parts of direct speech is rather far-fetched: I have linked the join to a glossary item in which the speaker's name is the label. If I wished to do something like this with milestones, I would have to give them IDs so that I could refer to them. But the poem already has IDs for each line, so it would be easier to use those than to generate milestones with IDs and then link them.

And Perry Willett writes:

>Could you use div instead of lg? This would work:

 
> <body>
> <div>
>    <l></l>
>    ...
>    <l></l>
>    <q>
>       <l></l>
>       ...
>       <l></l>
>    </q>
>    <l></l>
>    ...
>    <l></l>
> </div>
> </body>

This could be a solution, but because I am tagging an epic poem, "The Song of Hiawatha," I would prefer <lg>, because it has been specifically designed for tagging poetry. In the document I already have <div0> for the entire poem and <div1> for the individual songs, so I guess it would not be a problem to use <div2 type=stanza> to tag linegroups.

But, I wonder, why is <lg> limited to linegroups in which no direct speech spans several verse lines? A lot of the poetry I have read consists of stanzas containing direct speech that runs over several verse lines (e.g. the Iliad, the Odyssey, Mei (a Dutch poem by Gorter), and the above-mentioned Hiawatha). I would like to know something about the motivation behind this choice; could someone give me some references on this subject?

Thank you for your replies,

Erik.
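The <seg>-and-<join> approach described above, for a quotation that begins and ends in mid-line, might look like this. The ids and line contents are hypothetical, the middle line is addressed by its existing line id as Van den Hout suggests, and the linking tag set is assumed to be selected:

   <lg>
   <l id=L1>...</l>
   <l id=L2>... <seg id=Q1a>opening words of the speech,</seg></l>
   <l id=L3>a line lying wholly within the speech,</l>
   <l id=L4><seg id=Q1b>closing words of the speech.</seg> ...</l>
   </lg>
   <!-- hypothetical ids: the three pieces together make up one q -->
   <join targets="Q1a L3 Q1b" result=q>

The speaker then has to be recorded by a separate link (in his case to a glossary entry), which is the cost of giving up <q> that he describes.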

2.3 Jean Veronis, 5 July 1995, rend for abbr (EX P01 s.15)

Date: Wed, 5 Jul 1995 17:10:29 CDT
Sender: Text Encoding Initiative public discussion list <TEI-L@UICVM.UIC.EDU>
From: Jean Veronis <veronis@univ-aix.fr>
Subject: Re: Encoding the Rendering of an abbreviation
To: Multiple recipients of list TEI-L <TEI-L@UICVM.UIC.EDU>

At 15:19 4/07/95, James K. Tauber wrote:
> [...] A "rend" attribute on <abbr> would be nice [...]

A related question:

At the moment we use a "rend" attribute on <abbr> to mark the small superscript "r" in "Mr." or the small subscript "2" in "SO2", as in:

<ABBR REND=TAIL-SUPER>Mr</ABBR> Dondelinger <ABBR REND=TAIL-SUB>SO2</ABBR>

This is far from satisfactory, since the range of possibilities is potentially open (e.g. "H2O", etc.).

Any suggestion?

2.4 Harry Gaylord, 7 July 1995, s and p (EX P01 s.16)

From: Harry Gaylord <galiard@let.rug.nl>
Subject: Bible tagging
To: lou@vax.ox.ac.uk (Lou Burnard), U35395@uicvm.cc.uic.edu (Michael Sperberg McQueen)
Date: Fri, 7 Jul 1995 13:27:06 +0200 (METDST)

Gentlemen,

In P1 s was on the same level as p so that we could use divs for book and chapter and s for verse.

In P3 s is contained in p (among other elements), so verses have to become ps.

Do you have a better solution?

Harry

2.5 Nancy M. Ide, 29 July 1995, on S and Q (EX P01 s.25)

Date: Sat, 29 Jul 95 14:44:27 EDT
From: ide@cs.vassar.edu (Nancy M. Ide)
To: u35395@uicvm.uic.edu
Subject: <s> / <q> bug?

Michael,

So far as we can tell, the <s> tag cannot contain the <q> tag, except indirectly (e.g., <s><hi><q>...). This is clear in the definition of <s> in TEI LITE, and seems also to be true in TEI.2.

We have lots of cases where we had intended to use <q> inside <s>. In fact, you can imagine one of the examples in the TEI LITE document could be encoded as

 
 
<s>Few dictionary makers are likely to forget
Dr. Johnson's description of the
lexicographer as <q>a harmless drudge</q>.</s>

This seems to be a bug, especially since several elements that can appear in <s> can contain <q>. Is this right?

Nancy

2.6 C. M. Sperberg-McQueen, 29 July 1995, on S and Q (EX P01 s.26)

Date: Sat, 29 Jul 95 17:18:16 CDT
From: "C. M. Sperberg-McQueen" <U35395@UICVM>
Organization: ACH/ACL/ALLC Text Encoding Initiative
Subject: Re: <s> / <q> bug?
To: "Nancy M. Ide" <ide@cs.vassar.edu>
cc: Lou Burnard <LOU@VAX.OX.AC.UK>
In-Reply-To: Your message of Sat, 29 Jul 95 14:44:27 EDT

On Sat, 29 Jul 95 14:44:27 EDT you said:
>Michael,
>
>So far as we can tell, the <s> tag cannot contain the <q> tag, except
>indirectly (e.g., <s><hi><q>...). This is clear in the definition of <s>
>in TEI LITE, and seems also to be true in TEI.2.

>We have lots of cases where we had intended to use <q> inside <s>. In
>fact, you can imagine one of the examples in the TEI LITE document
>could be encoded as

 
<s>Few dictionary makers are likely to forget
Dr. Johnson's description of the
lexicographer as <q>a harmless drudge</q>.</s>

>This seems to be a bug, especially since several elements that can
>appear in <s> can contain <q>. Is this right?

This may or may not be unwise (I am inclined to think your example shows it probably is) but I am not sure it is a 'corrigible error' in the sense of DTD maintenance, because I believe that S, and a large-ish number of other elements, were intentionally defined as containing %phrase.seq in order to keep inter-level elements out of them. In general, I think this is unwise and when I could I argued against it, but having lost the argument I cannot now claim there was no argument and the error is a mere slip.

To allow inter-level elements within S, include the following line in your TEI.extensions.ent file:

 
  <!ENTITY % s 'IGNORE' >

and the following in your TEI.extensions.dtd file:

 
<!ELEMENT %n.s;         - -  (%paraContent)       -(s)          >
<!ATTLIST %n.s;              %a.global;
                             %a.seg;
          TEIform            CDATA               's'            >

I hope this helps.

Michael

2.7 Syd Bauman, 18 September 1995, Varia (EX P01 s.31)

Date: Mon, 18 Sep 1995 11:45:35 EDT
Reply-To: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
Sender: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
From: Syd Bauman <SYD@BROWNVM.BITNET>
Subject: some (technical) thoughts on the tagset(s)

* Some (technical) Thoughts on the Tagset(s)

1 - Should the <CERTAINTY> element allow #PCDATA or %phrase.seq; content instead of, or in addition to, the desc attribute?

...

4 - At one point back in Dec 94 I told MSMcQ that the problem he was dealing with seemed analogous to an errata list, and that the methodology the WWP developed for encoding errata sheets might help him out. Since then I have completely forgotten what the encoding problem was. (All I have is a note to myself "send MSMcQ errata encoding".)-: So I thought I'd post our method here.

 
    <sp who=DO><p>Toto, I have a feeling we're not in
    <ref type=err target=ec45 id=er45>Kentucky</ref>
    anymore.</p></sp>
    ...
    <div type=corrigenda>
        <head>Errata!</head>
        <list type=errata>
            <item>...
            <item>On Page 12, line 34, "we're not in Kentucky anymore"
                  should read "we're not in
                 <ref type=errare id=ec45 target=er45>Kansas</ref> anymore"
                 </item>
            <item>...
            </list>
        </div>

(The ID value abbreviations are 'er' for "error" and 'ec' for "error correction".) The main advantage is that the double-linking allows sophisticated software to replace "Kentucky" with "Kansas", or to jump to Dorothy's speech from the errata list. The main disadvantage is that it would take very sophisticated software. There are other advantages. No modifications to the TEI DTDs are required at all. Linking between text and errata list item can occur even if there is no specific replacement word, just a description. For those who'd prefer a simpler method or are not planning to encode the errata sheet itself, I'd recommend either of the following.

 
    <sp who=DO><p>Toto, I have a feeling we're not in
    <sic corr=Kansas resp=errata>Kentucky</sic>
    anymore.</p></sp>
-- or, the mirror image --
    <sp who=DO><p>Toto, I have a feeling we're not in
    <corr sic=Kentucky resp=errata>Kansas</corr>
    anymore.</p></sp>

(As long as there is no one at your project whose name or initials are "errata" :-) Somewhere in the CORRECTION element (of the EDITORIALDECL element of the ENCODINGDESC in the TEIHEADER) you would have to explain that resp=errata implies that the correction was taken from an errata sheet.
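A minimal sketch of that header documentation, assuming the other required parts of the header are supplied where the ellipses stand; the wording of the statement is illustrative:

   <teiHeader>
   ...
   <encodingDesc>
   <editorialDecl>
   <correction>
   <!-- illustrative wording; adapt to the project's actual practice -->
   <p>Where a sic or corr element carries resp=errata, the
   correction was taken from the errata sheet issued with
   the source text.</p>
   </correction>
   </editorialDecl>
   </encodingDesc>
   ...
   </teiHeader>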

2.8 Dominic Dunlop, 25 October 1995, on MS markup (EX P01 s.33)

Date: Wed, 25 Oct 1995 10:42:23 CDT
Reply-To: Dominic Dunlop/EOI <Dominic_Dunlop/EOI.EOI@HERMES.eo.net>
Sender: "TEI (Text Encoding Initiative) public discussion list" <TEI-L@UICVM.UIC.EDU>
From: Dominic Dunlop/EOI <Dominic_Dunlop/EOI.EOI@HERMES.eo.net>
Subject: Re: A couple of questions relating to manuscript markup

James K. Tauber asks:

...

> 2. We are using <milestone> to mark traditional chapter/verse boundaries,
> reserving <div> for marking pages, columns and lines. But what element do
> we make the text at the lowest level? <div> can't take character data and
> <p> doesn't seem appropriate.

I suspect you need <s> here. Its description, "a sentence-like division of a text", is intentionally vague: it's up to you to decide what is "sentence-like". For example, in the British National Corpus, an <s> is anything that automatic parsing software said was an <s>. In conventionally punctuated prose, <s>s correspond closely to what most would regard as sentences; in verse, no <s> is longer than a verse line, even if the line is not a complete semantic unit (or whatever); and in the absence of cues (for example, in tabular material), <s>s have rather arbitrary boundaries. But that's all fine: the documentation in the <segmentation> element of the BNC's header says that this is what's happening, and that's all that needs to be done. So, define what constitutes a lowest-level element, document it in <segmentation>, and mark it with <s>.
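A sketch of the header documentation Dunlop recommends: the <segmentation> element sits inside <editorialDecl> in the <encodingDesc>, and the policy stated here is a placeholder for whatever criteria a project actually adopts:

   <encodingDesc>
   <editorialDecl>
   <segmentation>
   <!-- placeholder policy; state the criteria actually used -->
   <p>The lowest-level units of text are marked with s. Within
   each column an s ends at ... (state the criteria used).</p>
   </segmentation>
   </editorialDecl>
   </encodingDesc>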

Maybe you already have <s> (which can't be nested) ear-marked for some other function. If so, you could use <p> for lowest level elements. If you think that its name carries too much semantic baggage, give it another name. Again, the Guidelines allow you to do this. (It's all in chapter 29.) Alternatively, consider using <phr> instead of <s>, freeing up <s> for those lowest-level elements.

...

2.9 Lou Burnard, 5 October 1995, on SEG (EX P01 s.35)

Date: Thu, 5 Oct 1995 18:24:31 +0100
Reply-To: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
Sender: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
From: Lou Burnard <lou@VAX.OX.AC.UK>
Subject: seg+mixed content
From: OXVAXD::LOU "Lou Burnard" 5-OCT-1995 18:24:04.78
To: MX%"ht@cogsci.ed.ac.uk"
CC: LOU
Subj: RE: SEG and mixed content query

>This comes at the end of a long afternoon poking around, so it may be
>too contextualised to make sense, but here goes:

Well it makes sense, and it's quite interesting, so I'm forwarding it (together with my interlarded comments) to tei-tech to see whether others agree.

>SEG is supposed to be a general-purpose grouping element for use in
>mixed content. But since its content is %paraContent, which is NOT
>sensitive to alternative bases, if I do e.g.

This is not true. %paraContent is sensitive to the selection of bases -- if you choose the verse base it will include additional phrase-level elements defined for that base. Likewise for additional tagsets -- indeed, you wouldn't be able to use <seg> at all if it didn't become a member of m.phrase (and hence of %paraContent) when you select TEI.linking.

 
<!ENTITY % TEI.prose 'IGNORE'>
<!ENTITY % TEI.spoken 'INCLUDE'>
<!ENTITY % TEI.transcr 'IGNORE'>
<!ENTITY % TEI.linking 'INCLUDE'>

>I can't use <seg> to mark sub-parts of <u> in general, because the
>content model of <u> allows special-purpose spoken crystals
>(e.g. <vocal>) which are NOT in paraContent and thus not allowed
>inside <seg>.

This looks like an error in the dtd to me. Clearly, you should be able to use <seg> to segment <u>s whether or not they contain <vocal>s -- which are empty anyway. Probably compSpoken should be a subclass of m.comp.

>Do you agree this is a problem, and if so what's the right solution,
>perhaps using something more like specialPara (of course, using our
>CORRECT definition thereof, see previous argument about invalidity of
><q> <p> under the official version :-).

Yes, I agree it's a problem, and I'm not sure of the right solution. Not sure what your CORRECT definition would be, but I assume it contains some species of magic to overcome the implications of SGML rules about mixed content ...

Lou
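The problem under discussion can be reproduced with a very small instance, assuming the spoken-text base and the linking tag set are selected as in the entity declarations above; the who value (which would have to identify a participant declared in the header) and the words are hypothetical:

   <!-- hypothetical speaker id and words -->
   <u who=A>
   well <vocal> I suppose so
   <seg>well <vocal> I suppose so</seg>
   </u>

The first <vocal>, directly within <u>, is accepted; the second, inside <seg>, is rejected, because <seg> takes %paraContent and the spoken-text components such as <vocal> are not part of it.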

2.10 Henry S. Thompson, 6 October 1995, on SEG (EX P01 s.36)

Date: Fri, 6 Oct 1995 11:17:31 BST
Reply-To: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
Sender: "TEI Technical Review discussion list (summer 1992)" <TEI-TECH@UICVM.UIC.EDU>
From: "Henry S. Thompson" <ht@COGSCI.ED.AC.UK>
Subject: Re: SEG and mixed content query
Comments: To: lou@vax.ox.ac.uk
In-Reply-To: <009976D8.341349CA.410@vax.ox.ac.uk> (message from Lou Burnard on Thu, 05 Oct 1995 18:23:59 +0100)

> >SEG is supposed to be a general-purpose grouping element for use in
> >mixed content. But since its content is %paraContent, which is NOT
> >sensitive to alternative bases, if I do e.g.
>
> This is not true. %paraContent is sensitive to the selection of bases -- if
> you choose the verse base it will include additional phrase level elements
> defined for that base. Likewise for additional tagsets -- indeed, you wouldn't
> be able to use <seg> at all if it didn't become a member of m.phrase (and hence
> of %paraContent) when you select TEI.linking.
>

 
  <!ENTITY % TEI.prose 'IGNORE'>
  <!ENTITY % TEI.spoken 'INCLUDE'>
  <!ENTITY % TEI.transcr 'IGNORE'>
  <!ENTITY % TEI.linking 'INCLUDE'>

Sorry I wasn't clear here, but I still think I'm right -- the point is that the only entities in teiclas2 which are sensitive to what tagsets are included are the common/component/component.seq/specialPara family.

Note also that it is not the case that "<seg> . . . become[s] a member of m.phrase (and hence of %paraContent) when you select TEI.linking" -- it is a member of m.phrase explicitly and unconditionally. It seems like all additional tagsets are created equal, but some (linking, drama and verse most obviously) are more equal than others, in that some of their tags are explicitly and unconditionally included in e.g. %m.phrase, which simply has no effect if the relevant tagset is not actually included.

> >I can't use <seg> to mark sub-parts of <u> in general, because the
> >content model of <u> allows special-purpose spoken crystals
> >(e.g. <vocal>) which are NOT in paraContent and thus not allowed
> >inside <seg>.
>
> This looks like an error in the dtd to me. Clearly, you should be able to use
> <seg> to segment <u>s whether or not they contain <vocal>s -- which are empty
> anyway. Probably compSpoken should be a subclass of m.comp.
>

Now I'm really confused -- I'm using a version of the p3 dtd's fetched from Exeter which are all dated Feb 7 1995 (The fact that there is no file-internal version indication is a separate problem to which I will return another time :-), and teiclas2.ent contains no entity %m.comp, nor does teispok2.ent have a %compSpoken. Assuming this is just loose talk and I'm not looking at the wrong files, I don't THINK this is the solution, since %m.comp.spoken is already included in %component, but that's precisely the point -- %component doesn't make it under <seg>, because seg's content model is %paraContent.

> >Do you agree this is a problem, and if so what's the right solution,
> >perhaps using something more like specialPara (of course, using our
> >CORRECT definition thereof, see previous argument about invalidity of
> ><q> <p> under the official version :-).
>
> Yes, I agree it's a problem, and I'm not sure of the right solution.
> Not sure what your CORRECT definition would be, but I assume it contains some
> species of magic to overcome the implications of SGML rules about mixed
> content ...

[Here's the current definition for reference:

 
 <!ENTITY % specialPara '(((%m.chunk), (%component.seq)) | (%paraContent))'>
]

Well, as you know you CAN'T actually OVERCOME the mixed-content difficulty, but I am prepared to give up your desire to see specialPara as containing either element-only content, roughly %component.seq, or mixed content (%paraContent), since whatever your intentions the standard treats it as mixed content tout court, and makes the above example invalid. My basic view is that what it means to have specialPara content is to be transparent, i.e. that wherever something with specialPara content occurs, what's allowed inside should be what's allowed at that point already.

I'm not sure all the things currently defined to have specialPara content should on this analysis, but let's take q, quote, sic, corr and, if I'm right, seg as prototypical.

The first two are (hq)inter, the rest phrase. Not surprisingly, this is just the disjunction (along with #PCDATA), which makes up paraContent.

But somewhat curiously, specialPara currently includes %chunk, which means I can include e.g. a quoted tree in a place I can't include a tree, which seems counter-intuitive . . .

So for better or worse, we've been using the following definition of specialPara:

 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %m.inter | %m.chunk)*'>

which is backward compatible (it still allows e.g. quoted trees), but doesn't suffer from the mixed content problem (e.g. it allows " <q> <p> "). It does of course give up the original goal of forcing a once-and-for-all choice of element-only vs. mixed which is as I understand it the motivation of the current version.

Now the above redefinition doesn't actually solve the seg/u/vocal problem, but it would if %component were used instead of '%m.inter | %m.chunk', BUT that would not quite be backward compatible because of the slight difference between m.common and '%m.inter | %m.chunk', namely the lack in the former of stageDirection, castList, figure, table and text (a VERY mixed bag :-).

So EITHER we have

 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %component)*'>

or, if we are scrupulous about backward compatibility,

 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %component |
                          %m.stageDirection; | %n.castList; |
                          %n.figure; | %n.table; | %n.text;)*'>

and we change seg's content model to be specialPara.

Still with me? :-)

Hope this is helpful.
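The mixed-content point can be seen in a small example. Under the official specialPara, a <q> may hold either a run of chunks such as <p> or mixed text, but because #PCDATA appears in one branch, SGML treats the whole content as mixed. The space between <q> and <p> below is therefore character data, which only the %paraContent branch admits, and that branch does not include <p>. Under the redefinition beginning '(#PCDATA | %m.phrase | %m.inter | %m.chunk)*' the same instance is valid. The quoted text is illustrative:

   <!-- invalid with the official specialPara: the space after <q> is
        #PCDATA; valid with the redefinition that allows %m.chunk -->
   <q> <p>a harmless drudge</p> </q>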

2.11 Harry Gaylord, 1 November 1995, on S (EX P01 s.37)

Date: Wed, 1 Nov 1995 12:01:26 CST
Reply-To: Harry Gaylord <galiard@let.rug.nl>
Sender: "TEI (Text Encoding Initiative) public discussion list" <TEI-L@UICVM.UIC.EDU>
From: Harry Gaylord <galiard@let.rug.nl>
Subject: The <s> tag

This is addressed in particular to our beloved co-editors.

Please give us back the <s> tag we can use directly within a <div> without requiring a <p>. We need it for many canonical texts, like Psalm 23 as you will remember.

A presenter of P1

2.12 Peter Flynn, 26 January 1996, on tagging SGML documentation, linking to TSD (EX P01 s.50)

Date: Fri, 26 Jan 1996 12:43:14 CST
Reply-To: Peter Flynn <pflynn@curia.ucc.ie>
Sender: "TEI (Text Encoding Initiative) public discussion list" <TEI-L@UICVM.UIC.EDU>
From: Peter Flynn <pflynn@curia.ucc.ie>
Subject: Embedding TEITSD2 in a DOCTYPE

I'm having a little difficulty finding a home for a reference to the Tag Set Descriptions DTD. I want to include this because I am trying to document (in a TEI header) how I am using some elements, and there appears to be no other facility for marking up the GIs of elements, attribute names, etc (tell me what I've missed :-)

Unfortunately, although the _contents_ of the TSD is well described in Chapter 27, there is nothing to say how to incorporate it (is that because it is classed as an Auxiliary DTD as distinct from an Additional Tagset?). The file tei2.dtd contains no reference to teitsd2.

Chapter 28 indeed says (28.1.2) that the local processing format may include TSDs according to the spec in Ch.27; and 28.3 shows how to write a doctype declaration. Ch.29 explains modifications using parameter entities, and I have used a lot of these very successfully already for additions such as tei.prose, tei.transcr, tei.textcrit etc.

I therefore added this to my doctype declaration at the top of an instance as suggested on p.739 (there are also files teitsd2a.dtd and teitsd2b.dtd - what are these for?)

 
   <!ENTITY % TEI.tagsets SYSTEM "teitsd2.dtd">
   %TEI.tagsets;

This certainly lets me use <tag> in <tagsdecl> and elsewhere in the header, which is great, but at the expense of reams of errors when I run SGMLS, in the form:

 
sgmls: SGML error at teicore2.dtd, line 9 in declaration parameter 5:
       Duplicate specification occurred for "P"; duplicate ignored
sgmls: SGML error at teicore2.dtd, line 11 in declaration parameter 41:
       Attempted redefinition of attribute definition list ignored

Clearly something in teitsd2 was upsetting the definitions somewhere.

I tried a second tack, less pleasant as it involved making a change to tei2.dtd, which is Something You Don't Do. I added the lines

 
   <![ %TEI.tagsets [
   <!ENTITY % TEI.tagsets.dtd SYSTEM 'teitsd2.dtd'                   >
   %TEI.tagsets.dtd;
   ]]>

to the end of tei2.dtd, and changed the reference in my doctype to say

 
   <!ENTITY % TEI.tagsets            'INCLUDE'>

so it follows the pattern of all the others (a second level of indirection). This also "works" in that I can use <tag>, but it gives me the same kind of error.
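For comparison, the two-level pattern followed by the other tag sets can be sketched as below. This is an assumption about how such a switch is normally organized, not a quotation from tei2.dtd: the DTD declares a default of IGNORE, and the document's own DOCTYPE subset overrides it. The override works because SGML binds the first declaration of an entity, and the declaration subset in the document is processed before the external DTD. The sketch shows only the switching mechanism; it does not address the duplicate-declaration errors reported above.

 
   <!-- in tei2.dtd: the tag set is ignored unless switched on -->
   <!ENTITY % TEI.tagsets 'IGNORE'                                   >
   <![ %TEI.tagsets [
   <!ENTITY % TEI.tagsets.dtd SYSTEM 'teitsd2.dtd'                   >
   %TEI.tagsets.dtd;
   ]]>
   <!-- in the DOCTYPE subset of the document: switch it on -->
   <!ENTITY % TEI.tagsets 'INCLUDE'>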

The remainder of Ch.29 concerns itself with manual mods at a low level, not with the (simple?) inclusion of prewritten auxiliary files like teitsd2.

Where am I going wrong?

///Peter

PS while I'm on it, even before I started this, with an existing doctype (attached), I still get SGMLS errors of the form:

 
sgmls: SGML error at teihdr2.dtd, line 54 in declaration parameter 4:
       Content model is ambiguous
sgmls: SGML error at teihdr2.dtd, line 61 in declaration parameter 4:
       Content model is ambiguous

(dozens of 'em). Is this normal? Here's the doctype dec:

 
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd"
[
<!ENTITY % TEI.corpus.dtd         'INCLUDE'>
<!ENTITY % TEI.prose              'INCLUDE'>
<!--ENTITY % TEI.verse              'INCLUDE' won't work with prose -->
<!ENTITY % TEI.transcr            'INCLUDE'>
<!ENTITY % TEI.textcrit           'INCLUDE'>
<!ENTITY % TEI.names.dates        'INCLUDE'>
<!ENTITY % TEI.linking            'INCLUDE'>
<!-- Extra tagset needed to allow documentation of tags in header -->
<!-- ENTITY % TEI.tagsets            'INCLUDE' commented out pro tem -->
<!-- Standard character entities -->
<!ENTITY % ISOlat1 system         "ISOLat1"
        --"ISO 8879:1986//ENTITIES Added Latin 1//EN"-->
%ISOlat1;
<!ENTITY % ISOlat2 system         "ISOLat2"
        --"ISO 8879:1986//ENTITIES Added Latin 2//EN"-->
%ISOlat2;
]>

2.13 Peter Flynn, 14 June 1996, Curia Project Suggestions (EX P01 s.62)

Date: 14 Jun 1996 16:47:03 +0100
From: Peter Flynn <pflynn@curia.ucc.ie>
Subject: lou.burnard@oxford.ac.uk
To: tei@uic.edu

Here are some items which the CURIA project would like to submit for consideration by the TRC. This file is also at http://curia.ucc.ie/curia/doc/newtei.html [and at http://www-tei.uic.edu/orgs/tei/trc/nwi/curia.html -Ed.]

I look forward to seeing you both in Bergen.

///Peter --

[Document itself omitted, because it was in HTML and I didn't want to translate it into TEI. -CMSMcQ]

2.14 Frans Wiering, 14 June 1996, title pages, diagrams, musical notation (EX P01 s.63)

Date: Fri, 14 Jun 1996 10:58:04 -0600 (CST)
From: Frans Wiering <Frans.Wiering@let.RUU.NL>
Subject: RE: Call for suggestions: new work items for the TEI
Sender: Frans.Wiering@let.RUU.NL
To: U35395@UICVM.bitnet

In message Thu, 06 Jun 1996 16:12:09 -0500 (CDT), Michael Sperberg-McQueen <U35395@UICVM.bitnet> writes:
> As users of the TEI encoding scheme -- or perhaps as potential users who
> have stayed away from the TEI so far because it doesn't include the tags
> you need for what you want to do -- you are in the best of all possible
> situations to advise the Technical Review Committee on what areas most
> need new work and a new TEI tag set, or further work and the refinement
> of an existing TEI tag set.
>
> So: tell us what you think most needs to be done. Tell us privately,
> with notes to the TEI editors, or tell us publicly, with postings on
> TEI-L. Tell us briefly, with telegraphic phrasing; tell us at length
> with well reasoned explanations of what is needed and who will benefit.
> Tell us what will benefit you or your project; tell us what you think
> might benefit others.
>
> But above all, TELL us.

Yes, I have a couple of short comments to make (ask me for more if it seems relevant). They concern my experience with entering a 16th Century Italian music treatise, Zarlino's Istitutioni harmoniche, as a part of the Thesaurus musicarum italicarum project of which you have already received some information.

1. Some problems with title pages. There is no 'figure' in the content model of titlePage, so the printer's mark could not be tagged (but it was easy enough to extend the DTD). Also, the author's name is included in the title of the work, something that often happens in works from this period. No solution yet.
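A sketch of the markup the first point calls for, assuming a locally extended DTD in which figure is allowed inside titlePage, and using figure's entity attribute with a figDesc child. The name printmark is hypothetical: it stands for an external entity (for instance an NDATA entity for the image) that would have to be declared in the DOCTYPE subset. The problem of the author's name within the title is left untouched here.

 
   <titlePage>
   <docTitle>
   <titlePart type=main>Istitutioni harmoniche</titlePart>
   </docTitle>
   <figure entity=printmark><figDesc>Printer's mark</figDesc></figure>
   </titlePage>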

2. I tried to use the tei.nets tag set for Zarlino's mathematical and musical diagrams. My experience is that this tag set needs to be much adapted in order to be able to represent source information effectively. In particular, the elements of the primary sources and critical apparatus tag sets cannot be used inside the diagrams. I remedied this by, for example, redefining node as:

 
<!ELEMENT node - O (%paraContent;)                          >

But I have not gone much further than this.

3. Musical information. I know this is a hard one. There are two different issues. The first is isolated musical symbols that may occur in the text, like the sharp, flat, and natural signs (16th c. writings often use a 'square b' for the last one). The second issue is music examples. I had hoped that SMDL would provide the solution, but the proposed standard does not yet include a description of the actual notation (only the 'abstract score' can be encoded). I will be giving this matter some extra thought during the coming year.

I hope this is what you need. Please contact me for further information if necessary.

All best,

Frans Wiering
**********************************************************************
Dr. Frans Wiering
Vakgroep Computer en Letteren / Department of Computer and Humanities
Achter de Dom 22-24
3512 JP Utrecht, Netherlands
tel. +31-30-2536335 fax: +31-30-2539221 E-mail: f.wiering@let.ruu.nl
WWW: http://www.let.ruu.nl/C+L/wiering/tmi_home.htm
http://www.let.ruu.nl/C+L/wiering/home.htm
**********************************************************************

2.15 Syd Bauman, 15 June 1996, Various errors and new work items (EX P01 s.64)

Date: Sat, 15 Jun 1996 16:27:30 EDT
Sender: TEI Technical Review discussion list <TEI-TECH@UICVM.UIC.EDU>
From: Syd Bauman <SYD@BROWNVM.BROWN.EDU>
Subject: Re: Call for suggestions: new work items for the TEI
To: Multiple recipients of list TEI-TECH <TEI-TECH@UICVM.UIC.EDU>
In-Reply-To: Message of Thu, 6 Jun 1996 09:59:43 CDT from <U35395@UICVM>

2.15.1 ERRORS

The placement of NOTE elements seems to us to be too restrictive. (E.g., you can place a NOTE as a child of HEAD, but not as a child of OPENER, CLOSER, or DATELINE; you can place a note as a child of BODY, but not of FRONT or BACK.) NOTE is a member of %m.notes, and thus also of %m.inter and %m.common; it is also directly in the content models of several bibliographic elements. An ANCHOR, as a member of %m.seg (and thus of %m.phrase), is not as restricted as NOTE, but still there are places it cannot appear that I think it should be allowed: DATELINE and RESPSTMT, for example.

Besides not being allowed almost anywhere (which others have reported), PERSNAME is not allowed in RESPSTMT. (Same for ORGNAME.) This may be a separate error, because even if PERSNAME were added to the classes %m.data and %m.agent (which are the classes to which NAME belongs), it would not be allowed in RESPSTMT, because NAME appears in that content model directly, not as a member of a class parameter entity reference. (The same is true for SETTING and DATELINE.) One reasonable solution might be to use %m.demographic instead of NAME.

When using the drama base tag set, the CASTLIST element can be the child of BODY, but not of FRONT or BACK.

2.15.2 Typos

The "suggested values include" for type= of STAGE includes "mixed", but the default value for type= of STAGE in teicore2.dtd is "mix".

2.15.3 New Work Items

...
Consider including additional tagsets for address & phone numbers. I don't see any point in breaking down an address into ADDRLINEs. Either leave it as an un-encoded block of text, or allow for proper encoding. Each country or block of countries with separate addressing standards could have a separate additional tag set.
Consider possible new keyword for location ladders: URL (url)
...
