Summary of Status:

Problems Assigned to the TEI TRC Core Subcommittee


C. M. Sperberg-McQueen

TEI TRC Core W02

14 Jan 1997, 1 Feb 1997


From TEI ED W67

Errors

Problems from section 1 (errors):

1996-03-26 : Carole E. Mah

The <addSpan> element cannot begin or end in a titlepage.

Disposition: add <addSpan> and <delSpan> (et al.?) to class globincl. To be revisited.

1996-03-26 : Carole E. Mah

It is cumbersome to require next and prev to link portions of discontiguous titles on title pages.

Disposition: no change. The attributes are not required for most processing, as the complete title can be inferred by normal sequential processing; next and prev are only required if the fragments must be reordered. The complete title, as aggregated and reordered, can and should in any case be given in the TEI header's title statement.

Typos

1995-01-17 : Hans Dybkjaer

The GI is given both as <TEI.corpus.2> and <teiCorpus.2>.

Disposition: lean to yes. Check for deeper issues (CMSMcQ?), revisit.

1995-01-17 : Hans Dybkjaer

The GIs <participantGrp> and <participant> appear to be ghosts.

Disposition: lean to yes. Check for deeper issues (CMSMcQ?), revisit.

NWIs

1994-12-14 : Peter Robinson

The inter class is ill documented; in particular, adding new elements to it does not have the desired effect: they must be added to common as well, to work as desired.

Members of class inter, e.g. <table>, cannot occur directly within <div> elements: the following should be legal but is not.

 
<div>
<table>this is an inter element</table>
</div>

Disposition: add explicit note to discussion of inter noting that user-defined inter-level elements should also be added to common, or to the appropriate tag-set-specific class.

[MSM: perhaps we should instead remove all base-specific elements from inter, so that common can be replaced directly by (%m.inter; | %m.chunk;), and each base can define component thus:

 
<![ %TEI.drama; [
<!ENTITY % component
'(%m.inter; | %m.chunk; | %m.inter.drama; | %m.comp.drama;)' >
]]>
with chunk restricted to elements from the core and additional tag sets.]

1995-02-06 : Peter Robinson

The <list> element should be able to contain a single <item> but cannot.

Disposition: It can, but empty lists are currently illegal. The committee recommends allowing empty lists and divisions. If a global tight/loose parameter can be added to the DTD, it would let users choose at parse time whether or not to allow empty lists and divisions.

To be revisited.
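
One way such a tight/loose switch might look is sketched below. This is an illustration only: the names TEI.strict and list.content are hypothetical, and the content model is much simplified from the real declaration of <list>. It relies on the SGML rule that the first declaration of an entity is the one that binds.

```sgml
<!-- Hypothetical switch: loose (empty lists allowed) unless overridden -->
<!ENTITY % TEI.strict 'IGNORE' >

<!-- Tight variant: read only when TEI.strict is set to INCLUDE -->
<![ %TEI.strict; [
<!ENTITY % list.content '(head?, item+)' >
]]>

<!-- Loose default: ignored if the tight variant was already declared -->
<!ENTITY % list.content '(head?, item*)' >

<!ELEMENT list - - %list.content; >
```

A user who wants the tight behaviour would then declare <!ENTITY % TEI.strict 'INCLUDE'> in the DTD subset of the document, in the same way the base and additional tag sets are selected.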

1995-02-06 : Peter Robinson

The <list> cannot occur in a <div> directly after a head: the following should be legal but is not:

 
<div>
<head>title for this list</head>
<list>
<item>first item</item>
<item>second item</item>
</list>
</div>
while the following is legal:
 
<div>
<list>
<head>title for this list </head>
<item>first item</item>
<item>second item</item>
</list>
</div>

Disposition: Same issue and disposition as previous section.

1995-02-17 : Richard Light

Elements with content model of specialPara are subject to the Mixed Content Gotcha. Documentation or content model should change. [1995-02-20 Henry Thompson argues it's a bug.]

Disposition: no change in the immediate future. Add a health warning in the documentation. For longer term, consider replacing all occurrences of specialPara with one of:

unless WG8 solves the Mixed-Content Gotcha first.
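
For readers unfamiliar with the Mixed Content Gotcha, the following toy fragment may help. It is a sketch under stated assumptions, not the P3 declaration of specialPara: the entity toy.specialPara and the reduced element set are invented for illustration.

```sgml
<!-- Toy declarations: q holds EITHER paragraphs OR running text -->
<!ENTITY % toy.specialPara '((p+) | (#PCDATA | hi)*)' >
<!ELEMENT q  - - %toy.specialPara; >
<!ELEMENT p  - O (#PCDATA | hi)* >
<!ELEMENT hi - - (#PCDATA) >

<!-- Instance: looks like pure element content, but because the model
     for q mentions #PCDATA the whole model counts as mixed content.
     The blanks that indent the second p are therefore data characters,
     and the (p+) branch has no place for data. -->
<q><p>First paragraph.</p>
   <p>Second paragraph.</p></q>
```

A parser should report the white space between the two paragraphs as character data that is not allowed at that point, although the encoder never intended it as content.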

1995-06-02 : David M. Seaman

The <docTitle> element cannot contain character data, only <titlePart> elements. But some examples do show <docTitle> with #PCDATA content, and it would be a good idea in any case, as many titles have just one main part.

Disposition: correct the examples; do not change <docTitle>.

1995-07-29 : Nancy Ide

The <s> element should be able to contain <q> elements, but cannot.

Disposition: CMSMcQ to create and distribute lists of membership of inter, chunk, and phrase.seq, also lists of elements defined with various distinct content models.

To be revisited.

1995-07-29 : CMSMcQ (in reply to Nancy Ide)

Same as above, but more generally: many elements defined as phrase.seq may need to be able to contain some or all inter-level elements.

Disposition: As above.

1995-11-01 : Harry Gaylord

The <s> element should be able to occur directly within a <div> but cannot.

Disposition: <div> can include <s> directly in P3. DD to contact HG for clarification.

1995-12-14 : Nick Finke

The <note> element should be able to appear within <docAuthor> and <byline> elements (e.g. to handle footnotes with affiliation info), but cannot.

[Another instance of possibly inappropriate use of phrase.seq. -MSM]

Disposition: as for generic phrase.seq issue.

1996-03-14 : Keith Handley

Why does the <div> element require a sub-element at all?

Disposition: as for generic empty-list issue.

From TEI EX P01

Syd Bauman, 28 September 1994, on ORNAMENT (EX P01 s.1)

The <ornament> element of P1 has disappeared; what should we use to tag ornaments?

Disposition: use <fig>; add a type attribute to it.

E.H.M. Van den Hout, 11 May 1995, Linegroups (EX P01 s.8)

The <q> element should be, but is not, legal within <lg>, to surround several lines, thus:

 
<lg>
   <l> ... </l>
   <l> ... </l>
   <q><l> ... </l><l> ... </l></q>
   <l> ... </l>
   <l> ... </l>
</lg>

For lines in which the quotation starts in the middle, one can use <seg> and <join>.

Perry Willett notes that this, on the other hand, is legal:

 
<body>
<div>
   <l> ... </l>
   <l> ... </l>
   <q><l> ... </l><l> ... </l></q>
   <l> ... </l>
   <l> ... </l>
</div>
</body>

In tagging an epic poem <lg> would be preferable to <div>.

Disposition: allow <lg> to contain <q> (parallel to <div>)

Jean Veronis, 5 July 1995, rend for abbr (EX P01 s.15)

We use a rend attribute on <abbr> to mark the small superscript r in Mr. or the small subscript 2 in SO2, as in:

 
<abbr rend='tail-super'>Mr</abbr> Dondelinger
<abbr rend='tail-sub'>SO2</abbr>

This is unsatisfactory, since the range of possibilities is potentially open (e.g. H2O, etc.). Any suggestions?

Disposition: the range of possibilities can be controlled by making the special rendition apply not to the entire abbreviation but to some part of it, which is tagged <hi>.
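
Applied to the examples above, the disposition might look like the following sketch. The rend values 'super' and 'sub' are illustrative assumptions, not values prescribed by the Guidelines.

```sgml
<!-- The special rendition is attached to the affected characters only -->
<abbr>M<hi rend='super'>r</hi></abbr> Dondelinger
<abbr>SO<hi rend='sub'>2</hi></abbr>
<abbr>H<hi rend='sub'>2</hi>O</abbr>
```

Two rend values then cover any position of the raised or lowered characters within the abbreviation, so no new value is needed for each new pattern.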

Harry Gaylord, 7 July 1995, s and p (EX P01 s.16)

[Same as 1995-11-01 : Harry Gaylord above.]

Nancy M. Ide, 29 July 1995, on S and Q (EX P01 s.25)

[Same as 1995-07-29 : Nancy Ide above.]

C. M. Sperberg-McQueen, 29 July 1995, on S and Q (EX P01 s.26)

[Same as 1995-07-29 : CMSMcQ above.]

Syd Bauman, 18 September 1995, Varia (EX P01 s.31)

Should the <certainty> element allow content instead of, or in addition to, the desc attribute?

Disposition: no. Use <note> to document the basis for the judgement of certainty.

Encoding of errata lists: the WWP encloses both the replacement reading and the reading being replaced in <ref> elements, double-linked to each other. The double-linking allows sophisticated software to replace the errors with the corrections at rendering time, or to jump to the text from the errata list. No modifications to the TEI DTDs are required at all.

Disposition: Include a discussion of this technique in the next revision of the Guidelines.
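
A minimal sketch of one reading of this technique follows. The id values, the sample words, and the type value on <list> are invented for illustration; only the use of paired <ref> elements pointing at each other comes from the description above.

```sgml
<!-- In the body of the text: the reading to be replaced -->
<p>... and then the <ref id='err01' target='cor01'>kinge</ref> departed ...</p>

<!-- In the errata list: the replacement reading -->
<list type='errata'>
<item>For kinge read <ref id='cor01' target='err01'>knight</ref></item>
</list>
```

Each <ref> carries an id and points at its partner through target, so software can follow the link in either direction.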

Dominic Dunlop, 25 October 1995, on MS markup (EX P01 s.33)

James K. Tauber asks how to mark Bible text within <div> elements representing page, column, and line. The <s> element is probably needed here. If <s> is already in use, perhaps <p> should be used, perhaps under a different name. Or <phr>.

Disposition: define an element similar to <seg> but which can appear at the component level. <block>? CMSMcQ to draft proposal.

Lou Burnard, 5 October 1995, on SEG (EX P01 s.35)

[Reporting comments by Henry Thompson] The <seg> element cannot appear within <u>, but should be able to; <seg> should be able to contain the same things as <u>.

[LB] %paraContent is sensitive to the selection of bases. Its inability to handle <vocal> etc. is probably an error. Probably compSpoken should be a subclass of m.comp.

Disposition: LB to draft solution ensuring that <seg> can occur within <u> and can contain the elements legal there.

Henry S. Thompson, 6 October 1995, on SEG (EX P01 s.36)

The DTD is inconsistent in its treatment of elements declared only in specialized tag sets. Some entities have distinct declarations depending on which tag sets are active, e.g. the common / component / component.seq / specialPara family. Others (e.g. paraContent) do not. Why?

The current definition of specialPara attempts to allow only element-only content or mixed content, but ISO 8879 requires parsers always to treat it as mixed content, with bad results for usability (Mixed Content Gotcha). This content model should be used for elements which are transparent, i.e. which can contain within themselves whatever is allowed at that point already. Not all elements now declared with specialPara meet this criterion: some are members of inter and some of phrase, but specialPara allows them to contain chunks. So a quoted <tree> can be included in places where a <tree> itself cannot be; this is counter-intuitive.

So for better or worse, we've been using the following definition of specialPara:

 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %m.inter |  %m.chunk)*'>
which is backward compatible (it still allows e.g. quoted trees), but doesn't suffer from the mixed content problem -- e.g. it allows
 
<q>
<p>
...
</p>
</q>
It does, of course, give up the original goal of forcing a once-and-for-all choice between element-only and mixed content, which is, as I understand it, the motivation for the current version.

Now the above redefinition doesn't actually solve the <seg> / <u> / <vocal> problem. It would if %component were used instead of %m.inter | %m.chunk, but that would not quite be backward compatible, because of the slight difference between m.common and %m.inter | %m.chunk: the former lacks <stageDirection>, <castList>, <figure>, <table> and <text> (a very mixed bag).

So either we have

 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %component)*'>
or, if we are scrupulous about backward compatibility,
 
 <!ENTITY % specialPara '(#PCDATA | %m.phrase | %component |
                          %n.stageDirection; | %n.castList; |
                          %n.figure; | %n.table; | %n.text;)*'>
and we change <seg>'s content model to be specialPara.

Disposition: related to the previous two items; to be discussed further.

Harry Gaylord, 1 November 1995, on S (EX P01 s.37)

[Same as 1995-11-01 : Harry Gaylord above.]

Peter Flynn, 26 January 1996, on tagging SGML documentation, linking to TSD (EX P01 s.50)

The phrase-level tags of the TSD should be accessible within the TEI header and possibly elsewhere. The TSD itself should be referred to from within the TEI header.

Disposition: clarify how to make the elements accessible (P3 itself can be used as an example). Also discuss linkage between TSD and document instance (using <xref> perhaps).

Files teitsd2a and teitsd2b are present in the distribution but not documented.

Disposition: these are split in order to make it possible to make the phrase-level elements of the TSD accessible within paragraphs. Clarify this in the documentation.

My parser (sgmls) complains about ambiguous content models many times, beginning on lines 54 and 61 of teihdr2.dtd. Here is the DTD subset:

 
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
<!ENTITY % TEI.corpus.dtd         'INCLUDE'>
<!ENTITY % TEI.prose              'INCLUDE'>
<!--ENTITY % TEI.verse              'INCLUDE' won't work with prose -->
<!ENTITY % TEI.transcr            'INCLUDE'>
<!ENTITY % TEI.textcrit           'INCLUDE'>
<!ENTITY % TEI.names.dates        'INCLUDE'>
<!ENTITY % TEI.linking            'INCLUDE'>
<!-- Extra tagset needed to allow documentation of tags in header -->
<!-- ENTITY % TEI.tagsets            'INCLUDE' commented out pro tem -->
<!-- Standard character entities -->
<!ENTITY % ISOlat1 system         "ISOLat1"
        --"ISO 8879:1986//ENTITIES Added Latin 1//EN"-->
%ISOlat1;
<!ENTITY % ISOlat2 system         "ISOLat2"
        --"ISO 8879:1986//ENTITIES Added Latin 2//EN"-->
%ISOlat2;
]>

Disposition:

Peter Flynn, 14 June 1996, Curia Project Suggestions (EX P01 s.62)

The element <lg> should be legal within paragraphs but is not.

Disposition: <lg> should be a member of inter. The <l> element, by contrast, needs to remain a member of chunk, in order to allow end-tag omission. Surrounding individual <l>s with an <lg> to enable them to occur within a <p> is not as burdensome as allowing <l> to self-nest.
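
The following sketch shows what this disposition would permit. It is legal only once <lg> has been made a member of inter, it assumes that end-tag omission is enabled in the SGML declaration, and the text is invented.

```sgml
<p>The refrain runs
<lg>
<l>first line of the refrain
<l>second line of the refrain
</lg>
and recurs after every stanza.</p>
```

Because <l> cannot contain <l>, the second start-tag implies the end of the first line. If <l> were allowed to self-nest, the parser would open a nested line instead, and the end-tags could no longer be omitted.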

The mechanism of <span>, also used for <addSpan> and <delSpan>, should be extended to cover arbitrary GIs (e.g. <unclear>, <sic>, or <q>). A source attribute is also needed.

Disposition: such a mechanism already exists: use <join> for arbitrary GIs. More information is needed about the source information to be recorded; it may be that this is information that should be expressed using tc.
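
As a sketch of the existing mechanism, a quotation that begins in mid-line and runs over a line boundary might be encoded as follows. The example assumes the additional tag set for linking is enabled; the id values and the text are invented.

```sgml
<l>He answered, <seg id='q1a'>I will not go</seg></l>
<l><seg id='q1b'>until the morning.</seg> And so he stayed.</l>
<!-- join is an empty element: it names the fragments in order and
     gives the GI of the virtual element they make up together -->
<join targets='q1a q1b' result='q'>
```

The result attribute can name any GI, e.g. unclear or sic, which is what makes <join> usable for the arbitrary cases requested.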

The <note> element should be legal between the lines or line groups within a line group, but is not.

Disposition: <note> and <q> should be allowed within <lg>.

The <gloss> element should be able to contain notes, lines of verse, etc. [i.e. it should have paraContent not phrase.seq? -CMSMcQ]

Disposition: open (part of general phrase vs paraContent discussion).

The <head> element should be able to contain much more structure, to accommodate complex heads of chapters or sections.

Disposition: DB to write PF asking for clarification.

The elements <gi>, <att>, and <ent> should be accessible as phrase-level elements.

Disposition: editors to make small additional tag set for technical documentation, for next revision of Guidelines.

The type attribute should be made global, or at least added to <text>.

Disposition: this was referred to the architecture work group.

The <corr> and <sic> elements should have a hand attribute.

Disposition: this was referred to the TC work group.

There should be an element for subheads.

Disposition: Use <head type='sub'>.

The declaration of a canonical reference scheme should have a way to say how many units of a given type should be included in each file when the document is chunked for inclusion on a network server.

Disposition: This belongs in a style sheet. If tag abuse is preferred, then abuse a <milestone> element. CMSMcQ to write up what PF might actually need.

Frans Wiering, 14 June 1996, title pages, diagrams, musical notation (EX P01 s.63)

The <figure> element should be legal within title pages, but appears not to be (I need it for printer's marks).

Authors' names are often included in the title of the work, in my period. How should this be tagged?

The elements of the tei.nets tag set cannot include elements from the MSS and text-criticism tag sets; I made <node> contain paraContent but have not gone much further.

Isolated musical symbols in the text, like the sharp, flat, and natural signs need markup.

Musical examples are hard to encode: SMDL does not yet include a description of the actual notation (only the 'abstract score' can be encoded).

Disposition:

Syd Bauman, 15 June 1996, Various errors and new work items (EX P01 s.64)

ERRORS

Disposition: place name correction already made.

Typos

Disposition:

New Work Items

Disposition: DD to contact SB to discuss addresses in more depth; committee thinks <address> adequate as now defined in tag set for names and dates. Addition of email, phone, fax, etc. is desirable. To be revisited.

No need for a new element for URLs: <xptr> appears adequate for referencing URLs where necessary.

Other Problems

...

Summary of Open Questions

...