Interim Report on Status of P5 [MEW07]


Contents

This document attempts to cover several different things, which may at some stage be better unravelled into different documents.
  • Summarize and clarify our strategic goals in revising TEI P5
  • List some specific steps we currently identify as necessary to accomplish those goals
  • Request endorsement from the Council for those steps
  • Document the current state of affairs with respect to the new ODD format

The document includes a number of specific technical proposals for change which we would like the Council to endorse. Without this agreement, we cannot make much further progress.

The interdependency of many of the steps involved makes this a difficult process. For example, we have agreed to produce a complete revision of the Tag Set Documentation chapter of P4, so that TEI P5 will be truly self-documenting. However, we cannot produce this chapter until we have finalized our revisions of the TEI ODD format, and we cannot finalize those until we have thought through the documentary consequences of other changes in the TEI scheme as a whole. This is not simply a boot-strapping problem (though certainly that is part of it): it has to be possible to revise the draft document defining the ODD format independently of the review process, since we may need to revise the ODD format in order to produce other documents for review!

Strategic Goals

These may be summarized under the following headings:
Interoperability
taking advantage of the work done by others
Expansion
addressing areas as yet untamed
House cleaning
clearing out the accretions of a decade
The need for interoperability implies (at least) the following subsidiary goals:
  • support for multiple namespaces
  • support for non-DTD schema mechanisms
  • clearer definition of the TEI abstract model
The need for expansion implies (at least) the following subsidiary goals:
  • incorporation of additional modules
  • survey of user requirements for further additions
  • testing and documentation of the customization layer, and training in its use

House-cleaning is a self-evident need, which probably only counts as a strategic goal because we have not had any opportunity to re-think the basic assumptions built into the TEI Architecture since c. 1994.

TEI P5: contents

As of this writing, we expect to add the following completely new chapters at P5:
  • Manuscript description
  • Multimedia and graphics
  • Standoff annotation
In addition to these new chapters, we expect substantial revision of the following chapters:
  • Authoring and tag documentation
  • Feature structures
  • Transcription of primary sources
  • Languages and Character sets
  • Linking, Segmentation, and Alignment
Substantial revision will also be needed in the following chapters, but this work has not yet been assigned, much less begun:
  • Gentle Introduction (doesn't discuss RelaxNG)
  • The header
  • Default structure (discusses class system)
  • Core (too long)
It is currently proposed to remove the following chapters:
  • Writing System Declaration (agreed at Council meeting May 2003)
  • Graphs, networks, and trees
  • Terminology

Steps on the path

Currently chartered workgroups are producing relevant material for several of the new and revised chapters specified above. A major problem at the moment is that drafts cannot be produced in semi-final form until we have a working documented version of the new ODD format.

In November 2003, we thought the timetable was like this:
December 2003
public call for small changes
January 2004
complete and document revisions of new ODD format
February/March 2004
incorporate new/revised chapters
April 2004
end of public call for small changes
May 2004
seek approval by council of new/revised chapters
June-July 2004
public alpha review (possibility of feature change)
July-October 2004
public beta review (feature frozen)
November 2004
release to TEI members
December 2004
full public release of P5

Although somewhat optimistic, this timetable usefully indicates the sequence of events: we cannot reasonably expect new drafts until we have documented their format, and we cannot do that until we've decided what it should be! For that reason, it seems to us rather urgent to reach agreement on a number of pervasive changes in the current TEI system, which are listed below, in no particular order.

Naming changes
At P5 we plan to make a large number of systematic changes in naming conventions. A separate working paper (MEW08) lists these. [Note: Agree or improve name change proposals in MEW08]
Schema language/s
At the Oxford meeting of the TEI Council, we agreed to move to using RelaxNG as the means of formal expression for the TEI schema. This has been implemented. We also agreed that it would be desirable to generate output schemas in XML DTD language and in W3C Schema as well, on user request. We did not explicitly address the question of whether user extension and modification of the scheme should be supported in all three schema languages: in the event, our current belief is that this is really only practicable in RelaxNG. (A RelaxNG schema can, however, be converted to either of the other languages as a second step.) [Note: User extensions and modification files must be prepared using RelaxNG syntax]
Rationalization of attribute values
With the move to RelaxNG we aim to introduce a better range of attribute value validation facilities. The goal is that all attribute values should match a W3C datatype, or one of a small set of TEI-defined patterns. This subsumes the agreed need to remove attribute values which potentially contain tagged text. [Note: All attribute values to be reviewed and changed to match W3C datatypes or TEI-defined pattern] [Note: all elements bearing "text" attribute values to be rethought and redesigned, perhaps using the <choice> mechanism]
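As a rough illustration of this goal, the following Python sketch checks attribute values against a small table of patterns. The regular expressions are simplified approximations of a few W3C Schema datatypes, not their full lexical definitions, and `tei.certainty` is a hypothetical stand-in for a TEI-defined pattern; none of these names are taken from the P5 sources.

```python
import re

# Simplified approximations of a few W3C Schema datatypes (assumptions,
# not the complete lexical spaces defined by the W3C specification).
DATATYPE_PATTERNS = {
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:nonNegativeInteger": re.compile(r"\+?[0-9]+"),
    "xsd:date": re.compile(
        r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2})?"
    ),
    # Hypothetical TEI-defined pattern: a closed list of values.
    "tei.certainty": re.compile(r"high|medium|low|unknown"),
}


def is_valid(datatype: str, value: str) -> bool:
    """Return True if the whole (trimmed) value matches the datatype pattern."""
    pattern = DATATYPE_PATTERNS[datatype]
    return pattern.fullmatch(value.strip()) is not None


def contains_markup(value: str) -> bool:
    """Flag attribute values that look as if they contain tagged text."""
    return "<" in value
```

A review of existing attributes along these lines could run `contains_markup` first, to find the values that need redesign, and then `is_valid` on the remainder.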
All elements to be classified
The class system of P4 is only partially applied, largely because its implementation via parameter entities is so fiendishly complicated. The use of RelaxNG patterns gives us a class system which is easy to apply and understand, and also greatly simplifies modification of the schema. We therefore propose to extend the class system systematically, and to deprecate content models which refer to specific elements rather than to element classes. [Note: As far as possible, all content models to be re-expressed using either one of the standard content macros, or by reference to classes of element. ]
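The advantage claimed for class references can be sketched in Python as follows. This is only a toy model under stated assumptions: the registry, the function names, and the membership of `p`, `ab` and `seg` in the `paragraph` and `segment` classes are illustrative, and do not reproduce the actual P5 class definitions.

```python
from collections import defaultdict
from typing import Dict, List

# Class name -> element names that have declared membership (toy registry).
classes: Dict[str, List[str]] = defaultdict(list)


def declare_element(name: str, member_of: List[str]) -> None:
    """Register an element by adding it to each class it belongs to."""
    for class_name in member_of:
        if name not in classes[class_name]:
            classes[class_name].append(name)


def expand_content_model(class_refs: List[str]) -> List[str]:
    """Resolve a content model written as class references to element names."""
    allowed: List[str] = []
    for class_name in class_refs:
        for element in classes[class_name]:
            if element not in allowed:
                allowed.append(element)
    return allowed


declare_element("p", ["paragraph"])
declare_element("seg", ["segment"])
# A module loaded later adds an element to an existing class ...
declare_element("ab", ["paragraph"])
# ... and a content model that refers to the class needs no editing.
print(expand_content_model(["paragraph", "segment"]))
```

A content model that named `p` directly would have had to be rewritten to admit `ab`; one that refers to the class is extended automatically.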
Remove multiply defined elements
In P4, there is a small number of elements (e.g. <eg>, <gram>) which have different definitions in different modules. This is now seen as a needless complication: if an element is to have a different definition in some context, this should be achieved by redefinition of the element in the same way as usual. [Note: All element names to be unique across the scheme]
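A check for this constraint is simple to sketch. The Python below assumes that each module can be reduced to a list of the element names it defines; the module names and their contents are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_elements(modules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each element name defined in more than one module to those modules."""
    defined_in = defaultdict(list)
    for module, elements in modules.items():
        for element in set(elements):  # ignore repeats within one module
            defined_in[element].append(module)
    return {
        name: sorted(mods) for name, mods in defined_in.items() if len(mods) > 1
    }


# Illustrative module contents only (assumed, not taken from the P4 sources).
modules = {
    "dictionaries": ["entry", "gram", "eg"],
    "tagdocs": ["eg", "gi"],
    "core": ["p", "list"],
}
print(find_duplicate_elements(modules))
```

An empty result would indicate that all element names are unique across the modules examined.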
Changes to the architecture
There are four existing auxiliary tagsets: writing system declaration (WSD), feature system declaration (FSD), independent header (IHS), and tagset declaration (TSD). It has already been agreed to drop the WSD and to recast the TSD as an additional tagset. The IHS is an artefact to enable a valid document consisting only of headers, which could be accomplished in other ways. This leaves only the FSD, which could also be handled in the same way as any other module. With a view to further simplifying the process of schema construction, we are considering whether or not the current distinction between base and additional tagsets is necessary. [Note: Concept of auxiliary DTD to be dropped in favour of a discussion about namespaces and ways of combining TEI modules]
Branch ODD format
At present the maintenance version of the Guidelines is still in the (undocumented) P4 ODD format. The experimental P5 versions are generated from these by an increasingly tortuous series of scripts. To make real progress in testing and developing the new ODD format, it has to become the maintenance form for TEI P5. We would therefore like to freeze the current P4 ODDs (they will still be needed for maintenance of TEI P4), to run the conversion process once more, and then to use TEI P5 format ODDs as the development source for all subsequent work on TEI P5. Some external checking of the process seems advisable, however, before this can take place. [Note: Check equivalence amongst (at least) P4 content models, generated P5 RelaxNG equivalents, and P5 DTD-generated equivalents] [Note: Switch to developing and maintaining TEI P5 sources in ODD-NG only]

Detailed changes

The document ChangeLog lists changes made to the P5 ODDs to date. We highlight a few specific kinds of change below:

Class changes, content model changes

  1. added new classes: paragraph, categorize, segment, profile, encoding, header. These are mostly coping with vagaries in the header, where content models refer to elements which are not defined until modules are loaded.
  2. listBibl.tag: in place of "trailer", use "divbot" class, to avoid dependency
  3. add class.teiText and class.teiHeader, and change content model of <TEI> accordingly.
  4. dieg.tag: changed name of this <eg> to <dicteg>, to avoid overlap with <eg> from tagdocs
  5. add pattern for "schemapattern"; this defaults to ANY, but p5odds.xsl redefines it as "anything from RelaxNG"
  6. define and add ODDPHR and ODDREF from tagdocs to low-level classes

Schema generation

  1. when creating output modules, do not produce those with type "decls"; instead, include the contents of that module at the front of the module with the corresponding name with "-decl" stripped off.
  2. put a fixed list of overrides of special defines in the start of tei.rng
  3. in the schema, each element interleaves itself into its model class. We just include a schema file if we want to use it, and each element in it extends the classes of which it is a member
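The first step in the list above can be sketched in Python. This is a simplification under stated assumptions: modules are modelled as a mapping from module name to a list of lines, and a "decls" module is recognised by its "-decl" name suffix rather than by a type property; the function name is hypothetical.

```python
from typing import Dict, List


def merge_decl_modules(modules: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Fold each "<name>-decl" module into the front of module "<name>".

    No "-decl" module appears in the result as an output module of its own.
    """
    merged: Dict[str, List[str]] = {}
    for name, lines in modules.items():
        if name.endswith("-decl"):
            target = name[: -len("-decl")]
            # Declarations go in front of whatever is already collected.
            merged[target] = lines + merged.get(target, [])
        else:
            # Ordinary content goes after any declarations already collected.
            merged[name] = merged.get(name, []) + lines
    return merged
```

The result does not depend on whether a module or its "-decl" counterpart is encountered first.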

Additions and deletions

  1. figure.tag: added url, width, height, scale attributes
  2. xpointer.cla: added url attribute
  3. kill all of WSD, which means all the following files are gone: basewsd.tag directn.tag exceptns.tag script.tag teiwsd.tag wsdccs.tag wsdchar.tag wsdchars.tag wsddesc.tag wsdents.tag wsdfig.tag wsdform.tag wsdglob.cla wsdlang.tag wsdnote.tag wsdxfig.tag
  4. take out top-level material which makes TSD a separate DTD

Last recorded change to this page: 2007-09-16  •  For corrections or updates, contact webmaster AT tei-c DOT org