Draft of TEI XML Migration Task Force Meeting Minutes, 2003-06-16/17


Initials Used for People

  • SB Syd Bauman
  • AB Alejandro Bia
  • LB Lou Burnard
  • TE Tomaž Erjavec
  • JH Jessica Hekman
  • TR Tobias Rischer
  • CR Christine Ruotolo
  • SS Susan Schreibman
  • NS Natalia Smith
  • JU John Unsworth
  • JW John Walsh
  • SW Sarah Wells
  • FW Frans Wiering
  • CW Christian Wittern

Meeting, hosted by the Biblioteca Virtual Miguel de Cervantes, took place at the Universidad de Alicante, on June 16th & 17th, 2003. All times are local (+01).

Commenced a bit late, at ~10:50, because of a bus delay (ostensibly due to weather), with SB, LB, AB, JH, CR, NS, JW, SS, SW, TE.

The Chair and local host worked out a few schedule changes accordingly.

Group extends a warm and heartfelt ‘thank-you’ to AB and the Cervantes project for hosting this meeting.

Version Control

Question of whether or not we need our own version control system for the group. We have already suffered at least one failure (two people writing to the same file). LB doesn't think we need to, as there should always be only a two-way communication between author & technical writer (SW), and then technical writer & editors.

The process the editors use to update the TEI depot and website was explained.

CR suggests the following work flow:
  • author finishes section, sends to list,
  • task force members send comments directly to list,
  • author updates (canonical copy on TEI website);
  • at some point ownership changes from author to technical writer (SW),
  • from that point on comments continue to go to list, but only technical writer updates (canonical copy still on TEI website.)

For this meeting, SW will make changes; website will be updated afterwards.

osx

1.5 is the public version that contains some of JH's changes. New changes (since last meeting) have been checked in by JH, but (at request of OpenJade team) to two different branches, thus very difficult to even build it via CVS (and there are no tarballs, let alone binaries).

Consensus is that our documents need to be out before we can be assured that OpenJade will update osx, so we will plan on including migration steps that hack around OpenSP 1.5 limitations, but footnote them as not needed iff new version of osx is available.

Agreed to put any available binaries we have (cygwin or Mac OS X or whatever) on publicly available website. Consensus of group is that we have done what we reasonably could to get a Windows binary; we don't have access to the programming expertise needed. On the theory that small projects can use the web interface, and that any larger project will have GNU/Linux or at least cygwin capability, we're going to all but stop trying.

Action 1: SB create web interface to osx
Action 2: JH start ball rolling on getting osx 1.6 created via merging the current bug-fix and new-feature branches 2003-06-24

Survey and Deadlines

Action 3: CR double-check what our NEH deadlines are 2003-06-24
Suggestion to issue a request for comments publicly on TEI-L on Mon, 01 Sep 03 (i.e., finish all revisions of all documents) to get feedback that might be incorporated before the Members' Meeting.

Thus, we can establish deadlines by working backwards from that date.

Action 4: SW technical writer returns ‘ownership’ to section authors (by sending documents to editors who post to TEI website, then announcing transfer on list). 2003-06-21
Action 5: SW technical writer posts revised MI W 02 2003-06-21
Action 6: SW technical writer posts revised MI W 03 2003-06-28
Action 7: authors MI W 02 sections posted to list 2003-07-05
Action 8: authors ownership of MI W 02 to technical writer 2003-07-12
Action 9: authors last date for posting revised section to list for comments 2003-07-19
Action 10: authors ownership of MI W 03 to technical writer 2003-07-26
Action 11: SW technical writer transfers ‘ownership’ of MI W 02, 03, & 04 to editors 2003-08-23
Action 12: eds post final revisions of MI W 02, 03, & 04 to website 2003-08-31
Action 13: CR MI W 02, 03, & 04 announced to TEI-L. 2003-09-01

Currently authors have ownership of case studies.

Reports: Macro Issues

CR raises issue that MI W 02 and MI W 03 have quite a bit of overlap — general agreement that there will be some parallel and coverage redundancy, but to continue policy of general overview for MI W 02 and details in MI W 03. Specifically CR's nice table on osx switches should be moved from MI W 02 to MI W 03.

Editors concede that they need to write their section. Editors solicited input as to what should be in this section, and were given the following items:
  • dropping support for P3
  • no route from P3 directly to P5
  • we don't know details yet but here are things that will most certainly be different
I.e., what you have now is dead, what is coming is even better.
Action 14: eds write your section 2003-06-28
Missing sections:
  • miw02: CP & eds.
  • miw03: none
  • miw06: CR & SB
Consensus not to request MI W 06 section from TR.

It was agreed that we need a consistent terminology of migrations.

Agreed to remove casual terminology, not address reader as ‘you’, and to refer to this task force as ‘we’.

Agreed that readers should see titles, although reminder to authors to please encode with an <xref> with url="./miwXX.html" (not <xptr>).

Specific Document Issues

MI W 02 Introduction

SS & CR to take a crack at re-writing.

Action 15: SS & CR re-write introduction section of MI W 02 2003-07-05
A preface of sorts with our statement of authority, as it were, should go on index page of website and in introduction to MI W 02.

MI W 02 Motivations, Opportunities, Challenges

CR lists motivations as (in order):
  1. P3 not supported;
  2. difficulty in P3 directly to P5;
  3. availability of tools and related specs.

On the topic of open source tools we decided that MI W 02 should simply point to TEI Software page. But the list of X- standards should be expanded (and explained more?).

Suggested re-write of para 1 sentences 2-4 of Motivation section to SS, who will re-write and post.

Action 16: SS re-write sentences 2–4 of paragraph 1 of Motivation section of MI W 02 2003-07-05

After a bit of discussion on the implications of ‘standards’ it was agreed to change ‘standards’ to ‘standards and specifications’.

Action 17: JW change "standards" to "standards and specifications" 2003-07-05

The point of the Challenges subsection is to admit up front there are costs. SW has some prose for it.

Concern (SB & CR) that we have too many internal references. Consensus was that they're a good thing, but that after we've assembled documents into a whole we need to look over and see if there are too many.

MI W 02 Areas of Migration

At lunch SS & JH …

Unless someone can usefully fill this in, I'll just delete it.

MI W 02 Workflow

Add mention of ‘if old DTD via chef, make new one’.

Add sentence at end of DTD section that DTD extensions can be hard, whereas many will find instances easy. Include explicit pointer to section of MI W 03.

Catalog file section to mention that entity conversion is a pain. Decided to put this under processing environment. JH asks about XML catalog syntax. Consensus is that we will provide a pointer to further info. If software is available at the time, we'll mention it.
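By way of illustration only, the mapping from an SGML Open catalog entry to the XML catalog syntax can be sketched in Python as below. The function name and the regular expression are hypothetical, and the sketch assumes only double-quoted PUBLIC entries matter; every other entry type is silently skipped, so this is no substitute for a real converter.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XML Catalogs specification.
XML_CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Matches lines of the form: PUBLIC "public identifier" "system identifier"
PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+"([^"]+)"', re.MULTILINE)


def sgml_catalog_to_xml(sgml_catalog: str) -> str:
    """Turn the PUBLIC entries of an SGML catalog into an XML catalog string.

    Hypothetical helper for illustration; all other entry types are ignored.
    """
    catalog = ET.Element("catalog", {"xmlns": XML_CATALOG_NS})
    for public_id, system_id in PUBLIC_ENTRY.findall(sgml_catalog):
        ET.SubElement(catalog, "public", {"publicId": public_id, "uri": system_id})
    return ET.tostring(catalog, encoding="unicode")


if __name__ == "__main__":
    # The identifier and file name here are made up.
    print(sgml_catalog_to_xml('PUBLIC "-//EXAMPLE//DTD Sample//EN" "sample.dtd"\n'))
```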

JH to later the "by ahnd" phrase.

If anyone has any clue what this might mean, please let me know. Otherwise, we nuke it.

Processing environment: XML tools more likely to stop at first error; more detail about delivery environment including index tools, web output generation stuff

SS is concerned over discrepancy between "is straightforward" (instance section) and TR's list (MI W 04). JH to change to "can be straightforward".

MI W 02 General Recommendations

List of editors: disagreement among group, but in the end we decided on a disclaimer (‘you need to think about choosing your editor carefully, but we're not talking about particular editors’), probably with a pointer to somewhere that discusses this stuff.

We decided the wording of ‘Target production environment’ was a bit too strong — while projects should consider production environment first, it may be too much to actually have it running first. 1

The subsection on Training, once the tools discussion is removed, is pretty short, and therefore will be rolled into the Resources section.

Replace language of 2nd para of Migration method: with first para of Resources. Discussion of the issue of production stop or not. The language is to be toned down a bit.

Changes to final para of Resources section (SW has details).

Suggestion that the data testing rule of thumb ‘1, 10, 100%’ be used.

Add item to list in Other recommendations for ‘design and run test procedures’.

MI W 02 Special Considerations in Migration

Table of switches moves to MI W 03.

Question of whether a comment is a declaration or not arose — answer is that in SGML they are called ‘comment declarations’ while in XML they are ‘comments’. 2

Consensus was that we should recommend use of the -xpreserve-case and -xempty switches in all conversions that use osx.
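For illustration, the recommended invocation could be assembled as in the Python sketch below. The helper name and the example file name are hypothetical; the sketch assumes osx is on the PATH and writes its XML to standard output, and it only builds the command rather than running it.

```python
from typing import List

# The two switches the group recommends for every osx conversion.
RECOMMENDED_SWITCHES = ["-xpreserve-case", "-xempty"]


def build_osx_command(sgml_path: str) -> List[str]:
    """Return the argument list for converting one SGML file with osx.

    Hypothetical helper: it builds the command but does not execute it.
    """
    return ["osx", *RECOMMENDED_SWITCHES, sgml_path]


if __name__ == "__main__":
    # The file name is made up for illustration.
    print(" ".join(build_osx_command("example.sgm")))
```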

Large discussion on what it means to be a robust conversion; then all conversion categories. Agreed to eliminate the explicit distinction into easy, minimally invasive, robust.

Discussion of ESIS — decided to eliminate reference to jargon here.

Discussion of being more generic than mentioning osx specifically.

New Topic: ‘supra-validation’ 3

JH suggests (and no objections raised) that a new paragraph or section about the various possible pitfalls of migration (including errors created by migration scripts, some of which might not be caught by validation) be added to MI W 03. And in the General Recommendation section of MI W 02 a recommendation to design a way to check your data.
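One possible check of this kind, sketched in Python under stated assumptions: both the pre-script and post-script versions of a document are available as well-formed XML (for example osx output before and after a stylesheet), and a migration script should never change the running text. The function names are hypothetical, and a real project would want further checks (attributes, entity references, and so on).

```python
import xml.etree.ElementTree as ET


def normalized_text(xml_source: str) -> str:
    """Return all character data of the document with whitespace collapsed."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


def text_preserved(before_xml: str, after_xml: str) -> bool:
    """True if a migration step left the running text unchanged.

    Both documents can be valid and still differ here, which is the point:
    validation alone would not notice dropped or altered text.
    """
    return normalized_text(before_xml) == normalized_text(after_xml)


if __name__ == "__main__":
    before = "<P>Hello   <HI>world</HI></P>"
    after_ok = "<p>Hello <hi>world</hi></p>"
    after_bad = "<p>Hello <hi/></p>"
    print(text_preserved(before, after_ok), text_preserved(before, after_bad))
```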

Action 18: JH Write paras in MIW03d section on looking for errors introduced by migration, send to list. 2003-07-07
Action 19: eds generate list of attributes that have default values done

MI W 04

Action 20: CR Take a look at updating MI W 04 to be a useful checklist for migrators rather than survey takers.

MI W 03

Comments are for the author, and should be removed before we make the files public. If an author wants to send commentary to the rest of the group, they should use <note>.

CR is suggesting outline:
  • Intro (by AB)
  • Tools (by JH)
  • Workflow (by AB)
  • SDATA (by CW)
  • DTD (by TR)

Action 21: JH incorporate info on HTML Tidy (posted to list by TE and SS) into Tools section 2003-07-22

SS & CR to re-work intro to MI W 03 to make it more parallel to the intro of MI W 02.

Question of XMetal as tool was raised —

Action 22: JH look at FW posting on XMetal and decide how to incorporate it into tools section

Discussion of tei2tei.xsl — we'll be discussing it as an example, not a full tool.

With respect to osx, group recommends JH put in a footnote referring to soon-to-be-available features, which could then be changed to a paragraph quickly if in fact the new version is released in time.

Decided to either not mention Notetab at all or in a general sentence in workflow about using editors as a happy front end.

Long discussion on DTDs: point made (JH) that we don't migrate the TEI DTD, we just need to tell people how to go get the new one.

Workflow section to two subsections: Intro, DTDs (w/ pointer to extensions section), catalogs, bulk on instances, processing environment.

Action 23: JH research SGML -> XML catalog converters 2003-07-07

Instance migration section

  • Numeric version of osx to be relegated to footnote;
  • change ‘Open SX’ to ‘OpenSP’;
  • fix case of program names;
  • further notes from JH to SW directly.
Actual flow of steps (currently under ‘Document migration’) needs to be altered to something like:
  1. osx,
  2. possibly a style-sheet (e.g. tei2tei.xsl) to a) correct case, and b) remove default attributes, and c) pretty-print, or
  3. if no (2) use pretty-printer (e.g. HTML Tidy) if desired.
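The branching in the steps above can be sketched as follows. This is only a Python illustration of the decision logic; the function name and the step labels are hypothetical, and nothing is actually executed.

```python
from typing import List


def plan_instance_migration(use_stylesheet: bool, pretty_print: bool) -> List[str]:
    """List the steps for one instance, following the flow in the minutes."""
    steps = ["osx"]  # step 1 always runs
    if use_stylesheet:
        # Step 2: the stylesheet corrects case, removes default
        # attributes, and pretty-prints, so no separate step 3 is needed.
        steps.append("stylesheet (e.g. tei2tei.xsl)")
    elif pretty_print:
        # Step 3: only when step 2 was skipped and pretty-printing is wanted.
        steps.append("pretty-printer (e.g. HTML Tidy)")
    return steps


if __name__ == "__main__":
    print(plan_instance_migration(use_stylesheet=False, pretty_print=True))
```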

‘… likely workflow that integrates all of the steps above.’

At LB's suggestion the bash script in File conversion in batch mode section should become more generic, pseudo-code if you will. We will then have pointers into the tools page at least from the case studies if not from here also to the actual scripts that people used (and AB's example), well commented.
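As a rough idea of what a more generic batch step might look like, here is a Python sketch that only pairs each input file with its intended output file. The function name, the ".sgm" and ".xml" suffixes, and the directory layout are assumptions made for illustration; this is not the script from the draft, and the actual conversion of each pair is left out.

```python
from pathlib import Path
from typing import List, Tuple


def plan_batch(input_dir: Path, output_dir: Path,
               suffix: str = ".sgm") -> List[Tuple[Path, Path]]:
    """Pair each SGML file under input_dir with the XML file it should become."""
    pairs = []
    for src in sorted(input_dir.rglob(f"*{suffix}")):
        # Keep the relative directory layout; change only the extension.
        dst = output_dir / src.relative_to(input_dir).with_suffix(".xml")
        pairs.append((src, dst))
    return pairs


if __name__ == "__main__":
    # Directory names are made up; each pair would then be handed to the converter.
    for src, dst in plan_batch(Path("sgml"), Path("xml")):
        print(f"{src} -> {dst}")
```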

SDATA section

Not a lot to say now. SW to copy-edit and repost to list.

Action 24: SW Copy-edit SDATA section of MIW 03 and repost to list. 2003-06-28
Action 25: CR send mail to CW & TR giving each a heads up and the option to work on their sections if they want 2003-06-24

DTD extension migration

Not much to say now; v. good section, although it outweighs all the others by several pages. Discussion of whether or not we should ditch the explanation of extensions; people liked it too much to discard.

Action 26: CR draft an introduction for MI W 06 2003-07-07
Action 27: CR look at LB's reference to BNC documentation and decide whether or not it gets sucked into our document world (per SW's question). 2003-07-07
Action 28: SB clean & post draft minutes 2003-06-22

Notes
1. Another version control problem cropped up.
2. This is not entirely precise. In SGML,
<!-- comm decl with -- -- two comments -->
is a single ‘comment declaration’ with two ‘comments’; in XML there is no such distinction, and the above is not allowed. (I.e., only one comment per comment declaration, no whitespace between the ‘--’ and the ‘<!’ or ‘>’, and the whole thing is called a ‘comment’.)
3. We did not call this ‘supra-validation’ at the meeting, but I needed some way to refer to it here in the minutes, and that's what we call it at the WWP. Feel free to suggest another term.