TEI MI M 01 (draft)
Draft of TEI XML Migration Task Force Meeting Minutes, 2002-10-13/14


Initials Used for People

  • SB Syd Bauman
  • AB Alejandro Bia
  • LB Lou Burnard
  • JH Jessica Hekman
  • TR Tobias Rischer
  • CR Christine Ruotolo
  • NS Natalia Smith
  • ST Syun Tutiya
  • JU John Unsworth
  • CW Christian Wittern

Meeting took place in the Claridge Hotel, Chicago IL, USA on Sunday 13 and Monday 14 October. All times listed are local time in Chicago.

Commenced ~13:28 with SB, LB, AB, JH, TR, CR, NS, ST; CW joined ~13:50.

Introductions.

Objectives

Review of list of objectives from our charge.

CR Q: are DTDs in scope? Consensus is that they are, but because few people will need help here, low priority. CR: plan to have relatively vague suggestions in recommendation documents.

CR suggests our focus should be on P3->P4. Consensus is that outlining tasks for P3->P4(XML) will include all steps needed for P4 SGML -> P4 XML. Asks if we want to have more of an advocacy role. LB answers yes. SB agrees, but wonders if any advocacy is necessary. LB points out that (disregarding extensions) a P3 document is ipso facto a P4 document.

Brief discussion of why a project wants to move to XML: access to new technologies, new tools (XML); non-support of P3 (P).

We have 2 sections of document already! 1. Scoping; 2. Motivation.

Case studies.

SB asks do we need a test suite? Is it hard to make? JH is concerned we may not be able to think of 'em all.

LB points out difference between test suite and using samples. Thinks we need to ascertain what practices are via survey (#2).

Action 1: LB 2002-10-22 Remind MP to send OTA materials
Action 2: SB 2002-10-22 Send WWP samples

CR asks whether or not we need test suite. SB asks how hard is it to do? CW suggests start with list of differences between XML and SGML (see note 1).

Question comes up as to who we are surveying: SB holds repository reps insufficient to survey.

Summary that we will not use test suite, but rather results of survey of real cases, perhaps augmented with a fabricated test if deemed necessary.

Although software development is not an output of this group, suggestions for areas ripe for new tools or modifications to existing ones are.

Modifications to ED W 76 made.

Action 3: CR 2002-11-01 Write preliminary work-plan and circulate to list

Survey

CR: not too many responses (see note 2).

CW explains the recent experience of Character Set WG with its survey.

SB suggests that, as only 50 projects are listed on the TEI website, a phone survey might be feasible. Generally disliked, but LB counter-proposes e-mail, with caveats about privacy. LB likes e-mail and phone call. Question discussed about whether we just want files or answers to survey questions, too.

So, after the identification stage, send a letter with a very brief survey, culminating in a request for only a small data sample (no DTD or other supporting files should be explicitly requested). Non-respondents to be contacted by phone. Respondents for whom we have questions to be followed up by e-mail. Also a thank-you.

Five stages of survey project:
  • Identification of projects using TEI (SGML)
  • Survey letter for collection of samples. Telephone follow-up of non-responders (repository group to help)
  • Analysis stage: divvy up sample files and check for various features.
  • Follow-up based on number and nature of samples — e.g., asking for DTDs when needed, getting info on technical, organizational challenges and opportunities
Samples will then be checked against a checklist of issues.
Action 4: TR 2002-11-01 Create MI W 04, the checklist for stage 3 examination of files
Action 5: CR ?? Develop database of contacts
Action 6: SB 2002-10-23 Follow up on JF's survey (of which data went to JU), find out where the data is.
Action 7: LB 2002-11-01 Develop list of projects that use TEI to which we should send survey, get data
Action 8: all 2002-10-26 Send LB any projects you know of
Action 9: CR & SB 2003-01-02 Draft survey letter asking for samples and asking questions
Action 10: LB 2002-11-01 Look for "tei" on HUMBUL; coordinate the great TEI search.
Action 11: SB 2002-10-26 Draft "stand up and identify yourself" letter

Lists suggested for posting: XML4LIB, TEI-L, HUMANIST, BIBLIOTECH, DIGLIB, LINGUIST, CORPORA, ANSAX-L.

Action 12: SB & LB 2002-11-15 Get a list of lists from MF and get "stand up and identify yourself" letter posted to all lists (including above).

Identify . . .

Split out technical to expert group, organizational to repository group.

Action 13: CR 2002-10-28 Initiate organizational discussion in repository group.

Discussion of order of objectives in Charge. Decided charge is really unordered, not to worry about it. CR to provide order in work-plan.

Decided to discuss further issues (e.g. XPointer and other P5ish issues) in appendix to output reports.

Adjourned ~17:18.

Minimally invasive vs. canonical

Commenced ~09:15.

CR reviews discussion from list.

Discussion of whitespace. General agreement that we need to try to munge source whitespace so that parsed whitespace matches.
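As an illustration only of the kind of munging meant here: an SGML parser ignores a line break directly after a start-tag or directly before an end-tag, whereas an XML parser reports it. The sketch below drops just those line breaks so that the XML parse sees what the SGML parse saw. It is a deliberate simplification of the SGML record-end rules, the function name munge_whitespace is hypothetical, and it assumes no ">" occurs inside attribute values.

```python
import re

# Simplifying assumption: only a line break directly after a start-tag or
# directly before an end-tag is dropped. The full SGML rules are more subtle.
_AFTER_START = re.compile(r"(<[A-Za-z][^<>]*(?<!/)>)\r?\n")
_BEFORE_END = re.compile(r"\r?\n(</[A-Za-z][^<>]*>)")


def munge_whitespace(text):
    """Remove line breaks that SGML would ignore but XML would keep."""
    text = _AFTER_START.sub(r"\1", text)
    return _BEFORE_END.sub(r"\1", text)


if __name__ == "__main__":
    print(munge_whitespace("<p>\nSome text\nmore text\n</p>"))
```

Line breaks inside running text, and those after empty elements written as <lb/>, are left alone.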

Discussion of character entity references. LB argues that in migration character entity references should be converted to characters or numeric character references. Consensus is to have prose discussing reasons for desiring this conversion (that later XML processes won't be able to handle character entity references), but to recommend it as an option.
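A minimal sketch of the optional conversion discussed above, assuming a lookup table from entity names to Unicode code points is available. The table ENTITY_CODE_POINTS below is a hypothetical three-entry excerpt and the function name is made up for illustration; a real migration would need the full ISO entity sets. The five entities predefined by XML are left untouched, as are names not in the table.

```python
import re

# Hypothetical excerpt of a name-to-code-point table (not a complete ISO set).
ENTITY_CODE_POINTS = {"eacute": 0xE9, "agrave": 0xE0, "mdash": 0x2014}

# These five are predefined in XML and need no conversion.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def to_numeric_refs(text, table=ENTITY_CODE_POINTS):
    """Replace known character entity references with hexadecimal NCRs."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in table:
            return match.group(0)  # leave untouched for manual review
        return "&#x{:X};".format(table[name])
    return _ENTITY_REF.sub(replace, text)


if __name__ == "__main__":
    print(to_numeric_refs("caf&eacute; &amp; th&eacute;"))
```

Unknown names pass through unchanged, so they can be listed and checked by hand rather than silently lost.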

Discussion of external entities.

Action 14: SB 2002-10-28 Ask Steve DeRose for his notes of what he did to convert P3 ODD files to P4 ODD files.
Consensus is the same as for the previous two: user option, with discussion of why you might prefer XInclude to system entities.
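To make the option concrete, a rough sketch of rewriting references to external (SYSTEM) entities as XInclude elements. The function names and the regular expressions are assumptions made for illustration, not anything agreed at the meeting; declarations using PUBLIC identifiers, NDATA, or parameter entities are deliberately not handled.

```python
import re

XI_NS = "http://www.w3.org/2001/XInclude"

# Matches only simple declarations of the form <!ENTITY name SYSTEM "file">.
_ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+SYSTEM\s+"([^"]*)"\s*>')
_ENTITY_REF = re.compile(r"&([\w.\-]+);")


def system_entities(subset):
    """Map entity names to system identifiers found in a DTD subset."""
    return dict(_ENTITY_DECL.findall(subset))


def to_xinclude(body, entities):
    """Replace references to the given external entities with xi:include."""
    def replace(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)  # not an external entity we know about
        return '<xi:include xmlns:xi="{}" href="{}"/>'.format(
            XI_NS, entities[name])
    return _ENTITY_REF.sub(replace, body)


if __name__ == "__main__":
    decls = '<!ENTITY chap1 SYSTEM "chap1.xml">'
    print(to_xinclude("<body>&chap1;</body>", system_entities(decls)))
```

One point for the prose discussion: a file pulled in by XInclude as XML has to be a well-formed document in its own right, which is not required of an external entity.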

Discussion on DTDs: yes, we need to keep 'em. XML tools that won't do well-formedness work on files that specify a DOCTYPE declaration are broken, so it's not our problem.

Can address dirty hacks.

Comments: in XML you can't have comments inside other declarations; can't have multiple comments inside one comment declaration; the empty comment declaration <!> is not permitted.
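A rough sketch of how a migration script might flag these three comment problems in SGML source before conversion. The function name and the regular expressions are hypothetical and only approximate SGML syntax; in particular they assume no ">" appears inside a comment.

```python
import re

_EMPTY = re.compile(r"<!>")
_COMMENT_DECL = re.compile(r"<!--(.*?)--\s*>", re.DOTALL)
# Declarations such as <!ELEMENT ...> or <!ENTITY ...>; a DOCTYPE with an
# internal subset is skipped because it contains "<" before its first ">".
_OTHER_DECL = re.compile(r"<![A-Za-z][^<>]*>", re.DOTALL)


def find_comment_problems(sgml):
    """Return (offset, message) pairs for comment usage XML does not allow."""
    problems = []
    for m in _EMPTY.finditer(sgml):
        problems.append((m.start(), "empty comment declaration <!>"))
    for m in _COMMENT_DECL.finditer(sgml):
        if "--" in m.group(1):
            problems.append((m.start(), "multiple comments in one declaration"))
    for m in _OTHER_DECL.finditer(sgml):
        if "--" in m.group(0):
            problems.append((m.start(), "comment inside another declaration"))
    return sorted(problems)


if __name__ == "__main__":
    sample = "<!ELEMENT p - - (#PCDATA) -- a paragraph -->\n<!>"
    for offset, message in find_comment_problems(sample):
        print(offset, message)
```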

Action 15: JH 2002-10-14 Investigate how comments are processed in SX or other tools

‘strategies’ document will have things like advising migrators to think about issues of, say, XInclude v. external entities. ‘practices’ document will have advice on how to convert to XInclude or how to migrate without converting.

In strategy document we should probably point out that more migrations in the future are likely, but that if you're happy with P4 (which TEI does plan to support), you could just stay there.

Specification of defaulted attributes: we'll recommend not to specify them (and hopefully point out ways to migrate without them) unless you really need them.
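One way a migrator might see which attribute values in an instance come from DTD defaults rather than from the instance itself is sketched below, using the specified_attributes switch of Python's expat binding. The sample document and the function name are made up for illustration, and only defaults declared in the internal subset are picked up this way.

```python
import xml.parsers.expat


def element_attributes(xml_text, specified_only):
    """Return (element name, attributes) pairs as reported by expat."""
    parser = xml.parsers.expat.ParserCreate()
    # When true, expat reports only attributes written in the instance,
    # not those supplied by ATTLIST defaults.
    parser.specified_attributes = specified_only
    seen = []

    def start(name, attrs):
        seen.append((name, dict(attrs)))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen


if __name__ == "__main__":
    doc = '<!DOCTYPE p [<!ATTLIST p rend CDATA "plain">]><p n="1">text</p>'
    everything = element_attributes(doc, False)
    written = element_attributes(doc, True)
    for (name, all_attrs), (_, own_attrs) in zip(everything, written):
        defaulted = sorted(set(all_attrs) - set(own_attrs))
        print(name, "defaulted attributes:", defaulted)
```

Attributes in the "defaulted" list are the ones that would disappear for any XML tool that does not read the DTD.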

Discussion of DTD conversion: we can't help those who did not use extension mechanism, but we should have a paragraph addressing the problems created by not doing so.

Strategic document should discuss the fact that migration may be an opportunity to improve your DTD.

CR: In technical report document we need to address
  • minimal conversion
  • easy conversion
  • conversion that maximizes XML tool usability
  • conversion that is forward-looking to P5, or at least what we can predict of P5.
  • in depth discussion of macro issues identified in samples
SDATA entity discussion. SB suggests three categories
  • characters that are in Unicode
  • characters that are not in Unicode

    solutions, à la P4 chap 4.2.1

    • CDATA
    • PIs
    • markup (<c>)
    SB suggests we need to better describe the disadvantages of each method in our practices document
  • others
    • ambiguous glyph
    • glyph exists in Unicode with different meaning in the document
    • temp data capture flags
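A first pass over a project's SDATA declarations could sort them into the three categories above, roughly as sketched here. Both lookup tables are hypothetical placeholders: the Unicode table would have to come from the ISO entity sets, and the "other" set (ambiguous glyphs, capture flags and the like) can only be drawn up by someone who knows the project.

```python
import re

# Hypothetical excerpts; real tables would be much larger.
UNICODE_CODE_POINTS = {"eacute": 0xE9, "thorn": 0xFE}
NEEDS_REVIEW = {"flag1"}  # e.g. temporary data-capture flags, ambiguous glyphs

_SDATA_DECL = re.compile(r'<!ENTITY\s+([\w.\-]+)\s+SDATA\s+"([^"]*)"\s*>')


def classify_sdata(dtd_text):
    """Sort SDATA entity names into the three categories discussed."""
    result = {"in_unicode": [], "not_in_unicode": [], "other": []}
    for name, _replacement in _SDATA_DECL.findall(dtd_text):
        if name in NEEDS_REVIEW:
            result["other"].append(name)
        elif name in UNICODE_CODE_POINTS:
            result["in_unicode"].append(name)
        else:
            result["not_in_unicode"].append(name)
    return result


if __name__ == "__main__":
    sample = '<!ENTITY eacute SDATA "[eacute]">\n<!ENTITY flag1 SDATA "[flag1]">'
    print(classify_sdata(sample))
```

Names that land in "not_in_unicode" only mean "not in the table"; each still needs a human decision among the solutions listed.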

Processing environment

LB points out difficulty in actually managing all the little pieces of a sample (or real) case. Corollary is that practices document needs to address catalog files.

Things to Consider

  • instances
  • DTD extension files
  • catalog files
  • style-sheets and other parts of processing environment

Add questions about processing environment to third round survey questions.

SDATA entities to be attacked by a separate individual in practices document.

Discussion of problems found in samples

TR: ??

LB: consultancy may be desirable. General agreement that a workshop on specific issue like, e.g. extension files, would be a good thing.

SB asks about recommending open source v. proprietary software. In resulting discussion LB points out that he'd prefer we say ‘this tool does this’ rather than make a recommendation ‘use this tool’.

LB sees only three strategies for obtaining tools for migration:
  • in-house development
  • buy proprietary tools
  • use open source
Action 16: JH 2002-11-04 Seek out vendors of useful tools, and contact them to find out rudimentary information about their tools.

Case Studies

CR expects repository reps to write up a case study each. Recommendations for tools & strategies should be ready by mid- to late-December to give repository reps a month to work before joint meeting.

Action 17: CR 2002-12-01 Write up a framework of feedback information we want from repository reps, MI W 01, Format for Case Study Feedback

Dividing up Labor for Writing up Reports

Strategic document: MI W 02 Strategic Considerations in Migrating TEI documents from SGML to XML.

  • Challenges, opportunities, and motivation.
  • Types or scope of migration (P3->P4 or P4->P4)
  • Areas of migration (instances, DTD extensions, catalog files, processing environment)
  • Levels of migration, e.g. minimal surgery approach, get almost to P5 approach, et al.
  • Appendix: potential impact of future versions of the Guidelines on migration issues.

MI W 03 Practical Guide to Migration of TEI Documents from SGML to XML

  • DTD conversions
    • SDATA (ST)
    • Extension files (TR)
  • Instance conversion: tools. Issues: whitespace & comments, prologue & file structure (e.g. external entities) (JH)
  • Recommended work-flow (AB)

Section write-ups due 2002-12-02.

Action 18: CR 2002-11-25 Send reminder mail to group to get write-ups done in 1 week

Adjourned ~16:00.

Notes
1. Available from http://www.w3.org/TR/NOTE-sgml-xml; doesn't seem to be on the CD, though.
2. To the posting to TEI-L of 2002-10-04 14:12-04, "migrating TEI resources from SGML to XML".