Date: Mon, 24 Feb 1992 16:39:52 CST
Reply-To: "TEISteer: Text Encoding Initiative Steering Committee List" <TEISTEER@EARN.UICVM>, "C. M. Sperberg-McQueen"
Sender: "TEISteer: Text Encoding Initiative Steering Committee List" <TEISTEER@EARN.UICVM>
From: "C. M. Sperberg-McQueen"
Subject: TEI SC M20
To: Lou Burnard

Minutes of the Steering Committee Meeting
Bergen, 18-19 November 1991
C. M. Sperberg-McQueen
TEI SC M20
Draft February 24, 1992 (16:37:42)

1 APOLOGIES FOR ABSENCE

Present were:
Absent were: NI, RA.

2 MINUTES OF THE LAST MEETING

Accepted.

3 REVIEW OF MYRDAL MEETING

Outstanding Problems

The outstanding question is what to do about feature structures. The suggestions of Gary Simons (GS) were quite clear, but very substantial for something newly presented on the last day. TL noted that he had long been rethinking the feature structure notation but had not presented his suggestions in writing. GS had persuaded TL that some of the details of his thinking were wrong, so the changes now on the table are in fact not very radical.

Should the atomic content become an attribute value? If so, should it be on an ATOMIC tag or on the parent FEATURE tag? This is motivated in part by a comment from Mark Vilain, who objected to atomics' having data content.

The argument for renaming F.STRUCT as RECORD and FEATURE as FIELD is to make the names less intimidating to non-linguists, without affecting linguists' ability to use the structures.
LB observed that not everyone liked the new names; MSM suggested F and FG.

The group discussed the possible addition of a feature structure type declaration.

TL presented an example of the cross-reference mechanism for feature structures: noun singular noun ...

SH reviewed the subcommittees which had worked in Vatnahalsen.

Character set proposals were mostly uncontroversial.

Text chunk committee proposals achieved near consensus; MSM and LB were not sure that the proposals would actually handle Shakespeare's casual shifts from verse to prose. The committee should be left to prepare its proposal, and asked for some examples.

Inheritance and grouping committee. PR did not want to recommend a particular mechanism because of the lack of software; this suggests a problem of presentation: should the application of general tools to specific areas be described under the general tools or under the specific areas? MSM suggested that description under the specific areas would be better.

MSM expressed serious skepticism about the idea of borrowing the GRP tag to handle text-critical problems; he did not think any content model could be written to behave as required.

Group 5, situational parameters, achieved consensus and its work is in a good state. SJ pointed out that place, time, and channel may all depend upon individual participants: telephone calls, for example, do not take place in a single place. The situational parameters also do not, in speech, characterize the whole text, but parts of it. (LB observed that this could also be true in written texts.)

SJ recalled the original distinction between quotation and direct speech; this became quotation in P1. SJ would prefer quotation to be used only where there is real quotation. LB noted that the same question had arisen in the literary work group. This is not to be resolved immediately, but it would be nice to solve it. NB this is still outstanding.
The uncertainty group did not impress everyone as having solved the problem, but MSM argued that SJ suggested the name INDISTINCT for illegible/inaudible passages, and PRECISION rather than certainty for some current applications and in general for measurements.

Group 7 on pointers and alignment maps did not completely finish; TL wants to sit down with MSM and SJ.

Plan for Resolving Them

Some things that need change are simply things that need cleaning up; some are directions for long-term further work.

What to do with P2: we had expected no serious changes between P2 and P3, but with the volume of new material now in the works for P2, we will definitely need to perform a technical review of P2.

LB urged that it would be dangerous to drift into an unintended policy of making serious technical changes between P2 and P3; MSM agreed, but suggested that the technical review would be necessary anyway, to identify problems and distinguish those which can be fixed with minor changes, those which represent gaps to fill with further work, and those which represent serious flaws (and would result in removing material between P2 and P3).

SH proposed, funding permitting, to hold a review meeting in the spring, with working papers evaluating sections of P2. The meeting would produce lists of corrections to make in P3, and lists of areas for further long-term work.

Possible methods of review:

* review meeting as per SH
* do nothing: mail P2 out and hope
* select specific reviewers and ask for reports (how motivate?)
* hold meeting/conference to plan future (Poughkeepsie II)
* inform public to comment before the specific meeting
* do something with affiliated projects
* (SJ) mail to specified reviewers with a request to look at very specific points, e.g. Wilhelm Ott with a request to look at character sets -- people who can be expected to do something and may feel some obligation to do something. Then invite people to a meeting depending on how people have responded.
* get advisory board more deeply involved in soliciting technical review by their people.

If such a meeting is to be held, it would be 23-24 March 1992. Probably Chicago (possibly New Jersey as backup). Undecided whom to invite, what sort of people.

Can we do anything about the work groups which have not produced work? Mss. still plans to draft. Paul Ellison claims to be writing sections on mathematics and on tables.

1. Character sets, done if they produce text
2. Text criticism has a draft
3. Hypermedia
4. Formulas
5. Corpora -- will appear mostly in header chapter
6. Printed books -- will not appear
7. Literature -- will do joint draft
8. Linguistics -- will include morphological starter set as FSD
9. Spoken texts -- will have draft
10. History -- will do names, dates
11. Dictionary -- Nicoletta will do it
12. Lexicon -- Ingria says he will draft
13. Terminology -- will have a draft

Tutorials and Casebook

SH asked SJ, TL what topics should be covered by specific tutorials:

* spoken texts
* morphological analysis
* trees (from Penn Tree Bank)
* historians
* literary texts (needs more thought)
* terminology
* dictionary / lexicographers
* text critics
* metrical analysis? (David Chisolm)

For the case book, editors should list material we might get from affiliated projects and circulate it.

Plan of work

SH noted the plan for the TEI to continue as an umbrella organization in the standardization area.

4 MATTERS ARISING FROM PREVIOUS MEETING NOT DEALT WITH ELSEWHERE

TR9 American representation

MSM spoke to Elizabeth Brown of AHA, who has a number of suggestions for American participation. This area will need to be reconsidered, along with printed books; no action to be taken now.

AI2 document to D. Gibbon

LB sent a list of addresses to Wendy Plotkin asking that AI2 W1 be sent to them; MSM believes that WP has sent the paper to all of them.

MacWhinney

LB reported that his most recent contact with MacWhinney has been quite cordial; no action is needed.
AI3 summary and survey of contents

SH has recently been asked what the status is of Rosanne Potter's survey of literary scholars; RP seems to be collecting responses, but tabulating them is not at the top of the list.

The MLA does not seem very organized with regard to TEI participation; it's not clear what is to be done. SH actioned to contact relevant people in MLA and sort out the problems, to wit: how to channel information within the MLA. MSM noted that the MLA Newsletter has run pieces on the TEI two or three times, so that it is not an issue of informing the membership at large; the problem appears to be that relevant committees and projects within the MLA have not kept themselves informed and we have not kept them informed. SH will write and send a reply.

The summary of the AI3 comments on P1 has not gone further. SH will continue to ask for an agreement that the summary is a fair paraphrase of the comments; MSM will ask WP to cross-index the summary with the original. SH will draft reply to the critique; it is important that we be seen to have answered it point by point.

5 REVIEW OF BUDGET AND REMAINING EXPENDITURE

A review of the accounts showed the following amounts remaining disposable (not spent and not committed) in the following accounts:

    $136,000 in NEH cycle 2
    $ 60,000 in Mellon funds
    $106,000 (85,000 ECU) in EEC funds for 91-92 (when the contract is signed)
    $302,000 total

If we assume the Myrdal meeting will cost $50,000, we have about $250,000 left. $40,000 for the advisory board meeting leaves about $210,000.

Were the EEC contract to fail, we would have $196,000 on hand. Our commitments then would include:

* ca. $20,000 already lent to the EEC account from Mellon
* ca. $25,000 for Oxford salary (committed in writing)
* ca. $20,000 for Oslo
* ca. $50,000 for the Myrdal meeting
* ca. $35,000 for the Advisory Board meeting

In this case, we would have about $45,000 disposable. That would be enough to fund another review meeting for P2.
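The arithmetic behind these figures can be checked with a short sketch. The Python below is illustrative only: the variable names are not from the minutes, the amounts are taken exactly as stated above, and the minutes themselves round the results to "about" $250,000, $210,000 and $45,000.

```python
# Amounts (US dollars) as stated in the minutes; all names are illustrative.
disposable = {
    "NEH cycle 2": 136_000,
    "Mellon": 60_000,
    "EEC 91-92": 106_000,  # only available once the contract is signed
}
total = sum(disposable.values())

# Scenario 1: EEC contract signed.
after_myrdal = total - 50_000            # Myrdal meeting
after_advisory = after_myrdal - 40_000   # advisory board meeting

# Scenario 2: EEC contract fails, so only NEH and Mellon funds are on hand.
on_hand_without_eec = total - disposable["EEC 91-92"]
commitments = {
    "lent to EEC account from Mellon": 20_000,
    "Oxford salary": 25_000,
    "Oslo": 20_000,
    "Myrdal meeting": 50_000,
    "Advisory Board meeting": 35_000,
}
remaining_without_eec = on_hand_without_eec - sum(commitments.values())

print(total, after_myrdal, after_advisory)
print(on_hand_without_eec, remaining_without_eec)
```

The exact results differ from the quoted figures only by rounding. Note that the advisory board meeting is entered as $40,000 in the first scenario and as ca. $35,000 in the list of commitments, as in the minutes.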
It was agreed to fund TR9's manuscript meeting in February if the EEC contract is funded by then. SH will so inform JH. The point of the meeting is to review JH's and CH's draft(s), refine the suggestions, and identify areas needing yet further work. It was explicitly understood that this meeting's results may not -- probably will not -- be reflected in P3.

A review meeting (23-24 March) would cost ca. $17,000 with 15 people.

MSM suggested some further meeting with the affiliated projects might be feasible.

LB noted that Gunnel Källgren and Rich Giordano have asked for funding for sending Rich to Stockholm for a couple of days to work on the TEI encoding of the Stockholm/Umeå corpus. Agreed to fund up to 300 pounds.

MSM reported a suggestion from Dan Greenstein that a meeting of four or five people in Glasgow could readily make serious progress on the issues of historical encoding. Agreed to fund this if the EEC funding comes through. Try to hold this in the first six weeks of 1992; produce a case study and some tutorial material?

General rule agreed that if the EEC contract comes through, the editors jointly can authorize ad hoc travel up to $500.

6 EDITORS' REPORT ON OTHER ACTIVITIES

LB reported on the Bibliothèque de France's work group meeting on the Projet de Lecteurs Assistés par Ordinateur. This was a work group / seminar for the exchange of ideas, much of which was very interesting. About 60,000 works are to be digitized either into bit-mapped images or (further) into encoded texts. François Chahuneau was asked and recommended they invite LB.

SGML 91 in Providence was very successful; a substantial trip report is in progress. SH asked whether we are operating on approximately the same plane as others working with SGML; LB said that on the contrary, Yuri Rubinsky had included the TEI on his list of the most important events of 1991.
The total funding for the TEI and all its affiliated projects, estimated roughly, came to some tens of millions of dollars of funds on projects committed to TEI and SGML.

CARG at AAR/SBL will be at the end of this week. MSM will give a proselytizing speech to CARG. The editors will also confer, and are assigned to ask Bob Kraft urgently to see that SBL files a formal request to join the advisory board as soon as possible, to ensure that SBL can be represented at the May Advisory Board meeting.

CATH 91 is to have a two-hour session on the TEI to be done by LB and Elaine Brennan; LB reports that EB has fallen silent and he is growing concerned. He has asked Rich Giordano to come as well, but has not specifically invited RG to help with the workshop.

MLA will have a TEI session: Elaine Brennan, Malcolm Brown, and MSM will speak.

The Pisa workshop may be followed by a low-key informal meeting of the editors; DW was unhappy that the list of attendees was not firm, and felt that if the workshop was not well attended it would damage the TEI's reputation. LB and MSM resisted the characterization of the meeting as a 'workshop'.

SH asked about the editors' work with Chadwyck-Healey. At a presentation of the English Poetry project at Princeton, the C-H presenter laid great stress on the participation of the editors on the advisory committee and as consultants. This can have an impact on the TEI's public image, particularly when the presentation is poor. LB pointed out that any poorly presented project which advertises its relation with the TEI would have the same effect.
After discussion, MSM suggested that this situation could be attacked in several ways:

* the editors could ask Chadwyck-Healey not to mention the TEI in their advertising
* the editors could sever their relations with C-H
* the editors could endeavor to ensure that C-H presenters gain a better grasp of the technical aspect of the encoding

7 FUTURE OF THE TEI

Funding Enquiries

AZ has made progress with the EEC, and says prospects are encouraging, but nothing has been signed yet.

NI has not spoken to NEH; SH is uncertain whether we should count on NEH funds, since they have already been so generous.

DW has spoken with NN about getting money from the Linguistic Data Consortium, and to DARPA -- but is rapidly approaching the point at which a firm formal proposal must be made.

DW spoke with Syun Tutiya in Myrdal; he will know in January whether a proposal for $100,000 from the Ministry of Education will be available in April, mainly to cover travel and conferences. It was not clear whether this was for Japanese travel to meetings elsewhere or for internal Japanese meetings.

In February, a proposal is due with the Education Ministry for 500 M yen ($3,000,000) over three years under a program for international joint research. This program is in response to the NSF/DARPA stimulation. Like DARPA and ESPRIT this will cover many things, not just TEI; it represents a framework within which TEI support may be gained. In addition, Toshio Yokoi (head of Electronic Dictionary Research project) is preparing a MITI proposal on large-scale knowledge-bases using text as the storage format (as opposed to CYC, which they had found disappointing). Yokoi's funds have supported all the cooperative travel so far.

Members of the TEI Japan Committee include Nagao, Yokoi, Syun Tutiya and _. Kameda, as well as about 25 others. The Japanese seem very serious, judging by the amount of work they have done so far.
Yokoi has explicitly envisaged the possibility of contributing financially to the funding of TEI core activities.

The EEC call for tenders for the Linguistic Research and Engineering Framework program specifically mentions the TEI; the TEI does not intend, however, to seek funds specifically under this line. SH and LB observed that proposals under this line must come from partners in different European countries, ideally with an industrial partner. MSM asked whether Pisa and Oxford could not apply under this line as partners to coordinate the European participation in the TEI. This was felt not to be possible.

SH felt very strongly that the TEI must formulate a coherent plan of work and decide where to seek money for different activities. Proposals to individual funding agencies should be made in the larger context, and not ad hoc, as our previous proposals have been.

MSM summarized the discussion thus: we will need long term funding for ongoing work, and believe we need a coherent central plan for that. A coherent central plan will not be possible before mid-1992, leading to funding in 1993. The consequence is that right now we are looking for short-term bridge funding to get us through 1992-93. If the EEC funds come through and we can extend the NEH funding, we can possibly make it through that period even if we don't get further U.S. funding.

SH and DW assigned to find out about deadlines and form of proposal for a U.S. contribution to the interim funding (DARPA, NSF). SH will also continue looking through the foundation indices for leads.

To clear up MSM's confusion about the possible U.S.
funding sources, DW reviewed the players:

* Consortium for Lexical Research, NMSU, a funding sink not source
* Linguistic Data Consortium, part of a program initiated by Congress and funded by DARPA, to support development of industrially important technologies; LDC is one of several consortia thus created, with the primary purpose of encouraging creation and processing of massive bodies of data in speech, text, and lexicon; grammar collections somewhat less critical. LDC will not support research, though the data collected will support the aims of the DARPA research program. Mark Liberman heads committee to set up the structure.
* National Science Foundation

If Terry Langendoen is willing to continue, we should put in for his salary during this bridge period.

DW felt that dissemination, education, and evaluation would be fundable activities; also coding oversight (data validation).

Program of Work

SH listed five specific areas needing further work:

* linguistics
* speech
* literature
* historians
* physical description of copy text

MSM suggested a number of possible activities for the TEI to be involved in, in the long run. As revised by the committee, these included:

1. review of proposals and regular publication of updates
2. formation of work groups and production of proposals
3. dissemination of copies and publication program, including workshops
4. work with affiliated projects / exemplary applications; this is a continuation of AP work
5. validation of TEI conformance of data (possibly eventually as a publicly available service)
6a. software specification and evaluation
6b. software development

LB thought that software evaluation should be very high on the list: information about what is available, what it does and what can be done with it, and how to obtain it. This would need to be tied to work in software specification, along the lines of the test bed / test suite described in the Oxford meeting of ML.
LB is also becoming less hostile to software development, especially in the areas where the TEI is most innovative: manipulation of f.structs, extraction of f.struct data for loading in databases, graphic display of overlap from timelines, etc. It was agreed that software-related activities were not ipso facto inappropriate for the TEI, though we will need to be careful about relationship with industrial vendors.

DW noted that other kinds of ideas for long term activities might arise from the functional analysis needed for the program of work.

For the interim period (1992-93), we could see these activities taking the following specific forms:

* The new work (item 2) could include the four activities already noted. LB noted that in all of these areas JH's model of proposals presented to gradually wider audiences could be used, if it works in MSS.
* Publication program: user training, workshops; tutorials, case books, collections of working papers.
* Work with individual projects; DW argued that this should be linked to the development of data validation methods and measures; LB objected to continuing the current notion of the affiliated projects. It was agreed that the current affiliated projects should be discharged at the end of this cycle; any work with projects in the future could be done on a different basis. The long-term basis of such collaborative work must be carefully worked out; we need to monitor who is using the Guidelines and check whether they are running into problems. Intractable problems found by the projects need to be submitted to the review group. SH suggested that for the interim period, the editors or other consultants should simply be allowed to request funds for specific consultation work.
* Review group: SH will produce a plan in a paper for the next SC meeting.
* Software will not be a priority for the interim period.

SH asked whether we want to operate with the same structure and personnel, or change.
The SC will continue; should the editors continue, with their support? Should support to TL and SJ continue? Should support be added for the other areas of new work?

MSM expressed his willingness and desire to continue as editor in chief at least during the interim period, pending approval from his superiors.

LB asked for a half-time person for TEI-related clerical work. If this is possible, and a better compensation can be arranged for OUCS, he is willing to continue to serve during the interim period. He would also like to discuss his position vis-à-vis the steering committee and the constitution of the steering committee; this was postponed so it could be discussed when the entire steering committee was present.

8 PUBLICATION OF P3

This topic was postponed from earlier meetings. DW urged that the question facing us was not merely a choice of commercial publisher, but the decision whether to use a commercial publisher at all.

1a. commercial publication -- over their imprint
1b. commercial publication -- published by X on behalf of
1c. commercial publication -- over our imprint
2. we handle distribution ourselves (as with P1) -- for free or for charge
3. we provide electronic distribution from a fileserver

SH provided the following overview of the advantages and disadvantages.

Commercial publication: provides a seal of approval. (DW noted that P3 will already have been reviewed extensively and a commercial publisher would not provide an imprimatur.) They will also handle production, design, distribution, and reviews. Drawback is that it will cost more and may be delayed in appearance. Publishers may also wish to constrain our other activities with the text.

Distribution on our own part will make ongoing work for us, require handling of money, and will not necessarily penetrate to unexpected areas.

Electronic distribution will require special negotiation with any print publisher and may make it harder to do print publication.
LB suggested that the crucial parts for electronic distribution were the DTDs and the examples.

On the basis of 8 years publishing Computational Linguistics as a contractor working with subcontractors, DW argued that self-publication was a more viable option. A chief advantage is in the price flexibility it gives us. There is also very little delay in publication, recognizing that we have already had a very extensive review process.

MSM observed that commercial publication would have the advantage of getting notice of the publication to the library community. DW argued that through the advisory board we will have better access to individuals, and that the inferior access to the institutional market does not matter.

The topic cannot be decided at this meeting; what is essential is that a clear summary of the arguments be provided to all members of the steering committee.

DW pointed out also that loose-leaf publication and revisions will be harder for a commercial publisher. LB pointed out that one partial solution to this would be to divide the object.

9 DATE OF NEXT MEETING

The date for this was fixed at the previous meeting: January 30-31; in Pisa or alternatively in New Jersey.

10 ANY OTHER BUSINESS

SH suggested that in addition to summarizing the contents of P3, the TEI session at ALLC should also give a presentation from David Robey, Tom Corns, and Elli Mylonas, to bring the TEI to the literary community. LB objected to focusing too much on any one part of the Guidelines.

DW asked whether anyone other than the editors needs to be brought in to be responsible for the case book or the tutorials.

MSM reported that he had received a request to perform consulting work on an academic project; the committee had no objection and left the decision to MSM.

LB reported that he is acting as a technical consultant for Oxford Electronic Publishing's library of electronic text. This is done on his own time.
LB reminded the committee that the historians asked us to consider publishing their book.