Trip Report:
International Conference on Very Large-Scale Knowledge Bases and Knowledge
Systems, Tokyo, 1-2 December 1993
Workshop on Very Large-Scale Knowledge Bases and Knowledge Systems, Tokyo,
3-4 December 1993
SGML '93, Boston, 6-9 December 1993
SGML Open Technical Committee meeting, Boston, 10 December 1993
C. M. Sperberg-McQueen
December 1993
This trip report must be too brief to do justice to the
meetings I attended, but perhaps I can record some points of interest.
The International Conference on Building and Sharing of Very
Large-Scale Knowledge Bases '93 was organized by the Japan Information
Processing Development Center, with the avowed hope that it will be followed by
other similar conferences held elsewhere. Some speculation around the conference
held that it was meant as a prelude to a large-scale knowledge base project
intended to be a successor to the Fifth Generation and Electronic Dictionary
Research projects, a connection subtly conveyed by the keynote addresses, given
by Kazuhiro Fuchi of the Fifth Generation Project and Toshio Yokoi of the
EDR.
Most notable to me, in Yokoi's keynote address, was the
emphasis on problems like "Making information machine-readable" and "Making
information hyperstructured", for which SGML appears to offer an important tool
for future work. I was also interested by the stress on reusability and
modifiability of the knowledge representation methods used in building
large-scale knowledge bases, since many of the concerns expressed mirror those
of the TEI. As the conference wore on, I began to realize that for this
audience, at least, the simplest way to describe the TEI's product is as a 'text
representation language', analogous to a 'knowledge representation language.'
(This notion then invites speculation on how the SGML and TEI communities can
develop better languages for modeling, querying, and manipulation of texts.)
During the first morning, Norio Fujisawa gave an outline of a
Platonic view of 'knowledge' which provided a high standard of clarity and a
refreshing background for consideration of what everyone else was talking about.
"Plato, however, clearly denies the information stored in a book (or in a
computer) to be in itself true knowledge." Most interesting was
Fujisawa's objection to the usual subject-predicate, substance-attribute method
of expressing propositional content; his remarks could not help reminding me,
however, of an imaginary language described by Jorge Luis Borges in which no
nouns exist and the only open lexical class is that of verbs, precisely in order
to avoid the use of subject-predicate patterns.
There followed a survey of current language technology, in
which Makoto Nagao gave an informative overview of NLP work, and Susan
Armstrong-Warwick spoke of problems involved in the acquisition and exploitation
of textual resources for NLP. SAW rather stressed the difficulties of acquiring
permissions, and downplayed the fundamental problems of data representation,
which rather disappointed me, but otherwise I liked her talk quite well.
A session on "Sharable Knowledge Sources" allowed Antonio
Zampolli to list the bewildering variety of European projects aimed at producing
same, and Susan Hockey to describe the TEI and its relevance for sharable
materials. In the same session, Douglas Lenat spoke about the Cyc project,
providing (inter alia) a virtuoso set of reasons for not publishing a lot of
papers about a project, including the unanswerable one that getting the project
done is more important than publishing papers about it. This endeared him to my
heart enough that I was able to forgive him when he, too, radically understated
the amount of information and sophistication present in properly done
representations of textual material.
The concluding panel included a great deal of wise speculation
about the future of knowledge bases, which I cannot summarize if I am to get
this report out today. Its most memorable moment, for me, came during the
question period, when Hisao Yamada (of the National Center for Science
Information Systems of Japan) said everything everyone had said seemed to be
floating in the air, because all the techniques they were talking about were
suitable for Western languages, but not for scripts which use Chinese
characters. Knowledge representation languages, SGML, and the TEI were fine for
alphabets, but had not addressed the problem of writing in kana, let alone
kanji. I had risen to reply, but the chair recognized Prof. Yokoi, who spoke at
some length about the fact that the TEI had in fact addressed the character set
issue head on, thanks in large part to the work of Prof. Tutiya, and that
continued collaboration with the TEI was an obvious desideratum for Japanese
work in document and natural language processing, for which funding was being
actively sought. Having nothing to add (and indeed not wanting to spoil the
moment), I sat back down without speaking.
During the conference, several members of the steering
committee met with several representatives of the Japanese research
establishment; a memorandum of the discussion at this meeting will be
distributed separately.
The next two days were occupied by a workshop on the same
topic as the conference, but somewhat less formal and limited to 60 people,
instead of the 450-odd who had been attending the conference. There were a
number of good talks, as well as some rather disappointing ones, but the most
important results of the meeting for me lay in the personal contact made. I
believe that at least Tim Finin of Univ. Maryland/Baltimore County, who is
working on a language for distributed knowledge querying, now has a stronger
interest in SGML and the TEI as a basis for text representation, which should
fill a gaping hole in the language he is designing as it now exists. Also
notable was the interest in SGML from the database people, including separate
invitations from Joachim Schmidt of Hamburg and A. Desai Narasimhalu of
Singapore to consider collaborating with them, and/or using their software
systems as a basis for work with SGML. Schmidt is developing an object-oriented
dbms for objects of a polymorphic type system with variable persistence;
Narasimhalu and his colleagues have already developed an SQL-based Document
Query Language (which was presented at SGML '93) and, in conjunction with
Fujitsu, an SGML system built on a dbms foundation.
On the final morning of the workshop, Syun Tutiya arranged for
me to have breakfast with several representatives of East Asian countries,
including China, Korea, Malaysia, and Thailand. We exchanged compliments and
expressions of mutual interest, and I explained a bit about TEI character set
handling, stressing that the definition of TEI conformance has no component
governing character set usage, so that TEI conformance is possible no matter
what character set mechanisms one is using; this seemed to reassure the Thais in
particular. ST expressed some interest in getting people from other countries in
East Asia to attend a TEI workshop in Asia sometime in 1994, and even suggested
that perhaps it should be held elsewhere in Asia, rather than in Japan. Our
colleague in Bangkok (Vilas Wuwongse of the Asian Institute of Technology) later
privately offered assistance with organizing such a workshop in Bangkok.
From Tokyo I went directly to Boston, where the Graphic
Communications Association sponsored its annual SGML conference, and where Lou
Burnard and I made a TEI sandwich of the meeting by giving, respectively, the
opening and the closing keynote addresses. Lou managed, by focusing on the
accomplishments of the TEI in using SGML, to clarify the direct relevance of
the TEI to other SGML projects in a way that previous talks on the TEI had not
succeeded in doing; focusing as they often have done on the peculiarly difficult problems
posed by some older texts, some of our earlier presentations clearly left some
people with the idea that the TEI was all about medieval manuscripts, and had
nothing to do with problems like modularity of DTDs, class systems for SGML
elements, version control, and the like.
As usual, the conference was full of interesting talks and
attended by many interesting people; probably the most important items to note
in this report are these:
- Charles Goldfarb and Erik Naggum announced the release, in the first
quarter of 1994, of a C++ library of SGML parsing routines and a 'portable
object-oriented entity manager' (POEM), implemented by a consortium going
under the name of Project YAO. This library should make it easier to embed
SGML awareness in processors other than SGML parsers, and POEM should make it
easier to use external entities other than files in SGML documents.
- A talk by Dave Sklar of Electronic Book Technologies, and a panel of
implementers, on the subject of SGML transformation engines, showed quite
clearly that the problem of text manipulation is receiving a good deal of
attention. The sample problems, and their solutions using the SGML Hammer
(Avalanche), OmniMark (Exoterica), Balise and Polypus (AIS/Berger-Levrault),
TagWrite (Zandar), and CoST (the Copenhagen SGML Tool, a non-commercial tool
written by Claus Harbo, rather in the style of Lou Burnard's Spitbol-based tf
filter program) should all be posted on comp.text.sgml, and may be collected
and published in a journal of some kind.
- WordPerfect appeared for the first time among the vendors; alas, I never
did see the demo. But they were there.
- Several vendors showed database-oriented projects, most (but I think not
all) based on an underlying database technology (rather than on the left-right
scanning characteristic of most existing SGML products).
The day after the meeting concluded, the vendor consortium
SGML Open held a full day of meetings, which Lou and I attended on behalf of the
TEI. According to Yuri Rubinsky, the TEI will be an affiliated organization,
which gives us the same rights and privileges as a corporate associate member
(i.e. somewhat less than the rights of a corporate sponsoring member, and much
more than a simple subscriber). Notably, we will have the ability to participate
as members in the technical committees of SGML Open, though not the ability to
vote on the adoption of committee reports by SGML Open. I was impressed into
service on a working committee to address character set issues, but managed to
persuade Steve Edwards of Recording for the Blind to chair the committee, and to
persuade the committee to accept chapters CH and WD as representing the TEI
position on character sets.
C. M. Sperberg-McQueen