From LISTSERV@LISTSERV.UIC.EDU Wed Sep 1 17:43:19 1999
Date: Wed, 1 Sep 1999 11:21:27 -0500
From: "L-Soft list server at University of Illinois at Chicago (1.8c)"
To: Lou Burnard
Subject: File: "EDW34 DOC"

Trip Report: Modern Language Association
ED W34 (check document number)
C. M. Sperberg-McQueen
30 December 1992

The Modern Language Association convention, held annually from the evening of December 27 through midday on December 30, took place this year in New York City; arriving too late for some sessions I would have liked to attend, and leaving too early for others, I was able to hear the talks at only two sessions relevant to computing in the humanities.

First was the panel discussion on Electronic Archives, chaired by James Sosnowski. Here, I was called upon to report, in place of Nancy Ide (who could not make it to New York, being confined by a hostile fate to the south of France), on the progress of the Text Encoding Initiative. After a rapid review of the TEI's form and goals, I discussed the Babel-like confusion of our markup languages and distinguished the semantic divergence of languages, with which the TEI can help only within narrow limits, from their formal and syntactic divergence, for which the TEI or any standard syntax can provide a more complete cure. Finally, I described some preconditions for a successful growth in the importance and utility of electronic archives, stressing the need for greater commitment to the reuse of electronic resources, the importance of firmly separating data formats from the software which manipulates them, the opportunities for software development, and the potential benefits of agreement on a common language for text interchange (for which the TEI should be a candidate too obvious, I hoped, to need mention).
Susan Hockey followed with a lucid description of the Center for Electronic Texts in the Humanities (CETH) and the ways she hopes to work to make texts more available and more useful: by creating e-texts, by cataloguing them, by working on storage and retrieval mechanisms and mechanisms for access to texts both over networks and for non-networked users, and by helping to encourage standard procedures for handling e-texts in library contexts. She noted that even the most sophisticated markup will not fully replace the visual image of the source text and proposed that in the long run texts should be stored both in TEI-conformant SGML transcriptions and in digital images of the source text.

Reviewing current and future activities at CETH, she reported that the Rutgers Inventory of Machine-Readable Texts in the Humanities has provided a useful, if partial, census of electronic texts, as well as clarifying hitherto obscure issues in the cataloguing of electronic texts (as opposed to survey data or other social science data sets). At present, MRTH information shows us that about 25% of electronic texts have been created by, and are available from, individuals or small projects; another 70% are maintained by large institutions such as those at Nancy in France or Pisa in Italy. Many of these large institutes are at work on historical dictionaries for which their collections are to provide citations, and many are now expanding their work into corpus linguistics. Only 5% of extant e-texts, by contrast, are in libraries; that number will grow. To make these texts usable, however, will require improvements in retrieval software, which has not, SH lamented, moved substantially forward in thirty years: retrieval software is still largely limited to searching for strings in the text, even though for many applications what are required instead are lexical items, or concepts.
Word-class annotation, sense discrimination, and other analysis will be necessary to allow searches to distinguish between, for example, 'lead' the metal, 'lead' (a leash), and 'lead' the verb. To speed such endeavors we must plan for further work with electronic dictionaries and work to build lexical knowledge bases. To achieve better access for readers and researchers will require standard formats (i.e. the completion of the TEI) and a lot of further thinking about methods of network distribution.

Ian Lancashire spoke next, on "The Public-Domain Shakespeare," in which he reviewed the various versions of Shakespeare available in electronic form, including numerous versions of the Folio and early Quarto texts --- which, however, include only 64 texts of the canon, leaving 133 early printings yet un-electrified. He drew attention to the commercially available texts (some based on well known reputable editions, some not) as well as the many texts available from the Oxford Text Archive, and dwelt for some time on the various methods adopted by encoders in handling such things as textual variation, archaisms of spelling, and corruption in the text.

Many available texts contain editorial emendations, which IL curiously contrasted with "Shakespeare" pure and simple, to the detriment of the emenders. The opposition struck me as very odd, since "Shakespeare" pure and simple is precisely what the editors, by emending, are claiming not to be represented by the early printings. Unless IL wished to claim that the early printings of Shakespeare all represent the author's ipsissima verba, it seems dangerously misleading to refer to the early printings as "Shakespeare" and to attempts to correct their defects as merely the work of "editors". Phrased differently (the e-texts contain editorial interventions instead of simple reproductions of the sometimes dubious early printings), the point is valid and important but ceases to prejudice the case against the editors.
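Hockey's complaint about retrieval software can be made concrete with a small sketch (a toy illustration only, not any of the systems discussed at the session; the tagged corpus and the part-of-speech labels are invented for the example): plain string matching finds every occurrence of 'lead' indiscriminately, whereas even a crude layer of word-class annotation lets a query ask for the noun or the verb alone.

```python
# Toy illustration of string search versus word-class-annotated search.
# The tags ("DET", "NOUN", "VERB") are hypothetical labels chosen for
# the example, not any particular tagset.

def string_search(text, target):
    """Plain string matching: return the start offset of every match."""
    hits, start = [], 0
    while (i := text.find(target, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits

# A toy word-class-annotated corpus: (word, part-of-speech) pairs.
tagged = [
    ("the", "DET"), ("lead", "NOUN"), ("pipe", "NOUN"),
    ("dogs", "NOUN"), ("lead", "VERB"), ("the", "DET"), ("way", "NOUN"),
]

def tagged_search(corpus, word, pos):
    """Search restricted by word class: return token positions."""
    return [i for i, (w, t) in enumerate(corpus) if w == word and t == pos]

text = "the lead pipe; dogs lead the way"
print(string_search(text, "lead"))            # every "lead", whatever its sense
print(tagged_search(tagged, "lead", "VERB"))  # only the positions where it is a verb
```

The string search cannot tell the metal from the verb; the annotated search can, which is exactly the gap between "searching for strings" and searching for "lexical items, or concepts" that SH lamented.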
IL's talk was supplemented by a handout, of which I was unable to secure a copy. This in itself signaled a happy turn of events: though he came prepared with a number more than ample for the usual turnout at sessions on electronic text, IL rapidly ran out of handouts in a crowd of over fifty people. The talk provided a usefully concrete supplement to the broad, sometimes vague, generalities of the other speakers.

The final speaker of the panel was Elaine Brennan, the assistant director of the Brown University Women Writers Project, who suffered, she claimed, the fate of so many fourth speakers in finding her major points already raised by previous panelists. She began by briefly describing the beginnings of the WWP as an attempt to create a (print) anthology of extracts from early women writers in English, which moved to full-text transcription when it became clear that most of the audience wanted not snippets but entire texts from which to make their own selections. Wistfully, she recalled the slightly later but still very early notion that the WWP could create electronic versions of everything written by women in English before 1830. "Not in my lifetime," she said, "but I'd like to make a good start at it."

The amount of material falling within the WWP's area of interest, and the degree to which it is still terra incognita, pose a problem for the WWP. How can we provide access to a large number of writers now almost wholly unknown? It is not enough simply to put them all on a disk, without guidance for a reader, without software to help browse or troll for texts and passages of thematic, stylistic, historical, or other interest. By contrast with Shakespeare, well enough known and thoroughly enough studied to be useful in electronic form even without special finding aids, women's writing --- like any 'new' literature --- poses sharp challenges for indexing and retrieval software.
The volume of material also makes conventional publication impossible: the book market could not possibly absorb print editions of all these writers in any reasonably short period; most thus have no immediate commercial potential. Network or other electronic publication seems the only plausible solution, but though it can help us evade the problems of market forces, it does nothing to diminish the high costs of data entry, text markup, and proofreading. The WWP and similar projects must thus perform a sort of scholarly triage, trying to find the correct balance of activities to benefit scholarship as a whole. Unfortunately, the software needed to turn the resulting decisions into practice is simply not here yet; EB mentioned browsing, bibliographic control, and text distribution as posing particular problems for which extant software is not yet what it should be. She ended by asking the audience what it wants and needs in the way of access and work with the texts. Whether it is to select twenty-three poems to teach next month in a class on Romanticism, or to download a large verse romance to compare it to Sidney, the best part, she said, is yet to come.

The audience responded to EB's query and the invitation of the chair with a vigorous and useful discussion in which CD-ROM publication, the danger of marginalizing writers by publishing them only electronically, and, once again, the need for more and better software all put in their appearance.

The second session I was able to attend was the first of two organized by the Association for Computers and the Humanities under the common title "Signs, Symbols, Discourses: A New Direction for Computer-Aided Literature Studies." The ACH had invited responses to Mark Olsen's critique, at last year's MLA, of the methodology of computing in the humanities, with these sessions as the result.
The first paper, by Gina Greco and Peter Shoemaker, erected a model of textual variation and intertextuality on the grounds of a "poetics of process," itself grounded in the combination of reproduction and revision characteristic of medieval scribes, glossators, and florilegists. I regretted the over-relaxed lunch which caused me to miss the opening minutes of the paper.

The second paper, by Ellen Spolsky, dismissed Olsen's statistical model as tainted by its statistical basis with the traditional sins of stylo-statistics, which ES felt had been permanently refuted by Stanley Fish in his acerbic critique of stylistics. To replace the statistical model, ES proposed an analogy with a recent theory of human vision developed within cognitive science, which posits two distinct types of visual processing and memory: a "two-and-a-half-D" processing which effectively interprets the retinal image but is not stored permanently, and a "3-D" vision which is not observer-centered but object-centered and which does remain in memory. She suggested, plausibly enough, that like human vision, human writing might be "double-coded", with the different codes fulfilling different functions. She drew attention to the distinction between public and private language, or between dominant discourse and écriture féminine, asking whether the latter was just the difference between publicly negotiated language and personal language. From the ideas of Popper and Gombrich on competing hypotheses she developed a notion of finding stylistic individualism (traits unique to an author) within larger patterns, but the details of the link to Popper and Gombrich escaped me.
The discussion of the two forms of visual processing was fascinating, but since ES declined to hazard any guess even at how many distinct "codes" might be active in writing, and left completely unspecified what she meant by "code" in that context, the analogy to vision seemed to remain a mere analogy, rather than a model of stylistics with binding consequences. ES clearly has thought deeply about stylistics, and about modern critical theory; her paper almost made me regret my inability to follow her argument in the passages where the technical terminology and jargon were clustering most densely.

Mark Olsen, in his rejoinder, acknowledged the justice of his critics' points, but pointed out some difficulties arising from their counter-proposals, notably the difficulty of distinguishing between "public" and "private" discourse in practice; ES replied with a clear but irreproducible test: when you don't understand what the other person is saying, they are [may be?] using private language.

The book exhibits were, as usual nowadays, relatively poor in computer-related displays. (Gone are the days when IBM took three booths at MLA to show new hardware and software!) But Chadwyck-Healey were showing both the first installment of their English Poetry database and a pre-release version of the Patrologia Latina, both using a much-modified version of Dynatext as the user interface. The screen was crowded with icons, but after a couple of minutes I got the hang of it and was able to do searches on my own. I was the first visitor, however, to ask whether the SGML-tagged form of the text could be displayed on the screen; the answer is apparently "no," and to inspect the markup we had to print the SGML version to disk and read the file. Among the brochures on Chadwyck-Healey's booth was one for a Database of African-American Poetry, 1760-1900, on CD-ROM, which seems to be modeled very closely on the methods used for the English Poetry product.
The corpus is defined by an existing bibliography of the subject (in this case William French's Afro-American Poetry and Drama 1760-1975) and the markup will once more be TEI-influenced SGML. (Price: $3400, $290 pre-publication.)

Another brochure caused a little flurry for me: Spanish Women Writers 1500-1900. But it turns out to be a microfiche project, not an electronic one.

I went to the Oxford University Press booth in hopes of seeing the OED CD-ROM, having seen (illegitimately, I heard later) a pre-release beta version last year, but the only computer in sight had only a fragment of Wuthering Heights on the screen, in Word Perfect (with percent signs around the italics? can one believe one's eyes?), and posters for Post-Modern Culture, which OUP is now selling on diskette, presumably to those with no Internet access and to libraries who cannot bring themselves to acquire something for their collections unless they pay someone for it. No sign of the OED, sorry to say. (I am told there were flyers for it, but I didn't see them.)

Because I was there so briefly, I did not go through all the exhibits carefully and thus cannot report on what OCLC has done with the MLA bibliography (I thought about it, but decided my heart might not be strong enough to stand the expected shock) or other similar items of potential interest. I did however manage to get to the McGraw Hill technical bookstore and buy some books to read on the plane home.