From LISTSERV@LISTSERV.UIC.EDU Wed Sep 1 17:43:19 1999
Date: Wed, 1 Sep 1999 11:21:27 -0500
From: "L-Soft list server at University of Illinois at Chicago (1.8c)"
To: Lou Burnard
Subject: File: "EDW34 DOC"

Trip Report: Modern Language Association
ED W34 (check document number)
C. M. Sperberg-McQueen
30 December 1992

The Modern Language Association convention, held annually from the evening of December 27 through midday on December 30, took place this year in New York City; arriving too late for some sessions I would have liked to attend, and leaving too early for others, I was able to hear the talks at only two sessions relevant to computing in the humanities.

First was the panel discussion on Electronic Archives, chaired by James Sosnowski. Here, I was called upon to report, in place of Nancy Ide (who could not make it to New York, being confined by a hostile fate to the south of France), on the progress of the Text Encoding Initiative. After a rapid review of the TEI's form and goals, I discussed the Babel-like confusion of our markup languages and distinguished the semantic divergence of languages, with which the TEI can help only within narrow limits, from their formal and syntactic divergence, for which the TEI or any standard syntax can provide a more complete cure. Finally, I described some preconditions for a successful growth in the importance and utility of electronic archives, stressing the need for greater commitment to the reuse of electronic resources, the importance of firmly separating data formats from the software which manipulates them, the opportunities for software development, and the potential benefits of agreement on a common language for text interchange (for which the TEI should be a candidate too obvious, I hoped, to need mention).
Susan Hockey followed with a lucid description of the Center for Electronic Texts in the Humanities (CETH) and the ways she hopes to work to make texts more available and more useful: by creating e-texts, by cataloguing them, by working on storage and retrieval mechanisms and mechanisms for access to texts both over networks and for non-networked users, and by helping to encourage standard procedures for handling e-texts in library contexts. She noted that even the most sophisticated markup will not fully replace the visual image of the source text and proposed that in the long run texts should be stored both in TEI-conformant SGML transcriptions and in digital images of the source text.

Reviewing current and future activities at CETH, she reported that the Rutgers Inventory of Machine-Readable Texts in the Humanities has provided a useful, if partial, census of electronic texts, as well as clarifying hitherto obscure issues in the cataloguing of electronic texts (as opposed to survey data or other social science data sets). At present, MRTH information shows us that about 25% of electronic texts have been created by, and are available from, individuals or small projects; another 70% are maintained by large institutions such as those at Nancy in France or Pisa in Italy. Many of these large institutes are at work on historical dictionaries for which their collections are to provide citations, and many are now expanding their work into corpus linguistics. Only 5% of extant e-texts, by contrast, are in libraries; that number will grow. To make these texts usable, however, will require improvements in retrieval software, which has not, SH lamented, moved substantially forward in thirty years: retrieval software is still largely limited to searching for strings in the text, even though for many applications what are required instead are lexical items, or concepts.
Word-class annotation, sense discrimination, and other analysis will be necessary to allow searches to distinguish between, for example, 'lead' the metal, 'lead' (a leash), and 'lead' the verb. To speed such endeavors we must plan for further work with electronic dictionaries and work to build lexical knowledge bases. To achieve better access for readers and researchers will require standard formats (i.e. the completion of the TEI) and a lot of further thinking about methods of network distribution.

Ian Lancashire spoke next, on "The Public-Domain Shakespeare," in which he reviewed the various versions of Shakespeare available in electronic form, including numerous versions of the Folio and early Quarto texts --- which, however, include only 64 texts of the canon, leaving 133 early printings yet un-electrified. He drew attention to the commercially available texts (some based on well known reputable editions, some not) as well as the many texts available from the Oxford Text Archive, and dwelt for some time on the various methods adopted by encoders in handling such things as textual variation, archaisms of spelling, and corruption in the text.

Many available texts contain editorial emendations, which IL curiously contrasted with "Shakespeare" pure and simple, to the detriment of the emenders. The opposition struck me as very odd, since "Shakespeare" pure and simple is precisely what the editors, by emending, are claiming not to be represented by the early printings. Unless IL wished to claim that the early printings of Shakespeare all represent the author's ipsissima verba, it seems dangerously misleading to refer to the early printings as "Shakespeare" and to attempts to correct their defects as merely the work of "editors". Phrased differently (the e-texts contain editorial interventions instead of simple reproductions of the sometimes dubious early printings), the point is valid and important but ceases to prejudice the case against the editors.
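Hockey's complaint about retrieval software can be made concrete with a small sketch (a toy illustration only, not any of the systems discussed at the session; the tagged corpus and the part-of-speech labels are invented for the example): plain string matching finds every occurrence of 'lead' indiscriminately, whereas even a crude layer of word-class annotation lets a query ask for the noun or the verb alone.

```python
# Toy illustration of string search versus word-class-annotated search.
# The tags ("DET", "NOUN", "VERB") are hypothetical labels chosen for
# the example, not any particular tagset.

def string_search(text, target):
    """Plain string matching: return the start offset of every match."""
    hits, start = [], 0
    while (i := text.find(target, start)) != -1:
        hits.append(i)
        start = i + 1
    return hits

# A toy word-class-annotated corpus: (word, part-of-speech) pairs.
tagged = [
    ("the", "DET"), ("lead", "NOUN"), ("pipe", "NOUN"),
    ("dogs", "NOUN"), ("lead", "VERB"), ("the", "DET"), ("way", "NOUN"),
]

def tagged_search(corpus, word, pos):
    """Search restricted by word class: return token positions."""
    return [i for i, (w, t) in enumerate(corpus) if w == word and t == pos]

text = "the lead pipe; dogs lead the way"
print(string_search(text, "lead"))            # every "lead", whatever its sense
print(tagged_search(tagged, "lead", "VERB"))  # only the positions where it is a verb
```

The string search cannot tell the metal from the verb; the annotated search can, which is exactly the gap between "searching for strings" and searching for "lexical items, or concepts" that SH lamented.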
IL's talk was supplemented by a handout, of which I was unable to secure a copy. This in itself signaled a happy turn of events: though he came prepared with a number more than ample for the usual turnout at sessions on electronic text, IL rapidly ran out of handouts in a crowd of over fifty people. The talk provided a usefully concrete supplement to the broad, sometimes vague, generalities of the other speakers.

The final speaker of the panel was Elaine Brennan, the assistant director of the Brown University Women Writers Project, who suffered, she claimed, the fate of so many fourth speakers in finding her major points already raised by previous panelists. She began by briefly describing the beginnings of the WWP as an attempt to create a (print) anthology of extracts from early women writers in English, which moved to full-text transcription when it became clear that most of the audience wanted not snippets but entire texts from which to make their own selections. Wistfully, she recalled the slightly later but still very early notion that the WWP could create electronic versions of everything written by women in English before 1830. "Not in my lifetime," she said, "but I'd like to make a good start at it."

The amount of material falling within the WWP's area of interest, and the degree to which it is still terra incognita, pose a problem for the WWP. How can we provide access to a large number of writers now almost wholly unknown? It is not enough simply to put them all on a disk, without guidance for a reader, without software to help browse or troll for texts and passages of thematic, stylistic, historical, or other interest. By contrast with Shakespeare, well enough known and thoroughly enough studied to be useful in electronic form even without special finding aids, women's writing --- like any 'new' literature --- poses sharp challenges for indexing and retrieval software.
The volume of material also makes conventional publication impossible: the book market could not possibly absorb print editions of all these writers in any reasonably short period; most thus have no immediate commercial potential. Network or other electronic publication seems the only plausible solution, but though it can help us evade the problems of market forces, it does nothing to diminish the high costs of data entry, text markup, and proofreading. The WWP and similar projects must thus perform a sort of scholarly triage, trying to find the correct balance of activities to benefit scholarship as a whole. Unfortunately, the software needed to turn the resulting decisions into practice is simply not here yet; EB mentioned browsing, bibliographic control, and text distribution as posing particular problems for which extant software is not yet what it should be. She ended by asking the audience what it wants and needs in the way of access and work with the texts. Whether it is to select twenty-three poems to teach next month in a class on Romanticism, or to download a large verse romance to compare it to Sidney, the best part, she said, is yet to come.

The audience responded to EB's query and the invitation of the chair with a vigorous and useful discussion in which CD-ROM publication, the danger of marginalizing writers by publishing them only electronically, and, once again, the need for more and better software all put in their appearance.

The second session I was able to attend was the first of two organized by the Association for Computers and the Humanities under the common title "Signs, Symbols, Discourses: A New Direction for Computer-Aided Literature Studies." The ACH had invited responses to Mark Olsen's critique, at last year's MLA, of the methodology of computing in the humanities, with these sessions as the result.
The first paper, by Gina Greco and Peter Shoemaker, erected a model of textual variation and intertextuality on the grounds of a "poetics of process," itself grounded in the combination of reproduction and revision characteristic of medieval scribes, glossators, and florilegists. I regretted the over-relaxed lunch which caused me to miss the opening minutes of the paper.

The second paper, by Ellen Spolsky, dismissed Olsen's statistical model as tainted by its statistical basis with the traditional sins of stylo-statistics, which ES felt had been permanently refuted by Stanley Fish in his acerbic critique of stylistics. To replace the statistical model, ES proposed an analogy with a recent theory of human vision developed within cognitive science, which posits two distinct types of visual processing and memory: a "two-and-a-half-D" processing which effectively interprets the retinal image but is not stored permanently, and a "3-D" vision which is not observer-centered but object-centered and which does remain in memory. She suggested, plausibly enough, that like human vision, human writing might be "double-coded", with the different codes fulfilling different functions. She drew attention to the distinction between public and private language, or between dominant discourse and écriture féminine, asking whether the latter was just the difference between publicly negotiated language and personal language. From the ideas of Popper and Gombrich on competing hypotheses she developed a notion of finding stylistic individualism (traits unique to an author) within larger patterns, but the details of the link to Popper and Gombrich escaped me.
The discussion of the two forms of visual processing was fascinating, but since ES declined to hazard any guess even at how many distinct "codes" might be active in writing, and left completely unspecified what she meant by "code" in that context, the analogy to vision seemed to remain a mere analogy, rather than a model of stylistics with binding consequences. ES clearly has thought deeply about stylistics, and about modern critical theory; her paper almost made me regret my inability to follow her argument in the passages where the technical terminology and jargon were clustering most densely.

Mark Olsen, in his rejoinder, acknowledged the justice of his critics' points, but pointed out some difficulties arising from their counter-proposals, notably the difficulty of distinguishing between "public" and "private" discourse in practice; ES replied with a clear but irreproducible test: when you don't understand what the other person is saying, they are [may be?] using private language.

The book exhibits were, as usual nowadays, relatively poor in computer-related displays. (Gone are the days when IBM took three booths at MLA to show new hardware and software!) But Chadwyck-Healey were showing both the first installment of their English Poetry database and a pre-release version of the Patrologia Latina, both using a much-modified version of Dynatext as the user interface. The screen was crowded with icons, but after a couple of minutes I got the hang of it and was able to do searches on my own. I was the first visitor, however, to ask whether the SGML-tagged form of the text could be displayed on the screen; the answer is apparently "no," and to inspect the markup we had to print the SGML version to disk and read the file. Among the brochures on Chadwyck-Healey's booth was one for a Database of African-American Poetry, 1760-1900, on CD-ROM, which seems to be modeled very closely on the methods used for the English Poetry product.
The corpus is defined by an existing bibliography of the subject (in this case William French's Afro-American Poetry and Drama 1760-1975) and the markup will once more be TEI-influenced SGML. (Price: $3400, $290 pre-publication.)

Another brochure caused a little flurry for me: Spanish Women Writers 1500-1900. But it turns out to be a microfiche project, not an electronic one.

I went to the Oxford University Press booth in hopes of seeing the OED CD-ROM, having seen (illegitimately, I heard later) a pre-release beta version last year, but the only computer in sight had only a fragment of Wuthering Heights on the screen, in Word Perfect (with percent signs around the italics? can one believe one's eyes?), and posters for Post-Modern Culture, which OUP is now selling on diskette, presumably to those with no Internet access and to libraries who cannot bring themselves to acquire something for their collections unless they pay someone for it. No sign of the OED, sorry to say. (I am told there were flyers for it, but I didn't see them.)

Because I was there so briefly, I did not go through all the exhibits carefully and thus cannot report on what OCLC has done with the MLA bibliography (I thought about it, but decided my heart might not be strong enough to stand the expected shock) or other similar items of potential interest. I did however manage to get to the McGraw Hill technical bookstore and buy some books to read on the plane home.