Date: 22 December 1987 16:14:23 CST
From: "Michael Sperberg-McQueen"
To: "Robert A. Amsler"
cc: "Nancy Ide" 914 452-7000 x2478
Subject: Quick reactions to your comments

Thank you for your comments on the minutes. I will save any fuller response for a general discussion among the steering committee, but wanted to respond to just a couple of points. In some cases your comments coincide with points made (sometimes at length) in the discussion in Pisa, but which were not mentioned in the draft minutes I circulated; in a couple I believe I agree with your premises but not with your conclusions.

1 You observe that many computational linguists are more interested in contemporary than in historical texts, in utilitarian rather than high-culture texts. I confess to finding this disheartening, since I believe exclusion of literary texts leads inexorably to a distorted view of language and language use -- but I will accept your description as a statement of fact about many computational linguists. (But tell me: are historical linguists excluded from the term 'computational linguistics' by definition? The historical linguists I have known all dealt perforce with 'literary' texts among others.)

But your conclusion ("there might be more interest in the dictionary standard than the rest of the humanities standard") does not seem to follow. Are student essays structurally so different from Ruskin's essays that a tag set developed for one is irrelevant to the other? I think the issues of method raised by 'literary' texts are likely to recur for 'non-literary' texts; to be sure, there will be some divergence in the tag sets for different document types, but that will be the case no matter what. As I said a few weeks ago on Humanist: we can all go our separate ways so as to ensure that the various tagging schemes we use are incompatible and we cannot easily use each other's texts, or we can work to arrive at some consensus.
It will not surprise me if interest in the various problems of the guidelines is unevenly distributed among the members of the various societies. But I rather hope that the computational linguists will recognize that their interests here really do coincide with those of the larger text-analysis community, and that they (and ACL) will be willing to participate actively in the work as a whole even if their main interest is in dictionary formats.

2 "Interest" in participating ^= "funding". Very true, and noted.

3 Checking the Chicago Manual of Style for other topics to entrust to committee 2. Excellent idea.

4 About spoken texts and reference works. The crucial task in delineating the tasks of committees 2 and 3 was not to achieve logical consistency or elegant analysis, but only to get a line drawn which, pragmatically, enables the work to proceed. Within the domain of written texts, the distinction 'objective'/'subjective' is clearly hopeless. Even if it can be drawn, it would leave most of the hard work to committee 3. The distinction we did draw, based on what is conventionally represented on the graphetic level, is flexible, easier to apply than the other, and provides a better balance between the loads of committees 2 and 3.

The *special* problems presented by transcripts of spoken language clearly belong, by this rule, in committee 3: what can be said to be conventionally represented by the typesetting is by definition not a special problem of spoken texts. Pragmatically, too: the problems associated with spoken texts connect intimately with more general problems of linguistic analysis, discourse analysis, phonology, and so on. They connect not at all with most of the problems of committee 2.

Dictionaries and encyclopedias were given to committee 3 on purely pragmatic grounds: their special problems arise because they are of particular interest and use in certain kinds of *analysis*.
Whereas many scholars will be perfectly content with a Shakespeare (for example) encoded only with tags from committee 2, most people interested in dictionaries and encyclopedias are also interested in things like parsing, knowledge representation, and so on. That means whoever works on dictionaries and encyclopedias will have to cooperate very closely with committee 3 -- whereas they will need less close contact with committee 2.

Above all, please do not encourage the notion that assignment to the province of committee 2 is somehow an honorific, assignment to committee 3 a pejorative, classification of a text type.

5 Committee 3 clearly does have several different tasks. You mention this, which seems to imply a certain skepticism, but