1. The Problem of Theoretical Diversity
Textual features, their attributes, and their structural relations
cannot be postulated in a conceptual vacuum. They require some
theoretical basis. In many fields (e.g. classical metrics) scholars
agree on many or all of the pertinent features and their
characteristics; in others, several divergent theories posit different
sets of textual features; in still others, scholars disagree without the
theoretical bases of the disagreement becoming visible.
In the extreme cases, theoretical diversity poses no problems for the
working committees: on the one hand, a clear scholarly consensus can
readily be translated into a single list of textual features, while on
the other a lack of theoretical clarity will make it virtually
impossible to elicit any consensus as to the textual features at stake,
and no tag set can be developed at all.
The middle case, however, presents the working committees with a
delicate problem.
2. Harmonization of Theoretical Conflict
At one extreme, tags might be provided only to represent the features
posited by a given theory, without any consideration of their
relation to similar features used by other theories. At the other
extreme, a tag set might express a consensus among representatives of
various theories and provide a “theory-neutral” or
“poly-theoretical” notation for the expression of analytic results.
Such harmonization or resolution of theoretical diversity takes place
over a set of “systems.” The universe of systems to be considered
comprises:
- different theories current in the field
- different practices current in the field
- schemes used in the various corpora relevant to the field
Six levels of theoretical harmonization can be specified; work may, but
need not, progress through the six levels sequentially. In developing a
tag set to encode the analytic results of a field, a number of
possibilities exist:
Choice of a single theory:
provide tags for a single system, ignoring the others in the field.
Pluralism (informal):
elicit, for each system, a full description of the system (as it applies
to text encoding), including
- list of features
- examples of each feature
- structural properties of each feature (especially combinations
  with other features and the like)
- test criteria for the recognition of the feature
- formalizations available for the system
This will be possible for different theories in different measure.
Pluralism (formal):
generate an SGML formalization for the feature set of each system.
If SGML syntax does not suffice, the metalanguage committee must be
asked to consider or develop extensions to handle the recalcitrant
features.
Note that at this low level of formalization, different theories will
have separate and incongruent tag sets. The meaning of any tag will
be defined only in natural language, and users of different theoretical
orientations will be responsible for any translation into their own
terms. The same generic identifier might be used for textual features
postulated by different theories, and thus be ambiguous when viewed in
isolation. The two following approaches handle this ambiguity
differently.
Eclecticism:
define a single tag set created by the union of all system-specific
tag sets, eliminating ambiguity by giving each theoretically distinct
textual feature a unique generic identifier (“tag”). Users of the
scheme will be expected to tag some subset of the features in the set,
mixing and matching as they wish.
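
The eclectic union can be illustrated with a minimal Python sketch. The system names, the tag names, and the `system.tag` renaming convention are all hypothetical; the text does not prescribe how unique identifiers are to be formed. The sketch also assumes that an identifier shared by two systems always denotes theoretically distinct features, which in practice would have to be decided case by case.

```python
from collections import defaultdict

def eclectic_union(systems):
    """Merge per-system tag sets into one set of unique generic identifiers.

    `systems` maps a system name to the set of generic identifiers it uses.
    An identifier claimed by more than one system is treated as ambiguous
    and qualified with the system name (a hypothetical convention).
    """
    # Record which systems claim each generic identifier.
    owners = defaultdict(set)
    for system, tags in systems.items():
        for tag in tags:
            owners[tag].add(system)

    merged = set()
    for tag, claimed_by in owners.items():
        if len(claimed_by) == 1:
            merged.add(tag)  # unambiguous: keep the identifier as is
        else:
            # ambiguous: one distinct identifier per claiming system
            merged.update(f"{system}.{tag}" for system in claimed_by)
    return merged

# Hypothetical systems that both use "phrase", in different senses.
systems = {
    "theoryA": {"phrase", "clause"},
    "theoryB": {"phrase", "foot"},
}
unified = eclectic_union(systems)
```

Here `unified` holds four identifiers: `clause` and `foot` unchanged, and `phrase` split into `theoryA.phrase` and `theoryB.phrase`.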
Controlled semantics:
eliminate ambiguity of generic identifiers by the explicit formal
definition of the linguistic and computational meaning of the tags.
Different usages of the same term must at this level be reduced to a
finite list of questions with enumerable sets of possible answers.
Ultimately, of course, the meaning of these questions and their answers
will be expressed in natural languages, so that this cannot amount to a
full specification of meaning. The formal system, however, will be
constrained more fully by these formal definitions than at the lower
levels of conflict resolution.
Now, for any given finite list of questions, with well-defined ranges
for each answer, a set of SGML tags and attributes can be created which
specifies answers to each question. An SGML processor could then
ascertain that answers are specified for each question (although it
could not necessarily check the consistency of the answers with the
actual practice in the encoding).
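
The kind of check described here can be sketched in Python rather than SGML. The questions and answer ranges below are hypothetical examples, not drawn from any actual tag set. Like the SGML processor described above, the function verifies only that each question has an answer within its declared range; it cannot tell whether the answer fits the text being encoded.

```python
# Hypothetical questions, each with an enumerable set of possible answers.
QUESTIONS = {
    "headed": {"yes", "no"},
    "category": {"nominal", "verbal", "other"},
}

def check_answers(attributes, questions=QUESTIONS):
    """Return a list of problems: unanswered questions or out-of-range answers."""
    problems = []
    for question, allowed in questions.items():
        if question not in attributes:
            problems.append(f"missing answer: {question}")
        elif attributes[question] not in allowed:
            problems.append(
                f"invalid answer for {question}: {attributes[question]}"
            )
    return problems

# A fully specified element and an under-specified one.
complete = check_answers({"headed": "yes", "category": "nominal"})
incomplete = check_answers({"headed": "maybe"})
```

The first call reports no problems; the second reports an out-of-range answer for `headed` and a missing answer for `category`.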
Polytheoretical consensus:
provide the smallest possible single set of features which includes as a
subset each set of features used by an existing system, or else define
explicit mappings from the tag sets provided for one system into the tag
sets provided for another. Unlike an eclectic tag/feature set, a
polytheoretical set avoids all redundancy and is expected to be used
as a unit, with all features tagged, so that a text tagged with such a
set is useful to researchers of widely varying theoretical persuasions.
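
The alternative of explicit mappings between tag sets might look like the following Python sketch. The two systems and all tag names are invented for illustration. Tags without a counterpart are reported rather than silently dropped, since they mark the points at which the two systems have not yet been reconciled.

```python
# Hypothetical mapping from the tags of system A into those of system B.
A_TO_B = {
    "np": "nominal-group",
    "vp": "verbal-group",
}

def translate(tags, mapping):
    """Translate a sequence of tags; also return the tags with no mapping."""
    translated, unmapped = [], []
    for tag in tags:
        if tag in mapping:
            translated.append(mapping[tag])
        else:
            unmapped.append(tag)
    return translated, unmapped

# "pp" has no counterpart in the hypothetical mapping above.
translated, unmapped = translate(["np", "vp", "pp"], A_TO_B)
```

An empty `unmapped` list for every text in a corpus would suggest that, for that corpus at least, the mapping from one system into the other is complete.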
Each working committee must decide on the basis of its own knowledge how
much theoretical harmonization is possible in a given area. The working
committees should strive for the highest level of harmonization they
believe feasible, given the theoretical climate and the resources at
hand.