ML W3 C MSM - Comments on draft section of 28 April on attributes. -MSM
As far as I can tell at a glance, they apply equally to the draft
(numbered MLW4) of 8 May 1989.


If I understand the draft correctly, it advances six arguments against
allowing attributes:

1.  The use of multiple notations is more complex than the use of single
notations with the same expressive power and should therefore be banned.

2.  There is no universal unambiguous method for choosing between the
use of attributes and tags to express any given state of affairs.

3.  Criteria relevant to any distinction between attributes and tags
(e.g. "content" vs. "non-content") depend upon point of view.

4.  Tags are a more powerful notation than attributes and are therefore
to be preferred.

5.  The checking of attribute values or text element content for
validity belongs in the application.

6.  The attribute mechanism of SGML is flawed (by being too weak).

-------

These arguments aren't quite convincing to me, even where I see some
true point being made in them.

1.  The use of multiple notations is more complex than the use of single
notations with the same expressive power and should therefore be banned.

This does not seem to me an argument in favor of forbidding the use of
attributes in our DTDs.  By the same reasoning we could argue that in
sentential logic the Boolean operators IF-THEN and IF-AND-ONLY-IF should
be forbidden since they can be re-expressed using AND, OR, and NOT --
and then that AND, OR, and NOT should be outlawed in favor of the
Sheffer stroke (meaning NOT BOTH), since having multiple operators is
more complex than having a single operator.  We could also outlaw the
compound assignment operators (+=, -=, and so on) in C, eliminate the
SWITCH statement in C and the CASE statement in Pascal, write all
software in assembler (because it is easier to parse) and return to the
use of Latin for all learned publications, since we could thus simplify
our languages without reducing expressive power.

The premise that DTDs are simpler without attributes seems shaky, too,
even leaving aside the question "simpler for whom?"  We can use tags
instead of attributes by using our attribute names as tag names instead
and defining an appropriate content model.  But is a content model like
((a?) & (b?) & (c?) & (d?) & (e?)) really easier to handle than a set
of five attributes a-e?

Example (1):  use of tags as substitutes for attributes.

    <!ELEMENT sampletag - - ((a?) & (b?) & (c?) & (d?) & (e?)) >
    <!ELEMENT a         - - ( #PCDATA ) >
    <!ELEMENT b         - - ( #PCDATA ) >
    <!ELEMENT c         - - ( #PCDATA ) >
    <!ELEMENT d         - - ( #PCDATA ) >
    <!ELEMENT e         - - ( #PCDATA ) >

Example (2):  use of attributes.

    <!ELEMENT sampletag - - (#PCDATA) >
    <!ATTLIST sampletag a CDATA #IMPLIED
                        b CDATA #IMPLIED
                        c CDATA #IMPLIED
                        d CDATA #IMPLIED
                        e CDATA #IMPLIED >
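
To make the comparison concrete, here is a sketch of how one instance
might look under each declaration.  The content values ("one", "three",
"some text") are invented for illustration and come from no draft.

    <!-- Under Example (1): the values are sub-elements -->
    <sampletag><a>one</a><c>three</c></sampletag>

    <!-- Under Example (2): the values are attributes -->
    <sampletag a="one" c="three">some text</sampletag>

Note that under Example (1) the element has no character content of its
own left over; its content model is taken up entirely by a-e.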

2.  There is no universal unambiguous method for choosing between the
use of attributes and tags to express any given state of affairs.

This observation is a corollary of the fact that the two notations
have (roughly) the same expressive power.  It is true.  But I don't see
how one can infer from it the conclusion "therefore attribute notation
should be forbidden".  The strongest conclusion I can derive from it is
"Therefore, the choice must be made on non-universal and possibly
subjective grounds" -- i.e. the choice will be a design decision for
the working committees.

3.  Criteria relevant to any distinction between attributes and tags
(e.g. "content" vs. "non-content") depend upon point of view.

I agree that points of view will vary, and agree further that two
points of view may diverge precisely over whether something is a
natural attribute or a natural component.  That seems to me an argument
in favor of letting those who are devising the tag sets decide, and not
an argument in favor of forbidding them to express in the notation the
point of view they find most attractive.

4.  Tags are a more powerful notation than attributes and are therefore
to be preferred.

The term "powerful" is undefined here.  I believe it must mean that
all DTDs with attributes can be translated into (almost) functionally
equivalent DTDs without attributes, but not vice versa; if so, then the
premise is true.  But the conclusion does not follow immediately:  why
prefer the more "powerful" notation and forbid the other?  Assembler is
more powerful in this sense than high-level languages, and use of the
GOTO is more powerful than structured programming.  (It is
straightforward to rewrite structured code using GOTO, but not all
programs using GOTO can readily be rewritten without it.  At most,
structured constructs are as powerful as GOTO, and a less economical
notation.)  All combinations of context-free grammars and regular
expressions can be rewritten as context-free grammars, but not vice
versa.  Should people refrain from using regular expressions?

5.  The checking of attribute values or text element content for
validity belongs in the application.

I am not sure what to make of this argument.  What we are writing *is*
an SGML application, is it not?  If what is meant is that the checking
of data validity belongs in the semantic actions and not in the parser,
I do not understand the reasoning, or if I do, then I am inclined to
disagree.  Constraints on data values belong where they can be most
elegantly expressed and most conveniently checked--where their
expression sheds the most light on the nature of the textual structures
being marked.  I believe we should prefer the judgements of the
designers of the tag sets to any a priori claim of this kind.  Myself, I
am inclined to believe that if data validity can be checked during
parsing, it should be.  But I'm unwilling to try to make that a general
rule, any more than the opposite rule that it should never be.
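
As a rough sketch of what checking during parsing can look like:  when
an attribute is declared with a name token group or with NUMBER instead
of CDATA, a validating SGML parser itself reports values that do not
fit.  The declaration below is a hypothetical variant of Example (2),
and the values are invented for illustration.

    <!ELEMENT sampletag - - (#PCDATA) >
    <!ATTLIST sampletag a (yes | no) #IMPLIED
                        b NUMBER     #IMPLIED >

    <!-- Conforms to the declaration -->
    <sampletag a="yes" b="42">some text</sampletag>

    <!-- Reported as an error by the parser: "maybe" is not in the
         group (yes | no), and "x" is not a NUMBER -->
    <sampletag a="maybe" b="x">some text</sampletag>

Anything finer than this (a range of numbers, say) would still have to
be checked outside the parser.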

6.  The attribute mechanism of SGML is flawed (by being too weak).

This is an argument for extending the attribute facility of SGML but not
for forbidding its use where the existing mechanism is adequate.  We can
surmise, from the limitations of the attribute datatype checking, that
attributes will not always be an attractive option for the other working
committees; I don't see that those limitations give us any reason to
forbid the use of attributes even in cases where the other working
committees do find them useful.

-------

General:  The common theme of the arguments appears to be that
attributes are not necessary in any strict sense.  The lack of
necessity, however, does not constitute a reason for forbidding them.

Often I see some truth in SM's arguments, and can sympathize with them,
without feeling that they should be controlling considerations.  It
isn't clear to me whether my arguments make even that amount of sense to
her and the rest of you.

I believe that attributes provide a superior notation for some textual
features and simplify processing more than they complicate it.  I am not
convinced that they complicate processing appreciably at all.  In any
case I believe that the utility of attributes to the user (especially in
allowing a distinction between constituents and features) should weigh
more heavily than convenience in making parsers or processors.  This is
especially true since I believe most users of the TEI guidelines will be
using full SGML processors rather than writing processors from scratch,
and full SGML processors already support attributes.

I also believe that attributes make the analysis of document types
and the preparation of DTDs easier and more reliable, not harder and
more complicated, in the same way that sentential logic is easier to
do with AND, OR, IF-THEN and NOT than with the Sheffer stroke alone.
If I am wrong in believing this, I believe the working committees can
be trusted to avoid attributes as needless complications.

In sum, I see no reason to forbid attributes, and one very good one
(elegance of notation) to allow them.

P.S.  About the combinatorial explosion.  SM is correct that the method
she describes on 28 April can control the combinatorial explosion of
tags entailed by the method for eliminating attributes which had been
described in document MLW2, the second draft of the syntax document.
This I pointed out in the preface to my comments of 23 March.  I did not
bother to rehearse yet again my arguments against this approach, since I
had done so already on February 9.  (In brief:  number and tense are
features, not component parts, of a verb, and to represent them as
components is to distort instead of displaying clearly the nature of the
data.)
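
A sketch of the contrast I have in mind, with hypothetical element and
attribute names (they are taken neither from MLW2 nor from my earlier
comments):

    <!-- Number and tense represented as components of the verb -->
    <verb><number>sing</number><tense>past</tense><form>sang</form></verb>

    <!-- Number and tense represented as features of the verb -->
    <verb number="sing" tense="past">sang</verb>

In the first form the content of the verb element is no longer the word
itself but a bundle of parts, of which the word is only one.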