ML W3 C MSM - Comments on draft section of 28 April on attributes. -MSM

As far as I can tell at a glance, they apply equally to the draft (numbered MLW4) of 8 May 1989.

If I understand the draft correctly, it advances six arguments against allowing attributes:

1. The use of multiple notations is more complex than the use of single notations with the same expressive power and should therefore be banned.
2. There is no universal unambiguous method for choosing between the use of attributes and tags to express any given state of affairs.
3. Criteria relevant to any distinction between attributes and tags (e.g. "content" vs. "non-content") depend upon point of view.
4. Tags are a more powerful notation than attributes and are therefore to be preferred.
5. The checking of attribute values or text element content for validity belongs in the application.
6. The attribute mechanism of SGML is flawed (by being too weak).

-------

These arguments aren't quite convincing to me, even where I see some true point being made in them.

1. The use of multiple notations is more complex than the use of single notations with the same expressive power and should therefore be banned.

This does not seem to me an argument in favor of forbidding the use of attributes in our DTDs. By the same argument we can argue that in sentential logic the Boolean operators IF-THEN and IF-AND-ONLY-IF should be forbidden since they can be re-expressed using AND, OR, and NOT -- and then that AND, OR, and NOT should be outlawed in favor of the Sheffer stroke (meaning NOT BOTH), since having multiple operators is more complex than having a single operator. We can also outlaw the assignment operators =+ and =- and so on from C, eliminate the CASE statement in C and Pascal, write all software in assembler (because it is easier to parse) and return to the use of Latin for all learned publications, since we could thus simplify our languages without reducing expressive power.
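The point about the Sheffer stroke can be checked mechanically. The following is only an illustrative sketch in Python (the function names are invented for this illustration and do not come from the draft): it rebuilds NOT, AND, OR, IF-THEN and IF-AND-ONLY-IF from NOT BOTH alone and compares each with the ordinary operator over every combination of truth values.

```python
from itertools import product


def nand(p: bool, q: bool) -> bool:
    """Sheffer stroke: false only when both operands are true (NOT BOTH)."""
    return not (p and q)


# Each familiar connective rebuilt from nand alone (names are illustrative).
def not_(p: bool) -> bool:
    return nand(p, p)


def and_(p: bool, q: bool) -> bool:
    return nand(nand(p, q), nand(p, q))


def or_(p: bool, q: bool) -> bool:
    return nand(nand(p, p), nand(q, q))


def implies(p: bool, q: bool) -> bool:
    return nand(p, nand(q, q))


def iff(p: bool, q: bool) -> bool:
    return nand(nand(p, q), nand(nand(p, p), nand(q, q)))


def all_equivalent() -> bool:
    """Compare every rebuilt connective with Python's own over all inputs."""
    for p, q in product([False, True], repeat=2):
        checks = [
            not_(p) == (not p),
            and_(p, q) == (p and q),
            or_(p, q) == (p or q),
            implies(p, q) == ((not p) or q),
            iff(p, q) == (p == q),
        ]
        if not all(checks):
            return False
    return True
```

The single-operator versions have exactly the same expressive power, yet IF-AND-ONLY-IF already needs five applications of the stroke. Reducing the number of notations does not make the expressions written in the remaining one any easier to read.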
The premise that DTDs are simpler without attributes seems shaky, too, even leaving aside the question "simpler for whom?" We can replace attributes with tags by using our attribute names as tag names and defining an appropriate content model. But is a content model like ((a?) & (b?) & (c?) & (d?) & (e?)) really easier to handle than a set of five attributes a-e?

Example (1): use of tags as substitutes for attributes.

Example (2): use of attributes.

2. There is no universal unambiguous method for choosing between the use of attributes and tags to express any given state of affairs.

This observation is a corollary of the fact that the two notations have (roughly) the same expressive power. It is true. But I don't see how one can infer from it the conclusion "therefore attribute notation should be forbidden". The strongest conclusion I can derive from it is "Therefore, the choice must be made on non-universal and possibly subjective grounds" -- i.e. the choice will be a design decision for the working committees.

3. Criteria relevant to any distinction between attributes and tags (e.g. "content" vs. "non-content") depend upon point of view.

I agree that points of view will vary, and agree further that two points of view may diverge precisely over whether something is a natural attribute or a natural component. This seems to me an argument in favor of letting those who are devising the tag sets decide, and not an argument in favor of forbidding them to express in the notation the point of view they find most attractive.

4. Tags are a more powerful notation than attributes and are therefore to be preferred.

The term "powerful" is undefined here. I believe it must mean that all DTDs with attributes can be translated into (almost) functionally equivalent DTDs without attributes, but not vice versa; if so, then the premise is true.
But the conclusion does not follow immediately: why prefer the more "powerful" notation and forbid the other? Assembler is more powerful in this sense than high-level languages, and use of the GOTO is more powerful than structured programming. (It is straightforward to rewrite structured code using GOTO, but not all programs using GOTO can readily be rewritten without it. At most, structured constructs are as powerful as GOTO, and they are a less economical notation.) All combinations of context-free grammars and regular expressions can be rewritten as context-free grammars, but not vice versa. Should people refrain from using regular expressions?

5. The checking of attribute values or text element content for validity belongs in the application.

I am not sure what to make of this argument. What we are writing *is* an SGML application, is it not? If what is meant is that the checking of data validity belongs in the semantic actions and not in the parser, I do not understand the reasoning, or if I do, then I am inclined to disagree. Constraints on data values belong where they can be most elegantly expressed and most conveniently checked -- where their expression sheds the most light on the nature of the textual structures being marked. I believe we should prefer the judgements of the designers of the tag sets to any a priori claim of this kind. Myself, I am inclined to believe that if data validity can be checked during parsing, it should be. But I'm unwilling to try to make that a general rule, any more than the opposite rule that it should never be.

6. The attribute mechanism of SGML is flawed (by being too weak).

This is an argument for extending the attribute facility of SGML but not for forbidding its use where the existing mechanism is adequate.
We can surmise, from the limitations of the attribute datatype checking, that attributes will not always be an attractive option for the other working committees; I don't see that those limitations give us any reason to forbid the use of attributes even in cases where the other working committees do find them useful.

-------

General:

The common theme of the arguments appears to be that attributes are not necessary in any strict sense. The lack of necessity, however, does not constitute a reason for forbidding them.

Often I see some truth in SM's arguments, and can sympathize with them, without feeling that they should be controlling considerations. It isn't clear to me whether my arguments make even that amount of sense to her and the rest of you.

I believe that attributes provide a superior notation for some textual features and simplify processing more than they complicate it. I am not convinced that they complicate processing appreciably at all. In any case I believe that the utility of attributes to the user (especially in allowing a distinction between constituents and features) should weigh more heavily than convenience in making parsers or processors. This is especially true since I believe most users of the TEI guidelines will be using full SGML processors rather than writing processors from scratch, and full SGML processors already support attributes.

I also believe that attributes make the analysis of document types and the preparation of DTDs easier and more reliable, not harder and more complicated, in the same way that sentential logic is easier to do with AND, OR, IF-THEN and NOT than with the Sheffer stroke alone. If I am wrong in believing this, I believe the working committees can be trusted to avoid attributes as needless complications.

In sum, I see no reason to forbid attributes, and one very good reason (elegance of notation) to allow them.

P.S. About the combinatorial explosion.
SM is correct that the method she describes on 28 April can control the combinatorial explosion of tags entailed by the method for eliminating attributes which had been described in document MLW2, the second draft of the syntax document. This I pointed out in the preface to my comments of 23 March. I did not bother to rehearse yet again my arguments against this approach, since I had done so already on February 9. (In brief: number and tense are features, not component parts, of a verb, and to represent them as components is to distort instead of displaying clearly the nature of the data.)
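
The arithmetic behind the explosion can be sketched roughly as follows. This is an illustration only, written in Python, and it rests on an assumed reading of the two methods, neither of which is reproduced here: that the MLW2 approach needs one tag per combination of feature values, and that the 28 April approach needs one component tag per feature value. The feature inventory is hypothetical apart from number and tense, which are the features mentioned above.

```python
from math import prod

# Hypothetical feature inventory for a verb; "person" and all the value
# names are invented for illustration and do not come from either draft.
FEATURES = {
    "number": ["singular", "plural"],
    "tense": ["present", "past", "future"],
    "person": ["first", "second", "third"],
}


def tags_per_combination(features: dict) -> int:
    """Assumed MLW2-style count: one distinct tag per combination of values."""
    return prod(len(values) for values in features.values())


def tags_per_value(features: dict) -> int:
    """Assumed 28 April-style count: one component tag per feature value."""
    return sum(len(values) for values in features.values())
```

With this inventory `tags_per_combination(FEATURES)` is 18 and `tags_per_value(FEATURES)` is 8, so under these assumptions the second method does keep the count from multiplying as features are added. The objection in the parenthesis above is untouched by that: the count is controlled only by treating features as component parts.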