DeRose on AAP

Suggestions for Improving the AAP tag set
Report TEI TRR7
by Steven J. DeRose, Ph.D.
Summer Institute of Linguistics
7500 West Camp Wisdom Road
Dallas, Texas 75236 USA
D106GFS@UTARLVM1.Bitnet
texbell!txsil!steved.uucp
Copyright 1989 by Steven J. DeRose
Last updated August 23, 1989

Introduction

This document has its origins in documents David Durand and I prepared for the Document Interchange Project (DIP) of Brown University's Computing and Information Services. This project was funded by Brown, and carried out by Steven J. DeRose and David G. Durand on a consulting basis, and Mary E. McClure, Andrew Gilmartin, and several others as staff of CIS. DIP has as its goal the portability of documents across various text-processing systems, via parsers which translate various word-processor encodings to and from an interlingua (or "in-between language") closely resembling the AAP tag set, a conforming application of SGML.

For discussion of many of the ideas I am indebted to members of Brown University's Computing in the Humanities User's Group, especially Elli Mylonas and Allen Renear. The current version has been prepared as a working paper for the Text Encoding Initiative.

The case for descriptive markup is made in an annex to the SGML standard (ISO 8879, available from ANSI, 1430 Broadway, New York, NY 10018), in "Markup Systems and the Future of Scholarly Text Processing" (J. Coombs, A. Renear, and S. DeRose, Communications of the Association for Computing Machinery, November 1987), and elsewhere. Markup standards are very different from standards like PostScript, in that they conceive of documents in authorial terms, rather than typographic ones.

The Association of American Publishers has promulgated a markup system, or tag set, which conforms to the SGML standard, and now has ANSI approval. It is documented in "Reference Manual on Electronic Manuscript Preparation and Markup" and "Standard for Electronic Manuscript Preparation and Markup".
Both were once available from AAP, 2005 Mass Avenue NW, Washington, DC 20036, (202)-232-3335. They are currently available via OCLC, (800) 848-5878, extension 6195. As Nicholas Alter writes in the AAP Reference Manual, p. ii, "The idea isn't to format pages first, but to create a generically structured electronic file that gives the publisher [who is sometimes the author] complete freedom to create any kind of product, whether print or electronic." The changes recommended here are largely intended to enhance AAP's expression of the ideals of generic/descriptive tagging. Design Considerations Special Characters Many characters used in writing (especially technical and scholarly writing) are not included in the standard ASCII set. Perhaps given such things as BCD we should be thankful we even have mixed-case.... At any rate, special codings must be used for all other characters. A large table is found in the AAP documents cited above. I will not discuss the issues of character encoding here, but will merely mention that a number of complicated and serious issues are involved; one particularly relevant to the AAP standard is whether accented characters are conceived of as one unit or many units. The AAP standard takes them to be single units (following the example, though not the requirement, of the SGML standard). For example, an "a" with an acute accent is coded as "á" rather than merely as "a" preceded or followed by an accent indicator (e.g. "a´"). This seems to me unfortunate, because (a) it carries on a tradition based mainly on limitations of past computer technology; (b) it does not mirror the pre-existing conceptual reality that there are letters and accents, which can be combined; and (c) it leads to an unduly large number of entities and names. Tag-set Philosophy The tags which are defined for a document should, in the opinion of the DIP developers, fulfill the following criteria (at least): 1) They should reflect the conceptual structure(s) of documents. 
2) They should be consistently mnemonic. 3) They should be short enough for convenient typing. 4) They should themselves form coherent categories and groups. 5) They should not proliferate more than necessary. The AAP tag set (described in Appendix D of the Standard) fulfills some of these criteria, but does not appear to apply a consistent plan, and so has a number of problems in its vocabulary of well over 200 tags. Some specific classes of problems with the AAP tag set are listed in the following section. But a more fundamental conceptual flaw in the AAP tag set is one it inherits from SGML: There is no clear syntactic means for expressing conceptual relationships between tags per se. For example, the fact that there are several kinds of things we consider "lists" cannot conveniently be built into the tag set. Various hacks can be employed to get around this: 1) The problem can be ignored, with an arbitrary tag defined for each sub-type: , , etc. 2) Similar tag names can be defined, such as , , etc., or the opaque , , etc. 3) One tag can be defined, with an embedded tag to indicate type: , , etc. 4) One tag can be defined, with an attribute to indicate sub-type: , , etc. None of these solutions is entirely ideal, but the last (#4) at least expresses the fact that there is a single class of object, with sub-types. It also has the advantage that only one element need be defined, and that all sub-types share the same permissible internal structure (i.e., items of an ordered list obey the same structural rules as items of a bulleted list). In my opinion it is vastly superior to the other methods. Currently the AAP tag set uses a mixture of methods 1 and 2. Unfortunately, many products do not let format be determined by attribute values. This makes method 4 more difficult to use for formatting applications. For example, few formatters allow a definition like If the type attribute of
  • is 'ordered', then increment list_counter and print it ahead of the content; else print '*' ahead of the content. A second conceptual flaw, not inherited from SGML, is that the AAP tag set appears to assume that processing programs can have no knowledge of the context in which a tag appears. But one of the most important advances of SGML is that context can be known. One obvious example is that if the starts and ends of lists are marked, as they are in SGML, then there is no need for special tags for embedded lists; a list which occurs within the context of another list is predictably an embedded list. The number of tags required, and the consequent learning curve for users, can be drastically reduced by assuming the processor can take context into account. There is one case where the AAP tag set does assume context is known: the (Number) tag is defined such that "the type of element in which it is contained identifies what type of number it is...." Some SGML-based word processors have this limitation. They allow the user to state that block quotes (say, ) are to be set in a particular point size (or, better, a certain number of points smaller than the current point size); however, they may not support a statement such as If this block quote is within a footnote , then set it in the same point size as the containing text; else set it 2 points smaller. A third conceptual flaw is an apparent drive towards very short names. Short names have two apparent advantages: first, they take less time to type; second, they take less space to store. However, the first advantage is illusory: even with relatively primitive editing programs macro facilities obviate typing tags; with an actual SGML editor there should be no need to type tags at all. The second advantage is extremely minor; it is especially minor in the case of the many tags which occur only a few times in a given document (for example, tags for large structural units). 
Also, space-saving grows less important with every passing month due to technological advances. In short, the predilection for very short names is a leftover from the days of typewriters and typewriter-like programs. The real problem is making tags perspicuous to later human readers and interpreters. For example, the typesetting professional, the later author or editor, and the database or information retrieval specialist must be able readily to intuit the meaning of tags from their names. The effort wasted in reconstructing meaning from overly brief and non-mnemonic tags far outweighs the few extra bytes required for better names. A conceptual problem which always arises involves semi-closed sets of elements, such as the roles a person may play which must be cited in bibliographies: author, editor, translator, and a few more cover nearly all cases. But if a tag, or even a fixed attribute value or entity, is assigned for each role, then someone will surely run into the case of needing a role-label which has not been provided. On the other hand, if no control is provided, then different users will create different names for the same roles, and the same names for different roles, leading to a loss of perspicuity and hence portability. Types of Problems Many objects which can occur in several places are given inconsistent names in those places. For example, consider t vs tl vs ti vs c in: (Appendix title); (Article title); (Box title); (Chapter title); (Part title); (Section title); (Title); (Figure caption). Likewise for references-to; footnotes; address; and many more. Since many of these elements are identical except for the context in which they appear, and the context is already made explicit by higher-level tags (such as (Chapter)), these tags can be combined, thus requiring the user (and the computer) to learn fewer tags. The sub-objects which structural objects have are not readily predictable. 
An example in the Standard shows an abstract containing a (Heading) as title. But then appendices own not only (Titles) as opposed to (Headings), but (worse) special (Appendix titles). Some tags have unclear uses. For example, (Language), versus specific (Cyrillic) and (Greek) tags, versus things like (Language of Abstract). The ambiguity left in the use of such tags lets one imagine a tagging as opaque as: Abstract CyrillicCyrillic.... Some structural phenomena have more than one obvious tagging. For example, a particular phrase could be marked as (Bold Italic) or as (Emphasis, type #3). In this case, neither tagging is ideal: The first expresses a typographical rather than structural property (which has the unfortunate effect that the tag is misleading if a particular display device has limits, or if French typographic style is used, which does not have italics, but spreads characters apart instead). The second is not mnemonic, and it would be better to provide emphasis subtypes such as , , etc. Several groups of objects do not have super-ordinating tags. For example, there is no super-ordinate , but there are tags for (Day), (Month), and (Year). Yet there are specific tags for (Copyright date) and (Publication date). Likewise missing are ,
    , , etc. Some group-objects do not have sub-objects which might reasonably be expected by symmetry with the rest of the system. (Degree granted) does not own a "Date". Similarly, there is only one tag for (Electronic address), though it claims to cover telephone, telex, and e-mail addresses with no way to distinguish them. Some much-needed objects are simply missing: tags for what year-numbering or other unit system is in use; various parts of names; poetry elements; examples; etc. Some tags do not belong in a descriptive system, as they serve only typographic purposes. Perhaps the best example of this is the highly suspect (Acid free paper indicator) tag. Other questionable tags might be , , , and others, though there may be subtler justifications for some of them. Certain objects, when appearing in different contexts, differ in whether they have internal structure. For example, (Publication date) and (Copyright date) have no internal structural tags. However, there is a macro in the SGML prolog code for the AAP tag set which does define it; the fact that such a macro was conceptually useful in writing the prolog should indicate that a comparable tag might be comparably useful to authors. Other dates do not have external structure, being codable only as individual day, month, and year fields. The mnemonic system is inconsistent. For example, a reference-to indicator appears as both (Reference to bibliography) and (Appendix reference). Spell-out of names is inconsistent: vs. . Prefixes, likewise: (Chapter) vs. (Chapter title), (Figure) vs. (Reference to figure). Some tags alphabetize far from the other tags they relate to, while others seem to have been carefully chosen to avoid this problem. For example, most bibliography-related tags start with "b", some with "bib", but then we find for "Other bibliography information". Some tags come in varieties: (Emphasis), where "*" varies over several numbers.
There is no explicit specification that numbered variants are related; further, the use of numbers is highly non-mnemonic. Two methods are very helpful in this case and in the case of tags with very limited environments: First, a single tag can be defined for the related class of purposes, with a sub-type attribute. For example, or . This has the advantage that new sub-types can be coined (if the attribute is allowed unrestricted values; this can also be prevented by the DTD if desired). It also very clearly expresses that the functions form a single class, with variants. Second, a set of tags can be used, with compound tag names. This is somewhat like the current AAP approach, but the use of a separator character and a mnemonic sub-type name is much clearer. For example, for sub-parts of a bibliography which do not occur elsewhere (this is a potentially very useful convention), or for a particular kind of emphasis tag. This method has as one advantage that it allows the differing objects to be formatted differently even on those SGML-based formatters which cannot make formatting a consequence of attribute values, but only of element types per se. Programming languages abandoned limitations such as 6-character names long ago, and abandoned numbered variables even earlier, both for good reasons. Compound names were also introduced long ago, and are now standard, as in hierarchical file systems and record or structure variables in programs and databases. Strict naming limitations are unnecessary, and lead to substantial problems in devising perspicuous names. There are some references to structure, such as the notation that (Author's affiliation) has "organization components". But these structural components are to be found only as macros in the formal syntax section of the Standard. They are not tags available to the user, nor are they documented.
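Returning to the two naming methods described above, the following Python fragment is a minimal sketch of how a numbered variant tag might be rewritten under each of them. It is not part of the AAP standard or of any SGML tool. The mnemonic names in EMPHASIS_NAMES are hypothetical placeholders, the attribute name "type" follows the convention proposed in this paper, and the function names are my own.

```python
import re

# Hypothetical mnemonic names for numbered emphasis variants. The AAP
# standard defines only the numbers; these labels are placeholders.
EMPHASIS_NAMES = {"1": "term", "2": "keyword", "3": "variable"}

# A numbered variant is a run of letters followed by a run of digits.
NUMBERED = re.compile(r"^([a-z]+?)(\d+)$")


def split_numbered(tag):
    """Split 'e3' into ('e', '3'); return None for an unnumbered tag."""
    match = NUMBERED.match(tag)
    return (match.group(1), match.group(2)) if match else None


def attribute_form(tag, names):
    """Method 1: one element name plus a sub-type attribute."""
    parts = split_numbered(tag)
    if parts is None:
        return "<%s>" % tag
    base, number = parts
    # Fall back to the bare number when no mnemonic name is known.
    return '<%s type="%s">' % (base, names.get(number, number))


def compound_form(tag, names, separator="."):
    """Method 2: a compound element name with a separator character."""
    parts = split_numbered(tag)
    if parts is None:
        return "<%s>" % tag
    base, number = parts
    return "<%s%s%s>" % (base, separator, names.get(number, number))


if __name__ == "__main__":
    print(attribute_form("e3", EMPHASIS_NAMES))
    print(compound_form("e3", EMPHASIS_NAMES))
```

The attribute form keeps a single element type, so new sub-types need no new element declarations; the compound form yields distinct element types, which suits formatters that can key only on element names.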
The AAP tag set, in short, has not treated the objects of discourse fully as objects, which fall into classes and have specific, conceptually significant internal structures. These recommendations are intended to (a) remedy these asymmetries and (b) greatly reduce the number of tags needed, while (c) retaining the full range of expressiveness, for which the AAP standard is quite insightful. Attributes SGML allows attributes on tags. The AAP standard uses attributes only to express element instance identifiers, and references to those identifiers. For example, one might define a figure as and then later code "See also example ". This is clearly appropriate. Many other systems use attributes to encode pure typesetting/formatting functions, rather than descriptive and structural functions; this is to be deprecated. Tags with attributes always can be transformed into tags with embedded tags (whose content is the erstwhile attribute's value). The reverse transformation is also generally possible. It has been argued that attributes should be totally foregone, thus simplifying the syntactic mechanisms of SGML. However, I feel they serve a function in making certain structures more perspicuous, and should therefore be retained. Also, the most troublesome syntactic aspects of SGML are not attribute-related; much more important simplifications should be applied first. In my view, the main legitimate uses for attributes are in providing information about the text which is a) specific to element instances (i.e., not element types); b) not part of the text per se (and hence not content); c) not contributory to the logical structure (e.g., information which shares exactly the scope of other, independently motivated elements). 
This is somewhat subjective, but examples of this might be specification of the language of some text, the floatability of a figure (because this property may have determined the phrasing of text leading up to the figure), and identifying names for element instances to which reference must sometime be made. This convention has the advantage (pointed out to me by Michael Sperberg-McQueen and David Durand) that processing of the text is equivalent to processing of the content. For example, a search program can merely ignore tags, and need not know which other things are not truly content. A second argument for the judicious use of attributes is that they are the only means for expressing that something is a property of an element per se, and does not merely share an element's scope by chance. Also, by encoding properties at the element level, scope ambiguities can be avoided, such as: "

    Some Greek text

    " versus "

    Some Greek text

    ". A Modified Tag Set Here I describe a partial set of tags, organized by class and use. Some names are the same as names in the AAP tag set, and many are obviously similar; but many AAP tags are subsumed under others and hence do not appear. I wish to point out that this tag set owes a great deal to the AAP set; it is far easier to build upon previous work and insights, than to generate results ex nihilo. Yet I feel these changes are substantial, and could contribute to making the tag set significantly more usable. Structural tags define the overall tree structure of the document's components. In a particular document, the tree will be headed by the document-type node. Typically, the text will work down through various levels of sections, eventually reaching containers such as paragraphs, and even lower-level objects. I have applied certain conventions in the naming. Among them are: 1) Names are, in general, spelled out in full, rather than abbreviated. Exceptions to this convention occur when: a) The element name is very long; then it is simply truncated, rather than having selected characters removed (e.g. instead of ). b) The element is conceptually a semantically restricted sub-piece of exactly one other element; then the name of the superordinate element is abbreviated and prefixed to the sub-element (e.g., instead of ). c) The element is extremely common (e.g.,

    instead of ). 2) Parts of crystals (see below) are named with the compound name convention, i.e., the first letter of the crystal's name, followed by a period, followed by the sub-element name. This applies to elements which occur only in a certain very restricted context, and helps to make that fact explicit. 3) Elements which have sub-types, such as those which are differentiated by numbers in the AAP tag set, are given a single tag name plus a type attribute which specifies the sub-types (the relationship between this convention and the previous is an area deserving further clarification). Structural Hierarchy Tags Document type1 Administrative Matter Production Information Distribution Information Publisher's front matter (?) Cataloging information: ISBN ISSN CODEN (?) Cataloging in publication data Coded Bibliographic Slip Keywords Key phrases Subject headings Front Matter: Abstract Colophon Copyright Information Dedication Epigraph Foreword Preface Publication information Tables (Contents, Figures...) Title Page Body2: Sections of successive levels Back Matter: Appendix Bibliography Glossary Index Notes Running Headers/Footers/Repeated Material Tags These tags are used for defining special material which appears periodically in the document, such as running headers and footers. The only AAP tag that seems to be intended for this use is (Repeating identifier). This should be clarified. Floatable Material Tags These tags mark objects which do not necessarily have a fixed position in the text, but may (at least potentially) be floated to other locations. An attribute to specify whether floating is actually permitted for a particular instance may be useful. Artwork Figure Footnote Sidebar Text Containers Block quotation Quotation

    Paragraph Column-oriented/Mono-spaced text Example Poetry Tags The current AAP standard entirely ignores poetry, which is unfortunate, especially since fragments of poetry are widespread in other documents. The poetry tags suggested here, like some other classes, employ the convention that subordinate elements specific to one particular superordinate element are named with the first letter of the super-ordinate element's name, followed by a dot and the subordinate element's name. Poem: Stanza Line of poem Continued line Omitted start of line1 Emphasis Tags Emphasis tags affect the way their contained text appears, generally as to font, style, or other means of expression. They are the tags which are most commonly confused with typesetting codes; but they should be considered kinds of emphasis rather than kinds of type. Indeed, depending on the output device, the visual result will be different. The emphasis tags are all collapsed to one. The type attribute could have values such as Roman, Italic, Bold, Bold-Italic; these are not ideally descriptive, but at least are familiar to most authors, and so have migrated from the realm of typography to that of authorship. Better would be classes such as Terminology, Jargon, Keyword, Interjection, Variable Name, etc.; by allowing a string-valued attribute, new classes of emphatic elements can be introduced by the user without modifying the DTD. The drawback of this method, namely that it is necessary to modify the DTD if verification of permitted emphasis types is desired, is real but insignificant, because the same is true of the current method. Emphasis type Language Specifications The tag is used for specifying Greek and Cyrillic as well as other languages, obviating and . The "Language of abstract" and the (Romanization scheme) tag are also obviated since they can be determined from context. 
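The claim that such tags can be determined from context can be illustrated with a small sketch. The Python fragment below assumes that a processor keeps the list of currently open elements, and looks up the specific role of a generic tag from its enclosing elements. The tag names (abs, lg, app, fig, tbl, ti) are those of the suggested tag set; the role labels, the table CONTEXT_ROLES, and the function name are illustrative assumptions only.

```python
# Roles that a generic tag acquires inside a particular enclosing element.
# The pairs are (enclosing element, generic tag); the labels reuse the AAP
# descriptions of the special-purpose tags that context makes unnecessary.
CONTEXT_ROLES = {
    ("abs", "lg"): "Language of abstract",
    ("app", "ti"): "Appendix title",
    ("fig", "ti"): "Figure caption",
    ("tbl", "ti"): "Table title",
}


def role_in_context(path):
    """Name the role of the innermost open element.

    `path` lists the open element names from the document root inward.
    The enclosing elements are searched from the nearest one outward, and
    the first one that gives the tag a specific role decides the result.
    """
    if not path:
        raise ValueError("empty element path")
    tag = path[-1]
    for ancestor in reversed(path[:-1]):
        role = CONTEXT_ROLES.get((ancestor, tag))
        if role is not None:
            return role
    # No enclosing element changes the meaning: the generic tag stands alone.
    return tag


if __name__ == "__main__":
    print(role_in_context(["doc", "fm", "abs", "lg"]))
    print(role_in_context(["doc", "bm", "app", "ti"]))
```

A processor written this way needs only the generic tags in the document itself; the distinctions formerly carried by separate tag names are recovered from the element hierarchy.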
Language Crystals "Crystals" is my term for those special small objects which include particular semantically constrained sorts of data. Most of them follow the compound name convention described above. The crystals include: Additional possibilities: date, address, title, box, tc, bib Date: Day Month Year Year-system1 Address: City Subdivision of country Country Postal code Street address Postal Box Telephone number Fax number Electronic mail address:1 User id/account Network routing Network node Network Name: Title before name2 First Name Middle initial or name Surname Suffix to name (e.g. Jr., III) Organization: Department Division Name Address Id number Event:3 Sponsor Number (e.g. 1st annual) Address Date Conference Name Title group:4 Volume Title Article Title Abbreviated Title Subtitle Citation: 1 Title group Author Editor Revisor Translator Conference Volume Issue Edition Publication date Publisher Holdings Organization References:2 Author3 Thesis-specific information:4 Name Vita Affiliations Degree being granted Degree-granting institution Previous degree School's standard "Submitted to".... Date of submission Standard "Accepted by" blurb Acceptance date Accepting authority Lists A list is a sequence of text containers, which we will call list items. Each list item may have a leading marker, which we will call a list item heading; this would commonly be the case in lists such as glossaries. Usually, all items in the list are of one syntactic type, but this is not essential. Note also that the list may have its own headings: one for the list as a whole (such as "Glossary"); one to identify the following list headings (such as "Term"); and one to identify the text of the following items themselves (such as "Definition"). It is useful to distinguish ordered (e.g. numbered or lettered), unordered (e.g. bulleted), and simple (no demarcation) lists. 
Doing this via attributes or compound tag names would be better than using numbers, as the AAP tag set does currently ( for "List, type 1, 2,..."). The AAP tag set provides distinct tags for

    (Definition list), (Item list), and (Glossary), which could also be treated as a list. There is also (Bibliography list). These may be better combined into one notion of list. Yet, it is important to be able to handle demarcation. So the tags shown in the following example will be supported. , , and are new. AAP's is covered by . Glossary Term Definition
  • Aardwolf

    A mammal like the hyena, Proteles cristatus,....

  • Tables Tables are hard any way one goes about it. Common methods include at least the following, in increasing order of sophistication: 1) Giving rough hand-drawn tables to the publisher, whose problem they then become. 2) Literally pasting in hand-created tables. 3) Switching to a mono-spaced font and lining up columns by hand with the word-processor's formatting disabled. 4) Treating tables as graphical objects, produced by external programs and electronically pasted in. 5) Writing specialized programs and/or macros to handle the particular tables needed. 6) Putting explicit tab characters (whether ASCII tab or a substitute) between columns, newlines between rows, and setting column tab stops. 7) Generically coding rows and columns. The AAP tag set's codes for tables seem adequate for basic purposes. A useful extension would be an alignment attribute on the tag, which would indicate for each column whether its content was to be left, right, center, or decimal aligned. But this would be pushing the boundary towards presentational as opposed to descriptive markup. Conversion tables The table below is intended to list all standard AAP tags. The AAP tag name comes first, then the AAP description (sometimes abbreviated). Last comes a suggested equivalent. If a column is "|" then the tag has no clear equivalent in the tag set represented by that column; if it is "?", I am foregoing discussion of the tag. 
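As a rough illustration of how such a conversion table might be used by a program, the following Python sketch holds a handful of rows copied from the table and looks up the suggested equivalent of an AAP tag. Only these few rows are included; the function names, the status strings, and the dictionary layout are assumptions made for the example, not part of either tag set.

```python
# Markers used in the conversion table.
NO_EQUIVALENT = "|"   # the tag has no clear equivalent
UNDECIDED = "?"       # discussion of the tag is not attempted

# A few rows copied from the conversion table (AAP tag -> suggested tag).
AAP_TO_NEW = {
    "abs": "abs",   # Abstract
    "apt": "ti",    # Appendix title
    "b": "e",       # Bold
    "bi": "e",      # Bold ital
    "it": "e",      # Italic
    "cyr": "lg",    # Cyrillic
    "gk": "lg",     # Greek
    "fnm": "n.f",   # First Name
    "snm": "n.s",   # Surname
    "adv": "|",     # Advertisement
    "cci": "?",     # CCC indicator
}


def convert_tag(aap_tag, table=AAP_TO_NEW):
    """Return (suggested_tag, status) for one AAP tag name."""
    entry = table.get(aap_tag)
    if entry is None:
        return (None, "unknown")
    if entry == NO_EQUIVALENT:
        return (None, "no-equivalent")
    if entry == UNDECIDED:
        return (None, "undecided")
    return (entry, "ok")


def collapsed(table=AAP_TO_NEW):
    """List the suggested tags that absorb more than one AAP tag."""
    groups = {}
    for aap_tag, new_tag in table.items():
        if new_tag in (NO_EQUIVALENT, UNDECIDED):
            continue
        groups.setdefault(new_tag, []).append(aap_tag)
    return {new: sorted(old) for new, old in groups.items() if len(old) > 1}


if __name__ == "__main__":
    print(convert_tag("gk"))
    print(collapsed())
```

The second function makes the reduction in vocabulary visible: in this excerpt three AAP emphasis tags fold into e and two language tags fold into lg. A real converter would also have to supply the sub-type attribute for the collapsed tags, which this sketch leaves out.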
Sorted by AAP Tag AAP tag description New tag abn Abstract number no abs Abstract abs abt Title abbreviated abt ack Acknowledgement ack adv Advertisement | aff Author's affiliation aff aid Article identifier aid alt Title alternate | aon Acquisition/order number | apfn Appendix footnote fn app Appendix app appm Appendix matter appm apr Appendix, reference to ref apt Appendix title ti art Artwork art article Article document type article artr Artwork, reference to ref asq Aseq asq atl Article title atl au Author auth aud Audience aud aug Author group aug avl Distributor distrib awd Afterword awd b Bold e bb Biblio ref text bb bdy Body matter body bi Bold ital e bib Biblio bib bibl Biblio list l bm Back Matter bm book Book/monogr doctype doc bq Quotation block bq bx Boxed material box bxb Box body b.body bxh Box head b.head bxt Box title ti c Table cell entry c cad Corporate body address addr cau Corporate body as author au cbb Coded bibliographic strip cbb cci CCC indicator ? cdn CODEN coden cgn Contract/grant number cgn cgs Contract/grant sponsor cgs chp Chapter h cip CIP data cip cnd Conference Date c.date cnm Conference Name c.name cnn Conference cnp Conference Address c.addr cns Conference Sponsor c.spon cny Country cny col Colophon col cpg Corporate group as author au crd Copyright notice date date crn Copyright notice name name crt Copyright notice crt crx Other copyright info crx ct Chapter title tig cty City city cyr Cyrillic lg day Day day dd Definition descr li ddhd Definition descr head tig ded Dedication ded deg Degree granted deg dl Definition list l dsg Physical descr group dsg dt Definition term tig dthd Definition term head lihh e* Emphasis, type n e* ead Electronic address email ed Edition statement ed emq Quotation embedded q ep Epigraph ep ext Extent of Work ? fgb Figure body f.body fgc Figure caption ti fgr Figure, ref to ref fig Figure fig fm Front Matter fm fn Footnote fn fnm First Name n.f fnr Footnote, ref to ref ftg Foreign title group tig? 
fwd     Foreword                          fwd
gk      Greek                             lg
gl      Glossary                          gl
h       Heading                           sec
h*      Head n                            sec
hst     History statement                 hist
iad     Indiv address                     addr
idx     Index                             index
iid     Issue ident                       iid
ino     Issue number                      iid
inst    Degree-granting institution       inst
int     Intro                             int
ipt     Issue part                        ipt
isbn    ISBN                              isbn
issn    ISSN                              issn
it      Italic                            e
itm     Item                              li
itml    Item list                         l
kph     Key phrase                        keyphrases
kwd     Key word                          keywords
l       List                              l
l*      List type n                       l
lab     Language of abstract              lg
lcn     LC card                           lcn
lh      List head                         lih
li      List item                         li
li*     List item subord                  ?
line    Line (?)                          line
lit     Literal                           |
lng     Language                          lg
loc     Publisher's location              addr
lsc     Language of source item           lg
maa     Multiple author affil             ?
mau     Multiple author list              ?
med     Medium                            |
mo      Month                             month
mrg     Note, marginal                    margnote
msn     Monograph series number           msn
nit     Note in text                      intextnote
no      Number                            no
notes   Notes                             notes
ntr     Note in text, ref to              ref
oad     Org. Address                      o.addr
obi     Other biblio info                 b.other
odv     Org. Division                     o.div
onm     Org. Name                         o.name
p       Paragraph                         p
part    Part                              h
pc      Postal code                       pc
pdt     Publication date                  date
pf      Preface                           pf
phi     Acid Free                         |
pid     Publisher's id                    ?
pkg     Packaging                         |
pnm     Publisher name                    pnm
pp      Page number of ref                ?
prc     Price                             |
psan    Publisher SAN                     ?
pt      Part title                        ti
pubfm   Publisher's fm group              pubfm
pug     Publisher imprint group           |
q       Quotation in-line                 q
rb      Biblio, ref to                    ref
rep*    Repeating ident n                 rep*
rfn     Reference numbered                ?
rfu     Reference unnumbered              ?
rid     Report ident                      ?
rl      Reading Level                     |
rm      Roman                             e
rno     Reference number                  no
role    Role indicator                    role
rom     Roman scheme                      |
row     Table row                         row
rps     Reprint source                    |
rto     Reproduction rate                 |
san     Standard address number           |
sbd     Country subdiv                    sbd
sbdy    Serial body matter                bdy
sbk     Subbook                           h
sbm     Serial back matter                bm
sbr     Sidebar                           sbr
sbt     Subtitle                          sbt
scd     Subject code                      subjecthead
sch     Degree granting institution       sch
scp     Small caps                        e
sct     Title named section               ti
sec     Section                           h
serial  Serial doctype                    doc
sfm     Serial front matter               fm
sid     Supplement to issue ID            ?
sit*    Item, subord                      li
sm      Simple math                       ?
smtl    Supporting mat'l avail'y          ?
snm     Surname                           n.s
spt     Title parallel subord             ?
spubfm  Serial publ's fm group            pubfm
src     Source note                       ?
srr     Source note, ref to               ref
srt     Title monographic series          msertitle
ss*     Subsection n                      h
ssc     Serial named section              h
st      Section title                     ti
stm     Subject term                      |
str     Street                            0
tbl     Table                             tbl
tbr     Table, ref to                     ref
tc      Table of contents (=Tc)           tc
tcart   Tc, article title in              ?
tcchp   Tc, chapter title in              ?
tcpt    Tc, part title in                 ?
tcsec   Tc, section title in              ?
tcss*   Tc, subsec title in, level n      ?
tcssc   Tc, serial st in                  ?
th      Table column header               th
ti      Title                             ti
tig     Title group                       tig
tip     Title parallel                    0?
tir     Title romanized                   tir
top*    Topic para (captioned) n          |
trm     Availability terms/info           |
tsb     Table stub lin                    tsb
tsc     Source of document title          |
tsh     Table col subord header           tsh
tt      Table title                       ti
tx*     Text, type n                      |
uig     Unique id group                   |
vid     Volume id                         vid
vt      Vita                              vita
yr      Year                              year
|       Acceptance date                   a.date
|       "Accepted by"....                 accept
|       Accepting authority               a.name
|       Address                           addr
|       Administrative Matter             admin
|       Columnar/Mono-spaced              mono
|       Conference                        conf
|       Continued poetry line             p.cline
|       Date                              date
|       Date of submission                subdat
|       Degree of thesis                  newdeg
|       Distribution Information          distrib
|       Document, type spec               doc type=
|       E-mail Network                    e.net
|       E-mail Network node               e.node
|       E-mail Network routing            e.route
|       E-mail User id/accoun             e.acct
|       Editor                            editor
|       Example                           example
|       Fax number                        fax
|       Header for List item headers      lihh
|       Header for List item texts        lith
|       Holdings                          holdings
|       Line of poem                      p.line
|       List-item header                  lih
|       Middle initial or name            n.m
|       Name                              name
|       Name Title (Ms., Mr.)             n.t
|       Omitted start of poetry line      p.omit
|       Org. Department                   o.dept
|       Org. Id number                    o.id
|       Organization                      org
|       Poem                              poem
|       Poem Stanza                       p.stanza
|       Postal box number                 pobox
|       Previous degree                   prevdeg
|       Production Information            prod
|       Publication Information           pubfm
|       References                        ref
|       Revisor                           revisor
|       "Submitted to"....                submit
|       Suffix to name (e.g. Jr., III)    n.z
|       Telephone number                  phone
|       Thesis reader's name              reader
|       Thesis reader's sign'r            signature
|       Title Page                        titlep
|       Translator                        translator
|       Year system                       ysys

Sorted by Suggested Tag

AAP tag description                       New tag
str     Street                            0
tip     Title parallel                    0?
cci     CCC indicator                     ?
ext     Extent of Work                    ?
li*     List item subord                  ?
maa     Multiple author affil             ?
mau     Multiple author list              ?
pid     Publisher's id                    ?
pp      Page number of ref                ?
psan    Publisher SAN                     ?
rfn     Reference numbered                ?
rfu     Reference unnumbered              ?
rid     Report ident                      ?
sid     Supplement to issue ID            ?
sm      Simple math                       ?
smtl    Supporting mat'l avail'y          ?
spt     Title parallel subord             ?
src     Source note                       ?
tcart   Tc, article title in              ?
tcchp   Tc, chapter title in              ?
tcpt    Tc, part title in                 ?
tcsec   Tc, section title in              ?
tcss*   Tc, subsec title in, level n      ?
tcssc   Tc, serial st in                  ?
|       Acceptance date                   a.date
|       Accepting authority               a.name
abs     Abstract                          abs
abt     Title abbreviated                 abt
|       "Accepted by"....                 accept
ack     Acknowledgement                   ack
cad     Corporate body address            addr
iad     Indiv address                     addr
loc     Publisher's location              addr
|       Address                           addr
|       Administrative Matter             admin
aff     Author's affiliation              aff
aid     Article identifier                aid
app     Appendix                          app
appm    Appendix matter                   appm
art     Artwork                           art
article Article document type             article
asq     Aseq                              asq
atl     Article title                     atl
cau     Corporate body as author          au
cpg     Corporate group as author         au
aud     Audience                          aud
aug     Author group                      aug
au      Author                            auth
awd     Afterword                         awd
bxb     Box body                          b.body
bxh     Box head                          b.head
obi     Other biblio info                 b.other
bb      Biblio ref text                   bb
sbdy    Serial body matter                bdy
bib     Biblio                            bib
bm      Back Matter                       bm
sbm     Serial back matter                bm
bdy     Body matter                       body
bx      Boxed material                    box
bq      Quotation block                   bq
c       Table cell entry                  c
cnp     Conference Address                c.addr
cnd     Conference Date                   c.date
cnm     Conference Name                   c.name
cnn     Conference
cns     Conference Sponsor                c.spon
cbb     Coded bibliographic strip         cbb
cgn     Contract/grant number             cgn
cgs     Contract/grant sponsor            cgs
cip     CIP data                          cip
cty     City                              city
cny     Country                           cny
cdn     CODEN                             coden
col     Colophon                          col
|       Conference                        conf
crt     Copyright notice                  crt
crx     Other copyright info              crx
crd     Copyright notice date             date
pdt     Publication date                  date
|       Date                              date
day     Day                               day
ded     Dedication                        ded
deg     Degree granted                    deg
avl     Distributor                       distrib
|       Distribution Information          distrib
book    Book/monogr doctype               doc
serial  Serial doctype                    doc
|       Document, type spec               doc type=
dsg     Physical descr group              dsg
b       Bold                              e
bi      Bold ital                         e
it      Italic                            e
rm      Roman                             e
scp     Small caps                        e
e*      Emphasis, type n                  e*
|       E-mail User id/accoun             e.acct
|       E-mail Network                    e.net
|       E-mail Network node               e.node
|       E-mail Network routing            e.route
ed      Edition statement                 ed
|       Editor                            editor
ead     Electronic address                email
ep      Epigraph                          ep
|       Example                           example
fgb     Figure body                       f.body
|       Fax number                        fax
fig     Figure                            fig
fm      Front Matter                      fm
sfm     Serial front matter               fm
apfn    Appendix footnote                 fn
fn      Footnote                          fn
fwd     Foreword                          fwd
gl      Glossary                          gl
chp     Chapter                           h
part    Part                              h
sbk     Subbook                           h
sec     Section                           h
ss*     Subsection n                      h
ssc     Serial named section              h
hst     History statement                 hist
|       Holdings                          holdings
iid     Issue ident                       iid
ino     Issue number                      iid
idx     Index                             index
inst    Degree-granting institution       inst
int     Intro                             int
nit     Note in text                      intextnote
ipt     Issue part                        ipt
isbn    ISBN                              isbn
issn    ISSN                              issn
kph     Key phrase                        keyphrases
kwd     Key word                          keywords
bibl    Biblio list                       l
dl      Definition list                   l
itml    Item list                         l
l       List                              l
l*      List type n                       l
lcn     LC card                           lcn
cyr     Cyrillic                          lg
gk      Greek                             lg
lab     Language of abstract              lg
lng     Language                          lg
lsc     Language of source item           lg
dd      Definition descr                  li
itm     Item                              li
li      List item                         li
sit*    Item, subord                      li
lh      List head                         lih
|       List-item header                  lih
dthd    Definition term head              lihh
|       Header for List item headers      lihh
line    Line (?)                          line
|       Header for List item texts        lith
mrg     Note, marginal                    margnote
|       Columnar/Mono-spaced              mono
mo      Month                             month
srt     Title monographic series          msertitle
msn     Monograph series number           msn
fnm     First Name                        n.f
|       Middle initial or name            n.m
snm     Surname                           n.s
|       Name Title (Ms., Mr.)             n.t
|       Suffix to name (e.g. Jr., III)    n.z
crn     Copyright notice name             name
|       Name                              name
|       Degree of thesis                  newdeg
abn     Abstract number                   no
no      Number                            no
rno     Reference number                  no
notes   Notes                             notes
oad     Org. Address                      o.addr
|       Org. Department                   o.dept
odv     Org. Division                     o.div
|       Org. Id number                    o.id
onm     Org. Name                         o.name
|       Organization                      org
p       Paragraph                         p
|       Continued poetry line             p.cline
|       Line of poem                      p.line
|       Omitted start of poetry line      p.omit
|       Poem Stanza                       p.stanza
pc      Postal code                       pc
pf      Preface                           pf
|       Telephone number                  phone
pnm     Publisher name                    pnm
|       Postal box number                 pobox
|       Poem                              poem
|       Previous degree                   prevdeg
|       Production Information            prod
pubfm   Publisher's fm group              pubfm
spubfm  Serial publ's fm group            pubfm
|       Publication Information           pubfm
emq     Quotation embedded                q
q       Quotation in-line                 q
|       Thesis reader's name              reader
apr     Appendix, reference to            ref
artr    Artwork, reference to             ref
fgr     Figure, ref to                    ref
fnr     Footnote, ref to                  ref
ntr     Note in text, ref to              ref
rb      Biblio, ref to                    ref
srr     Source note, ref to               ref
tbr     Table, ref to                     ref
|       References                        ref
rep*    Repeating ident n                 rep*
|       Revisor                           revisor
role    Role indicator                    role
row     Table row                         row
sbd     Country subdiv                    sbd
sbr     Sidebar                           sbr
sbt     Subtitle                          sbt
sch     Degree granting institution       sch
h       Heading                           sec
h*      Head n                            sec
|       Thesis reader's sign'r            signature
|       Date of submission                subdat
scd     Subject code                      subjecthead
|       "Submitted to"....                submit
tbl     Table                             tbl
tc      Table of contents (=Tc)           tc
th      Table column header               th
apt     Appendix title                    ti
bxt     Box title                         ti
fgc     Figure caption                    ti
pt      Part title                        ti
sct     Title named section               ti
st      Section title                     ti
ti      Title                             ti
tt      Table title                       ti
ct      Chapter title                     tig
ddhd    Definition descr head             tig
dt      Definition term                   tig
tig     Title group                       tig
ftg     Foreign title group               tig?
tir     Title romanized                   tir
|       Title Page                        titlep
|       Translator                        translator
tsb     Table stub lin                    tsb
tsh     Table col subord header           tsh
vid     Volume id                         vid
vt      Vita                              vita
yr      Year                              year
|       Year system                       ysys
adv     Advertisement                     |
alt     Title alternate                   |
aon     Acquisition/order number          |
lit     Literal                           |
med     Medium                            |
phi     Acid Free                         |
pkg     Packaging                         |
prc     Price                             |
pug     Publisher imprint group           |
rl      Reading Level                     |
rom     Roman scheme                      |
rps     Reprint source                    |
rto     Reproduction rate                 |
san     Standard address number           |
stm     Subject term                      |
top*    Topic para (captioned) n          |
trm     Availability terms/info           |
tsc     Source of document title          |
tx*     Text, type n                      |
uig     Unique id group                   |

Revision History

June 26, 1987: Written by Steven J. DeRose.

June 29, 1987: Table added, mapping AAP to BITS tags. Section on terminology added. Organize tags by structure and use.

July 11, 1987 sjd: Add Revision History section. Add description of tables and lists. Modify list tags to provide headers without the full complexity of a tig, and to provide another layer or heading for the sub-parts of the list.

July 12, 1987 sjd: Add level attribute on h; phone, pobox, mono. Rename author to match AAP au, pub to match pubfm. Add include tag. Add type and atr attributes to ref. Add titlep tag. Consider problems of superscripting, footnote numbering in general, document tag with sub-types for book, serial, etc. Add alphabetical list of all BITS tags.

July 16, 1987 sjd, dgd: Remove nearly all attributes. Attributes are generally used for formatting information, and hence not needed. Re-cast the 'type' attributes (as for headings, lists, etc.) into type_subtype form. Introduce '_' convention for grouping tags into families.

June 21, 1989 sjd: Re-write from BITS specification to evaluation of AAP. Use "." as name separator character, not "_".

June 26, 1989 sjd: Remove alphabetical list of new tags, incorporate into conversion table. Make "crystal"-sub-element naming more consistent.

July 20, 1989 sjd: Finish updating tag set changes, reconcile tables and text.
Change formatting of tag descriptions. Expand arguments re. attributes, naming conventions.

August 1, 1989 sjd: Clean up formatting, improve a handful of tag names, explicate naming conventions further.

1 type = (book | serial | article).

2 These all optionally own a (Title group), and a (Number). The latter is a serial number for the specific instance of the object, such as in "Chapter 1." It may also be desirable eventually that every element have a unique identifier; if retained across edits, such identifiers would provide ideal handles for interchange of hypertextual linkage information.

1 This tag marks the start of a poetry line that is to be omitted in display or printing. Such text must be included and tagged, because standard typographic practice requires that the remainder of the line be indented as if the omitted portion were present (i.e., indented by an amount equal to the width of the omitted text).

1 E.g., B.C., A.D., B.C.E., C.E.

1 This crystal is part of the Address crystal.

2 E.g., Ms., Mr.

3 Used for conferences and similar events.

4 The title group demarcates the title of any work of literature. This referential use must be carefully distinguished from the tag used to indicate the title of the current work, e.g., on the title page.

1 Used for bibliography entries, of which in-line references are abbreviations. An alternative to be considered is to eliminate the specific name-tags for various relevant roles, and to include a role attribute on , with values such as editor, author, etc. The true complexity of citations is much greater than this definition indicates; analyses such as those evidenced in standard bibliography database software should inform the tag set definition. Some of the names used in the AAP tag set appear to be modelled after the ANSI bibliography standard, but the extensive coverage of that standard has not been retained.

2 All of the cross-reference tags can be combined into one.
If there is need to distinguish subtypes, this could be done with an additional attribute. Combining the reference tags is also good because the SGML definition of ID attributes requires that they be unique over an entire document, not merely over the values attested on each element type. Also, the standard should clarify what such a tag means: is the refid itself to be printed, or is it intended as a function which looks up an arbitrary piece of content which is defined when the id attribute is parsed, and inserts that, or does the tag merely enclose literal reference text such as "see Chapter 2"?

3 Author information will, in most instances, be just a name. However, additional data may also go here. The best structure for specifying corporate authors may be to stick with the separate tag as in AAP.

4 Certainly more tags are needed for theses; an obvious example is that I have provided no internal structure for the vita, although it needs various kinds of headings, dated item lists, etc.
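
The suggestion in the note on cross-reference tags (fold the eight AAP reference tags into one ref tag, and distinguish subtypes with an attribute if needed) can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not part of the AAP standard or of the DIP parsers. The AAP tag names come from the conversion table; the subtype labels, the attribute name "type", the function names, and the simplified regular-expression treatment of tags are all hypothetical, and a real converter would use a proper SGML parser. The second function checks the constraint the note mentions, that ID values must be unique over the whole document.

```python
import re
from collections import Counter

# AAP cross-reference tags that the conversion table maps to "ref".
# The subtype labels on the right are illustrative assumptions only.
AAP_REF_SUBTYPES = {
    "apr": "appendix",
    "artr": "artwork",
    "fgr": "figure",
    "fnr": "footnote",
    "ntr": "note",
    "rb": "biblio",
    "srr": "source",
    "tbr": "table",
}

# Simplified patterns: lower-case tag names, double-quoted attribute values.
_START_TAG = re.compile(r"<([a-z]+)\b([^>]*)>")
_END_TAG = re.compile(r"</([a-z]+)>")
_ID_ATTR = re.compile(r'\bid="([^"]*)"')


def merge_ref_tags(text):
    """Rewrite each AAP reference tag as <ref type="...">, keeping its attributes."""

    def start(match):
        name, attrs = match.group(1), match.group(2)
        if name in AAP_REF_SUBTYPES:
            return '<ref type="%s"%s>' % (AAP_REF_SUBTYPES[name], attrs)
        return match.group(0)

    def end(match):
        return "</ref>" if match.group(1) in AAP_REF_SUBTYPES else match.group(0)

    return _END_TAG.sub(end, _START_TAG.sub(start, text))


def duplicate_ids(text):
    """Return, sorted, every id value that occurs more than once in the document."""
    counts = Counter(_ID_ATTR.findall(text))
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = ('<fig id="f1">...</fig> See <fgr refid="f1">Figure 1</fgr> '
              'and <tbr refid="t1">Table 1</tbr>.')
    print(merge_ref_tags(sample))
    print(duplicate_ids('<fig id="x"></fig><tbl id="x"></tbl><p id="y"></p>'))
```

Because every reference becomes the same element, a single document-wide table of id values is enough to resolve any refid, which is the property the uniqueness requirement is meant to guarantee. The sketch deliberately leaves open the question raised in the note of what a reference should print.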