Notes on Media formats and XPointer


Graphic Images

Summary of Recommendations:

It is recommended (see som26.html) that external entity declarations be deprecated in the TEI. Given this recommendation, it is doubly important that an alternative container be provided for image metadata. There is a standard, SVG, that encodes an image in XML (or wraps it, in the case of raster images), and SVG has an adequate metadata structure for describing what an image denotes (headings, captions, rights via the <metadata> tag, etc.). It is therefore proposed that the existing tags be supplemented by allowing the insertion of images wrapped in SVG.

The current mechanism for marking up images using the TEI tags <figDesc> , <note> and <head> may still be used for simple figure description, though their use is deprecated. Instead it is recommended that these tags be used inside an svg <desc> element.

Users should be warned against using labels like ‘Above’ in <head> tags (as in the Pullman example given in P4) where these are intended to describe the position of the image with respect to the caption, since relative positions are determined by the stylesheet. Labels like ‘Above:’ and ‘Left:’ should be included in <head> tags only when faithful transcription of an existing text is required. When control over layout is required (for example where a single block of text refers to a number of images), the use of SVG (or SMIL for time-variant media) brings many benefits.

This approach does not address concerns about how to handle references to multiple versions of the same image. This was felt to be beyond the scope of this working group. In particular, in the case of multimedia which may contain multiple audio and video tracks sourced from sections of other files, this is recognised as a problem for which no adequate solution currently exists.

Introduction

Many documents, both historical and contemporary, include not only text but also graphics, artwork, and other images. Increasingly, on-line text also includes multimedia imagery, including audio and video material. Accurate markup of this multimedia imagery is non-trivial. Images may be cropped, deformed, held in multiple formats and displayed in different ways and at different resolutions. Video may be sped up or slowed down, either intentionally or inadvertently. It is important that markup should continue to describe the content of an image or video accurately after such distortion. Many specialist domains have developed standards to meet the need for markup of images and multimedia content (e.g. DICOM, Digital Imaging and Communications in Medicine, for markup of radiography images), but open standards suitable for standoff markup of general images have been slow to emerge.

By contrast, standards to describe graphics and multimedia imagery are long established: for example the Joint Photographic Experts Group (JPEG) standard for still images and a set of standards defined by the Moving Picture Experts Group (MPEG) for video. Additionally a number of de facto standards have been widely adopted (for example PNG and GIF for still images, AVI and QuickTime for video). These describe how an image or video should be rendered to a computer screen. Some provide limited support for annotation and for defining areas of an image or segments of a video, but this is not their main purpose.

All of these standards describe raster graphics, where the image is made up of a list of points, or dots. A vector image, in contrast, is a list of geometrical objects, such as lines, circles, arcs, or even cubes. Although proprietary formats for vector graphics are common (PowerPoint, Visio, and Flash), standard open formats are only just beginning to emerge. These are SVG (Scalable Vector Graphics), a language for describing two-dimensional vector and mixed vector/raster graphics in XML, and SMIL (Synchronized Multimedia Integration Language), an XML-based language that allows authors to write interactive multimedia presentations. Mixing raster and vector images provides an easy way of encoding the main 'body' of an image while separately marking up areas of the image using vectors, which carry very little overhead. Both SVG and SMIL are open W3C standards that allow users to 'wrap' multimedia content so that it can be addressed with stand-off markup. Alternative vector graphics standards have been proposed (VML by Microsoft, PGML by Adobe) but neither appears to have significant support, even within its parent organisation.

There exists a bewildering variety of graphics formats that might be wrapped in SVG, and previous versions of the TEI Guidelines have listed some of these. The number of possible graphics, sound and video formats has grown significantly since the Guidelines were first published, and the original list neither endorsed nor recommended any particular format, so the value of including such a list in this context is questionable. It is likely that SVG will most often be used to wrap JPEG or PNG. SMIL will most often be used to wrap QuickTime and AVI movie files or streams, and WAV audio files. SVG is itself a vector graphics format. While simple SVG diagrams can be written in XML by SVG experts, most users will probably prefer to author SVG images using drawing tools that can export in SVG format.

Specific Elements for Graphic Images - the current position

The following special purpose elements are provided by the current (TEI P4) tag set to indicate the presence of graphic images within a document:
  • <figure> indicates the location of a graphic, illustration, or figure.
  • entity (an attribute of <figure>) names the external entity within which the graphic image of the figure is stored.
  • <figDesc> contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it. No attributes other than those globally available (see definition for a.global)
Currently, inclusion of a graphic image in a marked-up TEI document typically requires three distinct steps:
  • The notation employed by the image itself must be defined; this is done with a notation declaration in the document type definition.
  • The external entity in which the image is stored must be defined; this is done with an entity declaration, which refers to the notation declared at step one.
  • Within the document, the <figure> element is used to mark the position of the image, which is referenced by name, like any other kind of external entity.

In the TEI scheme, these three functions are currently carried out as follows.

Declarations for all notations used by a document must be provided within the DTD subset, as described above in section 22.2 Formulae and Mathematical Expressions. Many such notations are in common use; for details see section 22.5 Graphic Image Formats.

Entity declarations for the entities containing the graphics themselves must be made, using system or public identifiers, within the document's DTD subset, either directly or by including them within a suitable file, as in the example below.

<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN"
            "http://www.tei-c.org/P4X/DTD/tei2.dtd" [
            <!ENTITY % TEI.XML "INCLUDE">
            <!ENTITY % TEI.prose "INCLUDE">
            <!ENTITY % TEI.graphics "INCLUDE">
            <!-- Graphics notations used in this document ... -->
            <!NOTATION svg PUBLIC '-//TEI//NOTATION W3C Scalable Vector Graphics Format//EN' >
            <!NOTATION png PUBLIC
            '-//TEI//NOTATION IETF RFC2083 Portable Network Graphics//EN'>
            <!NOTATION jpeg  PUBLIC 'ISO DIS 10918//NOTATION JPEG Graphics Format//EN' >
            <!-- The file 'figures.ent' contains entity declarations -->
            <!-- for all external entities needed by this document -->
            <!ENTITY % myFigures SYSTEM "figures.ent">
            %myFigures;
            ]> 
The file figures.ent will contain a series of declarations like the following:
              <!ENTITY Fig1    SYSTEM  "fig1.svg"    NDATA svg>
              <!ENTITY Fig1th  SYSTEM  "fig1.jpg"    NDATA jpeg>
              <!ENTITY pullman SYSTEM  "pullman.png" NDATA png>
the effect of which is to associate the name Fig1 with the external entity fig1.svg, and also to declare that that entity uses the notation called svg, which is declared in the DTD subset. In the same way, the external entity fig1.jpg is defined as using the jpeg notation, and may be referenced by the name Fig1th (see further below).
Finally, the <figure> element is used to indicate the location of the graphic image in the text. For example:
<figure entity='Fig1'></figure>
Under current guidelines three kinds of content may be supplied: the element <head> may be used to transcribe (or supply) a descriptive heading or title for the graphic itself as in this example:
            <figure entity='Fig1'>
            <head>Figure One: The View from the Bridge</head>
          </figure>
Figures are often accompanied not only by a title or heading, but by a paragraph or so of commentary or caption. One or more <p> elements following the <head> may be used to transcribe any caption or discussion of the figure in the source:
              <figure entity='pullman'>
              <head>Above:</head>
              <p>The drawing room of the Pullman house, the white and gold saloon
              where the magnate delighted in giving receptions for several
              hundred people.</p>
              <figDesc>The figure shows an elaborately decorated room, at least
              twenty-five feet side to side and fifty feet long, with ornate
              mouldings and Corinthian columns on the walls, overstuffed
              armchairs and loveseats arranged in several conversational
              groupings, and two large chandeliers.</figDesc>
            </figure>
Here, the paragraph ‘The drawing room ... several hundred people’ is transcribed from the source, while the description is provided by the encoder, for use by applications which cannot display the graphic directly. The content of the <figDesc> element should only describe what is denoted by the image. In documents created in electronic form with the needs of print-handicapped readers in mind, the <figDesc> element may be provided by the author rather than a subsequent encoder. Text interpreting the content of an image should be placed in a <note> element.
              <figure entity='Fig1'>
              <head>Figure One: The View from the Bridge</head>
              <figDesc>A Whistleresque view showing four or five sailing boats
              in the foreground, and a series of buoys strung out between
              them.</figDesc>
              <note>The inclusion of this image in the text gives us some clues
              about the mood the author is trying to create.</note>
            </figure>

Suggested changes

It is proposed that the TEI allow both inline inclusion and external linking of SVG and SMIL in a TEI document. Of particular importance is the ability to encode metadata within the <svg> element, especially if the committee accepts the recommendation to do away with external entity declarations.

Using inline SVG, the most conservative way to encode the example given above would be as follows.

              <figure>
              <svg xmlns="http://www.w3.org/2000/svg"
                   xmlns:xlink="http://www.w3.org/1999/xlink">
              <image xlink:href="http://www.tei-c.org/......................./pullman.jpg"/>
              </svg>
              <head>Figure One: The View from the Bridge</head>
              <figDesc>A <emph>Whistleresque</emph> view showing four or five sailing boats
              in the foreground, and a series of buoys strung out between
              them.</figDesc>
              <note>The inclusion of this image in the text gives us some clues
              about the mood the author is trying to create.</note>
              </figure>
              

Alternatively the user could mark up the same example within the SVG tags. This gives greater flexibility, and has the advantage of keeping all image-related content within a single element. Note that the SVG may include tags from the TEI namespace in the <svg:title> and <svg:desc> tags. For these reasons we propose that this be recommended as the standard approach.

              <!-- the tei prefix must be bound to a namespace; the URI shown is the TEI P5 namespace -->
              <svg xmlns="http://www.w3.org/2000/svg"
                   xmlns:xlink="http://www.w3.org/1999/xlink"
                   xmlns:tei="http://www.tei-c.org/ns/1.0">
              <image xlink:href="http://www.tei-c.org/......................./pullman.jpg"/>
              <title>The View from the Bridge</title>
              <desc>
              <tei:head>Figure One: The View from the Bridge</tei:head>
              <tei:figDesc>A <tei:emph>Whistleresque</tei:emph> view showing four or five sailing boats
              in the foreground, and a series of buoys strung out between
              them.</tei:figDesc>
              <tei:note>The inclusion of this image in the text gives us some clues
              about the mood the author is trying to create.</tei:note>
              </desc>
              <metadata>
              <!-- ... allows inclusion of markup from other namespaces, e.g. Dublin Core metadata marked up as RDF -->
              </metadata>
              </svg>
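
The <metadata> placeholder above could be filled in along the following lines. This is only a minimal sketch, assuming Dublin Core properties expressed as RDF/XML. The title is taken from the example; the rights statement, the pixel dimensions, and the bare file name pullman.jpg (standing in for the elided path) are illustrative assumptions.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="400" height="300">
  <!-- width, height and the relative file name are hypothetical -->
  <image xlink:href="pullman.jpg" width="400" height="300"/>
  <title>The View from the Bridge</title>
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- an empty rdf:about refers to the enclosing document -->
      <rdf:Description rdf:about="">
        <dc:title>The View from the Bridge</dc:title>
        <dc:format>image/svg+xml</dc:format>
        <!-- placeholder rights statement, not taken from any source -->
        <dc:rights>Rights statement goes here</dc:rights>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
</svg>
```

Because the RDF sits in its own namespaces, an SVG renderer can ignore it while a metadata harvester can extract it without understanding SVG.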

Where the graphic itself contains large amounts of text, perhaps with a complex structure, and perhaps difficult to distinguish from the graphic, the encoder should choose whether to regard the graphic as containing the text or to regard the enclosed text as being a separate division of the <text> element in which the graphic appears. In the first case, an SVG <text> element may be included within the <svg> element. This provides complete control over position, angle, font, etc., at the expense of some additional effort in encoding the text (for example, an SVG text element does not automatically word wrap). In the latter case, an appropriate div-class element may be used for the text represented within the graphic, and the <svg> element embedded within it. The choice will depend to a large degree on the encoder's understanding of the relationship between the graphic and the surrounding text.
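
The first option might look something like the sketch below. Since SVG text does not wrap, each line is positioned by hand with a <tspan>. The coordinates, font settings, rotation angle, image size, and file name are all assumptions made for illustration.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="400" height="300">
  <image xlink:href="pullman.jpg" width="400" height="300"/>
  <!-- The whole text block is rotated 5 degrees anticlockwise about (20,40) -->
  <text x="20" y="40" font-family="serif" font-size="14"
        transform="rotate(-5 20 40)">
    <!-- x resets the left edge of each line; dy moves the baseline down -->
    <tspan x="20" dy="0">The drawing room of the</tspan>
    <tspan x="20" dy="18">Pullman house</tspan>
  </text>
</svg>
```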

Commentary and caption can often describe part of the image, rather than the whole of it. It can also be useful to point to sections of the image from the body of the text.

Linking from image to text

The mechanism for linking from an image to text (analogous to an image map in HTML) will probably only rarely be used for marking up TEI documents. However, since it is the easiest to understand, it is a convenient place to start. We first wrap the image in an SVG document:
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image xlink:href="http://www.bioimage.org/f21.jpg"/>
</svg>
An area within the JPEG image can then be described in the SVG document by adding a polygon after the <image> element:
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image xlink:href="http://www.bioimage.org/f21.jpg"/>
  <polygon id="areaofinterest"
           points="50,50 50,100 100,100 100,50"
           style="stroke: white; fill:none"/>
</svg>

SVG provides an <a> element, analogous to HTML's <a> element, to indicate links. SVG uses XLink ([XLink]) for all link definitions.
                <a xlink:href="http://www.w3.org">
                <polygon points="5,5, 45,45 5,45 45,5"
                style="stroke: red; fill:none"/>
              </a>

The remote resource (the destination for the link) is defined by a URI specified by the XLink href attribute on the <a> element. The remote resource may be any Web resource (e.g., an image, a video clip, a sound bite, a program, another SVG document, an HTML document, an element within the current document, an element within a different document, etc.).

Linking from Text to Image

SVG allows users to identify areas of an image in three different ways.
  • Shorthand bare name form of addressing (e.g., MyDrawing.svg#MyView). This form of addressing, which allows addressing an SVG element by its ID, is compatible with the fragment addressing mechanism for older versions of HTML and the shorthand bare name formulation in "XML Pointer Language (XPointer)" [XPTR]. (The bare name form of addressing #MyView is equivalent to the XPointer formulation #xpointer(id('MyView')).)
  • XPointer-compatible ID reference (e.g., MyDrawing.svg#xpointer(id('MyView'))). This form of addressing, which also allows addressing an SVG element by its ID, is compatible with "XML Pointer Language (XPointer)" [XPTR] syntax and the XPath syntax for referencing IDs.
  • SVG view specification (e.g., MyDrawing.svg#svgView(viewBox(0,200,1000,1000))). Users already familiar with SVG may find it useful to specify the view of an image that is to be rendered. For example, a document discussing a painting might have an image of the painting itself, and a number of smaller images throughout the document drawing attention to particular details. The view specification achieves this without storing and downloading multiple representations of the same image. This form of addressing specifies the desired view of the document (e.g., the region of the document to view, the initial zoom level) completely within the SVG fragment specification. The contents of the SVG view specification are the five parameter specifications viewBox(...), preserveAspectRatio(...), transform(...), zoomAndPan(...) and viewTarget(...), whose parameters have the same meaning as the corresponding attributes on a 'view' element (or, in the case of transform(...), the same meaning as the corresponding attribute has on a 'g' element).
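
To make the three forms concrete, the sketch below shows a hypothetical MyDrawing.svg that declares a named view covering one detail of a painting. The file name painting.jpg and the pixel dimensions are assumptions; the id MyView and the viewBox values are those used in the list above.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="1000" height="1200" viewBox="0 0 1000 1200">
  <image xlink:href="painting.jpg" width="1000" height="1200"/>
  <!-- A named view: the 1000x1000 region whose top-left corner is (0,200) -->
  <view id="MyView" viewBox="0 200 1000 1000"/>
  <!-- References to this document from elsewhere:
         MyDrawing.svg#MyView
         MyDrawing.svg#xpointer(id('MyView'))
         MyDrawing.svg#svgView(viewBox(0,200,1000,1000))
       The first two rely on the view element above; the third carries
       the region in the reference itself and needs no view element. -->
</svg>
```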
Given the document fragment:
                <svg xmlns="http://www.w3.org/2000/svg"
                     xmlns:xlink="http://www.w3.org/1999/xlink">
                <image xlink:href="http://www.bioimage.org/f21.jpg"/>
                <polygon id="areaofinterest"
                points="5,5 45,45 5,45 45,5"
                style="stroke: red; fill:none"/>
                </svg>
We could point to the area bounded by the polygon with:
svgsample.svg#xpointer(id('areaofinterest'))

[Figure: rendering of the SVG document described above. On the original web page the image linked to the live SVG document.]

Linking from image to image

Documents often use a small 'thumbnail' version of an image to reference a larger, higher resolution version of the same image. This may be done either to avoid the bandwidth cost of downloading the high-resolution version, or for layout reasons. In the former case there will be two files; in the latter case there may be only one file, with the thumbnail being presented as a downsized version of the high-resolution image. In TEI terms, one image acts as a reference to the other.

In the case where there are two images, we can embed a reference to the larger image using the simple <ref> element discussed in section 6.6 Simple Links and Cross References:
<ref target="svg:bigImage">Click here
  <svg id='smallImage' xmlns="http://www.w3.org/2000/svg"
       xmlns:xlink="http://www.w3.org/1999/xlink">
    <image xlink:href="http://www.tei-c.org/......................./pullman_small.jpg"/>
  </svg>
  for enlightenment</ref>
<!-- ... -->
<!-- elsewhere in the document -->
<svg id='bigImage' xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <image xlink:href="http://www.tei-c.org/......................./pullman_big.jpg"/>
</svg>
<!-- other figures here -->
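
Where only one file exists, the thumbnail could be produced by asking the SVG renderer to scale the full-size image into a smaller viewport. The following is a sketch under assumptions: the 100 by 75 pixel thumbnail size is invented for illustration, and the bare file name stands in for the elided path.

```xml
<svg id="smallImage" xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="100" height="75">
  <!-- The full-resolution file is scaled to fit a 100x75 box when rendered;
       no separate thumbnail file is stored -->
  <image xlink:href="pullman_big.jpg" width="100" height="75"/>
</svg>
```

This saves storage but not bandwidth, since the full-resolution file is still downloaded.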

Time variant media — SMIL

While SVG provides a convenient way of linking static images to text, it is not intended to handle time-variant media (audio and video). The relevant standard here is SMIL — the Synchronized Multimedia Integration Language (pronounced 'smile'). Where possible, SMIL linking constructs have the same names as constructs from XLink [XLINK]. However, the SMIL linking attributes are distinct from the XLink constructs and are part of a separate namespace. Using SMIL's modularization mechanism, these constructs are not in the XLink namespace but in the namespaces defined in the SMIL 2.0 specification.

The SMIL 2.0 specification allows, but does not require, user agents to process XPointers in SMIL 2.0 URI attribute values.

SMIL profiles may use XML Base [XMLBase].

Linking from Multimedia to Text

It is currently quite unusual to find links from areas of multimedia objects to text. Suffice it to say that such links can be created in SMIL if required, using the <area> element ( http://www.w3.org/TR/smil20/extended-linking.html#edef-area ), which is analogous to the HTML area element and uses a coords attribute to define the area.

SMIL area elements may carry a shape attribute with one of the values default, rect, circle, or poly. The SVG counterpart wraps a shape element in <a>; for example:
<a xlink:href="http://www.w3.org">
  <ellipse cx="2.5" cy="1.5" rx="2" ry="1" fill="red" /></a>
defines a link from a red ellipse to the W3C site.
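
For comparison, a SMIL <area> link might be sketched as follows. The video file name, the region size, the coordinates, and the timing values are all hypothetical; the element and attribute names are those of the SMIL 2.0 linking and layout modules.

```xml
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="main" width="320" height="240"/>
    </layout>
  </head>
  <body>
    <video src="lecture.mpg" region="main">
      <!-- Rectangular hotspot (left,top,right,bottom), active only
           between 5 and 15 seconds into the video -->
      <area shape="rect" coords="10,10,110,60"
            begin="5s" end="15s"
            href="http://www.w3.org"/>
    </video>
  </body>
</smil>
```

Unlike an HTML image map, the hotspot here is bounded in time as well as in space.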

Linking from Text to Multimedia

Links from text to a segment of multimedia have many uses, since they make it possible to link annotation or critical analysis to a precise segment and section of a multimedia presentation. In SMIL 2.0 this is done through the Linking Modules, which support name fragment identifiers and the ‘#’ connector. The fragment part is an id value that identifies one of the elements within the referenced SMIL document. With this construct, SMIL 2.0 supports locators as currently used in HTML (that is, it uses locators of the form http://www.example.org/some/path#anchor1), with the difference that the values are unique identifiers and not the values of name attributes. This type of link can therefore only target elements that have an attribute of type ID.

Links using fragment identifiers enable authors to encode links to a SMIL 2.0 presentation at the start time of a particular element rather than at the beginning of its presentation. If a link containing a fragment part is followed, the presentation should start as if the user had fast-forwarded the presentation represented by the destination document to the effective begin of the element designated by the fragment. (See note 1.)
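
A minimal sketch of such a link target is given below, assuming a hypothetical presentation lecture.smil made of three clips played in sequence. The file names and id values are invented for illustration.

```xml
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <body>
    <seq>
      <video id="intro" src="intro.mpg"/>
      <!-- Following a link to lecture.smil#interview starts playback
           here, as if the user had fast-forwarded past the intro -->
      <video id="interview" src="interview.mpg"/>
      <audio id="closing" src="closing.wav"/>
    </seq>
  </body>
  <!-- A reference from the TEI text might then read:
         <ref target="lecture.smil#interview">the interview</ref> -->
</smil>
```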

Linking from Still Image to Multimedia

Linking from a still image to a section of a multimedia presentation is a frequently used indexing method for video, analogous to the use of thumbnails for static images discussed above. Authors can link from a still image to a segment of a video using name fragment identifiers and the '#' connector. The still image can be placed in an SVG wrapper, and the link defined using the XLink syntax described above.
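
As a sketch, a still frame wrapped in SVG could link to a segment of a SMIL presentation as shown below. The file names, the id interview, and the image size are assumptions for illustration.

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="100" height="75">
  <!-- Activating the still frame opens the presentation at the
       element whose id is "interview" -->
  <a xlink:href="lecture.smil#interview">
    <image xlink:href="interview_still.jpg" width="100" height="75"/>
  </a>
</svg>
```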

Worked example - encoding the Comenius example from P4 as SVG

To demonstrate the use of SVG to link between image and text, we consider how we might encode the alignments in an extract from Comenius' Orbis Sensualium Pictus. Each topic covered in this work has three parts: a picture, a prose text in Latin describing the topic, and a carefully-aligned translation of the Latin into English, German or some other vernacular. Key terms in the two texts are typographically distinct, and are linked to the picture by numbers, which appear in the two texts and within the picture as well.

First, we present the text portions. The English and Latin portions have been encoded as distinct <div> elements. Identifiers have been attached to each typographic line, but no other encoding added, to simplify the example.

<!-- English text --> <div id="e98" lang="en" type="lesson">
              <head>The Study</head>
              <p>
              <seg id="e9801">The Study</seg>
              <seg id="e9802">is a place</seg>
              <seg id="e9803">where a Student,</seg>
              <seg id="e9804">a part from men,</seg>
              <seg id="e9805">sitteth alone,</seg>
              <seg id="e9806">addicted to his Studies,</seg>
              <seg id="e9807">whilst he readeth</seg>
              <seg id="e9808">Books,</seg>
              <!-- ... -->
            </p>
            </div>
              <!-- Latin text -->
              <div id="l98" lang="la" type="lesson">
              <head>Mus&eacute;um</head>
              <p>
              <seg id="l9801">Museum</seg>
              <seg id="l9802">est locus</seg>
              <seg id="l9803">ubi Studiosus,</seg>
              <seg id="l9804">secretus ab hominibus,</seg>
              <seg id="l9805">solus sedet,</seg>
              <seg id="l9806">Studiis deditus,</seg>
              <seg id="l9807">dum lectitat</seg>
              <seg id="l9808">Libros,</seg>
              <!-- ... -->
            </p> </div>
Next we assume that we have a digitized image of the picture itself called compic.png. We can wrap this image in an SVG document and address either the whole image, or sections of the image using the rect element. In this example we point to two portions of the image, one containing the picture of a student and the other of a book, as follows:
              <svg xmlns="http://www.w3.org/2000/svg"
              xmlns:xlink="http://www.w3.org/1999/xlink">
              <image id="p981"
              xlink:href="http://www.tei-c.org/Guidelines/Figures/compic.png"/>
              <rect id="p982"
              x="75" y="5"
              width="58" height="70"
              style="stroke: red; fill:none"/>
              <rect id="p983"
              x="55" y="42"
              width="35" height="18"
              style="stroke: red; fill:none"/>
            </svg>
Note that each <rect> element has its own unique identifier.

As printed, the text exhibits three kinds of alignment. The English and Latin portions are printed in two parallel columns, with corresponding phrases (represented above by <seg> elements) more or less next to each other. Particular words or phrases are marked as terms in the two languages by a change of rendition: the English text, which otherwise uses black letter type throughout, has the words ‘The Study’, ‘a Student’, ‘Studies’, and ‘Books’ in a roman font; in the Latin text, which is printed in roman, the corresponding words (‘Museum’, ‘Studiosus’, ‘Studiis’, and ‘Libros’) are all in italic. Numbered labels appear within the text portions, linking keywords to each other and to sections of the picture. These labels, which have been left out of the above encoding, are attached to the first, third, and last segments in each language quoted above, and also appear (rather indistinctly) within the picture itself. If it is desired to transcribe them in the text, they might be encoded as <ref> elements, <anchor> elements, or <xptr> elements pointing to the picture; the number itself would be transcribed as the value of the n attribute (or as the content of the <ref>).

The first kind of alignment might be represented by using the corresp attribute on the <seg> element. The second kind might be represented by using the <gloss> and <term> mechanism described in section 6.3.4 Terms, Glosses, and Cited Words. The third kind of alignment is represented using pointers embedded within the texts. This allows us to use SVG to mark clearly the area being referenced. We can now link from either the Latin word 'Museum' or the English 'Study' to the appropriate area in the image:
<link href="e9801 l9801 compic.svg#p981"/>
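
The first two kinds of alignment are only described in prose above; the fragments below sketch one way they might look. The corresp and target attributes are TEI P4 attributes, but the id value tl9808 is invented here, and the choice to treat the Latin word as the term and the English word as its gloss is an assumption.

```xml
<!-- First kind: parallel segments point at each other with corresp -->
<seg id="e9803" corresp="l9803">where a Student,</seg>
<seg id="l9803" corresp="e9803">ubi Studiosus,</seg>

<!-- Second kind: a Latin key term and the English word that glosses it -->
<seg id="l9808"><term id="tl9808">Libros</term>,</seg>
<seg id="e9808"><gloss target="tl9808">Books</gloss>,</seg>
```

These are fragments from the two <div> elements shown earlier, not a complete document.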
Notes

1. It is perhaps worth noting that SMIL is also appropriate for linking transcriptions of speech to the recorded multimedia object. Some more examples of this would be useful.


Last recorded change to this page: 2007-09-16  •  For corrections or updates, contact webmaster AT tei-c DOT org