Notes on Packing TEI files
C. M. Sperberg-McQueen, 5 Nov 92
a propos of the problem of packing TAG files

1 What We Need

We need a poor-folk's version of SDIF.

The TEIPack Packer should:
- parse the document, recognizing entity declarations and entity references (and expanding all entity references, both general and parameter)
- note the system DTD file
- note which entities are external entities
- note which external entities are referred to in the course of the document
- pack all the system entities needed for the document into a single data stream

Packing can occur in the same pass as parsing, or in a second pass, possibly with a separate program.

In Phase 2, we can add function to the same processor or a separate filter to read WSDs for the local character set and transmission character set, recognize characters not in the transmission character set and pack them using the appropriate entity references. Such a character-set packer need not be used just for interchange; it could be useful to perform such a transformation on local files.

The TEIUnPack unpacker should:
- read the packed file and unpack each distinct system entity, where possible giving it the same name and directory location as the source

In Phase 2, the unpacker should read the WSDs for the local and transmission character sets and expand entity references appropriately. Like the character-set packer, the character-set unpacker might be useful locally, not just in connection with interchange.

The central problems are: delimiting the original system entities in a foolproof way, and recording relevant information about the system entities (such as their names and locations) so they can be unpacked appropriately.

2 Proposal for Getting It

2.1 The Packer

For the Packer, we use a two-pass operation. In the first pass, we create a file called a packing-list file, which looks like this:

    #include "P2PRDRIV.ODD"
    #include "c:\tei\odd\tiny.dtd"
    #include "p2co.odd"
    etc. ... etc.

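As a rough illustration of the first pass, the following Python sketch collects the system DTD file and the external entities actually referred to in a document, then emits one #include line for each. It is only a sketch under strong assumptions, not the proposed TEIPack implementation: the function name packing_list and the regular expressions are hypothetical, real SGML parsing is replaced by pattern matching, only SYSTEM identifiers are handled (no PUBLIC identifiers), nested entities are not followed, and the marker comments that accompany each entity in the packing list are not generated.

```python
import re

# Hypothetical, simplified patterns; a real packer would use an SGML parser.
# External entity declarations, general or parameter, for example:
#   <!ENTITY chap1 SYSTEM "chap1.sgm">
#   <!ENTITY % p2co SYSTEM "p2co.odd">
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+(%\s+)?([A-Za-z][\w.-]*)\s+SYSTEM\s+'
    r'(?:"([^"]*)"|\'([^\']*)\')'
)
# Entity references such as &chap1; or %p2co; (closing ';' assumed present).
ENTITY_REF = re.compile(r'([&%])([A-Za-z][\w.-]*);')
# System DTD file named in the document type declaration.
DOCTYPE = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]*)"')


def packing_list(doc_text, doc_file):
    """Return #include lines for the document, its DTD file, and each
    external entity that the document text actually refers to."""
    # Map (is_parameter_entity, name) -> system identifier.
    external = {}
    for m in ENTITY_DECL.finditer(doc_text):
        key = (m.group(1) is not None, m.group(2))
        sysid = m.group(3) if m.group(3) is not None else m.group(4)
        # Keep the first declaration of a name; later ones are ignored.
        external.setdefault(key, sysid)

    lines = ['#include "%s"' % doc_file]

    dtd = DOCTYPE.search(doc_text)
    if dtd:
        lines.append('#include "%s"' % dtd.group(1))

    # Add each referenced external entity once, in order of first reference.
    seen = set()
    for m in ENTITY_REF.finditer(doc_text):
        key = (m.group(1) == '%', m.group(2))
        if key in external and key not in seen:
            seen.add(key)
            lines.append('#include "%s"' % external[key])
    return lines
```

Calling packing_list(text, "main.sgm") on a document that declares three external entities but refers to only two of them would list the document itself, its DTD file, and the two referenced entities; the unreferenced one is left out of the packing list.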
In the packing-list file, each entity is preceded by two comments giving its entity name and its "entity text", and followed by a comment indicating an entity end. The file being processed has no explicit entity name, and neither does the base DTD; they get the names '*SGMLDOC' and '*DTDENT', as in ArcSGML. Every other entity gets the name given in its entity declaration.

In this packing list, the entity itself is represented by an #include line naming the file to be included in the packed file; the second pass uses the existing SPP.SPT program to read the packing list and produce the packed file. As long as no input file contains the string '