Response to Vendor Questions about the TEI RFP
- What have been the minimum and maximum lot sizes aggregated in the past?
- What is the expected volume on monthly, quarterly, or annual bases?
- In what way are the sample documents intended to be representative? In type? age? quality? country of origin?
- What is the average page size measured in kb?
- What will be the ratio of books vs. serials vs. newspapers vs. manuscripts?
- What will be the ratio of contemporary vs. older documents (i.e. modern vs. older, unusual, or obsolete handwriting and typefaces)?
- What will be the ratio of printed vs. handwritten vs. microfilm?
- What will be the ratio of materials in Non-Western vs. Western character sets?
- What is the likelihood of receiving materials in languages other than English?
- What are the minimum standards for validation and character encoding?
- What is the current approach on aggregation and collection of the source material followed by the TEI?
- How does TEI envision the management of the subscription to the sites where materials for aggregation will be sourced from? Is this something that vendors should take into the costing consideration or something that will be taken care of by TEI members?
- Will the vendor be coordinating directly with the members or will TEI function as the middle-man and notify the vendor once there are work requests from their members?
- What administrative support will TEI provide to its members in relation to this effort? Will contracts (Services Agreements, Statements of Work) and invoicing/billing go through a single point at TEI or will they be negotiated with each member?
- Will the TEI be the single point of contact for all projects coming from different consortium members, or will each TEI member have a project manager to coordinate with the vendor?
- As the materials aggregated are raw source, are we expected to do the inventory?
We believe that we have incorporated answers to all of the questions submitted to us. If a specific question submitted by a vendor does not appear explicitly in this list, it may appear in the answers, which we have tried to write quite broadly. If you need additional help please contact us at email@example.com.
We would like to thank the vendors for their very constructive and useful questions and David Sewell, Paul Schaffner, and Michelle Dalmau for their help on this and other aspects of the RFP.
Volume and Demand
In considering volume and demand, the most important thing to keep in mind is that a goal of this benefit is to create a new market among members who have not previously considered outsourcing digitization, and new business from members who currently outsource digitization but would do more if an appropriate program existed. In this sense (as with questions pertaining to the mix of document types), past and current practice, where known, may provide only a moderately accurate gauge of future demand. If this program works as we envision, we would expect it to increase the volume of outsourced digitization projects conducted by our members and to bring new members into the organisation. Indeed, 75% of the respondents to our survey indicated that they would outsource more material if they could do so at an appropriate price.
The TEI has not offered an aggregation or digitization service in the past, so the information that is available to answer these questions comes from a survey of the community (both current and prospective members) we conducted last year: a copy of these survey results is available at the following URL: http://www.tei-c.org/Admin/Survey.pdf. The following answers will refer to this survey.
In interpreting these results, we are assuming that the survey is representative of our membership base. Approximately half the respondents, however, are not currently members. This suggests that interest in the benefit is wider than our current membership of approximately 84 paid-up members, ranging from large institutions to small projects (this number fluctuates in the course of the year due to resignations, new memberships, and late renewals).
See Survey Questions (SQ) 5, 7, and 10. According to SQ7, half the respondents reported that they currently outsource digitization. SQ5 indicates that respondents digitize between zero and 8 million pages a year. On this basis, our best estimate would be an average demand of 176,000 pages per member.
On the other hand, in answer to SQ10, those respondents who already outsource their digitization report outsourcing an average of only 40,000 pages a year.
Taking these two sets of numbers together, we end up with the following figures: if the benefit attracts no new members, we would expect about half our members (roughly 42 of our approximately 84) to take part in the program, and these members to digitize between approximately 1.68 million (42 × 40,000, per SQ10) and 7.4 million (42 × 176,000, per SQ5) pages per year (0.42-1.85 million pages per quarter; 140,000-616,000 pages per month).
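For vendors who wish to rerun these estimates under different assumptions, the arithmetic can be reproduced as follows. This is a minimal Python sketch: the inputs (approximately 84 members, 50% participation, and per-member averages of 40,000 and 176,000 pages) come from the survey figures above, while the names `MEMBERS`, `PARTICIPATION`, `PAGES_PER_MEMBER`, and `volume_estimates` are ours, for illustration only.

```python
# Reproduces the volume estimates above from the survey figures.
MEMBERS = 84            # approximate paid-up membership
PARTICIPATION = 0.5     # about half of members expected to take part (SQ7)
PAGES_PER_MEMBER = {
    "low (SQ10)": 40_000,    # average reported by members who already outsource
    "high (SQ5)": 176_000,   # average estimated from all respondents
}

def volume_estimates(members, participation, pages_per_member):
    """Return annual, quarterly, and monthly page volumes for each scenario."""
    participants = members * participation
    results = {}
    for label, pages in pages_per_member.items():
        annual = participants * pages
        results[label] = {
            "annual": annual,
            "quarterly": annual / 4,
            "monthly": annual / 12,
        }
    return results

for label, v in volume_estimates(MEMBERS, PARTICIPATION, PAGES_PER_MEMBER).items():
    print(f"{label}: {v['annual']:,.0f}/yr, {v['quarterly']:,.0f}/qtr, {v['monthly']:,.0f}/mo")
```

Running it prints the unrounded values behind the figures above (for example 7,392,000 pages a year, rounded above to 7.4 million). Changing `MEMBERS` or `PARTICIPATION` shows how strongly the totals depend on new membership.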
These figures can only be estimates, of course. In evaluating them, it is important to realize both that the purpose of this RFP is to attract new members to the TEI (and the benefit will only be of interest to those who have digitization work to contract) and that price has an extremely strong effect on estimated volumes (see SQ 11 and 12): according to the survey, 75% would digitize more if the price were lower. Indeed, while this RFP has been open, discussion on our various mailing lists has suggested that some members might even use the service for far more casual work: submitting small batches of non-time-sensitive material to the queue rather than hiring students to transcribe it.
Material and Encoding
In terms of material, our survey focussed on the kinds of material members (or prospective members) needed to digitize. Building on this information, vendors have extended these questions primarily to ask about the specific breakdown among the different types. Unfortunately, since we have no past experience to go on and the goal of this program is to create a market where none previously existed, this information is elusive. While we do our best in the answers that follow to recover information about the breakdown of our members' material, we structured the RFP to allow vendors to consider offering tiered pricing for different kinds of material.
On the basis of informal talks with likely customers of this benefit, we have found that members do not consider it unreasonable for vendors to offer the most advantageous prices for the simplest digitization jobs (contemporary print meeting vendor-specified technical requirements) and more expensive tiers for more difficult jobs. They are also willing to accept special prices (whether negotiated or fixed as part of this benefit) for "extras" such as vendor-supplied scanning services or rush jobs. And they are willing to be flexible about turnaround time, especially if this keeps prices low. Where our members are uncompromising, however, is quality: as primarily academic projects and institutions, they generally cannot accept anything but the highest levels of accuracy and the highest-quality encoding.
In short, our members are extremely price and quality conscious. They are less concerned about turnaround time (provided this remains reasonable), and they are willing to accept a benefit that offers straightforward pricing for the easiest material even if pricing for complex material or custom jobs is done on a case-by-case or negotiated basis, provided this negotiated price is also discounted from the rate they would have been able to achieve outside the program. Because a significant number of our customers digitize manuscripts and non-Western material, however, a vendor able to incorporate these types of material into a tiered pricing plan would have a considerable advantage. That said, we are relatively sure that for manuscript or non-Western material our members would be able to accept some kind of predictable discount off a negotiated price instead of a fixed per-page or per-kilobyte charge.
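To make the idea of a tiered plan concrete, the sketch below combines fixed per-page rates for the simpler tiers with a predictable percentage discount off a negotiated price for everything else, as described above. It is illustrative only: the tier names, the rates, the 15% discount, and the `Job` and `quote` names are hypothetical placeholders, not figures proposed by the TEI.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tiers and rates: placeholders only, not figures from this RFP.
FIXED_RATE_PER_PAGE = {
    "contemporary_print": 0.50,
    "older_print": 0.90,
}
NEGOTIATED_DISCOUNT = 0.15  # hypothetical discount off a negotiated price

@dataclass
class Job:
    tier: str
    pages: int
    negotiated_price: Optional[float] = None  # needed for tiers without a fixed rate

def quote(job: Job) -> float:
    """Fixed per-page price for simple tiers; discounted negotiated price otherwise."""
    if job.tier in FIXED_RATE_PER_PAGE:
        return round(job.pages * FIXED_RATE_PER_PAGE[job.tier], 2)
    if job.negotiated_price is None:
        raise ValueError(f"tier {job.tier!r} requires a negotiated price")
    return round(job.negotiated_price * (1 - NEGOTIATED_DISCOUNT), 2)
```

A vendor could add fixed-rate tiers (for example, non-Roman print) simply by extending the rate table; any tier not in the table, such as manuscripts, falls through to the discounted negotiated-price path.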
Another important factor to keep in mind is that many of our current and potential members under this benefit already outsource digitization. To be competitive with these members' current vendors, the pricing of this benefit needs to be as low as possible. To achieve this cost saving, we believe that it is acceptable to require members (or the TEI as a whole) to accumulate a certain amount of material before a job is run, to submit material to the vendor in a specified format (e.g. 150 dpi TIFF images), and to accept a standard level of detail in their XML encoding. We have issued this RFP on the assumption that we can find enough commonality among our membership to allow a vendor who is able to be creative to attract a considerable base of potential customers. Moreover, because all the aggregated jobs will use the same tightly defined schema and other parameters such as image type and quality, we also expect that this material will be easier to process in aggregate than it would be on a piecemeal basis.
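One way to implement the accumulation requirement is a simple pooled queue that releases a batch once enough pages have been submitted. The Python sketch below assumes a hypothetical `AggregationQueue` class and an arbitrary 10,000-page threshold; the actual minimum lot size, and whether material is pooled across the TEI or per member, would be settled in negotiation.

```python
# Hypothetical threshold: the real minimum lot size would be set by the vendor.
MIN_BATCH_PAGES = 10_000

class AggregationQueue:
    """Accumulate member submissions until a minimum page count is reached."""

    def __init__(self, min_pages: int = MIN_BATCH_PAGES):
        self.min_pages = min_pages
        self.pending = []        # list of (member_id, pages) tuples
        self.pending_pages = 0

    def submit(self, member_id: str, pages: int):
        """Queue a submission; return the released batch if the threshold is met, else None."""
        self.pending.append((member_id, pages))
        self.pending_pages += pages
        if self.pending_pages >= self.min_pages:
            batch, self.pending, self.pending_pages = self.pending, [], 0
            return batch
        return None
```

Pooling across the TEI as a whole uses a single queue; per-member accumulation would simply use one queue per member.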
In what way are the sample documents intended to be representative? In type? age? quality? country of origin?
The samples are only intended to represent the range of material that might be submitted by TEI members, in terms of character sets, typefaces, languages, source material, etc. More than that should not be read into their choice: for example, the fact that a newspaper sample happens to be British does not mean that we think the bulk of newspaper material submitted will be British; it just means we think some members will want to digitize newspapers. Once a shortlist of vendors has been drawn up, new samples will be issued for trial encoding.
What is the average page size measured in kb?
We estimate that this will vary quite broadly depending on the type of material, and that vendors' own experience across a large range of material would be as good a guide as any to the likely average. (One member, looking at a cross-section of materials, found an average of 3,500 characters per page; we don't know how representative this is.)
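As a rough guide to converting characters per page into kilobytes, the sketch below computes the plain-text size of a 3,500-character page in UTF-8, which encodes ASCII characters in one byte, Greek in two, and most CJK characters in three. It assumes UTF-8 (one acceptable Unicode encoding), counts 1 KB as 1,024 bytes, and ignores XML markup, which will add overhead on top of these figures; the `text_size_kb` helper is hypothetical.

```python
def text_size_kb(chars_per_page: int, sample_char: str) -> float:
    """Size in KB (1 KB = 1,024 bytes) of a page of plain UTF-8 text, assuming
    every character encodes like sample_char. Markup is ignored."""
    return len((sample_char * chars_per_page).encode("utf-8")) / 1024

for script, ch in [("Latin", "a"), ("Greek", "α"), ("CJK", "中")]:
    print(f"{script}: {text_size_kb(3500, ch):.1f} KB")
```

This prints approximately 3.4 KB for Latin, 6.8 KB for Greek, and 10.3 KB for CJK text.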
What will be the ratio of books vs. serials vs. newspapers vs. manuscripts?
About half the respondents to SQ19 in our survey report needing to digitize serials (53%), and just under half report needing to digitize newspapers (42%). This, of course, does not mean that half the material TEI members submit will be newspapers and the other half serials, but it probably indicates that serials and newspapers will make up a significant percentage of the total. On manuscripts, see below.
What will be the ratio of contemporary vs. older documents (i.e. modern vs. older, unusual, or obsolete handwriting and typefaces)?
Approximately half the respondents to SQ15 in our survey report that they digitize material produced before the 19th century. We do not have a breakdown specifically by typeface. However, the dating information should be fairly predictive of script and typeface.
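What will be the ratio of printed vs. handwritten vs. microfilm?
SQ 14 in our survey shows that 78.3% of respondents digitize print documents, 56.6% digitize handwritten documents, and 27.7% digitize microform (these categories overlap, so the figures total more than 100%). We do not have a breakdown of type per member.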
It is important to remember that this and other breakdowns refer to current practice and are not correlated to price. A goal of this benefit is to create new business. Given the price and quality sensitivity of our membership, vendors might well find that participants in the program increase the amounts of material they outsource for digitization in some categories and reduce it in others. Thus members might outsource more digitization of more straightforward material such as modern western print materials and use the internal capacity they are thus able to free up to focus on the in-house digitization of more difficult material such as manuscripts. Vendors should be prepared to offer discounted pricing on the entire range of material our members digitize. But our members recognise that some material may be easier and cheaper to outsource than others. We suspect that manuscript encoding is the type of encoding that members will be the most willing to do in-house or on a negotiated basis due to the expertise required.
What will be the ratio of materials in Non-Western vs. Western character sets?
Approximately half the respondents to SQ17 in our survey report digitizing materials in non-Roman character sets.
What is the likelihood of receiving materials in languages other than English?
Significantly more than half (68.7%) of the respondents to SQ 18 in our survey report digitizing materials in languages other than English. Vendors should consider it highly likely that they will get non-trivial amounts of material in languages other than English.
What are the minimum standards for validation and character encoding?
As a precondition of participation in this program, members will agree to use the TEI Tite XML tagset. This is a customization of the more general TEI P5 Guidelines, focussed on representing common document structures and minimizing keystrokes. It is based on the current practice of several large libraries and digitization projects in the United States. Neither members nor vendors participating in this benefit in its standard form should request changes to the schema, although vendors and customers are free to negotiate additional "add on" services, including additional or custom extensions.
Vendors should expect to encode texts submitted as part of this program in XML that can be validated against both the TEI Tite DTD (to test valid structure and to add the proper namespaces) and its RELAX NG schema (for its additional datatype checks).
Character encoding must be Unicode. We are willing to work with vendor preferences within this broad rubric.
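Vendors may find it useful to run a quick pre-check before full validation. The stdlib-only Python sketch below (the `precheck` function is hypothetical) confirms that a file decodes as UTF-8, assuming UTF-8 is the Unicode encoding agreed with the vendor, that it is well-formed XML, and that its root is a `TEI` element. It is not a substitute for validation against the Tite DTD and RELAX NG schema, which requires a validating tool.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace

def precheck(data: bytes) -> list:
    """Return a list of problems; an empty list means the quick checks passed.

    This is not schema validation: it only checks encoding, well-formedness,
    and the name of the root element.
    """
    problems = []
    # 1. Encoding: assumes UTF-8 is the agreed Unicode encoding.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    # 2. Well-formedness (no DTD or schema is consulted here).
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # 3. Root element: accept a bare <TEI> as well as the namespaced form,
    #    since DTD validation is what adds the proper namespaces.
    if root.tag not in ("TEI", f"{{{TEI_NS}}}TEI"):
        problems.append(f"unexpected root element {root.tag!r}")
    return problems
```

Full validation could then be performed with a validating library or tool, for example lxml's `etree.DTD` and `etree.RelaxNG` classes or the Jing RELAX NG validator.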
Management and Administration
This proposed program is an attempt to channel latent demand for digitization from (primarily) academic organisations towards a preferred vendor in order to improve the membership base of the TEI as a non-profit society. We expect that this will both create significant new business for the preferred vendor and provide the TEI with a valuable benefit to attract new members.
In preparing bids, vendors should take into consideration the nature of the TEI as an organization. The primary business of the TEI is the production and maintenance of its Guidelines. Its administration and staff are almost entirely volunteer and made up for the most part of senior academics and academic administrators at major research universities in North America and Europe. In other words, most day-to-day administration of the TEI is carried out by professionals alongside their demanding regular work. These volunteers are also commonly members of other professional organizations and boards and well known in their fields (primarily Library and Information Science, government agencies, and the scientific and humanities research communities). This means that a vendor offering a well-crafted and smoothly run benefit has the opportunity to become widely known in the broader community in which the TEI operates as well as within the organization itself. But it also means that the TEI itself has limited resources available for hands-on administration. The most attractive proposals will automate much of the administration, or alternatively offer such attractive terms that the likely increase in membership would make it cost-effective for the TEI to hire dedicated administrative staff.
What is the current approach on aggregation and collection of the source material followed by the TEI?
The TEI currently does not aggregate or coordinate the digitization of source material for its members. This is a new process and a new benefit.
This RFP is part of a series of new initiatives the TEI is pursuing to improve the value received by its member institutions. For example, the TEI is currently implementing a new Zencart-powered shopping cart system which will be used for most aspects of member management, book sales, and conference and workshop registration. The goal of this and other initiatives is to reduce the labor involved in TEI business processes while improving member access. Responses to this RFP should align with the goals of this organizational effort, although it is important to stress that the TEI is not requiring vendors to design for any particular software or service platform.
In other words, we are open to creative ideas that will help us deliver this benefit to our members in a way that keeps our administrative overhead low. This means that there is considerable scope for creativity on the part of vendors in proposing how the program should be administered. The TEI has not previously operated such a benefit, so no prior system needs to be taken into account. The Zencart system is in the development stage and can easily be adapted to suit the needs of vendors should they wish to work with it; but we have no particular requirement that any proposed system work with our shopping cart (contact firstname.lastname@example.org if you require access to our shopping cart test server). We can run the program under the branding of tei-c.org and its (forthcoming) webstore, or from a URL of the vendor's choosing. Since the TEI does not have full-time staff, we believe it would be good for a considerable part of these transactions to be self-service for TEI members (with checks to ensure that the benefit is realised by members only); but we are also open to proposals that involve more hands-on administration if it seems likely that we would be able to recover the administrative costs through the increased membership.
How does TEI envision the management of the subscription to the sites where materials for aggregation will be sourced from? Is this something that vendors should take into the costing consideration or something that will be taken care of by TEI members?
At a minimum, vendors should expect the TEI to be able to provide them with real-time information about participants' membership status. For example, members could be given a code indicating the status and duration of their membership. They could then enter this code when purchasing discounted products and services. This code could be used either within the shopping cart system (as will be the case with our conference registration and print-on-demand publications) or outside the shopping cart system at a vendor's own site (as is currently the case for print-on-demand publications sold for us by Omnipress, and will be the case with some remaindered publications to be sold for us by the University of Virginia Press). It may also be possible to share access to our membership database, provided suitable privacy protections are maintained. We are also open to other suggestions.
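A membership code that indicates status and duration could be made tamper-resistant, so that a vendor's own site can check it without querying the TEI. The sketch below is one hypothetical design, not an existing TEI system: the TEI and the vendor share a secret key, and each code carries a member ID, an expiry date, and a truncated HMAC-SHA256 signature over both. The key value, code format, and the names `issue_code` and `check_code` are all assumptions.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical shared secret held by the TEI and the vendor (never hard-code in practice).
SECRET = b"replace-with-a-shared-secret"

def _sign(payload: str, secret: bytes) -> str:
    """Truncated HMAC-SHA256 signature of the payload, as 16 hex characters."""
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def issue_code(member_id: str, expires: date, secret: bytes = SECRET) -> str:
    """Create a code of the form MEMBERID.YYYYMMDD.SIGNATURE."""
    payload = f"{member_id}.{expires:%Y%m%d}"
    return f"{payload}.{_sign(payload, secret)}"

def check_code(code: str, today: date, secret: bytes = SECRET) -> bool:
    """True if the code is authentic and the membership has not yet expired."""
    try:
        member_id, expiry_str, sig = code.rsplit(".", 2)
        expires = date(int(expiry_str[:4]), int(expiry_str[4:6]), int(expiry_str[6:]))
    except ValueError:
        return False  # malformed code or impossible date
    expected = _sign(f"{member_id}.{expiry_str}", secret)
    return hmac.compare_digest(sig, expected) and today <= expires
```

Truncating the signature to 16 hex characters keeps codes short enough to type; further status information (such as membership category) could be added to the signed payload in the same way.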
As far as possible, vendor proposals should avoid systems that require TEI personnel to play an active role in their day-to-day operation, though we realize that at some point significant price differences may hinge on the ability and willingness of the TEI to assume some administrative responsibility. We invite vendors to indicate those points and the consequent alternative pricing: since lower costs will lead to a more successful program, a very successful program might increase our membership to the point where we could afford to pay for project management personnel.
Will contracts (Services Agreements, Statements of Work) and invoicing/billing go through a single point at TEI or will they be negotiated with each member?
Our preference is that invoicing and billing should be directed to individual members. It is possible, however, that the TEI could invoice and collect from members on the vendor's behalf, either where the pricing allows the TEI to assume more administrative responsibility (see above) or where the vendor's system is closely integrated with the Zencart shopping cart. We are currently planning to do this with a print-on-demand publisher whose book we will begin selling from our shopping cart once it is launched (scheduled for later this month pending final board approval).
Will the TEI be the single point of contact for all projects coming from different consortium members, or will each TEI member have a project manager to coordinate with the vendor?
Most issues having to do with digitization requests by individual members will be discussed with representatives of the individual members themselves, or handled through a self-service system, if they are not explicitly negotiated as part of this RFP. Such issues might include time-to-delivery for specific jobs (if that falls outside of negotiated parameters), billing, initial quality assurance questions, and additional costs for services not included in the basic member benefit (e.g. scanning or the conversion of legacy file formats). Vendors can expect to deal with the TEI on programmatic issues (e.g., needed adjustments to the benefit), and in case of disputes that cannot be resolved between the vendor and a participating member.
Thus, as part of this RFP process, we expect to develop a standard workflow, contract, and pricing schedule to which participating members would be expected to adhere as a condition of their participation. But we would also expect the vendor to deal directly with the participating members with regard to the encoding and delivery of digitized material, unless price differentials in the benefit are such that we think we would be able to hire project management personnel as intermediaries.
Regardless of the final management system of the benefit, the nature of our members' material and professional focus is such that they will invariably have project managers overseeing their own digitization work. Except in the most straightforward cases, vendors should expect some contact with these managers.
As the materials aggregated are raw source, are we expected to do the inventory?
The precise nature of inventory control will depend on the nature of the proposed management system. This might involve automatically assigning file IDs, DOIs, or other identifiers, identifying files by member ID and batch/file number, or anything else the vendor considers advantageous and cost-effective. Unless specifically requested by the vendor as a means of reducing costs, we expect that metadata that is usually stored in the TEI Header will be managed by TEI members outside of the processes entailed in this RFP. The TEI Tite schema in fact omits the TEI Header in order to streamline this processing. We are, however, happy to work with a vendor who would prefer to use the TEI Header for inventory control.
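As one example of identifying files by member ID and batch/file number, the sketch below builds and parses identifiers of the form `M042-0007-00012`. The format, field widths, and the names `make_file_id` and `parse_file_id` are hypothetical; a vendor might equally use DOIs or its own internal scheme.

```python
import re

# Hypothetical scheme: <member code>-<4-digit batch>-<5-digit file number>.
ID_PATTERN = re.compile(r"(?P<member>[A-Z0-9]+)-(?P<batch>\d{4})-(?P<file>\d{5})")

def make_file_id(member: str, batch: int, file_no: int) -> str:
    """Build an inventory ID such as 'M042-0007-00012'."""
    member = member.upper()
    if not re.fullmatch(r"[A-Z0-9]+", member):
        raise ValueError(f"member code must be alphanumeric: {member!r}")
    if not (0 <= batch < 10_000 and 0 <= file_no < 100_000):
        raise ValueError("batch or file number out of range for this format")
    return f"{member}-{batch:04d}-{file_no:05d}"

def parse_file_id(file_id: str) -> tuple:
    """Split an inventory ID back into (member code, batch, file number)."""
    m = ID_PATTERN.fullmatch(file_id)
    if m is None:
        raise ValueError(f"not a valid inventory ID: {file_id!r}")
    return m["member"], int(m["batch"]), int(m["file"])
```

Fixed-width numeric fields keep identifiers sortable by batch and file, which may simplify reconciling deliveries against submissions.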