Formats
Goals for electronic transcription formats include:
- Non-proprietary file format. Files must be accessible without buying proprietary software. Data will be stored in plain text files, using open, internationally recognized standards for character encoding.
- Human-readable markup. Markup -- a set of annotations to text that describe how it is to be structured, laid out, or formatted -- should be accessible to users.
- Compliant with widely used standards. Use of standards facilitates interoperability and longevity.
- Platform independent. The file formats used should be usable on as wide a variety of hardware platforms as possible.
- Backwards-compatible. File formats should be chosen with due consideration of older, entrenched formats that may still be widely used. Files should not be incompatible with these formats without good reason.
- Incorporate meta-data in standard formats. Meta-data is data describing the file itself, for example, bibliographical data. This data should be incorporated into the file, so that it doesn't become separated from it. Standard formats should be used so that it may be automatically parsed by third-party software.
- Accessible to online search and retrieval engines. File and meta-data formats used should facilitate visibility by internet search engines and automatic library catalogs.
- Accessible to non-technical users, using commonly available web browsers.
- Convenient to download to local disk or printer, so users can more easily read materials off-line.
- Single format. A given etext should not have to be maintained in multiple formats, since the copies may diverge as corrections are made.
In furtherance of these goals, text materials will be maintained as XHTML 1.1-compliant files. Metadata will be embedded in the files, using the Dublin Core standard. CSS stylesheets will be used to format documents for screen and print. Images will be stored in JPEG format.
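As a rough illustration of what such a file might look like, the following Python sketch builds a minimal XHTML 1.1 page with Dublin Core metadata in the head and then reads the metadata back with a standard XML parser. The function names, the stylesheet names (screen.css, print.css), and the sample values are placeholders invented for this example; the `DC.` prefix and `schema.DC` link follow the usual convention for embedding Dublin Core in HTML, but the project's actual files may differ in detail.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Skeleton of an XHTML 1.1 page; stylesheet names are hypothetical.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>{title}</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
{meta}
<link rel="stylesheet" type="text/css" href="screen.css" media="screen" />
<link rel="stylesheet" type="text/css" href="print.css" media="print" />
</head>
<body>
<p>{body}</p>
</body>
</html>
"""


def build_page(title, body, dc):
    """Return an XHTML 1.1 page whose head carries the Dublin Core fields in `dc`."""
    meta = "\n".join(
        # quoteattr escapes the value and adds the surrounding quotes
        f'<meta name="DC.{name}" content={quoteattr(value)} />'
        for name, value in dc.items()
    )
    return TEMPLATE.format(title=escape(title), meta=meta, body=escape(body))


def dublin_core(page):
    """Parse the page as XML and return its meta name/content pairs."""
    root = ET.fromstring(page.encode("utf-8"))
    ns = {"x": "http://www.w3.org/1999/xhtml"}
    return {m.get("name"): m.get("content")
            for m in root.iterfind("x:head/x:meta", ns)}


if __name__ == "__main__":
    page = build_page(
        "Sample Diary",                      # placeholder values
        "Text of the transcription goes here.",
        {"title": "Sample Diary", "creator": "A. Writer", "format": "application/xhtml+xml"},
    )
    print(dublin_core(page))
```

Because the page is well-formed XML, third-party software can recover the metadata with an ordinary XML parser, as `dublin_core` does here.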
Supplemental markup may be added to the files, as time permits. This markup would indicate things like names, places, dates, events, page numbers, or chapter numbers. This supplemental markup could eventually be used, in conjunction with special-purpose scripts, to perform more complex analyses or cross-referencing of the materials.
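To give a sense of what a special-purpose script could do with such markup, here is a small Python sketch. It assumes, purely for illustration, that names, places, and dates are tagged with span elements carrying a class attribute; that convention, the function name, and the sample sentence are all hypothetical and not a description of the markup actually used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XHTML = "{http://www.w3.org/1999/xhtml}"


def collect_marked_terms(xhtml_text, classes=("name", "place", "date")):
    """Group the text of every <span class="..."> by class, in document order."""
    root = ET.fromstring(xhtml_text.encode("utf-8"))
    found = defaultdict(list)
    for span in root.iter(XHTML + "span"):
        cls = span.get("class")
        if cls in classes:
            # itertext() also picks up text nested inside child elements
            found[cls].append("".join(span.itertext()).strip())
    return dict(found)


# Invented sample; the class names are an assumed convention.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Sample</title></head>
<body><p>On <span class="date">4 July 1850</span>,
<span class="name">John Doe</span> arrived in
<span class="place">Springfield</span>.</p></body></html>"""

if __name__ == "__main__":
    print(collect_marked_terms(SAMPLE))
```

An index built this way across many files could serve as the basis for cross-referencing people, places, and dates.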
Since this project began in the mid-1990s, before WWW standards were mature, many of the earlier files are still in older formats. An ongoing (but low-priority) effort is being made to update these files.
Because users bookmark and reference files in this website, the URLs will be kept stable, and will not be changed without good reason. If a change is necessary, symbolic links will be used so that old URLs will continue to work for a period of time to allow references and links to be updated.
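The symbolic-link approach can be sketched in Python as follows. This is only an illustration, assuming a POSIX-style filesystem where the web server follows symbolic links; the function name and the example paths are hypothetical.

```python
import os
from pathlib import Path


def move_with_symlink(old_path, new_path):
    """Move a file to new_path and leave a symbolic link at old_path."""
    old_path, new_path = Path(old_path), Path(new_path)
    new_path.parent.mkdir(parents=True, exist_ok=True)
    old_path.rename(new_path)
    # A relative target keeps the link valid if the whole tree is relocated.
    target = os.path.relpath(new_path, start=old_path.parent)
    old_path.symlink_to(target)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    move_with_symlink("old/diary.html", "texts/diary.html")
```

Once references have been updated, the link at the old location can be removed.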