e-text (from "electronic text"; sometimes written as etext) is a general term for any document that is read in digital form, and especially a document that is mainly text. For example, a computer-based book of art with minimal text, or a set of photographs or scans of book pages, would not usually be called an "e-text". An e-text may be a binary file or a plain text file, viewed with any open-source or proprietary software. An e-text may include markup or other formatting information, or not. An e-text may be an electronic edition of a work originally composed or published in other media, or may have been created in electronic form originally. The term is usually synonymous with e-book.
First, this narrow type of "e-text" is limited to the letters of the English alphabet. Not even Spanish ñ or the accented vowels used in many European languages can be represented (unless awkwardly and ambiguously as "~n" or "a'"). Asian, Slavic, Greek, and other writing systems cannot be represented at all.
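The character-set limit can be seen directly in a short Python sketch. The helper names `fits_in_ascii` and `to_ascii` are illustrative assumptions, not part of any e-text standard; they show only that a character outside 7-bit ASCII is lost when a text is forced into that encoding.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range."""
    return text.isascii()


def to_ascii(text: str) -> str:
    """Force text into ASCII, replacing each unrepresentable character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


word = "ma\u00f1ana"  # Spanish "mañana", written with an escape for clarity

print(fits_in_ascii(word))  # False
print(to_ascii(word))       # ma?ana
```

Once the character has been replaced, nothing in the resulting file records what it originally was.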
Second, diagrams and pictures cannot be accommodated, and many books have at least some such material; often it is essential to the book.
Third, "e-texts" in this narrow sense have no reliable way to distinguish "the text" from other things that occur in a work. For example, page numbers, page headers, and footnotes might be omitted, or might simply appear as additional lines of text, perhaps with blank lines before and after (or not). An ornate separator line might be represented instead by a line of asterisks (or not). Chapter and section titles, likewise, are just additional lines of text: they might be detectable by capitalization if they were all caps in the original (or not). Even discovering which conventions (if any) were used makes each book a new research or reverse-engineering project.
In consequence of this, such texts cannot be reliably re-formatted. A program cannot reliably tell where footnotes, headers or footers are, or perhaps even paragraphs, so it cannot re-arrange the text, for example to fit a narrower screen, or read it aloud for the visually impaired. Programs might apply heuristics to guess at the structure, but this can easily fail.
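A minimal sketch of such a heuristic, in Python, shows both how it works and how it misfires. The function name `guess_blocks`, the role labels, and the 60-character threshold are assumptions chosen for illustration; real plain-text files follow no agreed convention.

```python
import re


def guess_blocks(text: str) -> list[tuple[str, str]]:
    """Split text on blank lines and label each block with a guessed role."""
    blocks = []
    for raw in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(line.strip() for line in raw.splitlines())
        if re.fullmatch(r"[\s*\-_=]+", block):
            role = "separator"      # e.g. a line of asterisks
        elif re.fullmatch(r"\d+", block):
            role = "page_number"    # a block consisting only of digits
        elif block.isupper() and len(block) < 60:
            role = "heading"        # short and all caps: probably a title
        else:
            role = "paragraph"
        blocks.append((role, block))
    return blocks


sample = (
    "CHAPTER ONE\n\n"
    "It was a dark night.\nThe rain fell.\n\n"
    "* * *\n\n"
    "42\n\n"
    "NASA LAUNCHED IT.\n"
)

for role, block in guess_blocks(sample):
    print(role, "|", block)
```

The last block is an ordinary sentence that happens to be in capitals, yet the heuristic labels it a heading. A block reading "42" could equally be a page number or a one-word paragraph; the file itself gives no way to decide.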
Fourth, and perhaps surprisingly important, a "plain-text" e-text affords no way to represent information about the work. For example, is it the first or the tenth edition? Who prepared it, and what rights do they reserve or grant to others? Is this the raw version straight off a scanner, or has it been proofread and corrected? Metadata relating to the text is sometimes included with an e-text, but there is by this definition no way to say whether or where it is present. At best, the text of the title page might be included (or not), perhaps with centering imitated by indentation.
Fifth, texts with more complicated information cannot really be handled at all. Examples include a bilingual edition, a critical edition with footnotes, commentary, critical apparatus, or cross-references, and even the simplest tables. This leads to endless practical problems: for example, if the computer cannot reliably distinguish footnotes, it cannot find a phrase that a footnote interrupts.
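The footnote problem can be reproduced in a few lines of Python. The sample page, the function names, and the idea of passing in the footnote's line index are all hypothetical; the last of these stands in for what explicit markup would supply and plain text cannot.

```python
page = (
    "The treaty was signed in the\n"
    "1 See the appendix for the full text.\n"
    "spring of 1648 by all parties."
)


def find_phrase(text: str, phrase: str) -> bool:
    """Search for a phrase, treating any run of whitespace as one space."""
    normalized = " ".join(text.split())
    return phrase in normalized


def without_lines(text: str, skip: set[int]) -> str:
    """Drop the lines whose zero-based index is in `skip`."""
    lines = text.splitlines()
    return "\n".join(line for i, line in enumerate(lines) if i not in skip)


# The footnote sits between the two halves of the phrase, so the search fails.
print(find_phrase(page, "signed in the spring"))                      # False

# If something identified line 1 as a footnote, the search would succeed.
print(find_phrase(without_lines(page, {1}), "signed in the spring"))  # True
```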
Even raw OCR output from a scanner usually carries more information than this, such as the use of bold and italic. If this information is not kept, it is expensive and time-consuming to reconstruct; more sophisticated information, such as which edition you have, may not be recoverable at all.
In actuality, even "plain text" uses some kind of "markup", usually control characters, spaces, tabs, and the like: spaces between words, or two returns and five spaces to mark a new paragraph. The main difference from more formal markup is that "plain texts" use implicit, usually undocumented conventions, which are therefore inconsistent and difficult to recognize.
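As a rough illustration of why implicit conventions are hard to recognize, the Python sketch below applies two plausible paragraph conventions to the same files. Both splitter functions and both sample strings are assumptions made for the example; neither convention is documented in any actual plain-text file.

```python
import re


def split_on_blank_lines(text: str) -> list[str]:
    """Convention A: a blank line separates paragraphs."""
    return [p for p in re.split(r"\n[ \t]*\n", text) if p.strip()]


def split_on_indent(text: str, indent: int = 5) -> list[str]:
    """Convention B: a line indented by `indent` spaces starts a paragraph."""
    paragraphs: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no meaning under this convention
        if line.startswith(" " * indent) or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(line.strip())
    return [" ".join(p) for p in paragraphs]


indented = "     First paragraph begins here\nand continues.\n     Second paragraph."
spaced = "First paragraph.\n\nSecond paragraph."

print(len(split_on_indent(indented)), len(split_on_blank_lines(indented)))  # 2 1
print(len(split_on_indent(spaced)), len(split_on_blank_lines(spaced)))      # 1 2
```

Each splitter finds both paragraphs only in the file that follows its own convention, and nothing in either file says which convention applies.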
The narrow sense of e-text as "plain vanilla ASCII" has fallen out of favor. Nevertheless, many such texts are freely available on the Web, perhaps as much because they are easily produced as because of any purported portability advantage. For many years Project Gutenberg strongly favored this model of text, but over time it has begun to develop and distribute more capable formats such as HTML.