These terms have very specific meanings in the CITE architecture. A “text” is…
…a collection of leaf-nodes consisting of character-data and well-formed XML markup,
…each leaf node having a specified place in a sequence, and
…a specified position in a citation-hierarchy at least one level deep,
…and the collection as a whole taking its place in an ontological hierarchy of text-group, work, edition/translation.
Or, in slightly more general terms, something is a text if it consists of language, if you can cite it as a whole unambiguously, if you can cite its parts unambiguously, and if it is intended to be read in a sequence.
A “collection” is…
…a group of data-objects each consisting of one or more fields, but each object having the same fields,
…which objects may be in a sequence (an ordered collection), or not (an unordered collection),
…having no citation-hierarchy beyond the object’s identifier, although individual fields may be addressed.
Or, in other words, a dictionary is a “collection”, as is a database of plant-records, a telephone directory, and the digital representations of the folios of a book.
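The two definitions can be sketched as data structures. What follows is only an illustration in Python, not the CITE architecture's own code: the class names `LeafNode`, `Text`, and `Collection`, their attributes, and the sample values are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LeafNode:
    """One citable passage of a text (hypothetical type, not a CITE library class)."""
    citation: List[str]  # position in the citation hierarchy, at least one level
    sequence: int        # place in the reading order
    content: str         # character data, possibly with XML markup


@dataclass
class Text:
    """A text sits in the hierarchy text-group > work > edition/translation."""
    text_group: str
    work: str
    version: str         # an edition or a translation
    nodes: List[LeafNode] = field(default_factory=list)

    def read(self) -> List[str]:
        """Return the passages in reading order."""
        return [n.content for n in sorted(self.nodes, key=lambda n: n.sequence)]


@dataclass
class Collection:
    """A collection: objects with identical fields, cited only by identifier."""
    fields: List[str]
    ordered: bool
    objects: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add(self, identifier: str, record: Dict[str, str]) -> None:
        # Every object must carry exactly the fields the collection declares.
        if set(record) != set(self.fields):
            raise ValueError("every object must have the same fields")
        self.objects[identifier] = record

    def get(self, identifier: str, field_name: Optional[str] = None):
        """Address a whole object, or one of its fields."""
        record = self.objects[identifier]
        return record if field_name is None else record[field_name]


# Invented sample data: a two-line "sonnet" and a one-entry phone book.
sonnet = Text("some-poet", "some-sonnet", "some-edition", [
    LeafNode(["2"], 2, "second line"),
    LeafNode(["1"], 1, "first line"),
])
phone_book = Collection(fields=["name", "number"], ordered=True)
phone_book.add("entry-1", {"name": "John Smith", "number": "555-0100"})

print(sonnet.read())                        # ['first line', 'second line']
print(phone_book.get("entry-1", "number"))  # 555-0100
```

The difference in how each is used shows in the methods: the text offers `read()`, which depends on sequence, while the collection offers `get()`, which depends only on an identifier.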
Clearly there is some possible overlap and ambiguity. It would be possible to treat a sonnet as an “ordered collection of poetic lines”; it would be possible to deliver a telephone directory as a “text with a one-level deep citation-hierarchy”. You can argue whether a telephone directory or a dictionary is an “ordered” or an “unordered” collection: is alphabetical order inherent, or merely convenient? In the case of the sonnet and the phone book, some reflection on the principal mode of interaction resolves the question: a sonnet is intended to be read (even if an analysis of a sonnet might pull out and quote particular lines), and a phone book is intended for random access (even if we occasionally read down a column to find just which John Smith we want to call).
In the case of Horti Sicci 212 and 232, the herbarium volumes of Mark Catesby that Amy and Patrick have begun recording, and which will be the first volumes from the Sloane to go online, we faced a trickier question. Attached to each specimen are one or more labels, and these labels fall into discrete categories (as Amy described in the previous post): clearly all the labels by Hans Sloane are one set, all the copperplate labels another, and those by Richard Howard yet another. Is each of these sets a “text” or a “collection”?
My first instinct was “text”. We are entirely indebted to the Homer Multitext for our vision of a digital herbarium, and the tools with which to create one. The HMT offers what would seem to be a very clear parallel: the scholia to the poetic text of the Iliad on Byzantine manuscripts. These are marginal notes on the text; there are different categories of notes, and the editors of the HMT have successfully treated each category as a separate “text”. (The full online catalogue of those texts is here). It seemed to make sense to follow this established standard, which the students at Holy Cross, Furman University, and the University of Houston have used to make significant discoveries about the history of Greek epic poetry.
The Venetus A Manuscript of the Iliad: One folio, many texts
I talked through this question with Neel Smith of the College of the Holy Cross, whose instinct about the nature of knowledge and its digital avatars is unerring. And I now think the collected labels are not texts, but collections. Here’s why. Each scholion has a place in a sequence only because it refers to a specific part of the poem on which it comments. It is worth reading the scholia in order only because we read the Iliad in order. The labels do not comment on a text, but on what is clearly a collection (the collection of Catesby’s specimens), and an unordered collection at that.
So it does not seem that we do violence to the labels by considering them members of collections. At this point, we can set aside ontological rigor and ask questions of convenience.
Catesby, H.S. 212, page 4: One folio, many collections
What advantages did the editors of the Homer Multitext gain from decreeing the scholia to be Texts rather than Collections? Internal markup, more than anything. The scholia are in Greek; they are argumentative; they discuss linguistic features of the poem, geographical places, personal names, and often quote from other literature. All of these are features that are inherent to the content of the scholia, and varied enough to justify internal markup of an XML text. While our botanical labels offer some of these features, each label is short, and likely to contain at most a few external references (place-names, cross-references, vel sim.), so we can better capture that information through indexing.
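To make “indexing” concrete, here is a minimal sketch in Python of what such an index might look like: a flat list of rows relating a label's identifier to something it mentions, kept outside the label's own transcription. The identifiers, the reference kinds, and the function `build_index` are all invented for illustration; they do not describe our actual data or any CITE tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical index rows: (label identifier, kind of reference, value).
# Every identifier and value here is a placeholder.
rows: List[Tuple[str, str, str]] = [
    ("label-001", "place", "Place A"),
    ("label-002", "place", "Place A"),
    ("label-002", "cross-reference", "label-001"),
]


def build_index(rows: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group label identifiers by the (kind, value) pair they refer to."""
    index = defaultdict(list)
    for label_id, kind, value in rows:
        index[(kind, value)].append(label_id)
    return dict(index)


index = build_index(rows)
print(index[("place", "Place A")])  # ['label-001', 'label-002']
```

Because the references live in separate rows, the label itself stays a simple object with a fixed set of fields, and no markup inside the transcription is needed to find every label that mentions a given place.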
So this is simply one example of the kind of thinking that occupies us as we plan and proceed to publish these snapshots of botanical history.