Updates on research on historical botany, focusing on the Carolinas. This is a collaboration between researchers at the South Carolina Botanical Garden at Clemson University and Furman University’s Department of Classics.
Sunday, March 10, 2013
Botanica Caroliniana on the Radio
Amy Hackney Blackwell and Patrick McMillan appeared on the SCETV radio program, Walter Edgar’s Journal. They discussed botany, history, and the importance of curation, both real and digital. You can listen to the interview on the show’s website.
Monday, March 4, 2013
Indigo, before indigo was big
H.S. 232 f.106 contains a specimen of Indigofera tinctoria
L., indigo. Solander identified it as such in a hand-written label. Indigo is the source of a blue dye that was in high demand in
Europe during the colonial period. (It is still used worldwide today, well
known as the dye that makes denim blue.) In principle, it should not be
surprising to find indigo among specimens Catesby collected in Carolina. This
plant was cultivated as an export crop on the Coastal Plain of Georgia and
South Carolina in the 17th and 18th centuries. But Catesby’s
specimen predates the widespread commercialization of this plant; South
Carolina’s indigo industry did not emerge until about 1740. Is there a story?
Well, maybe. Blue dye was a bigger deal than moderns might appreciate. Indigo was the stuff of fortunes, but only for a while. Economies rose and fell on the human desire for blue cloth.
Before 1500, European cloth manufacturers wishing to dye fabric blue used woad, Isatis
tinctoria. Woad is a plant in the mustard family Brassicaceae. It produces the same pigment as indigo but in lower concentrations. It was also the pigment that ancient Britons used to color their bodies blue, the better to dismay the Romans.
Indigo is a better dye than woad. It became available in Europe after trade
opened up with the East Indies, and by the late 1600s indigo had become the
blue dye of choice for European textile manufacturers. (There was a time when indigo was controversial, and regions passed laws to protect their woad-growers from competition. How times do change.) French, English, and
Spanish colonists began growing indigo in the Americas in the first half of the
17th century. Indigo grew well
in the Caribbean and Central America.
The Lords Proprietors of Carolina began experimenting with
indigo cultivation in the 1670s. The climate of coastal South Carolina proved ideal for
growing the crop, and the plants in these initial experimental
gardens grew well. By the 1690s, however, the South Carolina indigo experiment had been largely
abandoned as economically unviable; West Indian indigo was of higher quality, and rice was a more profitable crop in the Carolinas. In the 1740s
Carolina growers once again attempted to grow indigo – Eliza Lucas Pinckney is
often credited with establishing the South Carolina indigo industry – and this
time the crop was immensely profitable and a good supplement to rice culture. This
was true despite the fact that Carolina indigo had a reputation for being of
poor quality. By the mid-1700s, European nations were importing two
million pounds of indigo annually from the Western Hemisphere.
Indigo’s profitability to South Carolina lasted only a few decades. By 1800, cotton
had replaced it as the cash crop of choice. Indigo perked along as a dye crop for another century, but these days most blue dye is synthetic.
Catesby’s indigo specimen predates the establishment of the
Carolina indigo industry by nearly twenty years. This plant may have been a
remnant of earlier experiments by the first group of settlers, perhaps using
seeds imported from Barbados or Jamaica. It is likely not related to Eliza
Pinckney’s later crops, which she grew from seeds her father sent her from the
West Indies.
If you want to read more about the history or economic implications of the SC indigo industry, try:
Coon, David L. 1976. “Eliza Lucas
Pinckney and the Reintroduction of Indigo Culture in South Carolina.” The
Journal of Southern History 42 (1) (February 1): 61–76.
doi:10.2307/2205661.
Nash, R.
C. 2010. “South Carolina Indigo, European Textiles, and the British Atlantic
Economy in the Eighteenth Century.” The Economic History Review 63 (2):
362–392. doi:10.1111/j.1468-0289.2009.00487.x.
Tuesday, September 18, 2012
Taxonomy - what are we missing?
Theophrastus looked at plants and asked what distinguished one from the other. He saw differences in trees, shrubs, herbs, in roots and leaves, from one plant to the next, and so created the basics of modern taxonomy. Dioscorides looked at plants and asked what they could do for humans. He saw differences in uses, in medical applications and foods, and codified an early version of economic botany.
As fortune would have it, Dioscorides' work thrived from ancient times until the Renaissance. Theophrastus got buried in a pit and mostly forgotten. Plant taxonomy as we know it languished until Europeans regained an interest in the natural world. One big problem? No one knew what to call plants. No one knew if they were talking about the same plant. Local names were handy locally, but worse than useless scientifically. It took Linnaeus and his system of binomial nomenclature to finally make it possible to have a coherent discussion about a particular plant and for all participants to be certain that they were discussing the same thing. (See Anna Pavord's book The Naming of Names for an excellent overview.)
Linnaeus and his descendant taxonomists follow essentially the same rules as Theophrastus. They look at physical features of plants. Plants that share features get grouped together. Genetics and DNA mapping have allowed taxonomists to create groupings that are more evolutionarily accurate than groupings created by mere gross physical resemblance - for example, Sarracenia and Nepenthes are not close relatives even though they are both types of insectivorous pitcher plants - but the principle remains the same: plants are grouped on the basis of physical similarity.
All of this means that plant taxonomy is a weird, free-floating subject of study. We memorize identifying features such as placentation, leaf arrangement, number of carpels and ovary placement. We use these traits in dichotomous keys to place plant specimens into families, genera, or species. So, for example, we know that members of the family Salicaceae are trees or shrubs, the lowermost bud scale is centered over the leaf scar, the flowers are reduced, unisexual, and subtended by hairy bracts, and that the fruits are loculicidal capsules. Members of the Fabaceae have root-nodules containing nitrogen-fixing bacteria, alternate leaves that are typically compound, stipules that range from inconspicuous to leaf-like, and usually five sepals.
But what does that tell me? These synapomorphies, or shared features, are the result of evolution, of millions of years of natural selection. The social Hymenoptera, which include bees and ants, appeared during the Cretaceous (145-65 mya). Angiosperms appeared between 130 and 125 million years ago. The Asteraceae appeared around 50 million years ago, during the Cenozoic era. The Cenozoic started 65 million years ago, after the end of the Cretaceous, and after the dinosaurs disappeared. This was the age of mammals, but also of various other kinds of life, especially insects. The Asteraceae thrived and radiated during the unstable climate of the Pleistocene - 2.5 million to 12,000 years ago - downright recent by human standards.
Homo sapiens, on the other hand, appeared about 200,000 years ago, and only got to looking and acting modern 50,000 years ago. What have we to do with plants? They were all here when we arrived on the scene - whatever morphological features they "chose" were not chosen with us in mind. Plants are speaking a very different language from ours, and on a different timescale. They were formed by insects and other animals and climates and geography that we can never know.
This, I think, is why economic botany tends to win out over pure systematics. We teach students economically important plants in each major plant family - peaches come from Rosaceae, cotton from Malvaceae - because otherwise it is awfully difficult to persuade them that they should know the classifications. The plants simply aren't talking to us. We don't assess flowers for landing strips or accessibility for our probosces. A magnolia flower has a lot to say to a beetle and rose perfume says a lot to bees, but to us they're just pretty. The ginkgo still makes fruit that may have been custom-blended by dinosaurs, and the durian makes giant spiny fruits that might have been intended for elephants. (Connie Barlow, The Ghosts of Evolution.) No wonder we focus our study on "useful" plants - otherwise we feel left out of the discussion.
Pure taxonomy - we need the names named, so that we can know what we're talking about. But I find it tantalizing, dancing around real answers. How is it that I can study botany for years and still can barely identify a single insect? Why are we studying plants in splendid isolation, instead of pairing them with their many partners? Why doesn't taxonomy start with a chronology, slotting plants in where they appeared, along with animals and other living organisms and continental positions and climate? (That's a poster I've been planning to make for years. Complicated, it is.)
There is so much more to why plants are what they are than their medical and fiber applications. Applied plant science is all very well, but the full story is so much more interesting. Even if it's still fragmentary, we could be telling it.
Monday, September 10, 2012
An Update on Balsam Pears, and other Colonial Plants
Last week I went to Clemson’s library and checked out
Lawrence D. Griffith’s book Flowers and Herbs of Early America. This lovely
book, published in 2008, is an encyclopedia of plants that would have been
favorite garden plants in 18th century Virginia. Griffith is Curator
of Plants of Colonial Williamsburg, and this book is the result of a project he
conducted starting in 2001, in which he researched the plants that 1700s
Americans would have been growing, found seeds for them, and grew them all.
My big takeaway? Upper-class Americans and Brits wanted to
grow beautiful and exotic plants from other parts of the world. The list of 60
or so species he studied consists largely of species not from North America –
some from South America, some from Asia, many from Europe. Europeans, for their
part, wanted American plants. Even in the early 1700s ornamental gardeners were
growing exotics. Seeds traveled almost as fast as information does today.
That Balsam Pear I found in the Oxford collection, Momordica charantia – Griffith writes that it was introduced into Europe in 1710 but
before that was widely used in its native regions of Africa, Asia, and the
Caribbean. It was used as a stomach medicine and as a food crop. The balsam
apple, M. balsamina, on the other hand, was introduced into Europe much
earlier, appearing in Leonhart Fuchs’s 1542 herbal, in which he describes it as “planted
in many gardens.”
Griffith also grew Eupatorium perfoliatum, boneset, which
Catesby collected in South Carolina. It’s preserved at H.S. 212 f. 74. Boneset
is a native plant, commonly used as a treatment for fevers – the “bones” in
boneset refer to breakbone fever, or dengue, which I can attest makes one’s
entire skeleton ache. Griffith comments that Linnaeus named the species, suggesting
that John Clayton collected it, though no specimen exists in the Clayton Herbarium
in London. Catesby’s specimen doesn’t have a binomial label, so it appears that
Solander didn’t identify this one. Interesting….
Wednesday, August 29, 2012
Better Access to Images of Mark Catesby’s Work
We have updated our list of digital images of Mark Catesby’s Horti Sicci from the Sloane Herbarium, and his Natural History here:
http://folio.furman.edu/botcarweb/browseimg?urn=urn:cite:fufolioimg:BotCarCatesbyHS&offset=0&limit=25
This update takes advantage of new technologies for making these images more useful to a broad audience.
These digital images are served from the CITE Image Service, a technology developed for the Homer Multitext Project. The images and the service are hosted at the University of Houston’s High Performance Computing Center, thanks to the generosity and vision of Keith Crabb, its director.
The CITE Image Service provides canonical citation of images and regions-of-interest on those images. A “normal” request to the service has many parameters, allowing retrieval of images at different scales, portions of images, dynamic views of images, and various kinds of metadata. This can be complex, and complexity limits casual use.
Inspired by the Linked Ancient World Data Institute, an NEH-funded event at New York University in the summer of 2012, my collaborators Neel Smith, Ryan Bauman, and I have worked to make access to our data services simpler and more useful.
Each image in the Botanica Caroliniana collection is accessible via an HTTP-URI. That is, you can call up an image using something that looks like a normal URL. E.g.
http://folio.furman.edu/botcarimage/urn:cite:fufolioimg:BotCarCatesbyHS.Catesby_HS232_016_0636
That is, “http://folio.furman.edu/citeimg/” plus the canonical URN that defines the image.
This will invoke a “GetImagePlus” request, which will return:
* a view of the image…
* linked to a dynamic high-resolution view that you can zoom,
* its caption and statement of rights, and
* a link to the Image Citation Tool that allows scholars to generate URNs pointing to specific regions-of-interest on the image.
URNs that specify regions-of-interest work, too:
http://folio.furman.edu/botcarimage/urn:cite:fufolioimg:BotCarCatesbyHS.Catesby_HS232_016_0636@0.177,0.088,0.392,0.206
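For anyone who wants to assemble these URIs in a script, here is a minimal Python sketch. The base address is copied from the example URIs above. The function names are invented for illustration, and reading the four numbers as left, top, width, and height (each a fraction of the full image) is an assumption of this sketch rather than something stated in this post.

```python
BASE = "http://folio.furman.edu/botcarimage/"  # copied from the example URIs above


def image_uri(urn, roi=None):
    """Return an HTTP-URI for an image URN, optionally with a region-of-interest.

    roi is a sequence of four fractions between 0 and 1. Treating them as
    (left, top, width, height) is an assumption made for this sketch.
    """
    if roi is None:
        return BASE + urn
    if len(roi) != 4 or not all(0.0 <= v <= 1.0 for v in roi):
        raise ValueError("roi must be four fractions between 0 and 1")
    return BASE + urn + "@" + ",".join(str(v) for v in roi)


def split_roi(urn):
    """Split 'urn@a,b,c,d' into (urn, (a, b, c, d)); the region is None if absent."""
    plain, sep, region = urn.partition("@")
    if not sep:
        return plain, None
    return plain, tuple(float(v) for v in region.split(","))


urn = "urn:cite:fufolioimg:BotCarCatesbyHS.Catesby_HS232_016_0636"
print(image_uri(urn))
print(image_uri(urn, (0.177, 0.088, 0.392, 0.206)))
```

The second call reproduces the region-of-interest address shown above, so a list of regions could be turned into a list of citable links with one loop.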
The data returned by these URIs is raw XML, and thus easily processed programmatically. (The XML invokes a stylesheet, so any modern web browser will format it for human readers and browsers.)
The goal is, as ever, to give access to our data that is as flexible as possible, that constrains users as little as possible, and that makes possible research that is serious or casual, human-centered or automated, according to the needs of individual scholars and readers.
(Of course, the raw data of these images is directly available at http://amphoreus.hpcc.uh.edu/botcar/ .)
Friday, July 27, 2012
An African Melon in South Carolina
In Oxford’s Sherard collection of Mark Catesby specimens
lurks this tidbit, Sher-2195, a tendrilled vine with heavily dissected leaves,
characteristic of the cucumber family, Cucurbitaceae. Patrick has identified it
as Momordica charantia L., commonly known as Balsam pear, balsam apple, bitter
melon, and bitter gourd. The common name is a bit confusing because a related
species, Momordica balsamina, also goes by most of those names. No matter.
The notes on the specimen page read: “Bryonia.
Cucumis parvus Marianus, Bryonia alba foliis minoribus,
polycarpus Pluk. Manu (?) 59” and “Mr. Catesby S. Carolina USA(?) from the
upper part of the country.” The fact that this is in the Sherard collection in
Oxford suggests that this plant was collected by Mark Catesby in South
Carolina in 1722 or so. The note suggests that he found it some distance from the coast.
But this plant is not native to North America. Alan Weakley
says that the vines of the genus Momordica are native to the Old World tropics.
Momordica charantia is a native of Africa. Weakley’s flora contains no
distribution map for Momordica charantia (though it does for M. balsamina), and
notes only that the species has been found recently in the Panhandle of
Florida. The USDA Plants distribution map doesn’t have any record of this species
occurring in South Carolina.
So Mark Catesby cut a specimen of this plant in South
Carolina in 1722. How did it get there?
Charleston, SC, was founded in 1670. It was a major port,
and one of the points in North America where ships from Africa unloaded their
cargoes of slaves and African plants. Could an African cucurbit make its way
from Charleston to the “upper part of the country” by 1720? Fifty years is a
long time. Was the plant in a settler’s garden? Did someone bring seeds from
Europe?
Momordica was a known garden plant by the early 1800s.
Thomas Jefferson planted balsam apple, apparently M. balsamina, in his garden
at Monticello in 1810. The Monticello website claims that M. balsamina was
introduced into Europe by 1568. (The source for this claim is the book Flowers
and Herbs of Early America by Lawrence D. Griffith and Barbara Temple Lombardi,
Yale University Press, 2008. I need to check this out of the Clemson library as
soon as I head back to campus. I'm dying to know who brought it to Europe, where they got it, and how we know this.) Anyway, the plant had apparently been used as a
medicine in Europe for over two centuries and was attractive enough for Jefferson to consider the
seeds worth acquiring and planting in his annual bed.
Balsam apples appear in 18th and 19th
century American paintings. The Pope Brown Collection of South Carolina Natural
History contains this depiction of a Balsam Apple, either Momordica balsamina
or Momordica charantia, painted c. 1765-1775 – not so long after Catesby. The
Metropolitan Museum of Art has “Still Life: Balsam Apple and Vegetables, ” an
oil painting done by James Peale of Maryland. So there were definitely Momordica
growing on the East Coast between 1765 and 1820.
Eat The Weeds.com reports that M. charantia occurs from
Connecticut south to Florida and west to Texas, as well as in parts south. It’s
all over Florida today. This website comfortingly informs me that no one knows
where it came from originally.
M. charantia certainly has a global distribution today. It’s
a common vegetable throughout Asia, Africa, South America and the Caribbean. (Here's a nice botanical illustration done by a Japanese high school student for the Tsukuba Botanical Garden.)
Despite its reputation for bitterness and the fact that it is poisonous if
eaten raw, it apparently enhances a variety of dishes. It also has been used as
a folk remedy for all manner of ailments for centuries, and today is the subject of
numerous studies of its pharmacological properties. Some experts think it might
be useful for controlling diabetes.
Thursday, July 26, 2012
Corpus Botany
Michael Dosmann, Curator of Living Collections at the Arnold Arboretum at Harvard University, gave this sage advice to anyone working to manage a collection of botanical data: “Don’t spend your life chasing taxonomy!” The world of botanical taxonomy is endlessly complex and dynamic, changing rapidly from month to month. It is a global effort to build a single hierarchical tree that captures reality, with contributors working from different directions using different and evolving techniques to understand a body of data that is expanding with new discoveries and shrinking from anthropogenic changes to the planet. It is built on a foundation that began with Theophrastus in the 4th Century BCE and was canonized by the 17th and 18th century natural philosophers of Europe, but this traditional foundation is being bent rather violently to accommodate three subsequent centuries of new understanding.
The traditional taxonomic ladder is captured in the mnemonic “King Philip Came Over For Good Sex” (shout-out to XKCD): Kingdom, Phylum, Class, Order, Family, Genus, Species.
But the Integrated Taxonomic Information System (ITIS) presents online users with the following: Kingdom, Subkingdom, Infrakingdom, Division, Subdivision, Infradivision, Superclass, Class, Subclass, Superorder, Order, Suborder, Family, Subfamily, Tribe, Subtribe, Genus… (I cut it off at the Genus level, since the point was made).
This list clearly shows an ongoing process of rebuilding-the-ship-as-it-sails, shoehorning sub- and super-categories into the ladder in order to reflect a growing understanding of increasing complexity.
Hence Dosmann’s advice: You can’t wait for this to get sorted out before getting down to work.
For Botanica Caroliniana we want to collect and juxtapose useful data on the history of botanical science. We are not in a hurry and are willing to take the time to work methodically, to separate concerns, to recognize that the underlying data is more important than an immediate, glossy online presentation. But we don’t want to wait forever.
And we need to name plants. These names must be unique, stable, and machine-actionable. Linnaean binomials are pretty good, and traditional. They are supposed to be unique. They are not stable, as further study will inevitably split species, rearrange genera; ITIS and IPNI (the Integrated Plant Names Index) will happily provide countless synonyms for any given Linnaean binomial. They are certainly not machine-actionable.
For our digital library we need machine-actionable identifiers that we can use now. They need to be unique and stable within the digital library, while accommodating subsequent changes to the scientific reality of the objects to which they point. Here we can borrow from the disciplines of information science and corpus linguistics.
Corpus Linguistics. It is extremely difficult to make assertions about “how the English Language works”; people keep saying new things, keep changing how they speak, keep encountering new situations that need new words and new constructions. It is much, much easier to make assertions about “the language of New York Times reporting from 1941 - 1945”: How did the NYT refer to the enemies and allies of the United States? What verbs did they use for military victory and defeat, for casualty figures, to describe economic hardships at home? Answers to those questions are easier to formulate, and can be assessed in the context of the explicitly defined corpus. Many answers that are valid for one corpus would be invalid for another. Racial slurs that were acceptable to the NYT in 1942 would never be allowed in print today; the language describing the Soviet Union (I bet) changed dramatically between 1943 and 1947. Corpus linguistics allows us to study real phenomena within defined constraints that make intractable datasets manageable.
Namespaces and Arbitrary Identifiers. Information scientists deal with large numbers of things. Amazon.com sells billions of products; they need to keep track of those products, to share information about them through the digital medium that the company inhabits. Under these circumstances it is immediately obvious that the acts of identification and description must be separate. This is not complicated: each product has a unique, machine-actionable ID, which points to a body of data that includes description, price, reader reviews, and so forth.
Digital librarians have a greater challenge than online merchants, since their IDs need to survive in the wild, outside the confines of a particular database. The “Rachael Ray 1.5 Quart Whistling Teakettle” has an Amazon ID of 1343145892. That ID, elsewhere on the Internet, points to a product in a Japanese cosmetics catalogue, a Seller Profile on Ebay.com, and a team-building event for Western Union Employees, to name a few.
The answer for a digital library is to use Namespaces. There are no doubt billions of digital objects in the online universe with an ID of 19. But there is only one with an ID that is urn:cite:botcar:sloane.19. That is, “a URN using the CITE protocol, in the BOTCAR namespace, in the Sloane collection, number 19.”
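To make the reading of such an identifier concrete, here is a small Python sketch that splits a URN of this simple shape into its parts. The function and field names are made up for illustration, and the sketch handles only the four-part form shown above; it is not a full parser for the CITE protocol.

```python
from collections import namedtuple

# Field names are illustrative, not taken from the CITE protocol itself.
CiteUrn = namedtuple("CiteUrn", "namespace collection object_id")


def parse_cite_urn(urn):
    """Split 'urn:cite:NAMESPACE:COLLECTION.ID' into its three meaningful parts."""
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "cite":
        raise ValueError("not a CITE URN of the simple form handled here: " + urn)
    namespace = parts[2]
    # Everything before the first dot is the collection; the rest is the object ID.
    collection, dot, object_id = parts[3].partition(".")
    if not dot or not object_id:
        raise ValueError("missing object identifier: " + urn)
    return CiteUrn(namespace, collection, object_id)


print(parse_cite_urn("urn:cite:botcar:sloane.19"))
```

The bare number 19 is meaningful only together with its namespace and collection, which is exactly what lets the full identifier survive outside any one database.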
Corpus Botany. For Botanica Caroliniana, we give each object of our interest a URN identifier: herbaria, folios in herbaria, specimens on folios, digital images of folios, and the notional species which these represent.
These species URNs are the glue that holds this digital library together. So the species Acer negundo L. has a URN: urn:cite:botcar:species.Acernegundo. The last element of the URN is somewhat human-readable, but it is worth emphasizing that this is an ID, an arbitrary identifier, and nothing more. As a data-object, urn:cite:botcar:species.Acernegundo identifies a notional species that we can label, for human readers, Acer negundo L., that we can supply with bibliography, or that we can link to an ITIS record (TSN serial no. 28749).
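The separation between identifying and describing can be sketched in a few lines of Python. The dictionary layout and field names below are hypothetical, chosen only to show the idea, and the revised label at the end is a placeholder rather than a real botanical name.

```python
SPECIES = "urn:cite:botcar:species.Acernegundo"

# Descriptions live beside the identifier, never inside it.
records = {
    SPECIES: {"label": "Acer negundo L.", "itis_tsn": 28749},
}

# Other objects refer to the species only by its URN.
specimen = {"urn": "urn:cite:botcar:sloane.422", "species": SPECIES}


def label_for(urn):
    """Look up the current human-readable label for an identifier."""
    return records[urn]["label"]


print(label_for(specimen["species"]))

# If taxonomists revise the name, only the description is edited.
# The URN, and every record that points to it, stays exactly as it was.
records[SPECIES]["label"] = "Hypothetical revised name"
print(label_for(specimen["species"]))
```

Nothing that cites the URN has to be touched when the label changes, which is the property we want from an arbitrary identifier.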
The important thing, though, is that we can build our digital library simply by creating a graph of URNs, with each URN maintaining a strict separation of concerns. A specimen (urn:cite:botcar:sloane.422) appears on a folio (urn:cite:fufolio:CatesbyHS212.12) and is an example of a species (urn:cite:botcar:species.Acernegundo), which belongs to a genus (urn:cite:botcar:genera.Acer), which in turn belongs to a family (urn:cite:botcar:family.Sapindaceae); the folio (urn:cite:fufolio:CatesbyHS212.12) is illustrated by a digital image (urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493), and the specimen itself is illustrated by a region of interest on that image (urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493:0.423,0.2,0.549,0.715).
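As an illustration, the relationships in the previous paragraph can be written down as a list of subject-verb-object statements and then queried. The URNs come straight from the text above; the verb names and helper functions are hypothetical, invented for this sketch and not the vocabulary our library actually uses.

```python
# Each statement links two URNs with a verb. The verb names are illustrative.
TRIPLES = [
    ("urn:cite:botcar:sloane.422", "appears_on",
     "urn:cite:fufolio:CatesbyHS212.12"),
    ("urn:cite:botcar:sloane.422", "example_of",
     "urn:cite:botcar:species.Acernegundo"),
    ("urn:cite:botcar:species.Acernegundo", "belongs_to",
     "urn:cite:botcar:genera.Acer"),
    ("urn:cite:botcar:genera.Acer", "belongs_to",
     "urn:cite:botcar:family.Sapindaceae"),
    ("urn:cite:fufolio:CatesbyHS212.12", "illustrated_by",
     "urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493"),
    ("urn:cite:botcar:sloane.422", "illustrated_by",
     "urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493:0.423,0.2,0.549,0.715"),
]


def objects(subject, verb):
    """Return every URN that the subject is linked to by the given verb."""
    return [o for s, v, o in TRIPLES if s == subject and v == verb]


def lineage(specimen):
    """Follow example_of once, then belongs_to upward until no parent is left."""
    chain = objects(specimen, "example_of")[:1]
    while chain:
        parents = objects(chain[-1], "belongs_to")
        if not parents:
            break
        chain.append(parents[0])
    return chain


for urn in lineage("urn:cite:botcar:sloane.422"):
    print(urn)
```

Because each statement mentions only identifiers, a change in the family assignment would mean replacing a single statement, with no edits to the specimen, folio, or image records.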
The traditional taxonomic ladder is captured in the mnemonic “King Philip Came Over For Good Sex” (shout-out to XKCD): Kingdom, Phylum, Class, Order, Family, Genus, Species.
But the Integrated Taxonomic Information System (ITIS) presents online users with the following: Kingdom, Subkingdom, Infrakingdom, Division, Subdivision, Infradivision, Superclass, Class, Subclass, Superorder, Order, Suborder, Family, Subfamily, Tribe, Subtribe, Genus… (I cut it off at the Genus level, since the point was made).
This list clearly shows an ongoing process of rebuilding-the-ship-as-it-sails, shoehorning sub- and super-categories into the ladder in order to reflect a growing understanding of increasing complexity.
Hence Dosmann’s advice: You can’t wait for this to get sorted out before getting down to work.
For Botanica Caroliniana we want to collect and juxtapose useful data on the history of botanical science. We are not in a hurry and are willing to take the time to work methodically, to separate concerns, to recognize that the underlying data is more important than an immediate, glossy online presentation. But we don’t want to wait forever.
And we need to name plants. These names must be unique, stable, and machine-actionable. Linnaean binomials are pretty good, and traditional. They are supposed to be unique. They are not stable, as further study will inevitably split species and rearrange genera; ITIS and IPNI (the Integrated Plant Names Index) will happily provide countless synonyms for any given Linnaean binomial. They are certainly not machine-actionable.
For our digital library we need machine-actionable identifiers that we can use now. They need to be unique and stable within the digital library, while accommodating subsequent changes to the scientific reality of the objects to which they point. Here we can borrow from the disciplines of information science and corpus linguistics.
Corpus Linguistics. It is extremely difficult to make assertions about “how the English Language works”; people keep saying new things, keep changing how they speak, keep encountering new situations that need new words and new constructions. It is much, much easier to make assertions about “the language of New York Times reporting from 1941 to 1945”: How did the NYT refer to the enemies and allies of the United States? What verbs did they use for military victory and defeat, for casualty figures, to describe economic hardships at home? Answers to those questions are easier to formulate, and can be assessed in the context of the explicitly defined corpus. Many answers that are valid for one corpus would be invalid for another: racial slurs that were acceptable to the NYT in 1942 would never be allowed in print today; the language describing the Soviet Union (I bet) changed dramatically between 1943 and 1947. Corpus linguistics allows us to study real phenomena within defined constraints that make intractable datasets manageable.
Namespaces and Arbitrary Identifiers. Information scientists deal with large numbers of things. Amazon.com sells billions of products; they need to keep track of those products, to share information about them through the digital medium that the company inhabits. Under these circumstances it is immediately obvious that the acts of identification and description must be separate. This is not complicated: each product has a unique, machine-actionable ID, which points to a body of data that includes description, price, reader reviews, and so forth.
Digital librarians have a greater challenge than online merchants, since their IDs need to survive in the wild, outside the confines of a particular database. The “Rachael Ray 1.5 Quart Whistling Teakettle” has an Amazon ID of 1343145892. That ID, elsewhere on the Internet, points to a product in a Japanese cosmetics catalogue, a Seller Profile on eBay.com, and a team-building event for Western Union employees, to name a few.
The answer for a digital library is to use Namespaces. There are no doubt billions of digital objects in the online universe with an ID of 19. But there is only one with an ID that is urn:cite:botcar:sloane.19. That is, “a URN using the CITE protocol, in the BOTCAR namespace, in the Sloane collection, number 19.”
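To make the reading of that identifier concrete, here is a minimal sketch in Python of splitting a simple URN of this shape into its namespace, collection, and object number. The names `CiteUrn` and `parse_cite_urn` are our own illustration, not part of any CITE software, and the sketch assumes only the four-field form quoted above (it does not handle longer forms such as image URNs with a region appended).

```python
from typing import NamedTuple


class CiteUrn(NamedTuple):
    """Parts of a simple CITE URN (hypothetical helper type for illustration)."""
    namespace: str   # e.g. "botcar"
    collection: str  # e.g. "sloane"
    object_id: str   # e.g. "19"


def parse_cite_urn(urn: str) -> CiteUrn:
    """Split 'urn:cite:<namespace>:<collection>.<object>' into its parts."""
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "cite":
        raise ValueError(f"not a simple CITE URN: {urn!r}")
    # The final field is '<collection>.<object>'; split at the first dot.
    collection, sep, object_id = parts[3].partition(".")
    if not sep or not collection or not object_id:
        raise ValueError(f"missing collection or object id: {urn!r}")
    return CiteUrn(parts[2], collection, object_id)


sloane_19 = parse_cite_urn("urn:cite:botcar:sloane.19")
print(sloane_19.namespace, sloane_19.collection, sloane_19.object_id)
```

A bare "19" is rejected by this function, which is the point: outside its namespace the number identifies nothing in particular.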
Corpus Botany. For Botanica Caroliniana, we give each object of our interest a URN identifier: herbaria, folios in herbaria, specimens on folios, digital images of folios, and the notional species which these represent.
These species URNs are the glue that holds this digital library together. So the species Acer negundo L. has a URN: urn:cite:botcar:species.Acernegundo. The last element of the URN is somewhat human-readable, but it is worth emphasizing that this is an ID, an arbitrary identifier, and nothing more. As a data-object, urn:cite:botcar:species.Acernegundo identifies a notional species that we can label, for human readers, Acer negundo L., that we can supply with bibliography, or that we can link to an ITIS record (TSN serial no. 28749).
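The separation between the identifier and what it describes can be sketched as a lookup table keyed by URN. This is only an illustration under our own assumptions: the `SpeciesRecord` class, its field names, and the `describe` function are hypothetical, while the label and the ITIS serial number are the ones given above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeciesRecord:
    """Descriptive data hanging off a species URN (hypothetical field names)."""
    label: str                       # human-readable name
    itis_tsn: Optional[int] = None   # link to an ITIS record, if any
    bibliography: List[str] = field(default_factory=list)


# The URN is only a key; everything descriptive lives in the record.
records: Dict[str, SpeciesRecord] = {
    "urn:cite:botcar:species.Acernegundo": SpeciesRecord(
        label="Acer negundo L.",
        itis_tsn=28749,
    ),
}


def describe(urn: str) -> str:
    """Render a record for human readers without ever parsing the URN."""
    record = records[urn]
    return f"{record.label} (ITIS TSN {record.itis_tsn})"


print(describe("urn:cite:botcar:species.Acernegundo"))
```

Nothing in `describe` looks inside the URN string for meaning; the label comes from the record, so the label can change while the key stays put.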
The important thing, though, is that we can build our digital library simply by creating a graph of URNs, with each URN maintaining a strict separation of concerns. A specimen (urn:cite:botcar:sloane.422) appears on a folio (urn:cite:fufolio:CatesbyHS212.12) and is an example of a species (urn:cite:botcar:species.Acernegundo), which belongs to a genus (urn:cite:botcar:genera.Acer), which in turn belongs to a family (urn:cite:botcar:family.Sapindaceae); the folio (urn:cite:fufolio:CatesbyHS212.12) is illustrated by a digital image (urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493), and the specimen itself is illustrated by a region of interest on that image (urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493:0.423,0.2,0.549,0.715).
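The relationships in that paragraph can be written down as a list of subject-verb-object statements. The sketch below is one possible representation in Python, not a description of how our library is actually stored. The URNs are the ones quoted above; the verb names `appearsOn`, `isExampleOf`, and `isIllustratedBy` are placeholders we invented for the illustration, and the helper functions are hypothetical.

```python
SPECIMEN = "urn:cite:botcar:sloane.422"
FOLIO = "urn:cite:fufolio:CatesbyHS212.12"
SPECIES = "urn:cite:botcar:species.Acernegundo"
GENUS = "urn:cite:botcar:genera.Acer"
FAMILY = "urn:cite:botcar:family.Sapindaceae"
IMAGE = "urn:cite:fufolioimg:Caroliniana.Catesby_HS212_012_0493"
REGION = IMAGE + ":0.423,0.2,0.549,0.715"  # region of interest on the image

# Each statement is (subject URN, verb, object URN).
# Verb names other than isMemberOf are placeholders for this sketch.
triples = [
    (SPECIMEN, "appearsOn", FOLIO),
    (SPECIMEN, "isExampleOf", SPECIES),
    (SPECIES, "isMemberOf", GENUS),
    (GENUS, "isMemberOf", FAMILY),
    (FOLIO, "isIllustratedBy", IMAGE),
    (SPECIMEN, "isIllustratedBy", REGION),
]


def objects(subject, verb):
    """All object URNs linked to `subject` by `verb`."""
    return [o for s, v, o in triples if s == subject and v == verb]


def family_of_specimen(specimen):
    """Walk specimen -> species -> genus -> family through the graph."""
    species = objects(specimen, "isExampleOf")[0]
    genus = objects(species, "isMemberOf")[0]
    return objects(genus, "isMemberOf")[0]


print(family_of_specimen(SPECIMEN))
```

The specimen statement never mentions the family; the family is reached only by following links, which is what keeps each URN's concerns separate.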
In the above sentences, every noun is represented by a URN, and each verb can be as well. This is the subject of a subsequent posting on this blog.
For now, it is enough to say that we do not have to capture the entire taxonomic world in order to build a useful digital library of historical scientific data. We can give URNs to the objects of our concern, and only to those objects. Because our linking mechanisms are arbitrary identifiers, the structure of our digital library can accommodate advances in scientific understanding easily. If genus Acer is split in two, and Acer negundo suddenly belongs to a new genus, we need make only one change to the hundreds of thousands of points in our graph.* We change one entry from:
urn:cite:botcar:species.Acernegundo isMemberOf urn:cite:botcar:genera.Acer
to
urn:cite:botcar:species.Acernegundo isMemberOf urn:cite:botcar:genera.SomeNewGenus
If Acer is renamed by some authority, but remains a member of Sapindaceae and the parent of Acer negundo, then we are under no obligation to change our URN: the genus can continue to be referred to in Botanica Caroliniana as urn:cite:botcar:genera.Acer. We can and should update any data fields pointed to by that URN to reflect the new scientific reality.
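Both cases can be sketched in a few lines of Python. As before, this is an illustration under our own assumptions rather than our actual tooling: the `reassign` function, the `labels` table, the verb `isExampleOf`, and the label "RenamedGenus" are hypothetical, while `SomeNewGenus` is the placeholder used in the example above.

```python
def reassign(triples, subject, verb, new_object):
    """Return a copy of the graph in which subject's `verb` link points to new_object."""
    return [
        (s, v, new_object) if (s == subject and v == verb) else (s, v, o)
        for s, v, o in triples
    ]


graph = [
    ("urn:cite:botcar:sloane.422", "isExampleOf", "urn:cite:botcar:species.Acernegundo"),
    ("urn:cite:botcar:species.Acernegundo", "isMemberOf", "urn:cite:botcar:genera.Acer"),
    ("urn:cite:botcar:genera.Acer", "isMemberOf", "urn:cite:botcar:family.Sapindaceae"),
]

# Case 1: the species moves to a new genus. One statement changes.
updated = reassign(
    graph,
    "urn:cite:botcar:species.Acernegundo",
    "isMemberOf",
    "urn:cite:botcar:genera.SomeNewGenus",
)
changed = [t for t in updated if t not in graph]

# Case 2: the genus is merely renamed. The URN stays; only the label changes.
labels = {"urn:cite:botcar:genera.Acer": "Acer"}
labels["urn:cite:botcar:genera.Acer"] = "RenamedGenus"  # hypothetical new name

print(len(changed), changed[0][2])
```

In the first case every specimen that points at the species URN is untouched; in the second case no statement in the graph changes at all.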
So this is what we are calling “corpus botany”: comprehensive coverage of data in a constrained, publicly defined corpus, with separation of concerns, tied together with namespaced, arbitrary identifiers.
The evolving collections of URN identifiers are stored in public Google Fusion Tables:
Hortus Siccus Facsimile Model (Google Fusion Table)
Catesby Specimen Collection URNs (Google Fusion Table)
Botanical Species URNs (Google Fusion Table)
Botanical Genus URNs (Google Fusion Table)
Botanical Family URNs (Google Fusion Table)
The organization of these URNs into Collections, how we can use them in online publications, and the important topic of how to link them together, will be the subject of subsequent posts.
* Genus Aster is problematic. Alan S. Weakley, in the Flora of the Southern and Mid-Atlantic States, says, “It is now abundantly clear that the traditional, broad circumscription of Aster, as a genus of some 250 species of North America and Eurasia, is untenable.”