Commons

From Genmine

A recent thread on the soc.genealogy.methods newsgroup under the subject "Genealogical Estate Planning" has started me thinking about the problem of preserving the results of our genealogical research so that it will still be accessible to future generations.

Digital data

The most reliable way of ensuring that something still exists several centuries in the future is to make sure that as many copies as possible exist in different places. Losses can happen in even the most secure of archives, but with many copies, if some get destroyed, others still stand a chance of surviving. Publishing research in book form is one way of achieving this, but it can be costly and slow, and many people are reluctant to publish things they still consider to be works in progress. What is needed is an easier way to publish research: a way so easy that every little breakthrough can be published.
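The intuition that more copies means better odds can be made concrete with a simple, idealised model. As a sketch, assume (hypothetically) that each copy independently has the same chance of surviving a given period; then the research is lost only if every single copy is lost. Real losses are rarely independent (copies held by one company, or in one building, tend to share a fate), so treat this as an optimistic illustration rather than a prediction. The function name and the 50% figure are illustrative assumptions.

```python
def survival_probability(p_survive: float, copies: int) -> float:
    """Chance that at least one of `copies` copies survives.

    Idealised model: each copy survives independently with probability
    p_survive, so everything is lost with probability (1 - p_survive) ** copies.
    """
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # The only way to lose the research is for every copy to be lost.
    return 1.0 - (1.0 - p_survive) ** copies


if __name__ == "__main__":
    # Hypothetical: each copy has only a 50% chance of lasting the period.
    for n in (1, 2, 5, 10):
        print(n, survival_probability(0.5, n))
```

With these made-up figures, a single copy gives even odds, while with ten copies everything is lost only about once in a thousand times.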

Almost all our research can be converted into digital format: speculation, inferences and deductions can be written down; texts can be transcribed; photos and certificates can be scanned; memorials, statues, paintings and medals can be digitally photographed; audio and video are often already digitised these days and can be made so if they're not. Digital information can easily be shared over the Internet, and it does not degrade as it is copied and recopied. (There are problems when formats become obsolete, and that is something that I shall return to.) Putting digitised research online seems the easiest way of publishing it.

Online data, however, often has no permanence. If you publish on a personal website, who will pay the hosting costs long after you've died? If it's a free hosting provider, will it still exist in twenty years' time? In the mid-90s geocities.com was the most popular free provider, but sites created there then no longer exist. What about ancestry.com? That has a better chance of surviving because it has an obvious, viable business plan. But its ultimate aim is to make money, not to preserve your data. You have no guarantee that they won't delete your data, and even if they don't, they might start charging prohibitively to access it. Trusting the preservation of your research exclusively to one company is a bad strategy. (Throughout this page I use ancestry.com as an example of a commercial genealogy site. My comments about it could equally be considered as general comments on an arbitrary commercial genealogy site.)

A genealogical commons

The best way to ensure the long-term preservation of digital data has to be to make sure it is continually being copied to new places, and not tied to the fortunes of a few present-day companies. When a new genealogy company sets up, you need it to be legally free to import your research onto its site. That freedom to copy in perpetuity is essential. If I download someone's research from ancestry.com, I find it marked with ancestry.com's copyright statement and I am seemingly not free to publish it elsewhere. (In fact, my understanding is that copyright typically remains with the original researcher, despite the notice seeming to say otherwise, but either way, I do not have permission to publish research downloaded from ancestry.com.) If the original data had been clearly and unambiguously made available under a “share-alike” or “copyleft” licence, such as the Creative Commons CC-BY-SA licence, this situation couldn't arise. If you publish your research under a licence like CC-BY-SA, a company like ancestry.com would be free to import the data onto its site, but not free to prevent its competitors, current or future, from copying it in turn.

However, the licence is only one part of the problem, and a copyleft licence introduces problems of its own. We still need to find a place to put copies initially. A company like ancestry.com will not collate research from individual websites as it does not have the resources, and, to the best of my knowledge, most of the existing genealogical websites do not allow copyleft uploads — rather they require you to accept their licence conditions. What is needed is an organisation whose primary objective is to collect researchers' work and make it easily accessible. In a word, what we need is a commons, a collected body of research that is accessible to all, hopefully in perpetuity.

The idea of a vast, online commons has already been used extensively by the Wikimedia project, in the form of the Wikimedia Commons, a collection of over 11 million media files (mostly images), free for use by anyone, providing they acknowledge the original creator. What we need is a genealogical version of that: a well-indexed, freely-available repository of research. And in the short term, that needs an organisation and website to manage it.

It is worth clarifying, lest there be any doubt on this point, that this genealogical commons would not be managed anything like Wikipedia. The original, unmodified version of your research would always be there, and other people would not be able to modify it. Others may cite your research, quote it, produce research derived from it, or even produce a new version “correcting” what they feel are errors you have made, but your original research would always remain, separate from any derivative work.

Financing it

The Wikimedia Commons exists thanks to donations from users and from companies who wish to support it. Another model is to make people pay to access the research in the commons. (This is not necessarily incompatible with a copyleft licence, depending on which one is used.) But I don't see either of these models working for genealogical research, because much of the research will probably be accessed very infrequently. However, I think it is quite feasible for the organisation running a genealogical commons to provide indefinite storage of research for a single, fairly modest up-front fee. Some rough calculations suggest that £20 per gigabyte (€25, US$30) could be enough, which seems good value to me. This assumes two things: first, that there will always be more researchers coming along wanting to store their research, and second, that storage continues to get cheaper. Specifically, we are assuming that the cost of storing data forever is finite because the unit cost of storage is falling exponentially, and that the cost of making this ever-increasing body of research available to the public (e.g. electricity and bandwidth costs, administrative overheads, and so on) can always be met by the up-front fees from the current generation of researchers.
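The claim that storing data forever can have a finite cost rests on a geometric series. As a minimal sketch, assume (hypothetically) that the yearly cost of keeping one gigabyte stored falls by a constant fraction every year. The total over all future years is then the first year's cost divided by that fraction. The function names and figures below are illustrative assumptions only: they are not the calculation behind the £20 estimate, and they cover storage alone, not bandwidth or administrative overheads.

```python
def perpetual_cost(first_year_cost: float, annual_decline: float) -> float:
    """Total cost of storing one gigabyte forever (hypothetical model).

    Assumes year t costs first_year_cost * (1 - annual_decline) ** t,
    i.e. the unit cost of storage falls by a constant fraction each year.
    The infinite geometric series then sums to first_year_cost / annual_decline.
    """
    if not 0.0 < annual_decline <= 1.0:
        # With no decline (or a rise) the series never converges.
        raise ValueError("annual_decline must be in (0, 1]")
    return first_year_cost / annual_decline


def cost_over_years(first_year_cost: float, annual_decline: float, years: int) -> float:
    """Partial sum of the same series: the cost of the first `years` years."""
    return sum(first_year_cost * (1.0 - annual_decline) ** t for t in range(years))


if __name__ == "__main__":
    # Hypothetical figures: 3 currency units per GB in year one, 20% cheaper each year.
    print(perpetual_cost(3.0, 0.2))
    for years in (10, 25, 50):
        print(years, round(cost_over_years(3.0, 0.2, years), 2))
```

The partial sums creep up towards the closed-form total but never exceed it, and that bound is what makes a one-off fee conceivable. If the yearly decline slows, `annual_decline` shrinks and the required fee grows without limit, which is the risk behind the saying about exponential growth.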

“Anyone who believes exponential growth can go on forever in a finite world is either a madman or an economist”, or so the saying goes. On the face of it, this is what we are doing. However, we do not require it to go on for ever: merely for long enough that the commons is well enough known to have been incorporated into other companies' databases. If it survives beyond then, so much the better; if not, it has still served its purpose. The key is that it does survive that long, and that its content is tempting enough to be combined into other databases. In this case “tempting enough” means large enough: a large commercial site like ancestry.com is not going to go to the hassle of incorporating data from a single person's research, but if it can include many in a single go, it will.
