Digital Artifacts - Weather, Or Not

Digital Artifacts [Dec. 29th, 2012|07:29 pm]
I just spent about six hours correcting the digitally-generated texts at the California Digital Newspaper Collection. The application that converts the newspapers into plain text for display as HTML is far from perfect, and most articles as rendered are riddled with typographical errors, even though most of the scans themselves are easy enough for people to read.

The 20-terabyte collection was put together and is maintained by the California Newspaper Project at UC Riverside. It used to be funded by the state with $216,000 a year, but the funding was eliminated two years ago. Newspapers that had not yet been digitized now deteriorate in the stacks of libraries while the project is at a standstill. There was never enough money to have paid staff correct the plain text, so the corrections have to be done by the site's users.

I enjoy doing corrections, but at the back of my mind is always the fear that the lack of funding will lead to the abandonment of the project, and that everything I've put into it will then be lost. I've found the collection to be a very useful resource for research, and I suspect that many other people do as well, including people whose work actually contributes something tangible to the state's economy. If the collection disappears from the Internet, I suspect that the loss to the state will add up to far more than the $216,000 a year in public funds that were being spent to expand and maintain it.

But then all sorts of useful things have been cut from California's budget over the last four years. $216,000 a year is probably enough to keep two or three elderly hippie pot smokers in one of our prisons for another year of their third-strike life sentences, and Reagan knows we wouldn't want to let any of them out to wreak havoc on our... well, whatever things they would be likely to wreak havoc on. Our de-funded and closed state parks, perhaps? Intolerable! Best to leave them and history both to rot.

After all, if the history is available on the Internets, people might read it and discover that California used to be much better at funding public goods. Then where would our elected officials be? Exactly! And the last thing we want is to have that paddle-free lot thrashing around in the headwaters of Shit Creek, polluting it. After all, that's Sacramento's water supply!