Disclaimer: This is actually more of a rant about software than an explanation of how we're cataloguing data during the move, but all the info is there if you care to wade through the rant.
I am moving this week. Part of this process is putting a huge fraction of the
stuff that's built up into storage. We're going to do our best to throw out what we can (example: stuff still in boxes from the last time we moved...), but a lot of it can't go away.
As it happens, Sandy was looking at a piece of software she thought was interesting for cataloguing books, Delicious Library. I didn't remember the name of the software I'd heard of, but Toby had mentioned it recently. Turns out Toby was referring to Library Thing and Sandy was referring to something completely different. Er, let me elaborate a little bit on "different." She and I are referring to two entirely separate pieces of software. Both very much contemporary software, both very "Web 2.0" (in the case of Delicious Library, perhaps a bit too "2.0"), and in fact very complementary.
So here's the difference between the two. Delicious Library (hereafter DL in the interest of brevity) actually allows me to scan in, by bar code (or – gasp – with the iSight!), all the books and music and movies, etc., that I have in the
katamari house. So, this is great, except that it's particularly brain dead in how it goes about doing it. The process is thus. I have a big pile of stuff, I wave it under my scanner, and it then makes SOAP calls out to Amazon and attempts to fill in the details. It fails from time to time, and asks that I populate the data manually (as in the case of books published prior to the ubiquity of bar codes – I have many more of these than I thought). At this point, you can ask DL to fill it in by issuing ⌘-R. Fair enough, I guess. One would think that, handed an ISBN or the number on the bar code, it would just do that on its own, given that this is in fact its purpose in life.
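The step from "the number on the bar code" to an ISBN is mechanical for most books. A minimal sketch in Python, assuming the scanner returns a 13-digit "Bookland" EAN with the 978 prefix; the function names are illustrative and are not anything Delicious Library or Amazon exposes:

```python
from typing import Optional


def ean13_is_valid(code: str) -> bool:
    """Check the EAN-13 check digit (weights alternate 1, 3 from the left)."""
    if len(code) != 13 or not code.isdigit():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def ean13_to_isbn10(code: str) -> Optional[str]:
    """Convert a 978-prefixed EAN-13 to an ISBN-10, or return None."""
    code = code.replace("-", "").strip()
    if not ean13_is_valid(code) or not code.startswith("978"):
        return None
    body = code[3:12]  # the nine digits shared by both numbering schemes
    # ISBN-10 check digit: weights 10 down to 2, modulo 11, with 10 written as X.
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))
```

Anything that comes back as `None` (a bad read, or a code that is not a 978 book code) is a candidate for manual entry, which is where the pre-bar-code books end up anyway.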
Anyways, so it lets you go and scan everything. Terrific, right? Sure, it is terrific, if everything is in one place. I know what is in the library/study/etc. Now, what happens if I want to know which box in which storage unit that book or CD is actually in? It gets trickier. So DL has the ability to throw things into "sets," which it calls "shelves." Unfortunately, the assumption is that all shelves are part of one bigger library, and it isn't possible to make logical distinctions among them. If I delete a book, it's gone from all of them, no questions asked. Further – and this is just an amateur mistake – if I delete it and then "undo" the delete, it doesn't actually restore the structures, it re-adds the items. That is, they're all added back, but without the membership in their respective sets. Which is what you'd expect if the software were designed for cataloguing and not organizing.
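To make the delete/undo complaint concrete, here is a rough sketch of the behaviour one would expect. The `Catalogue` class and its method names are hypothetical and say nothing about how Delicious Library is actually implemented; the only point is that a delete records shelf membership so that undo can restore it:

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    # Hypothetical model: item id -> title, and shelf name -> set of item ids.
    items: dict = field(default_factory=dict)
    shelves: dict = field(default_factory=dict)
    _undo: list = field(default_factory=list)

    def add(self, item_id, title, shelf):
        self.items[item_id] = title
        self.shelves.setdefault(shelf, set()).add(item_id)

    def delete(self, item_id):
        title = self.items.pop(item_id)
        # Remember every shelf the item was on before removing it.
        member_of = [name for name, ids in self.shelves.items() if item_id in ids]
        for name in member_of:
            self.shelves[name].discard(item_id)
        self._undo.append((item_id, title, member_of))

    def undo_delete(self):
        # Restore the item and its shelf membership, not just the item.
        item_id, title, member_of = self._undo.pop()
        self.items[item_id] = title
        for name in member_of:
            self.shelves.setdefault(name, set()).add(item_id)

    def where(self, item_id):
        return sorted(name for name, ids in self.shelves.items() if item_id in ids)
```

With shelves named after physical places ("storage unit 2, box 14"), `where()` answers the question that matters during a move.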

Initially, I thought to myself, well, I guess people don't generally have enough books that they have separate locations for each of them. Or, really, that if they have that many, there's some sort of cataloguing system they're using, which inherently has organization to it.
But this is just entirely wrong. If anyone cares enough about books – right there we've cut out a huge portion of humanity in general – to catalogue all of their books and store them in XML, the very fact that they're doing that means they want them organized as well.
Well, this is where Library Thing comes in (hereafter LT, blah blah). LT is great in that it has a Flickresque tagging setup, as well as sharing (again similar to Flickr). The strength of LT is that it has an ads-based (essentially participation-based) revenue model. If it does well, it does well because it doesn't suck. It has all the organization you'd want, but it is missing one crucial piece: the actual cataloguing. Because LT is web-based, the only way they can get data from your dead-tree store is by you uploading it. How you do that is kind of an open-ended question. They recommend a USB-powered barcode reader (more on this in a second), and they accept XML, etc. (they mention DL by name, in fact). After you've done this, all your data is online, nirvana is attained, and so on.
This is a very attractive idea. Picture this. If you're at all like me (no, don't picture that...), you occasionally think, "my god! Eli Schleismann said that exact same thing in Palestine Reconsidered! I must go read that!" while you're reading a book (Dyson/Tipler, etc). Okay, so the problem with this is that it's very difficult for me to locate that book, and then to find the text in that book that I want to re-read. This leaves me with very few options.
At ACS, years ago, I helped them get all their printed media digitized, and it's all now online. And, hey, readable. That means if I have that a-ha moment while I'm reading Chemical and Engineering News, I can find what I need in the ACS Archives. In order to do that, we cut the spines off of three million pages of printed material, destroying the originals, scanned them at 600 dpi, ran DjVu and Adobe Capture for months on end, and stored many terabytes of data (in 2001 terabytes, not 2007 terabytes, remember...). I've often thought about doing this on my own, but it's not economically feasible.
Things have changed since 2001. Google now has full-text searching, as does Amazon. By searching by author name, I can get a list of books they've written, and search through the text. Now that I have DL and LT, I can also see if I have that book. Ostensibly, I can even see where I have that book, which is really more important to me than if I have the book.

But where does this break down?
There are a couple of components here that don't work together. Worse, they intentionally don't work together. The one that bothers me most is that Delicious Library is $40 when it is missing functionality that is implemented by another application. That application, of course, being Library Thing. It's easy to say "oh, well, they'll have that functionality added at a later date," or other excuses. Actually, that's incorrect. It is not in their interest to complete this application. If they are able to produce as many sales as they can for the software in the state that it is in – very "Web 2.0", shiny, etc – somebody like Google (or Amazon, or ...) will buy them. If one has any question about their motives, one need look no further than the infamous Wil Shipley Talk. In particular,
I should also mention that if you look at the slides you'll see another picture of my Ardent Red Lotus Elise. This bears mentioning because I actually agonized about whether to show off my car at all, ever. I decided that, in this context, it was OK, because essentially the whole talk is about how if you follow your dream you'll not only be happy, but you'll also be financially secure, and it's easier to believe that kind of advice when it's given to you by someone not LIVING IN A SHACK DOWN BY THE RIVER. I asked some of the students afterwards if they thought the car thing was totally pretentious and they said no, it came off the right way.
Now, I am sure people will call me a communist for citing that. Okay. Then let's compare the bar code readers suggested by Delicious Library and by Library Thing. The former, $150. The latter, $15. Who do you suppose is making money on the deal? Or, you could have a look at the strident Mac-only attitude of the former when compared to the latter. For those of you still scratching heads, let's quote Eric Raymond (add'l cite below; it's a fabulous paper):
The "utility function" Linux hackers are maximizing is not classically economic, but is the intangible [product] of their own ego satisfaction and reputation among other hackers. (One may call their motivation "altruistic", but this ignores the fact that altruism is itself a form of ego satisfaction for the altruist). Voluntary cultures that work this way are not actually uncommon; one other in which I have long participated is science fiction fandom, which unlike hackerdom explicitly recognizes "egoboo" (the enhancement of one's reputation among other fans) as the basic drive behind volunteer activity.
So there are two reasons behind the software not quite fitting where it should. First, to do so would require cooperating with others (not Shipley's strong suit). Second, the software as it is now is entirely salable (it has terrific presence in search engines, and people such as myself link to it), so why would they bother to fix it? I remember years ago, it was commonplace for somebody to say that they had started a company, they had some great idea, and that while they were burning through cash like crazy, it was only a matter of time before somebody bigger bought them and they could cash out.
Mature Payware 2.0
Since then we've all kind of grown up and realized that this is the reason the whole industry imploded in the 2000-2003 timeframe. I still hear people saying they can do that these days, but the ones that really worry me are the ones that say they're actually interested in their product, that they are filling some need for the community, that they value their users, etc.
Just like a real company, like they intend to be in business as said company because to do otherwise would so obviously be screwing their customers, long-term. To say you're pre-IPO or pre-buyout is to say that you don't intend to support anyone, that you're in it for the cashout, and to indicate quite clearly that business with you is risky. So they don't say it. Instead it's this cockamamie pretense of stuff that's just shiny enough to make you think it works and that they have a personal stake in it (be it pride, or some part of the community... these things do still exist...).
Should you actually rely upon these companies or products, it may be possible to accomplish what you set out to accomplish (sharing photos online, cataloguing or organizing books, managing contacts or CRM, etc). However, the second the vendor decides they can get a better deal elsewhere (oh, somebody hires the head developer, somebody buys the company, the user database, etc), that's that. There wasn't any intention of actually producing a product, or of maintaining a community; that was tertiary to the goal of producing revenue. No, instead we have companies (or, rather, a coffee shop with "development and operations") and products which are worse than useless. These new products and companies are utility disguised in a wrapper that is shiny and appears to be worth something, just enough to get you to use it, so that down the line, a Yahoo or whoever can screw you in a very (ahem) "Old School" way.
It's almost as if the product was a legacy application before you bought it. Most of us have cried "Why in the hell are we doing xyz this way? We could have written this for a tenth of what we've invested in this!" This idea of bar code tracking (or RFID or whichever) is not a very complicated one. People did it before Delicious Library, and had I not seen DL as a way of saving myself the few hundred lines of perl, I would have spent today writing software that I would be using now. Instead, I spent a day (and $40) figuring out how all this software does (not) work together, and realizing that it was just as bad as it is with Flickr (sure, I can administer Apache, get a Dreamhost account, but why do that when there's this fancy interface for it? Well, because Flickr makes it hard to un-flickr everything, and now I have to deal with Yahoo because, well, the flickreenos are too busy watching their stock). Or, surprise surprise, I have to write software to pull my own content down from their servers, and write my own software, again, to host it. Like I should have done from the beginning. I have no idea what people who aren't capable of this actually do.
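As a hedged illustration of how small the core of bar code tracking can be, here is a sketch in Python rather than perl. It assumes a scanner that emits each code as a line of text, and it assumes an invented convention: a line reading `BOX <label>` names the box that the following scans belong to. The function names and the CSV column names are likewise illustrative only.

```python
import csv
import io


def assign_boxes(lines):
    """Pair each scanned code with the most recent 'BOX <label>' line above it."""
    current = "unassigned"
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines from stray Enter presses
        if line.upper().startswith("BOX "):
            current = line[4:].strip()
        else:
            rows.append((line, current))
    return rows


def to_csv(rows):
    """Render (code, location) pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["code", "location"])
    writer.writerows(rows)
    return out.getvalue()
```

Usage would be along the lines of `to_csv(assign_boxes(open("scans.txt")))`, where `scans.txt` is a hypothetical file of scanner output; the result is plain text that a spreadsheet or a web catalogue's import page could take.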

If you want to save me time, please just tell me that you're going to offer me a bait-and-switch, and that I can pay $40 to be screwed in two years, or I can pay $40 to write my own perl code and never see you again. I'll give you the $40 so that I don't for a moment think that you're worth my time. Honestly.
So the last thing to say here is that Library Thing (and smugmug) is really very cool. I can get a $15 USB barcode reader, scan everything into Excel (or to text, or whatever), and organize it on the web. They get paid for me using the software. Which means they have an interest in making their software not suck. And they've done an admirable job of doing it. Incidentally, while I worry deeply about what Google will do in three years when I have all my e-mail, etc., stored with them, for the last five years, they've done me no harm. The same is true of Amazon, and I've been doing business with them for a decade (well, except that recommending Neal Asher part). The important part is that their revenue model is inherently based upon their service not sucking (note: Seth Finkelstein and others would disagree here), or more correctly, sucking less than their competition. If you pay to buy in to a product, and it isn't easy to get out of it afterwards, there is no reason whatsoever for the vendor to support you. None.
This Web 2.0 stuff is nowhere near as shiny as it looks. I'll be over here, avec chapeau de clinquant bidon, coding my own shit, and waiting for Crash 2.0. My feeling is the notion of "Web 2.0" should include a concept of a revenue model that is based on merit, rather than on an oligopsony (or, of course, an oligopoly).
* Lancashire, David, Code, Culture, and Cash: The Fading Altruism of Open Source Development, August 2001.