25 July, 2007

Getting serious about media


Apple has been courting the music devotee for quite some time. While some of us may think this goes back to, say, the first-generation iPod, it of course goes much further. Step back even further and you see AltiVec, which was essentially a means to do fast floating-point math on the processor itself (because back then, graphics cards weren't the hyperactive reality-engine-in-a-chip that they are today). Further yet, we have the 'av' models (660av, 840av, 6100/8100av, etc.), which were the company's first efforts at producing a real graphics machine for consumers (the other choice for the 'prosumer' market was spending a couple hundred grand on a deskside machine). We can even extrapolate and point out that QuickTime was a step in this direction as well (and I had it on my SE/30, to give you an idea of how far back this goes).

And yet, Apple has no serious approach to storage. You can now get 2TB of storage in your Mac, built-to-order. In fact, as soon as the deal with Dell/Alienware expires, Hitachi will be selling you 1TB drives for your Mac, pushing that number up to 4TB in the chassis alone.

What about the Xserve RAID? The problem you'll encounter here is precisely the same problem you'll have with 4TB in your tower. Because Apple is taking a bottom-up approach, adapting consumer hardware to professional uses, they run into ugly issues like the OS (or the controller) sleeping the drives when they're not in use – even when they're part of a volume group. I have in front of me five 250GB FireWire 800 drives. Unfortunately, even if I were to make a RAID 5 out of them, I expose myself to substantial risk of data corruption. I could instead go with RAID 0, but the problem there is of course that whatever risk I mitigate by switching from 5 to 0 is offset by the increased risk of losing everything: RAID 0 has no redundancy at all, so one failed drive takes the whole volume with it.

The other problem that bothers me is the absolute, glaring failure of Apple to actually support the two-percent (you could call this hyperparetotic if you like, although it might be more applicable to ask Benford for a reacharound) folks in the media market. Because Apple's product lines encourage people to expand their storage needs at a rate much faster than the rest of the consumer market, and keep raising the baseline of what they consider normal (Apple's selling 80GB iPods, and I reckon we'll make it to 100 or 120GB before Apple changes the form factor in some way), that hyper-extended 2% (it becomes much more dilute if we extend out to the traditional 80/20 Pareto principle) will be consuming seemingly exponentially more storage real estate, and is going to need novel ways to manage it. How plausible is this? Well, they've managed to get most of us carrying around accelerometers. How far away could it be that we begin to understand logical versus physical volumes and volume groups? (The bad news: iSCSI is already taken, so they'll need a new product name.)

We wouldn't need novel forms of storage if everyone understood how to manage an FC-AL loop or could trouble themselves to memorize what RAID levels 0, 1, 5, 10, and 15 are. But we do need that technology, and Apple is in a unique place to provide it to a segment of the market that doesn't have a problem spending a thousand or two more for a laptop, every year.

Let me change directions a bit here. It's very unusual to find a database that is greater in size than a terabyte. Further, the data contained therein is generally smaller than its footprint by a factor of six to ten. So it's fair to say that for most individuals – indeed most organizations – their data footprint is smaller than 250GB, and probably smaller than 100GB. However, when running through indices for data that large, when we want subsecond response times, most organizations that are serious about data (Oracle, SGI, and RHAT, for example) realize that the filesystem very much gets in the way.

Consider this. When I moved all my iTunes data off of the primary filesystem and into a logical partition therein, I essentially said to the operating system that I didn't care too much about the niceties of file systems. Instead, I wanted portability, scalability, and containment. But why not add to that performance? What are the needs of your average iTunes user? It seems like an obvious answer at first, but really, it's quite complicated.

  • Performance (prefetch; no gaps between gapless tracks or video segments)
  • Redundancy (durability; with "thousands of songs" in my pocket, at $1-$20 each, I don't want them to disappear)
  • Containment (protection from commingling; Sandy's media should remain separate from mine)
  • Portability (the ability to move media from one machine or device to another; not necessarily the ability to duplicate or "share")
  • Scalability (reformatting or upgrading devices is inherently dangerous; I would like the ability to simply add storage when I run out of room, be it with a physical or logical device, or expanding a logical device)

But we don't have anything from Apple that addresses even one of these items. My current inventory looks to be about 30,000 music items (this includes iTunes U and music videos), 200 movies, 250 TV items, 300 podcasts, and 200 book items (misleading, as most of them are split into 3-5 pieces). All told, it's about 300GB (incidentally, the library is about 3 million [logical] words, or about 60MB of XML... more on this in a minute). What I'm getting at is that all of these things have different needs.

Colons? Never! We're running Unix under the hood!

Apple has given us the ability to denote where a library lives. That's half the code you need to support multiple libraries. Of course, from that point, you're two-thirds of the way to defining kinds of libraries in multiple locations. I could specify ten gigs for gapless music, a hundred gigs for television, a hundred gigs for "normal" music, and twenty-five gigs for low-quality audio like iTunes U and podcasts. Because the hardware requirements of all four of those types of media are different, why not specify them in different places (e.g., on different physical volumes)? Moreover, why not specify them as separate libraries so that I could take parts with me and leave others at home? Do I really need to keep two hundred movies on me? Probably not. Of course, I can come up with smart playlists, but I can't manage them manually without spending substantial time on just that. It becomes necessary to have the software understand them in some way. Even extremely rudimentary functionality in this regard would be a significant improvement.
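To make the idea concrete, here is a minimal sketch in Python of what "libraries by kind and location" might look like. Everything in it is hypothetical: the `Library` type, the volume paths, and the `portable` flag are assumptions made for illustration, not anything iTunes actually exposes. Only the sizes come from the paragraph above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

GB = 10**9  # decimal gigabytes, the way drive vendors count them


@dataclass(frozen=True)
class Library:
    """One hypothetical media library living on its own volume."""
    name: str
    root: PurePosixPath   # made-up mount point, purely illustrative
    budget_bytes: int     # how much space this kind of media is given
    portable: bool        # assumption: should this library travel with me?


# Sizes are the ones floated in the text; paths and flags are invented.
LIBRARIES = {
    "gapless": Library("Gapless music", PurePosixPath("/Volumes/Fast/Gapless"), 10 * GB, True),
    "tv": Library("Television", PurePosixPath("/Volumes/Big/TV"), 100 * GB, False),
    "music": Library("Normal music", PurePosixPath("/Volumes/Big/Music"), 100 * GB, True),
    "spoken": Library("iTunes U and podcasts", PurePosixPath("/Volumes/Slow/Spoken"), 25 * GB, True),
}


def library_for(kind: str) -> Library:
    """Look up the library that should hold a given kind of media."""
    try:
        return LIBRARIES[kind]
    except KeyError:
        raise ValueError(f"no library configured for media kind {kind!r}") from None


def portable_bytes() -> int:
    """Total space needed to take only the portable libraries along."""
    return sum(lib.budget_bytes for lib in LIBRARIES.values() if lib.portable)
```

With a table like this, "take the music, leave the television at home" becomes a lookup rather than an afternoon of playlist management.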

The other thing here is the filesystem getting in the way. Oracle and other database vendors get past this problem by having what they call "raw mode." Essentially, the database owns the physical disk, rather than having the operating system format it and manage it. Why would you want to do this, right? The answer is simple. The database has its own set of users and advanced ACLs. Why does it need to have the operating system managing permissions on that disk? Just let "oracle" own it. Since the system administrator is making sure nobody's looking up Oracle's skirt, and Oracle is making sure that the data it's sending out is going to the right, vetted people, Oracle gets the benefit of not having to ask the filesystem for permission to do everything.

Consider the 4MB "song" versus the 750+MB movie or 350MB television show. Among songs, we have albums like Dark Side of the Moon, at 100MB, but with individual components ranging from 2MB to 20MB. DSOTM in particular is intended to be one single piece rather than ten individual pieces. So, when we're at 1.95MB on track A, we need to be reading 0.25MB into track B. This is governed by the filesystem. With RAIDs, we can set the "block size" to optimize this process. The notion is that for bigger files, we don't want to have to go back to the disk a bunch of times to read a file that's a gig in length. So we set the block size to very high numbers like 64MB or even higher. The corollary to this of course is that for very small files, we don't want to sit there waiting to read 64MB when the file itself is 1MB in length. In this case, we can set block sizes down as small as a few KB.
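The tradeoff is easy to put numbers on. The following is a rough sketch in Python, assuming an idealized disk where every read fetches exactly one whole block and using binary units (1MB = 1024KB); the function names are made up for illustration and real RAID controllers and filesystems are considerably messier.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def reads_needed(file_size: int, block_size: int) -> int:
    """Number of whole-block reads needed to fetch the file (ceiling division)."""
    return -(-file_size // block_size)


def bytes_overread(file_size: int, block_size: int) -> int:
    """Bytes pulled off the disk beyond what the file actually contains."""
    return reads_needed(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    for label, size in (("1MB file", 1 * MB), ("1GB file", 1 * GB)):
        for block in (4 * KB, 64 * MB):
            print(
                f"{label}, {block // KB}KB blocks: "
                f"{reads_needed(size, block)} reads, "
                f"{bytes_overread(size, block) // KB}KB wasted"
            )
```

Under those assumptions, a 1GB movie takes 16 trips to the disk with 64MB blocks but 262,144 trips with 4KB blocks, while a 1MB file read with 64MB blocks drags 63MB of dead weight along with it. That is why one block size for every kind of media is a poor fit.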

Apple is in a unique position to help these customers out. Apple has been culturing a userbase (in the "imma growin me sum pleghs" sense of the word culture, not as in "high" culture) of people with enormous storage needs, who spend lots of money on its products, and whose storage needs are generally easily grouped into a few narrow categories. How hard would these things be to understand?

  • An "iTunes Disk" function in the preferences (or in Disk Utility.app), or "Let iTunes manage this disk".
  • A "multiple libraries" function under Advanced...
  • Different types of media storage by location.
  • "Add volume to library" to extend the available logical space.
  • "Mirror my data here" for one-click redundancy.
  • iTunes Prosumer Edition (or iTunes Pro, or iTunes Enterprise Edition, etc).
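
None of this is exotic under the hood. As a toy illustration of the "Add volume to library" item, here is a short Python sketch of a library whose logical space is simply the sum of whatever volumes have been added to it. The `LogicalLibrary` class and its methods are hypothetical, it only does bookkeeping (no real disks are touched), and mirroring is left out entirely.

```python
from dataclasses import dataclass, field

GB = 10**9  # decimal gigabytes


@dataclass
class LogicalLibrary:
    """Hypothetical library that spans any number of physical volumes."""
    volumes: dict = field(default_factory=dict)  # volume name -> capacity in bytes
    used_bytes: int = 0

    @property
    def capacity_bytes(self) -> int:
        """Logical capacity: the sum of all member volumes."""
        return sum(self.volumes.values())

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes

    def add_volume(self, name: str, capacity_bytes: int) -> None:
        """Extend the logical space without reformatting anything."""
        if name in self.volumes:
            raise ValueError(f"volume {name!r} is already part of this library")
        self.volumes[name] = capacity_bytes

    def store(self, size_bytes: int) -> None:
        """Account for new media, refusing if the library is out of room."""
        if size_bytes > self.free_bytes:
            raise OSError("library is full; add a volume first")
        self.used_bytes += size_bytes
```

The point of the sketch is the interface: when the library fills up, the user plugs in another drive and calls the equivalent of `add_volume`, and nothing already stored has to move.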

So getting back to how this helps the consumer, and how it helps Apple, let me say that Apple's great strength is their ability to abstract away complicated ideas behind simple interfaces through the use of clever algorithms and other tricks of software and hardware. Of late, they've even had a problem bringing their employees up to speed, technically. Because they're not hiring PhDs to work at the local "genius bar," they have to explain technical concepts to their employees in a manner that is technical, but not too technical, and of course also explain those products in lay terms for their consumers.

It seems to me that Apple has the opportunity to mine their own customer base for new customers. That's pretty win-win. Of course, for every product Apple brings to market, they manage to fuck up two that are already available and abort three more for lack of management and vision.
