A Log-Structured Storage Boom!

We appear to be in the midst of a log-structured storage boom. It's always gratifying to see smart people converge with your work, and that's exactly what's been happening since we implemented a log store underneath the Aleri Platform.

Valerie Aurora, one of the ZFS engineers at Sun, has a nice write-up on the history of the technology in the context of filesystems over at LWN, where she points out, among other things, that there's a log store in every solid state disk. In another of her articles, she introduces a new filesystem project called Featherstitch, whose paper takes a deeper look at how filesystems (particularly log-structured ones) work.

One of the reasons we chose a log store was that it makes it easier to build a form of versioned tree in which each version, identified by a root node that's unique to that version, is actually immutable. Updates are accomplished by writing a change set and a new root node, which lets multiple threads continue reading one version of a tree while a new one is being written. This kind of structure is well suited to workloads with high write throughput, especially in multithreaded environments.
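To make the idea concrete, here is a minimal sketch of a versioned tree over an append-only log. It is not the Aleri implementation: the names `LogTree`, `Node`, `put`, and `get` are hypothetical, an in-memory list stands in for the on-disk log, and a plain binary search tree is used only because it keeps the example short. Each update appends copies of the nodes on the path to the changed key (the change set) and appends the new root last; nothing already in the log is modified.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable tree node; children are referenced by log offset."""
    key: int
    value: str
    left: Optional[int]   # log offset of the left child, or None
    right: Optional[int]  # log offset of the right child, or None


class LogTree:
    """Hypothetical append-only log of nodes. A version is a root offset."""

    def __init__(self) -> None:
        self.log = []  # stand-in for an on-disk log; entries are never rewritten

    def _append(self, node: Node) -> int:
        self.log.append(node)
        return len(self.log) - 1

    def put(self, root: Optional[int], key: int, value: str) -> int:
        """Append the change set for one update and return the new root offset."""
        if root is None:
            return self._append(Node(key, value, None, None))
        node = self.log[root]
        if key < node.key:
            return self._append(node._replace(left=self.put(node.left, key, value)))
        if key > node.key:
            return self._append(node._replace(right=self.put(node.right, key, value)))
        return self._append(node._replace(value=value))

    def get(self, root: Optional[int], key: int) -> Optional[str]:
        """Look up a key in the version identified by the given root offset."""
        while root is not None:
            node = self.log[root]
            if key == node.key:
                return node.value
            root = node.left if key < node.key else node.right
        return None


tree = LogTree()
v1 = tree.put(None, 5, "a")  # first version: a single node
v2 = tree.put(v1, 3, "b")    # second version: adds key 3
v3 = tree.put(v2, 5, "c")    # third version: changes the value for key 5
```

A reader holding `v2` keeps seeing the value `"a"` for key 5 even after `v3` has been written, because the nodes reachable from `v2` are never touched. In this sketch a new version only becomes visible when its root offset is handed to readers, which is the property that lets readers and a writer proceed without blocking one another.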

We aren't the only ones to have looked hard at these problems and come to these particular design conclusions. For example, Manuel Woelker has an article at EclipseSource about the internals of git repositories, Clojure's persistent data structures, and CouchDB's backing store, all of which use -- you guessed it -- some form of immutable tree structure that's updated by a change set and a new root node. CouchDB even uses a file layout similar to ours.

I, for one, am quite pleased to see the same design decisions being made in all these great projects.