Monday, April 5, 2010

I spent a lot of time over the weekend thinking about persistent data. I was trying to come up with a set of base abstractions over which to build a persistent store. I haven't found the best set, but I found part of a good one that can be refined.

You need two things to use persistent data effectively. The first is a ‘durable store’, which we will use to remember the information. The second is a management system that gives us control over what is stored and retrieved and acts as an intermediary between transient programs and the persistent data they need. The basic interface between the durable store and the management system is this:
(write-bytes! store-name byte-vector) => address

(read-bytes store-name address) => byte-vector
write-bytes! is the primitive means of saving information. The store-name argument identifies which particular durable store to save the bytes in, and the byte-vector argument is a vector of small numbers for the store to save. When invoked, the store should save the numbers somehow (more on this later) and return an address, which is perhaps simply an integer.

read-bytes is the primitive means of recovering information. Again, the store-name argument identifies which particular durable store to read the bytes from, and the address is a value that was previously returned by a call to write-bytes! on that same store. It should return a byte vector whose contents are equal to the one that was stored.

We'll make a very simple requirement for the behavior of this API. write-bytes! may either succeed or fail. If it fails, it must not return a success indication; we assume that if write-bytes! returns an address, the write succeeded. Likewise, read-bytes may fail or succeed, but if it succeeds it must return the same bytes that were given in the call to write-bytes! that produced that address.
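To make the contract concrete, here is a minimal Python sketch of the two operations. It is only an illustration: it keeps the bytes in memory, so it is not durable at all, and the names write_bytes, read_bytes, and StoreError are my own stand-ins (Python identifiers can't contain `-` or `!`). Failure is signalled by raising an exception, so a failed call never returns an address or a byte vector.

```python
class StoreError(Exception):
    """Raised when a read or write fails, so no result is ever returned."""


# Hypothetical in-memory stand-in for a set of named stores.
# A real durable store would keep this on disk, not in a dict.
_stores = {}


def write_bytes(store_name, byte_vector):
    """Save a copy of byte_vector and return its address (a list index)."""
    data = bytes(byte_vector)  # copy, so later changes by the caller don't leak in
    records = _stores.setdefault(store_name, [])
    records.append(data)
    return len(records) - 1


def read_bytes(store_name, address):
    """Return the bytes saved at address, or raise StoreError."""
    records = _stores.get(store_name)
    if records is None:
        raise StoreError("unknown store: %r" % (store_name,))
    if not isinstance(address, int) or not 0 <= address < len(records):
        raise StoreError("bad address: %r" % (address,))
    return records[address]


address = write_bytes("example-store", bytes([1, 2, 3]))
assert read_bytes("example-store", address) == bytes([1, 2, 3])
```

The important property is the last line: whatever address a successful write hands back, a successful read of that address on the same store gives back equal bytes.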

We'll need to extend this API a bit, but it is an OK start.

This API gives us a mechanism for durability, but it isn't a very useful one. What if you want to save something other than byte vectors? How do you remember the address that you got back? This will be the responsibility of the management layer.

Exercise 1: Implement the above API using the file system.
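
One possible approach to the exercise, sketched in Python rather than offered as the definitive answer. The assumptions here are all mine: the store name is treated as a file path, records are appended to that file, each record is a 4-byte length prefix followed by the bytes, and the address is the byte offset where the record starts. A failed write or read raises an exception instead of returning.

```python
import os
import struct

# Assumed record layout: 4-byte big-endian length, then the payload.
_HEADER = struct.Struct(">I")


def write_bytes(store_name, byte_vector):
    """Append a record to the file named store_name; return its offset."""
    data = bytes(byte_vector)
    with open(store_name, "ab") as f:
        f.seek(0, os.SEEK_END)
        address = f.tell()  # the record will start at the current end of file
        f.write(_HEADER.pack(len(data)))
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the device
    # Reached only if every step above succeeded.
    return address


def read_bytes(store_name, address):
    """Read back the record that starts at offset address."""
    with open(store_name, "rb") as f:
        f.seek(address)
        header = f.read(_HEADER.size)
        if len(header) != _HEADER.size:
            raise OSError("no record at address %r" % (address,))
        (length,) = _HEADER.unpack(header)
        data = f.read(length)
        if len(data) != length:
            raise OSError("truncated record at address %r" % (address,))
        return data
```

This sketch trusts the caller to pass only addresses that write_bytes returned; an arbitrary offset into the middle of a record could be misread as a header. It also does nothing about concurrent writers. Both are the kind of thing the API will need to be extended to handle.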
