I Have Seen the Future, and It is Copy-On-Write

by Ville Laurikari on Wednesday, May 27, 2009

Copy-on-write is a similar advance for concurrent programming as garbage collection was for memory management.

I became a copy-on-write believer at around the turn of the millennium when I was working for the HiBase project as a research assistant.  Among other things, I implemented an in-memory SQL database server in Shines, our prototype programming language, which used a strictly copy-on-write memory model.  We employed an optimistic concurrency control algorithm to resolve possible conflicts for database transactions run in parallel.  The code was elegant, easy to understand, and ridiculously efficient.  There wasn’t a single lock or critical section in the entire system.  Just threads, message passing, and copy-on-write data structures.  Beautiful.

Close ThreadsJohn McCarthy invented garbage collection for Lisp in 1959.  It took almost 40 years for garbage collection to finally hit the mainstream programming consciousness with Java.  I’m not exactly sure when the copy-on-write data model was invented, but based on the Wikipedia page on functional programming I’m going to place it somewhere in the 1970s.  It would be about time for copy-on-write to hit mainstream now, but I don’t see that happening.  What’s wrong?

The Erlang guys have touted the copy-on-write model to be the True Way to do concurrent programming for years.  But Erlang isn’t enjoying much success in the mainstream.  I suspect this is to do with the academic stigma that goes with functional programming and the unusual syntax inherited from Prolog.

Nowadays, copy-on-write techniques are growing increasingly popular also outside the confines of RAM.  In the past ten or so years, a number of file systems have popped up which use a copy-on-write data model to implement snapshotting, for example.  I find ZFS to be the most promising of these.  RAID-Z is just super cool.  But let’s get back to programming languages.

I believe the concurrent programming language for the mainstream will have the copy-on-write data model, threads with message passing, a familiar syntax, plus a big ecosystem of libraries and tools.  Most importantly, though, the language won’t called a “functional” language, even though it might in reality be awfully close to being one.  I wonder when this language will appear.  Or does it already exist?

Next: Copy-on-write 101 – Part 1: What is it?

Related posts:

  1. Copy-On-Write 101 – Part 3: The Sum of Its Parts
  2. Copy-On-Write 101 – Part 2: Putting It to Use
  3. Copy-On-Write 101 – Part 1: What Is It?
  4. Programming Problems in Disguise
  5. Why Write Code When You Can Remove Some?

If you liked this, click here to receive new posts in a reader.
You should also follow me on Twitter here.

Comments on this entry are closed.

{ 5 comments }

Nik July 21, 2009 at 13:05

Just a note, you’ve probably seen it already, Clojure (http://clojure.org/) seems to have some of the Copy-on-Write features described in this article.

Ville Laurikari July 21, 2009 at 13:15

Yep, Clojure is really, really cool. It even has built-in support for software transactional memory. If it only had very efficient automatically disk-backed data structures, Clojure would be near perfect.

Vladimir Sedach August 12, 2011 at 23:32

The first person to apply the idea of multiple versions to concurrency control was David Reed in his 1978 PhD dissertation (http://publications.csail.mit.edu/lcs/specpub.php?id=773), inventing what’s now known as MVCC. It’s been popular in databases since the 80s.

Henry Baker has a good summary of other things this idea has been used for in one of his papers (http://home.pipeline.com/~hbaker1/TreeShadow.html section 7).

Overall the “purely functional” data structure is an obvious idea from the lambda calculus, and predates Turing machines and von Neumann’s work and any kind of programming language. It’s just the fault of limited machine resources that it isn’t the default way programs are thought about.

Raoul Duke November 2, 2012 at 21:09

since when is Erlang copy-on-write? i though only Erjang was.
https://github.com/trifork/erjang

Raoul Duke November 2, 2012 at 21:10

Re: efficient serialized/deserialized data structures: https://github.com/laforge49/JActor might be of possible interest.

{ 1 trackback }

Previous post:

Next post: