Swarm: A true distributed programming language
Monday, October 6, 2008 at 9:48PM Fundamentals
The fundamental concept behind Swarm is that we should "move the computation, not the data".
The Swarm prototype is a simple stack-based language, akin to a primitive version of the Java bytecode interpreter. I wanted the proof of concept to be quick to implement, while demonstrating that the concept could work for a popular runtime like the JVM or Microsoft's CLR.
Update (Sept 17th 09): Swarm is now implemented as a Scala library, so you program in normal Scala, rather than a custom stack-based library as with the prototype described here. It uses the Scala 2.8 Continuations plugin to achieve this. See end of blog post for further information.
The Prototype
The prototype is implemented in Scala, and I will use snippets of Scala code below, but a knowledge of Scala won't be required to understand the rest of this article. I chose Scala because I wanted to learn it, and because its rich semantics tends to make coding easier and faster than Java (my normal language of choice).
As with the JVM, there are three places to store data in the Swarm VM: the stack, a local variable array, and the store. The stack is used for intermediate values in computations, data here tends to be very short-lived. In the prototype it is implemented as a List[Any]. The local variable array is for data that is used within a block of code (its implemented as a Map[Int, Any]).
The "Store"
The "store" is somewhat analogous to the JVM heap. It is used for long-term storage of data, indeed, in an actual implementation it may be persistent, and/or transactional, but in the prototype it is in-memory. The store contains "objects", each of which is a list of key-value pairs. The values may be references to other objects. The store is implemented as a Map[Int, Map[String, Any]].
The store is the key to how Swarm is distributed. A store can be spread across multiple computers, and objects in a store can point to objects on remote computers. While the objects stored on different computers are different, all the computers contain the same computer program (or must have access to it).
When some code is running on a computer that wants to read or write to an object stored locally, this can be done as it would with any other computer, a very efficient operation. But what if the code needs to access a remote object? Well, in this case, rather than moving the remote object across the network to the local computer, we freeze the computation, and move it to the remote computer, where it resumes execution - now able to access the object directly.
Continuations
Freezing the code execution in this way is called a "continuation". A continuation is akin to using the "Save Game" feature in a computer game. It allows you to start the game from the same place in the future, or even to email the saved game to a friend of yours and have them continue where you left-off.
Continuations are supported by some advanced programming languages like Haskell and Ruby, but not other more common languages like Java and Python (at least, not directly).
In practice, the continuation must contain the current stack, the current local variables, and the program counter. These three things represent the "state" of a Swarm computer program. If you can transmit these three things to another computer that also has access to the same computer program, then it can pick up where you left off, and the author of the computer program doesn't need to know or care about this little bit of magic.
Of course, sending this state across the network, even though it will typically be quite a small amount of data (probably fitting in a single IP packet), will still introduce a significant delay relative to the normal speed of code execution. For this reason, Swarm must try to ensure that objects are distributed among the computers in the cluster in such a way that over-the-network traffic is minimized.
Deciding where objects should go
Our goal is to minimise the number of times continuations must be packages up and sent across the network. We can do this by ensuring that objects which are "tightly coupled", meaning that the reference from one to the other are frequently used by the code, reside on the same computer.
For now we use a very simple approach to this. When an object is created, we attempt to put it on the same computer as the last object accessed. This is easy, because this is the computer that will be executing the code. The only situation under which we don't do this is if the current server has too many objects relative to the other servers, in which case we create the object on the least-full server. This helps to balance objects across the servers.
Of course, this is a very simple approach, in a real-world implementation we'd want much more sophisticated load-balancing that monitored actual usage and moved objects as necessary.
The simulation
To test these ideas I wrote a simulation in Scala. My goal was to create something (to quote Einstein) as simple as possible, but not simpler.
First I created a simple stack-based language. It supports primitive control commands, Goto and If. It also supports mathematical operations like addition, generation of random numbers, and greater than comparison. Lastly, it supports commands to create, modify, and retrieve objects in the store. Recall that each object is a list of key-value pairs, where the key is a string, and the value can be a number, string, or a reference to another object. At the time of writing the language doesn't support subroutines, but these could be added with relative ease. You can find the definitions of these commands in Instruction.scala.
I used immutable datastructures for the state of the execution, a List for the stack (Scala's Stack implementation is buggy - yay Scala!), and an IntMap for the local variables.
The simulated Store
Next I implemented the Store class. This is a map of integers to objects. Each object is itself a map of strings to arbitrary values, which could include integers, strings, ObjectRef objects, among others.
Overseeing the various Stores is a Cluster. Its role is to tell us when we should create new objects, using its loadBalance() function. It will generally create the object on the current computer (aka "node"), but will sometimes say that it must be created on another node to ensure objects are reasonably well distributed.
The load balancing algorithm is simple: No store is permitted to have less than half the free-space of the least-full store. If it is, then the object is created on the least-full store instead.
A test program
Next I needed to write a simple program to see how the Swarm prototype performs. I needed something simple, but which would hopefully be representative of the way real-world software creates and uses objects.

I decided to implement a simple binary-search tree, and then insert random numbers into it. It took me a while, and was rather tedious, but eventually I got something working. You can see the end result in Test2.scala. Initially line numbers had to be specified explicitly, this was a royal PITA, so I later implemented a label mechanism, and a "compilation" pre-processing stage (implemented in Interpreter.scala) that assigned line-numbers to the labels.
The experiment
I created 5 virtual computers, each with a capacity of 20 objects. Of course, this is a ludicrously small number of objects (in the real world, a node would be able to handle millions, even billions of objects), but I wanted to stress the system.
Results
Here is a graph of the resultant tree, the color indicates which of the computers the object is stored on - as you can see the root of the tree was stored on the pink computer.

In this experiment, I show the percentage of instructions that result in a continuation being transmitted between computers as the size of the tree is increased:

As can be seen, it increases rapidly before leveling out around 3.25%.
Analysis
Even the simplistic load-balancing approach employed in the simulation seems to be reasonably effective, only 3.5% of instructions required some communication between nodes. A number of factors conspired to make this a difficult task for it:
- The insertion of data to random points in the tree make it more difficult for the algorithm to group branches of the tree together on the same computer
- Each node had very limited capacity
Immediate future
Continuous re-allocation of objects between computers in response to actual usage is key. This will need to be a trade-off between efficiency of the re-balancing process, and its effectiveness. Near-term work on the simulation is likely to focus on this.
Long term
I guesstimate that turning this concept into an actual working piece of software would be a 5 man-year effort. Among the major questions/challenges are:
Can we work with an existing virtual machine, or do we need a new language?
Ideally we would be able to make Swarm support a widely-used bytecode language such as the Java Virtual Machine, Microsoft's Common Language Runtime, or the LLVM compiler infrastructure. Swarm's simple bytecode is a simplification of the Java Virtual Machine for this purpose. However, it is possible that we will uncover reasons that we need a new language for this. That would be unfortunate if the case, as it would create a significant barrier to entry.
What about persistence, transactions, and replication?
If we want Swarm to have the potential to replace databases, then it will need to be able to store objects persistently (ie. on disk). It will also need to support transactions (so that we can be sure that the stored data is always in a consistent state). We will also need a solution to ensure that it is fault-tolerant, meaning some kind of replication scheme.
Wrap up
Right now Swarm is still more of a thought-experiment for me, as I am still very-much focussed on my day job. That being said, I have had some conversations with a number of people who have expressed an interest in providing funding to make Swarm a reality. As I mentioned before, it would be a big task, probably requiring around 5 very smart people working full-time for at least a year. The potential, however, is huge - just imagine being able to write massively scalable applications without having to think about scalability and persistence!
Source code
You can find the Scala source code in a public git repository here. The code is not that well commented, and at the time of writing it will spit out a Graphviz file to produce the tree diagram above, its not really intended for public consumption - but it may be interesting for people to browse.
Update (13th Jan 09): I have created a Google Code project for Swarm with a discussion mailing list. If you are interested please head on over and sign up!
Update 2 (17th Sept 09): I've done a lot more work on this, see here.
Programming,
Projects,
Swarm 

Reader Comments (27)
[...] RangerLG wrote an interesting post today onHere’s a quick excerptA store can be spread across multiple computers, and objects in a store can point to objects on remote computers. While the objects stored on different computers are different, all the computers contain the same computer program (or … [...]
Have you thought about using something like work stealing to distribute the load? That way, you just pile all the new work up on the machine that created it and then when you run out of work, you ask another (randomly chosen) machine to hand some over. The victim server can choose to only hand over things it thinks are the least related (in terms of data access) to currently executing work. Work stealing should reduce the amount of communication you have to do with other nodes. I guess it could get a bit more complicated if the pieces of work have priorities though.
didroe, I have considered something similar - basically moving the objects around so as to minimize cross-network traffic while trying to evenly distribute the workload. This would probably be through some kind of clustering algorithm, possibly taking place as part of a garbage collection cycle.
So the objects would be moved around on an ad-hoc basis and then periodically you organise everything?
I'm curious, as an object can be stored anywhere, does that mean you have to perform a lookup to find which node has a given object before retrieving it? Or do you have some clever way around that where you intrinsically know which node to go to?
didroe, the reference to the object contains the node that is storing the object. Obviously, these will need to be updated somehow if the object is moved.
This is so cool as to be beyond mind boggling in its future applications. AI is truly only computer heart beats away.
Hurray!
Hi Ian,
This looks great stuff, but what if a node crashed in between the execution, we are keeping track of object's current residence, but what about their current state, to retrieve them back, and place somewhere else? any good idea!!
adyut, we'd need replication and transactions for that.
It seems that you need some more primitives to implement mapreduce. Perhaps something like a "for each object of this type run this code" and something like "collect all the objects of this type" ?
Hein, yes - right now there is no way to fork the execution, or to re-combine the results of such a fork. It would be nice if this could be handled transparently to the programmer. That would require dependency analysis within the code to see what could safely be executed in parallel.
WOW! Now that is thinking outside of the box.
Awesome. I have so many suggestions and open questions that I posted it rather than leave a behemoth comment:
http://blog.asmartbear.com/2008/10/ideas-for-swarm.html
Highlights: How to get cross-store caching "for free," why the graphs are the most important thing here, argument for the JVM as your VM, raising issues with external resources, user interaction, and synchronization.
Ideas and Questions for Swarm...
Highlights: How to get cross-store caching “for free,” why the graphs are the most important thing here, argument for the JVM as your VM, raising issues with external resources, user interaction, and synchronization....
Not sure if this is different than software agents, a concept that is at least 10 years old. I (vaguely) remember my first Java job that used ORBs (object request brokers) which are fancy RMI servers to ferry bytecode (i.e., agents) to various nodes to process data housed in local OODBMS.
There's a language from many years ago that is similar to this called Emerald, written by Eric Jul and his mates when he was at college
Imagine building a distributed bittorrent tracker and index website based on a framework such as this, using the resources of anonymous peers to serve torrents. No more single point of failure like servers and isps. Impossible to take down short of disabling the entire internet or rounding up every single peer.
Of course in addition to a robust distribution framework, it also needs safeguards to provide moderation and prevent corruption by third parties and trolls.
It may even be impossible to implement successfully, but I can always keep dreaming...
You need to have a look at grid computing research, a lot of what you talk about is pretty much already out there. (and quite advanced compared to your ideas btw)
moving the computation the way you mention is just what clustering systems do, but on a distributed system how do you install the code on all nodes? do you have mechanisms to automatically move the code if it doesn't exist on the remote node?
English Spanish translation performed by qualified English Spanish translators. Guaranteed translation service from the best professional English to Spanish translators in the translation industry.
Thanks
english translation
http://www.setranslations.com/
Great Job Ian... Pretty impressive :-)
The use of Scala might make it harder for this to go mainstream though... The fact is that there are too many languages out there and what I think people need is a simple set of efficient libraries that will enable developers do parallel and distributed computing with minimum difficulty.
Generally, the *key* issues with distributed and parallel setting are interest management, load balancing and synchronization and a language with a set of libraries to efficiently address the coding of those features and related algorithms will be great.... Will ping you via the DL for more :-)
How is this different from the hadoop programming model, which also brings the program to the data?
This is a "partition" data and "partiton" behaviour based-on object like in OOP model , that all objects are scattered in network and all messages are send "com and go" among object in network not objects'datas is "move" in network ,and a behaviour of a object A is execute at place where A "live" ,and VMs have role as message transfer among objects,all that hence all behaviour are parallely executed,right? If that's right,why not extend a OOP language as Java,C#,... or construct a build-in library to support this feature ínstead of create a new programming language?
Processing dispatch has to handle failure cases.
Destination can go down taking its data and any execution objects with it.
Some secure server(s) must contain data that can reissue a whole job or partials on failure.
Data objects need a directory mechanism so that processing objects can be dispatched.
Data dependency/independence models must drive higher level data/processing dispatch.
The devil is in the details. The data and processing operations are trivial compared to the smart/adptive control of the multiprocessing distribution.
How is this different from the hadoop programming model?
A sail veering about the blank bay waiting for a swollen designer sunglasses wholesale bundle to bob up, roll over to the sun a puffy replica sunglasses wholesale face, salt white. Here I am. They followed the winding path down to the creek. Buck Mulligan stood on a stone, in shirtsleeves, his wholesale cheap sunglasses unclipped tie rippling over his shoulder. A young man clinging to a spur of rock near him moved slowly frogwise his green sunglass wholesale legs in the deep jelly of the water. Buck Mulligan sat down to unlace his wholesale oakley sunglasses boots. An elderly man shot up near the spur of rock a blowing red sunglasses for wholesale face. He scrambled up by the stones, water glistening on his fashion sunglasses wholesale pate and on its garland of grey hair, water rilling over his chest and paunch and spilling jets out of his wholesale sunglasses china black sagging loincloth.
Hi!
Interesting stuff! But isn't it rather a case for an distributed OS of late (or for that matter Hadoop of the prvois comment)? On the other side viewing it from the language-design angle is maybe more promising as the adoption threshold isn't so high?
Thus: why did you choose to implement the infrastructure as a language runtime?