Pragmatic Formal Specifications : Alloy

As a software practitioner I find that the hardest part of software development is actually figuring what you need to build. Many colleagues seem upset when they lose a week’s worth of coding. Not me. When I’ve worked on a project for a week, I can recode it in a fraction of the time, and get a better result, because I know then exactly what to do. The how is rarely a problem. (Especially today when Google is never too far away.) The what is what tends to take time.

When you’re trying to define what you exactly need to build I need to feel the problem. Coding, making prototypes, is a great way for this. However, coding is tedious. You need a lot of boiler plate code for even simple problems. Would it not be great if there was a tool that could minimise this exploratory work? Even better, a tool that could not only be used to explore a model but could even automatically verify aspects of it?

Since the 1990’s I’ve been intrigued with formal specifications. However, my experiences with formal specification languages have been quite abysmal. First trying to get your head around a formal specification language is really hard for a pragmatic old school procedural programmer. Second tooling is normally absent or quite horrid.

This changed a couple of years ago when I read Software Abstractions from Daniel Jackson (the son of Michael for the more senior among my readers). This book introduces a tool called Alloy. This tool is a very pragmatic approach to formal specifications. It can prove a formally specified assertion though only within a limited scope. Maybe this falls short of a full theoretic proof but it is surprisingly useful in practice. Virtually all bugs I’ve seen in my life can be demonstrated in a limited scope. Alloy guarantees that those bugs will be found when they can be be demonstrated in that limited scope.

Types & Tables

To show what Alloy is, let’s take a look at a very simple model of a Java Map. Let’s start with the keys and values type:

sig Object {}

The sig keyword defines a type, like a class in Java. However, in Alloy a type is the name of a table consisting of atoms. Actually, every time you use a name in Alloy, which act like variables or types in Java, it is actually a table of atoms. In Alloy, singletons, null, collections, methods, and types are all tables under the hood. In this, atoms are just an identity, they have no properties, just a unique identity. (We give them a name for convenience but they are just unique identities in the universe.)

A table is a set of tuples of atoms. If we use the curly braces for sets ({}) and parentheses (‘()’) for a tuple then the sig Object represents a table with a single column and an infinite number of rows. An example:

{
  (o1) 
  (o2)
  (o..)
  (o∞)
}

The ox values are the atoms, for convenience we label them.

Structure

Since we now have a model for the Java Object type, we need to define a model for the Java Map’s entries. clearly, a Map is also a table of atoms in Alloy, just like Object. A Map atom identifies one possible Map in the universe of all possible Maps. If all we have is tables then the entries should also be a table and look like:

{
  (m1,o1,o1) 
  (m1,o..,o..) 
  (m1,o∞,o∞) 
  (m..,o1,o1) 
  (m..,o..,o..) 
  (m∞,o1,o1) 
  (m∞,o..,o..) 
  (m∞,o∞,o∞) 
}

This is a table of atoms of type (Map,Object,Object) that holds every possible combination of these atoms. Also known as the cartesion product of Map×Object×Object . That is, any combination of Map, Object, and Object you can think of is present in one row in that table. Since you have an infinite number of Map and Object atoms, the table is impressively long. Since we do not have that space, not even on Github, the infamous elllipse (‘..’) is used to mark the intermediate rows.

In Alloy, you define this table in a surprisingly familiar way:

sig Map {
  entries : Object -> Object
}

The sig Map defines a Map table, holding our Map atoms like Object. The entries acts as you expect but not for the reasons you expect it! That is, entries does not define a field of a Map, it instead defines a table called entries. This tables is a subset of the table that was the cartesian product of Map×Object×Object.

Notice that entries is a table with 3 columns, the first column is for the the Map atom, the second the key atom, and the third the value atom, or (map,key,value) of type (Map,Object,Object). For a Java developer this is confusing since we usually go out of our way to keep fields a local detail of the object. However, in Alloy entries is actually a global table that can be used anywhere in the specification document. This is very similar to the SQL table model that also has no hiding.

Constraining

In our entries table we allow a mapping from any Object to every Object in the universe. In Java, this is not the case. A Java Map can only hold a limited set of keys (memory is so constraining) and each present key can map to only one value. We therefore need to constrain the entries table to model this Java limitation.

In Alloy, we can constrain the table by placing keywords around the relation (->) operator. The Object set -> one Object reduces the entries table to a smaller table (but still infinite) that can, for any given atom, have no mapping or at most one. The key column of the entries table is a subset of the atoms in the Object table. However, when an atom is in the key column, there must be exactly one Object in the value column. We can specify this as follows:

sig Map {
  entries : Object set -> one Object
}

Running

Since we’ve defined a complete model of our universe of Java Maps we can actually run it! We can ask Alloy to show some possible instances of our model. For this, add:

pred show( map : Map ) {}
run show for 3

If you run this from the Execute menu, you get a diagram. Some are quite boring, like no entries in a Map or even not any Map at all, but you can click on Next to see alternatives in the infinite (ok, limited to 3 instances of Map and Object atoms) universe. For example, the following picture shows a Map with two entries. Object0 is mapped to Object0 and Object1 is mapped to Object1.

image

Evaluator

If you want to dive deeper into the found solution, click on the Evaluator button. This opens a shell where you can look at the data using the Alloy language. For example (the > indicates text you can type, the rest is the response from the evaluator):

> Map
{
  Map$0
}
> Object
{
  Object$0,
  Object$1
}
> entries
{
  Map$0->Object$0->Object$0,
  Map$0->Object$1->Object$1
}
> Map$0.entries
{
  Object$0->Object$0,
  Object$1->Object$1
}
> Map$0.entries.Object
{
  Object$0,
  Object$1
}

(formatting changed for readability)

Scope

The 3 we specified in the run phrase is actually the scope. The scope states that we will have max 3 Map atoms and max 3 Object atoms. One could think of Alloy iterating over those 6 objects and creating all possible solutions but then we should be prepared to wait a few eons when we get more different types. Even a low number of instances quickly grows into a very large number, this is the definition of the combinatorial explosion and NP complete problems. Fortunately, Alloy uses the specification to prune this state space with a SAT solver so that answers are usually instantaneous even though the state space is humongous.

Sculpting

Though the Map definition looks surprisingly close to a boilerplate-less Java definitions it is actually the utter opposite. Where in Java you’re not having any instances until you do something, this specification models the universe of possible Map instances with every possible combination of Object keys and values.

I had a hard time understanding this infinite universe model until I realised that in Java you are an engineer building state up from nothing, bit for bit. Formal specifications work the other way around, you start with every possible state in the universe and then you carve out states by specifying constraints. It is like making a statue. A sculptor removes material while an engineer adds material. With formal specifications we’re going to sculpt the rows from the tables!

It took me quite a bit of time to get my head around this model, so I hope you fare better. The problem is that most usage looks very close to how we use fields in Java. It is close, but not really. It works intuitive for the simple cases but without properly understanding it, you get lost quickly when constraints become more complex.

Joining

Anyway, we’ve modelled the basics of a Java Map structure. Time to start with the operations.

If we want to model the size method of our map then we need to define a function. In Alloy, we can use the object oriented style of defining methods:

fun Map.size() : Int {
  # this.entries // The `#` represents the cardinality, or number of rows, of the table. 
}

The this.entries looks fortunately very intuitive to a Java programmer! There is a Map atom and you access the fields!

Not.

In Alloy the dot (‘.’) is not a field operator but it is a join operator. That is, in Java the . retrieves some memory at an offset related to the this pointer; In Alloy we perform a full blown logical join operation like we do in a database with SQL.

You should use the syntax proximity to Java to initially feel what the dot does but remember that it is a bit more powerful than accessing a field in Java.

In the size function, if you say this.entries, you actually join this (the atom member of the Map table for wich this function is called) with the entries table, which is a Map->Object->Object table. That is you join:

Map . Map->Object->Object

Alloy joins first select the last column of the left table with the first column of the right table. That is, only rows are selected that are in both. In this case our left table is Map and only has one column. (The join works for any cardinality > 0.)

So for example, entries is the following table:

{ 
  (m0,o0,o0)
  (m0,o1,o2)
  (m0,o2,o0)
  (m1,o0,o0)
  (m2,o1,o1)
  (m1,o2,o0)
}

If we join this table with {(m1)} then we must first select all rows that start with m1 in the right table, add any columns to the right of or left table (none in this case) and then remove the matching column. This gives us:

    selected             remove
(m1,o0,o0)           (o0,o0)
(m1,o2,o0)           (o2,o0)

In SQL you would execute the following if you could address the columns by a zero based index:

select 1,2 from Map inner join entries where Map.0 = entries.0

Notice that the Map.0 is be the last column of the Map table and entries.0 is the first column of entries.

For example, to get the keySet of a Map we can define:

fun Map.keySet : set Object {
  this.entries.Object
}

By joining the this.entries table (which is Object->Object) with Object, we remove the second Object (the value) column, and are left with a one column table with just the keys of the map, the exact definition of the Java Map.keySet() method!

There will be no further selection on the table because the value column can only hold Object and that is always a subset of the Object table. I.e. any atom in the value column is by definition a member of the Object table. So the join result will include all rows. The only effect is therefore dropping the value column.

Box Join

There is also a box join, which uses the square brackets ([]) like an index operation. This is a join but changes the evaluation order to make it more intuitive to us procedural coders.

We can thus use this type of box join to get the values of the map:

fun Map.values : set Object {
  this.entries[Object]
}

This looks intuitive but there is no magic. The expression Object.(this.entries) is identical to this.entries[Object]. The box join first calculates the table in front of the start bracket (‘[’), and then joins the table inside the brackets with this prefixed table. That is, a[b] == (b).(a). It looks like indexing but don’t be fulled, it is a full join. Again, it works wonderfully intuitive for simple cases but be aware of more complex cases.

Step by step:

  left     right           selection           drop
  (m0) .   (m0,o0,o0)  =   (m0) x (m0,o0,o0)   (o0.o0)
  (m1)     (m0,o1,o2)      (m0) x (m0,o1,o2)   (o1,o2)
  (m2)     (m0,o2,o0)      (m0) x (m0,o2,o0)   (o2,o0)
           (m1,o0,o0)      (m1) x (m1,o0,o0)   (o1,o1)
           (m2,o1,o1)      (m2) x (m2,o1,o1)
           (m1,o2,o0)      (m1) x (m1,o2,o0)

The Map.get Method

Ok, time for the get method:

fun Map.get( key : Object ) : Object {
  this.entries[key] // naive and wrong, read further!
}

When the Object passed to get is not in the this.entries table then the value of the this.entries[key] expression is the empty set. However. We’ve defined the function as return Object, which is a short cut for one Object. Ergo, this function fails if it cannot return a single value. However, Java definitely can handle the case when a key has no mapping so we better model it as well. In Java, we have null, so we therefore need to model it explicitly.

one sig null extends Object {}

This adds a null atom to the Object table. That is, null in Object becomes true with this definition. The one in front of the sig indicates that there is only a single atom representing null.

If there is a key present then we want to return the value, otherwise we must return null. In Alloy, the implies operator is used for those cases. It acts as an if statement with a possible else.

one this.entries[key] implies this.entries[key] else null

One thing that our Java Map also has is the containsKey method. So let’s add it. Since we’ve already done the work in get to detect the presence/absence let’s reuse it . (A bad idea as we will later find out.) We’ll make it a predicate because that is a function that returns a boolean.

pred Map.containsKey( key : Object ) {
  this.get[key] != null  // wrong, but oh so widespread!
}

Mutating

So far, we’ve only queried state. One of the hard things to get your head around in formal specification is that you cannot mutate state. That is, there is no assignment operator, you can only constrain the model. So we cannot put and atom in Map that does not already contain that mapping. In a formal specification you define a transition between two Map atoms. You have one atom representing the Map in the before state and another atom the after state.

To model the put method we cannot use a normal Alloy function because we need to pass the atom for the after state and we need to be able to reason about return values. We also need to signal if a method fails. A method fails when its contract is not met. In Java, this means an exception is thrown. When a method succeeds it indicates all its pre-conditions, post-conditions and invariants are met. This is best modelled with a predicate, where all the parameters and expected return value are passed as arguments.

pred Map.put( m' : Map, key, value, return : Object ) {
    m'.entries = this.entries ++ key -> value
    return = this.get[k]
}

The ++ operator adds the relation key -> value to the after state but it also removes any previous mapping from key to any object. This maps very well to the Java Map put method. If the before map held o1->o2 and we add o1->o0 then the ++ operator removes the o1->o2 mapping and adds the o1->o0 mapping.

The = is not an assignment operator. It just carves out the state-space, it is a constraint. The constraint will remove any solutions from the state space where that constraint is not true. In this case, if the predicate is called with a return that does not match the this.get[k] it just fails. It also rejects any m' that does not match the first line of the predicate. Sounds incredibly complex and inefficient for a poor Java developer but it really isn’t in the magical world of formal specifications.

Verification

What we’ve done so far is to specify the semantics of the size, get, containsKey, values, keySet, and put methods. We’ve specified this unambiguously since the Alloy language can have only one interpretation. After all, it is a specification language! (Actually, we ignored the Java equals semantics to keep things simple.)

However, how sure are we that Alloy’s interpretation is the intended one? Anyone that has written specifications can testify that maybe Donald T. can write fantastic specifications without tools but mere mortals SADly can’t.

Let’s add an assertions in Alloy. An assertions will try to find a counter example that proves the assertion wrong. If no such counter example can be found then we can be sure that, within the given scope, the assertio holds.

To verify our put method we need to traverse the whole state space and search for put methods that succeed. When such a method succeeds, we verify that any invariants hold and any pre- and post conditions are met.

One obvious invariant in the Map example is that when we put a key in a map we expect that the containsKey method returns true afterwards. Therefore, for every put operation that succeeds we require that containsKey is true for the after state. For example:

assert put {
  all m, m' : Map, key, value, return : Object | {      
    m.put[m',key,value,return] implies m'.containsKey[key]
  }
} This assertion states that for every possible combination of the  `m, m', key, value,` and `return` tables, if a `put` operation succeeds for these arguments then the after state must contains `key`.

To run an assertion, you have to add the scope. This looks like:

check put for 3

When we run this assertion from the Execution menu we find that it sadly fails. Alloy finds the following counter example:

put(o,null)
  before = {}
  after  = { o->null }

What’s going on?

Looking at the specification we see that I took a shortcut for the containsKey method. Instead of properly asserting the presence of the key in the entries table we called get and checked if it was null. However, we can do a put[o1,null] and then the value is null, fooling containsKey. With our containsKey, any null value is treated as if there is no mapping. We therefore have to modify the containsKey predicate to really test the presence:

pred Map.containsKey( key : Object ) {
  one this.entries[key]
}

We should then change the get function to use containsKey predicate so we do not have multiple definitions:

fun Map.get( key : Object ) : Object {
  this.containsKey[key] 
    implies  this.entries[key]
    else     null
}

Changing this predicate and running the assertion again shows no more counter examples. So within 3 Object and Map instances we can be ensured that put and containsKey operate correctly.

Total Specification So Far

The total specification so far looks like:

sig Object {}
one sig null extends Object {}
sig Map {                      entries : Object -> lone Object }

fun Map.size() : Int {
  # this.entries                  
}
pred Map.containsKey( key : Object ) {
  one this.entries[key] 
}
fun Map.get( key : Object ) : Object {
  this.containsKey[key] 
    implies this.entries[key]
    else    null
}
fun Map.values : set Object {
  this.entries[Object]
}
fun Map.keySet : set Object {
  this.entries.Object
}
pred Map.put( m' : Map,  k, v, r : Object ) {
  this.get[k] = r
  m'.entries = this.entries ++ k -> v	
}
assert put {
  all disj m, m' : Map, key, value, return : Object {      
    m.put[m',key,value,return] implies
      m'.containsKey[key] 
  }
}
check put for 3
pred show( map : Map ) {}
run show for 3

Is this Not a Lot of Work for Little?

The current model is quite simplistic because its primary value was to demonstrate the Alloy language for Java developers. Still, I did discover the problem with null values for Map; this is a chronic problem in Java that has wide repurcussions. If we would specify APIs using a formal language then I do believe we could prevent those issues better. Too often a Javadoc comment falls woefully short of what a method does. Specifying APIs with a formal language would likely make many APIs a lot better.

That said, where specifications really shine is in concurrent problems. Human intuition is incredible bad at thinking about parallel events. With Alloy it is possible to model concurrent problems by making traces and asserting important properties. In a subsequent blog I hope to show how to construct a model of a flip-flop, a more complex state machine, and some locking algorithms.

Why

My life’s story is API driven code. Simplicity of software systems starts at separating API from usage and implementation. All software disasters I’ve see are caused by too much coupling, creating the dreaded big ball of mudd. Coupling makes code hard to change and harder to predict what a change does. Too much coupling is the root of almost all software evil. APIs are the only trick we have to decouple users and implementers and thereby minimise coupling. And APIs do require very strong specifications to not create submarine couplings that bring us back to square one.

Looking at the enormous amount of bugs we generate daily, especially in security, then I do believe it is paramount for our industry to grow up and become an engineering discipline. We’re no longer just making toy applications that in the worst case annoy our users. Software is pervasive in our society and is an essential part of our infrastructure; people can die when software fails. Point in case the disastrous security and software of most IoT devices that are being produced today. It may be time for our industry to embrace the tools that are out there to minimise bugs. Time we should understand what we promise but also verify that we deliver. This is the area where I think formal tools could be very effective.

Then again, I do know that getting the developer community to do the right thing is incredibly hard.

Conclusion

The only problem I see with using Alloy in anger is that once the specification is written it looks so simple. Why took it so much effort to get there? The result often looks so easy that you wonder why you did not directly go to code? Humans just have no memory for what they did not understand minutes ago.

This experience actually reminds me of an anecdote in one of the books of Daniel Jackson’s father. Michael Jackson was asked by a manager what his opinion was about the staff he had worked with the previous days. Michael answered that he thought one of the female developers, Allie, had really impressed him. The manager looked a bit concerned. He said that they had been impressed with her job interview. However, they were not really sure yet how good she was. The work she had done so far had turned out to be very much simpler then they had expected …

Peter Kriens


Comments