It is just a Formality...

Hey, if you like this article be sure to check out the follow up "Getting started with Dafny: Your first proof"


I have a little joke that I tell to each new employee that I meet: "Hey I work on <Component X>, I'm sorry in advance for all the bugs you'll have to deal with 😅". I tell it to new QA, support folks and even sales people. It always gets a chuckle and is a good ice breaker. (Did I mention that we are hiring?)

But the self deprecating humor is a bit of a defense mechanism, because bugs in my code mortify me. I've seen all sorts over the years: little bugs, big bugs, sneaky bugs, heisenbugs, "I thought we fixed this already!?" bugs and "HOW DID THIS MAKE IT PAST QA 😭" bugs. I've seen bugs that hid for years and then exploded. I've seen bugs with 7 figure price tags...multiple times.

They all hurt. So in my code I take pains to follow principles such as "parse, don't validate" (as written by the brilliant Alexis King), the use of immutable data structures to simplify multithreaded programming and strong type systems to catch inconsistencies. I lean on unit tests to catch bugs in "well written code" and Property Based Testing to validate tricky assumptions.

But even with all of these tactics, there will always be gaps where bugs live in the darkness. But what if there was a way to make your code airtight, so that it contained no corner cases? What if you could prove that every possible configuration combination in your software did not cause invalid behavior? Well there is! In this article I want to introduce you to the gold standard for software verification: Formal Methods.

What are Formal Methods?

Lets ask Wikipedia:

In computer science, formal methods are mathematically rigorous techniques for the specification, development, analysis, and verification of software and hardware systems

Simple enough. It all starts with building a specification that defines "correct" behavior of your system. We make a LOT of assumptions on a day-to-day, minute-to-minute basis about how our code will work. Although the wibbly-wobbly-timey-typey objects in JavaScript or Python are quick and convenient, we have to nail those down and decide exactly how they should behave.

Once your software has a specification that defines correct behavior, then you need to check your implementation to ensure it follows the spec. This is where we typically reach for suites of unit tests. But as we all know, suites of unit tests can never catch every weird combination of conditions that causes incorrect behaviors to arise.

That is where proof assistants (F*, Dafny, Coq, Lean, Liquid Haskell) and model checkers (Alloy, TLA+) arrive on the Formal Methods station wagon. These tools take our specifications along with details of our implementation and rigorously validate it. And when I say rigorously I mean they deliver mathematical certainty of correct behavior as defined by your specification.

What are they good for?

What if you need to write code that was speeding down the highway? Or that was zapping tumors in people's brains? Or code that controlled an irreplaceable satellite millions of miles away? Or code that controls hundreds of millions of dollars (this happens a lot)? Are you going to trust your normal day-to-day programming tools with this weight of responsibility? I, for one, would not.

If your code absolutely, positively MUST work the first time and every time, then you need Formal Methods. It will save you money, time and the heartache of having to explain failures to your clients or far worse, their loved ones.

Why am I just now hearing about this?

Put simply, because it is a pain in the ass 😑.

A lot of proof assistant languages such as Lean and F* rapidly become a soup of esoteric symbols and punctuation marks that us mere mortals cannot fathom. That's probably half the reason most people turn 360 degrees around and moonwalk out the door when the discussion turns to formal methods; they seem impenetrable.

Article content
Actual snippet of a proof written in "Lean" 🤔
Article content
An advanced proof written in Lean 😵

For those that make it past the crazed syntax and esoteric concepts, then there is the real problem. Proving programs are correct is hard. Damn hard. You see, when we are confronted with a machine that does not accept hand waving to hard questions we either leave in frustration or shrink back in defeat. But never fear, there is an easier path and that path is Dafny!

Dafny : A surprisingly practical verification language

Dafny, unlike it's relatives like F* and Lean, comes from the syntactic lineage of C/C++ and more recently, C# and Java. If you can write any of those, you can write Dafny. There are no weird symbols and abbreviations and we are able to build programs in both functional and object-oriented styles.

So lets take a look at the syntax:

Article content
A simple snippet of Dafny

Not that bad ehh? The "ensures" thing looks a little weird, but I bet that you can guess what it does without me saying anything. This snippet was from a little project modeling a toy connection protocol between a server and a client. The whole thing less than 100 lines.

The second big win of Dafny is that has a great toolchain! You can get started in 60 seconds in VSCode just by downloading the extension and creating a new file. There is no barrier to entry!

Dafny is a mature and complex language under continuous development and improvement (the reference document is 448 pages long!). As you can imagine, we are not going to be able to cover everything, so consider this a gentle introduction. A future post will provide examples and show how it really works.

What can Dafny do for us?

"Most proof languages can help you write a formal proof, but then you need to write separate code that exactly implements the proved algorithm."

First and foremost, Dafny helps you write correct software by giving you advanced ways of telling the compiler and verifier what your code is supposed to do. This is also known as design by contract. The contracts primarily take the form of preconditions and postconditions.

In addition to the simpler syntax and more approachable toolchain, Dafny has a super power. Most proof languages can help you write a formal proof, but then you need to write separate code that exactly implements the proved algorithm. This gap leaves room for human error and can undermine the whole point of going through pain for formal proofs.

Dafny however can compile your verified code down into a number of mortal languages including C#, Java, JavaScript, Go and Python. This can give you great confidence that your code is going to work exactly as intended.

So what are these "Contract" things?

Each time we define a function or a method in Dafny we provide markup that defines the boundaries of expected behavior including inputs, outputs and reads and writes to other variables. These behavior constraints come in the form of loop invariants, assertions, preconditions and postconditions. We declare this metadata and Dafny's verifier will let us know if we have covered every possible corner case.

Article content
A more advanced snippet demonstrating postconditions and loop invariants

These contracts are also important to us humans, because they clearly state the developer's intentions to future maintainers. Many times we change or refactor some code and accidentally alter a hidden behavior that was important to the system. Making all of these explicit prevents these types of mistakes.

A precondition is a rule that a method or function requires in order to work correctly. We do this in C# and other languages quite commonly by checking method arguments and throwing exceptions when they are not null or otherwise invalid.

Postconditions are somewhat rare in most day-to-day programming but they are important in Dafny. A post condition is a promise that when a function completes, the program will be in a specific state. It could be a guarantee that a data structure is valid, or that a bank account is not negative or anything else that is important to us.

Loop invariants are trickier. Loops are particularly difficult to prove correct. They can overwrite variables, transform arrays and perform other complex operations that can confuse humans and unfortunately even the verifier system. Loops can also be very hard to prove that they will ever stop looping. To help things along, it is sometimes necessary to add some rules to a loop that the system can prove are always true. These rules are known as Loop Invariants.

What kinds of things can we "verify" in Dafny?

Pretty much anything. The language provides you incredible flexibility in how you structure your code. It could be in a functional programming style, or an object oriented style. If you can write it in a mainstream language like C#, you can write it in Dafny. Dafny has been used to model everything from encryption algorithms to complex distributed systems to blockchain smart contracts. With enough effort, you can describe almost any type of software; the sky (and your time/budget) is the limit!

Sounds great! What's the catch?

There are three big difficulties in working with Dafny (and most other formal verifiers).

1. It's back to the basics

First, you are going to have to build everything from scratch 😕. Unlike in C# and JavaScript where we have package managers that have millions of pre-built classes and functions, Dafny only gives you very basic building blocks to work with. You will need to build up your application's behavior brick by brick which can be very time consuming. If it's any consolation, you have total control over the behavior of the resulting program.

2. You have to help the computer for it to help you

Next you will quickly find out that Dafny's verifier is really dumb. Don't get me wrong, it is operationally perfect at proving things false, but we have to remember that Turing Complete languages are undecidable in general and proving them correct is impossible. It won't take you very long before you write a piece of code that you know for a fact is correct, but Dafny can't make the mental jump to verify it.

This is because the underlying verification engine (called Boogie, which is built on the Z3 SMT solver) uses conservative heuristics to prove your program is correct. This is because even some simple algorithms can be very very computationally expensive to verify. So the verifier doesn't search every possible path to verify your program. It uses various tricks and tactics to do a best effort search. As a result, you will commonly have to add "extra" assertions or invariants that seem obvious to you, but help Dafny figure out that your program is correct.

There is some good news however, Dafny will generate counter examples anytime it fails to prove something fully. You can inspect these examples to see where the logic broke down and try to fix them.

I should note that Dafny will never generate false positives, only false negatives. That is to say that if Dafny says your program is correct, you can rest assured that it is correct (unless your design is bad...). But if Dafny says that it cannot prove your program, it just means that you may need to add extra assertions to help Dafny out. In fact that is how we commonly debug programs in Dafny; when presented with an error, we start adding assertions to smaller and smaller chunks of the program until we either find the gap/mistake, or Dafny agrees that we are telling the truth.

3. Formalism == Hard Mode

Lastly as I mentioned before, it is really really hard to write formal code. All those little assumptions have to be written down in order for the computer to help us validate them. It is normal for the contracts defining a function to take up more lines than the function itself! And when we do write them down, we may find that they interact in complex ways that keep sending us back to the drawing board in our designs.

While this exercise is critical to producing correct and verifiable software, it comes at the cost of slowing development down to a fraction of our normal pace. Writing just one hundred executing lines a day is exceptional for most of this work and it can be a lot rougher than that when dealing with tricky algorithms. It does get easier with time, but it will always be much much slower than just writing normal code.

There is also always the risk that our software specification itself is wrong or incorrectly defined. Dafny will catch contradictions in our specifications, but if you define something incorrectly Dafny can't do much to help you.

Should I use it in my code today?

It depends 🤷. It's going to take an investment of time and effort and you will need to weigh that against the risks and costs of failures. The folks at AWS invested heavily in TLA+ and were able to uncover crazy complex bugs, including one with a 35-step reproduction path. It also enabled them to implement performance optimizations with confidence, which at the scale of AWS can save millions of dollars.

Microsoft has also invested heavily in formal methods. In fact, Dafny, TLA+, Boogie, Z3, Lean and F* all come from, or are stewarded by Microsoft Research. Microsoft also has used formal methods to prove complex systems, such as their IronFleet project which validates the behavior of distributed systems. But at what cost? IronFleet took 3.7 man-years to implement and it was only 5000 lines of code.

At the end of the day, it's all about risk assessment. Probability of Failure x Cost of Failure = Risk. You'll need to do that math for yourself. But knowing how to use these techniques will give you tools tools that let you deliver on super-human levels of reliability.

Next Steps

In my next post, I'll demonstrate a practical example of Dafny to model the behavior of my hypothetical scheduling application I talked about in my previous post. Until then!