Risk as a reflection of values

A few days ago I was having dinner with some close friends and the discussion drifted to the nature of humanity. My friend argued that the measure of humanity is a sliding scale, weighted against the amount of survival pressure a person is under, and that survival pressure is the primary driver of altruistic expression. Essentially, when the chips are down, our mental risk assessment becomes dominated by external factors instead of an internal sense of morals. The conversation depressed me because it clouded my world view with doubt: perhaps none of us are actually firmly tied to our sense of morals, and instead it is all about how we perceive and interpret risk.

Working in an engineering organization involves daily interactions with risk management. Risks to timelines, risks to availability and operations, risks to security. We have to weigh the costs and expected ROI of preventative measures against the weighted probability and severity of negative outcomes. "If we invest in penetration testing, how much will it reduce the projected likelihood and severity of a security breach?"

Another fundamental but less discussed risk is the cost/benefit of investing in people. People have value in many different dimensions that matter to organizations: specific skills, experience, knowledge, principles, curiosity, and determination all drive varying degrees of an organization's success. Some of these might be considered intrinsic to a person's nature, such as curiosity and determination, but others, such as skill, experience and knowledge, naturally accumulate with age. So to maximize as many dimensions of excellence as possible, organizations naturally favor senior talent because of these built-in advantages. Who wouldn't want to maximize the chances of organizational success? I mean, this is dog-eat-dog capitalism.

The risk analysis of an organization will naturally vary with time as its economic context ebbs and flows from peacetime to wartime and back again. But even as we shift footing, an organization's principles should persist, just as I believe people should not abandon their moral center when times get hard. It is in wartime, in times of challenge, that our mettle is tested and shows. I hear so much about "culture" in tech companies but so little about the guiding principles and group conscience that carry us as we endure as an organization and as an institution.

So what is this post actually about? It's about inclusivity. The tech industry owes a great debt to many underserved communities. Even for those of us that don't belong to a specific minority group, we have all belonged to a group that is undervalued and underserved: the junior role.

Each of us started somewhere that required someone in a position of power to reach down a hand and take a risk on us. Maybe we had good grades or some extra help from an acquaintance who could vouch for us, but at the end of the day, we all started inexperienced, untested and unproven. Maybe it was a megacorp, or a scrappy startup, or one of the countless mid-size companies that make up the bulk of this industry, but every person in tech remembers that open door, that first handshake, literal or figurative, that sealed our first position on our career path. The good faith opportunity, the chance to gain experience and confidence and to prove to ourselves and others our value. Every step in our careers, whether it was our first job or our most recent promotion, was a handshake, a vote of confidence in our potential and value.

This is a subject I feel so strongly about because I wasn't ready when I got my first handshake, but they took a chance on me and supported me until I was. I owe that handshake so much, from the cozy vacations I have taken to the roof over my head. I have a bit of a lonely disposition and I don't usually see much of myself in others. But I absolutely see myself in those still waiting for that open door and that first handshake.

Whenever we can, we should try to see the potential and the value in people, even when they don't have a 10-page CV to show for it. Junior roles bring new blood and ideas into an organization. They bring perspective and questions that we forgot to ask. They bring mistakes and opportunities for resilience. They bring connection with the younger generations that will make up our clients one day, sooner than we think. They can bring fire, ambition and a "why not" attitude that reminds us of who we were and might be again.

But they always bring risk. Risk of slipping a deadline, risk of bugs and mistakes, risk of failure. Taking a human risk cuts to our core principles, especially when we are under pressure to deliver. Maybe next quarter when things are not as busy, maybe once we ship this feature, maybe once we stabilize this system. It's a fallacy that is so easy to tell ourselves. But the reality of capitalism is that tomorrow isn't promised to us, and there is never going to be a perfect time to bring on a junior who will take time and investment to provide that all-important ROI.

Our choices with that risk, I believe, serve as a reflection of our values both as organizations and as people, and we should give them serious consideration. There is always going to be the question of how to make as much money as fast as possible. But an essential question that we must ask ourselves is "have we reached far enough to give someone that handshake?" And that handshake, after I've earned my last dollar and my last line of code has stopped executing, is how I would want to be remembered.

It is just a Formality...

Hey, if you like this article, be sure to check out the follow-up: "Getting started with Dafny: Your first proof"


I have a little joke that I tell to each new employee that I meet: "Hey I work on <Component X>, I'm sorry in advance for all the bugs you'll have to deal with 😅". I tell it to new QA, support folks and even sales people. It always gets a chuckle and is a good ice breaker. (Did I mention that we are hiring?)

But the self-deprecating humor is a bit of a defense mechanism, because bugs in my code mortify me. I've seen all sorts over the years: little bugs, big bugs, sneaky bugs, heisenbugs, "I thought we fixed this already!?" bugs and "HOW DID THIS MAKE IT PAST QA 😭" bugs. I've seen bugs that hid for years and then exploded. I've seen bugs with 7-figure price tags...multiple times.

They all hurt. So in my code I take pains to follow principles such as "parse, don't validate" (as coined by the brilliant Alexis King), the use of immutable data structures to simplify multithreaded programming, and strong type systems to catch inconsistencies. I lean on unit tests to catch bugs in "well written code" and property-based testing to validate tricky assumptions.

But even with all of these tactics, there will always be gaps where bugs live in the darkness. But what if there was a way to make your code airtight, so that it contained no unhandled corner cases? What if you could prove that no possible configuration combination in your software causes invalid behavior? Well there is! In this article I want to introduce you to the gold standard of software verification: Formal Methods.

What are Formal Methods?

Let's ask Wikipedia:

In computer science, formal methods are mathematically rigorous techniques for the specification, development, analysis, and verification of software and hardware systems

Simple enough. It all starts with building a specification that defines the "correct" behavior of your system. We make a LOT of assumptions on a day-to-day, minute-to-minute basis about how our code will work. Although the wibbly-wobbly-timey-typey objects in JavaScript or Python are quick and convenient, we have to nail them down and decide exactly how they should behave.

Once your software has a specification that defines correct behavior, then you need to check your implementation to ensure it follows the spec. This is where we typically reach for suites of unit tests. But as we all know, suites of unit tests can never catch every weird combination of conditions that causes incorrect behaviors to arise.

That is where proof assistants (F*, Dafny, Coq, Lean, Liquid Haskell) and model checkers (Alloy, TLA+) arrive on the Formal Methods station wagon. These tools take our specifications along with details of our implementation and rigorously validate them against each other. And when I say rigorously, I mean they deliver mathematical certainty of correct behavior as defined by your specification.

What are they good for?

What if you needed to write code that would be speeding down the highway? Or zapping tumors in people's brains? Or controlling an irreplaceable satellite millions of miles away? Or moving hundreds of millions of dollars (this happens a lot)? Would you trust your normal day-to-day programming tools with that weight of responsibility? I, for one, would not.

If your code absolutely, positively MUST work the first time and every time, then you need Formal Methods. They will save you money, time and the heartache of having to explain failures to your clients or, far worse, their loved ones.

Why am I just now hearing about this?

Put simply, because it is a pain in the ass 😑.

A lot of proof assistant languages such as Lean and F* rapidly become a soup of esoteric symbols and punctuation marks that we mere mortals cannot fathom. That's probably half the reason most people turn 360 degrees around and moonwalk out the door when the discussion turns to formal methods; they seem impenetrable.

[Image: actual snippet of a proof written in "Lean" 🤔]
[Image: an advanced proof written in Lean 😵]

For those that make it past the crazed syntax and esoteric concepts, there is the real problem: proving programs correct is hard. Damn hard. When we are confronted with a machine that does not accept hand-waving answers to hard questions, we either leave in frustration or shrink back in defeat. But never fear, there is an easier path, and that path is Dafny!

Dafny: A surprisingly practical verification language

Dafny, unlike its relatives F* and Lean, comes from the syntactic lineage of C/C++ and, more recently, C# and Java. If you can write any of those, you can write Dafny. There are no weird symbols or abbreviations, and we can build programs in both functional and object-oriented styles.

So let's take a look at the syntax:

[Image: a simple snippet of Dafny]
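
(The original snippet was embedded as an image. As a stand-in, here is a minimal sketch in the same spirit: a Dafny method from a hypothetical connection protocol, with an "ensures" clause. The names are illustrative, not from the original project.)

datatype ConnectionState = Disconnected | Connecting | Connected

// The verifier statically checks that Connect is only ever called on a
// disconnected client, and that it always returns a Connecting state.
method Connect(current: ConnectionState) returns (next: ConnectionState)
  requires current == Disconnected
  ensures next == Connecting
{
  next := Connecting;
}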

Not that bad, eh? The "ensures" thing looks a little weird, but I bet that you can guess what it does without me saying anything. This snippet came from a little project modeling a toy connection protocol between a server and a client. The whole thing is less than 100 lines.

The second big win of Dafny is that it has a great toolchain! You can get started in 60 seconds in VSCode just by downloading the extension and creating a new file. There is no barrier to entry!

Dafny is a mature and complex language under continuous development and improvement (the reference document is 448 pages long!). As you can imagine, we are not going to be able to cover everything, so consider this a gentle introduction. A future post will provide examples and show how it really works.

What can Dafny do for us?

"Most proof languages can help you write a formal proof, but then you need to write separate code that exactly implements the proved algorithm."

First and foremost, Dafny helps you write correct software by giving you advanced ways of telling the compiler and verifier what your code is supposed to do. This is also known as design by contract. The contracts primarily take the form of preconditions and postconditions.

In addition to the simpler syntax and more approachable toolchain, Dafny has a super power. Most proof languages can help you write a formal proof, but then you need to write separate code that exactly implements the proved algorithm. This gap leaves room for human error and can undermine the whole point of going through pain for formal proofs.

Dafny, however, can compile your verified code down into a number of mortal languages, including C#, Java, JavaScript, Go and Python. This can give you great confidence that your code is going to work exactly as intended.

So what are these "Contract" things?

Each time we define a function or a method in Dafny, we provide annotations that define the boundaries of expected behavior, including inputs, outputs, and reads and writes to other variables. These behavior constraints come in the form of loop invariants, assertions, preconditions and postconditions. We declare this metadata and Dafny's verifier will let us know if we have covered every possible corner case.

[Image: a more advanced snippet demonstrating postconditions and loop invariants]
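
(Again, the original snippet was an image. Here is a stand-in sketch of my own showing the same ideas: a postcondition on the method and invariants on the loop. It's the classic "sum an array" exercise, not the code from the original post.)

function SumTo(a: array<int>, n: int): int
  requires 0 <= n <= a.Length
  reads a
{
  if n == 0 then 0 else SumTo(a, n - 1) + a[n - 1]
}

method SumArray(a: array<int>) returns (total: int)
  ensures total == SumTo(a, a.Length)   // postcondition: our promise to callers
{
  total := 0;
  var i := 0;
  while i < a.Length
    invariant 0 <= i <= a.Length        // loop invariants: facts the verifier
    invariant total == SumTo(a, i)      // checks hold on every iteration
  {
    total := total + a[i];
    i := i + 1;
  }
}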

These contracts are also important to us humans, because they clearly state the developer's intentions to future maintainers. Many times we change or refactor some code and accidentally alter a hidden behavior that was important to the system. Making all of these explicit prevents these types of mistakes.

A precondition is a rule that a method or function requires in order to work correctly. We do this quite commonly in C# and other languages by checking method arguments and throwing exceptions when they are null or otherwise invalid.
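
(A minimal sketch of my own: the "requires" clause below plays the role of an argument check, except that the verifier enforces it at every call site instead of at runtime.)

method Divide(numerator: int, denominator: int) returns (quotient: int)
  requires denominator != 0   // precondition: every caller must prove this
  ensures quotient == numerator / denominator
{
  quotient := numerator / denominator;
}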

Postconditions are somewhat rare in most day-to-day programming, but they are important in Dafny. A postcondition is a promise that when a function completes, the program will be in a specific state. It could be a guarantee that a data structure is valid, or that a bank account is not negative, or anything else that is important to us.

Loop invariants are trickier. Loops are particularly difficult to prove correct. They can overwrite variables, transform arrays and perform other complex operations that can confuse humans and, unfortunately, even the verifier. It can also be very hard to prove that a loop will ever stop looping. To help things along, it is sometimes necessary to add rules to a loop that the system can prove are always true. These rules are known as loop invariants.

What kinds of things can we "verify" in Dafny?

Pretty much anything. The language gives you incredible flexibility in how you structure your code. It could be in a functional programming style or an object-oriented style. If you can write it in a mainstream language like C#, you can write it in Dafny. Dafny has been used to model everything from encryption algorithms to complex distributed systems to blockchain smart contracts. With enough effort, you can describe almost any type of software; the sky (and your time/budget) is the limit!

Sounds great! What's the catch?

There are three big difficulties in working with Dafny (and most other formal verifiers).

1. It's back to the basics

First, you are going to have to build everything from scratch 😕. Unlike C# and JavaScript, where package managers offer millions of pre-built classes and functions, Dafny only gives you very basic building blocks to work with. You will need to build up your application's behavior brick by brick, which can be very time consuming. If it's any consolation, you have total control over the behavior of the resulting program.

2. You have to help the computer for it to help you

Next, you will quickly find out that Dafny's verifier is really dumb. Don't get me wrong, it is operationally perfect at proving things false, but we have to remember that verifying arbitrary programs in Turing-complete languages is undecidable in general, so no tool can prove every correct program correct. It won't take you very long before you write a piece of code that you know for a fact is correct, but Dafny can't make the mental jump to verify it.

This is because the underlying verification engine (called Boogie, which is built on the Z3 SMT solver) uses conservative heuristics to prove your program correct. Even some simple algorithms can be very, very computationally expensive to verify, so the verifier doesn't search every possible path; it uses various tricks and tactics to do a best-effort search. As a result, you will commonly have to add "extra" assertions or invariants that seem obvious to you, but that help Dafny figure out that your program is correct.

There is some good news, however: Dafny will generate counterexamples anytime it fails to prove something fully. You can inspect these examples to see where the logic broke down and try to fix it.

I should note that Dafny will never generate false positives, only false negatives. That is to say, if Dafny says your program is correct, you can rest assured that it is correct (unless your design is bad...). But if Dafny says it cannot prove your program, it may just mean that you need to add extra assertions to help Dafny out. In fact, that is how we commonly debug programs in Dafny: when presented with an error, we add assertions to smaller and smaller chunks of the program until we either find the gap/mistake, or Dafny agrees that we are telling the truth.
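
(Here is a schematic of that workflow with hypothetical names. In this tiny example Dafny wouldn't actually need the help; the point is the shape of the technique: pin down intermediate facts with asserts until the failing step is exposed.)

method Reconcile(balance: int, charge: int) returns (remaining: int)
  requires 0 <= charge <= balance
  ensures remaining >= 0
{
  remaining := balance - charge;
  // When a postcondition won't verify, state the intermediate facts
  // you believe hold...
  assert remaining == balance - charge;
  // ...until one is rejected (you found your mistake) or all are
  // accepted and the original goal goes through.
  assert remaining >= 0;
}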

3. Formalism == Hard Mode

Lastly, as I mentioned before, it is really, really hard to write formal code. All those little assumptions have to be written down in order for the computer to help us validate them. It is normal for the contracts defining a function to take up more lines than the function itself! And when we do write them down, we may find that they interact in complex ways that keep sending us back to the drawing board in our designs.

While this exercise is critical to producing correct and verifiable software, it comes at the cost of slowing development to a fraction of our normal pace. Writing just one hundred executing lines a day is exceptional for most of this work, and it can be a lot rougher than that when dealing with tricky algorithms. It does get easier with time, but it will always be much, much slower than just writing normal code.

There is also always the risk that our software specification itself is wrong or incorrectly defined. Dafny will catch contradictions in our specifications, but if you define something incorrectly Dafny can't do much to help you.

Should I use it in my code today?

It depends 🤷. It's going to take an investment of time and effort, and you will need to weigh that against the risks and costs of failures. The folks at AWS invested heavily in TLA+ and were able to uncover crazy complex bugs, including one with a 35-step reproduction path. It also enabled them to implement performance optimizations with confidence, which at the scale of AWS can save millions of dollars.

Microsoft has also invested heavily in formal methods. In fact, Dafny, TLA+, Boogie, Z3, Lean and F* all come from, or are stewarded by, Microsoft Research. Microsoft has also used formal methods to prove complex systems, such as the IronFleet project, which verifies the behavior of distributed systems. But at what cost? IronFleet took 3.7 person-years to implement, and it was only about 5,000 lines of code.

At the end of the day, it's all about risk assessment: Probability of Failure x Cost of Failure = Risk. You'll need to do that math for yourself. But knowing how to use these techniques will give you tools that let you deliver superhuman levels of reliability.

Next Steps

In my next post, I'll demonstrate a practical example of using Dafny to model the behavior of the hypothetical scheduling application I talked about in my previous post. Until then!

I built a serverless platform in 1 day (and so can you!) Part 2

So, last time we covered the control plane of this platform and some of the nifty things that Microsoft Orleans provides that make it a great choice for this kind of platform. Today, we will cover the data plane, which includes routing and execution. But first, a quick refresher.

What are we building again?

A basic serverless platform suitable for internal/private deployments. The goal is to provide an easy way to bolt services together, much like AWS Lambda or Azure Functions. These services can be great time savers because they make it quick and easy to get some code running in the cloud without managing VMs or dealing with Kubernetes, whose deployment complexity can dominate the effort when you just need a few lines of code running.

That already exists, so why are we building it?

Because we can 😃! It is a great exercise to reinvent the wheel a bit sometimes, to learn more about how the wheel actually works. And sometimes (more often than you think), your simpler wheel can outperform the big SaaS wheels that are out there. In this case, I'm trying to demonstrate that a lot of what Azure and AWS sell you is actually not very complex, and for certain use cases you can save a lot of money by building your own miniature version of it. Now back to our regularly scheduled programming 🙂.

Examining the Data Plane implementation

So, as we saw before, the control plane is a set of administrative APIs that "control" the behavior of the system. Specifically, they control how HTTP requests are handled. The data plane handles and routes the HTTP requests coming into the system. Now, asp.net already has a lot of fancy routing tech built in for us to use. But we are going to throw all that away and leverage Orleans some more to handle our routing. We do this because asp.net routing is mostly intended to be static upon application startup, and we explicitly do not want to have to restart our application just to add new routes and handlers.

So the front door of our system is very simple:

[Image: the top level]
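
(The original snippet was an image. Here is a stand-in sketch of what the prose below describes; DataPlaneRouterService is named in the post, while ExecuteRouteAsync and the wiring details are my assumptions.)

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<DataPlaneRouterService>();

var app = builder.Build();

// One catch-all mapping: every GET below /app in the HTTP path is
// delegated to the router service instead of asp.net's static routing.
app.MapGet("/app/{**path}", async (string path, DataPlaneRouterService router) =>
    await router.ExecuteRouteAsync(path));

app.Run();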

As a side note, I think I'm going to switch to using Carbon.now.sh for my code formatting from now on. The formatting here on LinkedIn is just terrible 😐.

This is using the very nice minimal APIs from asp.net, and as we can see, it's a single mapping for all GET requests to anything below /app in the HTTP path. The real work is being done by this mysterious DataPlaneRouterService that is being injected into our handler. So let's take a look at that:

[Image: the real routing logic for our app]
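
(Another image in the original; this sketch reconstructs the behavior described below under my own naming assumptions. The grain interface name and validation details are illustrative.)

public class DataPlaneRouterService
{
    private readonly IClusterClient clusterClient;

    public DataPlaneRouterService(IClusterClient clusterClient)
        => this.clusterClient = clusterClient;

    public async Task<string> ExecuteRouteAsync(string path)
    {
        // The "organization name" is the path prefix; reject anything odd.
        var organizationName = path.Split('/')[0];
        if (string.IsNullOrWhiteSpace(organizationName)
            || !organizationName.All(char.IsLetterOrDigit))
        {
            throw new ArgumentException("Invalid organization name");
        }

        // Normalize to lowercase, then call up the grain with that
        // identifier and let its Execute method do the real work.
        var grain = clusterClient.GetGrain<IRouteHandlerGrain>(
            organizationName.ToLowerInvariant());
        return await grain.Execute(path);
    }
}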

Again, not much is really going on in here, but let's walk through it. We first do some checking to ensure the "organization name", which serves as the path prefix, is clean. From there we normalize it to all lowercase, look up the Orleans Grain that has that identifier, and call its Execute method.

How Grains work in Orleans

I'm going to step back for a moment and remind everyone how Grains work in Orleans. As we stated before, Orleans is Microsoft's implementation of the Distributed Virtual Actor pattern. In this case, we are going to focus on the "Virtual" part.

When we talk about Virtual Memory, we are talking about a magical system that lets each process running on a computer believe it has access to an essentially unlimited amount of memory. The operating system does a lot of bait-and-switch magic behind the scenes to preserve this illusion. If your program hasn't touched a particular chunk of memory for a little while, the operating system will save it to the page file on disk and use the space in RAM for something more important. The next time you need that chunk, the operating system loads it back from the page file and makes it available on the fly. It's a miracle that computers can even work at all with all of this nonsense going on. But they do, and programming is much simpler because of it.

In Orleans, Grains are "Virtual" in the same way. When you need to talk to a specific Grain, you just call a method on it as if it were there the whole time. The Orleans runtime will magically bring it into existence just-in-time somewhere in your cluster without your code knowing about it. If your Grain previously had state data stored in it, that data is loaded from storage at that moment. When you haven't called any methods on that exact Grain instance for a little while, Orleans will put the Grain and its data back into cold storage to be used at some point in the future.

The net effect is that in the above code we can summon a grain for any possible path that is sent to the HTTP API and allow the Grain's internal logic to handle it. The Grain is coded to look in its internal state, see if a handler exists, and then decide how to proceed. This is fundamentally how we determine how to handle each possible route in our application, even routes that don't "exist".

By the way, if you're wondering how many Grains Orleans can hold active in memory at once, the answer is millions and the number of Grains that can live in cold storage is infinite (assuming you can pay for the storage).

The Route Handler Grain

The data plane for each RouteHandlerGrain consists of one method: Execute. That method looks like this:

[Image: the Route Handler's Execute method]
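
(Reconstruction sketch again: the structure follows the description below, but the state type, result type and helper names are my assumptions, not the original code.)

public class RouteHandlerGrain : Grain, IRouteHandlerGrain
{
    private readonly IPersistentState<RouteHandlerState> state;

    // Volatile cache of the compiled script. It is rebuilt on demand after
    // each activation because it can't round-trip through cold storage.
    private Script? compiledScript;

    public RouteHandlerGrain(
        [PersistentState("routeHandler")] IPersistentState<RouteHandlerState> state)
        => this.state = state;

    public async Task<ExecutionResult> Execute(string path)
    {
        // No stored state means no handler was ever registered: a 404.
        if (!state.RecordExists)
            return ExecutionResult.NotFound();

        // Compile and cache on first use, then reuse the in-memory copy.
        compiledScript ??= ParseScriptOrThrow(state.State.Code);
        return await EvaluateAsync(compiledScript, path);
    }
}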

We first check to see if this Grain instance has any State loaded already. If it doesn't, we know that no one has set up a handler for this Grain and therefore we treat it as a 404.

As we discussed above, once a Grain has been loaded/activated, it exists as a normal in-memory C# object. This includes the normal ability to use fields and properties to store data. Leveraging this ability, we check whether we have already cached a compiled version of the handler script in a nullable property. If there is a cached version, we invoke it; if not, we load the script on demand, compile and cache it, then invoke it.

We do it this way because compiled/optimized script formats cannot always be serialized to and from cold storage like the other Grain state data, so we recompile the script on demand after the Grain has been activated. The next time a request comes in for this route, we can use the volatile cached copy to keep things running fast.

The textual bodies of the scripts themselves are stored in another Grain that works much like a stored object in Git. This is done to support a history mechanism, so that we can track the changes made to the handler script over time and roll back to them if needed.

Executing an abstract script

One of my company's executives has a saying that goes something like: "It's hard to make things simple". The first version of this code (which you can find here) was kinda ugly, but it got the job done. As I matured the development of this system, I spent a fair amount of effort cleaning up the design so that the code was cleaner to read and easier to extend. As a result, the Evaluate method now looks like this:

[Image: the core Evaluate method]
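
(Stand-in sketch for the image, following the walkthrough below. ScriptContext, RunScript, ConsumptionSample and the organizationGrain reference are all illustrative names of mine, not the original code.)

private async Task<ExecutionResult> EvaluateAsync(Script script, string path)
{
    // Context object that encapsulates the request path; also a future
    // extension point for richer data. It is injected into the script.
    var context = new ScriptContext { Path = path };

    // Bookkeeping: measure what the script costs to execute...
    var stopwatch = Stopwatch.StartNew();
    var output = RunScript(script, context);   // ...then fire the script.
    stopwatch.Stop();

    // Measure the output size and roll it, with the elapsed time, into the
    // Organization's consumption data for usage-based billing.
    // (organizationGrain: an assumed grain reference resolved at activation.)
    var sample = new ConsumptionSample(stopwatch.Elapsed, output.Length);
    await organizationGrain.RecordConsumption(sample);

    return ExecutionResult.Ok(output);
}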

Again, it isn't too complicated to read, but let's walk through it. We create a context object that encapsulates the request path and serves as a future extension point for data. This context will be injected into the script's runtime space. We then do some bookkeeping to measure how many resources the script takes to execute and fire the script. After the script completes, we measure how large the output is so we can track it for our consumption-based billing, and then update the consumption data for the Organization. That's it.

Executing the actual Javascript

So this does skip over the most interesting part: how do we actually execute the code?! Well, we use a very handy .NET library called Jint that provides a Javascript interpreter. There are pros and cons to using an interpreter vs a compiler, but Jint is well engineered and executes small scripts very quickly, which is what we want. We inject the arguments and a mini utility library API into the script's runtime and then invoke it. The plumbing to do this is organized in a couple of classes, but it boils down to this class that does the heavy lifting.

Of note is the fact that our scripts don't have any of the normal Javascript runtime APIs that you would get from a browser like Chrome or a server like Node or Deno. We have to create each of them manually. Currently, the only thing I've implemented is a very basic Fetch API so that our scripts can call other upstream services if they need to. You can see that here:

[Image: the Fetch API for our Javascript runtime]
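
(Illustrative sketch only: a GET-only fetch shim registered into Jint. The wait-time tracking mirrors the billing idea described below; the class and member names are mine.)

public class FetchApi
{
    private readonly HttpClient httpClient;

    public FetchApi(HttpClient httpClient) => this.httpClient = httpClient;

    // Time spent blocked on upstream services, tracked separately so a
    // future billing model can avoid charging the client for it.
    public TimeSpan TimeSpentWaiting { get; private set; }

    public string Fetch(string url)
    {
        var waited = Stopwatch.StartNew();
        // Synchronous by design: Jint scripts run on a single thread.
        var body = httpClient.GetStringAsync(url).GetAwaiter().GetResult();
        waited.Stop();
        TimeSpentWaiting += waited.Elapsed;
        return body;
    }
}

// Exposed to scripts as a global function on the Jint engine:
// engine.SetValue("fetch", new Func<string, string>(fetchApi.Fetch));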

As needed, other runtime components can be implemented to make life easier for the script itself, but this is already enough to do useful things. I really want to attach a JSON-based key/value store API as a next step.

You'll note there is some code in here to measure the amount of time that we spend waiting on the HTTP GET call. This is to support a future enhancement where we do not bill the client for time spent waiting on some other service to execute. This is similar to how Cloudflare does its billing, and I consider it to be more fair. Theoretically, while the script is blocked on an upstream service, we can do other work with the thread, so it really wouldn't cost us much. But this isn't fully implemented yet.

So, how well does this thing work?

"...32,500 requests for second is pretty good for a single machine..."

Pretty darn good 😃! Despite the Orleans runtime doing a ton of magical work in the background, there isn't much fundamental overhead to the system. We can see that in a local test on my workstation, which looked like this:

====================================
Thread Count: 8 Requests Per Thread: 10000
Total: 80000 Errors: 0 Subjective Time: 19.166 Real Time: 2.462
Subjective Average: 4,174.01
Mean Latency: 0.240
Real Average: 32,497.69

This shows that we were able to send 80,000 requests through the data plane to a trivial handler script in about 2.5 seconds. This works out to only 240 microseconds per call, including all the asp.net HTTP controller and Orleans dispatching overhead.

I think 32,500 requests per second is pretty good for a single-machine test run, and I should mention this number can go much higher when the load is spread across multiple handlers that don't have to contend for a Grain's single-threaded execution. There are also other optimizations we could make to allow better scaling, such as:

  • Moving consumption updates out of the execution path
  • Allowing fan-out of individual route execution
  • Implementing response caching

At the end of the day...

As I mentioned before, the normalized cost of compute in Azure Functions is as high as 6 times the cost of bare VMs or K8s containers. Those services deliver ease of deployment, hands-free scale-out and shiny web editors. But with a little ingenuity, we don't have to pay the convenience tax to get good functionality. This service could be hooked up to a CI pipeline for script deployments, the same as Lambda/Functions, and deliver that same power to your organization.

But what about scaling? Well, the final cherry on this cake is that we require ZERO changes to this code to scale out from 1 machine up to 100 machines. Just add more replicas to the environment and the Orleans cluster will expand to them and load balance across them.

I'm pretty confident that this tiny serverless platform could displace a large fraction of the work that we currently do with Azure Functions, and it could save a fair amount of money because we don't have to rely on the complex and expensive cloud primitives that AWS or Azure have built for us. You can run this on a bare VM or in containers and use nothing else besides S3 or Blob/Table storage, which is about as cheap as you can get in the cloud.

I'll part with a quote that a very wise client told me years ago:

"You pay for what you don't know"


I built a serverless platform in 1 day (and so can you!)

Ok, ok. It took me 2 days. But I was doing other things with my life 🤷. Anyway, onto the real story.

So a few days ago, I was writing some internal documentation for my job at Ziosk (did I mention we are hiring?). Specifically, I was writing a primer for newcomers to Microsoft Orleans. Orleans is Microsoft's implementation of the Distributed Virtual Actor model, which makes people's eyes glaze over, so I try to build intuition in other ways. I have fallen into a pattern of introducing Orleans as a "nanoservice" platform. You see, microservices are small independent APIs that handle some specific functionality. But the more microservices we add, the higher the complexity and the greater the woes. Orleans skips a lot of that pain and lets us break out functionality as if we were programming normal C# classes and interfaces. This enables us to define smaller and more fine-grained services if our designs call for it. Hence, "nanoservices".

While I was explaining this in my document, I mused a bit that since nano is smaller than micro, you could build microservices out of smaller nanoservices. Then it clicked! Maybe I could actually do that! And what could be a more hip way to deploy a microservice than to use a serverless platform and pay 6 times the rate 😃! So I cracked open VSCode and started writing.

What exactly are we building?

Well, a serverless platform! So what is that? Let's ask Wikipedia:

Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers.

It further states:

Serverless computing does not hold resources in volatile memory; computing is rather done in short bursts with the results persisted to storage. When an app is not in use, there are no computing resources allocated to the app. Pricing is based on the actual amount of resources consumed by an application.

Ok, so it's a platform that abstracts the developer completely away from the machine. It also does computation in short bursts (request/response processing) and costs the client nothing while it is not executing.

From this we can distill a few basic requirements:

  • You need to be able to deploy the serverless code
  • You need to be able to execute that code to do some processing
  • You need to pay for consumption and nothing else

How are we going to build it?

We are going to use one of my favorite tech stacks: C# + Orleans + gRPC. Actually, F# is my favorite language, but that would be harder for you folks to follow along with, so back to C#, my 3rd favorite language 😉. The serverless code will be in Javascript because it is the most popular language on earth and it is well supported. As for Orleans, I think it will fit the workload very well and make prototyping easier. And gRPC is a no-brainer for me: it gives me compile-time safety for my client and server code, so it's my default choice.

We are going to build this app on the fly and not do a lot of up-front design. This was thought of and built on a whim, so I didn't want to try to make something perfect. This was a personal hackathon. If you stop by my repo, you can see it evolve commit by commit. Let's jump in!

Building out the Control Plane

Our system will consist of a control plane and a data plane. The control plane is where routes are registered and other administrative tasks are performed. The data plane is where requests flow in the front door and are ultimately served by the serverless code. The control plane will need a proper interface that we can talk to programmatically, so I reach for my favorite tool for APIs: gRPC. gRPC lets me define a service in a special definition language and then generate both server and client code.

The control plane basically looks like this:

service ControlPlane {
  rpc QueryUsage (QueryConsumptionRequest) 
    returns (QueryConsumptionResponse) {}

  rpc RegisterHandler (RegisterHandlerRequest) 
    returns (RegisterHandlerResponse) {}
}

So there are two methods, QueryUsage and RegisterHandler. This is the bare minimum for us to compute billing and upload code. These calls will be handled by an asp.net gRPC controller that forwards the details onward to Orleans. We are hosting the gRPC controllers in the same process as Orleans because it is faster and simpler. The controllers are not very complex; here is the code for `RegisterHandler` with the error handling removed for brevity:

var normalizedHandlerName = 
  helpers.GetNormalizedHandlerName(
    request.HandlerId
  );

// Look up the grain keyed by the normalized handler id.
var grain = clusterClient.GetGrain<IJavascriptGrain>(
  normalizedHandlerName
);

// Hand the script body to the grain to parse and store.
await grain.Import(request.HandlerJsBody);

return new RegisterHandlerResponse {
    Success = true
}; 

That's about it. We convert the handler id to lowercase, look up the Orleans grain by that id, call Import on the grain, and return a result. This is missing auth, but hang with me.

You may be curious about the Import method, so here it is:

private Script? Script { get; set; }

public async Task Import(string code)
{
    var script = ParseScriptOrThrow(code);

    Script = script; // cache the parsed script in memory for this activation
    state.State ??= new JavascriptGrainState(); // first import: create the state object
    state.State.Code = code;

    await state.WriteStateAsync(); // save to storage
}

The method parses the code to ensure the syntax is valid, because why let someone push broken code 😐. After that, it stores the parsed script in a volatile property on the class (which is just fine) and then updates the persistent state with the raw code string and saves it. (We'll see why later...🙂)

Actually, let's see why now 😃.

The Why and the How of Orleans

You see, a Grain in Orleans is like an instance of an object. In fact, it literally is an object instance in .NET. It has methods and fields and a constructor. We can store things in the fields of this object, like the Script property we saw above, and they will disappear when the object is collected. However, we can mark some of those fields and properties as persistent, and then Orleans takes over control of the data.

[Image: the lifecycle of a Grain in Orleans]

A Grain in Orleans is never created and it is never destroyed. That is why we call it the Virtual Actor pattern. You call up a grain by providing a unique key, much like a primary key in a database. If the grain has state associated with it, Orleans transparently loads that data into memory when it routes a request to that Grain. After that, any method calls to the Grain can use that state as normal. If the Grain instance in memory hasn't had a method call sent to it for a while, Orleans will deactivate the grain and save its state back to persistent storage (such as S3 or a database).

Does this sound a little familiar?

Yep, most serverless platforms work the same way. If your Azure Function or AWS Lambda isn't used for a while, it goes to sleep, and the next time it is called, it has to warm up. This is called the cold start problem, and it can be a big annoyance for some use cases (but my prototype doesn't suffer much from this 😁). So now you see why Orleans is a good fit. We store the raw script in the database and have Orleans load it on demand when we need it. Once loaded, we parse the script once, cache the result in the Grain's Script property, and then reuse the cached Script to execute requests again and again.

The alternative?

We, of course, could have done all this ourselves and used normal asp.net. But then we would need to:

  1. Manually integrate with some persistent storage technology to store the scripts and the consumption data.
  2. Design a caching mechanism to store the scripts in memory so we aren't clobbering the persistent store.
  3. Build out a cache eviction policy so that we aren't holding every script ever in memory.
  4. Figure out how and when to write consumption data back to the database.
  5. Finally, once we are done with all of that, worry about the hard problem of scaling this to run on multiple servers in a coherent way without a lot of coordination traffic between the server instances, which is not a trivial problem.

But we don't have to, because Orleans exists and it does all of this for us and more. ✨

This blog post is getting a little long, so let's take a break here. Next time, we will see how the data plane is implemented and also how to measure consumption for billing purposes. Until then!

An Introduction to Testing the Impossible

I just finished my second reading of “There Is No Antimemetics Division” by qntm. It is a short but wild novel that centers on creatures that the human mind cannot perceive, and a group of intrepid protagonists who must use deduction and raw intelligence to notice the gaps in their reality where they have encountered these creatures. The book explores a very thought-provoking concept: how do you combat an adversary that you cannot see or even fully comprehend?

This bears a passing resemblance to the topic of today’s post: how do you find bugs when your software is so complex that you cannot tell if its output is correct? We see lots of documentation and blog posts about how to write tests for trivial functions and toy programs. But the process of writing good tests for software with complex behaviors or outputs is often left as an exercise for the reader. In this post I’m going to cover an advanced approach that helps solve that problem.

A Motivating Example

Let's pretend for a moment that you have just completed a particularly challenging and intricate event scheduling system. It is a critical, high-performance component that was hard to write and full of tricky business requirements. The component reads a bunch of schedule entries (some for specific dates and times, others that repeat, some for only certain objects or groups of objects, some with superseding priority) and turns them all into a ‘compiled’ `ISchedule` object that can be queried.

public interface ISchedule
{
    public ScheduleEvent? GetCurrentEvent(DateTime pointInTime, Entity entity);
}

You draw out a small example schedule to test. It seems ok, but after some thought, you worry about your implementation accidentally applying an event every day, or every third Wednesday. So, you write many more invariants and conditions using small test schedules with a couple of events each.

However, the schedules in production will involve thousands of overlapping events for hundreds of entities. It seems intractable to test these large and complex schedules authoritatively. After a schedule gets to a certain size, it becomes difficult for humans to reason about what is a correct answer to a query. We can’t really write a mock to validate our answers, because the mock would need to be almost as complex as the SUT. How could we improve our confidence in the behavior of the SUT with high-complexity inputs?

Metamorphic Testing

I puzzled over this concept on and off for about 18 months before I stumbled on a research paper that finally gave me the vocabulary I needed to dig into the literature. The problem we are describing is called the Test Oracle Problem. A slightly more formal definition is given in the paper “The Oracle Problem in Software Testing: A Survey”:

"Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the 'test oracle problem'"

It turns out that several disciplines of software development suffer badly from this problem. Machine learning systems, compilers and search engines create outputs that are so complex that it is extremely difficult, if not impossible, to determine if they are fully correct. However, a technique called Metamorphic Testing can be used to reason about outputs that would otherwise be too large for humans to evaluate. It requires us to generalize our approach and move up to a higher level of reasoning.

The basic idea goes like this: you take some input value and alter it in a controlled way, creating a new, related input. You then rack your brain and come up with a ‘hypothesis’ that your change should cause some predictable and observable effect on the output. You then randomly generate thousands of inputs, alter them, and run the input pairs through the SUT. Afterwards you inspect the differences between the outputs to see if your hypothesis held. Concretely, the hypothesis is a rule that states, “given any two inputs that are related in a specific way, the outputs should be related in another specific way.” This hypothesis is formally called a “Metamorphic Relation”.

A basic example might look like this: you start with a list. You then copy it and add a new value to the copy. You then declare that if I have added a new item to a list, the length of the new list should always be greater than that of the original. This might not seem profound, but it has a key advantage over normal invariants and postconditions; you do not need to know anything about the original list. The starting list might have been empty, or it might have had 100,000 items already. You only need to know about the change itself; in this case, adding a new item.
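
(Here is a minimal sketch of that relation in C#, using plain Random and a loop rather than a property-based testing library; all names are illustrative.)

var rng = new Random();
for (int trial = 0; trial < 10_000; trial++)
{
    // Source input: a random list of random length.
    var original = Enumerable.Range(0, rng.Next(0, 1000))
                             .Select(_ => rng.Next())
                             .ToList();

    // Follow-up input: a copy of the source plus one new item.
    var altered = new List<int>(original) { rng.Next() };

    // Metamorphic relation: adding an item must grow the list,
    // no matter what the original list contained.
    if (altered.Count <= original.Count)
        throw new Exception($"Relation violated on trial {trial}");
}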

Attacking the problem

If you think back to our example of the compiled schedule, we no longer need to know anything about the schedule itself, the items in it, or its validity. We could randomly generate an extremely large schedule involving thousands of rules and then use this technique to hunt for bugs despite the overwhelming complexity of the SUT's output. So how might we attack the schedule problem in practical terms? Let's take another look at the definition of the SUT output.

public interface ISchedule
{
    public ScheduleEvent? GetCurrentEvent(DateTime pointInTime, Entity entity);
}

We have a function called GetCurrentEvent. It accepts two inputs and produces a nullable output. This seems very tough to validate in general, and seemingly impossible to compare across two versions. But it turns out that we can transform this function into a concrete data structure that is easier to manipulate. Buckle up, we are going to go a little deep on the math for a moment.

We first need to recognize that any pure function can be thought of as a set of input/output pairs. This might seem overly abstract, but trust me, sets are good because they are quite easy to work with. So, now we need to create a set that is a decent representation of our function. We can do this simply by “sampling” the function. That is to say, we take a bunch of input values, run them through the function to get outputs, then take the input/output pairs and insert them into the set. There are several ways we can sample the function: randomly, at regular intervals, or by some other heuristic.

It should be noted, however, that sampling too loosely will cause the map to be a poor approximation of the underlying function and may not pick up key detail. And of course, sampling too much will be slow and inefficient. For my testing, I chose to take a sample for each schedule entity every 15 minutes from 2022-01-01 through 2022-12-31. That works out to about 35,000 data points for each entity in the schedule. That might seem like a lot, but it only costs a few KB of memory, and we should take advantage of having amazingly fast computers. Anyway, this produces a large mapping that has “good” resolution into the underlying schedule.
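
(A sketch of what that sampling step could look like, reusing the ISchedule interface from above; the dictionary shape and the helper name are my own.)

// Sample the schedule every 15 minutes across 2022 for every entity,
// turning the GetCurrentEvent function into a concrete set of
// (input, output) pairs that we can compare with set operations.
static Dictionary<(DateTime Time, Entity Entity), ScheduleEvent?> SampleSchedule(
    ISchedule schedule, IEnumerable<Entity> entities)
{
    var samples = new Dictionary<(DateTime, Entity), ScheduleEvent?>();
    var start = new DateTime(2022, 1, 1);
    var end = new DateTime(2023, 1, 1);

    foreach (var entity in entities)
        for (var t = start; t < end; t = t.AddMinutes(15))
            samples[(t, entity)] = schedule.GetCurrentEvent(t, entity);

    return samples; // roughly 35,000 samples per entity
}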

Now that we have a convenient (and powerful) way of contrasting outputs (by comparing the set representations of sample maps), we can begin to speculate about what a useful hypothesis might be. How about this: “If I add a new schedule rule that does not match any valid Entity in the schedule, the schedule sample map should not change.” We then use our tooling to generate thousands of random schedule rules (or groups of rules) that fit this description and test it! This might sound simple, but it provides a way for us to test some properties of our SUT. Another rule might be “If I add a schedule rule that matches a valid entity, the schedule map should change.” This might sound good at first, but if we think carefully, the statement isn’t true. If we happen to add a schedule rule that exactly matches a rule already in the schedule, then the schedule would not change. Or more importantly, it should not change! We can turn this into a Metamorphic Relation like, “If I add a schedule rule that is already in the schedule, the schedule map should not change.”

But we can go deeper than this. Because we have translated our function into a convenient set representation, we can use off-the-shelf set operations to transform our outputs. What about this relation: “If I add a schedule rule that affects entity X, then the output should only differ by samples that relate to X.” We can determine this by using set subtraction, which will leave us with a set containing only the differences between the two sets.
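
(Sketched with the same assumed helpers as above; CompileSchedule, the rule collection and ruleForEntityX are placeholders for whatever your system provides.)

// Metamorphic relation: a new rule targeting only entityX may only
// change samples whose key involves entityX.
var before = SampleSchedule(CompileSchedule(rules), entities);
var after  = SampleSchedule(CompileSchedule(rules.Append(ruleForEntityX)), entities);

// "Set subtraction": keep only the samples that changed.
var changed = after.Where(kv =>
    !before.TryGetValue(kv.Key, out var old) || !Equals(old, kv.Value));

if (changed.Any(kv => !kv.Key.Entity.Equals(entityX)))
    throw new Exception("A rule for entity X changed some other entity's schedule!");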

I bet you can think of a few more relations that exist between outputs. Each one of these can help us cover more of the possible input space, and they can do so while operating on massively complex inputs that we can't evaluate by hand. The key takeaway is that we can combine this with other, more traditional types of testing to cover more of the universe of possible defects. It isn’t a silver bullet, but it does help us shine light on dark corners that might have hidden bugs.

I’ll leave you with a favorite quote of mine about testing.

"…But unit tests are not enough. Type systems are not enough. Contracts are not enough, formal specs are not enough, code review isn’t enough, nothing is enough. We have to use everything we have to even hope of writing correct code, because there’s only one way a program is right and infinite ways a program can be wrong, and we can’t assume that any tool we use will prevent more than a narrow slice of all those wrong ways."

- Hillel Wayne