I built a serverless platform in 1 day (and so can you!) Part 2

So, last time we covered the control plane of this platform and some of the nifty things that Microsoft Orleans provides for us that make it a great choice for this kind of platform. Today, we will cover the data plane, which includes routing and execution. But first, a quick refresher.

What are we building again?

A basic serverless platform suitable for internal/private deployments. The goal is to provide an easy way to bolt services together, much like AWS Lambda or Azure Functions. These services can be great time savers because they make it very quick and easy to get some code running in the cloud without managing VMs or dealing with Kubernetes, whose deployment complexity can dominate the effort when you just need a few lines of code running.

That already exists, so why are we building it?

Because we can πŸ˜ƒ! It is a great exercise to reinvent the wheel a bit sometimes to learn more about how the wheel actually works. And sometimes (more often than you think), your simpler wheel can outperform the big SaaS wheels that are out there. In this case, I'm trying to demonstrate that a lot of what Azure and AWS sell you is actually not very complex, and for certain use cases, you can save a lot of money by building your own miniature version of it. Now back to our regularly scheduled programming πŸ™‚.

Examining the Data Plane implementation

So, as we saw before, the control plane is a set of administrative APIs that "control" the behavior of the system. Specifically, they control how HTTP requests are handled by the system. The data plane handles and routes the HTTP requests coming into the system. Now asp.net already has a lot of fancy routing tech built in for us to use. But we are going to throw all that away and leverage Orleans some more to handle our routing. We do this because asp.net routing is mostly intended to be static upon application startup, and we explicitly do not want to have to restart our application just to add new routes and handlers.

So the front door of our system is very simple:

[Screenshot: The top level]
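For readability, here is a rough sketch of what that front door looks like; the DataPlaneRouterService name comes from the article, but the route pattern and method names are my assumptions rather than the verbatim code:

```csharp
// A hedged sketch of the front door; route pattern and method names are assumed.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<DataPlaneRouterService>();

var app = builder.Build();

// A single catch-all mapping: every GET below /app is handed to the router.
app.MapGet("/app/{**path}", (string path, DataPlaneRouterService router) =>
    router.RouteAsync(path));

app.Run();
```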

As a side note, I think I'm going to switch to using Carbon.now.sh for my code formatting from now on; the formatting here on LinkedIn is just terrible 😐.

This is using the very nice minimal APIs from asp.net and, as we can see, it's a single mapping for all GET requests to anything below /app in the HTTP path. The real work is being done by this mysterious DataPlaneRouterService that is being injected into our handler. So let's take a look at that:

[Screenshot: The real routing logic for our app]

Again, not much really going on in here, but let's walk through it. We first do some checking to ensure the "organization name", which serves as the path prefix, is clean. From there we normalize it to all lowercase, look up the Orleans Grain that has that identifier, and call its Execute method.
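As a rough illustration of that flow (the grain interface name, validation rule, and return shape here are my assumptions, not the project's actual code):

```csharp
public sealed class DataPlaneRouterService(IClusterClient clusterClient)
{
    public async Task<IResult> RouteAsync(string path)
    {
        // The first path segment is the organization name / route prefix.
        var orgName = path.Split('/')[0];
        if (orgName.Length == 0 || !orgName.All(char.IsLetterOrDigit))
            return Results.BadRequest("invalid organization name");

        // Normalize to lowercase, summon the grain with that identity,
        // and let its Execute method do the real work.
        var grain = clusterClient.GetGrain<IRouteHandlerGrain>(orgName.ToLowerInvariant());
        var body = await grain.Execute(path);
        return Results.Text(body);
    }
}
```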

How Grains work in Orleans

I'm going to step back for a moment and remind everyone about how Grains work in Orleans again. As we stated before, Orleans is Microsoft's implementation of the Distributed Virtual Actor pattern. In this case, we are going to focus on the "Virtual" part.

When we talk about Virtual Memory, we are talking about a magical system that lets each process running on a computer believe that it has access to an essentially unlimited amount of memory. The operating system does a lot of bait-and-switch magic behind the scenes to preserve this illusion. If your program hasn't touched a particular chunk of memory for a little while, the operating system will save it to the page file on disk and use the space in RAM for something more important. The next time you need that chunk, the operating system loads it back from the page file and makes it available on the fly. It's a miracle that computers can even work at all, with all of this nonsense going on. But they do and programming is much simpler because of it.

In Orleans, Grains are "Virtual" in the same way. When you need to talk to a specific Grain, you just call a method on the Grain as if it had been there the whole time. The Orleans runtime will magically bring it into existence just-in-time somewhere in your cluster without your code knowing about it. If your Grain previously had state data stored in it, that data is loaded from storage at that time. When you haven't called any methods on that exact Grain instance for a little while, Orleans will put the Grain and its data back into cold storage to be used at some point in the future.

The net effect is that in the above code we can summon a grain for any possible path that has been sent to the HTTP API and allow the Grain's internal logic to handle it. The Grain is coded to look in its internal state, check whether a handler has been registered, and then decide how to proceed. This is fundamentally how we determine how to handle each possible route in our application even if they don't "exist".

By the way, if you're wondering how many Grains Orleans can hold active in memory at once, the answer is millions and the number of Grains that can live in cold storage is infinite (assuming you can pay for the storage).

The Route Handler Grain

The data plane for each RouteHandlerGrain consists of 1 method: Execute. That method looks like this:

[Screenshot: Route handler Execute method]

We first check to see if this Grain instance has any State loaded already. If it doesn't, we know that no one has set up a handler for this Grain and therefore we treat it as a 404.

As we discussed above, once a Grain has been loaded/activated, it exists as a normal in-memory C# object. This includes the normal ability to use fields and properties to store data. Leveraging this ability, we check if we have already cached a compiled version of the handler script in a nullable property. If there is a cached version, we invoke it; if not, we load the script on demand, compile and cache it, then invoke it.

We do it this way because compiled/optimized script formats cannot always be serialized to and from cold storage like the other Grain state data, so we recompile the script on demand after the Grain has been activated. The next time a request comes in for this route, we can use the volatile cached copy to keep things running fast.
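The pattern described above can be sketched like this (member and exception names are my guesses, not the project's actual code):

```csharp
// Volatile, per-activation cache: lives only while the Grain is in memory.
private Script? compiledScript;

public async Task<string> Execute(string path)
{
    // No persisted state at all means no handler was ever registered: 404.
    if (state.State?.Code is null)
        throw new RouteNotFoundException(path);

    // Compile once per activation, then reuse the cached copy on every call.
    compiledScript ??= ParseScriptOrThrow(state.State.Code);

    return await Evaluate(compiledScript, path);
}
```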

The textual bodies of the scripts themselves are stored in another Grain that works much like a stored object in Git. This is done to support a history mechanism so that we can have tracking of what changes have been made to the handler script over time and rollback to them if needed.

Executing an abstract script

One of my company's executives has a saying that goes something like: "It's hard to make things simple". The first version of this code (which you can find here) was kinda ugly, but it got the job done. As I matured the development of this system, I spent a fair amount of effort cleaning up the design so that the code was cleaner to read and would be easier to extend. As a result, the Evaluate method now looks like this:

[Screenshot: The core Evaluate method]

Again it isn't too complicated to read. We create a context object that encapsulates the request path and a future extension point for data. This context data will be injected into the script's runtime space. We then do some bookkeeping to measure how many resources the script will take to execute and then fire the script. After the script completes we measure how large the output is so we can track it for our consumption based billing, and then update the consumption data for the Organization. That's it.

Executing the actual Javascript

So this does skip over the most interesting part: how do we actually execute the code?! Well, we use a very handy .NET library called Jint that provides a Javascript interpreter. There are pros and cons to using an interpreter vs a compiler, but Jint is well engineered and executes small scripts very quickly, which is what we want. We inject the arguments and a mini utility library API into the script's runtime and then invoke it. The plumbing to do this is organized in a couple of classes, but it boils down to this class that does the heavy lifting.
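To give a feel for the Jint pattern, here is a minimal standalone illustration (not the project's actual plumbing; the injected names and limits are my assumptions):

```csharp
using Jint;

// Create an engine with guardrails so a runaway script can't hog the host.
var engine = new Engine(options => options
    .TimeoutInterval(TimeSpan.FromSeconds(1))
    .LimitMemory(4_000_000));

// Inject the arguments and a tiny utility API into the script's world.
engine.SetValue("context", new { path = "/app/demo/orders" });
engine.SetValue("log", new Action<string>(Console.WriteLine));

// Interpret the handler body and marshal the result back into .NET.
var result = engine.Evaluate("'Handled ' + context.path").AsString();
```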

Of note is the fact that our scripts don't have any of the normal Javascript runtime APIs that you would get from a browser like Chrome or a server like Node or Deno. We have to create each of them manually. Currently, the only thing I have implemented is a very basic Fetch API so that our scripts can call other upstream services if they need to. You can see that here:

[Screenshot: The Fetch API for our Javascript runtime]

As needed, other runtime components can be implemented to make life easier for the script itself, but this is enough already to do useful things. I really want to attach a JSON based key/value store API as a next step.

You'll note there is some code in here to measure the amount of time that we spend waiting on the HTTP GET call. This is to support a future enhancement where we do not bill the client for time that we spend waiting on some other service to execute. This is similar to how Cloudflare does its billing and I consider it to be more fair. Theoretically, while the script is blocked on an upstream service, we can do other work with the thread, so it really wouldn't cost us much. But this isn't fully implemented yet.
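The idea can be sketched with two stopwatches (illustrative only; the URL and variable names are placeholders, not the real measurement code):

```csharp
using System.Diagnostics;

var total = Stopwatch.StartNew();

// ...script runs until it reaches an upstream call...

var upstream = Stopwatch.StartNew();
using var http = new HttpClient();
var response = await http.GetStringAsync("https://example.com/api");
upstream.Stop();

// ...script finishes...

total.Stop();

// Bill only for time we actually spent executing,
// not time spent blocked on someone else's service.
var billable = total.Elapsed - upstream.Elapsed;
```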

So, how well does this thing work?

"...32,500 requests per second is pretty good for a single machine..."

Pretty darn good πŸ˜ƒ! Despite the Orleans runtime doing a ton of magical work in the background, there isn't much fundamental overhead to the system. We can see that in a local test on my workstation, which looked like this:

====================================
Thread Count: 8 Requests Per Thread: 10000
Total: 80000 Errors: 0 Subjective Time: 19.166 Real Time: 2.462
Subjective Average: 4,174.01
Mean Latency: 0.240
Real Average: 32,497.69

This shows that we were able to send 80,000 requests through the data plane to a trivial handler script in about 2.5 seconds. This works out to a mean latency of only 240 microseconds per call, including all the asp.net HTTP controller and Orleans dispatching overhead.

I think 32,500 requests per second is pretty good for a single-machine test run, and I should mention this number can go much higher when the load is spread across multiple handlers that don't have to contend for a Grain's single-threaded execution. There are also other optimizations we could make to allow better scaling, such as:

  • Moving consumption updates out of the execution path
  • Allowing fan-out of individual route execution
  • Implementing response caching

At the end of the day...

As I mentioned before, the normalized cost of compute in Azure Functions is as high as 6 times the cost of bare VMs or K8s containers. Those services deliver ease of deployment, hands-free scale-out and shiny web editors. But with a little ingenuity, we don't have to pay the convenience tax to get good functionality. This service could be hooked up to a CI pipeline for script deployments the same as Lambda/Functions and deliver that same power to your organization.

But what about scaling? Well, the final cherry on this cake is that we require ZERO changes to this code to scale out from 1 machine up to 100 machines. Just add more replicas to the environment and the Orleans cluster will expand to them and load balance across them.

I'm pretty confident that this tiny serverless platform could displace a large fraction of the work that we currently do with Azure Functions, and it can save a fair amount of money because we don't have to rely on complex and expensive cloud primitives that AWS or Azure have built for us. You can run this on a bare VM or in containers and use nothing else besides S3 or Blob/Table storage, which are about as cost-effective as you can get in the cloud.

I'll part with a quote that a very wise client told me years ago:

"You pay for what you don't know"


I built a serverless platform in 1 day (and so can you!)

Ok, ok. It took me 2 days. But I was doing other things with my life 🀷. But anyway, onto the real story.

So a few days ago, I was writing some internal documentation for my job at Ziosk (did I mention we are hiring?); specifically, I was writing a primer for newcomers to Microsoft Orleans. To start with, Orleans is Microsoft's implementation of the Distributed Virtual Actor model, which makes people's eyes glaze over, so I try to build intuition in other ways. I have fallen into a pattern of introducing Orleans as a "nanoservice" platform. You see, microservices are small independent APIs that handle some specific functionality. But the more microservices we add, the higher the complexity and the greater the woes. Orleans skips a lot of that pain and lets us break out functionality as if we were programming normal C# classes and interfaces. This enables us to define smaller and more fine-grained services if our designs call for it. Hence, "nanoservices".

While I was explaining this in my document, I mused a bit that since nano is smaller than micro, you could build microservices out of smaller nanoservices. Then it clicked! Maybe I could actually do that! And what could be a more hip way to deploy a microservice than to use a serverless platform and pay 6 times the rate πŸ˜ƒ! So I cracked open VSCode and started writing.

What exactly are we building?

Well, a serverless platform! So what is that? Let's ask Wikipedia:

Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers.

It further states:

Serverless computing does not hold resources in volatile memory; computing is rather done in short bursts with the results persisted to storage. When an app is not in use, there are no computing resources allocated to the app. Pricing is based on the actual amount of resources consumed by an application.

Ok, so it's a platform that abstracts the developer completely away from the machine. It also does computation in short bursts (request/response processing) and costs the client nothing while it is not executing.

From this we can distill a few basic requirements:

  • You need to be able to deploy the serverless code
  • You need to be able to execute that code to do some processing
  • You need to pay for consumption and nothing else

How are we going to build it?

We are going to use one of my favorite tech stacks: C# + Orleans + gRPC. Actually, F# is my favorite language, but that would be harder for you folks to follow along with, so back to C#, my 3rd favorite language πŸ˜‰. The serverless code will be in Javascript because it is the most popular language on earth and it is well supported. As for Orleans, I think it will fit the workload very well and make prototyping easier. gRPC is a no-brainer for me: it gives me compile-time safety for my client and server code, so it's my default choice.

We are going to build this app on the fly and not do a lot of up-front design. This was thought of and built on a whim, so I didn't want to try to make something perfect. This was a personal hackathon. If you stop by my repo, you can see it evolve commit by commit. Let's jump in!

Building out the Control Plane

Our system will consist of a control plane and a data plane. The control plane is how routes and other administrative tasks are managed. The data plane is where requests flow in the front door and are ultimately served by the serverless code. The control plane will need a proper interface that we can talk to programmatically, so I reach for my favorite tool for APIs, which is gRPC. gRPC lets me define a server in a special definition language and then generate both server and client code.

The control plane basically looks like this:

service ControlPlane {
  rpc QueryUsage (QueryConsumptionRequest) 
    returns (QueryConsumptionResponse) {}

  rpc RegisterHandler (RegisterHandlerRequest) 
    returns (RegisterHandlerResponse) {}
}

So there are two methods, QueryUsage and RegisterHandler. This is the bare minimum for us to compute billing and upload code. These calls will be handled by an asp.net gRPC controller that will then forward the details onward to Orleans. We are hosting the gRPC controllers in the same process as Orleans because it is faster and simpler. The controllers are not very complex; here's the code for `RegisterHandler` with the error handling removed for brevity:

var normalizedHandlerName = 
  helpers.GetNormalizedHandlerName(
    request.HandlerId
  );

var grain = clusterClient.GetGrain<IJavascriptGrain>(
  normalizedHandlerName
);

await grain.Import(request.HandlerJsBody);

return new RegisterHandlerResponse {
    Success = true
}; 

That's about it. We convert the handler id to lowercase, look up the Orleans grain by that id, call Import on the grain and return a result. This is missing auth, but hang with me.

You may be curious about the Import method, so here it is:

private Script? Script { get; set; }

public async Task Import(string code)
{
    var script = ParseScriptOrThrow(code);

    Script = script;
    state.State ??= new JavascriptGrainState();
    state.State.Code = code;

    await state.WriteStateAsync();//save to storage
}

The method parses the code to ensure the syntax is valid, because why let someone push broken code 😐. After that it stores the parsed script in a volatile field in the class (which is just fine) and then updates the persistent state with the raw code string and saves it. (We'll see why later...πŸ™‚).

Actually lets see why now πŸ˜ƒ.

The Why and the How of Orleans

You see, a Grain in Orleans is like an instance of an object. In fact, it literally is an object instance in .NET. It has methods and fields and a constructor. We can store things in the fields of this object, like the Script property we saw above, and they will disappear when the object is collected. However, we can mark some of those fields and properties as persistent and then Orleans takes over control of the data.

[Diagram: The lifecycle of a Grain in Orleans]

A Grain in Orleans is never created and it is never destroyed. That is why we call it the Virtual Actor pattern. You call up a grain by providing a unique key, much like a primary key in a database. If this grain has state associated with it, Orleans transparently loads that data into memory when it routes a request to that Grain. After that, any method calls to the Grain can use that state as normal. If the Grain instance in memory hasn't had a method call sent to it for a while, Orleans will deactivate the grain and save its state back to persistent storage (such as S3 or a database).
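In code, persistent state like this is typically declared by injecting it into the Grain's constructor. A sketch using Orleans' IPersistentState facility; the state-name string is an assumption:

```csharp
public sealed class JavascriptGrain : Grain, IJavascriptGrain
{
    private readonly IPersistentState<JavascriptGrainState> state;

    public JavascriptGrain(
        [PersistentState("script")] IPersistentState<JavascriptGrainState> state)
    {
        // Orleans reads state.State from storage before activation completes,
        // and persists it again whenever we call state.WriteStateAsync().
        this.state = state;
    }
}
```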

Does this sound a little familiar?

Yep, most serverless platforms work the same way. If your Azure Function or AWS Lambda isn't used for a while, it goes to sleep, and the next time it is called, it has to warm up. This is called the cold start problem and it can be a big annoyance for some use cases (but my prototype doesn't suffer much from this 😁). So now you see why Orleans is a good fit. We store the raw script in the database and then have Orleans load it on demand when we need it. Once loaded, we parse the script once, cache the result in the Grain's Script property, and then reuse the cached Script to execute request after request.

The alternative?

We, of course, could have done all this ourselves and used normal asp.net. But then we would need to:

  1. Manually integrate with some persistent storage technology to store the scripts and the consumption data.
  2. Design a caching mechanism to store the scripts in memory so we aren't clobbering the persistent store.
  3. Build out a cache eviction policy so that we aren't holding on to every script ever in memory.
  4. Figure out how to store consumption data and when to write it back to the database.
  5. Finally, once we are done with all of that, worry about the hard problem of scaling this to run on multiple servers in a coherent way without a lot of coordination traffic between the server instances, which is not a trivial problem.

But we don't have to, because Orleans exists and it does all of this for us and more. ✨

This blog post is getting a little long, so let's take a break here. Next time, we will see how the data plane is implemented and also how to measure consumption for billing purposes. Until then!