"What is the difference between a class and a struct in C#?". I like this as a lead-in interview question because most candidates can offer up a quick answer that can help them get some confidence rolling in case they are nervous (we are not generally looking to assess candidates for performance under duress 🙂). There is a spectrum of correct answers that can shed light on how the candidate thinks about this fundamental component of the .NET type system. I'm going to lay out some of the ways that I think about structs in .NET.
Level 0: What is a struct?
Some folks have never heard of structs. A struct (also known as a Value Type) is an alternative way of defining a type in C# and other .NET languages like F# and VB.NET. They have various tradeoffs, but they have been around since the beginning of .NET. My speculation is they looked at how Java was designed to treat everything as an object and observed that the Java compiler wasn’t good enough at optimizing the usage of normal classes for them to always be appropriate from a performance perspective.
They share many similarities with classes, but they are completely different in others. So let’s jump right in!
Level 1: Structs are lightweight, classes are heavy 🏋️‍♀️
I hear this answer a lot and it does grasp the basic usage guideline of when to use structs. The traditional guidance from Microsoft was that structs were appropriate for data between 1 and 16 bytes per object. Larger than that, and you should consider using a class. Even after 20 years, this guidance is correct enough for beginners to pay attention to, so if you only remember one fact about structs and classes, it should probably be this one.
Level 2: Structs can't inherit from other types 🙁
At the next level, a candidate might recognize another defining characteristic of structs: they can’t participate in traditional inheritance like classes can[0]. Since inheritance is a pillar of most object-oriented programming languages, this would seem to be a major limitation! But it is very important to note that structs CAN implement interfaces, which can get us polymorphism, which is good enough for most purposes of object orientation.
In fact, if you take into account Default Interface Methods that were recently introduced into .NET, then you can get the code sharing advantages that used to be reserved only for classes (with the caveat that calling a default implementation on a struct boxes it, which we will get to in Level 6)! But that is kind of a bonus fact that probably belongs at a higher level 😉.
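Here is a minimal sketch of what that looks like. The `IShape` interface and the `Circle` and `Square` structs are hypothetical names made up for illustration; the point is only that a struct can sit behind an interface even though it cannot have a base type of its own.

```csharp
using System;

// Hypothetical interface, used only for illustration.
public interface IShape
{
    double Area();
}

// A struct cannot declare a base struct or class, but it can implement interfaces.
public readonly struct Circle : IShape
{
    public Circle(double radius) => Radius = radius;
    public double Radius { get; }
    public double Area() => Math.PI * Radius * Radius;
}

public readonly struct Square : IShape
{
    public Square(double side) => Side = side;
    public double Side { get; }
    public double Area() => Side * Side;
}

public static class ShapeDemo
{
    // Works with any IShape, whether the implementation is a struct or a class.
    public static double TotalArea(IShape[] shapes)
    {
        double total = 0;
        foreach (IShape shape in shapes)
        {
            total += shape.Area();
        }
        return total;
    }

    public static void Main()
    {
        IShape[] shapes = { new Square(2), new Square(3) };
        Console.WriteLine(TotalArea(shapes)); // prints 13
    }
}
```

Storing structs in an `IShape[]` like this has a hidden cost, which is the subject of Level 6.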
Level 3: Structs are "on the stack" 🤔
Ok, at this level the candidate recognizes that there are two major locations in memory that object instances live in: the “Heap” and the “Stack”. All class object instances live on the heap [1]. It is a general-purpose dumping ground for data and it can grow to near infinite sizes, which is part of the reason that we recommend designing large objects as classes instead of structs.
Another key detail about the heap is that objects that are allocated on the heap have to be eventually cleaned up by the Garbage Collector and this has negative performance implications for high throughput applications. So for small, commonly created objects it is best to define them as structs so that we can avoid the overhead of garbage collection.
Lastly, in contrast to the heap, the “stack” is very limited in size. Usually it is just 1-4 MB per thread, compared to the heap, which can be many GB or more. You might think that this is why there is a recommended 16-byte limit on the size of structs, but it isn’t (that is coming up next). 👇
Level 4: Structs become slow when they are large 😕
Now we are getting some real details that are starting to affect high performance applications. Somewhere around skill levels 3 or 4, a candidate should also know that structs are passed by value and that classes are passed by reference (strictly speaking, structs are value types and classes are reference types). This is a subtle but very important detail that basically means that when we have an instance of a class stored in a variable, what we are actually storing under the covers is a “reference” (aka a pointer) to a location in the heap. This reference is very much like a postal address in the real world: a small piece of information that tells us where to find the real object we are looking for.
Structs are completely different. When we store a struct in a variable, we are actually copying the entire value of that object instance directly into the variable. If we pass this variable into a method, we copy the entire value of the struct instance into the method we are calling. With a class, we are just passing the reference, which is always limited to 8 bytes or less.
The big idea to take away at this level is that if we were to define a very large struct, let’s say 100 bytes, then every time that we passed that struct into a new method, the computer would have to copy a lot of data around again and again, which starts to eat away at the performance of our application. This is the real reason that Microsoft has that 16-byte guideline on the size of structs [2]. Now at this level we should also point out that Microsoft violates their own 16-byte limit in several structs in the BCL, and they do this because that limit is a general guideline and not a hard limit. If we want, we can define a struct that is 100 bytes or 100 kilobytes, and our program will just start to slow down more and more if we use it a lot. But just be aware that at a certain point the stack size limitations we talked about at the end of Level 3 will become increasingly important.
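A small sketch can make the difference concrete. `PointStruct` and `PointClass` are hypothetical types invented for this example; they are identical except for the `struct` and `class` keywords.

```csharp
using System;

// Two hypothetical types that differ only in being a struct or a class.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopyDemo
{
    public static (int structX, int classX) Run()
    {
        var s1 = new PointStruct { X = 1 };
        var s2 = s1;   // copies the entire value
        s2.X = 99;     // only the copy changes

        var c1 = new PointClass { X = 1 };
        var c2 = c1;   // copies only the reference
        c2.X = 99;     // both variables point at the same object

        return (s1.X, c1.X);
    }

    public static void Main()
    {
        var (structX, classX) = Run();
        Console.WriteLine($"struct: {structX}, class: {classX}"); // prints "struct: 1, class: 99"
    }
}
```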
Level 5: Structs are “bad” because they can be "dangerous" 😞
This is a VERY important detail about structs. They can be a bug farm in your code. You see, because structs are pass-by-value they are implicitly copied every time you call a method or even assign them to a new variable name.
So let’s say that you have a struct and you pass it into a child method that does some work, modifies a field or a property on the struct, and then returns. Simple enough, but it’s a bug. When you pass the struct into the child method, the compiler implicitly makes a copy of your struct. So the struct instance that got modified in the child method is not the struct instance that exists in the parent method.
It is SUPER easy to introduce this kind of copy-then-mutate bug into your code and as a result the use of struct for storing updated information is STRONGLY discouraged by Microsoft and everyone else out there. But if you look at my code, you will see custom structs all over the place. So how do I do this and avoid these bugs? Immutability! I almost always define my structs as `readonly` which prevents them from being mutated by any code. You can’t have a copy-then-mutate bug if there is no mutation! Now we still have to take things like level 4 into consideration when choosing structs over classes, but this level’s warning simply doesn’t apply when using `readonly` structs.
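To illustrate, here is a sketch of the bug next to an immutable alternative. `MutableCounter` and `Counter` are hypothetical types; the second one is just one way to shape a `readonly struct`, where "changing" a value means returning a new one.

```csharp
using System;

// Mutable struct: prone to the copy-then-mutate bug.
public struct MutableCounter
{
    public int Count;
}

// Immutable alternative: "changes" produce a new value instead of mutating.
public readonly struct Counter
{
    public Counter(int count) => Count = count;
    public int Count { get; }
    public Counter Increment() => new Counter(Count + 1);
}

public static class MutationDemo
{
    // Receives a copy, so the caller's value is never touched.
    public static void BuggyIncrement(MutableCounter counter)
    {
        counter.Count++;
    }

    // Returns the new value, so the caller has to decide what to do with it.
    public static Counter Increment(Counter counter) => counter.Increment();

    public static void Main()
    {
        var mutable = new MutableCounter();
        BuggyIncrement(mutable);
        Console.WriteLine(mutable.Count); // prints 0: the increment was lost

        var counter = new Counter(0);
        counter = Increment(counter);
        Console.WriteLine(counter.Count); // prints 1
    }
}
```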
Level 6: Structs are slow because they have to be "boxed" 📦
Boxing is an important concept in .NET that can have big performance implications when you are using structs. Basically, it is an exception to Level 3, where a struct can accidentally end up on the heap and become subject to cleanup by the Garbage Collector, just like a class.
Here’s how it works: let’s say you have a struct that implements `IDisposable` and you pass it to a method that accepts an argument of type `IDisposable`. Because of “magic” ✨ in the .NET runtime, this works just fine, and you don’t really notice anything different than passing a class that implements `IDisposable`. But this magic is complicated under the covers. The runtime writes special low-level code inside of our method that is designed to accept any type of object that implements the `IDisposable` interface. Specifically, the runtime performs a lookup on each object to find the exact method slot to call on that specific object. That lookup is called virtual dispatch, and it involves reading a section of the object called the object header. The problem is that only objects on the heap have object headers. Structs don’t have headers. They are pure data, which is part of why structs are so lightweight.
But we just said that you CAN in fact pass a struct to a method that expects a heap object with a header, so what gives? Well, the runtime makes this work by allocating a special temporary object called a “box” on the heap, just like a class. The box has an object header and everything else needed to perform virtual dispatch. The runtime copies the data in the struct into the box and then passes the box to the method. Easy peasy.
But the solution creates another problem here. When the method is completed, the box has to be cleaned up just like a normal class object. So, every time you pass a struct to a method that expects an interface (or a parameter of type `object`), a new box gets created and then thrown away. Now imagine invoking that method thousands of times per second and you can see that it will add up quickly. On top of the pain for the garbage collector, there is the overhead of actually creating the box object and copying the data into it. It can drag application performance down.
Now the final bit to this level is that it isn’t always true. In C# you can use a technique called “generic constraints” to change how the compiler reasons about your use of types. When you do this, .NET can elide creation of the box and use the struct directly in the child method.
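One way to see the boxes for yourself is to compare references. In this sketch, `Ticket` is a hypothetical struct and `IsSameObject` is a helper made up for the example; each conversion of the struct to `IDisposable` produces its own box, so the two arguments are different heap objects.

```csharp
using System;

// Hypothetical struct that implements IDisposable, as in the scenario above.
public struct Ticket : IDisposable
{
    public int Id;
    public void Dispose() { }
}

public static class BoxingDemo
{
    // The parameters are interface-typed, so struct arguments are boxed at the call site.
    public static bool IsSameObject(IDisposable a, IDisposable b) => object.ReferenceEquals(a, b);

    public static void Main()
    {
        var ticket = new Ticket { Id = 7 };

        // Each conversion to IDisposable allocates a separate box on the heap.
        Console.WriteLine(IsSameObject(ticket, ticket)); // prints False

        // Boxing once and reusing the box gives a single heap object.
        IDisposable boxed = ticket;
        Console.WriteLine(IsSameObject(boxed, boxed)); // prints True
    }
}
```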
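As a rough sketch of the generic constraint technique, compare an interface-typed parameter with a constrained generic one. The `Ticket` struct and both helper methods are hypothetical, and the struct is deliberately mutable here (against the advice in Level 5) only so that we can observe which instance `Dispose` actually ran on.

```csharp
using System;

// Hypothetical struct; mutable only so the effect of each call is observable.
public struct Ticket : IDisposable
{
    public int DisposeCount;
    public void Dispose() => DisposeCount++;
}

public static class ConstraintDemo
{
    // Interface-typed parameter: a struct argument is boxed, and Dispose runs on the box.
    public static void DisposeBoxed(IDisposable item) => item.Dispose();

    // Constrained generic parameter: the runtime specializes this method for each
    // struct type, so Dispose is called on the struct itself with no box.
    public static void DisposeGeneric<T>(ref T item) where T : IDisposable => item.Dispose();

    public static void Main()
    {
        var ticket = new Ticket();

        DisposeBoxed(ticket);
        Console.WriteLine(ticket.DisposeCount); // prints 0: the box was mutated, then discarded

        DisposeGeneric(ref ticket);
        Console.WriteLine(ticket.DisposeCount); // prints 1: the original struct was used directly
    }
}
```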
Level 7: Structs are actually really fast 🤩
Up until this point, a candidate is displaying some decent depth of understanding of the limitations of structs and why they are not always a good choice. But it has all been drawbacks and warnings about using structs. At Level 7, they are getting into why structs are actually amazing. We have talked quite a bit about how method calls work under the covers in .NET. Now we need to learn that not all method calls are created equal.
We talked a little bit about the concept of dispatch in Level 6. This is a fundamental concept of compiler theory that every programming language needs to implement. In .NET there are actually many different types of method dispatch.
Generally, when you invoke a virtual or interface method on a class, you get a virtual dispatch like we talked about in Level 6. If you pass a class object of type `MyClass`, the runtime has to account for the fact that the actual type of the object could be `MyClass` or some other child or grandchild class that inherits from `MyClass`. So, every time we invoke the method, the runtime needs to peek under the covers to see the exact type of the actual object so that it can know which method slot to use.
But when you invoke a method on a struct (excluding boxed structs) you get a simpler, faster type of dispatch called “direct dispatch”. Direct dispatch is faster because the runtime can determine the exact method slot to invoke ahead of time, before the method is called. This is because structs don’t support inheritance, so if we pass a struct of type `MyStruct` to a method, the compiler knows ahead of time that we only need to support objects of the exact type `MyStruct` and selects the correct method slot without checking.
Now the difference between virtual and direct dispatch is just a few nanoseconds, but as we have said before, if you are calling a method thousands or millions of times per second, you will start to notice a difference.
Lastly, it is important to note that there are exceptions to these guidelines in both directions. Boxed structs and sealed classes are two examples of when structs and classes can use different forms of dispatch than normal. But using structs to gain access to direct dispatch is a reliable way to have the power of abstraction without the performance penalty.
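A common shape for this idea is a generic method whose "strategy" is a struct. The sketch below assumes a hypothetical `IBinaryOp` interface with two hypothetical struct implementations; it shows the pattern only and makes no measured performance claim.

```csharp
using System;

// Hypothetical operation interface, used only for illustration.
public interface IBinaryOp
{
    int Apply(int left, int right);
}

public readonly struct AddOp : IBinaryOp
{
    public int Apply(int left, int right) => left + right;
}

public readonly struct MaxOp : IBinaryOp
{
    public int Apply(int left, int right) => Math.Max(left, right);
}

public static class DispatchDemo
{
    // Because TOp is constrained to struct, the JIT compiles a separate copy of this
    // method per TOp, and each copy knows the exact Apply target ahead of time.
    public static int Fold<TOp>(int[] values, int seed, TOp op) where TOp : struct, IBinaryOp
    {
        int result = seed;
        foreach (int value in values)
        {
            result = op.Apply(result, value);
        }
        return result;
    }

    public static void Main()
    {
        int[] values = { 3, 9, 4 };
        Console.WriteLine(Fold(values, 0, new AddOp()));            // prints 16
        Console.WriteLine(Fold(values, int.MinValue, new MaxOp())); // prints 9
    }
}
```

The caller still writes against an abstraction (`IBinaryOp`), but no box is created and no per-call type lookup is needed.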
Level 8: Structs are all powerful 🧘‍♂️
At this level, we attain enlightenment. We no longer fear the use of large structs because we know that we can pass those structs around by reference using `in` and `ref` and avoid copying their data at all. And because of that, we recognize there are now times that we can ignore the copying cost from Level 4 to obtain optimal performance.
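As a minimal sketch, here is what passing by reference looks like. `BigValue` and `Accumulator` are hypothetical types; `BigValue` is deliberately larger than the 16-byte guideline.

```csharp
using System;

// A deliberately large hypothetical struct: 8 doubles = 64 bytes.
public readonly struct BigValue
{
    public readonly double A, B, C, D, E, F, G, H;
    public BigValue(double fill) { A = B = C = D = E = F = G = H = fill; }
    public double Sum() => A + B + C + D + E + F + G + H;
}

// A small mutable struct used to show `ref`.
public struct Accumulator
{
    public double Total;
}

public static class ByRefDemo
{
    // `in` passes a read-only reference, so the 64 bytes are not copied.
    public static double SumOf(in BigValue value) => value.Sum();

    // `ref` passes a writable reference, so the caller's struct is updated in place.
    public static void Add(ref Accumulator accumulator, double amount) => accumulator.Total += amount;

    public static void Main()
    {
        var big = new BigValue(1.5);
        Console.WriteLine(SumOf(in big)); // prints 12

        var accumulator = new Accumulator();
        Add(ref accumulator, 2.5);
        Add(ref accumulator, 2.5);
        Console.WriteLine(accumulator.Total); // prints 5
    }
}
```

Declaring `BigValue` as `readonly` matters here: it lets the compiler call `Sum()` through the `in` reference without making a defensive copy.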
We have learned about `ref struct` types and their ability to store managed references, which unlocks the ability to reference exotic forms of memory such as stack-allocated and unmanaged memory. We can control the exact way our data is represented in memory using `StructLayout` and `InlineArray`, and we can slice memory buffers into shapes that look like packet headers and data structures because we see the 1s and 0s underneath it all. We are pooling our buffers, using zero-copy I/O, using `SkipLocalsInit`, swapping out memory allocators and have become aware of memory alignment and SIMD operations. We have entered the realm of the fastest systems languages on earth.
And at the end of it all, we have realized the most important thing about this power: that we don’t really need it. Classes work for 99.5% of all the code that we write on a daily basis. The .NET Garbage Collector is pretty damn good at managing memory. The walled garden of .NET memory management saves us a lot of pain that we never even had to learn about, such as memory fragmentation. The simple rule we learned in Level 1 ultimately stays intact. If you look at my code, you will see a lot of `readonly struct` and `readonly record struct` mentioned, but they are almost all very small and expected to be created and thrown away quickly. If I need a type to implement an interface, I just use a class 99% of the time rather than the struct + generic constraint technique from Level 6. It’s just easier.
If I were writing database engines or high throughput streaming data processors, then I would make very different choices. But in a world where my application is spending most of its time waiting on some external database or an obnoxiously slow 3rd party system, there is just no point in trying to optimize something from 1-2 milliseconds down to 1-2 microseconds. Nobody will notice it. So when in doubt, just take the easy way and use a class 😁.
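For a small taste of this, the sketch below overlays a struct on a stack-allocated buffer. The `PacketHeader` layout is entirely hypothetical (it is not a real protocol), and the example uses the machine's native byte order on both the write and read side, so a real wire format would need explicit endianness handling.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 8-byte header; Pack = 1 removes any padding between fields.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct PacketHeader
{
    public readonly ushort Version;
    public readonly ushort Flags;
    public readonly uint Length;

    public PacketHeader(ushort version, ushort flags, uint length)
    {
        Version = version;
        Flags = flags;
        Length = length;
    }
}

public static class LayoutDemo
{
    // Reinterprets the first 8 bytes of the buffer as a PacketHeader (native byte order).
    public static PacketHeader ReadHeader(ReadOnlySpan<byte> buffer) =>
        MemoryMarshal.Read<PacketHeader>(buffer);

    public static void Main()
    {
        // Stack-allocated scratch buffer: no heap allocation, nothing for the GC to clean up.
        Span<byte> buffer = stackalloc byte[8];
        BitConverter.TryWriteBytes(buffer.Slice(0, 2), (ushort)1);
        BitConverter.TryWriteBytes(buffer.Slice(2, 2), (ushort)0x8000);
        BitConverter.TryWriteBytes(buffer.Slice(4, 4), 1500u);

        PacketHeader header = ReadHeader(buffer);
        Console.WriteLine($"{header.Version} {header.Flags} {header.Length}"); // prints "1 32768 1500"
    }
}
```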
Footnotes:
[0] If you’re a smarty pants, you realize that all structs actually inherit from System.ValueType which itself inherits from System.Object. But aside from this special case, we cannot define a struct that inherits from another struct or from a class.
[1] In recent versions of .NET the runtime is now smart enough to stack allocate class instances in some limited cases, which bypasses the Garbage Collector, so this rule is no longer exactly correct!
[2] There is another reason, which has to do with the compiler’s historical ability to “enregister” struct values. This is a very low level detail of something called a “Calling Convention” that .NET uses to tell the CPU exactly how to perform a method call. But don’t worry about that too much 🙂.