Null handling

After two big chapters explaining how Rust's features work, we will now return to solving common programming problems. One of these is handling absent values.

Many languages support absent values; some, like JavaScript, even go the extra mile and support two different kinds of them, null and undefined. Having that feature is nice, but handling absent values is often pretty horrible and leads to errors extremely fast. If you are using programs written in Java or C#, chances are that you have already seen a bunch of NullPointerExceptions or NullReferenceExceptions. Opening the browser console while a website loads is also notorious for revealing a bunch of errors about non-existent values. So how can we improve on that?

Conventionally, we would have to null-check every value that can be null before using it. That isn't viable in languages where everything can be null, so we mostly just assume that nobody is going to throw absent values our way. This works most of the time, but once in a while, either some function returns an absent value, or we want to put absent values into functions ourselves, to convey some extra intention. The biggest problem here is that in the vast majority of languages, the possibility of an absent value is implicit and doesn't appear anywhere in a function signature, or at all, really. Thus, there is no guarantee in the interface that those values are handled properly.

Languages like Kotlin are the exception to this rule. In Kotlin, if a value can be absent, it is noted as a nullable type in the function signature. However, there's still the issue of handling absent values when they are encountered. We will go through a few approaches found in common languages.

Lower level languages like C don't have an inherent concept of absent values, because they work on flat memory, which can't be absent. They express absent values through pointers to a special null address. You can check whether a pointer points to that address, but the check isn't mandatory. When you try to use the value behind such a null pointer, the program typically just straight up crashes. This is absolutely horrible for program usefulness and reliability, but at least you usually don't end up working on arbitrary memory. Except if you forget to set your pointer to that special address, then you do access arbitrary memory and only the machine gods know what will happen. Raw pointers in Rust actually function in the same way, so there is a reason why dereferencing them requires unsafe code and normal programs should not have to touch them.
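
To make the comparison with C concrete, here is a minimal sketch of a null pointer in Rust, using the raw pointer type `*const T`. The helper `read_or_none` is a hypothetical name invented for this example; `std::ptr::null` and `is_null` are standard library items. Note that nothing forces us to write the null-check; it is there only because we chose to add it.

```rust
use std::ptr;

/// Hypothetical helper: reads the value behind a raw pointer,
/// or returns `None` if the pointer is null.
///
/// # Safety
/// A non-null `p` must point to a valid, initialized `i32`.
unsafe fn read_or_none(p: *const i32) -> Option<i32> {
    if p.is_null() {
        // This check is voluntary; the compiler does not demand it.
        None
    } else {
        // Dereferencing a raw pointer is only allowed in unsafe code.
        Some(unsafe { *p })
    }
}

fn main() {
    let number = 42;
    let valid: *const i32 = &number;
    let absent: *const i32 = ptr::null();

    // The caller has to vouch for the pointers, hence the unsafe blocks.
    println!("{:?}", unsafe { read_or_none(valid) });
    println!("{:?}", unsafe { read_or_none(absent) });
}
```

Leaving out the `is_null` check would still compile, and dereferencing the null pointer would then be undefined behavior, just like in C.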

Then there are languages like JavaScript, which work on a bunch of tasks called through events. Again, null-checks exist, but they are not mandatory. When you try to use an absent value, an error is thrown, and if nothing catches it, the task stumbling upon it is abandoned. When you have a real-time application that suddenly just stops, this is why. If you have a button that doesn't do anything even though there's code for it, this is why. Again, absolutely horrible for usefulness and reliability, but compared to lower level languages, memory safety is always ensured, so nothing arbitrary happens.

Going further, we approach languages like C# and Java, which actually have some form of handling for it. Again, null-checks are not mandatory. If you try to use an absent value, an exception is thrown. You can catch that exception and handle the error, but catching the exception isn't mandatory, either. If you don't catch the exception, the thread crashes. The ability to catch those errors already makes this approach better, but it just adds another safety net. If you forget to set that up, it's just the same as the other approaches in terms of usefulness and reliability.

Now we get to Rust, which doesn't have an inherent concept of absent values since it's a lower level language. Instead, it uses its powerful generics to provide an enum for the notion of absent values in the standard library.

pub enum Option<T> {
  None,
  Some(T),
}

As you might have read in the chapter on enums, you can only use the data inside an enum variant if you check for that variant. Thus, null-checking is mandatory in Rust. Because this enum is in the standard library, every standard library function that may or may not return a value uses it (except if it also returns something in the error case, but more on that later), so we have a wealth of examples to choose from. Let's just see what happens when we try to pop a value off a vector:

let mut vec = vec![1, 2, 3];

if let Some(value) = vec.pop() {
  // Use the popped value
} else {
  // Tried to pop off an empty vector!
}
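
The same idea carries over to our own functions, which addresses the earlier complaint about absent values being invisible in interfaces. The following is only a sketch under assumed names: `find_name` and `greeting` are hypothetical functions made up for illustration. The point is that the `Option` in the signature tells every caller that the value may be absent, and the `match` has to cover both variants before the name can be used.

```rust
/// Hypothetical lookup: returns the name for an id, or `None` if the id is unknown.
/// The possible absence is visible right in the return type.
fn find_name(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("Alice"),
        2 => Some("Bob"),
        _ => None,
    }
}

/// Builds a greeting; the compiler rejects a `match` that forgets the `None` arm.
fn greeting(id: u32) -> String {
    match find_name(id) {
        Some(name) => format!("Hello, {name}!"),
        None => String::from("Hello, stranger!"),
    }
}

fn main() {
    println!("{}", greeting(1));
    println!("{}", greeting(99));
}
```

Removing the `None` arm from `greeting` turns the forgotten null-check into a compile error instead of a crash at runtime.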

Now of course, this is also very verbose, which you might not want for something like prototyping. Also, you may have already made sure that a value is not absent (through Option::is_some, for example), so you just want to use it. You can do that because the enum provides an unwrap method. This replaces the check and lets you use the value directly, but it panics and crashes the thread if the value is absent. Now you may ask yourself: But doesn't this make null-checking non-mandatory again? And you are absolutely right, it does. However, the difference is that not checking is a conscious decision of the programmer, and it's also very easy to see in the code. So if you make an error and do not check a value where you should, that error is very easy to find.
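
As a rough sketch of what that looks like in practice, the two hypothetical helpers below (`last_unchecked` and `last_or_default` are names invented for this example) both take the last element of a vector. The first uses `unwrap` and accepts a panic on an empty vector; the second uses `unwrap_or`, another method on `Option`, to fall back to a default value instead.

```rust
/// Hypothetical helper: a conscious decision to skip the check.
/// Panics if `values` is empty.
fn last_unchecked(mut values: Vec<i32>) -> i32 {
    values.pop().unwrap()
}

/// Hypothetical helper: falls back to 0 instead of panicking.
fn last_or_default(mut values: Vec<i32>) -> i32 {
    values.pop().unwrap_or(0)
}

fn main() {
    println!("{}", last_unchecked(vec![1, 2, 3]));
    println!("{}", last_or_default(Vec::new()));
    // Calling last_unchecked(Vec::new()) here would panic.
}
```

Searching a code base for `unwrap` is an easy way to list every place where such a decision was made.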