Strengthen your types

28 Jun 2021 on Languages, Types, Scala

“Static typing” and “strong typing” are frequently conflated, as for many people a statically typed language implies that programs written with it must also be strongly typed. That’s notionally true as the variables themselves have types, not just the values, but I’d argue that most statically typed programs are actually fairly weakly typed.

That’s a bold claim, so I’ll spend the next few minutes justifying it.

To begin, we need to examine what we mean by strong and weak typing, and where better to start than JavaScript? Here are some vexing expressions taken from the infamous wat talk which produce clearly nonsensical results:

[] + []  // ""
[] + {}  // "[object Object]"
{} + []  // 0
{} + {}  // NaN

You can read the detailed explanation of what’s happening if you like, but it comes down to JavaScript’s surprising implicit type conversions at runtime. To demonstrate that this is a consequence of weak typing rather than dynamic typing we can contrast the results with another dynamically typed language, Python, which doesn’t coerce types at runtime. Here the results are either sensible or the runtime raises a TypeError:

[] + []  # []
[] + {}  # TypeError: can only concatenate list (not "dict") to list
{} + []  # TypeError: unsupported operand type(s) for +: 'dict' and 'list'
{} + {}  # TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

Wikipedia says there is no precise definition of what constitutes weak or strong typing but hopefully those examples are sufficient to convince you that one of its definitions is a good baseline for considering a language weakly typed:

A weakly typed language has looser typing rules and may produce unpredictable or even erroneous results

Assuming we can broadly agree on that definition, let’s get back to the business of showing that most statically typed programs are also weakly typed. Quite fantastically, we don’t need anything more complex than Hello World’s greet function to do so! Here’s what that might look like in Scala 3:

def greet(name: String): String = "Hello " + name + "!"

This is Scala so it’s statically typed, and at first glance it appears strongly typed because the parameter type is declared to be a String, the return type is also declared to be a String, and it clearly does return a String. We know the + operator will always be String concatenation in this context, so the function works as you’d expect:

greet("Alice")  // "Hello Alice!"

However, we can still call this function in ways that produce unpredictable or erroneous results. These might not be as unpredictable or erroneous as some of the JavaScript expressions (we’re not going to get NaN, for example), but they definitely aren’t the results we would like.

greet("  Alice  ")  // "Hello   Alice  !"
greet("")           // "Hello !"
greet(null)         // "Hello null!"

As such, it appears we need to modify our definition of weakly typed. It’s not just whether the language itself is weakly typed, but whether the program written in the language is weakly typed.

A weakly typed program has looser typing rules and may produce unpredictable or even erroneous results

Here we demonstrably have a program with sufficiently loose typing rules that unpredictable or erroneous results are produced. Ergo, this program is weakly typed.

There are two key weaknesses in the parameter type.

Firstly it’s not really constraining the parameter type to be a String because null is not a String and passing null is permitted. By default in Scala 3 the String type actually means String | Null. However, by supplying the -Yexplicit-nulls compiler option we can strengthen the String type to mean exactly String so the final line of code won’t compile:

greet(null)  // [E007] Type Mismatch Error: Found: Null, Required: String
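If a caller genuinely needs to pass a missing name, the signature can say so instead of String silently allowing it. Here is a minimal sketch; greetNullable is a hypothetical name used only for illustration, and the choice to return an Option for the null case is my assumption rather than part of the original example:

```scala
// Hypothetical variant of greet: nullability is declared in the parameter
// type rather than being implicitly permitted by String.
def greetNullable(name: String | Null): Option[String] =
  if name == null then None          // the null case is handled explicitly
  else Some("Hello " + name + "!")   // name is non-null on this branch
```

With -Yexplicit-nulls enabled, String and String | Null become genuinely different types, so the reader of a signature can tell which functions tolerate null and which do not.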

Secondly we’re not constraining the contents of the string to be valid, so we can pass in things like the empty string, or strings with leading or trailing whitespace. This problem could be solved without changing the types by adding branching logic to the greet function; from what I’ve seen over the last twenty years or so, this is the approach most people would use to solve it in most languages:

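def greet(name: String): String =
  val trimmed = name.trim
  // The null check is for -Yexplicit-nulls, where the result of Java's trim
  // may be typed as nullable; it does not protect against a null name.
  if trimmed != null && trimmed.nonEmpty then
    "Hello " + trimmed + "!"
  else
    ""  // no valid name, so no greeting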

greet("Alice")      // "Hello Alice!"
greet("  Alice  ")  // "Hello Alice!"
greet("")           // ""

This looks better. Unfortunately we’ve moved the problem of unpredictable or erroneous results to the return type because the function’s type of String implies it will always return a greeting, but the empty string isn’t a valid greeting.

You might argue I’m picking nits here, but nothing in the signature tells the caller they may need to deal with this case. There’s a fair chance that a missing greeting will be raised as a bug somewhere down the line, and it’ll be nontrivial to work out that the cause was an empty string that somehow got into the system. Half a day gone.

We can fix this by strengthening the return type to an Option[String], which tells the caller that greet may not be able to construct a greeting if the input is invalid. In other words, greet is conceptually a partial function (not every input has a greeting) rather than a total one, and Option makes that partiality explicit in the signature. Now when the function is used, the caller explicitly has to handle the failure case and decide what to do. Half a day back.

def greet(name: String): Option[String] =
  val trimmed = name.trim
  if trimmed != null && trimmed.nonEmpty then
    Some("Hello " + trimmed + "!")
  else
    None
    
greet("Alice")      // Some("Hello Alice!")
greet("  Alice  ")  // Some("Hello Alice!")
greet("")           // None
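To make “the caller explicitly has to handle the failure case” concrete, here is a rough sketch of what a call site might look like. The welcomeMessage function and its fallback text are assumptions for illustration; greet is repeated only so the snippet stands alone:

```scala
def greet(name: String): Option[String] =
  val trimmed = name.trim
  if trimmed != null && trimmed.nonEmpty then
    Some("Hello " + trimmed + "!")
  else
    None

// Hypothetical caller: it cannot get at the greeting without also deciding
// what happens when there isn't one. The fallback text is an assumption.
def welcomeMessage(rawName: String): String =
  greet(rawName) match
    case Some(greeting) => greeting
    case None           => "Hello, whoever you are!"
```

Other callers might prefer to log the failure or reject the request instead; the point is only that the decision is now visible in the code.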

This function now arguably meets our definition of strongly typed, because whatever you pass into it, the result is predictable and it is hard to subsequently make erroneous use of it. So, we’re done, right?

No. Unfortunately not.

This function might now be hard to misuse, but to make it that way we’ve broken the single responsibility principle. It has the dual responsibilities of understanding what makes a name valid, and also formatting a greeting. This might not seem like too much of a problem, but over time it means other functions dealing with names will have to reimplement the same logic, might do it slightly differently, and in a long-lived codebase you’ll end up with numerous different implementations of the same basic concept and won’t know which is correct (if any).

The function has to break the single responsibility principle because the types are still too weak. So let’s strengthen them again: instead of name being a String, we introduce a proper FirstName type and move the validation into its smart constructor, fromString, to ensure that only valid instances can be constructed. This approach is sometimes called “making illegal states unrepresentable”:

object types:
  opaque type FirstName = String
  
  object FirstName:
    def fromString(name: String): Option[FirstName] =
      val trimmed = name.trim
      if trimmed != null && trimmed.nonEmpty then
        Some(trimmed)
      else
        None
      
import types.*

def greet(name: FirstName): String = "Hello " + name.toString + "!"

FirstName.fromString("Alice").map(greet)      // Some("Hello Alice!")
FirstName.fromString("  Alice  ").map(greet)  // Some("Hello Alice!")
FirstName.fromString("").map(greet)           // None

The above code might benefit from a little explanation. The opaque type line defines FirstName as being a String, but because it’s opaque that fact isn’t known outside the enclosing types object, so if you try to call greet("Alice") you’ll get a compilation error saying that FirstName was expected but String was found. However, inside the types object the equivalence of FirstName and String can be used to return a String as a FirstName after validation.

Here FirstName.fromString has the single responsibility of understanding what makes a first name valid, and greet has the single responsibility of formatting the greeting. Note that greet can go back to returning a String rather than an Option[String] because the input must always be valid, so it’s a total function rather than a partial function. We could go further than this and return a NonEmptyString or even a Greeting (and we should!) but I think you get the idea by now so that is left as an exercise for the reader.
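For the curious, one possible shape for that exercise is sketched below. Everything beyond FirstName.fromString is my own assumption: the Greeting type, its forName constructor, and the value and text extension methods are hypothetical names, not part of the original code.

```scala
object types:
  opaque type FirstName = String

  object FirstName:
    def fromString(name: String): Option[FirstName] =
      val trimmed = name.trim
      if trimmed != null && trimmed.nonEmpty then Some(trimmed) else None

  // Hypothetical accessor, so callers need not rely on toString.
  extension (name: FirstName) def value: String = name

  // Hypothetical Greeting type: it can only be built from a valid FirstName.
  opaque type Greeting = String

  object Greeting:
    def forName(name: FirstName): Greeting = "Hello " + name + "!"

  // Hypothetical accessor for rendering the greeting at the edge of the system.
  extension (greeting: Greeting) def text: String = greeting

import types.*

def greet(name: FirstName): Greeting = Greeting.forName(name)
```

With this in place, a function that displays greetings can demand a Greeting rather than a String, so it cannot be handed arbitrary text by accident.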

These improvements might not appear to be much of a benefit in this little sample of code, but when additional functions need a first name they can reuse the same type, and won’t need to implement any argument validation, or negative tests for invalid input. The functions will have less branching which makes them easier to reason about, and the code will likely have fewer bugs as a consequence.

There are myriad other benefits to strengthening your types. The obvious one is that it makes the code more self-documenting which is a major benefit for readability and maintainability, especially in long-lived codebases where the original authors may no longer be around, or may be busy with other things.

It also helps to prevent trivial mistakes in code. If a method expects a first name, a last name, and an email address, and all three parameters are of type String, you can pass one where another is expected and the compiler can’t help you. When you use distinct FirstName, LastName, and Email types, it’s no longer possible to mix them up.
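A small sketch of that protection, under some assumptions: the people object, the shared nonBlank helper, and fullName are hypothetical, and LastName simply reuses the same trimmed, non-empty rule as FirstName. Email is left out because its validation rules would be pure invention on my part.

```scala
object people:
  opaque type FirstName = String
  opaque type LastName = String

  // Shared rule (an assumption): a name is valid if it is non-empty after trimming.
  private def nonBlank(raw: String): Option[String] =
    val trimmed = raw.trim
    if trimmed != null && trimmed.nonEmpty then Some(trimmed) else None

  object FirstName:
    def fromString(raw: String): Option[FirstName] = nonBlank(raw)

  object LastName:
    def fromString(raw: String): Option[LastName] = nonBlank(raw)

import people.*

// Outside the people object the two types are unrelated, so swapping the
// arguments, as in fullName(last, first), is rejected by the compiler.
def fullName(first: FirstName, last: LastName): String =
  first.toString + " " + last.toString
```

Had both parameters been plain Strings, the swapped call would compile and the mistake would only show up at runtime, if anyone noticed at all.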

A less obvious benefit is that it helps to enforce good architectural practices. If all your business logic deals with strong types rather than strings or integers then it forces validation of input up to the boundary layers where it should be, not merely by convention, but by necessity because you cannot call lower layers of code without doing so!
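As a rough illustration of validation being pushed to the boundary, consider the sketch below. The handleGreetingRequest function and its error message are assumptions standing in for whatever your boundary layer really is (an HTTP handler, a CLI parser, and so on); the types object and greet are repeated from earlier so the snippet stands alone.

```scala
object types:
  opaque type FirstName = String

  object FirstName:
    def fromString(name: String): Option[FirstName] =
      val trimmed = name.trim
      if trimmed != null && trimmed.nonEmpty then Some(trimmed) else None

import types.*

// Business logic: only ever sees a valid FirstName.
def greet(name: FirstName): String = "Hello " + name.toString + "!"

// Hypothetical boundary function: raw input is validated here, because
// greet cannot be called with an unvalidated String.
def handleGreetingRequest(rawName: String): Either[String, String] =
  FirstName.fromString(rawName)
    .toRight("name must not be blank")  // assumed error message
    .map(greet)
```

The boundary has no way to skip the validation step: without a FirstName in hand, the call to greet simply does not type-check.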

Returning to the claim that most statically typed programs are pretty weakly typed, I don’t have proof, but I’ll take a bet that the majority of the code written by the majority of people reading this looks much more like the initial version of the function, with its weak String parameter and String result types, than the final version with its strong types. That’s fine; I’m in that majority.

But there’s always time to make amends.

Any time you find yourself needing to validate arguments, give a function multiple responsibilities, or make a function partial rather than total, consider whether you can strengthen your types to remove those problems.

Originally posted on the Zego Product Engineering blog
