Code and Bitters

For some reason, I find myself thinking a lot about Rust enums, and matches, and the various ways one can write them.

This post contains a lot of opinions / bike-shedding / hand-wringing about what is ultimately just a personal preference about code style. Don't take it too seriously.

I'm probably wrong about at least one thing I've written in this post. If you find a mistake or omission, or just want to argue for one style over another, I invite you to send a message! If I hear any interesting ideas I'll add them to this post.

Part 1: To `use`, or not? §

Let's start with a really trivial enum, and show some variations on it.

enum Fruit {
    Apple,
    Orange,
    Pear,
}

fn taste(fruit: &Fruit) {
    match fruit {
        Fruit::Apple => println!("I like apples."),
        Fruit::Orange => println!("That's a bit tart."),
        Fruit::Pear => panic!("I'm allergic to pears!"),
    }
}

This is (I think) the canonical style for matching on enums in Rust. I don't think there's anything here that should bother most Rust programmers. There is one tiny little thing that sometimes bugs me, though: imagine that we had many variants, and that the name was a bit longer. Say for example, this:

enum CaliforniaTreeFruit {
    Almond,
    Apple,
    Apricot,
    Avocado,
    Cherry,
    Fig,
    Orange,
    Plum,
}

fn pick(fruit: &CaliforniaTreeFruit) {
    match fruit {
        CaliforniaTreeFruit::Almond => println!("I like salads with thin-sliced almonds on top."),

        CaliforniaTreeFruit::Apple => {}
        CaliforniaTreeFruit::Apricot => {}
        CaliforniaTreeFruit::Avocado => {}
        CaliforniaTreeFruit::Cherry => {}
        CaliforniaTreeFruit::Fig => {}
        CaliforniaTreeFruit::Orange => {}
        CaliforniaTreeFruit::Plum => {}
    }
}

There are two things I notice about this new code. First, repeating CaliforniaTreeFruit over and over again seems... well, repetitive. Second, because it's a long name, if you're already a few indents to the right, you don't have much space left on that line for a one-line expression.

It's tempting to do this instead:

use CaliforniaTreeFruit::*;

fn pick(fruit: &CaliforniaTreeFruit) {
    match fruit {
        Almond => println!("I like salads with thin-sliced almonds on top."),
        Apple => {}
        Apricot => {}
        Avocado => {}
        Cherry => {}
        Fig => {}
        Orange => {}
        Plum => {}
    }
}

This is a lot tidier, and it builds and runs exactly the same. I've written code like this myself. I think, however, that this is probably not a good idea. Here are the reasons why:

The names are mysterious to a reader. Imagine someone else is editing this code a year later and wants to know where the name Apple came from. They might assume it refers to Fruit::Apple instead. Glob imports like use xyz::* can make code harder to follow, so I would advise against them in most cases. If you really want the variant names in scope, consider putting the use statement inside the function instead. That gives the reader a strong hint about where these names come from.
Mistakes are harder to catch, because the syntax is ambiguous. If I include Pear in this list (Pear is not a variant of CaliforniaTreeFruit), I would expect the code to fail to compile. It doesn't; this code actually compiles:

fn throw(fruit: &CaliforniaTreeFruit) {
    match fruit {
        Apple => println!("nice throw!"),
        Pear => println!("oops!"),
    }
}

There is a strange compiler warning, though:

warning: variable `Pear` should have a snake case name
  --> src/lib.rs:46:9
   |
46 |         Pear => println!("oops!"),
   |         ^^^^ help: convert the identifier to snake case: `pear`
   |
   = note: `#[warn(non_snake_case)]` on by default

It will probably only take you a minute to realize what's happened. Because Pear is not a variant in scope, the compiler treats it as a catch-all binding: the arm matches any value and binds fruit (a &CaliforniaTreeFruit) to a new variable named Pear.
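To make the catch-all behavior concrete, here's a minimal sketch. It puts the glob import inside the function (the alternative suggested above) and returns strings instead of printing, purely so the behavior is easy to check. The lint allows are only there to keep the sketch quiet; in real code, that non_snake_case warning is the clue that something went wrong.

```rust
enum CaliforniaTreeFruit {
    Almond,
    Apple,
    Apricot,
    Avocado,
    Cherry,
    Fig,
    Orange,
    Plum,
}

// Lints silenced only for this demonstration.
#[allow(non_snake_case, unused_variables)]
fn throw(fruit: &CaliforniaTreeFruit) -> &'static str {
    // Glob import scoped to this function, so a reader can see
    // where Apple comes from.
    use CaliforniaTreeFruit::*;
    match fruit {
        Apple => "nice throw!",
        // Pear is not a variant, so this is a fresh binding
        // that matches every remaining value.
        Pear => "oops!",
    }
}
```

With this definition, throw(&CaliforniaTreeFruit::Fig) returns "oops!", even though nobody wrote a Fig arm. Writing the pattern fully qualified as CaliforniaTreeFruit::Pear would instead fail to compile, because no such variant exists.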

In the end, I think this style holds too much potential for trouble, so I'm going to resist the temptation to do this in the future.

Part 2: inline enum members §

There are a few ways to build an enum with a more complex internal structure. Let's say, something like this:

#[derive(Debug)]
struct Banana {
    weight: u32,
    days_until_ripe: i8,
}

#[derive(Debug)]
struct Coconut {
    diameter: u32,
}

#[derive(Debug)]
struct Mango(&'static str);

#[derive(Debug)]
enum IslandFruit {
    Banana(Banana),
    Coconut(Coconut),
    Mango(Mango),
}

The Banana and Coconut variants are probably how I would typically write enums that carry payload data. It's certainly not the only way, but I have a few comments on this code so far.

One is that I re-used the enum variant name for the name of each standalone struct. I totally understand if this bothers people; it bothers me a little bit too, but not as much as it would bother me to have to invent artificial names like FruitBanana or IslandCoconut. I think reusing the name is the lesser evil.

The one thing about duplicate names that bothers me the most is that it looks funny in the derived Debug output. For example, this code:

let coco = IslandFruit::Coconut(Coconut { diameter: 19 });
println!("{:?}", coco);

will print

Coconut(Coconut { diameter: 19 })

The doubled-up name isn't a problem if you already understand the code, but it can be a little confusing if you're looking at it for the first time. For example, I often write unit tests by printing out the Debug string first, and then pasting that back into my code as the right side of assert_eq!

I've sometimes wished that I could fine-tune the derived Debug implementation to do one of two things:

Add the enum name to the output, e.g. IslandFruit::Coconut instead of just Coconut. That would aid me when pasting into a unit test.
Omit the variant name altogether. This would be more convenient for logging, where the doubled-up name doesn't add any value.
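As far as I know, the derive itself has no knobs for either of these, but a hand-written Debug impl can do the first one. Here's a rough sketch, assuming the payload structs keep their derived Debug and only IslandFruit gets a manual impl. Formatter::debug_tuple produces the same Name(field) shape the derive would, so the only difference is the prefix.

```rust
use std::fmt;

#[derive(Debug)]
struct Banana {
    weight: u32,
    days_until_ripe: i8,
}

#[derive(Debug)]
struct Coconut {
    diameter: u32,
}

#[derive(Debug)]
struct Mango(&'static str);

// No #[derive(Debug)] here; the impl below is written by hand.
enum IslandFruit {
    Banana(Banana),
    Coconut(Coconut),
    Mango(Mango),
}

impl fmt::Debug for IslandFruit {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Prefix the enum name so the output reads as a full path.
        f.write_str("IslandFruit::")?;
        match self {
            IslandFruit::Banana(b) => f.debug_tuple("Banana").field(b).finish(),
            IslandFruit::Coconut(c) => f.debug_tuple("Coconut").field(c).finish(),
            IslandFruit::Mango(m) => f.debug_tuple("Mango").field(m).finish(),
        }
    }
}
```

Printing the coconut from earlier now gives IslandFruit::Coconut(Coconut { diameter: 19 }), which is also a valid Rust expression to paste into assert_eq! (the enum would additionally need PartialEq for that comparison). For the second option, each arm could instead delegate straight to the payload with fmt::Debug::fmt(b, f) and skip the variant name entirely.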

The other issue with this code is that struct Mango has only one unnamed field, which leads to awkward-looking indirection to get at its contents. Look at what's needed to destructure a Mango:

fn get_color(fruit: IslandFruit) -> &'static str {
    match fruit {
        IslandFruit::Banana(b) => match b.days_until_ripe {
            d if d > 1 => "green",
            d if d < -7 => "brown",
            _ => "yellow",
        },
        IslandFruit::Coconut(_) => "brown",
        IslandFruit::Mango(Mango(color)) => color,
    }
}

It's debatable whether I made the right choice here. I could just as easily have written:

enum IslandFruit {
    Banana(Banana),
    Coconut(Coconut),
    Mango(&'static str),
}

... though the inconsistent style would probably continue to bother me.

Also, I like data to have names. A reader has no idea what the inner value is supposed to be if I write:

struct Mango(&'static str);

I should replace that with a named member, just so it's more obvious what the contents mean:

struct Mango {
    color: &'static str,
}

What if I decide to take this even further and inline all of the structs into the enum?

enum IslandFruit {
    Banana { weight: u32, days_until_ripe: i8 },
    Coconut { diameter: u32 },
    Mango { color: &'static str },
}

fn get_color(fruit: IslandFruit) -> &'static str {
    match fruit {
        IslandFruit::Banana {
            days_until_ripe, ..
        } => match days_until_ripe {
            d if d > 1 => "green",
            d if d < -7 => "brown",
            _ => "yellow",
        },
        IslandFruit::Mango { color } => color,
        IslandFruit::Coconut { .. } => "brown",
    }
}

There are nice things about this-- the destructuring of Mango is tidier. We didn't have to worry about what to name the standalone structs. The big downside is that now it's impossible to build functions that operate directly on a Banana. With the standalone struct names, we could pull out the days_until_ripe logic like this:

impl Banana {
    fn color(&self) -> &'static str {
        match self.days_until_ripe {
            d if d > 1 => "green",
            d if d < -7 => "brown",
            _ => "yellow",
        }
    }
}
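Here's a sketch of what get_color looks like once that logic lives on Banana. It assumes the standalone-struct layout, with Mango switched to the named color field I said I'd prefer; the test values are arbitrary.

```rust
struct Banana {
    weight: u32,
    days_until_ripe: i8,
}

struct Coconut {
    diameter: u32,
}

struct Mango {
    color: &'static str,
}

enum IslandFruit {
    Banana(Banana),
    Coconut(Coconut),
    Mango(Mango),
}

impl Banana {
    // The ripeness logic, now usable on a Banana by itself.
    fn color(&self) -> &'static str {
        match self.days_until_ripe {
            d if d > 1 => "green",
            d if d < -7 => "brown",
            _ => "yellow",
        }
    }
}

fn get_color(fruit: IslandFruit) -> &'static str {
    match fruit {
        // The Banana arm just delegates to the method.
        IslandFruit::Banana(b) => b.color(),
        IslandFruit::Coconut(_) => "brown",
        // The field name says what the payload means.
        IslandFruit::Mango(Mango { color }) => color,
    }
}
```

The enum-level function shrinks to a dispatcher, and Banana::color can be unit-tested without wrapping anything in IslandFruit.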

This sort of refactoring is the reason why I prefer not inlining data members into enums, unless the payload is so trivial that (a) I'm confident I'm not going to change the guts later, and (b) it's obvious to all readers what the meaning of the payload value is.

The other thing that can go wrong with the inlined style is that there's no way to inline another enum. If we later decide that struct Mango should instead be enum Mango, Rust doesn't have syntax that would allow us to inline this into IslandFruit:

enum Mango {
    Red,
    Yellow,
    Green,
}

In other words, you can't do this (at least not in 2021):

enum IslandFruit {
    /* ... */
    Mango(enum {
        Red,
        Yellow,
        Green,
    }),
}
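What does work is declaring enum Mango on its own and carrying it as an ordinary tuple payload. Here's a minimal sketch, reusing the Banana and Coconut structs from earlier; the color strings are just illustrative. Nested patterns still let a single arm reach through both enums.

```rust
struct Banana {
    weight: u32,
    days_until_ripe: i8,
}

struct Coconut {
    diameter: u32,
}

// Declared separately, since a variant can't contain an anonymous enum.
enum Mango {
    Red,
    Yellow,
    Green,
}

enum IslandFruit {
    Banana(Banana),
    Coconut(Coconut),
    Mango(Mango),
}

fn get_color(fruit: IslandFruit) -> &'static str {
    match fruit {
        IslandFruit::Banana(b) => match b.days_until_ripe {
            d if d > 1 => "green",
            d if d < -7 => "brown",
            _ => "yellow",
        },
        IslandFruit::Coconut(_) => "brown",
        // Nested patterns match the outer and inner enum together.
        IslandFruit::Mango(Mango::Red) => "red",
        IslandFruit::Mango(Mango::Yellow) => "yellow",
        IslandFruit::Mango(Mango::Green) => "green",
    }
}
```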

I think that was plenty of overthinking for one day. Until next time...

Good luck with your Rust projects!
