DHO is Mostly Confused

Rants from Devon H. O'Dell

A Malloc Idiom in Go

I’m finally writing Go. Although I was very active in contributing to the project for the first two-and-a-bit years, I’ve been pretty absent since 2012. Initially, I contributed because I am a fan of Plan 9 from Bell Labs and of FreeBSD. I loved the idea of having a usable, CSP-inspired language, but the initial release of Go only ran on Linux and OS X. I only had a FreeBSD system at the time, so I ported the compiler toolchain, runtime, and standard library to FreeBSD (with a good bit of education on debugging from Russ Cox).

However, most of my work has been on low-latency systems software – mostly written in C – and FreeBSD has not been supported by any of my employers since 2007. Since I wasn’t really getting a chance to write new software in Go, and I eventually was no longer interested in the overhead of maintaining knowledge of an OS I was only using for fun, my use of and contributions to Go fell by the wayside.

Now that I work at Google, I’ve finally had some opportunities to write code in Go. While I still enjoy the language, there are a few experience report-style things that ended up keeping me from using the language for the last 5-6 years, and things I currently find a little cumbersome. At the suggestion of some colleagues, I thought I’d document at least one of them.

One thing I miss is something like C’s malloc idiom. In C, allocated memory generally comes from calls to malloc-family functions, which give you at least enough memory to fit at least one of the thing you want, or none at all. The idiom for doing this looks like:

T *p = malloc(sizeof *p);

Notice how the type of p (T *) only appears once. This line of code exploits how sizeof behaves when its operand appears to dereference a pointer – which isn’t what happens, because the result of sizeof must Kate Flavel reminds me that this is not true for VLA types, which are so useless, I always forget. Such a type has its expression evaluated. always be discernable at compile-time. What the language defines instead for this syntax is that sizeof yields the size of the pointed-to object; it doesn’t turn into some runtime thing. The nice thing about the C idiom is that if I change the type represented by T, I only change the type in the declaration or definition. No changes need to be made in the site(s) the pointer object is assigned to the result of the allocation. While the example above is trivial, consider a more complicated case where a structure member points to some type:

struct set {
    size_t cap;
    size_t nmemb;
    int members[];
}

struct set *
set_create(size_t sz)
{
    struct set *n = malloc(sizeof *n + (sz * sizeof *n->members));
    if (n == NULL) {
        return NULL;
    }

    n->cap = sz;
    n->nmemb = 0;

    return n;
}

If, later, we want to change struct set to support members with other than int, we might change members to a union, and add some enum specifying the fields we wanted to add. And we can do this without changing any of our code in set_create.

This has bitten me in Go literally every time I’ve created some struct type that embeds a thing that must be allocated, like a slice or a map. In Go, we are forced to repeat ourselves in expressing the type of thing we’d like to allocate, despite that type being well known to the compiler, and type inference is so idiomatic (consider expressions like a := b), I sometimes have to dig to figure out what the type of a thing is. Let’s look at what’s involved in creating a struct with a map embedded:

type NamedMap struct {
    name string
    m    map[string]string
}

func NewNamedMap(name string) *NamedMap {
    return &NamedMap{name: name, m: map[string]string{}}
}

We could alternatively use make in NewNamedMap, but we’re still left with return &NamedMap{name: name, m: make(map[string]string)} – again, repeating the type. With well-thought-out code, there should be only one (additional) place where we need to specify the type to allocate it, but this still requires touching multiple points of code when types change. This bites me when I’m prototyping and haven’t thought fully about the amount of state I need to keep in a map. I’ve found myself needing to change map[string]string to map[string]T on several occasions, and it bugs me every time I have to change more than one line.

One could argue I should think more about what I need before writing code, and that’d be fair. I’d still counter that it’s not uncommon for additional state requirements to develop within a project’s lifetime, like in my set example above. It’s also possible that constraints of a system change over time such that a type that was perfectly fine initially eventually becomes unusable. In Go, the set structure might look like this:

type Set struct {
    nmemb   int
    cap     int
    members []int
}

func NewSet(sz int) *Set {
    return &Set{cap: sz, members: []int{}}
}

You might ask why I’m not just using a slice, and the answer is that this is a short example for demonstrating the problem. Anyway, we may later want to support a slice of different types, and we run into the same problem we had previously. With a slice, we might be able to ignore the initialization, assuming we only ever read from the slice after appending to it. With e.g. maps and channels, the situation is more complicated because we really must allocate before use. So it’s not that uncommon to repeat type information somewhere.

I’m not entirely sure how to go about solving this. For compound literals, it perhaps it would be possible to add a syntax like:

return &Set{cap: sz, members: Set.members{}}

If you had a pointer to a slice for whatever reason:

return &Set{cap: sz, members: &Set.members{}}

I don’t know if I like these compound literal syntaxes. Fixing make to support the sizeof behavior in C feels more expressive and obvious:

return &Set{cap: sz, members: make(Set.members)}

But maybe this is just me programming in C for too long.

I don’t know that it is useful or valuable to change new to have similar behavior; I suspect its use is uncommon enough that it is not. In any case, it does seem clear to me that this would reduce the overhead of refactoring software, and of writing new software.

This particular issue isn’t so bad, but fixing it would definitely make my life better. Go is already much more expressive than C in many cases, and than C++ (which I still can’t stand to learn). I think if support for type inference in allocation was added to the language, it’d become more appealing to C systems programmers who are still holding out for reasons other than GC. (Personally, I’d also like to see a better story around support for system-level concurrency, but that’s best left for a separate post.)

I’ve edited this article to fix an error in the C example. When I originally wrote the example, the struct set did not make use of flexible array members. Anmol Sethi wrote to ask about this feature, and pointed out that I was erroneously allocating and assigning to the FAM again. I had forgotten to remove that code. Oops.

More Posts