Tuesday, June 16, 2020 — 00:57

“Master” terminology in tech

I saw a helpful hint a while back about changing the name of the default branch git creates. Out of the box, when you run git init to create a new repository, git names the default branch “master”. This is a poor choice for a number of reasons, but the reason it’s coming up again now is the fact that “master/slave” terminology isn’t great given a desire to exorcise racist and/or oppressive language from our common use.

The discussion around this has been… well, infuriating.

Part of what gets me most is the guys (and it’s 90% guys) arguing that “context matters”, and we “shouldn’t be so sensitive”, and it’s “not about race”, and that we should just be able to use the technical terminology that makes the most sense without having to worry about “political correctness”.

If that’s the case… “master” is terrible terminology here. As it is in most cases in tech.

Set aside for a moment all of the very valid human reasons why we should probably want to avoid “master/slave” language in general. Just for a minute. One could argue that in some technical contexts, “master/slave” is a descriptive metaphor. In some multi-process systems, for example, you have one process directing the work of several others in a one-directional way. Centralized control, disempowered “worker” processes, &c. Okay. Maybe. There are lots of other terms that work just as well (see “worker”, above), but it’s not totally detached from what’s going on in the thing being described. Fine.

But git has none of these characteristics, and cases which do are, by far, the exception. It’s generally quite a bad fit. We use the “master/slave” terminology not because it’s the best description of what’s going on, but because it’s an easy default, and we aren’t accustomed to thinking about the implications.

In my experience, the CompSci world uses “master/slave” most commonly in databases… where it’s a particularly terrible choice. It’s usually used for some variety of replication or failover, neither of which describes anything resembling a “master/slave” relationship in the real world. In what “master/slave” scenario did the enslaved person get a copy of everything the master had? Or inherit his position, role, or responsibilities if the master became incapacitated? It’s a bad choice for databases, and it always was. We use it because the “master/slave” relationship, and the power dynamic behind it, is so ingrained in our thought process that it just seems like the default. It was a lazy choice.

Git is much more like the database usage than the multi-process system. In fact, the “master/slave” terminology is so inappropriate in git that it leads many people (including myself, when I first started using git) to assume that it must have some other origin, like “master recording” in the audio world. After all, aside from a few off-hand comments in documentation, git doesn’t really have a concept of a “slave” repository to go with “master”, so the origin must be something else. Right? Unfortunately, that isn’t true. A lot of the thinking that went into git came from experience with an older system called BitKeeper. BitKeeper did have an explicit concept of a “slave” repository (where the term was also a poor choice on technical grounds), and git inherited the name of its default branch from that.

This ambiguity is a real loss on purely technical grounds. A “master”, in both the “master/slave” and “master recording” sense, is “special”; the default branch in git is not. This can be seen in how easy it is to change the name; it’s a single command to change it on a given repository, and one setting in a config file to change the default when creating a new one. And nothing breaks. You aren’t required to have a “master” branch; you don’t have to modify git to get rid of it. It’s just a default name for the first branch you start off working in. Knowing that there’s nothing special about it, that it’s just another branch, is actually useful and important if you get into the deeper arcana of using git. Calling the default branch “master” obscures this.
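
For the curious, here is a rough sketch of what that looks like in practice. Treat it as an illustration rather than the one true recipe: the branch name “main”, the throwaway directory, and the demo identity are all just assumptions for the example. The config setting in the final comment, init.defaultBranch, exists in Git 2.28 and later; older versions need a different approach.

```shell
#!/bin/sh
set -eu

# Throwaway directory so nothing here touches a real repository.
work="$(mktemp -d)"
repo="$work/demo"

git init -q "$repo"
# Point the unborn HEAD at "master" so the demo starts from the historical
# default no matter how this machine's git is configured.
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit -q --allow-empty -m "first commit"

# The single command: rename the branch. "main" is only an example name.
git -C "$repo" branch -m master main

renamed="$(git -C "$repo" symbolic-ref --short HEAD)"
echo "current branch: $renamed"

# The single setting, for repositories created later (Git 2.28 or newer):
#   git config --global init.defaultBranch main
```

If the repository has already been pushed somewhere, the remote and any collaborators need to follow along too, but that is housekeeping. Git itself doesn’t care what the branch is called.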

So the “master/slave” language is a poor fit on purely technical grounds in the vast majority of cases where it’s used, and in the remainder, there are lots of other choices which are as good or better. I can’t think of any case where “manager/worker” is worse on any technical point.

And, of course, the above discussion has all been setting aside the social considerations. But we shouldn’t really do that. Git, like most any software, is written for and by humans, so taking into account social concerns is entirely proper. The fact that many people are upset by the terminology is, in and of itself, an entirely valid reason to consider changing it. Were there really strong technical reasons to keep it, or equally strong contrary social reasons, you’d have to weigh your options. But in this case, like most uses of this terminology, there just aren’t. It’s bad on the technical merits, has bad history, upsets some people, and the “pro” argument seems to boil down to some combination of “this is the way we’ve always done it” and “we shouldn’t want to make things better for people”. Both of which are crap.

It all makes the fervent defense of the term pretty questionable. It’s transparently not a defense of the term itself on technical grounds, and a defense resting on inertia is pretty weak, too. The more zealous the defense, the clearer it becomes that it’s entirely about social issues… but in a pretty dark way.

Now, as I’m writing this, I see a note that GitHub has decided to replace “master” with something more sensible. Good. I’d still like to see git itself change the default, but GitHub is certainly the 10,000 lb gorilla in the git ecosystem, so that’s a good sign.

Anyway. Kill your masters. And rename your default branch.