Clutternaming, The Anti-pattern

Naming things and anti-patterns are two topics that have long fascinated me. So when Marc Atwood tweeted a name for an anti-pattern I’d personally experienced, I was excited to finally have a name for it. Marc tweeted:

today I asked a master of naming antipatterns for a name for the practice of naming directories in an evolving project after version numbers, instead of just using the VCS. His answer: “clutternaming”.

What Is It?

So, here is the idea: you have a codebase that has more than one major version, where each major version introduces breaking changes. The different versions are stored in the codebase as directories, so two or more major versions live on disk next to each other at the same time.

For example, you have a library with v1, v2, and v3. You might have directories like so…

mylib
├── v2
└── v3

In the mylib directory you have major version 1 code, in the v2 directory you have major version 2, and in the v3 directory you have major version 3. All in the same checkout.

When you reference the library, the major version appears in the path for anything above v1. As a result, every use of the library carries the version in its name.
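As a minimal sketch of that naming rule, assuming the layout shown above (v1 at the library root, later major versions in sibling `vN` directories), the path a consumer ends up writing could be computed like this. The function name `versioned_path` is hypothetical and only for illustration.

```python
def versioned_path(base: str, major: int) -> str:
    """Return the path used to reference a given major version of a library.

    Assumes the layout described above: v1 lives at the library root and
    every later major version lives in a 'vN' subdirectory.
    """
    if major < 1:
        raise ValueError("major version must be 1 or greater")
    # Only versions above v1 carry the version in their path.
    return base if major == 1 else f"{base}/v{major}"


if __name__ == "__main__":
    for major in (1, 2, 3):
        print(versioned_path("mylib", major))
```

Every new major version adds another distinct path that callers have to spell out.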

Repeated History

This is a case of what was old being new again. Subversion used to be a popular source control system. At one time, it was the most popular one. This was before Git took over.

In Subversion, major versions often lived on different branches. Branches and tags in Subversion are just directories.

When Git and the other newer source code management tools came along, they had a new way to handle branches and tags that didn’t use directories. This mode of handling versions became the popular way to handle things. They stopped the filesystem sprawl.

More recently, some projects have started to talk about using directories to handle major versions, even when using a tool like Git.

Anti-pattern, Really?

You might wonder, is this really an anti-pattern? In software there are few hard rules that apply to everyone all the time. Software is used for so many things, from micro-controllers to web UIs. But for most of us, most of the time, I’ll argue it’s an anti-pattern.

Andrew Koenig coined the term in the Journal of Object-Oriented Programming back in 1995 when he wrote:

An antipattern is just like a pattern, except that instead of a solution it gives something that looks superficially like a solution but isn’t one.

Martin Fowler put it this way:

The essential idea (as I remember it) was that an antipattern was something that seems like a good idea when you begin, but leads you into trouble.

The famous book Design Patterns (a.k.a. the Gang of Four book) describes two things that distinguish an anti-pattern from a bad idea. Wikipedia documents these as:

  1. A commonly used process, structure, or pattern of action that despite initially appearing to be an appropriate and effective response to a problem, has more bad consequences than good ones.
  2. Another solution exists that is documented, repeatable, and proven to be effective.

Given this context, I will argue from these two points. First, there are negative consequences from clutternaming. Here are a couple:

  • It doesn’t scale as the number of versions goes up, especially on non-trivial software. One of the libraries I use is Kubernetes client-go. It is on major version 12 (as of this writing), which has over 1,500 files. Consider the case where all of the versions live side by side in the directory tree. You end up with file sprawl.
  • Source control management history is lost or practically can’t be navigated. For example, you start a new major version, so you copy the contents of the old version into a new directory structure. Now the usual Git commands for looking at history don’t show information across major versions. The tooling becomes less useful and the experience more broken. This works against modern source control management.
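To put a rough number on the sprawl in the first point, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every major version is about the size of the current one (roughly 1,500 files); real versions will differ, and the function name `files_in_checkout` is hypothetical.

```python
def files_in_checkout(files_per_version: int, major_versions: int) -> int:
    """Estimate files on disk when every major version lives side by side.

    Assumption: each major version has roughly the same number of files.
    """
    return files_per_version * major_versions


if __name__ == "__main__":
    # 12 major versions at about 1,500 files each (figures from the text above).
    print(files_in_checkout(1500, 12))
```

Under that assumption, a single checkout would hold on the order of 18,000 files, compared with about 1,500 when only one version is checked out at a time.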

Take a little time and you will find more issues.

Second, let’s talk about another solution that exists. That is the status quo, where branches represent different major versions. Prior to Git, clutternaming was common. Git and the other more modern tools came out with cheap branching. This worked so well that using branches for major versions became normal. There is no file system sprawl, the approach scales with the number of versions, and it works with the native features of the source control management systems.

Clutternaming is an anti-pattern for most of us. There are cases where people have crafted a codebase of spaghetti code and clutternaming is the solution that keeps folks moving. Sometimes an anti-pattern for most of us is the best solution for an edge case. I would just suggest avoiding clutternaming unless you can justify it despite all the negative consequences.