Go vgo: Semantic Versioning and Human Error

Note: a follow-up post with additional detail is available, titled "Go vgo: A Broken Dependency Tree".

In the introduction to his analysis of vgo, Sam Boyer touched on a number of practical, day-to-day issues that can arise from using MVS. MVS, short for Minimal Version Selection, is a new algorithm for resolving a dependency tree. You can read more about it in Russ Cox's series.

MVS is not found in other programming language package managers, which in my opinion makes it worth analysis and discussion. We should understand what we're getting ourselves into, warts and all, because it's different from what we're used to.

In Sam's analysis he noted a case that goes like this:

“Our project depends on [a package at version 1.5.0] right now, but it doesn’t work with [version 1.7.0] or newer. We want to be good citizens and adapt, but we just don’t have the bandwidth right now.”

I wanted to share a real world example where this happened.

Breaking Semantic Versioning

Before I touch on the example it's important to talk about Semantic Versioning, a.k.a. SemVer.

According to the specification, minor version changes are required to be backward compatible. The 5 and 7 in the example above are the minor versions, so 1.7.0 should have worked. That it didn't means the package maintainers released a version that broke from semantic versioning.

The issue Sam called out is about handling the case where someone isn't following the spec. This can happen on purpose or by accident, and it's not uncommon.
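To make the promise concrete, here is a minimal sketch of what the spec entitles a tool to assume. The Version type and the Parse and PromisedCompatible functions are names I made up for illustration, and pre-release and build metadata are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version is a simplified SemVer triple. Pre-release and build
// metadata are ignored to keep the sketch short.
type Version struct{ Major, Minor, Patch int }

// Parse reads "MAJOR.MINOR.PATCH", with an optional leading "v".
func Parse(s string) (Version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return Version{}, fmt.Errorf("invalid version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil || v < 0 {
			return Version{}, fmt.Errorf("invalid version %q", s)
		}
		n[i] = v
	}
	return Version{n[0], n[1], n[2]}, nil
}

// PromisedCompatible reports whether SemVer promises that code written
// against tested keeps working on candidate.
func PromisedCompatible(tested, candidate Version) bool {
	if tested.Major == 0 {
		// Major version zero carries no compatibility promise.
		return candidate == tested
	}
	if candidate.Major != tested.Major {
		return false
	}
	if candidate.Minor != tested.Minor {
		return candidate.Minor > tested.Minor
	}
	return candidate.Patch >= tested.Patch
}

func main() {
	tested, _ := Parse("1.5.0")
	newer, _ := Parse("1.7.0")
	// Prints true: on paper this upgrade is safe.
	fmt.Println(PromisedCompatible(tested, newer))
}
```

The function can only report what the spec promises. Whether a given release actually keeps that promise is exactly the part no parser can check.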

Two Ways To Solve The Problem

There are two ways, at least, to solve the problem:

  1. Rewrite the code using the dependency so that it works with the new version
  2. Set a maximum supported version (e.g., >= 1.5.0, < 1.7.0)

These methods don't need to be used in isolation. In practice, the solution can be to set a version range until you have time to rewrite the code that uses the dependency.
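As a rough sketch of the second option, a version range can be modeled as a half-open interval and used to pick the newest release that still falls inside it. The Range type, its methods, and the list of releases are all hypothetical, and I'm assuming the lower bound is inclusive so that the tested version itself is allowed.

```go
package main

import "fmt"

// Version holds major, minor, and patch numbers.
type Version [3]int

// Compare returns -1, 0, or 1 when a is older than, equal to, or newer than b.
func (a Version) Compare(b Version) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// Range is the half-open interval [Min, Max).
type Range struct{ Min, Max Version }

// Allows reports whether v falls inside the range.
func (r Range) Allows(v Version) bool {
	return v.Compare(r.Min) >= 0 && v.Compare(r.Max) < 0
}

// Newest returns the highest candidate the range allows, if any.
func (r Range) Newest(candidates []Version) (Version, bool) {
	var best Version
	found := false
	for _, c := range candidates {
		if r.Allows(c) && (!found || c.Compare(best) > 0) {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	tested := Range{Min: Version{1, 5, 0}, Max: Version{1, 7, 0}}
	releases := []Version{{1, 4, 2}, {1, 5, 0}, {1, 6, 3}, {1, 7, 0}, {1, 8, 1}}
	best, ok := tested.Newest(releases)
	fmt.Println(best, ok) // [1 6 3] true
}
```

The upper bound is what buys time: newer releases keep shipping, but the resolver stays inside the range that was actually tested until the code is rewritten.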

Why not rewrite the code right away? I know some have asked this. Just as it is human to err and break from the SemVer spec, it is also human to set priorities. Here are a couple of real world reasons to release with a cap on supported versions:

  • Your application shipped with a bug because of a behavior change in a dependency whose function signatures didn't change. You want to release a fix quickly, so you roll back to a version range you tested as working
  • The people who set product priorities want to get some features out before doing cleanup coding work. This could be due to a partnership, a special showcase event, or something else. Shipping features and making things work under a deadline is a reality

It's important to remember that all of this thus far is because we are human, we err, we set arguable priorities, and are dealing with a world of dependencies where some are outside of software. We need fault tolerant solutions.

The Real World Example

Helm uses gRPC. When gRPC made its 1.4.0 release it included an important change that wasn't obvious from the release notes. Prior to 1.4.0 there was a single function, MaxMsgSize. As of the 1.4.0 release this function was deprecated and two new functions were added, MaxRecvMsgSize and MaxSendMsgSize. Where MaxMsgSize had previously set the size limit for both send and receive, it now set only the receive value. This was a change in behavior.

When Helm made the update that crossed this gRPC release it passed all the tests. To exercise the issue, Helm would have needed to be tested with a large data set rather than just for functionality.
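A toy model shows why functional tests can miss a change like this. It is not gRPC's code: the limits type, the two constructor functions, and the default limit are invented for illustration and only mirror the behavior described above.

```go
package main

import "fmt"

// limits is a hypothetical stand-in for a connection's message size settings.
type limits struct{ recv, send int }

// defaultLimit is an invented default, kept tiny so the example stays readable.
const defaultLimit = 4

// oldMaxMsgSize models the earlier behavior: one call sets both directions.
func oldMaxMsgSize(n int) limits { return limits{recv: n, send: n} }

// newMaxMsgSize models the later behavior as described in this post:
// only the receive limit is set and send keeps its default.
func newMaxMsgSize(n int) limits { return limits{recv: n, send: defaultLimit} }

// canSend reports whether a payload of the given size fits the send limit.
func (l limits) canSend(size int) bool { return size <= l.send }

func main() {
	// A small test fixture first, then production-sized data.
	for _, size := range []int{2, 64} {
		fmt.Println(size, oldMaxMsgSize(100).canSend(size), newMaxMsgSize(100).canSend(size))
	}
	// Output:
	// 2 true true
	// 64 true false
}
```

The calling code is identical in both cases and still compiles. Only the larger payload tells the two versions apart, which is why a suite built on small fixtures stays green.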

This caused a bug that was later found in production. What was the Helm project to do to fix this bug and get a release out quickly?

  1. Roll back the gRPC dependency to a version we knew worked, issue a patch fix, and then dig into what was going on once users had their issues fixed
  2. Take the time to figure out gRPC, what happened, and how to alter our code to make it work with the newer gRPC version before releasing a bug fix

I can't state this enough: it's important to respond to people and to act as quickly as their situation requires.

How App Devs See Dependencies

I know that developers at some highly profitable companies, especially those that are large or concerned with security, view dependency management as a space where you review and understand all changes to all dependencies.

For many that just is not the case. To illustrate this I'll share two examples:

  1. Companies are concerned with speed of development at the moment. So much so that Gartner now maps "Enterprise High-Productivity Application Platform as a Service" as a thing. Taking time to review and understand all dependency changes slows development down. Productivity here ends up being features the business cares about
  2. DHH, the creator of Ruby on Rails, recently gave a keynote at RailsConf. In it he talked about things we used to deal with (his example was SQL) that people no longer need to, and about how knowledge of and concern with problem spaces have shifted. Whether you agree with his view or not, this view is shared by many and it applies to how they see functionality in dependencies

Now, this isn't about how people should act but rather about how they do act. "Should" is arguable, while "do" is an observation of behavior.

Humans and Trust

gRPC broke with SemVer by making a behavior change. It was one that, in retrospect, didn't need to happen. After the change, MaxMsgSize could have called both the MaxRecvMsgSize and MaxSendMsgSize functions to keep its former behavior.
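Here is a sketch of what that backward-compatible version might have looked like. The function names come from the discussion above, but the options struct, the Option type, and the bodies are my own simplification and not gRPC's implementation.

```go
package main

import "fmt"

// options is a hypothetical settings struct.
type options struct{ maxRecv, maxSend int }

// Option mutates options, in the usual functional options style.
type Option func(*options)

// MaxRecvMsgSize sets only the receive limit.
func MaxRecvMsgSize(n int) Option { return func(o *options) { o.maxRecv = n } }

// MaxSendMsgSize sets only the send limit.
func MaxSendMsgSize(n int) Option { return func(o *options) { o.maxSend = n } }

// MaxMsgSize is kept for existing callers. It delegates to both new
// functions, so it still sets the limit in both directions.
//
// Deprecated: use MaxRecvMsgSize and MaxSendMsgSize.
func MaxMsgSize(n int) Option {
	return func(o *options) {
		MaxRecvMsgSize(n)(o)
		MaxSendMsgSize(n)(o)
	}
}

func main() {
	var o options
	MaxMsgSize(1024)(&o)
	fmt.Println(o.maxRecv, o.maxSend) // 1024 1024
}
```

With a shim like this, existing callers see no change in behavior and new callers get the finer-grained control, which is the shape of change a minor release is supposed to have.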

Since the maintainers were willing to make such a change, it raises an issue of trust. Will the maintainers make SemVer-breaking changes again? If they did it before without good reason, they may do it again.

How can we codify rules of trust into dependency management? The trust needs to be sufficient for people and, since people differ, it needs to accommodate different levels of trust. What coverage of codified trust works for most developers?

Can Issues Bubble Up?

Helm isn't just an application. Helm is designed in a manner where it can be imported as a dependency and other things can be built on it. Some applications do that today.

If Helm doesn't work with the latest version of a dependency because of an issue, how does the parent application importing it know that? What if the application importing Helm happens to import something else that asks for a newer version? How can Helm communicate up the tree so the resolver can know, programmatically, not to go newer? What if the developer of that app importing Helm doesn't know about the issue and tries to manually update? With vgo and MVS it'll pass and things can silently break.

A break in semantic versioning like this has no path for being communicated up the dependency tree to the MVS resolver. The silent behavior change that impacted Helm can't be communicated to consumers of Helm as a package.

If I can't communicate version range trust up the dependency tree, can I trust the tool that's doing dependency management? Of course, as different people have different views on trust, there will be different answers to this. Whose position on trust will be covered by the tools?
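A heavily simplified sketch of the selection for a single dependency makes the gap visible. This is not vgo's code: the module names, the version numbers, and the selectMVS function are invented for illustration. The only input is each module's minimum requirement, and the result is the highest of those minimums.

```go
package main

import "fmt"

// Version holds major, minor, and patch numbers.
type Version [3]int

// Less reports whether a is older than b.
func (a Version) Less(b Version) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// Requirement says that module From needs at least version Min of a
// shared dependency. There is no field for an upper bound.
type Requirement struct {
	From string
	Min  Version
}

// selectMVS returns the highest minimum across all requirements, which
// is the version this simplified model selects for the dependency.
func selectMVS(reqs []Requirement) Version {
	var picked Version
	for _, r := range reqs {
		if picked.Less(r.Min) {
			picked = r.Min
		}
	}
	return picked
}

func main() {
	reqs := []Requirement{
		{From: "helm", Min: Version{1, 3, 0}},     // hypothetical known-good minimum
		{From: "otherlib", Min: Version{1, 5, 0}}, // hypothetical sibling import
	}
	// The first version helm's maintainers know to be a problem. It
	// lives only in their heads; the resolver never sees it.
	helmCap := Version{1, 4, 0}

	picked := selectMVS(reqs)
	fmt.Println(picked, !picked.Less(helmCap)) // [1 5 0] true
}
```

The second value is true because the selected version is at or past the point where Helm is known to break, yet nothing in the resolver's input could have expressed that.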