Do I Need An Operator?

Operators have become a hot new pattern among cloud native organizations. There are libraries, frameworks, conference talks, and much more devoted to them.

There is good reason for this. Operators can be incredibly useful. They codify operational business logic into an application that oversees another application. What is often a runbook task for an operations person to perform when an incident or event occurs can now happen automatically.

Then there are tools like Crossplane that make it possible to use services, like MySQL, as a SaaS in a cross-cloud-compatible manner. In fact, operators have made it much easier to run a SaaS within a Kubernetes cluster in general.

There are some who tell me that everything needs an operator: that it’s a requirement for every application running in a cluster, a panacea, or a silver bullet. This isn’t the case either. I’ve seen cases where the focus and work on an operator led to an application and an overall experience that failed to meet any form of user need.

If operators are useful but should not be applied to every situation, it’s worth asking: when should we use operators?

Usefulness

A test I like to apply is to look for usefulness. There are lots of shiny things we can chase. How are they useful and to what degree?

For example, a common type of application to deploy is a stateless service. These are often in the form of a twelve-factor application. Historically, they may have been easily deployed to Heroku or Cloud Foundry, where they would run for long periods without any issue. In Kubernetes they would typically be deployed as a Deployment.

Should you create an operator that manages this application?

To make this a little more practical, you might ask:

  1. Are there runbook tasks that can be automated?
  2. Is there some task that needs to be performed based on an event? Can that be easily codified in an operator? Is there an easier method to codify this task than an operator?

For this second question I find it’s really important to ask if there is a low-fi way to implement the feature. For example, if you want regular backups of data it can be fairly easy to implement that as a CronJob. Does the low-fi solution work well enough to meet the need?
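As a rough illustration of how small the low-fi option can be, the sketch below builds a CronJob manifest as a plain Python dict and prints it as JSON. The job name, schedule, image, and backup command are all hypothetical placeholders, and `batch/v1` assumes a reasonably current Kubernetes release; this is not a recommended backup tool, only a picture of the shape of the solution.

```python
import json


def backup_cronjob(name: str, schedule: str, image: str, command: list[str]) -> dict:
    """Build a CronJob manifest as a plain dict (no Kubernetes client needed)."""
    return {
        "apiVersion": "batch/v1",  # assumes a current cluster; older ones used a beta version
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # standard cron syntax
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = backup_cronjob(
        name="db-backup",  # hypothetical name
        schedule="0 2 * * *",  # every day at 02:00
        image="example.com/backup-tool:1.0",  # hypothetical image
        command=["/bin/backup", "--target", "s3://example-bucket/db"],  # hypothetical command
    )
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -`. If a few dozen lines like these meet the need, an operator is probably more than the problem requires.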

Another way to look at this is to start with a well-defined engineering problem and then look for the simplest way to solve it. Sometimes an operator will be the right choice. Other times something else will be simpler.

Common Operator Situations

There are places where operators are currently the best choice for solving a problem. The following are a couple of the places where I have personally seen their usefulness. The list is not all-inclusive; I’m sharing it more as inspiration.

A SaaS In Your Cluster

There are times when you or your organization may want to offer something up as a SaaS. A common example is a database like MySQL or PostgreSQL. There are a few ways to handle bringing a common technology like a database to a cluster.

First, everyone who needs it can manage the database themselves. This isn’t really the case for a SaaS. Suggesting an operator here may be putting the cart before the horse. The decision that a SaaS is needed has to be made on the merits of the problem. Once a SaaS is decided on based on its own merits, then we can look at an operator. For example, if one team is going to run PostgreSQL, it may be much simpler for them to manage it using a Helm chart or a collection of Kubernetes manifests.

Second, if you have decided on a SaaS then there are options. For example, there is the Kubernetes service catalog which uses the Open Service Broker API. This is an option and it’s been designed to be similar to what works in Cloud Foundry. Cloud Foundry has successfully used this for years.

However, the Kubernetes service catalog still does not have a stable release even though it has been in development for more than three years. Since the start of 2019 the level of development has shrunk and numerous people have moved to other methods and projects. This may be unpopular, and I do not mean to hurt anyone’s feelings, but the service catalog does not appear to be the path forward for this.

Operators using custom resource definitions (CRDs) and custom resources (CRs) appear to be the path forward. A cluster-scoped operator can be installed that enables people to use CRs to request a service. That service can be something running in a public cloud, like RDS, or it can be something running in the cluster.

Complex Applications

There are some very complex applications running in Kubernetes. For example, there are people who run OpenStack within Kubernetes. Dealing with the complexity (e.g., ordering of installed services) has led to the development of new tools.

There is no one way to manage complex applications, and determining the best method is something for your organization to decide.

One way to manage complex applications is to use an operator. Essentially, this is a piece of software to manage other software. This makes it possible to use CRDs and CRs to declare the application’s details, and then the controller can handle the actual management and imperative elements within the system.
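The split between declared details and imperative management can be pictured with a toy reconcile function. This is only a sketch in plain Python, not a real controller: `AppSpec`, `AppStatus`, and `reconcile` are hypothetical names, there is no Kubernetes API involved, and the "actions" are just strings describing what a real controller would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Desired state, as a user might declare it in a custom resource."""
    replicas: int
    version: str


@dataclass
class AppStatus:
    """Observed state of the running application."""
    replicas: int
    version: str


def reconcile(spec: AppSpec, status: AppStatus) -> list[str]:
    """Move the observed state toward the desired state; return the actions taken."""
    actions = []
    # Ordering is part of the controller's knowledge: upgrade first, then scale.
    if status.version != spec.version:
        actions.append(f"upgrade {status.version} -> {spec.version}")
        status.version = spec.version
    if status.replicas != spec.replicas:
        actions.append(f"scale {status.replicas} -> {spec.replicas}")
        status.replicas = spec.replicas
    return actions


if __name__ == "__main__":
    spec = AppSpec(replicas=3, version="2.0")
    status = AppStatus(replicas=1, version="1.9")
    print(reconcile(spec, status))  # first pass performs the needed changes
    print(reconcile(spec, status))  # second pass finds nothing left to do
```

The user only ever edits the spec. The ordering and the steps live in the controller, which is where the complexity of a complex application ends up.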

Automating Runbooks

Runbooks are an essential part of the process to operate things you care about. Many organizations successfully use them. They also provide an opportunity for automation.

Runbooks are documented tasks describing what to do when an event happens. These are ideal for automation. After all, if we can describe to a person how to handle an event, we can often describe how to do that to a machine via code. What is the thing called that looks for events in Kubernetes, acts on them, and has application-specific business logic? An operator would be a fit.
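To make the idea of turning a runbook into code concrete, here is a minimal sketch in plain Python. The event types, field names, and remediation steps are all hypothetical, and the handlers only return a description of what they would do rather than touching a cluster.

```python
from typing import Callable

# Each entry mirrors one runbook page: the event that triggers it and the remediation to run.
Runbook = dict[str, Callable[[dict], str]]


def restart_pod(event: dict) -> str:
    """Hypothetical remediation for a crash-looping pod."""
    return f"restart pod {event['pod']}"


def expand_volume(event: dict) -> str:
    """Hypothetical remediation for a volume that is nearly full."""
    return f"expand volume {event['volume']}"


RUNBOOK: Runbook = {
    "CrashLoop": restart_pod,
    "DiskAlmostFull": expand_volume,
}


def handle(event: dict, runbook: Runbook = RUNBOOK) -> str:
    """Look up the runbook entry for an event and run it, or hand off to a person."""
    handler = runbook.get(event["type"])
    if handler is None:
        return f"escalate to a human: no runbook entry for {event['type']}"
    return handler(event)


if __name__ == "__main__":
    print(handle({"type": "CrashLoop", "pod": "web-1"}))
    print(handle({"type": "CertExpired"}))
```

Anything the table does not cover still goes to a person, which matches how runbooks tend to get automated in practice: one well-understood entry at a time.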

An operator is only one type of application that can handle events on another application. An operator is a controller with application specific business logic. The original post announcing operators says:

"An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user."

You may run into a situation where you are building runbook automation that does not need to extend the Kubernetes API. Maybe it just needs to leverage the API and not extend it. There are cases for non-operator applications to manage applications. There are also times to use operators for this and they fit quite well.

Should You Use Operators?

If they fit your business or technical need, then yes. They are like any pattern: there are places where they are a good fit and other places where other patterns are a better fit.

Just don’t give in to the hype that you always need them. Wait for a problem to present itself for which they are a good solution.