Storing More Than Container Images In Registries

Did you know that you can store more than container images in many container registries? Container registries generally follow the OCI Distribution specification. While the specification is still unreleased as of this writing, recent changes make the type of thing (a.k.a. artifact) stored and distributed through registries more general. This work was started through a project called OCI Artifacts, and many current registries have some level of support for storing a variety of things.

In this post we will look at what’s going on in this space, since there are many angles to this topic.

WebDAV, Object Storage, and Registries

Why would anyone want to put all kinds of artifacts into container registries? This is a question worth asking. To understand it, I think it’s useful to go back a ways and look at APIs to store things.

WebDAV came to the world as RFC 2518 over two decades ago. It extended HTTP to provide a standard API to put, get, list, and perform other operations on resources. Sound familiar? It might remind you of object storage.

In 2006, Amazon launched Simple Storage Service (S3) and it launched a wave of varying object storage providers. Everyone from open source projects to public clouds came along with their own take on object storage. Most of these services had different APIs.

The diverse set of APIs was great for innovation and vendor lock-in. It wasn’t so great for end-user agency, those who wanted to work with multiple providers, or competing providers who wanted to try and pull customers away from competition.

In recent years there has been an increase in some types of standardized artifacts around cloud native (containerized) platforms and software in general. Many of these need to be distributed in a similar way to container images. In the CNCF alone there are Helm charts, Falco rules, OPA policies, OLM operators, and more.

Wouldn’t it be great if there was one API to push, pull, and distribute all of these things? Object storage doesn’t provide this because there are so many varying APIs. WebDAV? How would anyone get interest in something created so long ago?

Since so many places need to have container registries already, why not use those?

Storage Vendors

We can’t look at the “why” without looking at vendors.

The largest players pushing for artifact support in registries are storage vendors. This should not be a surprise. It lets storage vendors show off something new and shiny (this works for engineer career growth and new features for marketing) and provides a common API (it’s easier to write interoperability and sales can more easily sell a migration case).

There are some definite positives from a common API and I think it’s great that vendors would like to have one.

Is The OCI Distribution API Good For This?

The OCI Distribution Spec spells out an API that’s designed for container images. Images are often large and layered (an image’s manifest points to an ordered list of layers that stack on one another, and layers can be shared between images). Most of the other artifacts are small and not layered. That means the OCI Distribution Spec API isn’t designed for this artifact use case. It’s a bit like trying to stick a round peg in a square hole. The round peg is small enough to fit but it’s not really meant to go there.
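To make the mismatch concrete, here is a minimal sketch of what a small, single-file artifact looks like when it is squeezed into the image manifest shape. The manifest fields follow the OCI image manifest layout, but the two `vnd.example` media types and the function names are made up for illustration and are not registered anywhere.

```python
import hashlib

# Hypothetical media types for an imaginary "policy" artifact (assumptions).
CONFIG_TYPE = "application/vnd.example.policy.config.v1+json"
PAYLOAD_TYPE = "application/vnd.example.policy.layer.v1+text"


def descriptor(media_type: str, content: bytes) -> dict:
    """Build a content descriptor: media type, sha256 digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


def artifact_manifest(config: bytes, payload: bytes) -> dict:
    """Wrap one small payload in the image manifest structure.

    The payload has to pose as a single "layer" and the artifact still
    needs a "config" blob, even though neither concept really applies.
    """
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": descriptor(CONFIG_TYPE, config),
        "layers": [descriptor(PAYLOAD_TYPE, payload)],
    }


manifest = artifact_manifest(b"{}", b"package example\n")
```

A 16-byte policy file ends up with a config blob, a one-item layer list, and a manifest around both. It fits, but the structure was clearly shaped for something else.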

Storing a variety of artifacts in registries brings up user experience issues, as well. Registries often display commands for working with images and metadata around them. Different artifacts will have different commands for their differing tools. Metadata will be different as well. How will registries handle this so that users have a good experience? As of today they don’t, and the experience is generally not great.
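One way a registry UI could start to tell artifacts apart is by looking at the media type of the manifest’s config entry, which is the convention the OCI Artifacts work leans on. The sketch below is only an illustration of that idea. The lookup table and the `describe` function are hypothetical, and a real registry would need far more than a label to give users a good experience.

```python
# Config media types mapped to a label a UI could show.
# The table itself is a made-up example, not part of any specification.
KNOWN_CONFIG_TYPES = {
    "application/vnd.oci.image.config.v1+json": "container image",
    "application/vnd.cncf.helm.config.v1+json": "Helm chart",
}


def describe(manifest: dict) -> str:
    """Return a human-readable label for a manifest based on its config type."""
    media_type = manifest.get("config", {}).get("mediaType", "")
    if media_type in KNOWN_CONFIG_TYPES:
        return KNOWN_CONFIG_TYPES[media_type]
    # Anything the registry has never heard of falls through to here.
    return f"unknown artifact ({media_type or 'no config media type'})"
```

Every new artifact type needs its own entry, its own metadata, and its own set of commands to show, and none of that is something the specification helps with.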

Of course, none of the experience, from UI to search, is covered in the OCI Distribution Spec. This is an exercise left up to the registry developers.

To recap, from a technology perspective the OCI Distribution Spec is not ideal. The API wasn’t designed for the types of artifacts people want to store. But, the OCI Distribution API can be made to work, sorta, for many of the artifact use cases. Even if it’s not ideal… it can work.
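For a sense of what “made to work” means in practice, here is a rough sketch of the HTTP requests involved in pushing and pulling any artifact through the Distribution API. It only builds the list of requests and sends nothing. The function names are mine, the upload URL is a placeholder because the registry returns it in a `Location` header, and authentication, chunked uploads, and error handling are all left out.

```python
def push_plan(repo: str, reference: str, blob_digests: list[str]) -> list[tuple[str, str]]:
    """List the requests needed to push blobs and then the manifest."""
    steps = []
    for digest in blob_digests:
        # Ask the registry to open an upload session for this blob.
        steps.append(("POST", f"/v2/{repo}/blobs/uploads/"))
        # Send the bytes to the URL the registry handed back.
        # (If that URL already has a query string, append with "&" instead.)
        steps.append(("PUT", f"<upload Location>?digest={digest}"))
    # The manifest goes last so it never points at missing blobs.
    steps.append(("PUT", f"/v2/{repo}/manifests/{reference}"))
    return steps


def pull_plan(repo: str, reference: str, blob_digests: list[str]) -> list[tuple[str, str]]:
    """List the requests needed to fetch the manifest and then its blobs."""
    steps = [("GET", f"/v2/{repo}/manifests/{reference}")]
    steps += [("GET", f"/v2/{repo}/blobs/{digest}") for digest in blob_digests]
    return steps
```

Even a tiny artifact with one config blob and one payload blob takes five requests to push, which is part of why the fit feels awkward for small things.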

Distribution and Varying Registries

I’ll start by getting the big one out of the way. Docker Hub does not support artifacts.

If Docker Hub doesn’t support artifacts then who does? Distribution (formerly Docker Distribution) is the foundation for many of the current registries and it does support artifacts. Azure Container Registry (ACR), Amazon ECR, and Harbor are some examples of systems that do support artifacts.

Then there are some upstarts like bundle.bar.

Here is where reality and specifications get interesting. It appears the specification is going to be open to artifacts, yet some providers may not support them, which would seem to mean they are not in full compliance with the specification. This is one angle to the OCI and its specifications I will be keeping an eye on.

Open Source Projects

A number of open source projects are experimenting with storing their artifacts in OCI-based container registries. For example, Helm has experimental support to push and pull Helm charts to and from an OCI registry. This can be used instead of Helm repositories, to some extent.

Helm isn’t the only project taking advantage of OCI registries. There are many others either trying it out or looking into it.

When open source projects are using OCI registries they are often using libraries like oras to do the heavy lifting. Yet, this Microsoft project points to another area of uncertainty. It’s only lightly maintained and that maintenance is from non-Microsoft folks who have a light amount of control. There is still no API-stable 1.0.0 release, which is something you would like to have when you rely on a library.

Should We Do This?

All of this has left me with the question, should we do this? Should we put the round peg in the square hole?

In trying to answer this question for myself I tried to answer a couple of questions…

1. Is The Use Case There?

Many need to run their own private registry already. This is for a variety of reasons such as on-premise setups (on-premise has been re-branded edge and it’s hot again). Why run many types of things to handle their artifacts when they can just run one? This can simplify the footprint.

2. Is There A Better Way To Manage These Artifacts?

This question really has three angles. There is the pure technology angle, the practical organizational angle, and a chicken and egg problem. Start with the technology. Are there better technologies to manage the artifacts? The answer to that is yes. For example, one could use S3, MinIO (for setups outside Amazon), or another provider with the S3 API.

But, there is also the practical question. Does IT want to run two services (OCI registry and an object storage for the other artifacts)?

Some organizations are going to say YES because they already need to provide an object storage while others are going to say NO because they want less to operate.

Then there is the chicken and egg problem. How can you install object storage into your Kubernetes cluster before you have object storage to get the artifacts from to start it up? At the base level you need an OCI registry to get things running and this is where organizations may not want to run multiple services.

In Summary

The situation is as clear as muddy water. There is potential. There are downsides.