Migrating Docker Registries

03 Oct 2020 docker managing delegating case-study

What is the problem?

Starting in September 2020, my team (along with my peer teams) needed to migrate to a new remote Docker registry. The team in charge of the registries was migrating to save costs (more than 10x savings!) and to increase throughput by running in the same region as our other software. Great idea!

The registry team gave lots of great advice and thorough documentation covering the old and new URIs, the new authentication mechanism, the timeline, and more.

The timeline looked like this:

What is the state of how builds work?

Now when you take a look at a problem like this on the surface, you might think this is what needs doing:

But when you start to take a closer look at the problem, there is a lot more to it. In order to figure that out, you need to know more about the ecosystem.

We can validate that the migration is a success when both our on-premise software builds successfully and our cloud software deploys successfully.

What do we need to do?

So with that, you start to realize there is a decent amount of work.

How does a team actually accomplish this?

It was my job, as a manager on the team, to help the team work through how to do this. We decided to optimize for:

So to kick off this work, we

What would you do differently next time?

Establish what success looks like earlier

It took some time to understand how we would know the migration was complete. Knowing that earlier makes it easier to explain the goal to the team, and to find an owner for that verification. For us, success looked like successful builds of our on-premise software, and successful deployments of our cloud software.

Write out what work is needed

Once we knew what it would look like to be finished, we could work backwards and define the tasks along with their ordered dependencies. One thing we all agreed would have been helpful is a diagram showing the dependencies.

It was about halfway through the project before we started doing this, and once we did, it was easy to communicate the work remaining and the order to tackle the work.
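As a rough illustration of working backwards from "finished" to an ordered task list, here is a minimal Python sketch that feeds a dependency map into the standard library's topological sorter. The task names are hypothetical stand-ins, not the actual tasks from our migration, and the sketch assumes Python 3.9 or newer for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks for illustration only. Each task maps to the set of
# tasks that must be finished before it can start.
migration_tasks = {
    "copy old image tags to new registry": set(),
    "update CI credentials for new registry": set(),
    "push new builds to new registry": {
        "update CI credentials for new registry",
    },
    "pull base images from new registry": {
        "copy old image tags to new registry",
        "push new builds to new registry",
    },
    "verify builds and deployments": {
        "pull base images from new registry",
    },
}


def ordered_tasks(tasks):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    for step, task in enumerate(ordered_tasks(migration_tasks), start=1):
        print(f"{step}. {task}")
```

Writing the dependencies down as data like this also makes it cheap to regenerate the list as tasks are added or completed, and the same map could be used to draw the diagram we wished we had.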

Have a kickoff meeting with everyone involved

It was apparent that not every team had the same context on the problem, the constraints, or the work required to accomplish the goals. A kickoff would also have meant that not every team had to reinvent this process. We tried to write it down for them, but having time to really cement this and do Q&A for anything we missed would have helped a lot.

Start on manual migration earlier

This would have unblocked a lot of repos whose only dependency on the old registry was on old tags of images. This also would have eliminated a step where some repos first push to the new registry, then later pull from the new registry.
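For a sense of what a manual migration of old tags might involve, here is a small Python sketch that builds the `docker pull`, `docker tag`, and `docker push` commands for each tag. The registry hostnames, image name, and tags are made-up placeholders, and the sketch only prints the commands rather than running them, on the assumption that you would review them (and log in to both registries) first.

```python
import shlex


def migration_commands(image, tags, old_registry, new_registry):
    """Build the docker commands that copy each tag of an image
    from the old registry to the new one."""
    commands = []
    for tag in tags:
        source = f"{old_registry}/{image}:{tag}"
        target = f"{new_registry}/{image}:{tag}"
        commands.append(["docker", "pull", source])
        commands.append(["docker", "tag", source, target])
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    planned = migration_commands(
        image="my-team/my-service",
        tags=["1.0.0", "1.1.0"],
        old_registry="old-registry.example.com",
        new_registry="new-registry.example.com",
    )
    for command in planned:
        print(shlex.join(command))
```

Keeping each command as a list of arguments means the same output could later be passed to `subprocess.run` without any shell quoting concerns, if you decide to automate the copy.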