Converging to monorepo

opinion 13 Jul 2018

A revision control tool is more than its interface; it is also its ecosystem of tools and the pool of professionals who can use it. These tools play such a central role in a developer's workflow that they affect how developers think and work.

If you are used to a centralized system, like CVS or Subversion, then you work with the idea of having a single source of truth, but you face more barriers to committing as the repository history grows - the famous “commit rights.”

If you are used to a distributed system like git or hg, then you are free to commit to your copy of the repository, but the source of truth in a distributed system is a social contract 1, not a physical fact.

In practice, in a professional setting, I would favor the centralized flow - having a single source of truth mattered more than the politics linked to commit rights. It also meant that when I switched from Subversion to git, I changed the tool but kept the centralized workflow: I had a set of private repositories linked to each other. The monorepo was a reality in my work life - among the copies, there was one copy that everyone pushed to and pulled from.

When GitHub became a thing, we all got free git repositories and, if you were willing to pay, a few private repositories. The world saw an explosion of repositories - for me, every new experiment and idea gave birth to a new repository. Some of them evolved and became useful, but problems arose when they started depending on each other. Refactors meant multiple commits in different repositories, and each repository risked having a slightly modified version of the same code.

Multiple repositories for me meant the code would more likely diverge than converge. It happens because each local use interferes with an integrated piece of the code - you stretch here and there to make sure things work, and small, subtle changes creep in. When you are working on modules that have an established interdependency, the last thing you want is a divergence between them.

I was always under the impression that the monorepo success stories you hear from the big companies were limited to them. I always considered a monorepo overkill for personal projects.

A few days ago, I had a hardware crash. When restoring the data to the backup computer, I realized that I was working on several repositories as if they were all placed together. Well, they were. In the directory, they were all together, but they were all stored in separate remotes.

I was working on a refactor that affected several of them, and I found myself git-adding, git-committing and git-pushing the same changes several times. When I worked on the source code, it was an atomic change. However, I was unable to store these changes atomically in the remotes.

For the sake of science, I decided to pretend I was Google. If Google has Piper, I have Git (please, humor me). So if they’ve got a monorepo, I’ve got a monorepo too.

In the next post, I am going to share the details of how I merged all my repositories into a single one while keeping their history intact. For now, I want to share the actual consequences for my workflow.

The first and most obvious consequence is that now I was able to make all my changes as atomic commits. I would make them all and push them at once.

The most visible change came from the continuous integration tool. For each commit, I was able to run all tests and observe how a particular change could have unexpected consequences at the moment it was made, not when I integrated it into the smaller repositories.

Deployment got dramatically simpler. Before, for every sizeable change, I would have to coordinate integration and deployment independently. In some edge cases, I would risk having one service running code older than the latest because the change broke it and the fix was non-trivial. When the code sits in a single repository, you can first detect whether services have changed and, if they have, deploy them automatically. All that happens only after the complete test suite has run successfully.
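A minimal sketch of that detection step, under assumptions that are not part of my actual setup: each service lives in its own top-level directory named after it, and the list of changed paths comes from something like `git diff --name-only`. The function name, the service names and the paths are all hypothetical.

```python
from pathlib import PurePosixPath


def changed_services(changed_paths, services):
    """Return the sorted service names whose directory contains a changed path.

    Assumption: each service lives in a top-level directory named after it.
    """
    known = set(services)
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Files at the repository root belong to no service.
        if len(parts) > 1 and parts[0] in known:
            touched.add(parts[0])
    return sorted(touched)


# Hypothetical input, e.g. the output of `git diff --name-only` split by line.
paths = ["api/server.py", "docs/readme.md", "worker/jobs/mail.py", "Makefile"]
print(changed_services(paths, ["api", "worker", "web"]))  # ['api', 'worker']
```

Only the services returned here would need a deploy, and only once the full test suite is green.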

One could argue that having a monorepo is suitable for development but inadequate for sharing. It is correct but fixable. Using git-subtree, I created mirror repositories where people can open issues and pull requests. Merging these changes back is trivial when I use git-subtree.
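To make the mirror round trip concrete, here is a small sketch that builds the `git subtree push` and `git subtree pull` command lines for one mirrored directory. The helper name, the directory `libs/parser`, the remote `parser-mirror` and the branch `master` are made up for illustration; only the `git subtree` subcommands and the `--prefix` option are real.

```python
def subtree_command(action, prefix, remote, branch):
    """Build the argv for syncing one monorepo directory with its mirror.

    action: "push" sends the directory's history to the mirror,
            "pull" merges the mirror's changes back into the monorepo.
    """
    if action not in ("push", "pull"):
        raise ValueError(f"unsupported action: {action}")
    return ["git", "subtree", action, f"--prefix={prefix}", remote, branch]


# Hypothetical names: directory libs/parser mirrored to remote parser-mirror.
print(" ".join(subtree_command("push", "libs/parser", "parser-mirror", "master")))
print(" ".join(subtree_command("pull", "libs/parser", "parser-mirror", "master")))
```

The returned list can be handed to `subprocess.run(..., check=True)` from the root of the monorepo.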

The conclusion is that converging all my repositories to a monorepo made many things better, with very little loss for social contribution (it takes longer to get the changes back and forth from the mirrors).

It means that monorepos are not exclusive to large companies. If you are a prolific engineer with lots of repositories that depend on each other, give a monorepo a try. It should cost very little of your time, and if it does not work for you and you use git 2, you can split the code back into many repositories and keep the history intact.

1 The Linux kernel is perhaps the most telling example: everyone knows that the reference is Linus Torvalds’ repository.

2 I don’t know if it is possible with Mercurial, but I would guess it is.