Development operations at Shopify has a long history that winds its way through various systems, technologies, and iterations. Over the course of these iterations we have learned a number of lessons, experienced ups and downs, and ultimately ended up with a system that performs well.
In this post, we’ll discuss the systems we have had throughout the history of Shopify. In a follow-up post we'll talk about the system we run today, and a third post will raise a few forward-looking questions about developer operations at Shopify.
Throughout Shopify’s history, we have used four different systems to create and maintain developer environments. Each one had its own personality and was optimized for different needs. We’ll discuss each system, explain some of its pros and cons, and describe why we decided to move away from it.
In the earlier days of Shopify we had a few dozen developers and no automated environment setup. The process was quite ad hoc, with the entire system described in a hard-coded list. When a developer started, they would be handed a list of dependencies to install (e.g. install MySQL, install Homebrew, install Ruby version X however you want, etc.). There were some bin/setup scripts to automate parts of this, but these bash scripts often broke. Moreover, as dependencies changed, new sub-dependencies would be added without anyone realizing it, so the developer would have to decipher cryptic error messages to determine what they needed to install to continue.
This approach was quite error-prone and meant that developers ended up with divergent systems if they chose to install things differently. On the other hand, developers often learned quite a lot about the system, how it worked, and the macOS operating system (our OS of choice to this day).
As the company grew, this process became cumbersome and we often spent a long time setting laptops up. Eventually the cost became too high, so we decided to consolidate the process using Boxen.
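To illustrate the kind of check a setup script could run before doing anything else, here is a minimal sketch in Python. It is not what Shopify's bin/setup scripts did (those were bash); the tool names in REQUIRED_TOOLS and the function name missing_tools are hypothetical, chosen only for illustration. The idea is to name every missing dependency up front instead of failing later with a cryptic message.

```python
import shutil

# Hypothetical checklist: these names are illustrative assumptions,
# not Shopify's actual dependency list.
REQUIRED_TOOLS = ["brew", "mysql", "ruby"]


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        # Report everything that is missing in one readable message.
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All listed dependencies were found on PATH.")
```

A check like this only confirms that an executable exists; it says nothing about versions or sub-dependencies, which is where the manual checklist tended to drift.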
Boxen (https://github.com/boxen/our-boxen/) is a tool implemented on top of Puppet (https://puppet.com/), an open source system configuration tool. The intention of Boxen is to wrap some of the more complicated parts of Puppet and allow repositories to include customization scripts.
While using Boxen, we saw the first benefits of consolidating this configuration: developers with similar setups could help each other much more easily, and a change merged into the shared setup would apply everywhere. This made developer systems easier to maintain and deploy, and we were able to move faster.
Unfortunately, macOS is a complex system and running Puppet only added to that complexity. As Boxen ran on more and more machines, we saw problems with things that users had changed and with things that conflicted with the Boxen/Puppet configuration. To make matters worse, Puppet was not a well-known or well-understood tool and we did not expect developers to learn it. This meant that when something went wrong, people could not debug the system themselves. This is what caused us to seek an alternative.
Looking at the issues we experienced with Boxen, it was clear to us that we had to allow developers to modify their systems (they’ll do this regardless), provide a consolidated experience, and use a system that people understood. We also wanted a configuration closer to production. This led us to Vagrant.
Vagrant (https://www.vagrantup.com/) was our answer to Boxen. Vagrant is a tool that allows projects to be run within the confines of a virtual machine (VM). With an included configuration file, a developer could simply run a ‘vagrant up’ command, which would initialize a production-similar system with all dependencies and code. With this setup, all code and dependencies ran inside a VM, which allowed developers to modify their host system in any way without impacting the VM itself. This solution hit all of our requirements: freedom for developers to modify their machines, a consolidated experience, production similarity, and a system that people understood (more on this in a moment).
Our Vagrant environments were generated from our production Chef (https://www.chef.io/chef/) configurations. Our entire operations team understood how this system worked, so we were able to modify and change it readily. This also meant that we had parity between production and development. Unfortunately, changes made for production often did not take our Vagrant environments into account, and we consistently broke developer laptops. Moreover, as the development team grew, the gap widened between the number of developers and the number of operations engineers who knew Chef.
We also had other issues with Vagrant. With the code hosted in the VM but the text editors and IDEs on the host machine, we had to mount an NFS share from the VM on the local machine. This caused syncing issues, and file changes were often not picked up, which resulted in a lot of lost productivity and time spent debugging errors from the system.
We ended up deciding that Vagrant was not the right solution because of the NFS issues and the constant breakage. We went looking for an alternative, and out of that effort dev and Railgun were created.