How DevOps helped Azzimov improve quality and reduce lead time
Azzimov teamed up with Microsoft to implement DevOps practices into their processes. This report describes the steps we took and the results.
The practices we implemented include:
- Unit testing
- Continuous integration with Jenkins
- Continuous deployment
- Integration testing
The core hacking team:
- Kevork Aghazarian – Project Manager, Azzimov
- Yassine Zeroual – System Administrator, Azzimov
- Prasad Perera – Back-end Developer, Azzimov
- Yanick Plamondon – Back-end Developer, Azzimov
- Julien Stroheker – Technical Evangelist, Microsoft
- William Buchwalter - Technical Evangelist, Microsoft
Azzimov is a startup based in Montreal, Canada, that delivers a virtual shopping mall solution that leverages machine learning. The main focus of this hackfest was to increase the quality and the reliability of the product and its deployments.
Azzimov is a very agile organization, which is why they have been able to bring innovative solutions to the market consistently for many years. But keeping the same pace of delivery becomes harder as the company grows. A lot of organizations do not realize that; they hire new people regularly to counter the productivity loss, creating even more technical debt in the long run.
The other, better solution is to put time into managing and reducing the technical debt early and continuously. That is the approach Azzimov chose.
The team put the emphasis on two aspects:
- Quality: Minimizing issues and downtime in production when releasing new features.
- Time to market: Increasing their ability to ship fast and stay ahead of the market.
Azzimov’s architecture is very granular, with each component being isolated. This is a very good practice that makes changing any one of the components much easier (and that was great for our hackfest). It would also greatly simplify moving to a container-based architecture, but that’s for another day.
For the purpose of this hackfest, we decided to hack on their most complicated component: Trinity Gateway. If we managed to do something meaningful there, then Azzimov’s team was confident they could implement similar practices for their other components.
Trinity Gateway is a Java-based service that acts as an orchestrator for every other service in the solution.
Solutions, steps, and delivery
The first day of the hackfest was dedicated to value stream mapping. This process allows us to quickly capture the waste and issues in the workflow, from ideation to production.
The exercise let us quickly find areas in need of improvement.
First, we saw that not enough checks were in place to reach the level of quality that Azzimov wants.
We also found that the mean lead time from the planning of a feature to its release in production is around 5 weeks. This lead time was probably somewhat optimistic, but not too far from the truth.
Ultimately, the goal of DevOps is to reduce this lead time as much as possible to allow rapid feedback and iteration, but in order to do that, we need a way to be confident in the quality of our product.
For these reasons, we decided to focus first on continuous integration and continuous deployment of new changes into a staging environment. If done properly, this would allow them to catch any issues much faster than today, and more importantly, to spot issues before they impact a customer.
These are basic building blocks of DevOps that can be built on later.
Because the Azzimov team already had some experience with Jenkins, we decided to go ahead and use it. Because we love automation and we don’t want to set up anything manually, we deployed a new Jenkins instance on Azure using an Azure Resource Manager template that can be parameterized to set up any number of nodes.
Once Jenkins was set up, we split the work between two teams:
- One would work on continuous integration.
- The other would work on automating deployment on a staging environment.
One of the main painpoints that we identified during the value stream mapping was the lack of tooling that could help Azzimov implement some DevOps practices.
Between build and release were a lot of manual steps. The build (using Maven 2) was also executed manually from a developer machine. Although that is a very common practice, if we automate many of these steps, we can significantly reduce lead time and improve quality.
We explored the concept of continuous integration and the possibility of building the code at each commit in their GitHub repository, as well as the impact of adding unit tests and running them at build time.
We decided to automate those tasks and create a pipeline project in Jenkins 2.0 that consists of the following four stages:
- Checkout sources: Classic Git clone of the repository.
- Build: Building the .jar file with Maven.
- Unit tests: Run a shell command to execute the JUnit tests in the project.
- Publish artifacts: Upload the .jar file into Azure Blob storage, ready to be deployed.
Here is the code of this pipeline in Jenkins:
The next step was to deploy this artifact (.jar) automatically in a new environment to run the integration tests.
Continuous deployment to staging
The idea was to automatically deploy the artifact previously built in an environment, ready to be tested. For this step, we introduced the concepts of infrastructure as code and configuration management on Microsoft Azure with the Azzimov team.
First, we had to define the needs for the application:
- An Ubuntu environment
- Java Development Kit (JDK) installed
- Tomcat binaries
- Download the artifact from Azure Blob storage
- Launch Tomcat on the port 8080 with the JAR file
With an Azure Resource Manager template, we automated the deployment of one virtual machine running Ubuntu on Azure. When this deployment was done, we added a custom extension written on shell to run our custom settings.
One interesting countermeasure that we had to find was the way to automatically accept the license agreement from Java:
We used the
echo debconf command to remediate to it (see the next snippet).
Here is a piece of code from the shell script that we pushed as a custom extension in the virtual machine:
So now that we have one JSON file containing our infrastructure as code part and one shell script containing our configuration part, we are ready to execute it any time we want.
Here are the related Jenkins steps:
- Check out Resource Manager template: Git clone of the repository containing the Resource Manager template and shell script.
- Upload deployment script: Upload the shell file into Azure Blob storage, ready to be executed as a custom extension.
- Create resource group: Execute an Azure CLI command to create a resource group on Azure where we will deploy our environment.
- Deploy integration environment: Execute an Azure CLI command to deploy our environment thanks to our Resource Manager template.
- Install Trinity: Add a custom extension on the virtual machine previously created to execute our shell script.
- Send SMS: Send a text message to one of the Ops people to acknowledge the deployment using a Trello API.
Here is what it looks like as code:
We then decided to push ahead with our objective of improving quality. One issue is that the existing code base does not have enough unit tests. As any developer will say, adding unit tests to a code base written by someone else is very hard. If the code base is somewhat large, this can be a huge time investment.
While it is probably always worth going back and adding unit testing to critical pieces of the application, functional tests offer a better return on investment in the short term. By testing the whole stack, we can quickly make sure everything is working as expected with minimal efforts before a release.
During the value stream mapping, we identified that the team did some manual sanity checks using Curl before going into production. We decided to create a new repository containing functional tests for all the different services, and capture the manual tests as a first step.
We decided to use Node.js for functional testing for several reasons:
- It is easy to learn, and some developers already had experience with it.
- With a library such as Mocha, we can easily export test results in an xUnit format to display them in Jenkins.
- Libraries such as PhantomJS will make UI testing easier on a build agent.
Here is an example of what the new tests look like:
We then hooked these tests into the Jenkins file so that they are executed once the environment finishes deploying.
While we can do only so much in three days, continuous integration, the addition of critical unit tests, and automated integration tests should make a difference in the stability of the application when releasing a new update. In the meantime, continuous deployment will help reduce the lead time by letting developers verify their changes more often, thus detecting issues before they reach production.
The new pipeline feature in Jenkins 2.0 has proven very flexible and easy to use. Because everything is code, maintaining and evolving continuous integration and continuous deployment pipelines is straightforward.
Here is the stage view of our completed Jenkins pipeline:
A lot of improvements can be made to this script, among which:
- Using a Jenkins variable to detect the current branch, and deploy in production (or staging) when on
- Deploy a temporary environment on Azure for every commit using the Resource Manager templates, run the functional tests on this environment, and automatically deprovision it.