How DevOps practices are helping to improve Genetec's production environment
Genetec teamed up with Microsoft in a hackfest to bring DevOps practices and a DevOps mindset to Stratocast, a cloud-based video monitoring system. During this short event, the hack team touched on many DevOps practices, including:
- Continuous integration / continuous deployment
- Integration testing, UI testing, and more
- Cross-department collaboration
Key technologies used:
- Microsoft Azure
- Microsoft Team Foundation Server
- Microsoft Visual Studio Team Services
The hack team
The team for this event consisted of the following key profiles from Genetec, joined by technical evangelists from Microsoft.
- Product owner
- Test engineer team lead
- System test team leader
- Back-end system test champion
- Lead cloud ops engineer
- Back-end developer team Lead
- Back-end developers
- David Tesar – Senior DevOps Technical Evangelist
- Julien Stroheker – DevOps Technical Evangelist
- William Buchwalter – DevOps Technical Evangelist
Genetec is a global provider of IP video surveillance, access control, and license plate recognition solutions unified in a single platform, called Security Center, which they built in Microsoft Azure. They work with partners worldwide to help provide safer, more secure environments for small to medium-sized and enterprise-class businesses in more than 80 countries.
Genetec was present at our Microsoft DevOps hackathon training event in Montreal in March 2016; during that two-day event, we introduced them to some DevOps concepts and showed them how to embrace a DevOps mindset.
Figure 1. The Genetec hackfest gets under way
Three months later, we teamed up with Genetec for a hackfest (Figure 1) with a goal of improving and implementing some DevOps practices in its current production environment.
During our hackfest, we worked mainly on the back end of Stratocast, a camera security center that is 100 percent cloud-based—no devices installed on-premises. The idea is to offer a plug and play experience with cameras, without the need for firewalls or networking, for example. The targets of this offering are small and medium-sized enterprises.
Behind the scenes, Stratocast uses the same binaries as the enterprise-wide solution, Security Center, but in a software as a service (SaaS) mode because of Azure. In terms of architecture on Azure, the application uses:
- Cloud Services with five worker roles
- Azure Queue storage
- Azure SQL databases
- Storage accounts
Figure 2. Architecture overview
We focused our technical efforts mainly on the back-end solution, which uses a Cloud Services architecture on Microsoft Azure.
This current piece of code is a monolithic application and the size of the artifact (.exe) of Stratocast is around 800 MB.
Our first objective was to create a mapping of the full value stream, from ideation to delivery in production. By doing this, we were able to identify bottlenecks and find the most pressing pain points.
The value stream mapping (VSM) exercise took place during the first day and a half. While that may seem long, it is important not to underestimate the value of this exercise because it is often the first time the team is exposed to a global view of the overall process, extending well outside their respective departments. It also leads to very interesting open discussions.
Figure 3 shows the result of our value stream mapping activity.
Figure 3. Value stream mapping
One difficulty during the VSM activity was that the process is constantly evolving and not well documented. We decided to use the most recent release as a base for the exercise.
Based on the VSM, we estimated the total lead time (from ideation to deployed in production) to be around 8 months, which was consistent with the previous releases.
Two areas were notably painful: Dev and QA, with around 3 months lead time each. The problem is a very common one: large batch size.
Features are developed on separate branches, usually a month-long process. Because multiple features are delivered in a single release, the development team usually works in a silo for several months before deeming their work ready for release, at which point they create a new release branch and the system test team starts the QA phase.
Because the changes after several months are so numerous, the test team has to check the integrity of the whole system, which takes a lot of time as well.
This creates a vicious circle: Because integration and testing are painful, the initial reaction is to do the least possible, making them exponentially more painful. A better approach is to do integration and testing as often as possible, and incrementally work on improving the pipeline. Small batch size is key.
On the technical level, the situation is made worse by the lack of automation: No continuous integration existed, and the automated test suite is unreliable, slow, and triggered manually.
Deployments also are triggered manually. A custom tool is used to correctly configure the environment. It’s complicated for the team to deploy their solution on Azure just to test it. Lastly, because no load test exists, release in production is a very long and critical process to ensure minimum risk for their customers: New releases are rolled out to an incrementally larger subset of their customers every week during the course of a month.
The goal of the VSM exercise was to identify pain points and shortcomings, but it also revealed several points that were very encouraging:
- Genetec employees are very honest regarding the current state of their processes, and have a “can do” mindset. They acted as a real team to find possible countermeasures and did not spend time blaming anyone. This is the kind of environment where a DevOps culture can really grow.
- While deployments need to be triggered manually, they are already well automated.
- Some unit tests are already in place, and the team understands the value of adding more in the future.
- Integration testing, while very long and unreliable, is automated, and the team is willing to improve their quality and performance.
- Functional tests are also already in place, using a very interesting combination of SpecFlow and Selenium. Once again, they need to be triggered manually.
- Microsoft Team Foundation Server 2015 update 2 is already in use, and other teams at Genetec have significant experience with it.
Based on this we decided to focus on bridging the existing parts together in an automated fashion with the goal of reducing the feedback cycle length and ultimately the lead time of development and quality assurance.
Solutions, steps, and delivery
When we discussed the back-end application with Genetec, we understood that every step from development to production was manual. Each build was triggered manually from the server running Team Foundation Server and each step required a manual operation carried out by different people.
However, they had built tools with some automated parts to help them, such as:
- Stratocaster, which allows them to prepare, deploy, and monitor the cloud service deployment on Azure.
- A number of automated tests.
- Pieces of scripts on different steps.
But they had nothing to orchestrate and manage all the steps in the pipeline—from the simple check-in of code to the release on Azure with the tests on top of it! That’s where we focused our effort during the hackfest. We started to explore the actual pipeline at the highest level.
Figure 4. Pipeline in QA
We found that these four steps (Figure 4) in the pipeline were executed with different tools and little action between them. Completion of one step did not trigger the next, which could result in lost time between steps. We also identified an important scrape/rate on the automated testing step; the tests had to be rerun a few times to be sure everything was well executed.
One of the biggest bottlenecks in terms of time was system testing. It took three months to execute this step.
Based on all of this, we decided to work on the continuous integration feature and see how to implement it on Stratocast.
Because we didn’t want to impact the current build process, we decided to create a new build definition that will be triggered every time a build finishes.
Figure 5. Post-build step
In this new simple build, we prepared the package to be deployed automatically in the release. The goal was to eliminate the manual operation with the custom tool Stratocaster and deploy our application automatically as often as possible on Azure.
We quickly understood the pain of having an actual artifact of 800 MB, and the reality that we could not build it every few minutes.
Here are the steps that we implemented in this short build:
- Ran some PowerShell scripts to map the artifact folder of this new build to the previous one on the file share.
- Copied the template service configuration and package files (.cscfg and .cspkg) in the artifact folder.
- Added seven unit tests to the process to test the feature, for the moment.
Figure 6. Build definition
One of the developers was quite satisfied to see the results live in Team Foundation Server. He thought being able to see the results directly in the build process report could encourage people to use and implement it. He also liked the code coverage feature that, for example, allows them to see the number of lines in the code.
Figure 7 shows an example of a unit test report during a build.
Figure 7. Unit test results
After having this new build, we also wanted to automatically execute it. One developer worked on a custom script that starts this new build when the previous one is done.
Figure 8. New build connection
Release management and continuous delivery
The next big step, which was also new to the team, was the release: the ability to deploy automatically from a build to Azure and run all the tests. Figure 9 shows the steps in the release that we implemented.
Figure 9. Steps in the release
From the previous build that we had just created, we had the .exe solution plus the deployment files for the cloud service on Azure, which we called the “artifacts.”
Here are the main steps of our build definition:
- Upload the .exe package somewhere accessible on Azure: a blob storage.
- “Tokenize” the deployment file (XML) using a Release Management Utility tasks extension.
Deploy our new cloud service using the deployment file that we customized just one step earlier.
When the cloud service is deployed on Azure, the virtual machine inside calls a bootstrap script (.cmd) that downloads and installs the .exe solution on it.
- Wait for callback from the cloud service, which indicates that the solution is installed inside the cloud service.
- Copy some mandatory files to run our tests.
- Run these tests to validate this new environment.
We also added the ability to delete this environment at the end of the deployment. Team Foundation Server was now doing the same manual steps that Stratocaster was, but automatically.
Now that we had identified the process and the solution to deploy the package on Azure, we could choose to run this sequence (Figure 10) at each deployment.
Removing the Stratocaster tool from the process gives us the ability to win some precious lead time in the stream and removes many manual tasks that could lead to human error, and now anyone can launch the process without knowing how it works behind the scenes.
Figure 10. Continuous deployment
Functional UI testing
As mentioned earlier, a comprehensive functional test suite already existed but was triggered manually, and only during the QA phase. In our pursuit of short feedback loops, it was vital to make these tests run frequently and automatically. We decided to integrate them into a new release definition.
These functional tests were written with SpecFlow (Cucumber for .NET), which is a great tool for writing tests in plain language and making them readable by anyone, thus becoming a living documentation of the system.
Here is an example of a very basic SpecFlow test (unrelated to Genetec). This test is implemented in C# and will be run by MSTest (or any other .NET test runner) like a standard unit test:
The next time we open this test in Visual Studio, SpecFlow will automatically detect it and show us the following generated output instead, which is much easier to read.
At runtime, these tests used Selenium to interact with the UI, which required a browser to run. We decided to go with the simplest solution first: Create a custom build agent with a browser installed on it.
In the future, PhantomJS could be used instead of a real browser. This would allow Selenium tests to run even on a hosted agent, and would open the door for advanced scenarios like running multiple PhantomJS instances in parallel, each executing a different subset of the functional tests. The blog post Getting Started with Selenium in a Continuous Integration Pipeline gives some hints on how to do that.
These tests needed to pass a .testsetting file to vstest.exe to run correctly. It turned out to be quite problematic because the VSTest task in Visual Studio Team Services didn’t seem to accept it.
We opened an issue on GitHub and quickly found the root of the issue as well as a workaround with the help of the Team Services product team. A fix for this bug should be included in the next sprint!
After some struggling, we got the result of the tests, shown in Figure 11.
Figure 11. Functional testing
While many tests failed, we were still pleased with the outcome. Most of the failures were due to incorrect configuration and not to an inherent problem with our approach. Everything should pass with a little tweaking.
Currently, a run will clock at around three hours, which is too long to be part of a CI pipeline, but it shouldn’t be too hard to improve with a bit of ingenuity. In the meantime, these tests can either be run on a schedule (every night, for example), or every X number of commits, which already will be a big improvement.
This hackfest provided the opportunity to involve different people from different teams and departments. A good example is when the test team shared their methodology regarding the system testing and explained why it takes three months of lead time to pass through it.
We identified some potential double work between the Back End and the Front End teams on the testing part. Both teams were using a kind of simulator for the cameras to test some scenarios, such as:
- Adding a camera to the system.
- Removing a camera.
- Streaming video from the camera.
This kind of scenario was executed by both teams at each release. To remediate this, we decided to deploy an integration environment with the latest release of each part—front end and back end—and run all of the tests automatically in this one. Each team had its own lifecycle with different artifacts at the end. Every night, Genetec triggered a deployment, grabbed the latest artifacts on both parts from each build process, deployed it in a common environment called “Integration,” and ran all the tests on it.
Figure 12. Integration environment
Thanks to this process, every morning each team and each developer can see their work from the previous day in one common environment. This is a good way to continuously integrate in production for Genetec.
Integrating DevOps practices into a several-years-old monolithic solution is never easy, but Genetec’s team really did a great job in the limited time we had. We were able to set up a skeleton for a full continuous delivery pipeline that will be built upon in the future.
Of course, implementing automation across the full pipeline and being able to continuously deliver value is a long journey, but the team had some great ideas for the future:
- Having a single Kanban board for the whole process, avoiding local optimizations.
- Splitting the monolith into several smaller, independent components that could each be released at a different rate.
- Finding the manual tests that are run every time and automating them, to reduce the lead time of the QA phase.
- Improving performance of integration and functional tests. Faster tests mean they can be run more often, with shorter feedback loops.
- Executing some tests together, optimizing each team’s time.
One week is a very short time to change processes and practices. But the most important lesson Genetec learned was about working as a team. They realized the importance and impact of cross-department communication:
- Avoiding local optimizations.
- Avoiding duplicated work (for example, dev and test engineers both doing the same set of tests).
- Having better knowledge of other departments’ processes, allowing anyone to pitch in if needed.
While having Microsoft on site was certainly a catalyst for such events, organizations that want to improve should definitely consider taking some time to organize something similar. In most companies, collaboration between departments is often kept to a minimum, each having separate objectives.
Hackfest-like events are a unique way to break these silos and plant the seed for a DevOps mindset, helping an organization to answer questions such as:
- How could we reduce our time to market?
- Could we automate some tasks to be more productive and reduce errors?
- Could we get actionable customer feedback earlier?
This kind of exercise is better known as a Kaizen event, which is part of the Kaizen movement (Continuous Improvement).