In this DevOps hackfest, Microsoft teamed up with Powel to hack a brand-new solution, SmartWater. We describe the process and the result here, including the following DevOps practices:

  • Infrastructure as Code
  • Continuous integration / Continuous delivery
  • Test, staging, and production environments

Additionally, the team expressed a need for getting hands-on experience with:

  • Test automation
  • Branching strategies
  • Security modeling

The hackfest took place a few weeks after a value stream mapping (VSM) session in which we identified pain points and planned an improved workflow. The result was an estimated 50% reduction in lead time as well as a much higher percent complete and accurate (%C&A).


SmartWater screenshot showing water leakage information


DevOps practices implemented

During the hackfest, we implemented the following practices:

  • Underwent a value stream mapping session to find areas of improvement.
  • Created a multi-team project with work flowing from one team to the next.
  • Used a microservice approach to deliver software updates in a “fast ring” style without downtime.
  • Defined acceptance criteria according to the ISO standards for software quality.
  • Used the SDL and the Microsoft Threat Modeling tool to assess security risks early.
  • Architected the code using Domain-Driven Design (DDD) principles.
  • Used a long-lived feature branching strategy, with Git as source control.
  • Adhered to the SOLID principles of object-oriented design.
  • Developed code using a Test-Driven Development (TDD) approach.
  • Set up an automated build and test pipeline for each component of the solution (CI).
  • Set up release management (CD+RM).
  • Used A/B testing based on an opt-in policy.
  • Planned to monitor running applications using Application Insights.
  • Used Infrastructure as Code.
  • Used Slack as a collaboration tool between departments.

We worked long and hard during each day of the hackfest. Everyone was high-spirited and excited about the work, and when Friday afternoon came, we had a working proof of concept (POC) tying in all of the API components with the Web App and output from the Machine Learning algorithm.

Core team

  • Powel: Project lead; domain expert, water; concept designer; developer / security champ; two developers; ML, IoT ingestion
  • Microsoft: Two technical evangelists

Customer profile

Powel spans Europe with a broad and sustainable customer base and a long history as a trusted supplier of software solutions for cities, municipalities, and counties, as well as the energy industry and the contracting sector. It generates approximately USD 50M in revenue and has 460 employees. The company has offices across Norway as well as in Sweden, Denmark, Switzerland, Chile, Turkey, and Poland.

Problem statement

Powel used a mix of older and newer tools and methods for source control, quality control, work tracking, communication, and deployment. Products include Git, TeamCity, and Octopus Deploy.

The current software team has a lead time of just over 61 days on average, with a %C&A of 0.8% in their existing projects.

The organization as a whole is divided along traditional lines (design, operations, development, QA), and a significant challenge was to get this team to keep true to the CI – QA – PROD mindset, and to start thinking about deployment at the beginning of the project instead of at the end. Unit testing existed, but it was unstructured and left up to each developer. The team was aware of best practices, but guidelines on how to apply them were poor or missing. Powel would also like to consider release branching and feature toggling.

The challenges (areas of improvement) that the Powel team has been facing can be summed up as follows:

  • Transforming to support rapid delivery of new features to customers.
  • Formalizing test acceptance criteria.
  • Assessing risk on each feature delivered and establishing practices to document and counter them.

Focus of the hackfest

Powel is investing in a brand new, born-in-the-cloud solution for monitoring water in municipalities (Smart Water for Smart Cities). Undetected or underreported leakage accounts for over 30% of water loss, making it the single biggest factor in lost water. Using smart meters deployed in some municipalities (and other sources), Powel will monitor water through Azure’s IoT Hub and analyze usage data to detect and predict maintenance issues. The end goals of the product are detecting water leaks and dynamically controlling water pressure throughout zones in cities.


Deeply focused team

The team hacking


Solution, steps, and delivery

Value stream mapping

The preparation for the DevOps hackfest involved assessing the current state of a similar project’s software development lifecycle, looking for areas of improvement, and focusing on the actual hack. During these phases, we narrowed down which areas to improve and what to deliver, and then proceeded to just do it!

The VSM session included the following people:

  • Powel: Project lead; domain expert, water; stand-in for Atle Vaaland on day 1; (future) CTO of Powel; head of R&D; developer / ops; operations lead; developer
  • Microsoft: Technical evangelist

Current state value stream

In our initial VSM session, the hackfest team discussed the overall process from feature definition to design, the application development cycle, the technologies used, and the routines for handing off from one stage to the next.


VSM of current development process done by the team


Because this is a completely new solution, no MVP (Minimum Viable Product) had been defined yet, so we decided to focus on the team’s existing development process and discussed ways to improve it.


VSM, current state


  • Lead time vs. processing time. The current value stream has a total completion lead time of just over 61 business days. Compared to the total processing time, this yields an efficiency rate of 30.1%, meaning that roughly 70% of the lead time is spent waiting between stages rather than on actual processing.

  • Percent complete and accurate. The “Percent complete and accurate” (or simply %C&A) measures how often a feature passes from one stage to the next without having to come back for review, fixes, or other rework. The rolled %C&A is the product of the per-stage values, so, for example, five stages at 40% each roll up to roughly 1%. The rolled value here (0.8%) indicates that few, if any, of the features being delivered today make it through the entire flow without rework at some stage. This is not an exceptional result; many companies have close to 100% of their flow requiring rework at some stage. Still, we will be aiming to deliver a much stronger number in our future map.

Future state value stream

Having spent most of the time discussing and learning about Powel’s current software development process, we proceeded to design a “perfect” future state value stream.


Whiteboard image of future state


Because SmartWater is a greenfield project, we decided to scrap the current way of development in favor of a complete redesign that addresses the central points of concern, as shown on the following map.


Future state VSM


Architecture

The SmartWater solution has a microservice architecture consisting of a web portal that calls into various underlying APIs. The web portal and each of the underlying APIs are defined as epics, with the details captured as features within each; the idea is that future products will also be able to call into these APIs.


Overall solution architecture diagram


Team workflows

In the hackfest, we introduced Visual Studio Team Services (VSTS) as our overall Application Lifecycle Management (ALM) tool, allowing each department to track changes to the product and structure its work into an effective flow between the teams. The teams defined in VSTS are:

  • Design Team - Design mocks and formulate acceptance criteria
  • Developer Team - Develop the feature, test automation
  • Ops Team - Deployment, infrastructure, and load tests
  • Analytics Team - Work with Machine Learning and advanced analytics

Each deliverable component in the solution is represented as an epic in VSTS. Epics live outside all of the teams in the SmartWater project. When the need for a feature within a component arises, that feature is added at the base level and then immediately assigned to the design team. A feature flows from one team to the next until it is put into general release.

Epic level workflow

We defined the following workflow for the components at the epic level. Because each epic represents a deliverable component, the epic-level items represent these components’ lifetimes.

Epics lifecycle flow


Features under epics are handled as follows:

  • Feature definition. Feature titles are written in a style similar to Behavior-Driven Development (BDD) (Dan North is considered by many to be the father of BDD). The idea is that a feature title should reflect the final outcome of the feature, for example, “The SettingsApi delivers the user’s setting for Fast Ring deployment.”

    This ubiquitous-language style gives everyone a clear understanding of a feature just by looking at its title. After the feature is pushed into production, a simple query in VSTS delivers a list of features that are in production. This way, everyone sees what the released product can now do.


    SettingsApi Epic has its own release, in which all features changed are shown


  • Feature creation. Features are created in a brainstorming session with marketing, designers, developers, and operations. The brainstorming session happens at the epic level. Once defined, the features that require initial design work are “transferred” to the design team for acceptance criteria definition and production of design mocks. If it is a new feature for an API, it bypasses the designers and is submitted directly to the developer team.

Design workflow

The design team carries its features through the following workflow.

Design team workflow


Each feature needs an interaction design before a graphical design can be applied to it. After these two design stages are done, the feature should be complete enough for defining its acceptance criteria to answer the question: “When is this feature regarded as complete?”

The acceptance criteria should reference the ISO/IEC 9126 and ISO/IEC 25010:2011 software quality standards, among them:

  • Functional suitability/completeness
  • Reliability
  • Usability
  • Compatibility
  • Performance efficiency
  • Maintainability
  • Security (done by the developer team)

After the design and acceptance criteria have been defined, the feature is placed in “Review” status; the idea is that senior leadership reviews the work done by the designers and gives the final “go.” This will not be a very formal process initially, but it establishes good practice.

Developer workflow

The following flow helps management understand how the features are progressing.

Developer Workflow


  • Threat modeling. Before the team starts to develop a feature, they create a threat model so that any security issues can be added to the final acceptance criteria list. They do so by following the Microsoft Security Development Lifecycle (SDL); during the hackfest, we demonstrated the Microsoft Threat Modeling Tool.

  • Planning session and user story creation. Before developers start working on a feature, they have a planning session where user stories are appended to the feature. These user stories are then broken into individual tasks that are easier to estimate. This is a built-in feature of VSTS.

Operations workflow

The operations workflow is simply Next → Active → Resolved (or Dismissed), as most operations tasks arising from feature development are short-lived. Operations also has the responsibility of monitoring a feature in production environments, but that kind of operations task does not follow a development workflow and is discussed later in this report.

Analysis workflow

As with operations, the features that the analysis team develops are not as detailed as those in design and development, and are kept to the simple form of Next → Active → Resolved (alternatively Dismissed). A feature of an API may require work by the analysis team (such as adding an API to the Machine Learning model) and thus may be assigned to this team when required.

Collaborating with Slack


We decided to use the collaboration tool Slack to prepare for the hackfest execution. With this, Powel had a project-specific discussion team that was divided into the following channels:

  • General: General discussion
  • DevOps: Discussions around code and operations
  • Dataweek: Discussions around what kinds of data to use in the solution, and how to get them
  • MachineLearning: Discussions around Azure Machine Learning topics
  • VSTS: Automated hook into VSTS for showing failed and passed builds

These channels were, and still are, actively used by the project team members and Microsoft, as they provide an easy, informal way of discussing the project in detail. It is particularly valuable that all project stakeholders can see when builds pass or fail, a seemingly simple thing that brings the stakeholders that much closer to the overall flow.

Hosting model

Currently, the architecture is delivered as a set of individually hosted Web Apps in Azure App Service, but is flexible enough to be built into Azure Service Fabric at a later stage if cost becomes an issue. Each web API can be thought of as a component in Service Fabric and deployed separately. Today, the team delivers each web API to a hosted Web App in Azure.

Domain-Driven Design structure

We organized the code in a Domain-Driven Design (DDD) architecture, focusing on the entities in the SmartWater solution and the objects that operate upon them. Thus, each component is built up as a collection of class libraries that represent the different layers of execution: presentation, business, and data, plus the domain that is common to all. This gave a clean separation of concerns and organized the projects well, allowing different developers to work in separate layers of the same feature.
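
As an illustration, a component such as the SettingsApi might be organized along these lines (the project names here are illustrative, not the exact repository layout):

    SmartWater.SettingsApi.sln
        SmartWater.SettingsApi.Domain        // entities shared by all layers
        SmartWater.SettingsApi.Business      // orchestration, validation, business rules
        SmartWater.SettingsApi.Data          // readers and writers doing the actual IO
        SmartWater.SettingsApi.Presentation  // the Web API endpoints
        SmartWater.SettingsApi.Tests         // unit tests, per layer under test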


SettingsApi organized in a DDD structure


SOLID principles of object-oriented design

To provide a satisfying degree of test automation, we adopted the SOLID principles of object-oriented design. This was mainly done to produce code that can be easily tested in isolation.

This report doesn’t go into the details of the SOLID principles, as these are regarded as the very fundamental principles for doing test-driven development. The SOLID principles that we stressed over and over again during the hackfest were:


    public class TenantManager : ITenantManager
    {
        private readonly IExceptionHandler _exceptionHandler;
        private readonly ITenantReader     _tenantReader;
        private readonly ITenantValidator  _tenantValidator;

        // All collaborators are injected (Dependency Inversion Principle);
        // TenantManager only orchestrates them.
        public TenantManager(ITenantValidator tenantValidator, ITenantReader tenantReader, IExceptionHandler exceptionHandler)
        {
            _tenantValidator  = tenantValidator;
            _tenantReader     = tenantReader;
            _exceptionHandler = exceptionHandler;
        }

        public async Task<Tenant> GetByIdAsync(string tenantId)
        {
            // Validation is delegated to the injected validator.
            if (!_tenantValidator.Validate(tenantId).Passed)
                return null;

            // Exception handling is centralized around the IO call.
            return await _exceptionHandler.GetAsync(() => _tenantReader.GetByIdAsync(tenantId));
        }
    }


Code: The Single Responsibility Principle dictates that each class must have only a single reason to change. In the previous example, the sole purpose of a TenantManager is to orchestrate the process of retrieving a Tenant object: validating the parameter and wrapping central exception handling around the call to the object that handles the IO operation. Validation, exception handling, and reading are NOT part of its responsibilities, so these are injected using the Dependency Inversion Principle (DIP); thus, the only reason to modify this code is if the process itself requires a change (for example, introducing a logging step).
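
To show how one of the injected responsibilities might look in isolation, here is a hypothetical TenantValidator; the ValidationResult type and the rule itself are assumptions for this sketch, inferred from the Validate(tenantId).Passed call above:

    public class TenantValidator : ITenantValidator
    {
        // Validation is this class's single responsibility; the only reason
        // it changes is if the validation rules themselves change.
        public ValidationResult Validate(string tenantId)
        {
            return new ValidationResult
            {
                Passed = !string.IsNullOrWhiteSpace(tenantId)
            };
        }
    }

    public class ValidationResult
    {
        public bool Passed { get; set; }
    }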

Source control

We selected a “long-lived feature branch” source control strategy. Each product in the architecture is developed and maintained in a separate source control branch.

Each branch triggers its own build/release automation cycle into a CI runtime environment where the product is immediately available for testing. After a feature is deemed okay for general release, it is merged onto the master branch, but the branch itself lives on for further development.


Conceptual overview of the branches in the solution


Note: This approach has issues. Merging common code into each of the API branches effectively merges ALL the code from the master branch. We’re still searching for the optimal solution here, and have found one suggestion so far:

Move all common code into a versioned NuGet Library, and have each of the current branches become their own repositories.

This is a trivial change to the existing application structure, and is probably already implemented by the time this report is released.


Writing better commit messages

We adopted a strategy for writing commit messages that aims to complete the following sentence with each commit:

If this commit is applied, it will… <commit message describing the intent of the commit>

This made it easy to see the value-increase of each commit:

8b8d955 Add a leakage zone polygon mapper
be9d70f Merge branch 'master'
f39ab34 Finish test coverage for ExceptionHandler
910e3c3 Fix errors in spatial polygon and dates in documentDb queries
c4b445b Limit the query to docDb to 100 items for performance
e49a077 Update the API to wrap the results in DataResult
a1c1877 Rename IsValid() to Validate()
caba32d Add tenantVerifier and dateVerifier


Why: The team was used to writing commit messages such as Added XX or Adjusted YY, which was already apparent from the code changes and brought no value to a person doing QA. With this approach, it is now plain to everyone what value was added, and they can read (bottom up) how the product grows.

Test automation

We built the applications using a TDD approach, writing unit tests for each bit of production code. This was done to solidify the product and give an early alert on breaking changes.

The unit tests are written in the style of Roy Osherove as suggested in his book The Art of Unit Testing.

A base class for unit testing automatically mocks the dependencies of the class under test, so much of the hassle of writing unit tests goes away. The base class uses the NuGet package StructureMap.AutoMocking.Moq to achieve this:


public class ExceptionHandlerTests : TestsFor<ExceptionHandler>
{
    [Fact]
    public void Get_FunctionIsNull_DoesNotInvokeLogger()
    {
        // Arrange            
        Func<int> nullFunction = null;

        // Act           
        Instance.Get(nullFunction);

        // Assert
        GetMockFor<ILogger>().Verify(o => o.LogExceptionAsync(It.IsAny<Exception>(), It.IsAny<string>()), Times.Never());
    }
}
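
Under the hood, such a base class might look like the following sketch (assuming the MoqAutoMocker<T> type from the StructureMap.AutoMocking.Moq package; the project's actual base class may differ):

    using Moq;
    using StructureMap.AutoMocking;

    public abstract class TestsFor<T> where T : class
    {
        private readonly MoqAutoMocker<T> _autoMocker = new MoqAutoMocker<T>();

        // The class under test, created with all constructor dependencies auto-mocked.
        protected T Instance => _autoMocker.ClassUnderTest;

        // Fetches the Moq mock for a dependency so a test can set it up or verify calls.
        protected Mock<TDependency> GetMockFor<TDependency>() where TDependency : class
            => Mock.Get(_autoMocker.Get<TDependency>());
    }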


Test automation ensures the logic of the application is always verified during builds. Another nice NuGet package, named Should, gave us a natural-language form of writing assertions that everyone instantly adopted; using Should makes the assertions more readable:


    [Fact]
    public async Task GetAsync_ValidFunction_ReturnsExpectedResult()
    {
        // Arrange
        Func<Task<int>> simpleFunction = () => Task.FromResult<int>(1313);

        // Act           
        var results = await Instance.GetAsync(simpleFunction);

        // Assert
        results.ShouldEqual(1313);
    }


Why test-driven development matters

Developing code using TDD gave us the benefit of not needing to start up a full-running system to try out a local change. A short unit test will immediately confirm a theory and uncover breaking changes in other areas of the code.


Test-driven approach gives early warnings of breaking changes and helps see which libraries are covered by tests.


Even though code coverage doesn’t by itself uncover quality issues, it was used in the SmartWater project as an indicator of potentially problematic code. For example, libraries dealing with IO have a very limited set of unit tests.

Note: Load testing will be implemented after the team has sorted out an Active Directory tenant to use in the CI environment.


Continuous integration

Having all the unit tests run during a build stage in VSTS means that we can stop the code from going into production if a unit test fails. Unit tests are run continuously on each commit.


Build report from VSTS


Fast ring/production deployment

We decided to give the tenants the option to receive updates to the program in a similar fashion to that of Windows 10. By providing the customer with a “Fast” update option, they will, through their own configuration, be able to work on a newer, “unreleased” version of the product. This gives us the ability to gather feedback and metrics on that feature before deciding to release generally.

This was achieved by a simple configuration approach:

  1. Each API is versioned. For example, the API for retrieving infrastructure data has the following address: https://<baseUrl>/v1/LeakageZones/{tenantId}

  2. A new version of the same API containing a change would then appear as: https://<baseUrl>/v2/LeakageZones/{tenantId}

  3. The API clients are configured with values for BaseUrl, ApiVersion, and FastVersion. Based on the preference of the tenant, they read out the FastVersion to use the newest feature, or the ApiVersion if the tenant hasn’t opted in for early releases (see the sketch after this list).

  4. After a feature is ready for general release, ApiVersion and FastVersion both point to the same value.
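
To make the configuration concrete, here is a minimal sketch of the client-side selection; the ApiClientSettings type and the opt-in flag are illustrative, not the actual SmartWater code:

    using System;

    public class ApiClientSettings
    {
        public string BaseUrl     { get; set; }  // for example, "https://<baseUrl>/"
        public string ApiVersion  { get; set; }  // general release, for example "v1"
        public string FastVersion { get; set; }  // fast ring, for example "v2"
    }

    public class LeakageZonesClient
    {
        private readonly ApiClientSettings _settings;

        public LeakageZonesClient(ApiClientSettings settings)
        {
            _settings = settings;
        }

        // Tenants that opted in to the fast ring get the FastVersion;
        // everyone else stays on the general-release ApiVersion.
        public Uri BuildRequestUri(string tenantId, bool fastRingOptIn)
        {
            var version = fastRingOptIn ? _settings.FastVersion : _settings.ApiVersion;
            return new Uri($"{_settings.BaseUrl}{version}/LeakageZones/{tenantId}");
        }
    }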

Conclusion

After the hackfest, Powel is now able to plan, design, develop, and release a new feature within days. Minor changes and fixes can be put into production in minutes!

We estimate an average feature to have a lead time of 34 days or fewer, a significant improvement: twice the current team speed. We expect the lead time to drop even further as everyone starts getting into “the flow.”

With VSTS, each project stakeholder can now follow a feature as it flows through each of the teams and into production, gaining insight into their development process and adjusting as they go. With the VSTS build and release system hooked into Slack, everyone will see a notification when something new is released.

Adding acceptance criteria and considering security before a feature is developed add up to a workflow with a much higher %C&A, because developers can see the requirements and meet them up front.

For all of this to happen, the developers adopted modern styles of software development, forming the basis of a healthy codebase that is easy to maintain and easy to understand for new developers joining the team. The SOLID form, combined with DDD and TDD, sets the stage for a SmartWater solution that can grow in complexity but never drop in its simplicity.

Opportunities going forward

As in most hackfests, we did not have the time to do everything that we set out to do, so this section lists the items that the rest of the team committed to following up on.

Ensuring all features have their acceptance criteria defined

During the hackfest, we did not have time to detail the acceptance criteria, but this is marked for follow-up soon after.

Implementing the release strategy

Even though we landed a strategy for a fast ring/slow ring release, we only managed to complete the configuration part, and still need to design a page for the user to actually set this value. Some work remains on this, but nothing too complicated.

Monitoring in production with Application Insights

This point was discussed, but we didn’t have time to address it during the hackfest. We are planning a Skype session to go through Application Insights and deliver on the promise of getting vital data from the system in production. The general idea is to use Application Insights to track performance and custom events, and to detect unforeseen problems.
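
As a hint of what this might look like, here is a minimal sketch of tracking a custom event with the Application Insights SDK; the class, event name, and property are hypothetical:

    using System.Collections.Generic;
    using Microsoft.ApplicationInsights;

    public class LeakAlertTelemetry
    {
        private readonly TelemetryClient _telemetry = new TelemetryClient();

        // Custom events appear in the Application Insights portal, where they
        // can be charted, queried, and alerted on.
        public void TrackLeakDetected(string zoneId)
        {
            _telemetry.TrackEvent("LeakDetected",
                new Dictionary<string, string> { { "zoneId", zoneId } });
        }
    }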

Planning and executing tests

Another follow-up is the use of test management capabilities in VSTS to plan and execute tests. The acceptance criteria made by the design team will be created as test plans in VSTS that are executed after the feature has been delivered. These test plans attach easily to the feature description in VSTS. The test plans cover both functional tests as well as non-functional tests, such as load testing and usability testing.

Source code

Full vertical example of the SmartWater.SettingsApi: contains many of the principles discussed in this article. Access keys and other secrets have been masked out; other than that, the code is complete.