Microsoft and 24COMS teamed up to optimize and accelerate 24COMS’ overall delivery pipeline and capacity to provision new customers to their platform. They started their DevOps journey with a value stream mapping workshop to discover areas of improvement and to narrow the focus for the hackfest that followed. This report describes those events and the results.

The hack team

The team for this engagement consisted of the following key profiles from 24COMS and Microsoft.

24COMS:

  • Leo Winder – Owner/CTO
  • Michael Vlaar – Development Engineer
  • Leone Keijzer – Development Engineer
  • Ronald Koster – Development Engineer

Microsoft:

Customer profile

24COMS empowers enterprises, small office–home office (SOHO) customers, and end users to self-organize the dynamics of their personal and business communications and information sharing. They focus on creating software solutions for digital social communication, geofencing, phone as a sensor, location-based innovations, and more, both for consumer and enterprise customers. They provide new possibilities in modern communications by adding a geotracking dimension to conventional communication mechanisms, information exchange, and alerting.

Our journey started with a meeting between 24COMS and Microsoft in which we talked about how to practice more agile software development by applying DevOps practices. During this meeting, we narrowed the options to executable and doable steps that would be inline and helpful to their DevOps journey. We decided to deliver a value stream map (VSM) workshop followed by a DevOps hackfest.

Problem statement

Before starting the value stream map process, the teams agreed on what the focus of the workshop should be regarding the solution and its components. The 24COMS team wanted to focus on their recently refactored back-end service called Follow me. They had turned this formerly big, monolithic application into a microservice-based application.

They designed this solution as an Azure Service Fabric implementation and already were running their production environment on this new architecture. The focus of the conversation with 24COMS was on the operations pipeline. We discovered that most of the impact of the hackfest could be on this back-end service.

Solution, steps, and delivery

The one-day value stream map workshop took place in August 2016 followed by a three-day hackfest in September 2016.

The VSM workshop

In the VSM workshop, we captured the high-level conceptual architecture of the solution so everyone could understand what it would look like and what the characteristics of a microservice implementation are.

solution architecture


Originally, the architecture was based on Cloud Services and a big dependency on Azure Service Bus. The performance limitations of this approach were severe and we had to find an alternative to improve scalability and ease of deployment.

With the original approach, the performance limit was 250 msg/s. That meant they had to scale on a per Service Bus namespace basis, which would have caused a lot of management overhead and even would max out the subscription quota for the Service Bus namespaces because their throughput requirements are way beyond what is offered currently with this service. They target servicing up to 100,000 devices, which can potentially submit messages frequently depending on circumstances (alert situations, big events, and so on).

During earlier conversations with 24COMS, Service Fabric was mentioned because it could offer the opportunity to build a solution on top of a framework that had reliability, scalability, and upgrade flexibility built in from the start.

The result of the migration to microservices was four different services:

  • Client Service
  • Tracker Service
  • Geofence Service
  • Notification Service

The clients talking to these services are specialized tracking devices or smart phones running iOS, Android, or Windows. Transport security and message level security are based on encryption and access tokens. All services are running within a Service Fabric cluster with a couple of dependencies on external services.

After drawing up the solution, the discussion continued on what areas of improvement were the most important for 24COMS’ business.

areas of improvement


The value stream mapping workshop started with their CTO and development lead. We determined all the steps involved in their existing life-cycle management procedure. This gave us insights into the effort it took the company to get from a great idea to a successful deployment in production and how such a deployment would evolve over time as a result of measuring customer usage and feedback.

The outcome of the investigation was the six core steps they employ in their development practices. These would become the subject of investigation for optimizing and automating the overall process.

value stream mapping


Based on the VSM, we determined that the processes of development and release showed the most potential for improvement. Basically, 24COMS was facing the challenge of a startup that needs to develop at a high velocity while at the same time handle a lot of changes to existing parts of the system as well as parts that were under construction. It was clear to the team that, if the aim was to have a more predictable, manageable, and sustainable delivery process, most of the manual work needed to be automated.

It should be noted that this process by itself was very fruitful for the team. They took the opportunity to look back and get a good grip on what they wanted and needed to change.

overall view


The workshop resulted in a set of goals for the rest of the engagement:

  • Implement continuous integration and continuous deployment (CI/CD)
    • Implement automated builds of development output.
  • Implement infrastructure as code (IaC) and configuration management
    • Declare environments via Azure Resource Manager to achieve a better level of automatization and predictability. In addition, apply IaC and release management practices.
  • Implement release management (RM)
    • Moving and deploying the application to different environments during its lifecycle.
  • Innovation of development process and people skill development
    • Leverage innovation and accelerate future feature development by planning and scheduling development of new skills, practices, and tools as a standard way of working and planning.
    • Create smaller deployment batches, drive development toward continuous innovation and, as a result, let their business be more agile.

The hackfest

The three-day hackfest took place at the Microsoft office in Schiphol-Rijk in the Netherlands. We held a couple of preparation calls prior to the event so everybody could start well informed and ready. We translated the goals set during the VSM workshop into concrete activities for the hackfest.

High-level goals:

  1. Create a secure Azure Service Fabric cluster.
    • Certificate to secure the cluster.
    • Azure Key Vault holding the certificate.
    • Resource Manager template to set up the full cluster (IaC).
  2. Connect Visual Studio Team Services and Azure Service Fabric.
    • New Team Services project.
    • Push code into the repositories.
    • Create build definitions for multiple projects (CI).
    • Create release definitions for multiple environments (CD/RM).
  3. Build, ship, and test.
    • Apply software changes and test the overall delivery pipeline.
    • Check the versioning of the application and how it gets upgraded in Service Fabric.
    • Test the platform including load testing using Team Services web test and virtual user resources.

The rest of this report describes the journey and learnings we encountered executing these tasks.

Creating secure Azure Service Fabric

According to the plan, the first step was to create a Resource Manager template to provision a secure Azure Service Fabric cluster. At that point, 24COMS was provisioning their clusters for test, staging, and production manually using the Azure portal. To be more agile and in line with DevOps practices, the template would be used as the base for the cluster that would be used during the hackfest.

Our starting point was the Azure quickstart templates on GitHub, where we found an adequate template that was tuned to their specific needs. The template used was the 5 Node Secure Service Fabric Cluster with WAD enabled.

Copying and pasting Azure Resource Manager templates from GitHub can result in invalid JSON files. To prevent this, any advanced text editor could be used to make sure no hidden characters are present in the file. We used Visual Studio and Visual Studio Code at this hackfest when creating IaC.

Here is a snippet from the Resource Manager template we used to provision the Service Fabric Cluster. You can get the full file here: https://gist.github.com/valeryjacobs/ba65d7a015ab691bdc6479ececd33e4a#file-servicefabriccluster-json.

  {
      "apiVersion": "2015-06-15",
      "type": "Microsoft.Compute/virtualMachineScaleSets",
      "name": "[variables('vmNodeType0Name')]",
      "location": "[resourceGroup().location]",
      "dependsOn": [
        "[variables('vmStorageAccountName')]",
        "[variables('virtualNetworkName')]",
        "[variables('supportLogStorageAccountName')]",
        "[variables('applicationDiagnosticsStorageAccountName')]"
      ],
      "tags": {
        "resourceType": "Service Fabric",
        "clusterName": "[variables('clusterName')]",
       "displayName":  "Cluster scale set"
      },
      "properties": {
        "upgradePolicy": {
          "mode": "Automatic"
        },


Before deploying a Resource Manager template, check the core quota of the subscription. If not enough cores are available, the error message notifying this might not be instantly visible and this could lead to confusion.

To set up the security assets, we used a PowerShell module on GitHub that helped with securing and populating an Azure Key Vault to secure communication with the Azure Service Fabric management endpoints: https://github.com/ChackDan/Service-Fabric#microsoft-azure-service-fabric-helper-powershell-module

Most of the template was straightforward with one notable addition for linking the cluster to the certificate in the Azure Key Vault.

We discussed the options for the certificate to be used for securing the cluster. Since they need to hook multiple domain names for white-labeling their service to customers, a wild card certificate would have made sense, but 24COMS preferred not to use this type of certificate due to speculated security degradation involved. In the end, 24COMS opted for multidomain certificates by adding the CN names needed. A 2K key was used instead of a 4K key because some tracking device resources are too limited to handle these large keys.

Here is a snippet from the Resource Manager parameters file indicating the setup of the Key Vault-based certificate:

      ...
    {
      "certificateThumbprint": 
	      {
			"value": "99912345o458045827c723047234"
	      },
      "sourceVaultvalue":
	      {
	      		"value": "/subscriptions/<Sub ID>/resourceGroups/<Resource group name>/providers/Microsoft.KeyVault/vaults/<vault name>"
	      },
      "certificateUrlvalue": 
	      {
	      		"value": "https://<name of the vault>.vault.azure.net:443/secrets/<exact location>"
	     },
	...
    


A more detailed description of these steps is published on GitHub: https://github.com/djzeka/VSTS-AzureServiceFabric/blob/master/docs/Creating%20secure%20Azure%20Service%20Fabric%20Cluster.md

An auto-provisioned secure Service Fabric cluster

secure service fabric cluster


This concluded the first step where we had a complete template to be able to automatically generate clusters for specific environments.

Connecting Visual Studio Team Services and Azure Service Fabric

Next was the step to add a Visual Studio Team Services project that would hold all the sources, scripts, and solution assets. The Resource Manager templates used were also added to the solution as part of the Infrastructure as Code initiative. 24COMS had separate repositories for the back-end and front-end applications. We combined them within the same Team Services project to get a single unit of deployment and life-cycle management.

We took the opportunity to restructure and clean up the solution to include only the necessary code base for the services. This should speed up the build time because all projects in the solution will be built during the CI/CD steps.

To be able to use the solution for the latest cluster version, it needs to be updated to the latest SDK. The version can be looked up in the .sfproj file under the ‘ProjectVersion’ element. Visual Studio is capable of updating the project after the latest SDK is installed. Potential build errors can be fixed by updating the Service Fabric NuGet packages in the solution.

Not all 24COMS team members were using Windows as their workplace operating system and we wanted to include the whole team to get them familiar with Microsoft’s tooling. We opted to use Azure Dev Test labs to provide all team members with a preconfigured workspace that included Visual Studio 2015 and common tools used by the team. They only had to select the ‘formulas’ they needed and 15 minutes later they were all up and running. These machines could be scheduled to turn off outside of office hours to save costs and resources. Visual Studio Code was used on multiple machines as the default text/code editor.

The Resource Manager template was added to the solution as well to get an integrated repository that included IaC. This way the full history of every topology change and the option to revert incorrect changes would be available.

Creating build definitions

Back-end services:

The next task was to set up a build definition for the microservice-based back end. Luckily, a template was already available in Visual Studio Team Services, so only the configuration had to be changed. The ‘Update Service Fabric App Versions’ step updates the application and service versions for all projects that have been updated. This way an automated build and deploy will only touch whatever needs to be updated.

To be able to filter out services that don’t have any code changes, the solution projects have to be enabled for \ builds. This can be achieved by updating the project files to include an element named Deterministic within the Propertygroup element and giving it a value of True. This makes sure the build output is binary identical so it doesn’t get picked up for an automated version update.

Testing the build resulted in an error message. This turned out to be the result of a misconfigured build definition in the solution file, caused by previous changes to the solution. All projects need to output x64 binaries or else there is a mismatch with the SDK’s assemblies, which are built only for 64-bit environments.

iOS client application:

Next up was setting up the CI/CD for the iOS mobile application. An extra step had to be added for a successfully automated build to manage the dependencies of the Objective-C Cacoa project (UI framework for iOS). The CacoaPod pod install command arranges the package management and initialization step. Luckily, this build task was available in the Task Catalog in Visual Studio Team Services.

For those cases where a required build task is not present in the Task Explorer, you can check the Visual Studio Marketplace for additional extensions that can be installed into the catalog.

The build definition for the iOS application

ios application


Apple doesn’t allow builds of iOS of OSX application to happen on platforms other than OSX. For this reason, builds cannot run in hosted mode. Luckily, a handy agent is available that 24COMS used to set up a local machine to execute the builds. Visual Studio Team Services will orchestrate the process according to the build definition. The agent retrieves all assets needed to do the build and the output is sent back for further release steps. Details on how to install and configure the agent can be found here: https://github.com/Microsoft/vsts-agent/blob/master/README.md

There is a paid alternative that can be considered called MacinCloud. This service hosts genuine Mac hardware and offers build agents as a service. More information can be found here: https://www.macincloud.com/pricing/build-agent-plans/vso-build-agent-plan

The 24COMS iOS app communicating with the ‘Follow me’ back end running in Azure Service Fabric

ios app communicating


Android application:

Last up was the Android version of the 24COMS mobile application. Key points here were the addition of a Gradle step that triggers the build process and .PKG creation process. Gradle needs a build definition file in order to build.

apply plugin: 'com.android.library'

android {
    compileSdkVersion 23
    buildToolsVersion "23.0.2"
    useLibrary 'org.apache.http.legacy'

    defaultConfig {
        minSdkVersion 16
        targetSdkVersion 23
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }
}

dependencies {
    compile project(':httpHelper')
    compile project(':irisandroid')
    compile project(':geosocket')
    compile 'com.commit451:PhotoView:1.2.5'
    compile 'com.facebook.android:facebook-android-sdk:4.7.0'
    compile 'com.google.android.gms:play-services:9.4.0'
    compile 'com.android.support:cardview-v7:23.0.+'
    compile 'com.android.support:recyclerview-v7:23.0.+'
    compile 'com.android.support:design:23.3.+'
    compile 'com.google.maps.android:android-maps-utils:0.4+'
    compile 'me.dm7.barcodescanner:zbar:1.8.4'

}

apply plugin: 'com.google.gms.google-services'


After updating this file, the builds continued to fail. This was because the hosted build agent did not have the correct agent version installed to support the Android project version (1.98.1). The alternative was to use the local build agent that already was set up for the iOS build. Custom build agents (local or cloud-hosted) offer more flexibility in preinstalling software for builds. Agent versions for multiple platforms can be found here: https://github.com/Microsoft/vsts-agent

The build definition for the Android client application

android build definition

Creating release definitions

The template offered by Visual Studio Team Services for a Service Fabric release definition contains a single task that publishes an application to the cluster based on the selected publishing profile. This publishing profile was matched with the corresponding environment the build definition was set up for.

In preparation for this step, a service endpoint item was created linking the deployment to the secured cluster management endpoints using the certificate in the Azure Key Vault.

Release definition for updated services, relying on Azure Service Fabric to manage service life-cycle and versioning

release definition


When adding new Azure Service Fabric “Service” at Team Services, make sure to add HTTPS to the URL for your Azure Service Fabric cluster followed by the port to avoid errors and confusion. For example, https://qwerty.westeurope.cloudapp.azure.com:19000/.

add https


Back-end testing and discovery of external clients:

While part of the team was working on iOS, Android, and GPS trackers and building and pushing new app versions and back-end services, we also did a live test and checked whether data flow from front ends was working.

We were using the dark websocket terminal tool (Chrome extension) to connect to the back end and pick up the “signals” from front-end devices. The image below shows what was captured—all three expected devices were successfully sending GPS data to a newly provisioned test cluster.

Using a socket tool to check incoming signals

socket tool

With this we reached the milestone where a newly updated and provisioned system was working as expected.

Beta testing:

24COMS was using Apple’s TestFlight for beta testing but they realized they had to implement and manage multiple tools for the various platforms they support.

TestFlight beta/test app releases, iOS only

testflight


To include beta releasing as part of the on-ramp to the DevOps practice feature flagging, HockeyApp was introduced into the flow. The full migration to HockeyApp was out of scope for the three-day hackfest, but for 24COMS, HockeyApp looked like a promising option and an improvement that would simplify their DevOps practice even further.

HockeyApp is a solution to introduce mobile app distribution, crash reporting, usage metrics, and user feedback in an app development pipeline. More details can be found at HockeyApp.net.

HockeyApp offering multiplatform app beta testing and monitoring

hockeyapp


Load testing:

Load testing often is an important step in making sure a scalable system actually has the characteristics we like to see. Although not part of the core priorities for the hackfest (due to the time investments needed to set this up), we did discuss a recommended approach. Visual Studio Team Services offers cloud-based load testing ranging from a simple HTTP call that can be configured and run multiple times, to recorded and coded tests that can simulate large numbers of clients and devices.

During the engagement, we shared samples for writing a code-based load test that can hit the Service Fabric cluster endpoints from a large pool of virtual users. With a simple click of a button in the Team Services portal, a large-scale load test can be triggered and the results are instantly available online. Using Application Insights to monitor the environment under test makes a lot of sense, and Service Fabric has guidance on how to set it up (see the Additional resources section).

Conclusion

At the end of the hackfest, 24COMS had the core DevOps practices embedded in their company DNA. Changes in their market that lead to new insights at the level of 24COMS’ management now can be turned into new features quicker than ever, due to the following improvements:

  • Code changes will lead to automatically updated and validated test, staging, and eventually production environments.

    code changes


  • The build steps for mobile applications are no longer isolated but will be part of a consistent and highly automated process. Eventually the release for both the back end and mobile apps could be consolidated into a single process.
  • Deployment management can be done centrally using a single orchestrator.
  • Updates to back-end services can be rolled out without downtime. Service Fabric handles the deployment and message/request routing.
  • At the application level, a resilient runtime monitors services and acts on failures automatically, often by fixing them without human intervention.
  • Resources running in the cloud are used effectively. By running multiple services in a coherent cluster, each paid-for CPU can be used to capacity.
  • Azure Service Fabric allows us to push, mitigate, and deliver new code and bug fixes fast.

    Pushing a bug fix with the ease and speed of Azure Service Fabric

    pushing bug fix

The last and perhaps most important learning: Just one day after we finished our hack with 24COMS, they informed us about improvements they had made—they have applied, learned, and utilized full CI/CD/RM for their production and it looks like this:

Automated updates from Team Services to Service Fabric

automated updates

Appreciation

It is exciting to see how quickly changes can be made if we tune ourselves in to the discovery and hacking mindset. Many thanks to the 24COMS team for picking up the challenge—we appreciated your partnership and your will to drive this hackfest to a successful conclusion.

Additional resources

Resource Manager, Key Vault, and Azure Service Fabric:

Azure Service Fabric HOLs and additional material:

HockeyApp, Application Insights:

Build agents, Xcode setup: