Transnet Research Locomotive Optimization Engagement – DevOps and IoT

Solution overview

The Transnet team is composed of a diverse group of researchers and engineers faced with solving some of the organization's most complex problems. The team's main challenges are optimizing the execution of multiple large-scale projects that involve many manual processes, and friction between development and operations that results in long lead times. The team is additionally tasked with finding the best way to use Internet of Things (IoT) technologies to perform real-time monitoring and fleet maintenance tasks.

This engagement was executed as an example of DevOps structure while delivering the IoT-based solution. A value stream mapping exercise detailed the existing developer operations process and identified gaps. The gaps were prioritized, and solutions were selected to address the most important ones.

Value stream mapping

The IoT portion of the project focused on real-time monitoring of train status and predictive maintenance.

Technologies used

  • Visual Studio Team Services
    • Scrum methodology
    • Continuous integration
      • Applications
      • Device firmware
      • Automatic device update
    • On-premises build support
    • Automated testing
  • NuGet Package Management Extension for Visual Studio Team Services
  • Visual Studio 2015
  • Azure IoT Hub
  • Azure Stream Analytics
  • Azure Machine Learning
  • Azure Blob storage
  • Azure Functions
  • Power BI
  • ARM Keil µVision® IDE
  • Python

The team

Company   | Name                 | Role
----------|----------------------|---------------------------------------------------------
Microsoft | David Russell        | Principal Technical Evangelist, Scrum Master, Developer
Microsoft | Brent Samodien       | Principal Technical Evangelist, Developer
Microsoft | Suliman Noor Mohamed | Audience Evangelism Manager
Transnet  | Moses Matsepane      | Project Sponsor, Product Owner
Transnet  | Tielman Meyer        | Developer
Transnet  | Chetan Valla         | Developer
Transnet  | Fanie van Zyl        | Developer
Transnet  | Jed Krishnasana      | Developer
Transnet  | Kgabo Sephesu        | Developer
Transnet  | Philip Botha         | Developer
Transnet  | Sven Holt            | Developer

The Transnet development team

Customer profile

Transnet is the largest and most crucial part of the freight logistics chain that delivers goods to each and every South African. Every day Transnet delivers thousands of tons of goods around South Africa, through its pipelines and to and from its ports. It moves cargo onto ships for export and unloads goods arriving from overseas.

Transnet operates an integrated freight transport company, formed around a core of five operating divisions that complement each other. These are supported by a number of companywide specialist functions, such as Transnet Projects, that underpin the group as a whole. Transnet has recently completed a successful four-point turnaround strategy and has now embarked on a four-point growth strategy.

Just prior to adopting its growth strategy, it embarked on a rebranding exercise. As part of the growth strategy, Transnet is investing $7.6 billion in revitalizing and extending its infrastructure. These plans include widening and deepening ports, building a new pipeline, and buying hundreds of new locomotives. In addition, the research and development team is working on building a locomotive system that facilitates easy monitoring of new and existing locomotives and reduces turnaround times for maintenance.

Transnet aims to innovate quickly and deliver quality solutions, which is why effective DevOps and effective use of IoT are of high importance. The introduction of DevOps practices will help reduce the cost of delivering products, and creating new, scalable IoT solutions will help Transnet leverage its assets more effectively.

Problem statement

The Transnet research team is a newly formed group of engineers tasked with building innovative solutions that target the various challenges confronting the organization. The team faced challenges in the operations space: the current lack of agility delayed time to market significantly. Given the nature of the research team, the ability to deliver solutions at a rapid pace with limited or changing requirements is key to the team's success.

The immediate challenge the team faced was that the process and methodology the team worked within were ill-defined and not conducive to rapid releases. The team is building a remote monitoring solution in which it wanted to include predictive maintenance to reduce downtime due to failure.

To help paint a clearer picture of the customer's mindset at the start of the project, here are a few sample quotes from the customer team:

We feel that DevOps is something that is lacking and needs improvement in our department. We are not satisfied with the way we are currently running and are looking to improve the way we develop, deploy, and support our products.

Automated testing is limited to none at this stage.

All software projects ascribe to a form of Agile during the development phase with the TE R&D software design team. However, we are still in the infancy stage and still figuring out how to do Agile effectively.

In my opinion, we are still figuring out how to do DevOps properly, so the hindrance is mainly getting the DevOps defined.

Solution, steps, and delivery

Developer operations

The hackfest began with the Microsoft team and the Transnet Research & Development team working together in an information gathering session. The session began with the Microsoft team presenting on Developer Operations and Agile, as well as the Azure IoT offerings available. The Transnet team members presented on the various components of their solution and what they work on.

We decided to work on the developer operations portion first because it would give a good grounding for the second component of the hackfest, which was the IoT project. We performed a value stream mapping (VSM) exercise in which we identified that it would take 111 days to go from a requirement surfacing to production. The team worked together to produce the value stream map as well as identify gaps in the process that could be addressed.

Value stream mapping

The largest cause of delay in the project is the number of manual processes and the volume of email communication. The value stream map helped us identify the following points where we could help reduce lead time:

  • Manual, heavy processes increase lead time dramatically, even though many of them could be automated.
  • A large number of manual emails are sent to initiate next steps, accounting for a total of 22 days of lead time.
  • The build process is difficult and manual, contributing an additional 9 days.
  • Deployment to environments is challenging due to scheduling and locomotive locations.
  • The current process is based on the Rational Unified Process (RUP), which is not conducive to the fast turnaround times required by the team. The long requirements-gathering process accounts for a large portion of the delivery cycle.

To help resolve some of these challenges, we focused on the following areas during the DevOps hackfest:

  • Methodology: Adjusting the team’s methodology to a more agile approach.
  • Infrastructure as code: Creating artifacts that allow infrastructure to be duplicated and managed more easily.
  • Automated testing: Introducing automated unit testing into the environment.
  • Continuous integration: Removing all manual build processes.
  • Continuous deployment: Removing all manual deployment processes.
  • Package management: Solving dependency issues in source code.

Introducing a new methodology (Scrum) to the team and demonstrating the various aspects of running a Scrum sprint helped the team see the value in the process. Scrum was selected because the team wanted more frequent, more flexible code drops going forward. Together we explored the process, the various responsibilities, and the tooling.

We introduced the morning stand-up along with the Visual Studio Team Services Scrum board and backlog management, which were leveraged during the IoT solution build. We also demonstrated how backlog management works and how linking tasks and code makes it possible to track work against code.

To help reduce lead time, we introduced an automated build process and deployment strategies into the environment. The automated build process was designed to take their existing branching strategy in Git and update the relevant environments with the most up-to-date code.

Branch activity mapping

Build definitions were created for the development branch, QA branch, and the master branch. The projects included in this engagement were:

  • The telemetry integration platform. This platform is a C# solution that was easily added to VSTS. An additional step drops the build to an Azure Blob storage location. This drop location is monitored, and when a new version is found, the application updates itself (a device-side sketch of this check appears after this list).

  • The protocol library. This protocol library is leveraged across multiple teams and projects. Originally, it was used by including a DLL. This was changed to leverage the package management extension in Visual Studio Team Services.

  • The Device ARM Firmware. This is used to build the device firmware leveraging the Keil build tools that reside on a virtual machine hosted in Azure.
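
As referenced in the first item above, the following is a minimal sketch of how a device-side agent could check the blob drop location for a newer build. The container name, blob naming convention, and helper names are assumptions for illustration, not the team's actual implementation.

// Minimal sketch of the device-side update check (hypothetical names and layout).
// Assumes builds are dropped into a "builds" blob container and that the blob name
// embeds a version number, for example "TelemetryPlatform_1.2.3.zip".
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class UpdateChecker
{
    public static CloudBlockBlob FindNewerBuild(string storageConnectionString, Version installedVersion)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
        CloudBlobContainer container = account.CreateCloudBlobClient().GetContainerReference("builds");

        // List the dropped builds and pick the newest one ahead of the installed version.
        return container.ListBlobs(useFlatBlobListing: true)
            .OfType<CloudBlockBlob>()
            .Select(blob => new { Blob = blob, Version = ParseVersion(blob.Name) })
            .Where(x => x.Version != null && x.Version > installedVersion)
            .OrderByDescending(x => x.Version)
            .Select(x => x.Blob)
            .FirstOrDefault();
    }

    private static Version ParseVersion(string blobName)
    {
        // "TelemetryPlatform_1.2.3.zip" -> 1.2.3 (returns null if the name does not match).
        string[] parts = blobName.Split('_', '.');
        string candidate = string.Join(".", parts.Skip(1).Take(3));
        Version version;
        return Version.TryParse(candidate, out version) ? version : null;
    }
}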

In addition to the build, the team introduced automated unit testing to the projects and began implementing the unit tests as part of the build process. The following sections describe the areas of DevOps that were improved.

Automated testing

The addition of automated unit testing to all their C# applications has helped reduce manual testing time significantly as well as reduce the number of structural errors passing through into QA. The testing was broken down into two areas that are executed at different phases of the project. We used C# unit tests to remain consistent with the rest of the project.
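
As an illustration of the kind of test added, here is a minimal MSTest sketch of a C# unit test that would run as part of the development-branch build. The reading type, parser, and log-line format are hypothetical and included only to make the sketch self-contained; they are not taken from the Transnet code base.

using System;
using System.Globalization;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical reading type and parser, included only so the sketch is self-contained.
public class SensorReading
{
    public string SensorId { get; set; }
    public double Value { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class TelemetryParser
{
    // Parses the assumed "sensorId;value;timestamp" log-line format.
    public static SensorReading Parse(string line)
    {
        string[] parts = line.Split(';');
        return new SensorReading
        {
            SensorId = parts[0],
            Value = double.Parse(parts[1], CultureInfo.InvariantCulture),
            Timestamp = DateTime.Parse(parts[2], CultureInfo.InvariantCulture)
        };
    }
}

[TestClass]
public class TelemetryParserTests
{
    [TestMethod]
    public void Parse_ValidSensorLine_ReturnsExpectedValues()
    {
        // Arrange: a raw log line in the assumed format.
        string line = "TEMP01;73.5;2016-11-30T09:13:24Z";

        // Act
        SensorReading reading = TelemetryParser.Parse(line);

        // Assert: structural errors such as a wrong field order are caught before QA.
        Assert.AreEqual("TEMP01", reading.SensorId);
        Assert.AreEqual(73.5, reading.Value, 0.001);
    }
}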

Native C++ unit tests have been identified as something to implement in the future for the native components of the solution.

Automated unit tests are executed when merges take place on the development branch, along with code analysis. The code analysis helps ensure developers are following the same rules and identifies potential programming issues early in development. The unit tests help reduce regression-testing time and confirm that scenarios produce the expected results.

Manual tests are performed during quality assurance and are managed as part of testing in Visual Studio Team Services.

The team has increased code coverage by 4% and will continue to add more unit tests throughout the following sprints. Due to unit testing during our sprint, a number of errors were picked up during the development build phase and the developer was notified of the bug automatically. This helped reduce time in QA.

Continuous integration

Continuous integration helps ensure that code is built throughout the process, that build failures are identified quickly, and that existing unit tests are executed so scenarios are verified to pass.

One of the challenges of the implementation was a custom build component that needed to be added to the pipeline: an on-premises virtual machine with the Keil build tools installed, required to build the ARM components.

In order to streamline the process, a virtual machine was built in Transnet's Azure tenant. The image was built to match the on-premises solution as closely as possible. The Keil build tools were installed, and as part of the queued build, PowerShell is now used to start up the Keil virtual machine. Once the build has completed and the output has been extracted, a PowerShell script shuts the virtual machine down again.

The Keil build tools download location is: http://az717401.vo.msecnd.net/eval/MDK522.EXE

This is the script to start the virtual machine:

# Set some variables
$rgName = "resource-group-name"
$vmName = "VM-name"
$subscriptionName = "SubscriptionName"
# Prompt for the sign-in credentials (an automated build would supply a stored credential instead)
$Credential = Get-Credential
# Log in to the subscription
Add-AzureRmAccount -Credential $Credential
# Make sure that the correct subscription is selected
Select-AzureRmSubscription -SubscriptionName $subscriptionName
# Start the required VM without asking for confirmation
Start-AzureRmVM -Name $vmName -ResourceGroupName $rgName -Force
# Check the VM status to make sure it is running before proceeding with the build
Get-AzureRmVM -ResourceGroupName $rgName -Name $vmName -Status |
    Select-Object ResourceGroupName, Name, @{n="Status"; e={$_.Statuses[1].DisplayStatus}}

This is the script to stop the virtual machine after the build is complete:

# Set some variables
$rgName = "resource-group-name"
$vmName = "VM-name"
$subscriptionName = "SubscriptionName"
# Prompt for the sign-in credentials (an automated build would supply a stored credential instead)
$Credential = Get-Credential
# Log in to the subscription
Add-AzureRmAccount -Credential $Credential
# Make sure that the correct subscription is selected
Select-AzureRmSubscription -SubscriptionName $subscriptionName
# Stop the VM without asking for confirmation
Stop-AzureRmVM -Name $vmName -ResourceGroupName $rgName -Force

This approach allows the build to quickly start or stop the virtual machine to perform builds. A challenge with this approach is maintaining the Keil build tools. At this point, this is a manual process the team will be required to maintain.

Infrastructure as code

Exposing their Azure resources as a template has allowed the team to create development and test environments easily and quickly. Azure Resource Manager can create resource templates from existing infrastructure, which enables an Ops team to redeploy environments as the need arises. These templates can also be used to set up redundant infrastructure for disaster recovery purposes. Another scenario is containing costs in the dev environment by creating the infrastructure only when it is needed for testing and tearing it down once the tests are concluded.

Deployment to production is done manually by queuing a Visual Studio Team Services build definition that uses the Azure resource template from the master branch in Git.

There is a challenge in that some of the infrastructure cannot be generated from these templates yet, because the capability hasn’t been released. A case in point for the Transnet implementation was its Stream Analytics environment. At the time of the engagement, Stream Analytics configuration was not exported as part of a standard Azure Resource Manager export.

To remedy this limitation, the following script is used to create a custom .json Resource Manager template that can be used to recreate the existing Stream Analytics environment from a logged-in instance of PowerShell:

Login-AzureRmAccount

$JobName = "CreateStreamAnalyticsEnvironment"
$ResourceGroupName = "[ResourceGroupName]"

$templateFile = "[TemplateFileDirectory]\$JobName.json"
if ([string]::IsNullOrEmpty($DatasourcesFile)) {
    $DatasourcesFile = "[TemplateFileDirectory]\$JobName-datasources.json"
}

write-output "Exporting Job to file $templateFile"

$saj = Get-AzureRmStreamAnalyticsJob -ResourceGroupName $ResourceGroupName -Name $JobName
$saj.PropertiesInJson > $templateFile

# Generate a JSON structure that we can add to the datasource-parameters file.
# ExportDataSource is a helper function (not shown here) that appends a
# datasource definition to the $dsa array.
write-output "Exporting Job Datasources to file $DatasourcesFile"
$dsa = [System.Collections.ArrayList]@()
foreach( $input in $saj.Properties.Inputs ) {
    ExportDataSource $input.Name $input.Properties.DataSource $dsa
}
# Update the output datasources
foreach( $output in $saj.Properties.Outputs ) {
    ExportDataSource $output.Name $output.Properties.DataSource $dsa
}
$dsj = @{}
$dsj["Name"] = $JobName
$dsj["Datasources"] = $dsa
$json = @{}
$json.Add("Jobs", @($dsj))
# Write the result to the datasources file
$json | ConvertTo-Json -Depth 5 > $DatasourcesFile

The content below is extracted from the resulting .json template, which defines the Stream Analytics environment:

{
  "Id": "[ID]",
  "Location": "West Europe",
  "Name": "[JobName]",
  "Properties": {
    "CreatedDate": "2016-11-30T09:13:24.527Z",
    "DataLocale": "en-US",
    "Etag": "9dca13a1-038d-41d3-801f-db0bd2ee2df9",
    "EventsLateArrivalMaxDelayInSeconds": 5,
    "EventsOutOfOrderMaxDelayInSeconds": 0,
    "EventsOutOfOrderPolicy": "Adjust",
    "Functions": [],
    "Inputs": [],
    "JobId": "8e1b0b07-3d7b-430a-9021-2285577c5ffb",
    "JobState": "Created",
    "LastOutputEventTime": null,
    "Outputs": [],
   "OutputStartMode": null,
    "OutputStartTime": null,
    "ProvisioningState": "Succeeded",
    "Sku": {
      "Name": "Standard"
    },
    "Transformation": {
      "Name": "Transformation",
      "Properties": {
        "Etag": "d5681b7c-ade1-43a1-9470-9801c7508871",
        "Query": "SELECT\r\n    *\r\nINTO\r\n    [YourOutputAlias]\r\nFROM\r\n    [YourInputAlias]",
        "StreamingUnits": 1
      }
    }
  },
  "Tags": null,
  "Type": "Microsoft.StreamAnalytics/streamingjobs"
}

Finally, the following script is used to create the Stream Analytics environment from the generated .json template:

New-AzureRmStreamAnalyticsJob -File "[FileLocation]\CreateStreamAnalyticsEnvironment.json" -Force -verbose -ResourceGroupName "[ResourceGroupName]" 

Continuous deployment

Continuous deployment has allowed for faster deployment to the various environments. Prior to its introduction, deployments were done manually when developers were available to install updates in the environments and on devices. These tasks had long lead times due to developer availability. Continuous deployment has eliminated that lead time, which could be 1 to 2 days per deployment; deployment now happens automatically in minutes using Visual Studio Team Services and the continuous integration feature of its build definitions.

The solution for production builds to the devices leveraged IoT Hub to notify devices via an Azure function. This Azure function monitored the drop location for new builds, sent messages to the devices to initiate an update, and provided the devices with the file location. On notification, the device would fetch and execute the update. IoT Hub twinning was leveraged to compare versions and manage updates, and devices update their metadata once the update has successfully completed. Twinning provided a query mechanism for a broadcast as well as a mechanism for maintaining which versions are installed on which device. The following diagram details this process.
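
To make the flow more concrete, here is a minimal sketch of the service-side notification step, assuming a reported twin property named firmwareVersion and a small JSON payload carrying the package location; the names and payload shape are illustrative, not the team's actual implementation.

// Sketch of the update-notification flow, intended to run from the Azure function
// that watches the build drop location (names and payload are assumptions).
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;
using Newtonsoft.Json;

public static class UpdateNotifier
{
    public static async Task NotifyOutdatedDevicesAsync(
        string iotHubConnectionString, string targetVersion, string packageUri)
    {
        var registry = RegistryManager.CreateFromConnectionString(iotHubConnectionString);
        var service = ServiceClient.CreateFromConnectionString(iotHubConnectionString);

        // Use the device twins to find devices that report an older firmware version.
        var query = registry.CreateQuery(
            $"SELECT * FROM devices WHERE properties.reported.firmwareVersion != '{targetVersion}'");

        while (query.HasMoreResults)
        {
            foreach (var twin in await query.GetNextAsTwinAsync())
            {
                // Cloud-to-device message telling the device where to fetch the update.
                var payload = JsonConvert.SerializeObject(new { version = targetVersion, uri = packageUri });
                await service.SendAsync(twin.DeviceId, new Message(Encoding.UTF8.GetBytes(payload)));
            }
        }
    }
}

On the device side, the agent would download the package from the supplied location and, once the update completes, report its new firmwareVersion through the device twin so that it drops out of the query.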

Package management

The team found that managing packages across projects was difficult to coordinate. The introduction of Visual Studio Team Services Package Management through NuGet has helped resolve that issue because the feed becomes the single source of libraries. This component was also integrated with continuous integration and deployment: when changes are made and merged onto the production branch, a new version of the package is published to the feed.

The final structure of the DevOps hackfest looked as follows:

Internet of Things (IoT)

Part of the hackfest was to assist the team in leveraging the IoT features of Azure, specifically:

  • IoT Hub
  • Stream Analytics
  • Power BI
  • Table Storage
  • Machine Learning

The solution involves two separate devices. The first is a custom-built device based on Windows Embedded, purpose-built by the team and running an x86 processor with 2 GB of memory. This device serves as a field gateway and provides additional capabilities depending on what the locomotive has available. The device includes Wi-Fi and physical network connections, which are used to communicate with other sensors and telemetry devices on the trains. The device includes built-in GPS, accelerometer, voltage, and a number of other sensors.

Due to the sensitivity of the information being transmitted, security was important to ensure that information was not accessible to outside entities and that only approved devices could connect to the solution. IoT Hub provided a secure means of transmitting information over an SSL connection that only allows approved devices. If a device is compromised, access for that device can easily be revoked from the portal.
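
As an illustration of this per-device access control, the following minimal sketch shows how a device identity could be registered and later disabled through the IoT Hub identity registry, which is the same operation the portal performs; the connection string and device ID are placeholders.

// Sketch of per-device access control against the IoT Hub identity registry.
using System.Threading.Tasks;
using Microsoft.Azure.Devices;

public static class DeviceAccess
{
    public static async Task RegisterDeviceAsync(string iotHubConnectionString, string deviceId)
    {
        var registry = RegistryManager.CreateFromConnectionString(iotHubConnectionString);
        // Each field gateway gets its own identity and keys in the registry.
        await registry.AddDeviceAsync(new Device(deviceId));
    }

    public static async Task RevokeDeviceAsync(string iotHubConnectionString, string deviceId)
    {
        var registry = RegistryManager.CreateFromConnectionString(iotHubConnectionString);
        // Disabling the identity immediately blocks a compromised device from connecting.
        var device = await registry.GetDeviceAsync(deviceId);
        device.Status = DeviceStatus.Disabled;
        await registry.UpdateDeviceAsync(device);
    }
}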

This device currently writes information to a file that is accessed by other systems outside of this solution. To integrate this device with IoT Hub, a simple agent was built to process file items and submit them to IoT Hub. A sample of the code to connect to IoT Hub is shown below:

// Transmit information to IoT Hub from the device
// Read the timestamp of the last successful transmission (if any)
using (StreamReader sr = new StreamReader(LogPath + "\\DateTimeLastFileTX.txt"))
{
    LastTxDate = sr.ReadLine();
}

// Copy any log files written since the last transmission to a temporary folder
if (LastTxDate != null)
{
    txDateTime = DateTime.Parse(LastTxDate);
    txDateTime = IoTHub.CopyNewFilesForTx(LogPath, TempPath, txDateTime);
}
else
{
    txDateTime = IoTHub.CopyNewFilesForTx(LogPath, TempPath);
}

LastTxDate = string.Format("{0:yyyy_MM_dd_hh_mm_ss_tt}", txDateTime);
FileInfo[] info = IoTHub.ReturnfileInfo(TempPath);
if (info.Length > 0)
{
    // Zip the new files, send them via IoT Hub, and record the transmission
    // time only if the upload succeeded
    string filename = IoTHub.ZipFiles(TempPath, LastTxDate);
    bool result = IoTHub.SendFileToBlob(filename);
    if (result == true)
    {
        System.IO.File.WriteAllText(LogPath + "\\DateTimeLastFileTX.txt", txDateTime.ToString());
    }
}

The packets that are sent to IoT Hub are roughly 4 KB in size and contain delta information for the various sensors. They are typically sent at a rate of 1 packet every 200 ms to 1 every second, depending on the type of information. The typical packet includes a sensor ID, value, and timestamp.
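
For illustration, a single reading of that shape could be serialized and sent to IoT Hub as follows; the field names and the use of JSON are assumptions rather than the team's actual wire format.

// Sketch of sending one delta sensor reading to IoT Hub (assumed field names and JSON format).
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;
using Newtonsoft.Json;

public class DeltaReading
{
    public string SensorId { get; set; }
    public double Value { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class TelemetrySender
{
    public static async Task SendReadingAsync(string deviceConnectionString, DeltaReading reading)
    {
        // Connect with the device's own credentials; IoT Hub rejects unknown devices.
        DeviceClient client = DeviceClient.CreateFromConnectionString(deviceConnectionString, TransportType.Mqtt);

        string json = JsonConvert.SerializeObject(reading);
        await client.SendEventAsync(new Message(Encoding.UTF8.GetBytes(json)));
    }
}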

Once the information has been ingested by IoT Hub, Stream Analytics processes the stream and makes locomotive information available for Power BI dashboards.

The Power BI dashboard shows the current location of the trains as well as some general statistics about each train. This dashboard is designed to be a general overview of what is happening in the fleet.

The second device scans locomotive wheel profiles, which are used for train maintenance purposes.

The device is shown below:

The wheel profiling device sends a data set containing the 2D profile of the wheel, the time the profile was taken, and the associated train. This data is sent every time a train passes through the scanners for each set of wheels.
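
Such a data set could be modeled roughly as follows; the type and property names are illustrative only and not taken from the Transnet code base.

// Illustrative model of the wheel profile data set described above (assumed names).
using System;
using System.Collections.Generic;

public class WheelProfileReading
{
    // Identifier of the train the scanned wheel set belongs to.
    public string TrainId { get; set; }

    // Time at which the profile was captured by the trackside scanner.
    public DateTime CapturedAt { get; set; }

    // 2D profile of the wheel as a series of (x, y) measurement points.
    public List<ProfilePoint> Profile { get; set; } = new List<ProfilePoint>();
}

public class ProfilePoint
{
    public double X { get; set; }
    public double Y { get; set; }
}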

The final architecture is depicted below:

Conclusion

Microsoft and Transnet worked closely together to solve some of the team's major pain points in their developer and operations interactions and processes, as well as in their IoT solution. This intensive engagement started by establishing a good base of developer operations so that the IoT solution could be used to demonstrate the value of the developer operations practices that were implemented. The introduction of continuous integration and deployment, in conjunction with automated testing, helped reduce turnaround time to production significantly.

At the end of the engagement, the new value stream map looked as follows:

<img src="/techcasestudies/images/Transnet/completedvsm.png" width="1024"/>

The team has moved from a 111-day delivery cycle to a 35-day turnaround. This is a savings of 76 days, due largely to a reduction in time spent waiting for others to perform tasks that could be automated.

The Microsoft engagement finally helped me go through an Agile experience, which I had always heard about. The program added much value to our software department. - Kgabo Sephesu

Going forward, the team will focus on adding more unit tests to their applications and on building a solid foundation in the new structure they are using. This effort includes introducing native unit tests for the C++ projects.

An area the team is aiming to implement is A/B testing in conjunction with usage monitoring to get better feedback on how operators are using their client application.

The IoT engagement helped the team rapidly implement a high-volume messaging system that could be monitored in real time with room for expansion in a secure and trusted manner.

For future steps, we have connected the Transnet team with the Microsoft Services division to continue the work on the IoT solution by adding predictive maintenance based on the telemetry from the trains as well as the wheel data coming in from the scanning device.

In addition to the predictive maintenance, the team will refactor the existing code base for their device to push information directly to IoT Hub instead of via a file.

The mechanisms for automatically managing third-party build processes as well as integrating device updates into the build process are some of the artifacts that can be used universally.

At the time we concluded our work with the Transnet team, we had successfully implemented an end-to-end DevOps process from scratch and laid the groundwork for their IoT solution.

I would like to thank you and your team for choosing us for the deep technical engagement. It was truly remarkable. The team is excited and looking forward to delivering more value to our stakeholders in shorter periods of time. The greater impact was also when the team learned valuable insights about Visual Studio. We have already configured sprints for one of our major projects and we’ll be ready to execute fully in the New Year.

The practical hands-on approach of the program is the most useful and valuable part. This is because you are embedded in the team’s environment and can easily understand how they do things. This is not your typical consulting or academic approach, this was well received by engineers who are normally super skeptical of buzzwords and fads.

Well done for spotting that the team is stretched thin with the multiple projects that they are all involved in. I have already restructured the project allocations of all members. The impact of this program has already yielded results and I am sure it will continue to yield some more as we get the hang of it.

Thanks to you, Dave, and your team—you guys are gifted in working well with people. I am looking forward to more engagements with you.

—Moses Matsepane

Additional resources