Xenodata lab is a financial services technology startup whose XenoFlash service analyzes financial reports using natural language processing and then automatically generates an infographic that helps people easily understand each report.

At a hackfest with xenodata, a Microsoft team helped xenodata move its Kubernetes environment to Azure and build a continuous integration (CI)/continuous delivery (CD) pipeline that enables automated deployment using Visual Studio Team Services. Xenodata had been running Ruby with MongoDB and Redis on Kubernetes on Google Cloud Platform (GCP).

Solution overview

To migrate its service, Microsoft helped xenodata configure Azure Container Service (Kubernetes) with Azure Container Registry and Redis, and we used Azure DocumentDB with API for MongoDB as a managed MongoDB service on Azure. We also developed the CI/CD pipeline using Team Services. Xenodata had a problem with slow CI builds (30 minutes), and they needed to replace many parameters in their Kubernetes YAML files. They had been using Ansible for this, which is overly complex for simple parameter replacement.

Key technologies used:

  • Azure Container Service with Kubernetes as the orchestrator
  • Azure Container Registry
  • Visual Studio Team Services with Linux hosted build
  • Azure Functions for migrating storage from GCP to Azure
  • DocumentDB with API for MongoDB
  • Microsoft Operations Management Suite for Docker container monitoring

Hackfest members

Core team:

  • Daisuke Miyashiro – CTO, xenodata lab
  • Sumio Shindo – Infrastructure Engineer, Mazrica
  • Tsuyoshi Ushio – Technical Evangelist, DevOps, Microsoft
  • David Tesar – Senior Technical Evangelist, DevOps, Microsoft

Customer profile

Xenodata lab is a financial services technology startup based in Tokyo that provides services that support the decision-making process using natural language processing. Xenodata recently took first place in the MUFG fintech accelerator competition.

Problem statement

Xenodata was running its services using Kubernetes on Google Cloud Platform (Google Container Engine), but wanted to move to Azure as part of the Microsoft BizSpark Plus program.

They had several problems with the current environment:

  • Long continuous integration time—30 minutes for the Docker Hub build system.
  • Manual deployment of Kubernetes (continuous delivery).
  • Ansible used only to replace variables in Kubernetes YAML files, which is heavyweight for such a small task (release management).

Also, we faced some issues with the Azure migration:

  • Storage migration from GCP to Azure: there was no tool for it, and a third-party tool couldn’t handle that much data.
  • Layer 7 load balancing for the Ingress controller: how could we enable it on Azure?
  • They wanted a managed MongoDB-compatible service on Azure.

Here is how we sorted out these problems.

Solution, steps, and delivery

We solved these problems by using the following steps.

1. Value stream mapping

Because xenodata’s lead time from repository check-in to deployment was less than 2 hours, we didn’t conduct a formal value stream mapping (VSM) session. However, we did a lightweight VSM exercise to help understand their problem.

Value Stream Mapping 1

Their lead time was already fast enough. The pain point was the slow packaging time for Docker builds: when someone pushed code, CI started the build/test/packaging process, and it took 30 minutes to see the result of the integration. Generally speaking, that feedback should arrive within 10 minutes.

Also, when they deployed a new version of the software, they needed to edit the Ansible file and deploy manually. Ansible is a great configuration management tool for IT automation, and it is excellent for configuring development/production environments and deployment pipelines. However, the xenodata team only needed to replace a few keywords inside YAML files, and Ansible is too much for that purpose.
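
To put the mismatch in perspective, the whole job amounts to a one-line substitution. Here is a rough sketch (the file and token names are illustrative; the actual pipeline uses the Replace Tokens extension described later):

# Replace a version placeholder in a Kubernetes manifest before deploying it.
sed "s/#{BUILD_BUILDID}#/${BUILD_BUILDID}/g" deployment.template.yml > deployment.yml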

We tried this pipeline using Visual Studio Team Services.

We recommend having release management for automated deployment. Let’s get hacking.

2. CI/CD pipeline using Team Services Linux hosted agent

We had two strategies: running our own build server with a Linux agent, or using the Linux hosted agent (Preview). We chose the hosted agent, deployed Azure Container Registry, and then created a pipeline in the Team Services build.

Build pipeline

Using the Docker Integration extension, we implemented the build and push steps. The image name needs to include the Azure Container Registry host name and namespace.

For guidance, see Push your first image to a private Docker container registry using the Docker CLI.

We specified the image name as follows, using the repository name for the namespace and the build ID for the version tag:

xenoacr-xxx.azurecr.io/$(Build.Repository.Name):$(Build.BuildId)
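
Conceptually, the Docker build and push tasks do the equivalent of the following (a sketch; REPO_NAME and BUILD_ID stand in for the $(Build.Repository.Name) and $(Build.BuildId) pipeline variables, and the registry name is a placeholder):

# Tag the image with the ACR host name, repository namespace, and build ID, then push it.
# (A docker login to the registry is assumed to have run earlier in the pipeline.)
docker build -t xenoacr-xxx.azurecr.io/${REPO_NAME}:${BUILD_ID} .
docker push xenoacr-xxx.azurecr.io/${REPO_NAME}:${BUILD_ID}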

We also added a Publish Artifact task. Because this pipeline pushes the Docker image to Azure Container Registry, we didn’t need to share binaries through a drop folder, so we simply published a README. We only wanted to connect this build result to a release pipeline; if we didn’t publish anything, we couldn’t link the build pipeline and the release pipeline.

Build Pipeline

For other repositories, we also triggered the build when a tag like this was added.

Build Pipeline

We implemented the pipeline using the Team Services Linux hosted agent, which brought the build time down to just 10 minutes, compared with the 30 minutes of the Docker Hub build.

Release pipeline

We had several problems with the Kubernetes deployment: we needed to install and configure the kubectl command on the Linux hosted agent, and we needed to replace placeholders in the YAML files.

Team Services release management can link several sources: a drop folder, Git repositories, and so on. Using this feature, we linked one Team Services private repository and the drop folder from the build definition.

Pipeline Overview

Enabling kubectl command

We stored the kubectl binary (with execute permission set via chmod +x) in the private repository, along with the Kubernetes config file. You can get the config file with the following commands. For more information, see Get started with a Kubernetes cluster in Container Service.

scp azureuser@MASTERFQDN:.kube/config .
export KUBECONFIG=`pwd`/config
kubectl get nodes

However, this approach requires storing the config file in the private repo, which is not good: you shouldn’t share that secret with everyone on your project. So after the hackfest we developed the Kubernetes extension for Team Services. You store the config on a service endpoint, and Team Services manages the connection-related parameters (secrets, passwords, connection strings, and so on) for you. You can now get this extension from the Team Services Marketplace.

Kubernetes extension

Kubernetes extension for VSTS

Note: The first version of the extension doesn’t download the kubectl command automatically, because some customers use an older version of kubectl and because downloading adds time to the build/release, which the xenodata team wanted to keep short. We are now working on a feature that downloads it automatically for customers who want that.

Pipeline Overview

Replace YAML file

Kubernetes supports rolling updates. The challenge in the build/release pipeline is managing the version of the Docker images: the build pipeline builds an image and pushes it to Azure Container Registry, and you need to specify the exact version you want to deploy to the Kubernetes (k8s) cluster. We used the build ID to solve this.

We used the Replace Tokens extension for this purpose. It replaces placeholders with environment variables, and release management can reference the build ID of the build linked through the drop folder. We wrote the YAML file this way:

#{BUILD_BUILDID}# will be replaced with the environment variable BUILD_BUILDID, which is a built-in environment variable of release management.

Now you can automatically deploy the target version for every environment.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: xeno
  labels:
    app: xeno
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: xeno
        tier: frontend
    spec:
      containers:
      - image: xenoxxx-xx.azurecr.io/xenodata-lab-developer/xeno:#{BUILD_BUILDID}#
        name: xeno
        env:
        - name: RAILS_ENV
          value: production
        - name: BUGSNAG_RELEASE_STAGE
          value: "production"
        - name: RAILS_SERVE_STATIC_FILES
          value: server
          : (Continue)
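
After the Replace Tokens task fills in the build ID, the deployment step itself is simple. Roughly, it comes down to the following (a sketch; the rendered file name is illustrative, and the deployment name comes from the manifest above):

# Apply the rendered manifest and wait for the rolling update to complete.
kubectl apply -f deployment.yml
kubectl rollout status deployment/xeno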

If you want to know more about configuring the CI/CD pipeline, I blogged about what I learned from the xenodata project: Journey to the Kubernetes from Legacy docker environment on Azure with VSTS.

3. Storage migration from GCP to Azure

One of the pain points was the storage migration. There are several tools for accessing cloud storage, such as gsutil and CloudBerry, and they are usually great. However, they didn’t work here: xenodata lab had so many data files that the tools froze when the team tried to retrieve the data. So I wrote migration code using Azure Functions. You can read about it in this blog post and see the source code:

Storage Migration Overview

As the blog post explains, we hit timeout errors when trying to fetch data from Google Cloud Storage because there were too many files to fetch, and the ordinary storage management tools didn’t work. So I developed the code as an Azure Function (serverless), using the Google Cloud Storage client library and the Windows Azure Storage library.

Once you put a message on a queue in the storage account, the function detects it and recursively fetches everything under the directory specified in the message, so you can control how much of the data to copy at a time.

This is the code:

project.json

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "Google.Cloud.Storage.V1": "1.0.0-beta05"
      }
    }
  }
}

run.csx


#r "Microsoft.WindowsAzure.Storage"
using System;
using System.IO;
using Google.Cloud.Storage.V1;
using Microsoft.WindowsAzure.Storage;
using Google.Apis.Auth.OAuth2;

public async static Task Run(string message, TraceWriter log)
{
    log.Info($"C# Queue trigger function processed: {message}");
    // Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", "D:\\home\\site\\wwwroot\\{YOUR FUNCTION NAME}\\servicecert.json"); // this function is from 1.0.0-beta06

    // Google Storage

    var credential = GoogleCredential.FromStream(System.IO.File.OpenRead("D:\\home\\site\\wwwroot\\{YOUR FUNCTION NAME}\\servicecert.json"));
    var client = StorageClient.Create(credential);
    var bucketName = "simpleatest";

    // Azure Storage Account 

    var storageAccount = CloudStorageAccount.Parse("BlobEndpoint={YOUR BLOB SERVICE SAS TOKEN is here}");
    var blobClient = storageAccount.CreateCloudBlobClient();
    var container = blobClient.GetContainerReference(bucketName);
    await container.CreateIfNotExistsAsync();

    foreach (var obj in client.ListObjects(bucketName, message))
    {
        if (IsDirectory(obj.Name))
        {
            // Blob storage has only virtual directories, so there is nothing to create for a folder entry.
            container.GetDirectoryReference(obj.Name);
        }
        else
        {
            // Stream the object from Google Cloud Storage directly into an Azure block blob.
            var blockBlob = container.GetBlockBlobReference(obj.Name);
            using (var stream = await blockBlob.OpenWriteAsync())
            {
                await client.DownloadObjectAsync(bucketName, obj.Name, stream);
            }
        }
        log.Info($"{obj.Name}:{obj.ContentType}");
    }
}

private static bool IsDirectory(string bucketPath)
{
    return bucketPath.EndsWith("/");
}
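
To start a migration, you put a message on the queue that the function is bound to; the message body is the Cloud Storage prefix (“directory”) to copy. For example, with the Azure CLI (the queue name and prefix here are illustrative):

# Enqueue a message naming the prefix to copy; the function then copies everything
# under that prefix from the GCS bucket into the matching Azure blob container.
az storage message put --queue-name migrate-requests --content "reports/2017/" \
    --connection-string "$AZURE_STORAGE_CONNECTION_STRING"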

4. Nginx Ingress for SSL offloading

Xenodata used an Ingress controller for SSL offloading. Azure didn’t have a native option for that at the time, so we used Nginx-Ingress instead, and it worked as expected. The only difference from the documentation is that we needed to expose the Nginx-Ingress pod with the --type=LoadBalancer option.

Deploy and testing Nginx-Ingress on Azure:

$ git clone https://github.com/kubernetes/contrib.git
$ cd contrib/ingress/controllers/nginx
$ kubectl create -f examples/default-backend.yaml
$ kubectl expose rc default-http-backend --port=80 --target-port=8080 --name=default-http-backend
$ kubectl create -f examples/default/rc-default.yaml
$ kubectl run echoheaders --image=gcr.io/google_containers/echoserver:1.4 --replicas=1 --port=8080
$ kubectl expose deployment echoheaders --port=80 --target-port=8080 --name=echoheaders-x
$ kubectl expose deployment echoheaders --port=80 --target-port=8080 --name=echoheaders-y
$ kubectl create -f examples/ingress.yaml
$ kubectl expose pod nginx-ingress-controller-27lwk --port=80 --target-port=80 --type=LoadBalancer
$ kubectl get service
$ curl {YOUR_SERVICE_IP_ADDRESS}/foo -H 'Host: foo.bar.com'

SSL offloading setting test:

$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=foo.bar.com"
$ kubectl create secret tls foo-secret --key /tmp/tls.key --cert /tmp/tls.crt
$ echo "
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: foo
  namespace: default
spec:
  tls:
  - hosts:
    - foo.bar.com
    secretName: foo-secret
  rules:
  - host: foo.bar.com
    http:
      paths:
      - backend:
          serviceName: echoheaders-x
          servicePort: 80
        path: /
" | kubectl create -f -

$ kubectl delete service nginx-ingress-controller-27lwk

$ kubectl expose pod nginx-ingress-controller-27lwk --port=443 --target-port=443 --type=LoadBalancer
$ curl https://{YOUR_NEW_SERVICE_IP} -H 'Host:foo.bar.com' -k

I blogged about what I learned from the xenodata project: Configure Nginx Ingress Controller for TLS termination on Kubernetes on Azure.

5. Automation for Kubernetes setup (configuration as code)

Using the Kubernetes extension, they automated the configuration of the Nginx-Ingress Controller, including secret creation. Now they can recreate the Kubernetes cluster very easily.

Configure kubernetes cluster

Task configuration (xeno case):

Title: Create a docker registry secret
Task: Kubernetes General Task
k8s endpoint: some endpoint
kubectl binary: $(System.DefaultWorkingDirectory)/xeno_deploy/kubectl
sub command: create
Arguments: secret docker-registry myregistrykey --docker-server=xxxxxx.azurecr.io --docker-username=xxxxxx --docker-password=xxxxxxx --docker-email=xxxxx@xxxxx.com

Title: Create a certificate secret
Task: Kubernetes General Task
k8s endpoint: some endpoint
kubectl binary: $(System.DefaultWorkingDirectory)/xeno_deploy/kubectl
sub command: create
Arguments: secret generic certificate-secret --from-file=tls.crt=$(System.DefaultWorkingDirectory)/xeno_deploy/xenodata-lab-com.crt --from-file=tls.key=$(System.DefaultWorkingDirectory)/xeno_deploy/xenodata-lab-com.key

Title: Deploy default backend
Task: Kubernetes Apply Task
k8s endpoint: some endpoint
YAML file: $(System.DefaultWorkingDirectory)/xeno_deploy/default-backend.yml
kubectl binary: $(System.DefaultWorkingDirectory)/xeno_deploy/kubectl

Note: You can find the source in default-backend.yaml.

Title: Deploy Ingress Controller
Task: Kubernetes Apply Task
k8s endpoint: some endpoint
YAML file: $(System.DefaultWorkingDirectory)/xeno_deploy/nginx-ingress-controller.yml
kubectl binary: $(System.DefaultWorkingDirectory)/xeno_deploy/kubectl

Note: You can find the source in nginx-ingress-controller.yaml.

Title: Configure Ingress Controller
Task: Kubernetes Apply Task
k8s endpoint: some endpoint
YAML file: $(System.DefaultWorkingDirectory)/xeno_deploy/static-ip-nginx-ingress-lb-svc.yml
kubectl binary: $(System.DefaultWorkingDirectory)/xeno_deploy/kubectl

static-ip-nginx-ingress-lb-svc.yml

Note: loadBalancerIP isn’t supported in the current version; support is coming soon.

# Load balancer service in front of the nginx ingress controller
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-lb
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
  labels:
    app: nginx-ingress-lb
spec:
  type: LoadBalancer
  loadBalancerIP: xxx.xxx.xxx.xxx
  ports:
  - port: 80
    name: http
    targetPort: 80
  - port: 443
    name: https
    targetPort: 443
  selector:
    # Selects nginx-ingress-controller pods
    k8s-app: nginx-ingress-controller

For details on setting up the Ingress controller, see Configure Nginx Ingress Controller for TLS termination on Kubernetes on Azure.

6. Operations Management Suite integration with Kubernetes

This was easy! We just followed the process in Monitor an Azure Container Service cluster with Microsoft Operations Management Suite (OMS). With Operations Management Suite, we can monitor containers and search their logs.
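
At a high level, the documented setup gives the OMS agent your workspace ID and key and runs it on every node as a DaemonSet. A rough sketch of that flow looks like this (the secret name, key names, and YAML file name follow the pattern in the OMS documentation and may differ in the current version):

# Store the OMS workspace credentials as a Kubernetes secret, then run the agent as a DaemonSet.
kubectl create secret generic omsagent-secret --from-literal=WSID={YOUR_WORKSPACE_ID} --from-literal=KEY={YOUR_WORKSPACE_KEY}
kubectl create -f omsagent-daemonset.yaml
kubectl get pods -l app=omsagent   # verify the agent pod is running on each node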


Conclusion

We accomplished a lot, and xenodata is now in the middle of the migration. I can’t wait to hack on DocumentDB with them. We solved almost all of the technical problems related to migrating the k8s environment from GCP to Azure, and we made the CI/CD process three times faster with little effort by using the Team Services Linux hosted build. We learned a lot about Azure Container Service (Kubernetes) and even about GCP!

Here is some customer feedback from this hackfest:

“I didn’t know much about Azure. During the hackfest, I could hack with a Microsoft evangelist and we could discuss and learn about Azure in person. Also, VSTS is awesome.”

– Daisuke Miyashiro, CTO

“I was not used to Azure and VSTS terms and configuration (especially account setting). It might take time if you try only by yourself. Through this hackfest, I could hack and talk with a Microsoft evangelist in person. Also, he helped to research the root causes of obstacles. The hackfest helped me (a beginner) learn about Azure.

“Also, for advanced topics, he provided source code, VSTS tasks, and discussed the architecture. It was entirely helpful, especially because this project is about migration from another cloud platform to Azure service, of which we used a preview service. It keeps on changing. A hackfest-like engagement is the best fit for this project.

“We and Microsoft learned a lot together. Shared as a case study, it is also helpful for the software industry. We hope Microsoft will continue these hackfest-style engagements! Thank you”

– Sumio Shindo, Infrastructure Engineer

Resources