Redoop, a big-data solution and platform provider, and Microsoft worked together to identify and accomplish three key goals. The result was a “one-click” deployment model based on Azure Resource Manager templates, improvement in DevOps efficiency, and publication of Redoop’s solution to the Azure Marketplace in China.

Project team:

  • Nevin Dong – Principal Technical Evangelist, ISV, Microsoft
  • Chao Li – Principal Technical Evangelist, ISV, Microsoft
  • Livia Li – Partner Business Evangelist, ISV, Microsoft
  • Ming Liu – Senior Program Manager, C&E, Microsoft
  • Siwei Lv – Senior Software Engineer, C&E, Microsoft
  • Zhidi Shang – Principal Program Manager, C&E, Microsoft
  • Xiaojun Tong – CTO, Redoop
  • Wei Guo – Dev Manager, Redoop
  • Dafei Sun – Dev Manager, Redoop

Customer profile

Redoop is a big-data solution and platform provider based in China. It helps enterprise customers adopt big-data technologies through its Hadoop/Spark products, enabling them to build intelligent industry solutions to better satisfy fast-growing business needs.

Redoop is a member of ODPi, a Linux Foundation collaborative project, and is an active contributor to Hadoop communities, events, and social media in China. Redoop’s flagship product, China Redoop Hyperloop (CRH), runs on x86, OpenPOWER, and ARM-based chips, on Linux (Ubuntu, Red Hat, and so on), and on Windows Server OS platforms. It supports a broad range of technologies offered in the Hadoop/Spark stack and accelerates Hadoop execution performance through field-programmable gate arrays (FPGAs), GPUs, and the Hadoop Processing Library (HPL).

In 2016, Redoop announced its +HADOOP strategy, engaging with leading industry customers to help them leverage Hadoop big-data technologies with innovative industry use cases, and exploring opportunities in new business models and markets. Redoop CRH products have been successfully deployed at large companies such as NTT DOCOMO, China Aerospace, and Lenovo, supporting large-scale clusters of up to 100 nodes and 4.8 PB in real-world production environments. Redoop plans to reach more industry customers in telecom, banking, transportation, security, and Internet games in the future.

Problem statement

Industry and enterprise customers are beginning to invest heavily in big-data app development and deployment to gain deeper business insights, optimize business-operations efficiency, and deliver better consumer experiences. Independent software vendor (ISV) partners are eager to leverage cloud platforms to accelerate big-data solution deployment and provisioning and to lower the total cost of ownership (TCO).

However, big-data app deployment and operations processes are time- and resource-consuming. Some partners, for example, still manage Hadoop cluster creation and provisioning manually through a GUI, or suffer the pain of scripts that are hard to maintain and adapt as demands change.

The Redoop CRH product contains a broad range of technologies offered in the Hadoop/Spark stack, such as Hive, Storm, HBase, YARN, and ZooKeeper, and provides rich tool sets to support various scenarios such as online, nearline, and offline, as shown in Figure 1. Redoop also supports ingestion from various types of structured, semistructured, and unstructured data sources, such as relational databases, sensors, audio, video, streams, logs, and more.

Figure 1. Redoop CRH product technical architecture diagram


After much discussion and brainstorming during an envisioning workshop, the team agreed to focus on the following goals:

  • Optimize the DevOps lifecycle and processes, supporting different usage scenarios including dev/test, training, staging/production, and operations management.
  • Implement “one-click” deployments in which customers can create Redoop CRH clusters and virtual machines (including Windows and Linux) in an easy and automated way.
  • Publish CRH to the Azure Marketplace, enabling enterprise customers to use a free trial of CRH and easily move their big-data apps to Azure, lowering their capital expenditures (CAPEX) and operating expenses (OPEX).

Solution, steps, and delivery

The project team’s journey started with a one-day cloud ideation session (CIS) workshop with architecture design reviews. The Redoop team introduced its business vision and key customer requirements, then did a deep dive into its architecture design, key component designs, and sample code.

The Microsoft DX TE team gave an overview of Microsoft Azure and its latest offerings, and conducted a detailed breakout session on how to develop solutions based on Azure infrastructure as a service (IaaS) platforms, especially Azure Resource Manager, Virtual Machines and Virtual Network, VM Desired State Configuration (DSC) extensions, PowerShell, and the Azure command-line interface (Azure CLI). The Azure Marketplace team explained the marketplace participation policies, the onboarding process, and the technical and nontechnical requirements for solutions to be published to the Azure Marketplace.

Figure 2. Cloud ideation session (CIS) workshop


Three key solutions were identified and defined for next-step elaboration actions:

  • “One-click” deployment model based on Azure Resource Manager templates
  • DevOps optimization hackfest
  • Publishing to the Azure Marketplace in China

“One-click” deployment model based on Resource Manager templates

Redoop CRH platform deployments consist of many key components, including image sources, master node virtual machines and data node virtual machines, virtual network, DNS, IPs, storage account, security groups, and so on. IT professionals or developers can create a CRH Hadoop cluster and virtual machines step by step when following a user manual, but this approach can be time-consuming and error-prone. The process could be automated by using scripts and command lines, but it would not be easy to debug and customize given different types of requirements.

Azure Resource Manager technology enables Redoop to deploy, manage, and monitor these components as a resource group. It would be fast and easy to deploy, provision, update, or delete all the resources in a single, coordinated operation. Resource Manager also provides security, auditing, diagnostics, and tagging features to help IT professionals manage the resources after deployment. This would dramatically increase Redoop’s and the customer’s DevOps efficiency and productivity.

Redoop can use templates that work for different environments and usage scenarios such as testing, staging, and production. It would also be easy for Redoop to provide different versions of templates for different market segments, industry solutions, and so on.
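
For illustration, here is a minimal sketch of how the same template (named deploy.json, as described next) could drive such environment-specific deployments with the Azure CLI, and how the resulting resource group can be tagged and removed as a single unit. The resource group name, parameter file name, and region shown are hypothetical.

# A minimal sketch, assuming Azure CLI 2.0 and hypothetical names
# (crh-staging, staging.parameters.json). Target the Azure China cloud first.
az cloud set --name AzureChinaCloud
az login

# Each environment gets its own resource group; the same template is reused
# with an environment-specific parameter file.
az group create --name crh-staging --location chinanorth
az group deployment create \
    --resource-group crh-staging \
    --template-file deploy.json \
    --parameters @staging.parameters.json

# Tag the group for auditing, then delete everything as one unit when finished.
az group update --name crh-staging --set tags.environment=staging
az group delete --name crh-staging --yes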

Redoop defines the configurations in a JSON file named “deploy.json”. The following code example defines some attributes of the basic infrastructure resources needed, including storage account name and type, recommended virtual machine size, and the OS versions that are supported. Note that based on performance baseline testing results, Redoop recommends Azure D-series virtual machines for cluster nodes, with Standard_D2 as the minimum size.

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "storageName": {
            "type": "string",
            "defaultValue": "crhstoryage",
            "metadata": {
                "description": "Storage account name (only lowercase letters and numbers are valid)"
            }
        },
        "storageAccountType": {
            "type": "string",
            "defaultValue": "Standard_LRS",
            "metadata": {
                "description": "Storage account type (the default is Standard_LRS)"
            }
        },
        "vmSize": {
            "type": "string",
            "defaultValue": "Standard_D2",
            "metadata": {
                "description": "Size of the cluster virtual machines (the default is Standard_D2, which is also the recommended minimum)"
            }
        },
        "CentOSVersion": {
            "type": "string",
            "defaultValue": "6.5",
            "allowedValues": [
                "6.5",
                "6.6",
                "6.7",
                "6.8"
            ],
            "metadata": {
                "description": "Operating system version (the default is CentOS 6.5; versions 6.5, 6.6, 6.7, and 6.8 can be selected)"
            }
        },
        "vmCount": {
            "type": "int",
            "defaultValue": 3,
            "minValue": 2,
            "maxValue": 50,
            "metadata": {
                "description": "Number of virtual machines in the cluster (the default is 3, the minimum is 2)"
            }
        }
    },
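
Before a full deployment, the template and parameter overrides can be checked without creating anything, and the individual parameters defined above can be overridden inline. A sketch, again with a hypothetical resource group name:

# Validate the template and override parameters inline; values must respect
# the declared constraints (for example, vmCount between 2 and 50).
az group deployment validate \
    --resource-group crh-staging \
    --template-file deploy.json \
    --parameters vmSize=Standard_D3 vmCount=5 CentOSVersion=6.8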

Another important step is to define the network security group and rules, as listed in the next code example, to control SSH traffic, web and data node access, admin node and license control, and more.

        {
            "apiVersion": "[variables('resourceAPIVersion')]",
            "type": "Microsoft.Network/networkSecurityGroups",
            "name": "[variables('securityGroupName')]",
            "location": "[resourceGroup().location]",
            "properties": {
                "securityRules": [
                    {
                        "name": "SSH",
                        "properties": {
                            "description": "Allows SSH traffic",
                            "protocol": "Tcp",
                            "sourcePortRange": "*",
                            "destinationPortRange": "22",
                            "sourceAddressPrefix": "*",
                            "destinationAddressPrefix": "*",
                            "access": "Allow",
                            "priority": 100,
                            "direction": "Inbound"
                        }
                    },
                    {
                        "name": "NamenodeWeb",
                        "properties": {
                            "description": "namenodeWeb",
                            "protocol": "Tcp",
                            "sourcePortRange": "*",
                            "destinationPortRange": "50070",
                            "sourceAddressPrefix": "*",
                            "destinationAddressPrefix": "*",
                            "access": "Allow",
                            "priority": 101,
                            "direction": "Inbound"
                        }
                    },
                    {
                        "name": "DatanodeControl",
                        "properties": {
                            "description": "datanode control port",
                            "protocol": "Tcp",
                            "sourcePortRange": "*",
                            "destinationPortRange": "50010",
                            "sourceAddressPrefix": "*",
                            "destinationAddressPrefix": "*",
                            "access": "Allow",
                            "priority": 102,
                            "direction": "Inbound"
                        }
                    },
                    {
                        "name": "CRHAdmin",
                        "properties": {
                            "description": "CRHAdmin port",
                            "protocol": "Tcp",
                            "sourcePortRange": "*",
                            "destinationPortRange": "8080",
                            "sourceAddressPrefix": "*",
                            "destinationAddressPrefix": "*",
                            "access": "Allow",
                            "priority": 103,
                            "direction": "Inbound"
                        }
                    },
                    {
                        "name": "Licence",
                        "properties": {
                            "description": "Licence port",
                            "protocol": "Tcp",
                            "sourcePortRange": "*",
                            "destinationPortRange": "9999",
                            "sourceAddressPrefix": "*",
                            "destinationAddressPrefix": "*",
                            "access": "Allow",
                            "priority": 104,
                            "direction": "Inbound"
                        }
                    }
                ]
            }
        },
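
Note that the template opens these ports to any source address (“*”). After deployment, the rules can be reviewed and tightened from the Azure CLI; the sketch below (hypothetical resource group and NSG names) restricts the SSH rule to an administrative address range:

# List the rules created by the template (hypothetical group and NSG names).
az network nsg rule list \
    --resource-group crh-staging \
    --nsg-name crh-nsg \
    --output table

# Narrow the SSH rule from "*" to a specific administrative address range.
az network nsg rule update \
    --resource-group crh-staging \
    --nsg-name crh-nsg \
    --name SSH \
    --source-address-prefixes 203.0.113.0/24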

After the dependent resources, such as the storage account, virtual network, availability set, network security group, network interface card (NIC), and public IP, are created, we can create the master and data node virtual machines, as shown here:

        {
            "apiVersion": "[variables('resourceAPIVersion')]",
            "type": "Microsoft.Compute/virtualMachines",
            "name": "[variables('vmNameM')]",
            "location": "[resourceGroup().location]",
            "dependsOn": [
                "[concat('Microsoft.Storage/storageAccounts/', variables('storageNameM'))]",
                "[concat('Microsoft.Network/networkInterfaces/', variables('nicNameM'))]",
                "[concat('Microsoft.Compute/availabilitySets/', variables('vmAvaM'))]"
            ],
            "properties": {
                "availabilitySet": {
                    "id": "[resourceId('Microsoft.Compute/availabilitySets', variables('vmAvaM'))]"
                },
                "hardwareProfile": {
                    "vmSize": "[parameters('vmSize')]"
                },
                "osProfile": {
                    "computerName": "[variables('vmNameM')]",
                    "adminUsername": "[variables('adminUsername')]",
                    "adminPassword": "[variables('adminPassword')]"
                },
                "storageProfile": {
                    "imageReference": {
                        "publisher": "[variables('imagePublisher')]",
                        "offer": "[variables('imageOffer')]",
                        "sku": "[parameters('CentOSVersion')]",
                        "version": "latest"
                    },
                    "osDisk": {
                        "name": "osdisk",
                        "vhd": {
                            "uri": "[concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('storageNameM')), variables('resourceAPIVersion')).primaryEndpoints.blob, variables('vmStorageAccountContainerName'),'/',variables('OSDiskName'),'.vhd')]"
                        },
                        "caching": "ReadWrite",
                        "createOption": "FromImage"
                    },
                    "dataDisks": [
                        {
                            "name": "datadisk1",
                            "diskSizeGB": "100",
                            "lun": 0,
                            "vhd": {
                                "uri": "[concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('storageNameM')), variables('resourceAPIVersion')).primaryEndpoints.blob, variables('vmStorageAccountContainerName'),'/',variables('dataDisk1VhdName'),'.vhd')]"
                            },
                            "createOption": "Empty"
                        }
                    ]
                },
                "networkProfile": {
                    "networkInterfaces": [
                        {
                            "id": "[resourceId('Microsoft.Network/networkInterfaces',variables('nicNameM'))]"
                        }
                    ]
                },
                "diagnosticsProfile": {
                    "bootDiagnostics": {
                        "enabled": "true",
                        "storageUri": "[concat(reference(concat('Microsoft.Storage/storageAccounts/', variables('storageNameM')), variables('resourceAPIVersion')).primaryEndpoints.blob)]"
                    }
                }
            }
        },
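
Once the deployment completes, the provisioning state of the master and data node virtual machines can be confirmed from the CLI; a small sketch with a hypothetical resource group name:

# Show name, power state, and public IP for every VM in the group.
az vm list \
    --resource-group crh-staging \
    --show-details \
    --query "[].{name:name, state:powerState, publicIp:publicIps}" \
    --output table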

The virtual machine creation process invokes the custom script file (for example, the master.sh file for the master node) defined in the Azure virtual machine extension, to install the components from the repository, such as Ambari, ZooKeeper, HDFS, YARN, MapReduce, and so on.

        {
            "apiVersion": "[variables('resourceAPIVersion')]",
            "type": "Microsoft.Compute/virtualMachines/extensions",
            "name": "[concat(variables('vmNameM'), '/crhsetup')]",
            "location": "[resourceGroup().location]",
            "dependsOn": [
                "[concat('Microsoft.Compute/virtualMachines/', variables('vmNameM'))]"
            ],
            "properties": {
                "publisher": "Microsoft.Azure.Extensions",
                "type": "CustomScript",
                "typeHandlerVersion": "2.0",
                "autoUpgradeMinorVersion": true,
                "settings": {
                    "fileUris": [
                        "[concat(variables('setupSHuri'), '/master.sh')]"
                    ],
                    "commandToExecute": "[concat('sh master.sh ', variables('NodeNum'))]"
                }
            }
        },
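
The same extension can also be applied to an existing virtual machine directly from the CLI, which is convenient when iterating on the setup scripts. The VM name, script URI, and node count below are hypothetical:

# Apply the CustomScript extension to an existing VM; mirrors the template
# resource above (hypothetical VM name, script URI, and node count).
az vm extension set \
    --resource-group crh-staging \
    --vm-name crh-master \
    --publisher Microsoft.Azure.Extensions \
    --name CustomScript \
    --version 2.0 \
    --settings '{"fileUris": ["https://example.blob.core.chinacloudapi.cn/scripts/master.sh"], "commandToExecute": "sh master.sh 3"}'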

The Ambari setup script code example is shown here:

# Generate script/setup_ambari.sh, which uses expect to answer the
# interactive ambari-server setup prompts automatically.
echo "setup_ambari.sh ..."
echo "#!/bin/sh
UserName=\$1
auto_setup_ambari () {
  expect -c \"set timeout -1;
            spawn /usr/sbin/ambari-server setup -j /opt/install/jdk;
            expect {
              *continue* {send -- y\r;exp_continue;}
              *Customize* {send -- y\r;exp_continue;}
              *root* {send -- \$UserName\r;exp_continue;}
              *configuration* {send -- n\r;exp_continue;}
              eof        {exit 0;}
            }\";
}
auto_setup_ambari \$UserName" >> script/setup_ambari.sh
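
The master node’s setup flow would then run the generated script and bring the server up; a hypothetical invocation (the exact sequence lives in Redoop’s master.sh):

# Run the generated script (expect answers the prompts), then start Ambari
# and confirm it is running. Hypothetical invocation; see master.sh.
sh script/setup_ambari.sh root
ambari-server start
ambari-server status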

After the cluster, virtual machines, and all the related resources are created and provisioned, users can log on to the management portal, add the needed service components (for example, Hive, HBase, Sqoop, Storm, Flume, Docker, and Spark), and then finish the installation process, as shown in Figure 3.

Figure 3. Adding components to the management portal

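As an alternative to the portal, service components can also be added programmatically through Ambari’s REST API; a minimal sketch, assuming a hypothetical cluster name and the default admin credentials:

# Register the Hive service through Ambari's REST API (hypothetical cluster
# name and default credentials; Ambari requires the X-Requested-By header).
curl -u admin:admin \
     -H "X-Requested-By: ambari" \
     -X POST \
     "http://<master-public-ip>:8080/api/v1/clusters/crh/services/HIVE"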

Now IT professionals and developers can monitor the running status and metrics of the CRH cluster and services in the Ambari management portal, as shown in Figure 4.

Figure 4. Monitoring the CRH cluster in the Ambari management portal


DevOps optimization hackfest

The team conducted a DevOps optimization hackfest to better understand the current lifecycle, processes, and capabilities with which the Redoop team develops and delivers its CRH products. A value stream mapping (VSM) session led by the Microsoft DX TE team, shown in Figure 5, helped the Redoop team evaluate its process capabilities, find areas for improvement, and define follow-up actions.

Figure 5. DevOps optimization hackfest


VSM is a useful visualization tool that helped the Redoop team better understand its “as-is” processes and status, and brainstorm and structure its “to-be” Agile model. The Redoop team was excited to learn the VSM methodology, and it strongly embraced the importance of a quick response and a closed loop from customer to customer. The VSM hackfest helped Redoop gain a common understanding of the process, tools, and techniques across teams.

These are some of the key findings and recommendations from the VSM exercise:

  • The Agile development lifecycle includes four phases: conception (starting from customer requirements and interactions), development and continuous integration (CI), staging, and release. A full iteration cycle takes almost one month (four weeks plus one more day).
  • The Redoop team uses Redmine and Wiki for Agile project management and to manage work items. Wiki is also used for customer requirements management, tracking, and sharing across teams.
  • Visual Studio Code, PowerShell, Azure CLI, and Docker are used by the Azure-CRH-ARM development team; the source code is stored and managed in Redoop’s repository, “redhub.io”.
  • Jenkins and Sonar are used for continuous integration and source-code control. Automated builds are very important for the subsequent test and deployment tasks.
  • TestMgr, Test Case, JUnit, PowerShell, Azure CLI, and Docker are used by testing and quality assurance (QA) teams.
  • Ambari, Python, Shell, and Azure Resource Manager are used in the deployment and staging process. In the VSM exercise, the Redoop team also explored how to automate the configurations and deployments to better satisfy specific customers’ requirements.
  • Ambari, log monitoring tools, and the Azure portal and diagnostics are used in production, operations management, and troubleshooting.
  • Redoop also wants to release and publish the solution to the Azure Marketplace in China to reach more customers online and speed up the deployment process.

Figure 6. Value stream mapping (VSM) diagram


Some improvements were observed after using Resource Manager templates in the deployment model:

  • Redoop CRH cluster deployment and provisioning time is reduced from several hours (depending on the customer environment) to 15 to 30 minutes with the “one-click” deployment model based on Resource Manager templates.
  • It takes 10 to 15 minutes to initialize the configuration of a CRH cluster, and then the customer can start using and monitoring the resources.
  • Redoop can put its virtual machine images and service-components repository in its Azure subscription and use Azure-endorsed Linux distributions to secure image quality and improve installation speed.

Publishing to the Azure Marketplace in China

The Azure Marketplace is an online applications and services marketplace that enables startups and ISVs to offer their solutions to Azure customers around the world. The Azure Marketplace can help partners distribute and sell their solutions or services to other developers, ISVs, and IT professionals who want to quickly develop their cloud-based applications and mobile solutions.

Announced in September 2016, the Azure Marketplace in China differs somewhat from other regions in areas such as the onboarding process and pricing model. The Azure Marketplace team and the Microsoft DX team work closely together to help more ISV partners onboard and distribute their solutions or images to reach more customers.

The Redoop team recognizes the value the Azure Marketplace can bring and expects to grow its business faster with the Azure and Marketplace platforms. It plans to provide a free offering for customers in scenarios such as trial use, training, and proof of concept (POC). Customers can also take a “bring-your-own-license” (BYOL) approach to deploy and use Redoop products in their own Azure subscriptions.

The following key items are reviewed during the Azure Marketplace onboarding workshop:

  • Business and nontechnical requirements, including company information, legal documents, ISV partnership, licensing, third-party software dependencies, and support.
  • Azure Resource Manager templates and script code reviews.
  • Virtual machine image certification tests.
  • Test templates offered in staging.
  • Deploy and test templates offered in the Azure Marketplace in China.

In December 2016, Redoop successfully published its solution in the Azure Marketplace in China: https://market.azure.cn/Vhd/Show?vhdID=12034&version=14181

Conclusion

By using Resource Manager templates, Linux virtual machines, and DSC extension technologies in Azure, Redoop can help customers quickly and easily deploy Hadoop clusters with large-scale nodes in just a few clicks. This approach can dramatically reduce the effort required of IT professionals and developers, and helps lower customers’ CAPEX and OPEX.

The value stream mapping hackfest helped Redoop optimize its lifecycle and operations, contributing to its success in publishing its solution to the Azure Marketplace in China. The VSM exercise helped the team build an agile DevOps lifecycle with quick response and a closed loop from customer to customer.

The DevOps journey is a lot like completing a puzzle: combine key practices such as project management, requirements management, continuous integration, daily builds, release management, and so on, and you finally get a clear view of the whole process. The DevOps hackfest and VSM exercise also helped optimize DevOps efficiency, greatly reducing Redoop cluster deployment and provisioning times.

For the next step, Redoop intends to work closely with Microsoft on the following:

  • Gaining feedback from customer deployments and operations practices to remove performance bottlenecks and improve DevOps efficiency.
  • Developing more complex Resource Manager templates to better satisfy industry or customer usage scenarios and needs.
  • Integrating more automation tools into the DevOps lifecycle and processes, including Team Foundation Server (on-premises) or Visual Studio Team Services (online), and OSS tools such as Jenkins, Puppet, Git, and Docker.

More resources