This document describes the procedure for deploying a DL Workspace cluster on an Ubuntu cluster that sits on a VLAN, with an initial node used as a PXE server to prime the cluster.
[Run Once] Set up the development environment.
Configure the cluster and determine its key parameters (e.g., cluster name, number of Etcd servers used). Please refer to Backup/Restore for instructions on backing up and restoring the cluster configuration.
Configure and set up the database used in the cluster.
Configure the shared file system to be used in the cluster, following the instructions in Storage and the configuration.
Describe the servers used in the cluster by adding the following entries to config.yaml.
network:
  domain: <<current_domain>>
  container-network-iprange: "<<your_cluster_ip_range, in 10.109.x.x/24 format>>"
platform-scripts : ubuntu
machines:
  <<machine1>>:
    role: infrastructure
  <<machine2>>:
    role: worker
  <<machine3>>:
    role: worker
  ....
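As a quick sanity check before deploying, the value chosen for container-network-iprange can be validated against the format hinted at in the config entries above. The following is a minimal Python sketch, assuming that "10.109.x.x/24" means an IPv4 /24 network inside 10.109.0.0/16. The function name is_valid_container_range is hypothetical and is not part of deploy.py.

```python
import ipaddress

# Assumption: the "10.109.x.x/24" hint means a /24 network inside 10.109.0.0/16.
ALLOWED_PARENT = ipaddress.ip_network("10.109.0.0/16")


def is_valid_container_range(cidr: str) -> bool:
    """Return True if cidr parses as an IPv4 /24 network inside 10.109.0.0/16."""
    try:
        # strict=False accepts values with host bits set, e.g. 10.109.1.5/24.
        net = ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return False
    return (
        net.version == 4
        and net.prefixlen == 24
        and net.subnet_of(ALLOWED_PARENT)
    )
```

If the check fails, it is probably worth correcting config.yaml before building any images, since the range is consumed by later deployment steps.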
If you are building a high-availability cluster, please include multiple infrastructure nodes. The number of infrastructure nodes should be odd, e.g., 1, 3, or 5: 3 infrastructure nodes tolerate 1 failure, and 5 infrastructure nodes tolerate 2 failures.
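The tolerance figures above are consistent with majority-quorum arithmetic. The sketch below assumes that the infrastructure nodes host the Etcd members and that the cluster stays available as long as a strict majority of them is up. Both function names are hypothetical, and the machines argument is assumed to mirror the machines mapping in config.yaml after it has been loaded into a Python dict.

```python
def tolerated_failures(infrastructure_nodes: int) -> int:
    """Number of infrastructure nodes that can fail while a majority survives."""
    if infrastructure_nodes < 1:
        raise ValueError("at least one infrastructure node is required")
    quorum = infrastructure_nodes // 2 + 1  # smallest strict majority
    return infrastructure_nodes - quorum


def count_infrastructure(machines: dict) -> int:
    """Count entries whose role is 'infrastructure' in a machines mapping."""
    return sum(
        1 for spec in machines.values()
        if spec.get("role") == "infrastructure"
    )
```

Under this assumption, 4 nodes tolerate only 1 failure, the same as 3, which is why an odd count is recommended: the extra even node adds cost without adding tolerance.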
./deploy.py -y build
./deploy.py build pxe-ubuntu
./deploy.py docker run pxe-ubuntu
Reboot each machine to be deployed. On each boot screen, select the option to install Ubuntu 16.04.
./deploy.py sshkey install
./deploy.py runscriptonall ./scripts/prepare_ubuntu.sh
./deploy.py execonall sudo usermod -aG docker core
Partition the hard drives, if necessary. Please refer to the Partition section for details.
./deploy.py -y deploy
./deploy.py -y updateworker
./deploy.py -y kubernetes labels
If you are running a small cluster and need to run workloads on the Kubernetes master node (this choice may affect cluster stability), please use:
./deploy.py -y kubernetes uncordon
Workloads will now be scheduled on the master node. If you stop here, you will have a fully functional Kubernetes cluster. Thus, part of the DL Workspace setup can be considered an automated procedure for setting up a Kubernetes cluster. You don't need a shared file system or database for Kubernetes cluster operation.
[Optional] Set up Spark
./deploy.py mount
./deploy.py webui
./deploy.py docker push restfulapi
./deploy.py docker push webui
./deploy.py kubernetes start jobmanager
./deploy.py kubernetes start restfulapi
./deploy.py kubernetes start webportal
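The last seven commands above are meant to be run in order, and a later step is unlikely to succeed if an earlier one failed. The following is a minimal Python sketch of a wrapper that stops at the first failing command. The names FINAL_STEPS and run_steps are hypothetical, the command list is copied verbatim from this guide, and the sketch assumes that deploy.py signals failure through a non-zero exit code, which this guide does not confirm.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Commands copied from the final steps of this guide, in the order listed.
FINAL_STEPS = [
    ["./deploy.py", "mount"],
    ["./deploy.py", "webui"],
    ["./deploy.py", "docker", "push", "restfulapi"],
    ["./deploy.py", "docker", "push", "webui"],
    ["./deploy.py", "kubernetes", "start", "jobmanager"],
    ["./deploy.py", "kubernetes", "start", "restfulapi"],
    ["./deploy.py", "kubernetes", "start", "webportal"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run commands in order; return how many completed before the first failure."""
    if runner is None:
        # Default runner: execute the command and report its exit code.
        # Assumption: a non-zero exit code means the step failed.
        runner = lambda cmd: subprocess.run(list(cmd)).returncode
    completed = 0
    for cmd in steps:
        if runner(cmd) != 0:
            break
        completed += 1
    return completed
```

Calling run_steps(FINAL_STEPS) from the directory that contains deploy.py would execute the steps for real. Passing a custom runner, for example one that only prints each command, gives a dry run.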