These are the general steps to deploy a DL Workspace cluster.
[Run Once] Set up the development environment.
Configure the cluster and determine its key parameters (e.g., cluster name, number of Etcd servers used). Please refer to Backup/Restore for instructions on backing up and restoring the cluster configuration.
Configure and set up the database used in the cluster.
Configure the shared file system to be used in the cluster, following the instructions in Storage.md and the configuration.
python deploy.py -y build
deploy.py -y deploy
Start worker nodes. Please use the ‘-public’ option if you run the command from inside a firewall while the cluster is hosted on a public cloud (e.g., Azure, AWS).
deploy.py -y updateworker
If you stop here, you will have a fully functional Kubernetes cluster. Part of the DL Workspace setup can therefore be regarded as an automated procedure for setting up a Kubernetes cluster. You don’t need a shared file system or a database for Kubernetes cluster operation.
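One way to check that the cluster is up at this point is to count the nodes that Kubernetes reports as Ready. The snippet below is a minimal sketch and not part of DL Workspace: it assumes `kubectl` is installed on the machine you run it from and is already configured for the new cluster, and `count_ready_nodes` is a hypothetical helper name introduced here.

```shell
# Sketch only: count_ready_nodes is an illustrative helper, not a DL Workspace command.
# It reads the output of "kubectl get nodes --no-headers" on stdin and prints
# how many nodes have a STATUS column (second field) that starts with "Ready".
count_ready_nodes() {
    awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }'
}

# Usage (assumes kubectl points at the new cluster):
#   kubectl get nodes --no-headers | count_ready_nodes
```

Note that a node shown as `Ready,SchedulingDisabled` is counted as Ready by this pattern, while `NotReady` is not. If the count is lower than the number of machines you deployed, the missing nodes are the place to start looking.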
__Static IP:__ A static IP address or DNS name is strongly recommended for the master and Etcd servers, especially if you want High Availability (HA) operation. Please contact your IT department to set up static IPs for the master and Etcd servers. With static IPs, DL Workspace can operate without interruption.
Otherwise, each time the master or Etcd servers are rebooted (and may therefore obtain new IP addresses), you will need to restart the master, Etcd, and worker nodes by repeating steps 4 and 5.
deploy.py -y hostname set
deploy.py -y kubernetes labels
deploy.py webui
deploy.py docker push restfulapi
deploy.py docker push webui
deploy.py -y kubernetes start webportal
deploy.py -y kubernetes start restfulapi
deploy.py -y kubernetes start jobmanager
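The commands on this page can be collected into one script that runs them in order and stops at the first failure. The following is a minimal sketch under stated assumptions, not a script shipped with DL Workspace: `run_all` and `DRY_RUN` are hypothetical names introduced here, `deploy.py` is assumed to be invoked as `python deploy.py` from its own directory, and the manual steps (cluster configuration, database, shared file system) are assumed to be done already. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: DRY_RUN and run_all are illustrative names, not DL Workspace features.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command; 0 = execute it

run_all() {
    # One deploy.py argument list per line, in the order given on this page.
    # The list is read on file descriptor 3 so that stdin stays free for deploy.py.
    while read -r args <&3; do
        echo "+ python deploy.py $args"
        if [ "$DRY_RUN" = "0" ]; then
            # $args is deliberately unquoted so it splits into separate arguments.
            python deploy.py $args || { echo "failed: $args" >&2; return 1; }
        fi
    done 3<<'EOF'
-y build
-y deploy
-y updateworker
-y hostname set
-y kubernetes labels
webui
docker push restfulapi
docker push webui
-y kubernetes start webportal
-y kubernetes start restfulapi
-y kubernetes start jobmanager
EOF
}

run_all
```

If you only need the Kubernetes cluster, the first three entries are sufficient and the rest of the list can be removed.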