DLWorkspace

Frequently Asked Questions (FAQ) for Azure Cluster Deployment.

Please refer to this for more general deployment issues.

After setup, I cannot visit the deployed DL Workspace portal.

sudo ./az_tools.py create failed.

Lost connection at the very first step of deploying the infra node to Azure, or when running ./deploy.py runscriptonall ./scripts/prepare_vm_disk.sh

I cannot ssh to the node when my devbox is a physical server instead of a virtual one.

How do I know the node has been deployed?

I could not build docker image/No such image/An image does not exist locally with the tag/The repository XXX does not have a Release file

I can connect to the master/infra node, but the UI is not working (I cannot access it from a browser). How do I debug this?

I finished the deployment, but I am not able to connect to the master node via ./deploy.py connect master; ssh is denied even with ssh -i deploy/sshkey/id_rsa core@<infra node url>.

I can’t execute a Spark job on Azure.

For ‘az login’, when I type in the device code, the web page prompts me again for the code.

I have launched a job (e.g., TensorFlow-iPython-GPU), but I am unable to access the endpoint and get the following error:

```
This site can’t be reached
....cloudapp.azure.com refused to connect.
```

Please check the docker image of the job you are running. Sometimes the iPython (or SSH) server has not started properly, which makes the endpoint inaccessible.
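One way to tell a server that never started from a browser-side problem is to test whether the endpoint's TCP port accepts connections at all. The sketch below is only an illustration and is not part of DL Workspace: the helper name `endpoint_reachable` is hypothetical, and the host and port must be taken from the endpoint URL shown for your own job.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only checks that something is
    listening on the port, not that iPython or SSH is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and DNS resolution failures.
        return False


# Example (replace the placeholders with the host and port of your job's endpoint):
#   endpoint_reachable("<your endpoint host>", <your endpoint port>)
```

If this returns `False` while the job is reported as running, the server inside the container most likely did not start, which points back to the docker image.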

I notice that my Azure command is failing.

The Azure CLI session may expire after a period of inactivity. You may need to log in again via ‘az login’.
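Before rerunning a failed deployment step, it can help to check whether the CLI session is still valid. The following is a minimal sketch, assuming the `az` executable is on your `PATH`; the helper name `az_session_valid` is hypothetical and not part of DL Workspace. It relies on `az account get-access-token`, which is expected to exit with a non-zero status when no usable login is available.

```python
import shutil
import subprocess


def az_session_valid(az_cmd: str = "az") -> bool:
    """Return True if the Azure CLI can obtain an access token.

    Hypothetical helper for illustration. Returns False when the CLI is not
    installed or when the token request fails (for example, after the login
    has expired).
    """
    az_path = shutil.which(az_cmd)
    if az_path is None:
        return False  # Azure CLI not found on PATH
    result = subprocess.run(
        [az_path, "account", "get-access-token"],
        stdout=subprocess.DEVNULL,  # discard the token, only the exit code matters
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example:
#   if not az_session_valid():
#       print("Azure CLI session is not valid; run 'az login' and retry.")
```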

Common configuration errors.