The workspace is the centralized place to:
A workspace includes other Azure resources that are used by the workspace:
A compute target is any machine or set of machines you use to run your training script or host your service deployment. You can use your local machine or a remote compute resource as a compute target. With compute targets, you can start training on your local machine and then scale out to the cloud without changing your training script.
Azure Machine Learning introduces two fully managed cloud-based virtual machines (VM) that are configured for machine learning tasks:
Compute instance: A compute instance is a VM that includes multiple tools and environments installed for machine learning. The primary use of a compute instance is for your development workstation. You can start running sample notebooks with no setup required. A compute instance can also be used as a compute target for training and inferencing jobs.
Compute clusters: Compute clusters are a cluster of VMs with multi-node scaling capabilities. Compute clusters are better suited for compute targets for large jobs and production. The cluster scales up automatically when a job is submitted. Use as a training compute target or for dev/test deployment.
For more information about training compute targets, see Training compute targets. For more information about deployment compute targets, see Deployment targets.
Azure Machine Learning Datasets make it easier to access and work with your data. By creating a dataset, you create a reference to the data source location along with a copy of its metadata. Because the data remains in its existing location, you incur no extra storage cost, and don't risk the integrity of your data sources.
For more information, see Create and register Azure Machine Learning Datasets. For more examples using Datasets, see the sample notebooks.
Datasets use datastores to securely connect to your Azure storage services. Datastores store connection information without putting your authentication credentials and the integrity of your original data source at risk. They store connection information, like your subscription ID and token authorization in your Key Vault associated with the workspace, so you can securely access your storage without having to hard code them in your script.
An environment is the encapsulation of the environment where training or scoring of your machine learning model happens. The environment specifies the Python packages, environment variables, and software settings around your training and scoring scripts.
For code samples, see the "Manage environments" section of How to use environments.
An experiment is a grouping of many runs from a specified script. It always belongs to a workspace. When you submit a run, you provide an experiment name. Information for the run is stored under that experiment. If the name doesn't exist when you submit an experiment, a new experiment is automatically created.
For an example of using an experiment, see Tutorial: Train your first model.
A run is a single execution of a training script. An experiment will typically contain multiple runs.
Azure Machine Learning records all runs and stores the following information in the experiment:
You produce a run when you submit a script to train a model. A run can have zero or more child runs. For example, the top-level run might have two child runs, each of which might have its own child run.
A run configuration defines how a script should be run in a specified compute target. You use the configuration to specify the script, the compute target and Azure ML environment to run on, any distributed job-specific configurations, and some additional properties. For more information on the full set of configurable options for runs, see ScriptRunConfig.
A run configuration can be persisted into a file inside the directory that contains your training script. Or it can be constructed as an in-memory object and used to submit a run.
For example run configurations, see Configure a training run.
When you submit a run, Azure Machine Learning compresses the directory that contains the script as a zip file and sends it to the compute target. The zip file is then extracted, and the script is run there. Azure Machine Learning also stores the zip file as a snapshot as part of the run record. Anyone with access to the workspace can browse a run record and download the snapshot.
NOTE: To prevent unnecessary files from being included in the snapshot, make an ignore file (.gitignore or .amlignore) in the directory. Add the files and directories to exclude to this file. For more information on the syntax to use inside this file, see syntax and patterns for .gitignore. The .amlignore file uses the same syntax. If both files exist, the .amlignore file is used and the .gitignore file is unused.
Azure Machine Learning automatically logs standard run metrics for you. However, you can also use the Python SDK to log arbitrary metrics.
There are multiple ways to view your logs: monitoring run status in real time, or viewing results after completion. For more information, see Monitor and view ML run logs.
When you start a training run where the source directory is a local Git repository, information about the repository is stored in the run history. This works with runs submitted using a script run configuration or ML pipeline. It also works for runs submitted from the SDK or Machine Learning CLI.
For more information, see Git integration for Azure Machine Learning.
When you run an experiment to train a model, the following steps happen. These are illustrated in the training workflow diagram below:
VMs/HDInsight, accessed by SSH credentials in a key vault in the Microsoft subscription. Azure Machine Learning runs management code on the compute target that:
Machine Learning Compute, accessed through a workspace-managed identity. Because Machine Learning Compute is a managed compute target (that is, it's managed by Microsoft) it runs under your Microsoft subscription.
At its simplest, a model is a piece of code that takes an input and produces output. Creating a machine learning model involves selecting an algorithm, providing it with data, and tuning hyperparameters. Training is an iterative process that produces a trained model, which encapsulates what the model learned during the training process.
You can bring a model that was trained outside of Azure Machine Learning. Or you can train a model by submitting a run of an experiment to a compute target in Azure Machine Learning. Once you have a model, you register the model in the workspace.
Azure Machine Learning is framework agnostic. When you create a model, you can use any popular machine learning framework, such as Scikit-learn, XGBoost, PyTorch, TensorFlow, and Chainer.
For an example of training a model using Scikit-learn, see Tutorial: Train an image classification model with Azure Machine Learning.
The model registry lets you keep track of all the models in your Azure Machine Learning workspace.
Models are identified by name and version. Each time you register a model with the same name as an existing one, the registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.
When you register the model, you can provide additional metadata tags and then use the tags when you search for models.
TIP: A registered model is a logical container for one or more files that make up your model. For example, if you have a model that is stored in multiple files, you can register them as a single model in your Azure Machine Learning workspace. After registration, you can then download or deploy the registered model and receive all the files that were registered.
You can't delete a registered model that is being used by an active deployment.
For an example of registering a model, see Train an image classification model with Azure Machine Learning.
You deploy a registered model as a service endpoint. You need the following components:
For more information about these components, see Deploy models with Azure Machine Learning.
An endpoint is an instantiation of your model into a web service that can be hosted in the cloud.
When deploying a model as a web service, the endpoint can be deployed on Azure Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from your model, script, and associated files. These are placed into a base container image, which contains the execution environment for the model. The image has a load-balanced, HTTP endpoint that receives scoring requests that are sent to the web service.
You can enable Application Insights telemetry or model telemetry to monitor your web service. The telemetry data is accessible only to you. It's stored in your Application Insights and storage account instances. If you've enabled automatic scaling, Azure automatically scales your deployment.
The following diagram shows the inference workflow for a model deployed as a web service endpoint:
Here are the details:
For an example of deploying a model as a web service, see Deploy an image classification model in Azure Container Instances.
When you deploy a trained model in the designer, you can deploy the model as a real-time endpoint. A real-time endpoint commonly receives a single request via the REST endpoint and returns a prediction in real-time. This is in contrast to batch processing, which processes multiple values at once and saves the results after completion to a datastore.
Pipeline endpoints let you call your ML Pipelines programatically via a REST endpoint. Pipeline endpoints let you automate your pipeline workflows.
A pipeline endpoint is a collection of published pipelines. This logical organization lets you manage and call multiple pipelines using the same endpoint. Each published pipeline in a pipeline endpoint is versioned. You can select a default pipeline for the endpoint, or specify a version in the REST call.
The Azure Machine Learning CLI is an extension to the Azure CLI, a cross-platform command-line interface for the Azure platform. This extension provides commands to automate your machine learning activities.
You use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Each phase can encompass multiple steps, each of which can run unattended in various compute targets.
Pipeline steps are reusable, and can be run without rerunning the previous steps if the output of those steps hasn't changed. For example, you can retrain a model without rerunning costly data preparation steps if the data hasn't changed. Pipelines also allow data scientists to collaborate while working on separate areas of a machine learning workflow.
Azure Machine Learning provides the following monitoring and logging capabilities:
Azure Machine Learning studio provides a web view of all the artifacts in your workspace. You can view results and details of your datasets, experiments, pipelines, models, and endpoints. You can also manage compute resources and datastores in the studio.
The studio is also where you access the interactive tools that are part of Azure Machine Learning:
IMPORTANT: Tools marked (preview) below are currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.