Create an Azure Machine Learning Compute Cluster using Azure Bicep

Learn how to create and manage a compute cluster in your Azure Machine Learning workspace using Azure Bicep.

Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle. To deploy a complete Azure Machine Learning Solution, we need a few components:

An Azure Machine Learning Workspace: The workspace is the top-level resource for all your machine learning activities and a centralized place to view and manage artifacts. Read more about how to deploy a Workspace here.

Azure Machine Learning Compute Instances: A compute instance is a managed cloud workstation for working with your machine learning models. It is a managed virtual machine with some pre-built functionality, so you can focus on your machine learning development environment. Read more about how to deploy Compute Instances here.

Azure Machine Learning Compute Clusters: A compute cluster helps you distribute a training or batch inference process across a cluster of CPU or GPU compute nodes.

This article will review how you can create an Azure Machine Learning compute cluster using Azure Bicep, the Azure domain-specific language (DSL) for deploying resources in Azure.

Prerequisites

We need to have the following:

An Azure Machine Learning workspace. Check how you deploy a workspace here.

Bicep installed.

A user with an owner or contributor role in Azure.

Why do we need a compute cluster?

A compute cluster in Azure Machine Learning is a resource that you can share in your workspace. This compute resource can scale up as needed to handle jobs efficiently and can be a single or a multi-node compute resource.

Jobs on the cluster run in a containerized environment, with your model dependencies packaged in a Docker container.

Creating a cluster is straightforward. It can be shared with colleagues in the same workspace, and it automatically scales up or down based on the number of runs you submit.

You can set the minimum and maximum number of nodes for your compute cluster.
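As a minimal sketch of those scaling limits, the fragment below sets the node range and an idle timeout on a cluster. The cluster name, VM size, node counts, and the two-minute timeout are illustrative assumptions, not values from this article.

```bicep
// Hypothetical parameter values: adjust to your own workspace and cluster.
param workspaceName string
param clusterName string = 'cpu-cluster'

resource cluster 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${clusterName}'
  location: resourceGroup().location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS3_v2'
      scaleSettings: {
        // With a minimum of 0, the cluster can release all nodes when idle.
        minNodeCount: 0
        // Upper bound on nodes when jobs are queued.
        maxNodeCount: 4
        // ISO 8601 duration: how long a node stays idle before it is released.
        nodeIdleTimeBeforeScaleDown: 'PT120S'
      }
    }
  }
}
```

A minimum of 0 nodes means you pay nothing for compute while no jobs are running, at the cost of a short wait while nodes are provisioned for the next job.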

Save some 💵 💵💵

I recommend you take a look at Low-Priority virtual machines to reduce costs.
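A minimal sketch of a low-priority cluster in Bicep might look like the following. It assumes the vmPriority property of the AmlCompute resource; the cluster name, VM size, and node counts are hypothetical.

```bicep
// Hypothetical parameter values: adjust to your own workspace and cluster.
param workspaceName string
param clusterName string = 'lowpri-cluster'

resource lowPriorityCluster 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${clusterName}'
  location: resourceGroup().location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS3_v2'
      // 'Dedicated' is the alternative; low-priority nodes are cheaper
      // but can be preempted when Azure needs the capacity back.
      vmPriority: 'LowPriority'
      scaleSettings: {
        minNodeCount: 0
        maxNodeCount: 3
      }
    }
  }
}
```

Because low-priority nodes can be preempted, this option tends to suit jobs that can tolerate interruption, such as batch workloads that can be resubmitted.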

Another best practice is to schedule your compute instances. This way, you can make them available for your working hours and automatically start and stop the compute instances.

A third option is to leverage reserved instances. This is a discount applied to running VM instances on an hourly basis. You pay for one-year or three-year terms for your virtual machines. Long-term, this might be a good alternative.

Now let’s work on the Azure Bicep file to create an Azure Machine Learning Compute Cluster.

Azure Bicep file to Create an Azure Machine Learning Compute Cluster

Friendly reminder: ensure you have at least an Azure Machine Learning Workspace before using the following Bicep file.

We will grab the Azure Machine Learning Workspace name. We will pass this as a parameter value along with the admin username and password for the compute resources. Lastly, we will include a parameter for the cluster name.

As mentioned before, we can specify the number of nodes for our compute cluster. We will also provide the location, and optionally you can specify the virtual network name and subnet name.

The code below shows the complete Bicep template to create an Azure Machine Learning Compute Cluster.

@description('Specifies the name of the Azure Machine Learning Workspace which will contain this compute.')
param workspaceName string

@description('Specifies the name of the Azure Machine Learning Compute cluster.')
param clusterName string

@description('The minimum number of nodes to use on the cluster. If not specified, defaults to 1.')
param minNodeCount int = 1

@description('The maximum number of nodes to use on the cluster. If not specified, defaults to 3.')
param maxNodeCount int = 3

@description('The location of the Azure Machine Learning Workspace.')
param location string = resourceGroup().location

@description('The name of the administrator user account which can be used to SSH into nodes. It must only contain lower case alphabetic characters [a-z].')
@secure()
param adminUserName string

@description('The password of the administrator user account.')
@secure()
param adminUserPassword string

@description('The size of agent VMs. More details can be found here: https://aka.ms/azureml-vm-details.')
param vmSize string = 'Standard_DS3_v2'

@description('Name of the resource group which holds the VNET to which you want to inject your compute in.')
param vnetResourceGroupName string = ''

@description('Name of the vnet which you want to inject your compute in.')
param vnetName string = ''

@description('Name of the subnet inside the VNET which you want to inject your compute in.')
param subnetName string = ''

// Subnet reference, used only when all three VNET parameters are provided.
var subnet = {
  id: resourceId(vnetResourceGroupName, 'Microsoft.Network/virtualNetworks/subnets', vnetName, subnetName)
}

resource workspaceName_clusterName 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${clusterName}'
  location: location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: vmSize
      scaleSettings: {
        minNodeCount: minNodeCount
        maxNodeCount: maxNodeCount
      }
      userAccountCredentials: {
        adminUserName: adminUserName
        adminUserPassword: adminUserPassword
      }
      // Attach the cluster to a subnet only if the VNET details were supplied.
      subnet: ((!empty(vnetResourceGroupName) && !empty(vnetName) && !empty(subnetName)) ? subnet : null)
    }
  }
}
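As a variation on the template above, here is a hedged sketch that references the workspace with the existing keyword and sets it as the parent of the cluster, instead of concatenating the two names. The symbolic names workspace and cluster and the clusterId output are illustrative, and the credentials and subnet settings are left out for brevity.

```bicep
param workspaceName string
param clusterName string

// Reference the workspace already deployed in this resource group.
resource workspace 'Microsoft.MachineLearningServices/workspaces@2021-01-01' existing = {
  name: workspaceName
}

resource cluster 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  // With a parent set, the name is just the cluster name.
  parent: workspace
  name: clusterName
  location: resourceGroup().location
  properties: {
    computeType: 'AmlCompute'
    properties: {
      vmSize: 'Standard_DS3_v2'
      scaleSettings: {
        minNodeCount: 1
        maxNodeCount: 3
      }
    }
  }
}

// Expose the resource ID of the new cluster for later use.
output clusterId string = cluster.id
```

Declaring the workspace as existing makes the dependency explicit in the template, which can make it easier to read when more child resources are added.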

The code below is the parameters file that we pass on during the deployment process:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {
      "value": "YOUR-WORKSPACE-NAME"
    },
    "adminUserName": {
      "value": "YOUR-ADMIN-USERNAME"
    },
    "adminUserPassword": {
      "value": "YOUR-ADMIN-USER-PASSWORD"
    },
    "clusterName": {
      "value": "YOUR-CLUSTER-NAME"
    }
  }
}

We will use the command below to deploy this Bicep file:

$date = Get-Date -Format "MM-dd-yyyy"
$deploymentName = "AzInsiderDeployment" + "$date"
New-AzResourceGroupDeployment -Name $deploymentName -ResourceGroupName AzInsiderML -TemplateFile .\main.bicep -TemplateParameterFile .\azuredeploy.parameters.json -c

The image below shows the output from this deployment:

Bicep template to create an Azure Machine Learning Compute Cluster.

Now you can go to your Azure Machine Learning Studio and check the compute cluster available, as shown below:

Azure Machine Learning Studio — Compute provisioning and update succeeded.

You can go to the details and validate the node count and the virtual machine size as shown below:

Azure Machine Learning Studio Compute Cluster

I hope you now have a better understanding of how you can create an Azure Machine Learning Compute Cluster using Bicep.

Let me know your comments or feedback.

Join the AzInsider email list here.

-Dave R.

💪Create an Azure Machine Learning Compute Cluster using Azure Bicep was originally published in CodeX on Medium, where people are continuing the conversation by highlighting and responding to this story.
