Using Bicep to create and attach an Azure Kubernetes Cluster to Azure Machine Learning

Learn how to leverage Bicep to create an Azure Kubernetes Cluster for Azure Machine Learning.

If you are already using Azure Machine Learning Service, you can deploy a trained model to Azure Kubernetes Service.

Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle.

In previous articles, I described how to use Bicep to deploy all the components needed to use Azure Machine Learning. You need to have:

- An Azure Machine Learning Workspace
- Azure Machine Learning Compute nodes to use for your development environment
- Compute Clusters to use for submitting training runs
- Your model. Check how you can train TensorFlow models here.

You can learn about the architecture and concepts for Azure Machine Learning here.

The image below shows a high-level reference architecture of Azure Machine Learning:

MS Docs — Azure Machine Learning Architecture

This article aims to show you how you can deploy an Azure Kubernetes Cluster into your Azure Machine Learning Service.

To deploy the Azure Kubernetes Cluster, we have two options: we can create a new Azure Kubernetes Cluster from the Azure Machine Learning Workspace or attach an existing Azure Kubernetes Cluster.

For the sake of this use case, we will create a new Azure Kubernetes Cluster.
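
For comparison, attaching an existing cluster could look roughly like the sketch below. This is an illustration under assumptions, not part of the template used in this article: existingAksName is a hypothetical parameter, the cluster is assumed to live in the same resource group as the workspace, and the managedClusters API version is only an example. The idea is that the compute resource points its resourceId at the existing cluster instead of describing a new one.

```bicep
// Sketch only: attach an AKS cluster that already exists instead of creating one.
// existingAksName is a hypothetical parameter introduced for this illustration.
param workspaceName string
param computeName string
param existingAksName string
param location string = resourceGroup().location

// Reference the existing cluster (assumed to be in the same resource group).
// The API version here is an assumption; use one available in your subscription.
resource aks 'Microsoft.ContainerService/managedClusters@2021-05-01' existing = {
  name: existingAksName
}

resource attachedCompute 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${computeName}'
  location: location
  properties: {
    computeType: 'AKS'
    // Pointing resourceId at an existing cluster attaches it rather than provisioning a new one.
    resourceId: aks.id
  }
}
```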

We will perform the following:

1. Create an Azure Machine Learning Workspace.
2. Create a new Azure Kubernetes Cluster and attach it to the Workspace.

Create an Azure Machine Learning Workspace.

We will use the code below to create our Azure Machine Learning Workspace. You can refer to this article for the detailed process.

Here’s the complete Bicep code to deploy an Azure Machine Learning Workspace:

@description('Specifies the name of the deployment.')
param name string

@description('Specifies the name of the environment.')
param environment string

@description('Specifies the location of the Azure Machine Learning workspace and dependent resources.')
param location string = resourceGroup().location

@description('Specifies whether to reduce telemetry collection and enable additional encryption.')
param hbi_workspace bool = false

var tenantId = subscription().tenantId
var storageAccountName_var = 'st${name}${environment}'
var keyVaultName_var = 'kv-${name}-${environment}'
var applicationInsightsName_var = 'appi-${name}-${environment}'
var containerRegistryName_var = 'cr${name}${environment}'
var workspaceName_var = 'mlw${name}${environment}'
var storageAccount = storageAccountName.id
var keyVault = keyVaultName.id
var applicationInsights = applicationInsightsName.id
var containerRegistry = containerRegistryName.id

resource storageAccountName 'Microsoft.Storage/storageAccounts@2021-01-01' = {
  name: storageAccountName_var
  location: location
  sku: {
    name: 'Standard_RAGRS'
  }
  kind: 'StorageV2'
  properties: {
    encryption: {
      services: {
        blob: {
          enabled: true
        }
        file: {
          enabled: true
        }
      }
      keySource: 'Microsoft.Storage'
    }
    supportsHttpsTrafficOnly: true
  }
}

resource keyVaultName 'Microsoft.KeyVault/vaults@2021-04-01-preview' = {
  name: keyVaultName_var
  location: location
  properties: {
    tenantId: tenantId
    sku: {
      name: 'standard'
      family: 'A'
    }
    accessPolicies: []
    enableSoftDelete: true
  }
}

resource applicationInsightsName 'Microsoft.Insights/components@2020-02-02' = {
  name: applicationInsightsName_var
  location: (((location == 'eastus2') || (location == 'westcentralus')) ? 'southcentralus' : location)
  kind: 'web'
  properties: {
    Application_Type: 'web'
  }
}

resource containerRegistryName 'Microsoft.ContainerRegistry/registries@2019-12-01-preview' = {
  sku: {
    name: 'Standard'
  }
  name: containerRegistryName_var
  location: location
  properties: {
    adminUserEnabled: true
  }
}

resource workspaceName 'Microsoft.MachineLearningServices/workspaces@2021-07-01' = {
  identity: {
    type: 'SystemAssigned'
  }
  name: workspaceName_var
  location: location
  properties: {
    friendlyName: workspaceName_var
    storageAccount: storageAccount
    keyVault: keyVault
    applicationInsights: applicationInsights
    containerRegistry: containerRegistry
    hbiWorkspace: hbi_workspace
  }
  dependsOn: [
    storageAccountName
    keyVaultName
    applicationInsightsName
    containerRegistryName
  ]
}

In the above code, we define the following resources:

- Storage Account
- Key Vault
- Application Insights
- Container Registry
- Machine Learning Service Workspace

Now we will deploy the above Bicep file to create our Azure Machine Learning workspace using the command below:

$date = Get-Date -Format "MM-dd-yyyy"
$deploymentName = "AzInsiderDeployment" + "$date"
New-AzResourceGroupDeployment -Name $deploymentName -ResourceGroupName AzInsiderML -TemplateFile .\main.bicep -TemplateParameterFile .\azuredeploy.parameters.json -c

Note that we add the -c flag (short for -Confirm) to get a preview of our deployment. Once the preview looks correct, we can confirm and execute it.

Azure Machine Learning Workspace — Deployment preview

The figure below shows the deployment output:

Azure Machine Learning Workspace — Deployment output

The figure below shows the resources created:

Azure Machine Learning Workspace deployment

The next step is to create a new Azure Kubernetes Cluster and attach it to the Workspace.

Create a new Azure Kubernetes Cluster and attach it to the Workspace

We will need to pass on two parameter values: the name of the Azure Machine Learning Workspace and the name of the Azure Kubernetes Cluster.

The code below shows the definition of the parameters file which we will pass on during deployment time:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {
      "value": "YOUR-WORKSPACE-NAME"
    },
    "computeName": {
      "value": "COMPUTE-NAME"
    }
  }
}

Now we will define the Bicep template. In this file, we will define the following parameters:

@description('Specifies the name of the Azure Machine Learning compute (the AKS cluster) to create.')
param computeName string

@description('The IP address assigned to the Kubernetes DNS service. It must be within the service address range specified in serviceCidr.')
param dnsServiceIP string = ''

@description('Name of the resource group which holds the VNET to which you want to inject your compute in.')
param vnetResourceGroupName string = ''

@description('Name of the vnet which you want to inject your compute in.')
param vnetName string = ''

@description('Name of the subnet inside the VNET which you want to inject your compute in.')
param subnetName string = ''

@description('The location of the AKS compute.')
param location string = resourceGroup().location

@description('The IP range, in CIDR notation, assigned to the Docker bridge network.')
param dockerBridgeCidr string = ''

@description('The IP range, in CIDR notation, from which service cluster IPs are assigned.')
param serviceCidr string = ''

@description('The Azure VM size of the agent VM nodes. This cannot be changed once the cluster is created.')
param agentVmSize string = 'Standard_D4_v3'

@description('The number of agent nodes in the Container Service.')
param agentCount int = 6

@description('The SSL cert data in PEM format encoded as base64 string.')
param cert string = ''

@description('The SSL key data in PEM format encoded as base64 string.')
param key string = ''

@description('The CName of the cert.')
param cname string = ''

@description('The leaf domain label of public endpoint.')
param leafDomainLabel string = ''

@description('Value indicating whether to overwrite existing domain label.')
param overwriteExistingDomain bool = false

@description('Value indicating whether to renew certificate.')
param renew bool = false

@allowed([
  'Enabled'
  'Disabled'
  'Auto'
])
@description('SSL status. Allowed values are Enabled, Disabled, and Auto.')
param sslStatus string = 'Disabled'

@description('Specifies the name of the Azure Machine Learning workspace to attach the compute to.')
param workspaceName string

Now we will define two variables:

- A variable for the networking configuration
- A variable for the SSL configuration

var aksNetworkingConfiguration = {
  subnetId: resourceId(vnetResourceGroupName, 'Microsoft.Network/virtualNetworks/subnets', vnetName, subnetName)
  serviceCidr: serviceCidr
  dnsServiceIP: dnsServiceIP
  dockerBridgeCidr: dockerBridgeCidr
}
var sslConfiguration = {
  status: sslStatus
  cert: cert
  key: key
  cname: cname
  leafDomainLabel: leafDomainLabel
  overwriteExistingDomain: overwriteExistingDomain
  renew: renew
}
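
The template passes each of these objects to the cluster only under a condition: the networking object when every VNET-related parameter is filled in, and the SSL object when sslStatus is not 'Disabled'. A minimal sketch of those two conditions as named booleans is shown below. The variable and output names (useCustomNetworking, sslConfigured, networkingInjected, sslApplied) are illustrative and do not appear in the original template.

```bicep
// Sketch only: the two conditions the template relies on, written as named booleans.
param vnetResourceGroupName string = ''
param vnetName string = ''
param subnetName string = ''
param serviceCidr string = ''
param dnsServiceIP string = ''
param dockerBridgeCidr string = ''
param sslStatus string = 'Disabled'

// True only when every networking value has been supplied.
var useCustomNetworking = !empty(vnetResourceGroupName) && !empty(vnetName) && !empty(subnetName) && !empty(serviceCidr) && !empty(dnsServiceIP) && !empty(dockerBridgeCidr)

// True for any status other than 'Disabled'.
var sslConfigured = sslStatus != 'Disabled'

// Surfacing the flags as outputs makes it easy to check them in a test deployment.
output networkingInjected bool = useCustomNetworking
output sslApplied bool = sslConfigured
```

With the default (empty) parameter values, both flags are false, so neither optional configuration block is applied to the cluster.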

Lastly, we will define the Azure Kubernetes Cluster:

resource workspaceName_computeName 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${computeName}'
  location: location
  properties: {
    computeType: 'AKS'
    properties: {
      agentVmSize: agentVmSize
      agentCount: agentCount
      sslConfiguration: ((sslStatus == 'Disabled') ? json('null') : sslConfiguration)
      aksNetworkingConfiguration: (((!empty(vnetResourceGroupName)) && (!empty(vnetName)) && (!empty(subnetName)) && (!empty(serviceCidr)) && (!empty(dnsServiceIP)) && (!empty(dockerBridgeCidr))) ? aksNetworkingConfiguration : json('null'))
    }
  }
}

Note that we provide the name of the Workspace and the properties that define the Azure Kubernetes Cluster.

A new resource group will be created and will contain all the resources related to the Azure Kubernetes Cluster.
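
The compute is a child resource of the workspace, which is why its name is written as workspace name, slash, compute name. Bicep also lets you express that relationship with an existing reference and the parent property. The sketch below is an alternative under assumptions, not the template used in this article; the symbolic names workspace and aksCompute are illustrative, and the optional SSL and networking settings are left out for brevity.

```bicep
// Sketch only: the same compute declared with an explicit parent instead of a composed name.
param workspaceName string
param computeName string
param location string = resourceGroup().location
param agentVmSize string = 'Standard_D4_v3'
param agentCount int = 6

// Reference the workspace deployed earlier; nothing is created by this declaration.
resource workspace 'Microsoft.MachineLearningServices/workspaces@2021-07-01' existing = {
  name: workspaceName
}

// With parent set, name holds only the child segment.
resource aksCompute 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  parent: workspace
  name: computeName
  location: location
  properties: {
    computeType: 'AKS'
    properties: {
      agentVmSize: agentVmSize
      agentCount: agentCount
    }
  }
}
```

Either form should produce the same resource; the parent form mainly makes the dependency on the workspace explicit in the file.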

Here’s the complete Bicep template to create and attach an Azure Kubernetes Cluster to Azure Machine Learning:

@description('Specifies the name of the Azure Machine Learning compute (the AKS cluster) to create.')
param computeName string

@description('The IP address assigned to the Kubernetes DNS service. It must be within the service address range specified in serviceCidr.')
param dnsServiceIP string = ''

@description('Name of the resource group which holds the VNET to which you want to inject your compute in.')
param vnetResourceGroupName string = ''

@description('Name of the vnet which you want to inject your compute in.')
param vnetName string = ''

@description('Name of the subnet inside the VNET which you want to inject your compute in.')
param subnetName string = ''

@description('The location of the AKS compute.')
param location string = resourceGroup().location

@description('The IP range, in CIDR notation, assigned to the Docker bridge network.')
param dockerBridgeCidr string = ''

@description('The IP range, in CIDR notation, from which service cluster IPs are assigned.')
param serviceCidr string = ''

@description('The Azure VM size of the agent VM nodes. This cannot be changed once the cluster is created.')
param agentVmSize string = 'Standard_D4_v3'

@description('The number of agent nodes in the Container Service.')
param agentCount int = 6

@description('The SSL cert data in PEM format encoded as base64 string.')
param cert string = ''

@description('The SSL key data in PEM format encoded as base64 string.')
param key string = ''

@description('The CName of the cert.')
param cname string = ''

@description('The leaf domain label of public endpoint.')
param leafDomainLabel string = ''

@description('Value indicating whether to overwrite existing domain label.')
param overwriteExistingDomain bool = false

@description('Value indicating whether to renew certificate.')
param renew bool = false

@allowed([
  'Enabled'
  'Disabled'
  'Auto'
])
@description('SSL status. Allowed values are Enabled, Disabled, and Auto.')
param sslStatus string = 'Disabled'

@description('Specifies the name of the Azure Machine Learning workspace to attach the compute to.')
param workspaceName string

var aksNetworkingConfiguration = {
  subnetId: resourceId(vnetResourceGroupName, 'Microsoft.Network/virtualNetworks/subnets', vnetName, subnetName)
  serviceCidr: serviceCidr
  dnsServiceIP: dnsServiceIP
  dockerBridgeCidr: dockerBridgeCidr
}
var sslConfiguration = {
  status: sslStatus
  cert: cert
  key: key
  cname: cname
  leafDomainLabel: leafDomainLabel
  overwriteExistingDomain: overwriteExistingDomain
  renew: renew
}

resource workspaceName_computeName 'Microsoft.MachineLearningServices/workspaces/computes@2021-01-01' = {
  name: '${workspaceName}/${computeName}'
  location: location
  properties: {
    computeType: 'AKS'
    properties: {
      agentVmSize: agentVmSize
      agentCount: agentCount
      sslConfiguration: ((sslStatus == 'Disabled') ? json('null') : sslConfiguration)
      aksNetworkingConfiguration: (((!empty(vnetResourceGroupName)) && (!empty(vnetName)) && (!empty(subnetName)) && (!empty(serviceCidr)) && (!empty(dnsServiceIP)) && (!empty(dockerBridgeCidr))) ? aksNetworkingConfiguration : json('null'))
    }
  }
}

Now we will deploy this resource using the command below:

New-AzResourceGroupDeployment -Name $deploymentName -ResourceGroupName AzInsiderML -TemplateFile .\main.bicep -TemplateParameterFile .\azuredeploy.parameters.json -c

Note that we add the -c flag (short for -Confirm) to get a preview of our deployment. Once the preview looks correct, we can confirm and execute it.

Azure Machine Learning- Azure Kubernetes Cluster— Deployment preview

The creation of the Azure Kubernetes Cluster might take a few minutes. Then you should see the resources in a new Resource Group and a notification in the Azure Machine Learning Studio Portal.

The figure below shows the output from the deployment:

Azure Kubernetes Cluster — Deployment output

The figure below shows the notification in Azure Machine Learning Studio that a new Azure Kubernetes Cluster has been created.

Azure Machine Learning Studio- Compute provisioning succeeded.

If you open this notification, you will see the details about the allocation, with the compute type shown as Kubernetes service.

Azure Kubernetes Cluster in Azure Machine Learning Studio

You can also verify that the resources related to the Azure Kubernetes Cluster are located in a different resource group.

Azure Resource Group — Azure Kubernetes Cluster resources

Hope this helps you automate your Azure Machine Learning compute resources.

Join the AzInsider email list here.

-Dave R.

Using Bicep to create and attach an Azure Kubernetes Cluster to Azure Machine Learning was originally published in MLearning.ai on Medium.
