Commit f4b3c2b

Deploy SqueezeNet 1.0 INT8 model with ONNX Runtime on Azure Cobalt 100
Signed-off-by: odidev <[email protected]>
1 parent fa5f1e8 commit f4b3c2b

File tree: 9 files changed, +414 −0 lines changed
Lines changed: 66 additions & 0 deletions

---
title: Deploy SqueezeNet 1.0 INT8 model with ONNX Runtime on Azure Cobalt 100

minutes_to_complete: 60

who_is_this_for: This Learning Path introduces ONNX deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating ONNX-based applications from x86_64 to Arm with minimal or no changes.

learning_objectives:
  - Provision an Azure Arm64 virtual machine using the Azure console, with Ubuntu as the base image.
  - Create an Azure Linux 3.0 Docker container.
  - Deploy an ONNX-based application inside an Arm-based Azure Linux 3.0 Docker container and an Azure virtual machine created from a custom Azure Linux 3.0 image.
  - Perform ONNX benchmarking inside the container as well as the custom virtual machine.

prerequisites:
  - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
  - A machine with [Docker](/install-guides/docker/) installed.
  - Basic understanding of Python and machine learning concepts.
  - Familiarity with ONNX Runtime and Azure cloud services.

author: Jason Andrews

### Tags
skilllevels: Advanced
subjects: ML
cloud_service_providers: Microsoft Azure

armips:
  - Neoverse

tools_software_languages:
  - Python
  - Docker
  - ONNX Runtime

operatingsystems:
  - Linux

further_reading:
  - resource:
      title: Azure Virtual Machines documentation
      link: https://learn.microsoft.com/en-us/azure/virtual-machines/
      type: documentation
  - resource:
      title: Azure Container Instances documentation
      link: https://learn.microsoft.com/en-us/azure/container-instances/
      type: documentation
  - resource:
      title: ONNX Runtime documentation
      link: https://onnxruntime.ai/docs/
      type: documentation
  - resource:
      title: ONNX (Open Neural Network Exchange) documentation
      link: https://onnx.ai/
      type: documentation
  - resource:
      title: onnxruntime_perf_test tool - ONNX Runtime performance benchmarking
      link: https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#in-code-performance-profiling
      type: documentation

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions

---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 23 additions & 0 deletions

---
title: "About the Cobalt 100 Arm-based processor and ONNX"

weight: 2

layout: "learningpathall"
---

## What is the Cobalt 100 Arm-based processor?

Cobalt 100 is Microsoft's first-generation, in-house Arm-based processor. Designed entirely by Microsoft and based on Arm's Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads, including web and application servers, data analytics, open-source databases, and caching systems. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core to each vCPU, ensuring consistent and predictable performance.

To learn more about Cobalt 100, see the blog [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).

## Introduction to Azure Linux 3.0

Azure Linux 3.0 is Microsoft's in-house, lightweight Linux distribution optimized for running cloud-native workloads on Azure. Designed with performance, security, and reliability in mind, it is fully supported by Microsoft and tailored for containers, microservices, and Kubernetes. With native support for the Arm64 (AArch64) architecture, Azure Linux 3.0 enables efficient execution of workloads on energy-efficient Arm-based infrastructure, making it a strong choice for scalable and cost-effective cloud deployments.

## Introduction to ONNX

ONNX (Open Neural Network Exchange) is an open standard for representing machine learning models, enabling interoperability between different AI frameworks. It allows you to train a model in one framework (such as PyTorch or TensorFlow) and run it using ONNX Runtime for optimized inference.

In this Learning Path, you deploy ONNX on Azure Linux 3.0 (Arm64) and benchmark its performance using the [onnxruntime_perf_test tool](https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html#in-code-performance-profiling).
Lines changed: 52 additions & 0 deletions

---
title: Baseline Testing
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Baseline testing using ONNX Runtime

This test measures the inference latency of ONNX Runtime by timing how long it takes to process a single input using the `squeezenet-int8.onnx` model. It helps evaluate how efficiently the model runs on the target hardware.

Create a file named **baseline.py** with the following code for a baseline test of ONNX:

```python
import onnxruntime as ort
import numpy as np
import time

# Load the quantized SqueezeNet model and prepare a random input tensor
session = ort.InferenceSession("squeezenet-int8.onnx")
input_name = session.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Time a single inference
start = time.time()
outputs = session.run(None, {input_name: data})
end = time.time()

print("Inference time:", end - start)
```

Run the baseline test:

```console
python3 baseline.py
```
You should see output similar to:
```output
Inference time: 0.02060103416442871
```
{{% notice Note %}}Inference time is the amount of time it takes a trained machine learning model to make a prediction (produce output) after receiving input data.

The input tensor has shape (1, 3, 224, 224):
- 1: batch size
- 3: color channels (RGB)
- 224 x 224: image resolution (common for models like SqueezeNet)
{{% /notice %}}

#### Output summary:

- Single inference latency: ~20.6 milliseconds (0.0206 sec) in the example above.
- This shows the initial (cold-start) inference performance of ONNX Runtime on CPU using an optimized INT8 quantized model.
- This demonstrates that the setup is fully working, and ONNX Runtime efficiently executes quantized models on Arm64.
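
A single timed run mixes cold-start overhead (model loading, first-run optimization) with steady-state performance. As a sketch using only the Python standard library, you can discard a few warm-up runs and then average many timed runs. The `benchmark` helper below is hypothetical, not part of the Learning Path; in practice you would pass `lambda: session.run(None, {input_name: data})` as the workload.

```python
import statistics
import time


def benchmark(workload, warmup=5, runs=100):
    """Time a callable: discard warm-up runs, then report mean/p50/max in ms."""
    for _ in range(warmup):
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Example with a dummy workload; replace with the ONNX Runtime session call.
stats = benchmark(lambda: sum(range(10_000)), warmup=2, runs=20)
print(stats)
```

Using `time.perf_counter` rather than `time.time` avoids clock-resolution artifacts for short intervals.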
Lines changed: 122 additions & 0 deletions

---
title: Benchmarking via onnxruntime_perf_test
weight: 7

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Now that you’ve set up and run the ONNX model (SqueezeNet), you can benchmark its inference performance using Python-based timing or tools like **onnxruntime_perf_test**. This helps evaluate the efficiency of ONNX Runtime on Azure Arm64-based Cobalt 100 instances.

You can also compare the inference time between Cobalt 100 (Arm64) and similar D-series x86_64-based virtual machines on Azure.
As noted before, the benchmarking steps are the same whether you use a Docker container or a custom virtual machine.

## Run the performance tests using onnxruntime_perf_test

**onnxruntime_perf_test** is a performance benchmarking tool included in the ONNX Runtime source code. It measures the inference performance of ONNX models under various runtime conditions (such as CPU, GPU, or other execution providers).

### Install Required Build Tools

```console
tdnf install -y cmake make gcc-c++ git
```
#### Install Protobuf

```console
tdnf install -y protobuf protobuf-devel
```
Then verify:
```console
protoc --version
```
You should see output similar to:

```output
libprotoc 3.x.x
```
If installation via the package manager fails, or the version is too old for ONNX Runtime, install Protobuf from the pre-built AArch64 zip artifact, as described below.

#### Install Protobuf with the Prebuilt AArch64 ZIP Artifact

```console
wget https://github.com/protocolbuffers/protobuf/releases/download/v31.1/protoc-31.1-linux-aarch_64.zip -O protoc-31.1.zip
mkdir -p $HOME/tools/protoc-31.1
unzip protoc-31.1.zip -d $HOME/tools/protoc-31.1
echo 'export PATH="$HOME/tools/protoc-31.1/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```

Then verify:
```console
protoc --version
```
You should see output similar to:
```output
libprotoc x.x.x
```

### Clone and Build ONNX Runtime from Source

The benchmarking tool, **onnxruntime_perf_test**, isn’t available as a pre-built binary for any platform, so you have to build it from source, which is expected to take around 40-50 minutes.

Install the required tools and clone onnxruntime:
```console
tdnf install -y protobuf-compiler libprotobuf-dev libprotoc-dev
git clone --recursive https://github.com/microsoft/onnxruntime
cd onnxruntime
```
Now, build the benchmark tool:

```console
./build.sh --config Release --build_dir build/Linux --build_shared_lib --parallel --build --update --skip_tests
```
This builds the benchmark tool at ./build/Linux/Release/onnxruntime_perf_test.

### Run the benchmark

Now that the benchmarking tool has been built, you can benchmark the **squeezenet-int8.onnx** model:

```console
./build/Linux/Release/onnxruntime_perf_test -e cpu -r 100 -m times -s -Z -I <path-to-squeezenet-int8.onnx>
```

- **-e cpu**: Use the CPU execution provider (not GPU or any other backend).
- **-r 100**: Run 100 inferences.
- **-m times**: Use "repeat N times" mode.
- **-s**: Show detailed statistics.
- **-Z**: Disable intra-op thread spinning (reduces CPU usage when idle between runs).
- **-I**: Use the ONNX model path directly, without input/output test data.

### Benchmark summary on x86_64

The following benchmark results were collected on two different x86_64 environments: a **Docker container running Azure Linux 3.0 hosted on a D4s_v6 Ubuntu-based Azure virtual machine**, and a **D4s_v4 Azure virtual machine created from the Azure Linux 3.0 image published by Ntegral Inc**.

| **Metric**                 | **Value on Docker Container** | **Value on Virtual Machine** |
|----------------------------|-------------------------------|------------------------------|
| **Average Inference Time** | 1.4713 ms                     | 1.8961 ms                    |
| **Throughput**             | 679.48 inferences/sec         | 527.25 inferences/sec        |
| **CPU Utilization**        | 100%                          | 95%                          |
| **Peak Memory Usage**      | 39.8 MB                       | 36.1 MB                      |
| **P50 Latency**            | 1.4622 ms                     | 1.8709 ms                    |
| **Max Latency**            | 2.3384 ms                     | 2.7826 ms                    |
| **Latency Consistency**    | Consistent                    | Consistent                   |

### Benchmark summary on Arm64

The following benchmark results were collected on two different Arm64 environments: a **Docker container running Azure Linux 3.0 hosted on a D4ps_v6 Ubuntu-based Azure virtual machine**, and a **D4ps_v6 Azure virtual machine created from a custom Azure Linux 3.0 image built with the AArch64 ISO**.

| **Metric**                 | **Value on Docker Container** | **Value on Virtual Machine** |
|----------------------------|-------------------------------|------------------------------|
| **Average Inference Time** | 1.9183 ms                     | 1.9169 ms                    |
| **Throughput**             | 521.09 inferences/sec         | 521.41 inferences/sec        |
| **CPU Utilization**        | 98%                           | 100%                         |
| **Peak Memory Usage**      | 35.36 MB                      | 33.57 MB                     |
| **P50 Latency**            | 1.9165 ms                     | 1.9168 ms                    |
| **Max Latency**            | 2.0142 ms                     | 1.9979 ms                    |
| **Latency Consistency**    | Consistent                    | Consistent                   |

### Highlights from Azure Linux Arm64 Benchmarking (ONNX Runtime with SqueezeNet)

- **Low-latency inference:** Consistent average inference times of ~1.92 ms across both Docker and virtual machine environments on Arm64.
- **Strong and stable throughput:** Sustained throughput of over 521 inferences/sec using the squeezenet-int8.onnx model on D4ps_v6 instances.
- **Lightweight resource footprint:** Peak memory usage stayed below 36 MB, with CPU utilization reaching ~98-100%, ideal for efficient edge or cloud inference.
- **Consistent performance:** P50 and max latency remained tightly bound across both setups, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.
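
As a sanity check on the tables above, single-stream throughput and average latency should roughly satisfy throughput ≈ 1000 / average latency (in ms), since each inference must finish before the next begins. A small sketch:

```python
def throughput_from_latency(avg_latency_ms: float) -> float:
    """Single-stream throughput (inferences/sec) implied by an average latency in ms."""
    return 1000.0 / avg_latency_ms


# Arm64 Docker container figure from the table above: 1.9183 ms average latency
implied = throughput_from_latency(1.9183)
print(f"{implied:.2f} inferences/sec")  # close to the measured 521.09
```

Small gaps between the implied and measured values are expected, since the tool also accounts for per-run overhead outside the model execution itself.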
Lines changed: 32 additions & 0 deletions

---
title: Setup Azure Linux 3.0 Environment
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

You can choose between working with the Azure Linux 3.0 Docker image or inside a virtual machine created from the OS image.

### Working inside an Azure Linux 3.0 Docker container

The Azure Linux Container Host is an operating system image optimized for running container workloads on Azure Kubernetes Service (AKS). Microsoft maintains the Azure Linux Container Host and based it on CBL-Mariner, an open-source Linux distribution created by Microsoft. To learn more about Azure Linux 3.0, see [What is Azure Linux Container Host for AKS](https://learn.microsoft.com/en-us/azure/azure-linux/intro-azure-linux).

Azure Linux 3.0 supports AArch64. However, a standalone virtual machine image of Azure Linux 3.0 (CBL-Mariner 3.0) is not available for Arm. Hence, to use the default software stack provided by Microsoft, you can create a Docker container with Azure Linux 3.0 as the base image and run the ONNX application inside the container.

#### Create an Azure Linux 3.0 Docker Container

The [Microsoft Artifact Registry](https://mcr.microsoft.com/en-us/artifact/mar/azurelinux/base/core/about) offers an updated Docker image for Azure Linux 3.0.

To create a Docker container, install Docker and then run:

```console
sudo docker run -it --rm mcr.microsoft.com/azurelinux/base/core:3.0
```
The default container startup command is bash. tdnf and dnf are the default package managers.
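
If you prefer a reproducible image over an interactive container, the run command above can be captured in a Dockerfile. The following is a hypothetical sketch, not part of this Learning Path: the `python3`/`python3-pip` package names in the tdnf repositories and the availability of an aarch64 `onnxruntime` wheel on PyPI are assumptions you should verify for your image version.

```dockerfile
# Hypothetical sketch: bake Python and ONNX Runtime into an Azure Linux 3.0 image.
FROM mcr.microsoft.com/azurelinux/base/core:3.0

# Package names are assumptions for the tdnf repositories.
RUN tdnf install -y python3 python3-pip && tdnf clean all

# Install ONNX Runtime from PyPI (assumes an aarch64 wheel is available).
RUN pip3 install onnxruntime numpy

WORKDIR /app
COPY baseline.py squeezenet-int8.onnx /app/
CMD ["python3", "baseline.py"]
```

Building and running such an image (`docker build -t onnx-azl3 . && docker run --rm onnx-azl3`) gives you a repeatable environment for the benchmarking steps later in this Learning Path.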

### Working with the Azure Linux 3.0 OS image

As of now, the Azure Marketplace offers official virtual machine images of Azure Linux 3.0 only for x64-based architectures, published by Ntegral Inc.; native Arm64 (AArch64) images are not yet officially available. Hence, for this Learning Path, you can create your own custom Azure Linux 3.0 virtual machine image for AArch64 using the [AArch64 ISO for Azure Linux 3.0](https://github.com/microsoft/azurelinux#iso).

Whether you're using an Azure Linux 3.0 Docker container or a virtual machine created from a custom Azure Linux 3.0 image, the deployment and benchmarking steps remain the same.

Once the setup is complete, you can proceed with the ONNX installation ahead.
Lines changed: 33 additions & 0 deletions

---
title: Create an Arm-based cloud virtual machine using the Microsoft Cobalt 100 CPU
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Introduction

There are several ways to create an Arm-based Cobalt 100 virtual machine: the Microsoft Azure console, the Azure CLI tool, or your choice of IaC (Infrastructure as Code) tooling. This guide uses the Azure console to create a virtual machine with an Arm-based Cobalt 100 processor.

This Learning Path focuses on the general-purpose virtual machines of the D series. See the Microsoft Azure guide on the [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series).

If you have never used the Microsoft Cloud Platform before, review the Microsoft [guide to create a Linux virtual machine in the Azure portal](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).

#### Create an Arm-based Azure Virtual Machine

Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine, launch the Azure portal and navigate to Virtual Machines.

Select “Create”, and fill in details such as Name and Region. Choose the image for your virtual machine (for example, Ubuntu 24.04) and select “Arm64” as the virtual machine architecture.

In the “Size” field, click “See all sizes”, select the D-Series v6 family of virtual machines, choose “D4ps_v6” from the list, and create the virtual machine.

![Instance Screenshot](./instance.png)

Once the virtual machine is ready and running, you can SSH into it using the PEM key, along with the public IP details.

{{% notice Note %}}

To learn more about Arm-based virtual machines in Azure, refer to “Getting Started with Microsoft Azure” in [Get started with Arm-based cloud instances](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/azure).

{{% /notice %}}
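
The portal steps above can also be scripted with the Azure CLI. The following is a hedged sketch, not part of this Learning Path: the resource group and virtual machine names are placeholders, and the `--image` alias and `--size` value should be checked against the current `az vm create` documentation and regional availability before use.

```console
# Hypothetical names; adjust region, resource group, and credentials as needed.
az group create --name onnx-lp-rg --location eastus

az vm create \
  --resource-group onnx-lp-rg \
  --name cobalt100-vm \
  --image Ubuntu2404 \
  --size Standard_D4ps_v6 \
  --admin-username azureuser \
  --generate-ssh-keys
```

On success, the CLI prints the public IP address, which you can use to SSH into the virtual machine as in the portal flow.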
