Commit 43534e8

Run Spark applications on the Microsoft Azure Cobalt 100 processors
Signed-off-by: odidev <[email protected]>
1 parent 2ce7d6e commit 43534e8

9 files changed

+513
-0
lines changed

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
---
2+
title: Run Spark applications on the Microsoft Azure Cobalt 100 processors
3+
4+
minutes_to_complete: 60
5+
6+
who_is_this_for: This Learning Path introduces Spark deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating Spark applications from x86_64 to Arm with minimal or no changes.
7+
8+
learning_objectives:
9+
- Provision an Azure Arm64 virtual machine using the Azure console, with Ubuntu as the base image.
10+
- Learn how to create an Azure Linux 3.0 Docker container.
11+
- Deploy a Spark application inside an Azure Linux 3.0 Arm64-based Docker container and an Azure Linux 3.0 custom-image based Azure virtual machine.
12+
- Perform Spark benchmarking inside the container as well as the custom virtual machine.
13+
14+
prerequisites:
15+
- A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
16+
- A machine with [Docker](/install-guides/docker/) installed.
17+
- Familiarity with distributed computing concepts and the [Apache Spark architecture](https://spark.apache.org/docs/latest/).
18+
19+
author: Jason Andrews
20+
21+
### Tags
22+
skilllevels: Advanced
23+
subjects: Performance and Architecture
24+
cloud_service_providers: Microsoft Azure
25+
26+
armips:
27+
- Neoverse
28+
29+
tools_software_languages:
30+
- Apache Spark
31+
- Python
32+
- Docker
33+
34+
35+
operatingsystems:
36+
- Linux
37+
38+
further_reading:
39+
- resource:
40+
title: Azure Virtual Machines documentation
41+
link: https://learn.microsoft.com/en-us/azure/virtual-machines/
42+
type: documentation
43+
- resource:
44+
title: Azure Container Instances documentation
45+
link: https://learn.microsoft.com/en-us/azure/container-instances/
46+
type: documentation
47+
- resource:
48+
title: Docker overview
49+
link: https://docs.docker.com/get-started/overview/
50+
type: documentation
51+
- resource:
52+
title: Spark official website and documentation
53+
link: https://spark.apache.org/
54+
type: documentation
55+
- resource:
56+
title: Hadoop official website
57+
link: https://hadoop.apache.org/
58+
type: website
59+
60+
61+
### FIXED, DO NOT MODIFY
62+
# ================================================================================
63+
weight: 1 # _index.md always has weight of 1 to order correctly
64+
layout: "learningpathall" # All files under learning paths have this same wrapper
65+
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
66+
---
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
---
2+
# ================================================================================
3+
# FIXED, DO NOT MODIFY THIS FILE
4+
# ================================================================================
5+
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
6+
title: "Next Steps" # Always the same, html page title.
7+
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
8+
---
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
---
2+
title: "About the Cobalt 100 Arm-based processor and Apache Spark"
3+
4+
weight: 2
5+
6+
layout: "learningpathall"
7+
---
8+
9+
## What is the Cobalt 100 Arm-based processor?
10+
11+
Cobalt 100 is Microsoft's first-generation, in-house Arm-based processor, and it powers Azure's Cobalt 100 virtual machines. Designed by Microsoft and based on the Arm Neoverse N2 architecture, this 64-bit CPU targets performance and energy efficiency across a broad range of cloud-native, scale-out Linux workloads, including web and application servers, data analytics, open-source databases, and caching systems. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core to each vCPU, which gives consistent and predictable performance.
12+
13+
To learn more about Cobalt 100, refer to the blog post [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
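
Once you are logged in to a virtual machine or container, a quick way to confirm that it is really running on a 64-bit Arm CPU is to inspect the machine string the operating system reports. The following is a minimal sketch using only the Python standard library. The helper name `is_arm64` and the set `ARM64_NAMES` are illustrative and not part of this Learning Path; the sketch assumes the platform reports `aarch64` (typical on Linux) or `arm64`.

```python
import os
import platform

# Machine strings commonly reported for 64-bit Arm (assumption: Linux reports
# "aarch64"; some other platforms report "arm64").
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True when a platform.machine() string names a 64-bit Arm CPU."""
    return machine.strip().lower() in ARM64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    print(f"architecture: {machine} (Arm64: {is_arm64(machine)})")
    # os.cpu_count() reports logical CPUs; on Cobalt 100 each vCPU maps to a physical core.
    print(f"logical CPUs: {os.cpu_count()}")
```

Running this on an x86_64 development machine and again on the Cobalt 100 virtual machine makes the architecture difference explicit before you deploy Spark.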
14+
15+
## Introduction to Azure Linux 3.0
16+
17+
Azure Linux 3.0 is Microsoft's in-house, lightweight Linux distribution optimized for running cloud-native workloads on Azure. Designed with performance, security, and reliability in mind, it is fully supported by Microsoft and tailored for containers, microservices, and Kubernetes. With native support for Arm64 (AArch64) architecture, Azure Linux 3.0 enables efficient execution of workloads on energy-efficient Arm-based infrastructure, making it a powerful choice for scalable and cost-effective cloud deployments.
18+
19+
## Apache Spark
20+
21+
Apache Spark is an open-source, distributed computing system designed for fast and general-purpose big data processing.
22+
23+
It provides high-level APIs in Java, Scala, Python, and R, and supports in-memory computation for increased performance.
24+
25+
Spark is widely used for large-scale data analytics, machine learning, and real-time data processing. Learn more from the [Apache Spark official website](https://spark.apache.org/) and its [detailed official documentation](https://spark.apache.org/docs/latest/).
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
---
2+
title: Baseline Testing
3+
weight: 6
4+
5+
### FIXED, DO NOT MODIFY
6+
layout: learningpathall
7+
---
8+
9+
10+
## Baseline Testing
11+
With Apache Spark installed on your Arm virtual machine, run a simple baseline test to validate that Spark runs correctly and produces the expected output.
12+
13+
Create a file named `test_spark.py` and add the following PySpark script to it:
14+
15+
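```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session for this test application.
spark = SparkSession.builder.appName("Test").getOrCreate()
# Build a two-row DataFrame and print it to confirm that jobs execute.
df = spark.createDataFrame([(1, "ARM64"), (2, "Azure")], ["id", "name"])
df.show()
# Shut down the session cleanly.
spark.stop()
```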
22+
Execute with:
23+
```console
spark-submit test_spark.py
```
26+
You should see output similar to:
27+
28+
```output
25/07/22 05:16:00 INFO CodeGenerator: Code generated in 10.545923 ms
25/07/22 05:16:00 INFO SparkContext: SparkContext is stopping with exitCode 0.
+---+-----+
| id| name|
+---+-----+
|  1|ARM64|
|  2|Azure|
+---+-----+
```
38+
Output summary:
39+
40+
- Spark generated code for the query (in about **10.5 ms**) and executed a simple DataFrame operation.
- It displayed the test rows **(1, "ARM64")** and **(2, "Azure")** and shut down cleanly **(exitCode 0)**, which confirms a working Spark deployment on Arm64.
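
If you want to automate this check rather than read the output by eye, one option is to parse the table that `df.show()` prints and compare it with the expected rows. The sketch below uses only the Python standard library. The function `parse_show_table` is a hypothetical helper, not part of Spark or of this Learning Path, and it assumes cell values do not themselves contain the `|` character.

```python
def parse_show_table(text: str) -> list[list[str]]:
    """Extract rows (header first) from a DataFrame.show() table embedded in log text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and data rows start and end with "|"; border rows start with "+".
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    return rows


# Sample text copied from the baseline output shown earlier.
sample = """\
25/07/22 05:16:00 INFO CodeGenerator: Code generated in 10.545923 ms
+---+-----+
| id| name|
+---+-----+
|  1|ARM64|
|  2|Azure|
+---+-----+
"""

header, *data = parse_show_table(sample)
assert header == ["id", "name"]
assert data == [["1", "ARM64"], ["2", "Azure"]]
print("baseline output matches")
```

To use it against a real run, capture the output of `spark-submit test_spark.py` to a file and pass the file contents to `parse_show_table` in place of `sample`.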
