ArmDeveloperEcosystem · pareenaverma · Aug 19, 2025 · Jul 22, 2025
diff --git a/content/learning-paths/servers-and-cloud-computing/spark-on-azure/_index.md b/content/learning-paths/servers-and-cloud-computing/spark-on-azure/_index.md
@@ -0,0 +1,66 @@
+---
+title: Run Spark applications on the Microsoft Azure Cobalt 100 processors
+
+minutes_to_complete: 60
+
+who_is_this_for: This Learning Path introduces Spark deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating Spark applications from x86_64 to Arm with minimal or no changes.
+
+learning_objectives: 
+    - Provision an Azure Arm64 virtual machine using the Azure console, with Ubuntu as the base image.
+    - Create an Azure Linux 3.0 Docker container.
+    - Deploy a Spark application inside an Azure Linux 3.0 Arm64-based Docker container and an Azure Linux 3.0 custom-image based Azure virtual machine.
+    - Perform Spark benchmarking inside the container as well as the custom virtual machine.
+
+prerequisites:
+    - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
+    - A machine with [Docker](/install-guides/docker/) installed.
+    - Familiarity with distributed computing concepts and the [Apache Spark architecture](https://spark.apache.org/docs/latest/).
+
+author: Jason Andrews
+
+### Tags
+skilllevels: Advanced
+subjects: Performance and Architecture
+cloud_service_providers: Microsoft Azure
+
+armips:
+    - Neoverse
+
+tools_software_languages:
+    - Apache Spark
+    - Python
+    - Docker
+
+
+operatingsystems:
+    - Linux
+
+further_reading:
+  - resource:
+      title: Azure Virtual Machines documentation
+      link: https://learn.microsoft.com/en-us/azure/virtual-machines/
+      type: documentation
+  - resource:
+      title: Azure Container Instances documentation
+      link: https://learn.microsoft.com/en-us/azure/container-instances/
+      type: documentation
+  - resource:
+      title: Docker overview
+      link: https://docs.docker.com/get-started/overview/
+      type: documentation
+  - resource:
+      title: Spark official website and documentation
+      link: https://spark.apache.org/
+      type: documentation
+  - resource:
+      title: Hadoop official website
+      link: https://hadoop.apache.org/
+      type: website
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1                       # _index.md always has weight of 1 to order correctly
+layout: "learningpathall"       # All files under learning paths have this same wrapper
+learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/spark-on-azure/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/spark-on-azure/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+#       FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/spark-on-azure/background.md b/content/learning-paths/servers-and-cloud-computing/spark-on-azure/background.md
@@ -0,0 +1,25 @@
+---
+title: "About the Cobalt 100 Arm-based processor and Apache Spark"
+
+weight: 2
+
+layout: "learningpathall"
+---
+
+## What is the Cobalt 100 Arm-based processor?
+
+Azure's Cobalt 100 virtual machines are built on Microsoft's first-generation, in-house Arm-based processor: the Cobalt 100. Designed entirely by Microsoft and based on Arm's Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. These include web and application servers, data analytics, open-source databases, caching systems, and more. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.
+
+To learn more about Cobalt 100, refer to the blog [Announcing the preview of new Azure VMs based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
+
+## Introduction to Azure Linux 3.0
+
+Azure Linux 3.0 is Microsoft's in-house, lightweight Linux distribution optimized for running cloud-native workloads on Azure. Designed with performance, security, and reliability in mind, it is fully supported by Microsoft and tailored for containers, microservices, and Kubernetes. With native support for the Arm64 (AArch64) architecture, Azure Linux 3.0 runs workloads efficiently on Arm-based infrastructure, making it a strong choice for scalable and cost-effective cloud deployments.
+
+## Apache Spark
+
+Apache Spark is an open-source, distributed computing system designed for fast and general-purpose big data processing.
+
+It provides high-level APIs in Java, Scala, Python, and R, and supports in-memory computation for increased performance.
+
+Spark is widely used for large-scale data analytics, machine learning, and real-time data processing. Learn more from the [Apache Spark official website](https://spark.apache.org/) and its [detailed official documentation](https://spark.apache.org/docs/latest/).
diff --git a/content/learning-paths/servers-and-cloud-computing/spark-on-azure/baseline.md b/content/learning-paths/servers-and-cloud-computing/spark-on-azure/baseline.md
@@ -0,0 +1,41 @@
+---
+title: Baseline Testing
+weight: 6
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+
+## Baseline Testing
+Now that Apache Spark is installed on your Arm virtual machine, run a simple baseline test to validate that Spark runs correctly and produces the expected output.
+
+To run a simple PySpark script, create a file named `test_spark.py` and add the following content:
+
+```python
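+from pyspark.sql import SparkSession
+
+# Start (or reuse) a Spark session for this test
+spark = SparkSession.builder.appName("Test").getOrCreate()
+# Build a two-row DataFrame and print it as a table
+df = spark.createDataFrame([(1, "ARM64"), (2, "Azure")], ["id", "name"])
+df.show()
+# Shut down the session cleanly
+spark.stop()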
+```
+Execute with:
+```console
+spark-submit test_spark.py
+```
+You should see output similar to:
+
+```output
+25/07/22 05:16:00 INFO CodeGenerator: Code generated in 10.545923 ms
+25/07/22 05:16:00 INFO SparkContext: SparkContext is stopping with exitCode 0.
++---+-----+
+| id| name|
++---+-----+
+|  1|ARM64|
+|  2|Azure|
++---+-----+
+```
+Output summary:
+
+- Spark generated code in about **10.5 ms** and executed a simple DataFrame operation.
+- It displayed the test rows **[1, "ARM64"]** and **[2, "Azure"]** and then shut down cleanly **(exitCode 0)**, which confirms a working Spark deployment on Arm64.
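
The baseline check in `baseline.md` relies on reading the `df.show()` table by eye. As a rough sketch, the same check could be automated by parsing the captured output and comparing it with the expected rows. The helper names `parse_show_table` and `check_baseline` are hypothetical and not part of Spark or of this Learning Path. The sample text is copied from the expected output shown in the diff.

```python
# Hypothetical helper (not part of Spark): validates the table printed by
# df.show() in test_spark.py against the rows the script is expected to emit.

EXPECTED_HEADER = ("id", "name")
EXPECTED_ROWS = [("1", "ARM64"), ("2", "Azure")]

# Sample text copied from the expected output of `spark-submit test_spark.py`.
SAMPLE_OUTPUT = """\
25/07/22 05:16:00 INFO CodeGenerator: Code generated in 10.545923 ms
25/07/22 05:16:00 INFO SparkContext: SparkContext is stopping with exitCode 0.
+---+-----+
| id| name|
+---+-----+
|  1|ARM64|
|  2|Azure|
+---+-----+
"""


def parse_show_table(text):
    """Return (header, rows) parsed from an ASCII table in `text`.

    Lines starting with '|' are treated as table lines: the first one is
    the header and the rest are data rows. All cells are returned as strings.
    """
    table_lines = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    parsed = [tuple(cell.strip() for cell in ln.strip("|").split("|")) for ln in table_lines]
    if not parsed:
        return None, []
    return parsed[0], parsed[1:]


def check_baseline(text):
    """Return True if the table in `text` matches the expected header and rows."""
    header, rows = parse_show_table(text)
    return header == EXPECTED_HEADER and rows == EXPECTED_ROWS


print(check_baseline(SAMPLE_OUTPUT))  # prints True for the sample above
```

To apply this to a real run, one option is to redirect the script's output to a file (for example `spark-submit test_spark.py > out.txt`) and pass the file contents to `check_baseline`. This assumes the table is written to standard output, which may depend on your logging configuration.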