---
title: Deploy Apache Spark on Google Axion C4A virtual machine

draft: true
cascade:
draft: true

minutes_to_complete: 60

who_is_this_for: This is an introductory topic for software developers who want to migrate their Apache Spark workloads from x86_64 platforms to Arm-based platforms, specifically Google Axion-based C4A virtual machines.

learning_objectives:
- Provision an Arm virtual machine on the Google Cloud Platform using the C4A Google Axion instance family, and RHEL 9 as the base image.
- Understand how to install and configure Apache Spark on Arm-based GCP C4A instances.
- Validate the functionality of Apache Spark through baseline testing.
- Perform benchmarking to evaluate Apache Spark’s performance on Arm.

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled.
- Basic understanding of Linux command line.
- Familiarity with distributed computing concepts and the [Apache Spark architecture](https://spark.apache.org/docs/latest/).

author: Jason Andrews

##### Tags
skilllevels: Advanced
subjects: Performance and Architecture
cloud_service_providers: Google Cloud

armips:
- Neoverse

tools_software_languages:
- Apache Spark
- Python

operatingsystems:
- Linux

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
further_reading:
- resource:
title: Google Cloud official website and documentation
link: https://cloud.google.com/docs
type: documentation

- resource:
title: Spark official website and documentation
link: https://spark.apache.org/
type: documentation

- resource:
title: The Scala programming language official website
link: https://scala-lang.org
type: website


weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # Indicates this should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
---
title: "About Google Axion C4A series and Apache Spark"

weight: 2

layout: "learningpathall"
---

## Google Axion C4A series

The Google Axion C4A series is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which uses Arm Neoverse V2 cores. Designed for high-performance, energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.

To learn more about Google Axion, refer to the blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).

## Apache Spark

Apache Spark is an open-source, distributed computing system designed for fast and general-purpose big data processing.

It provides high-level APIs in Java, Scala, Python, and R, and supports in-memory computation for increased performance.

Spark is widely used for large-scale data analytics, machine learning, and real-time data processing. Learn more from the [Apache Spark official website](https://spark.apache.org/) and its [detailed official documentation](https://spark.apache.org/docs/latest/).
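
Spark’s high-level API mirrors the `map`/`flatMap`/`groupBy` style of ordinary Scala collections, which is why pipelines can be prototyped locally before being distributed. The sketch below is a plain-Scala illustration of that style (it does not require Spark; on a Spark RDD the `groupBy`/`size` step would typically be `reduceByKey`):

```scala
// Plain-Scala sketch of the word-count pattern that Spark's RDD API
// parallelizes across a cluster. No Spark installation is needed here.
val lines = Seq("spark on arm", "spark on axion")

// On an RDD this would be flatMap + map + reduceByKey; on a local
// collection, groupBy plays the reduceByKey role.
val wordCounts = lines
  .flatMap(_.split("\\s+"))
  .groupBy(identity)
  .map { case (word, occurrences) => (word, occurrences.size) }

println(wordCounts.toSeq.sortBy(_._1).mkString(", "))
```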
---
title: Baseline Testing
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---


With Apache Spark installed on your GCP C4A Arm virtual machine, you can now run a simple baseline test to validate that Spark works correctly and produces the expected output.

## Spark Baseline Test

Create a simple Spark job file:
```console
nano ~/spark_baseline_test.scala
```
Add the following content to the **spark_baseline_test.scala** file:

```scala
val data = Seq(1, 2, 3, 4, 5)
val distData = spark.sparkContext.parallelize(data)

// Basic transformation and action
val squared = distData.map(x => x * x).collect()

println("Squared values: " + squared.mkString(", "))
```
### Code explanation

This is a basic Apache Spark example in Scala: it creates an RDD (Resilient Distributed Dataset), applies a transformation, and collects the results.

What it does, step by step:

- **val data = Seq(1, 2, 3, 4, 5)** : Creates a local Scala sequence of integers.
- **val distData = spark.sparkContext.parallelize(data)** : Uses parallelize to convert the local sequence into a distributed RDD (so Spark can operate on it in parallel across cluster nodes or CPU cores).
- **val squared = distData.map(x => x * x).collect()** : `map(x => x * x)` squares each element in the list, `.collect()` brings all the transformed data back to the driver program as a regular Scala collection.
- **println("Squared values: " + squared.mkString(", "))** : Prints the squared values, joined by commas.
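
Because `map` and `collect` mirror plain Scala collection operations, you can sanity-check the expected result locally, without Spark. This sketch runs the same lambda on a local `Seq`; the RDD version simply distributes that computation:

```scala
// Local check of what the Spark job should print: square each element
// of Seq(1..5) and join the results with commas.
val data = Seq(1, 2, 3, 4, 5)
val squared = data.map(x => x * x)   // same lambda as the RDD map

println("Squared values: " + squared.mkString(", "))
// → Squared values: 1, 4, 9, 16, 25
```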


### Run the Test in Spark Shell

Run the test in the interactive shell:
```console
spark-shell < ~/spark_baseline_test.scala
```
You should see output similar to:
```output
Squared values: 1, 4, 9, 16, 25
```
This confirms that Spark is working correctly with its driver, executor, and cluster manager in local mode.
