Spark Matrix Multiplication (Scala)

This project implements matrix multiplication using Apache Spark in Scala.
It demonstrates multiplication of both coordinate matrices (COO format) and block matrices, with two implementations of each:

  • Using cogroup (the default)
  • Using join (provided as a commented-out alternative in the code)
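Both approaches hinge on the same dataflow: pair each entry (i, j, m) of M with every entry (j, k, n) of N on the shared index j, multiply, and sum the products per output cell (i, k). As a hedged illustration of that idea (function and variable names are mine, not from the repo), here is the same computation with plain Scala collections in place of Spark's cogroup:

```scala
// Sketch of COO multiplication keyed on the shared dimension j.
// M entries are (i, j, value); N entries are (j, k, value).
def cooMultiply(
    m: Seq[(Int, Int, Double)],
    n: Seq[(Int, Int, Double)]): Map[(Int, Int), Double] = {
  val mByJ = m.groupBy(_._2)                    // key M entries by column index j
  val nByJ = n.groupBy(_._1)                    // key N entries by row index j
  val products = for {
    j          <- (mByJ.keySet intersect nByJ.keySet).toSeq
    (i, _, mv) <- mByJ(j)                       // every M entry in column j
    (_, k, nv) <- nByJ(j)                       // paired with every N entry in row j
  } yield ((i, k), mv * nv)
  // Sum the partial products for each output cell (i, k)
  products.groupBy(_._1).map { case (ik, ps) => ik -> ps.map(_._2).sum }
}
```

In the Spark version, the two `groupBy` calls correspond to keying the RDDs by j and cogrouping (or joining) them, and the final summation corresponds to a `reduceByKey(_ + _)`.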

📂 Project Structure

Spark-Matrix-Multiplication/
├── MatrixMultiplication.scala   # Scala code with functions + small demo
├── README.md                    # Project documentation
└── .gitignore                   # Ignore unnecessary files

⚡ Features

  • Matrix multiplication for small and large matrices
  • Support for coordinate and block representations
  • Example with 16,384 x 16,384 sparse matrices
  • Runs on Databricks, Spark Shell, or any Spark environment

🔧 Requirements

  • Apache Spark (2.4+ or 3.x)
  • Scala (2.11 or 2.12 depending on your Spark version)
  • Optional: Databricks Community Edition for notebooks

▶️ How to Run

1. Load the code

In Spark shell:

:load MatrixMultiplication.scala

Or in Databricks:
Copy the contents of MatrixMultiplication.scala into a notebook cell and run it.


2. Run Coordinate Matrix Multiplication

val resultCoo = COOMatrixMultiply(M_RDD_Small, N_RDD_Small)
resultCoo.collect.foreach(println)

Example Input Matrices

M = [ [1, 2],
      [3, 4] ]

N = [ [5, 6],
      [7, 8] ]

Expected Output

((0,0),19.0)
((0,1),22.0)
((1,0),43.0)
((1,1),50.0)

3. Run Block Matrix Multiplication

val resultBlock = BlockMatrixMultiply(M_RDD_Block, N_RDD_Block, blockSize)
resultBlock.collect.foreach(println)

You can adjust the blockSize parameter (e.g., 2, 4, 8).
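Block multiplication follows the same pattern one level up: the matrix is tiled into bs × bs blocks, block (p, q) of M is joined with block (q, r) of N on the shared block index q, the block pairs are multiplied with ordinary dense multiplication, and the partial result blocks are summed per (p, r). A minimal sketch with plain Scala maps standing in for the block RDDs (names are illustrative, not the repo's):

```scala
// Dense multiply of two row-major bs x bs blocks.
def multiplyBlocks(a: Array[Double], b: Array[Double], bs: Int): Array[Double] = {
  val c = new Array[Double](bs * bs)
  for (i <- 0 until bs; q <- 0 until bs; j <- 0 until bs)
    c(i * bs + j) += a(i * bs + q) * b(q * bs + j)
  c
}

// Matrices stored as (blockRow, blockCol) -> row-major bs x bs array.
def blockMultiply(
    m: Map[(Int, Int), Array[Double]],
    n: Map[(Int, Int), Array[Double]],
    bs: Int): Map[(Int, Int), Array[Double]] = {
  val partials = for {
    ((p, q), mBlk)  <- m.toSeq
    ((q2, r), nBlk) <- n.toSeq if q2 == q     // join on the shared block index
  } yield ((p, r), multiplyBlocks(mBlk, nBlk, bs))
  // Element-wise sum of the partial blocks for each output block (p, r)
  partials.groupBy(_._1).map { case (pr, blks) =>
    pr -> blks.map(_._2).reduce((x, y) => x.zip(y).map(t => t._1 + t._2))
  }
}
```

In Spark the join-on-q and the per-(p, r) summation become a keyed join (or cogroup) followed by `reduceByKey`, which is why blockSize matters: larger blocks mean fewer, heavier shuffle records.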


4. Run Large Matrices (optional)

val R_RDD_Coo_large = COOMatrixMultiply(M_RDD_Coo_large, N_RDD_Coo_large)
println(R_RDD_Coo_large.count())

val R_RDD_Block_large = BlockMatrixMultiply(M_RDD_Block_large, N_RDD_Block_large, blockSize = 8)
println(R_RDD_Block_large.count())

⚠️ Multiplying 16,384 x 16,384 matrices requires significant cluster resources.


📘 Notes

  • Two implementations are provided: cogroup (used by default) and join (commented in code). You can switch by uncommenting.
  • The functions are generic: you can replace the sample matrices with your own RDDs.
  • For large-scale experiments, consider tuning partitioning and block size.
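For large-scale experiments you also need test data. One possible way to produce random sparse COO entries to feed into your own RDDs (a hypothetical helper, not part of the repo):

```scala
import scala.util.Random

// Generate up to `nnz` random entries of an n x n sparse matrix in COO form.
// Duplicate coordinates are collapsed, keeping the last value generated.
def randomCooEntries(n: Int, nnz: Int, seed: Long): Seq[(Int, Int, Double)] = {
  val rnd = new Random(seed)
  Seq.fill(nnz)((rnd.nextInt(n), rnd.nextInt(n), rnd.nextDouble()))
    .map { case (i, j, v) => ((i, j), v) }
    .toMap                                    // collapse duplicate (i, j) keys
    .map { case ((i, j), v) => (i, j, v) }
    .toSeq
}
```

The resulting sequence can be turned into an RDD with `sc.parallelize(...)` and passed to the multiplication functions in place of the sample matrices.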

👨‍💻 Author

Created by Yashwin Bangalore Subramani
