This project implements matrix multiplication using Apache Spark in Scala.
It demonstrates multiplication of both Coordinate Matrices (COO format) and Block Matrices, with two implementations each:
- Using `cogroup`
- Using `join` (commented as alternatives in the code)
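The cogroup approach can be sketched with plain Scala collections (no Spark needed), with `groupBy` playing the role of `cogroup` and a final per-cell sum playing the role of `reduceByKey`. The function name and shapes here are illustrative, not the project's actual API:

```scala
// Pure-Scala sketch (no Spark) of the cogroup-style COO multiply.
// Entries are ((row, col), value).
def cooMultiplyLocal(
    m: Seq[((Int, Int), Double)],
    n: Seq[((Int, Int), Double)]): Map[(Int, Int), Double] = {
  // Group M by its column index j and N by its row index j,
  // mirroring cogroup's "gather both sides by key" behavior.
  val mByCol = m.groupBy { case ((_, j), _) => j }
  val nByRow = n.groupBy { case ((j, _), _) => j }
  val products = for {
    j            <- (mByCol.keySet intersect nByRow.keySet).toSeq
    ((i, _), mv) <- mByCol(j)
    ((_, k), nv) <- nByRow(j)
  } yield ((i, k), mv * nv)
  // Sum the partial products per output cell, like reduceByKey.
  products.groupBy(_._1).map { case (ik, vs) => ik -> vs.map(_._2).sum }
}
```

On Spark the same shape applies, with each `groupBy`/`for` step replaced by the corresponding RDD transformation.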
```
Spark-Matrix-Multiplication/
├── MatrixMultiplication.scala   # Scala code with functions + small demo
├── README.md                    # Project documentation
└── .gitignore                   # Ignore unnecessary files
```
- Matrix multiplication for small and large matrices
- Support for coordinate and block representations
- Example with 16,384 x 16,384 sparse matrices
- Runs on Databricks, Spark Shell, or any Spark environment
- Apache Spark (2.4+ or 3.x)
- Scala (2.11 or 2.12 depending on your Spark version)
- Optional: Databricks Community Edition for notebooks
In the Spark shell:

```
:load MatrixMultiplication.scala
```

Or in Databricks: copy the contents of MatrixMultiplication.scala into a notebook cell and run it.
```scala
val resultCoo = COOMatrixMultiply(M_RDD_Small, N_RDD_Small)
resultCoo.collect.foreach(println)
```

Example Input Matrices
```
M = [ [1, 2],
      [3, 4] ]

N = [ [5, 6],
      [7, 8] ]
```
Expected Output

```
((0,0),19.0)
((0,1),22.0)
((1,0),43.0)
((1,1),50.0)
```
```scala
val resultBlock = BlockMatrixMultiply(M_RDD_Block, N_RDD_Block, blockSize)
resultBlock.collect.foreach(println)
```

You can adjust the blockSize parameter (e.g., 2, 4, 8).
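The blockSize parameter determines which block each coordinate entry lands in. A minimal sketch of that mapping, using integer division for the block index and modulo for the position within the block; the `((blockRow, blockCol), (localRow, localCol, value))` layout is an assumption for illustration, not necessarily the project's exact representation:

```scala
// Sketch: map a COO entry (i, j, v) to its block key plus its
// position inside that block, for a given blockSize.
def toBlockEntry(i: Int, j: Int, v: Double,
                 blockSize: Int): ((Int, Int), (Int, Int, Double)) =
  ((i / blockSize, j / blockSize), (i % blockSize, j % blockSize, v))
```

Grouping entries by this block key is what lets block multiplication operate on whole sub-matrices instead of individual cells.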
```scala
val R_RDD_Coo_large = COOMatrixMultiply(M_RDD_Coo_large, N_RDD_Coo_large)
println(R_RDD_Coo_large.count())

val R_RDD_Block_large = BlockMatrixMultiply(M_RDD_Block_large, N_RDD_Block_large, blockSize = 8)
println(R_RDD_Block_large.count())
```

Note: multiplying 16,384 x 16,384 matrices requires significant cluster resources.
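Experiments at this scale need synthetic input. A hedged sketch of a sparse COO generator whose output could feed `sc.parallelize`; the function name and parameters are illustrative, not part of the project:

```scala
// Sketch: generate nnz random COO entries for a dim x dim matrix.
// Duplicate (i, j) pairs are possible here; a real generator might
// deduplicate or sum duplicates before multiplying.
def randomSparseCoo(dim: Int, nnz: Int,
                    seed: Long): Seq[((Int, Int), Double)] = {
  val rnd = new scala.util.Random(seed)
  Seq.fill(nnz)(((rnd.nextInt(dim), rnd.nextInt(dim)), rnd.nextDouble()))
}
```

A fixed seed keeps runs reproducible when comparing the COO and block implementations on the same input.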
- Two implementations are provided: `cogroup` (used by default) and `join` (commented out in the code). You can switch by uncommenting.
- The functions are generic: you can replace the sample matrices with your own RDDs.
- For large-scale experiments, consider tuning partitioning and block size.
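The commented-out join alternative can likewise be sketched locally: re-key both matrices on the shared dimension j, pair entries with matching keys (as a join would), then sum the partial products. Names and signatures are illustrative:

```scala
// Pure-Scala sketch (no Spark) of the join-style COO multiply.
def cooMultiplyJoinLocal(
    m: Seq[((Int, Int), Double)],
    n: Seq[((Int, Int), Double)]): Map[(Int, Int), Double] = {
  // Re-key both sides on the shared dimension j, as a join would.
  val mByCol = m.map { case ((i, j), v) => (j, (i, v)) }
  val nByRow = n.map { case ((j, k), w) => (j, (k, w)) }
  val joined = for {
    (j1, (i, v)) <- mByCol
    (j2, (k, w)) <- nByRow
    if j1 == j2                       // the join condition
  } yield ((i, k), v * w)
  joined.groupBy(_._1).map { case (ik, vs) => ik -> vs.map(_._2).sum }
}
```

Both variants compute the same result; on a cluster they differ mainly in how the shuffle is organized.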
Created by Yashwin Bangalore Subramani