This repository is a toy deployment of Apache Spark in standalone mode using docker-compose, with MinIO as the storage backend, Apache Iceberg as the table format, and Iceberg metadata managed on the local filesystem of the master node.
The objective of the repository is to demonstrate what Spark is capable of, as part of a presentation for the university course "Distributed Computing Systems".
To deploy the Spark master, two worker nodes, and the required MinIO backend, first add the needed JAR files to the extra-jars/ directories by running ./scripts/download-jars.sh from the root of the repo (curl must be installed for the script to work).
After that, you should be able to run docker compose up, access http://localhost:8080 to see the Spark UI with two workers active, and access http://localhost:9000 to see the MinIO UI (log in with minio-spark-example as both the username and password).
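
Once the cluster is up, a Spark job connects to the Iceberg catalog and to MinIO through a handful of configuration properties. The snippet below is only a minimal sketch of such a session: the catalog name (demo), warehouse bucket, endpoint, and credentials are illustrative assumptions, and for simplicity it places the whole warehouse in MinIO, whereas this repo keeps Iceberg metadata on the master's local filesystem. The real values are set by the compose file and scripts in the repo.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-minio-sketch")
    # Iceberg catalog backed by a Hadoop-style warehouse (catalog name and path are assumed)
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://warehouse/")
    # S3A settings pointing at the MinIO container (endpoint and keys are assumed)
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio-spark-example")
    .config("spark.hadoop.fs.s3a.secret.key", "minio-spark-example")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)
```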
The dataset used contains UK housing prices from 1995 to 2023; the objective is to run some queries over it, such as different aggregations by region and date, as sketched below.
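
For example, an aggregation by region and year could look like the following PySpark sketch. The table and column names (demo.housing.uk_prices, region, price, transfer_date) are assumptions for illustration, not necessarily the names used by the repo's job scripts.

```python
from pyspark.sql import functions as F

# Hypothetical table and column names; adjust to the actual dataset schema.
prices = spark.table("demo.housing.uk_prices")

avg_by_region_year = (
    prices
    .withColumn("year", F.year("transfer_date"))
    .groupBy("region", "year")
    .agg(
        F.avg("price").alias("avg_price"),
        F.count("*").alias("sales"),
    )
    .orderBy("region", "year")
)
avg_by_region_year.show()
```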
To set up the Iceberg table in MinIO, use ./scripts/prepare.sh (run as root if docker is not in rootless mode).
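
Conceptually, that preparation step amounts to reading the raw dataset and writing it out as an Iceberg table. The sketch below shows what such a step might look like in PySpark; the CSV path and table name are assumptions, so check scripts/prepare.sh for the actual logic.

```python
# Read the raw CSV (assumed upload location in MinIO) and materialize it as an Iceberg table.
raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("s3a://raw-data/uk-housing-prices.csv")
)

raw.writeTo("demo.housing.uk_prices").using("iceberg").createOrReplace()
```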
To send the Python code to be executed by the workers, use ./scripts/execute.sh (run as root if docker is not in rootless mode).
- Lucas Eduardo Gulka Pulcinelli