KMeans Clustering using MapReduce Framework

Steps to Start the System on a Local Machine 💻

  1. Make sure to install openjdk, openssh-server, and openssh-client

  2. Download the latest version of Hadoop from Apache's website (this project was tested with Hadoop 3.2.4).

  3. Follow the instructions to set up Hadoop on a single node.

  4. Use the command hadoop classpath to get the classpath and add it as a variable in the complie.sh file (see the sketch after this list).
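
A minimal sketch of steps 1 and 4 on a Debian/Ubuntu machine (the package names and the HADOOP_CLASSPATH variable name are assumptions, not taken from this repository):

# Install the prerequisites (Debian/Ubuntu package names assumed)
sudo apt-get install openjdk-8-jdk openssh-server openssh-client

# Capture Hadoop's classpath so the compile script can reference it
export HADOOP_CLASSPATH=$(hadoop classpath)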

System used for Development and Testing

  • i7-1170GH
  • 8 GB RAM
  • OpenJDK 8
  • Hadoop 3.2.4

For deployed Hadoop environments 🌐

When using a networked, deployed Hadoop environment, use the output of hadoop classpath from that deployment to compile the program.
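
A minimal sketch, assuming the compile script reads the classpath from a shell variable (the variable name is an assumption):

# Run this on the deployed cluster so the classpath reflects that deployment
export HADOOP_CLASSPATH=$(hadoop classpath)
./complie.sh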

Running MapReduce

  1. Run the following script to compile the MapReduce program

./complie.sh

If it cannot be executed, grant it executable permission, e.g. chmod +x complie.sh
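
The actual contents of complie.sh are not reproduced here; a typical compile script for a Hadoop MapReduce job looks roughly like the sketch below (the source file KMeans.java, the main class name, and the classes directory are placeholders):

#!/bin/bash
# Hypothetical compile script; file, class, and directory names are placeholders
CLASSPATH=$(hadoop classpath)
mkdir -p classes
javac -classpath "$CLASSPATH" -d classes KMeans.java
# Package the classes into kmeans.jar with a placeholder entry point
jar cvfe kmeans.jar KMeans -C classes .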

  2. Copy the dataset to HDFS using hdfs dfs -put (see the example after the command below)

  3. Run the following command to execute the program

hadoop jar kmeans.jar /path/to/dataset /path/to/output number_of_clusters
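
For example, with a hypothetical dataset file points.txt and three clusters (all paths and file names below are illustrative, not taken from the repository):

# Copy the dataset into HDFS
hdfs dfs -mkdir -p /user/$USER/kmeans/input
hdfs dfs -put points.txt /user/$USER/kmeans/input

# Run the job with 3 clusters (the output directory must not already exist)
hadoop jar kmeans.jar /user/$USER/kmeans/input /user/$USER/kmeans/output 3

# Inspect the reducer output (standard Hadoop part file name; the contents depend on the program)
hdfs dfs -cat /user/$USER/kmeans/output/part-r-00000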

Possible Improvements

  1. Needs testing at scale: on a small system with limited processing power and only a few nodes, the framework overhead outweighed the gains.

  2. Need help working out how to use an already-deployed Hadoop environment, as the available instructions are very vague.

Tasks ✅

  • Create the Mapper Program

  • Create the Reducer Program

  • Use the Reducer as a Combiner to reduce network overhead
