- Make sure to install `openjdk`, `openssh-server`, and `openssh-client`.
- Download the latest version of Hadoop from Apache's website.
- Follow the instructions to set up Hadoop on a single node.
- Use the command `hadoop classpath` to get the class path and add it as a variable in the `complie.sh` file.
- i7-1170GH
- 8 GB RAM
- OpenJDK 8
- Hadoop 3.2.4
When using a networked Hadoop deployment, use the output of `hadoop classpath` from that deployment to compile the program.
- Run the following file to compile the MapReduce program:
  `./complie.sh`
  If it cannot be executed, give it executable permission, e.g. `chmod +x complie.sh`.
- Move the dataset to HDFS using `put`.
- Run the following command to execute the program:
  `hadoop jar kmeans.jar /path/to/dataset /path/to/output number_of_clusters`
- Needs testing at scale: the overhead was too high on a small system with limited processing power and a limited number of nodes.
- Need help finding out how to use a deployed Hadoop environment, as the instructions are very vague.
- Create the Mapper program
- Create the Reducer program
- Using a Reducer to reduce network overhead
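
The three items above can be pictured with a small plain-Java sketch of one iteration. It assumes the program is k-means, as the `kmeans.jar` name and the `number_of_clusters` argument suggest. It uses no Hadoop classes, and every name in it (`KMeansSketch`, `nearestCentroid`, `partialSum`, `newCentroid`) is hypothetical rather than taken from this repository. Reading the last item as a local pre-aggregation step (what Hadoop calls a combiner) is also an assumption: each node would send one partial sum per cluster over the network instead of every point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of one k-means iteration split into
// map / local aggregation / reduce. Not the repository's actual code.
public class KMeansSketch {

    // Map step: index of the centroid closest to the point (squared Euclidean distance).
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Local aggregation: fold the points of one cluster into
    // [sum_0, ..., sum_{dims-1}, count] so only this array needs to be sent on.
    static double[] partialSum(List<double[]> points, int dims) {
        double[] acc = new double[dims + 1];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                acc[d] += p[d];
            }
            acc[dims] += 1;
        }
        return acc;
    }

    // Reduce step: merge the partial sums of one cluster and divide by the total count.
    static double[] newCentroid(List<double[]> partials, int dims) {
        double[] total = new double[dims + 1];
        for (double[] part : partials) {
            for (int i = 0; i <= dims; i++) {
                total[i] += part[i];
            }
        }
        double[] centroid = new double[dims];
        for (int d = 0; d < dims; d++) {
            centroid[d] = total[d] / total[dims];
        }
        return centroid;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        double[][] data = {{1, 1}, {2, 0}, {9, 11}, {11, 9}};

        // Group points by their nearest centroid (the "shuffle" between map and reduce).
        Map<Integer, List<double[]>> byCluster = new HashMap<>();
        for (double[] p : data) {
            int cluster = nearestCentroid(p, centroids);
            byCluster.computeIfAbsent(cluster, k -> new ArrayList<>()).add(p);
        }

        // Aggregate locally, then compute the updated centroid per cluster.
        for (Map.Entry<Integer, List<double[]>> e : byCluster.entrySet()) {
            double[] partial = partialSum(e.getValue(), 2);
            double[] updated = newCentroid(Collections.singletonList(partial), 2);
            System.out.println(e.getKey() + " -> " + Arrays.toString(updated));
        }
    }
}
```

In a real Hadoop job these three methods would roughly correspond to the bodies of the Mapper, Combiner, and Reducer classes, with the centroids distributed to every mapper before each iteration.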