- Part of: Build on Asean 2021 Competition Project
- Task: College Major Recommendation System
Unitics (University Analytics) is designed to help high school students who want to pursue higher education at an Indonesian state university. It addresses three problems: lack of self-understanding (following others or trends rather than personal interests and talents), lack of relevant information (many majors to choose from but no idea which one is suitable), and difficulty making decisions.
SNMPTN 2016 and 2017 data from HaloKampus formed a dataset of accepted students' scores at 9 Indonesian state universities. Scores from six subjects were used to train a Science Multi-Layer Perceptron (MLP) model, and scores from seven subjects were used to train a Social MLP model. The models achieved 94.57% accuracy for science (174 majors) and 89.79% accuracy for social (70 majors).
This project consists of 4 parts:
- Data Preprocessing
- Exploratory Data Analysis
- Machine Learning Modeling - Multi-Layer Perceptron / Dense Neural Network
- College Major Prediction and Recommendation
The Machine Learning part is divided into two models:
- Multi-Layer Perceptron (MLP) for Science Major
- Multi-Layer Perceptron (MLP) for Social Major
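As a rough illustration of what an MLP classifier computes, the sketch below runs one forward pass in plain Python and then picks the most probable majors as recommendations. It is not the project's actual model: the weights are random and untrained, the hidden layer size of 64 is hypothetical, the input size of 30 assumes six subjects times five semesters, and the `top_k` recommendation step is an assumption about how predictions could be turned into suggestions.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw outputs into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an untrained placeholder)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(scores, layers):
    """ReLU on every hidden layer, softmax on the output layer."""
    activations = scores
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def top_k(probabilities, k=3):
    """Indices of the k most probable classes, most probable first."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return order[:k]


rng = random.Random(0)
# Assumptions: 30 inputs (6 subjects x 5 semesters), 64 hidden units.
# 174 is the number of science majors stated in this README.
N_FEATURES, N_HIDDEN, N_MAJORS = 30, 64, 174
layers = [init_layer(N_FEATURES, N_HIDDEN, rng),
          init_layer(N_HIDDEN, N_MAJORS, rng)]

student = [rng.uniform(70, 100) for _ in range(N_FEATURES)]  # made-up scores
probabilities = forward(student, layers)
recommended = top_k(probabilities, k=3)  # class indices, not major names
```

In practice a deep learning library would handle training and inference; the point here is only the shape of the computation, from a score vector to one probability per major.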
Selected Attributes:
- Jurusan Sekolah (School Major): The high school major of the students, "IPA" for Science major and "IPS" for Social major
- Jurusan Diterima (Accepted Major)
- PTN Diterima (Accepted University)
- Mat Sem 1, Mat Sem 2, Mat Sem 3, Mat Sem 4, Mat Sem 5 (1st up to 5th semester Mathematics subject score)
- Ing Sem 1, Ing Sem 2, Ing Sem 3, Ing Sem 4, Ing Sem 5 (1st up to 5th semester English subject score)
- Ind Sem 1, Ind Sem 2, Ind Sem 3, Ind Sem 4, Ind Sem 5 (1st up to 5th semester Indonesian subject score)
- Fis Sem 1, Fis Sem 2, Fis Sem 3, Fis Sem 4, Fis Sem 5 (1st up to 5th semester Physics subject score)
- Kim Sem 1, Kim Sem 2, Kim Sem 3, Kim Sem 4, Kim Sem 5 (1st up to 5th semester Chemistry subject score)
- Bio Sem 1, Bio Sem 2, Bio Sem 3, Bio Sem 4, Bio Sem 5 (1st up to 5th semester Biology subject score)
- Eko Sem 1, Eko Sem 2, Eko Sem 3, Eko Sem 4, Eko Sem 5 (1st up to 5th semester Economics subject score)
- Geo Sem 1, Geo Sem 2, Geo Sem 3, Geo Sem 4, Geo Sem 5 (1st up to 5th semester Geography subject score)
- Sos Sem 1, Sos Sem 2, Sos Sem 3, Sos Sem 4, Sos Sem 5 (1st up to 5th semester Sociology subject score)
- Sej Sem 1, Sej Sem 2, Sej Sem 3, Sej Sem 4, Sej Sem 5 (1st up to 5th semester History subject score)
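The score columns above follow a regular `<subject> Sem <n>` pattern, so the feature lists for the two models can be generated rather than typed out. The sketch below is one way to do that. The split of subjects between the two models is an assumption inferred from the subject counts (six for science, seven for social), not something this README states explicitly, and all variable and function names are hypothetical.

```python
# Subject abbreviations as they appear in the attribute list.
SHARED = ["Mat", "Ing", "Ind"]              # Mathematics, English, Indonesian
SCIENCE_ONLY = ["Fis", "Kim", "Bio"]        # Physics, Chemistry, Biology
SOCIAL_ONLY = ["Eko", "Geo", "Sos", "Sej"]  # Economics, Geography, Sociology, History
SEMESTERS = range(1, 6)                     # semesters 1 through 5


def score_columns(subjects):
    """Build the 'Xxx Sem N' column names for the given subjects."""
    return [f"{subject} Sem {sem}" for subject in subjects for sem in SEMESTERS]


# Assumption: each model uses the shared subjects plus its own track's subjects.
SCIENCE_FEATURES = score_columns(SHARED + SCIENCE_ONLY)  # 6 subjects
SOCIAL_FEATURES = score_columns(SHARED + SOCIAL_ONLY)    # 7 subjects
TARGET = "Jurusan Diterima"                              # accepted major


def features_for(row):
    """Pick the feature values for one student record (a dict keyed by column).

    The track is read from 'Jurusan Sekolah': 'IPA' is science, 'IPS' is social.
    """
    columns = SCIENCE_FEATURES if row["Jurusan Sekolah"] == "IPA" else SOCIAL_FEATURES
    return [float(row[column]) for column in columns]
```

With a data frame library, the same lists could be used directly to select columns, for example indexing with `SCIENCE_FEATURES + [TARGET]`.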
Non-Selected Attributes:
- Nama Panggilan (Nickname)
- Asal Sekolah (School Origin)
- Jenis Sekolah (School Type)
- Jenis Kelas (Class Type, acceleration / regular class)
- Akreditasi Sekolah (School Accreditation)
- Prestasi Sekolah (School Achievement)
- Alumni Sekolah (School Alumni)
- PTN Pilihan I (1st university choice)
- Jurusan Pilihan I di PTN Pilihan I (1st chosen major on 1st university choice)
- Jurusan Pilihan II di PTN Pilihan I (2nd chosen major on 1st university choice)
- PTN Pilihan II (2nd university choice)
- Jurusan Pilihan I di PTN Pilihan II (1st chosen major on 2nd university choice)
- Jurusan Pilihan II di PTN Pilihan II (2nd chosen major on 2nd university choice)
- Kom Sem 1, Kom Sem 2, Kom Sem 3, Kom Sem 4, Kom Sem 5 (1st up to 5th semester Computer subject score)
- Peringkat Kelas Sem 1 (1st semester class rank)
- Peringkat Kelas Sem 2 (2nd semester class rank)
- Peringkat Kelas Sem 3 (3rd semester class rank)
- Peringkat Kelas Sem 4 (4th semester class rank)
- Peringkat Kelas Sem 5 (5th semester class rank)
- Nilai UN (National Exam Scores)
- Saran buat adik kelas (Advice for junior students)
- Prestasi lain yang dilampirkan (Other achievements)
Public SNMPTN 2016 and 2017 data from the HaloKampus website were gathered to form a dataset of score reports from students accepted at 9 Indonesian state universities. Score reports from 6 subjects were chosen to train the Science Multi-Layer Perceptron (MLP) model, and 7 subjects were chosen for the Social MLP model. After several rounds of tuning, the trained models achieved 94.57% accuracy for the science major model, which classifies 174 college majors, and 89.79% accuracy for the social major model, which classifies 70 college majors.
- Collect more data from a wider range of sources
- Examine the dataset in more depth, since it contains many outliers
- Many Google searches (GeeksforGeeks, Kaggle, GitHub, Medium articles)
- Stack Overflow for debugging problems
- ChatGPT: used to check whether my code syntax was correct (only in the newer, improved version of UniticsML; ChatGPT did not exist yet in 2021)