This project aims to predict real estate prices using machine learning models in Python. By leveraging data science techniques, the project explores how factors like house age, proximity to MRT stations, number of convenience stores, and geographical coordinates influence housing prices. Two models, Random Forest Regressor and Linear Regression, were built, evaluated, and compared.
- Data preprocessing and cleaning.
- Exploratory Data Analysis (EDA) to uncover insights.
- Model training using Random Forest Regressor and Linear Regression.
- Model evaluation using metrics like Root Mean Squared Error (RMSE) and R-squared (R²).
- Prediction functionality for new property data.
- Python: Core programming language.
- Pandas & NumPy: Data manipulation and analysis.
- Matplotlib & Seaborn: Data visualization.
- Scikit-learn: Machine learning algorithms and evaluation metrics.
- Data Loading and Exploration:
  - The dataset contains features like `Transaction Date`, `House Age`, `Distance to MRT Station`, `Number of Convenience Stores`, `Latitude`, `Longitude`, and `House Price of Unit Area`.
  - Shape of the dataset: 414 rows, 7 columns.
- Data Preprocessing:
  - Handled missing values by replacing them with column means.
  - Normalized features using `StandardScaler`.
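The two preprocessing steps above can be sketched as follows. This is a minimal illustration and not the project's actual code: the tiny DataFrame and its values are made up, and only three of the feature columns are shown.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the real dataset; the values are illustrative only.
df = pd.DataFrame({
    "House Age": [10.0, np.nan, 30.0, 20.0],
    "Distance to MRT Station": [500.0, 1500.0, np.nan, 1000.0],
    "Number of Convenience Stores": [5, 1, 3, 7],
})

# Replace each missing value with the mean of its own column.
df_filled = df.fillna(df.mean())

# Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df_filled)
```

When a train/test split is used, fitting the scaler on the training portion only and reusing it for the test portion avoids leaking test statistics into training.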
- Model Development:
  - Random Forest Regressor: Used 100 estimators with a random state of 42.
  - Linear Regression: Built as a baseline model.
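A rough sketch of how the two models might be set up with Scikit-learn is shown below. The synthetic data, the 80/20 split, and the variable names are assumptions for illustration; only the 100 estimators and the random state of 42 come from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real features and target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for evaluation (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Random Forest with the settings stated above.
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Linear Regression as the baseline.
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
```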
- Evaluation Metrics:
  - Root Mean Squared Error (RMSE).
  - R-squared (R²) score.
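For reference, both metrics can be written out directly with NumPy. The helper names `rmse` and `r2` and the sample values are hypothetical; Scikit-learn's `mean_squared_error` (followed by a square root) and `r2_score` compute the same quantities.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical error size, in the target's units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """R-squared: share of the target's variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only.
y_true = [30.0, 40.0, 50.0]
y_pred = [32.0, 38.0, 50.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred))
```

A lower RMSE and an R² closer to 1 both indicate a better fit.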
- Prediction Function:
  - Created a function to predict house prices based on user-provided inputs.
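One possible shape for such a function is sketched below. The name `predict_price`, the `FEATURES` list, the dictionary keys, and the synthetic training data are all hypothetical; the real script may differ. The point it illustrates is that new inputs should go through the same fitted scaler as the training data before being passed to the model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical feature order; it must match the order used during training.
FEATURES = ["house_age", "distance_to_mrt", "convenience_stores",
            "latitude", "longitude"]

def predict_price(model, scaler, house):
    """Predict the price per unit area for one house given as a dict."""
    row = np.array([[house[name] for name in FEATURES]], dtype=float)
    # Apply the scaler fitted on the training data, then predict.
    return float(model.predict(scaler.transform(row))[0])

# Synthetic training data standing in for the real dataset.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 40, 100),         # house age in years
    rng.uniform(50, 5000, 100),      # distance to MRT in meters
    rng.integers(0, 11, 100),        # number of convenience stores
    rng.uniform(24.9, 25.1, 100),    # latitude
    rng.uniform(121.4, 121.6, 100),  # longitude
])
y = 50 - 0.3 * X[:, 0] - 0.005 * X[:, 1] + 2.0 * X[:, 2]

scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(scaler.transform(X), y)

sample_house = {
    "house_age": 10,
    "distance_to_mrt": 500,
    "convenience_stores": 5,
    "latitude": 25.03,
    "longitude": 121.53,
}
price = predict_price(model, scaler, sample_house)
```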
- Random Forest Regressor:
  - RMSE: $6.32
  - R² Score: 0.8557
- Linear Regression:
  - RMSE: $13.49
  - R² Score: 0.3422
For a sample house with the following details:
- House age: 10 years
- Distance to MRT station: 500 meters
- Number of convenience stores: 5
- Latitude: 25.03
- Longitude: 121.53
The predictions were:
- Random Forest Predicted Price: $35.96 per unit area.
- Linear Regression Predicted Price: $2,117.12 per unit area.
- Clone the repository: `git clone <repository-url>`
- Install required libraries: `pip install -r requirements.txt`
- Run the Jupyter Notebook or Python script to execute the project.
- Modify the `sample_house` dictionary in the script to predict prices for custom data.
This project demonstrates a practical application of machine learning to real estate pricing. On this dataset, Random Forest clearly outperformed Linear Regression, with an R² of 0.8557 versus 0.3422 and an RMSE of $6.32 versus $13.49.
- Experiment with additional machine learning models.
- Implement hyperparameter tuning for Random Forest.
- Explore deep learning techniques for improved performance.
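As a starting point for the hyperparameter tuning item above, a grid search with cross-validation could look roughly like this. The parameter grid, the 3-fold setting, and the synthetic data are assumptions chosen to keep the sketch small, not tuned recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the real features and target.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=120)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
```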