This project focuses on predicting house prices using machine learning techniques. The dataset consists of over 1,000,000 rows and 12 columns containing information about various house attributes. The goal is to build predictive models that estimate house prices from these attributes. The project explores several machine learning models, including Linear Regression, Decision Trees, and Random Forests, using scikit-learn.
- Data Cleaning & Analysis: Processed and analyzed a large dataset with over 1,000,000 rows to prepare it for modeling.
- Models Trained:
  - Linear Regression: Simple but effective for baseline prediction.
  - Decision Tree: Captures non-linear relationships between features and target.
  - Random Forests: Ensemble method that improved accuracy by combining multiple decision trees.
- Model Performance: Achieved a Root Mean Square Error (RMSE) of 866,152 after hyperparameter tuning.
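For reference, RMSE is conventionally defined as follows. The symbols are introduced here for illustration and do not come from the project code: $y_i$ is the actual price of house $i$, $\hat{y}_i$ is the predicted price, and $n$ is the number of rows evaluated.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Because the errors are squared before averaging, a few large misses can dominate the score, and the result is expressed in the same units as the price.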
The dataset contains more than 1,000,000 rows and 12 columns, including features like:
- Price of the house (target variable)
- Date of Transfer
- Property Type (e.g., detached, semi-detached, terraced, flat)
- Old/New (indicates whether the property is newly built or existing)
- Duration (e.g., freehold or leasehold)
- Town/City
- District
- County
- PPD Category Type (indicates if the property was a full or partial sale)
- Record Status (applicable to monthly file updates)
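As a rough illustration of how columns like these might be loaded with pandas, here is a minimal sketch. The column names, the `load_prices` helper, and the three sample rows are all hypothetical and made up for the example; the real file in this repository may use different headers and will need its own path.

```python
import io

import pandas as pd

# Hypothetical column names for the ten fields listed above.
CATEGORICAL_COLUMNS = [
    "property_type", "old_new", "duration", "town_city",
    "district", "county", "ppd_category_type", "record_status",
]


def load_prices(source):
    """Read a CSV with a header row and give each column a compact dtype."""
    df = pd.read_csv(source, parse_dates=["date_of_transfer"])
    # Category dtype keeps memory use down when a column has few distinct values.
    for col in CATEGORICAL_COLUMNS:
        if col in df.columns:
            df[col] = df[col].astype("category")
    return df


# Tiny in-memory sample (invented rows) standing in for the real dataset file.
sample_csv = io.StringIO(
    "price,date_of_transfer,property_type,old_new,duration,"
    "town_city,district,county,ppd_category_type,record_status\n"
    "250000,2015-03-14,D,N,F,LEEDS,LEEDS,WEST YORKSHIRE,A,A\n"
    "180000,2016-11-02,T,N,F,YORK,YORK,YORK,A,A\n"
    "320000,2017-06-21,F,Y,L,LONDON,CAMDEN,GREATER LONDON,A,A\n"
)
df = load_prices(sample_csv)
print(df.dtypes)
```

With over a million rows, converting the repeated text columns to the category dtype is one common way to reduce memory use before modeling.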
- Python (version 3.6 or higher recommended)
- Required Python libraries:
  - scikit-learn (for machine learning algorithms)
  - Pandas (for data manipulation)
  - NumPy (for numerical operations)
  - Matplotlib & Seaborn (for data visualization)
You can install these dependencies using pip:
pip install scikit-learn pandas numpy matplotlib seaborn
- Data Cleaning: Missing values were handled, and categorical variables were encoded.
- Feature Engineering: Created new features or transformed existing ones to improve model accuracy.
- Model Training: Trained multiple machine learning models, including:
  - Linear Regression (baseline)
  - Decision Tree Regressor
  - Random Forest Regressor
- Model Evaluation: Evaluated model performance using Root Mean Square Error (RMSE).
- Hyperparameter Tuning: Used grid search to tune the hyperparameters and improve model performance.
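The workflow above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the column names are hypothetical, most-frequent imputation and one-hot encoding stand in for whatever cleaning and encoding the project used, and the parameter grid is an arbitrary small example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): the real dataset is far larger.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "property_type": rng.choice(["D", "S", "T", "F"], size=n),
    "old_new": rng.choice(["Y", "N"], size=n),
    "duration": rng.choice(["F", "L"], size=n),
    "year": rng.integers(1995, 2018, size=n),
})
base = df["property_type"].map({"D": 300_000, "S": 200_000, "T": 150_000, "F": 120_000})
df["price"] = base + 5_000 * (df["year"] - 1995) + rng.normal(0, 20_000, size=n)
# Blank out a few values so the imputation step has something to do.
df.loc[rng.choice(n, size=20, replace=False), "duration"] = np.nan

X = df.drop(columns="price")
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Data cleaning and encoding: fill missing categories, then one-hot encode.
cat_cols = ["property_type", "old_new", "duration"]
cat_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    transformers=[("cat", cat_steps, cat_cols)],
    remainder="passthrough",  # numeric columns such as "year" pass through unchanged
    sparse_threshold=0.0,     # always return a dense array
)

# Model training and evaluation: fit each model and record its test RMSE.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
rmse_by_model = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)
    predictions = pipe.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Hyperparameter tuning: grid search over a small, arbitrary Random Forest grid.
search = GridSearchCV(
    Pipeline([("prep", preprocess), ("model", RandomForestRegressor(random_state=0))]),
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [None, 10]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

print(rmse_by_model)
print(search.best_params_, -search.best_score_)
```

Keeping the preprocessing inside the pipeline means it is refit on each cross-validation fold, which avoids leaking information from the validation rows into the encoder or imputer.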
- Clone this repository to your local machine:
    git clone https://github.com/dharmendradiwaker/Forecasting-House-Prices-Using-Machine-Learning.git
- Navigate to the project folder (its name matches the repository name, including capitalization):
    cd Forecasting-House-Prices-Using-Machine-Learning
- Install the required libraries:
    pip install -r requirements.txt
- Run the Jupyter Notebook or Python script:
    jupyter notebook House_Price_Prediction.ipynb
- Load the house price dataset from the provided data/ folder.
- Follow the steps in the notebook or script to clean, preprocess, and train the models.
- Explore the performance metrics of each model and see the final predictions.
- You can adjust hyperparameters or add new features to improve the model's accuracy.
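As one possible example of adding new features, the transfer date could be split into calendar parts. This is only a sketch: the `add_date_features` helper, the `date_of_transfer` column name, and the sample rows are hypothetical, and whether such features actually help would need to be checked against the RMSE.

```python
import pandas as pd


def add_date_features(df, date_col="date_of_transfer"):
    """Return a copy of df with year, month, and quarter columns derived from date_col."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    out["sale_year"] = dates.dt.year
    out["sale_month"] = dates.dt.month
    out["sale_quarter"] = dates.dt.quarter
    return out


# Two invented rows, used only to show the shape of the result.
sample = pd.DataFrame({
    "date_of_transfer": ["2015-03-14", "2016-11-02"],
    "price": [250000, 180000],
})
enriched = add_date_features(sample)
print(enriched)
```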
This project demonstrates how to predict house prices using machine learning algorithms, offering insights into the key factors that influence the price of a property. After hyperparameter tuning, the model achieved a Root Mean Square Error (RMSE) of 866,152, measured in the same units as the price. The model performs fairly well after fine-tuning, although there is still room for improvement through further feature engineering or more advanced techniques.
- @Dharmendradiwaker12