Build and evaluate a Linear Regression model to predict housing prices based on various independent features. This task demonstrates preprocessing, feature engineering, model training, evaluation, and result interpretation.
- Name: Housing.csv
- Source: Kaggle Dataset
- Target Variable: `price`
- Python
- Pandas
- NumPy
- Matplotlib
- scikit-learn
- Data Loading: Imported the dataset using Pandas.
- Exploratory Analysis: Checked data types, missing values, and basic statistics.
- Preprocessing:
  - One-hot encoded categorical variables using `get_dummies()`.
  - Split data into independent variables (X) and target variable (y).
- Train-Test Split: Used `train_test_split` to divide the data into 80% training and 20% testing sets.
- Model Building: Trained a Linear Regression model using `LinearRegression()` from scikit-learn.
- Evaluation:
  - MAE (Mean Absolute Error)
  - MSE (Mean Squared Error)
  - R² Score
- Visualization:
  - Scatter plot of Actual vs Predicted values
- Interpretation: Analyzed the coefficients of the trained model to understand feature impact.
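The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it builds a small synthetic table in place of `Housing.csv` so it runs on its own, and the column names (`area`, `bedrooms`, `furnishing`), the `drop_first=True` option, and `random_state=42` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption). With the real file this
# would be something like: df = pd.read_csv("Housing.csv")
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.uniform(1000, 5000, n),          # hypothetical numeric feature
    "bedrooms": rng.integers(1, 6, n),           # hypothetical numeric feature
    "furnishing": rng.choice(["furnished", "semi", "unfurnished"], n),  # hypothetical categorical
})
furnishing_bonus = df["furnishing"].map(
    {"furnished": 80_000, "semi": 40_000, "unfurnished": 0}
)
df["price"] = (
    100 * df["area"] + 50_000 * df["bedrooms"] + furnishing_bonus
    + rng.normal(0, 20_000, n)  # noise so the fit is not perfect
)

# Exploratory checks: data types, missing values, basic statistics
print(df.dtypes)
print(df.isnull().sum())
print(df.describe())

# One-hot encode categorical columns, then separate features and target.
# drop_first=True (an assumption) drops one dummy per category to avoid
# a redundant column; dtype=float keeps the feature matrix numeric.
X = pd.get_dummies(df.drop(columns="price"), drop_first=True, dtype=float)
y = df["price"]

# 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the model and predict on the held-out rows
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  R2: {r2:.3f}")
```

MAE is in the same units as `price`, while MSE is in squared units, so taking its square root (RMSE) can make the two easier to compare.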
The model predicted housing prices with a decent R² score and low error metrics. Its output included a coefficient for each feature, indicating that feature's influence on the predicted price.
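As an illustration of how the coefficient output and the Actual vs Predicted plot might be produced, here is a small sketch on made-up data. The feature names, the true coefficients (100 per unit of `area`, 50,000 per bedroom), and the noise level are hypothetical, chosen only so the example is self-contained; they are not results from this project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data with known underlying coefficients
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "area": rng.uniform(1000, 5000, 100),
    "bedrooms": rng.integers(1, 6, 100).astype(float),
})
y = 100.0 * X["area"] + 50_000.0 * X["bedrooms"] + rng.normal(0, 10_000, 100)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Pair each coefficient with its feature name, largest magnitude first
coefficients = pd.Series(model.coef_, index=X.columns).sort_values(
    key=abs, ascending=False
)
print(coefficients)
print("intercept:", model.intercept_)

# Actual vs Predicted scatter plot; points near the dashed diagonal
# are predictions close to the true value
fig, ax = plt.subplots()
ax.scatter(y, y_pred, alpha=0.6)
limits = [y.min(), y.max()]
ax.plot(limits, limits, linestyle="--", color="gray")
ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("Actual vs Predicted")
```

Each coefficient is the change in predicted price per one-unit increase in that feature with the others held fixed, so raw magnitudes depend on the feature's units. A large coefficient on `bedrooms` and a small one on `area` does not by itself mean bedrooms matter more.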
- `task3_linear_regression.ipynb` – Jupyter Notebook with full code and explanations
- `Housing.csv` – Dataset used
- `README.md` – Project documentation
August 7, 2025
This task was completed as part of my AI/ML internship to practice real-world applications of regression techniques.