This is a self-initiated project to find the best Airbnb occupancy rate in the Western Cape region of South Africa.
Milkah Petso is a young entrepreneur who wants to venture into the Airbnb business model in the western part of Cape Town. She has asked me to find the optimum property in the optimum neighbourhood within the region to maximize occupancy, so that she can rent a property in the area and furnish it to start her Airbnb business.
Below are the business questions we need to answer:
- What factors affect occupancy rate the most in an Airbnb listing?
- What can be done to increase the positive effects of these factors?
- What is the best performing neighbourhood in the Western Cape in terms of occupancy rate?
The data will be sourced from the [Inside Airbnb platform](https://insideairbnb.com/get-the-data/), which generally provides quarterly data for the last 12 months.
The data contains the following columns, as described:
- id - the listing's id
- name - the name of the listing
- host_id - the host's id
- host_name - the host's name
- neighbourhood_group - the grouped neighbourhood
- neighbourhood - the specific neighbourhood
- latitude - the latitude of the listing
- longitude - the longitude of the listing
- room_type - the type of room
- price - the price per night
- minimum_nights - the minimum number of nights per stay
- number_of_reviews - the total number of reviews the listing has
- last_review - the date of the last review
- reviews_per_month - the average number of reviews per month
- calculated_host_listings_count - the number of listings the host has in the city/region
- availability_365 - the number of days the listing is available per year
- number_of_reviews_ltm - the number of reviews received in the last twelve months
- license - license or registration number required by local authorities for short-term rental properties
The data was cleaned by dropping NaN values, imputing missing numerical values, and dropping irrelevant columns, leaving only neighbourhood, number_of_reviews, minimum_nights and room_type. I one-hot encoded the categorical variables, i.e. room_type and neighbourhood.
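The cleaning steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the `clean_listings` helper is hypothetical, and the choices to drop rows with a missing category and to impute numeric gaps with the median are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the Inside Airbnb listings file.
listings = pd.DataFrame({
    "neighbourhood": ["Ward 115", "Ward 54", None, "Ward 115"],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Entire home/apt"],
    "number_of_reviews": [12, np.nan, 3, 40],
    "minimum_nights": [2, 1, np.nan, 3],
    "host_name": ["A", "B", "C", "D"],  # example of an irrelevant column
})


def clean_listings(df):
    """Hypothetical helper: keep four columns, handle NaNs, one-hot encode."""
    categorical = ["neighbourhood", "room_type"]
    numerical = ["number_of_reviews", "minimum_nights"]

    # Drop irrelevant columns by keeping only the ones used for modelling.
    out = df[categorical + numerical].copy()

    # Assumption: rows with a missing category are dropped.
    out = out.dropna(subset=categorical)

    # Assumption: missing numeric values are imputed with the column median.
    out[numerical] = out[numerical].fillna(out[numerical].median())

    # One-hot encode the categorical variables.
    return pd.get_dummies(out, columns=categorical)


features = clean_listings(listings)
print(features.head())
```

Each category value becomes its own indicator column, so the number of feature columns grows with the number of distinct neighbourhoods and room types.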
The following classification models were fitted:
- Decision Tree Classifier
- Logistic Regression
- Random Forest Classifier
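As a rough sketch of how these three models might be fitted and compared with scikit-learn, the example below uses synthetic data from `make_classification` in place of the encoded listing features. The occupancy label, the 80/20 split and all hyperparameters are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features and a binary occupancy label.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Assumption: an 80/20 stratified train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Record accuracy on both splits to expose any train/test gap.
    scores[name] = {
        "train": accuracy_score(y_train, model.predict(X_train)),
        "test": accuracy_score(y_test, model.predict(X_test)),
    }
    print(f"{name}: train={scores[name]['train']:.3f}, test={scores[name]['test']:.3f}")
```

Reporting training and test accuracy side by side makes it easier to see whether a model is overfitting.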
The model performed well, and its classification table showed an accuracy of 1.0.
The model did well on both the training and test data, with an accuracy of 99% on each.
The model performed well on the training and test data, reaching an accuracy of 99% under cross-validation.
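A cross-validated accuracy like the one mentioned above can be computed with scikit-learn's `cross_val_score`. The sketch below is only illustrative: it uses synthetic data, and the choice of a random forest and 5 folds is an assumption, not necessarily the setup used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the encoded features and occupancy label.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Assumption: a random forest evaluated with 5-fold cross-validation.
model = RandomForestClassifier(n_estimators=50, random_state=0)
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", cv_scores.round(3))
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

Looking at the spread across folds, not just the mean, gives a sense of how stable the score is.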