Instacart, a leading online grocery store, seeks to enhance its customer segmentation and marketing strategies through data analysis. As data analysts, our goal is to perform an exploratory analysis of Instacart's sales data to uncover patterns in customer behavior and order trends. The insights derived from this project are intended to help optimize Instacart’s targeted marketing efforts and improve customer engagement.
The analysis relies on multiple datasets, including:
- Instacart Open Source Data (2017):
  - Provided by Instacart and available on Kaggle.
  - Contains information on grocery orders, product categories, and user interactions.
- Customer Data Set (fabricated for the project):
  - Includes demographic and behavioral attributes of Instacart customers.
  - Helps in profiling customer segments.
- Data Dictionary:
  - Provides definitions for variables across datasets.
  - Helps in understanding the meaning and relationships of different fields.
To ensure the data was ready for analysis, several cleaning and preprocessing steps were followed:
- Handling Missing Values:
  - Checked for null values in all datasets.
  - Imputed missing values where necessary (e.g., replacing missing `days_since_last_order` with median values).
  - Removed rows with excessive missing data.
- Renaming Columns:
  - Renamed `order_dow` to `orders_day_of_week` for better readability.
  - Ensured consistency across all dataset column names.
- Fixing Data Types:
  - Converted categorical variables (e.g., `orders_day_of_week`, `order_hour_of_day`) to appropriate formats.
  - Ensured numeric variables (e.g., `income`, `prices`) were stored as float or integer types.
- Removing Duplicates:
  - Identified and dropped duplicate records to prevent data inconsistencies.
- Merging Datasets:
  - Combined the `orders`, `customer_data`, and `products` datasets using `user_id` and `product_id` as unique keys.
  - Verified the integrity of merged data by checking for unmatched records.
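The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual notebook code: the tiny DataFrames and their values are made up, and the outer merge with `indicator=True` is just one plausible way to check for unmatched records.

```python
import pandas as pd

# Made-up stand-ins for the real files (values are illustrative only).
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "user_id": [10, 10, 10, 11],
    "order_dow": [0, 3, 3, 6],
    "order_hour_of_day": [9, 14, 14, 20],
    "days_since_last_order": [None, 7.0, 7.0, 30.0],
})
customer_data = pd.DataFrame({
    "user_id": [10, 11, 12],
    "income": ["52000", "98000", "61000"],  # numbers stored as text
})

# Renaming columns: order_dow -> orders_day_of_week.
orders = orders.rename(columns={"order_dow": "orders_day_of_week"})

# Handling missing values: impute with the column median.
median_gap = orders["days_since_last_order"].median()
orders["days_since_last_order"] = orders["days_since_last_order"].fillna(median_gap)

# Fixing data types: categorical day of week, numeric income.
orders["orders_day_of_week"] = orders["orders_day_of_week"].astype("category")
customer_data["income"] = pd.to_numeric(customer_data["income"])

# Removing duplicates: drop rows that are identical in every column.
orders = orders.drop_duplicates()

# Merging: the _merge column flags rows present in only one table.
merged = orders.merge(customer_data, on="user_id", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["user_id", "_merge"]])
```

An outer join keeps unmatched rows from both sides so they can be inspected before deciding whether to drop them; an inner join would discard them silently.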
Contains order-related details of Instacart customers.
- `order_id`: Unique identifier for each order.
- `user_id`: Unique identifier for each customer.
- `order_number`: The sequential number of the order for a given customer.
- `orders_day_of_week`: Day of the week the order was placed (0=Sunday, 6=Saturday).
- `order_hour_of_day`: The hour of the day the order was placed (0-23).
- `days_since_last_order`: Days since the customer’s last order.

Contains demographic information of Instacart customers.
- `user_id`: Unique identifier for each customer.
- `first_name`: Customer’s first name.
- `last_name`: Customer’s last name.
- `gender`: Customer’s gender (Male/Female).
- `state`: The state where the customer resides.
- `age`: Customer’s age.
- `date_joined`: The date the customer joined Instacart.
- `n_dependants`: Number of dependents in the customer’s household.
- `fam_status`: Family status (e.g., married, single).
- `income`: Annual income of the customer.

Contains details about breakfast-related products.
- `product_id`: Unique identifier for each product.
- `product_name`: Name of the product.
- `aisle_id`: Aisle identifier where the product is located.
- `department_id`: Department identifier the product belongs to.
- `prices`: Price of the product.

Contains information about various product departments.
- `department_id`: Index column.
- `department`: Name of the department (e.g., frozen, bakery, produce, alcohol).
- Busiest Days and Hours:
  - Peak order hours are from 10 AM to 10 PM, with minimal orders between midnight and early morning.
  - The highest number of orders is placed on weekends, specifically on Saturday and Sunday.
  - Orders gradually increase from early morning, peak in the early afternoon, and decline towards the night.

Spending fluctuates over the day, with slightly higher spending occurring in the early morning and evening hours.
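Counts like these can be obtained by tallying orders per day and per hour. The sketch below uses the column names from the data dictionary; the sample rows are invented purely to make it runnable.

```python
import pandas as pd

# Invented sample of orders; only the two timing columns are needed.
orders = pd.DataFrame({
    "orders_day_of_week": [0, 0, 6, 6, 6, 3],
    "order_hour_of_day": [10, 14, 14, 14, 21, 2],
})

# Number of orders per day of week and per hour, in calendar order.
orders_per_day = orders["orders_day_of_week"].value_counts().sort_index()
orders_per_hour = orders["order_hour_of_day"].value_counts().sort_index()

# The busiest day and hour are the index labels with the largest counts.
busiest_day = orders_per_day.idxmax()
busiest_hour = orders_per_hour.idxmax()
print(busiest_day, busiest_hour)
```

Calling `orders_per_hour.plot.bar()` would give a bar chart of the hourly pattern, assuming matplotlib is installed.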
Customer segmentation was established based on behavioral and demographic criteria:
- Loyalty Categories:
- Demographics:
- Segmentation based on age and number of dependents:
  - Single Young: age 18 to 35, no dependents
  - Young Parent: age 18 to 35, one or more dependents
  - Single Adult: age 35 to 60, no dependents
  - Family Shopper: age 35+, multiple dependents
  - Senior Shopper: age 60+, fewer or no dependents
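One way to turn these rules into a column is `numpy.select`, which applies the first matching condition. The written rules overlap at the boundaries and leave some cases open, so this sketch makes explicit assumptions that are not from the original analysis: age 35 goes to the older group, "multiple" means two or more dependents, "fewer" means at most one, Senior Shopper takes precedence over Family Shopper only when dependents are at most one, and anyone not covered (e.g., age 35 to 59 with exactly one dependent) is labeled "Unassigned". The function name and the `customer_profile` column name are hypothetical.

```python
import numpy as np
import pandas as pd

def assign_profile(customers: pd.DataFrame) -> pd.Series:
    """Label each customer with a profile; the first matching rule wins."""
    age = customers["age"]
    deps = customers["n_dependants"]
    conditions = [
        (age >= 60) & (deps <= 1),               # Senior Shopper (assumed: at most one dependent)
        (age >= 35) & (deps >= 2),               # Family Shopper (assumed: two or more)
        (age >= 35) & (age < 60) & (deps == 0),  # Single Adult
        (age < 35) & (deps >= 1),                # Young Parent
        (age < 35) & (deps == 0),                # Single Young
    ]
    labels = ["Senior Shopper", "Family Shopper", "Single Adult",
              "Young Parent", "Single Young"]
    values = np.select(conditions, labels, default="Unassigned")
    return pd.Series(values, index=customers.index, name="customer_profile")

# Invented customers covering each rule plus one uncovered case.
customers = pd.DataFrame({
    "age": [25, 30, 45, 50, 70, 65, 40],
    "n_dependants": [0, 2, 0, 3, 1, 3, 1],
})
profiles = assign_profile(customers)
print(profiles.tolist())
```

Keeping an explicit "Unassigned" label makes gaps in the rule set visible instead of silently folding those customers into a neighboring segment.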
This bar chart shows the distribution of customer profiles based on a segmentation of age and number of dependents. The chart can be described as follows:
- The "Family Shopper" profile dominates, with a significantly higher count than the others. This suggests that a large portion of customers are aged 35 or older and have multiple dependents, likely shopping for family-sized products or groceries.
- The "Senior Shopper" profile is the second most common, which could indicate a significant number of customers over 60, with fewer or no dependents, who may be shopping for essential or convenience items.
- The "Young Parent" profile also shows a noticeable number of customers, indicating a group of younger individuals (aged 18 to 35) with children or other dependents.
- The "Single Young" profile is the least common, which could reflect that fewer young, independent individuals without dependents are frequent shoppers in this dataset.
- The "Single Adult" profile falls in the middle.
The following two visualizations, average spending and standard deviation of spending, provide insights into the spending behaviors of different customer profiles.
- Average Spending:
  - The average spending is similar across all customer profiles: Family Shoppers, Senior Shoppers, Single Adults, Young Parents, and Single Young shoppers all average around 7.7.
  - This suggests that while total spending differs between profiles (with Family Shoppers being the largest spenders), the average spending per transaction is consistent across these groups.
- Standard Deviation of Spending:
  - The standard deviation of spending is also comparable among all customer segments, with each profile showing similar variability (ranging from 3.7 to 4).
  - This indicates that, despite differences in total spending, the spread of spending per order is about the same within each group.
  - The even spread of standard deviation across segments suggests that marketing efforts should aim to keep spending patterns consistent, particularly for Family Shoppers, who are the largest contributors to total spending.

Both visualizations highlight that average spending varies only slightly across segments and that the variation in spending is similar as well, implying that consistent purchasing behaviors could be promoted for all profiles.
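The two statistics behind these charts can be computed with a single grouped aggregation. In this sketch, `prices` is assumed to be the spending measure and `customer_profile` is a hypothetical name for the segment column; the rows are invented.

```python
import pandas as pd

# Invented purchase rows for two of the five profiles.
purchases = pd.DataFrame({
    "customer_profile": ["Family Shopper"] * 3 + ["Single Young"] * 3,
    "prices": [4.0, 8.0, 12.0, 6.0, 8.0, 10.0],
})

# Mean and sample standard deviation (pandas default, ddof=1) per profile.
spending_stats = (
    purchases.groupby("customer_profile")["prices"].agg(["mean", "std"])
)
print(spending_stats)
```

In this toy data both profiles have the same mean but different spreads, which is the kind of difference the standard deviation chart is meant to reveal.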
- Low income: below the mean minus one standard deviation.
- Middle income: between the mean minus one standard deviation and the mean plus one standard deviation.
- High income: above the mean plus one standard deviation.
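These thresholds translate directly into code. The sketch below assumes the sample standard deviation (the pandas default); the text does not say whether sample or population deviation was used. The function name `income_group` and the income values are made up for illustration.

```python
import numpy as np
import pandas as pd

def income_group(income: pd.Series) -> pd.Series:
    """Bucket incomes using cutoffs at mean +/- one standard deviation."""
    mean, std = income.mean(), income.std()  # std uses ddof=1 by default
    low_cut, high_cut = mean - std, mean + std
    labels = np.select(
        [income < low_cut, income > high_cut],
        ["Low income", "High income"],
        default="Middle income",  # everything between the two cutoffs
    )
    return pd.Series(labels, index=income.index, name="income_group")

# Invented incomes: mean 60,000, sample std roughly 29,155.
income = pd.Series([20_000, 50_000, 55_000, 60_000, 65_000, 110_000])
groups = income_group(income)
print(groups.tolist())
```

Because the cutoffs are relative to the data, the middle group will usually hold most customers, which matches the pattern reported for every region.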
This bar chart illustrates the distribution of customer income groups across different regions. Key observations:
- Middle-income customers make up the largest proportion in every region, with a notable spike in the South and West regions.
- Low-income customers are present in all regions and are particularly prominent in the South, though in every region they are fewer than middle-income customers.
- High-income customers have a smaller share overall, with a slight increase in the South and West regions.
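A region-by-income breakdown like this one can be tabulated with `pandas.crosstab`. The column names `region` and `income_group` are hypothetical, and the rows are invented to keep the sketch runnable.

```python
import pandas as pd

# Invented customers with a region and an income group each.
customers = pd.DataFrame({
    "region": ["South", "South", "West", "Northeast", "Midwest", "South"],
    "income_group": ["Middle income", "Low income", "Middle income",
                     "Middle income", "High income", "Middle income"],
})

# Raw counts of customers per region and income group.
counts = pd.crosstab(customers["region"], customers["income_group"])

# Row-normalized shares, so each region sums to 1.
shares = pd.crosstab(
    customers["region"], customers["income_group"], normalize="index"
)
print(counts)
```

Looking at `shares` alongside `counts` helps separate "this region has more customers" from "this region has a different income mix".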
This stacked bar chart shows the distribution of product groups by customer segment. Here are the main takeaways:
- Food is the largest category by far (it also has the most items available), especially for Family Shoppers.
- Other product groups have significantly fewer items and sell less. As expected, Family Shoppers and Young Parents buy most of the baby care products.
- Family Shoppers contribute the most to the Food category, followed by smaller contributions to Baby Care and Drink.
- The Single Adult and Young Parent segments also show notable participation in Food. Senior Shoppers and Single Young shoppers have smaller contributions overall.
- Food is the most popular category across all segments, while other categories like Pet Care and Non-consumable are less significant.
Family Shoppers dominate the South and West, while Senior Shoppers are more prominent in the South and Midwest. The Northeast has a more even distribution, and the South has the highest number of shoppers.
This project provided valuable insights into Instacart's customer behavior and purchase patterns, leading to key strategic recommendations:
- Enhance Marketing Strategies for Key Customer Segments
  - Implement targeted promotions for Family Shoppers and Young Parents, as they form the largest segment.
  - Personalize recommendations and offer incentives for Single Adults and Senior Shoppers to increase engagement.
- Optimize Regional and Time-Based Shopping Behavior
  - Expand promotional efforts in the South, the region with the highest engagement, while boosting awareness in the Northeast.
  - Encourage weekday and off-peak shopping with special discounts to balance demand more efficiently.
- Introduce Tiered Pricing and Discounts Based on Income Segments
  - Offer affordable bundles and household staples for middle- and low-income customers.
  - Develop premium product incentives and faster delivery options for high-income customers.
- Improve Customer Retention with Subscription Models and Rewards
  - Convert Regular Customers into Loyal Customers through subscription-based discounts and AI-driven personalized promotions.
  - Strengthen existing loyalty programs with exclusive benefits, early access to sales, and curated shopping experiences.
- Expand Product and Pricing Strategies to Match Consumer Demand
  - Prioritize promotions on high-demand food categories while boosting non-food purchases through cross-selling strategies.
  - Introduce bulk purchase discounts and tailored offers to increase engagement across all price ranges.
Citation:
- "The Instacart Online Grocery Shopping Dataset 2017", Accessed from Instacart Kaggle Dataset
- Customer data set provided by CareerFoundry for educational purposes.