Analyzing bank customer transactions to derive insights about customers.
https://www.kaggle.com/datasets/shivamb/bank-customer-segmentation
This dataset contains more than 1 million transactions by over 800K customers of a bank in India. The data includes customer age (DOB), location, gender, account balance at the time of the transaction, transaction details, transaction amount, and more.
Databricks Community Edition Python Notebook
-
In this project, we analyzed customer transaction data using PySpark and derived business insights with operations including filtering, projection, group-by aggregation, joins, ranking functions over window partitions, and moving averages. We used the Matplotlib and Seaborn libraries to visualize the data and give a comprehensive view of the findings.
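The filter, projection, and join steps named above can be illustrated with a small plain-Python sketch. It assumes two hypothetical in-memory tables, `CUSTOMERS` and `TRANSACTIONS`, keyed by `CustomerID` (column names are modeled on the dataset but assumed here). In the notebook the same steps would be expressed with PySpark's `DataFrame.filter`, `DataFrame.join`, and `DataFrame.select`; this sketch uses a simple hash join for illustration and does not reflect Spark's execution plan.

```python
# Hypothetical sample tables; column names are assumptions, not the verified schema.
CUSTOMERS = [
    {"CustomerID": "C1", "CustGender": "F", "CustLocation": "MUMBAI"},
    {"CustomerID": "C2", "CustGender": "M", "CustLocation": "DELHI"},
]
TRANSACTIONS = [
    {"TransactionID": "T1", "CustomerID": "C1", "TransactionAmount": 500.0},
    {"TransactionID": "T2", "CustomerID": "C2", "TransactionAmount": 20.0},
    {"TransactionID": "T3", "CustomerID": "C1", "TransactionAmount": 1500.0},
    {"TransactionID": "T4", "CustomerID": "C9", "TransactionAmount": 900.0},
]


def inner_join(left, right, key):
    """Hash join: index the right table by key, then probe it with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # Rows with no match on the right side are dropped (inner-join semantics).
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Filter: keep transactions of at least 100.
large = [t for t in TRANSACTIONS if t["TransactionAmount"] >= 100]
# Join: attach customer attributes by CustomerID.
joined = inner_join(large, CUSTOMERS, "CustomerID")
# Projection: keep only the columns needed downstream.
RESULT = [
    {
        "TransactionID": r["TransactionID"],
        "CustLocation": r["CustLocation"],
        "TransactionAmount": r["TransactionAmount"],
    }
    for r in joined
]
```

Transaction `T4` disappears because customer `C9` has no row in `CUSTOMERS`, which is the expected behavior of an inner join.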
-
Data Cleaning and Preprocessing:
Thorough data cleaning and preprocessing proved essential. Handling missing values, formatting dates, and ensuring consistent data types were crucial steps for an accurate analysis.
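A minimal plain-Python sketch of these cleaning steps follows. It assumes hypothetical raw rows whose column names are modeled on the dataset and whose dates use a day/month/two-digit-year format; both are assumptions, not verified against the file. Rows with a missing or unparseable date or a missing amount are dropped, and the remaining fields are converted to proper types. In PySpark the same steps would usually be written with `DataFrame.dropna(subset=...)` and `pyspark.sql.functions.to_date` with a matching format pattern.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical raw rows (as read from CSV, everything is a string or missing).
RAW_ROWS = [
    {"TransactionID": "T1", "CustomerID": "C1", "TransactionDate": "2/8/16", "TransactionAmount": "25.0"},
    {"TransactionID": "T2", "CustomerID": "C2", "TransactionDate": "", "TransactionAmount": "27999.0"},
    {"TransactionID": "T3", "CustomerID": "C3", "TransactionDate": "3/8/16", "TransactionAmount": None},
]


def parse_date(text: Optional[str]) -> Optional[date]:
    """Parse a day/month/two-digit-year string; return None if missing or malformed."""
    if not text:
        return None
    try:
        return datetime.strptime(text.strip(), "%d/%m/%y").date()
    except ValueError:
        return None


def clean(rows):
    """Drop rows missing a required field and convert the rest to typed values."""
    cleaned = []
    for row in rows:
        tx_date = parse_date(row.get("TransactionDate"))
        amount = row.get("TransactionAmount")
        if tx_date is None or amount in (None, ""):
            continue  # comparable to dropna on these two columns
        cleaned.append({
            "TransactionID": row["TransactionID"],
            "CustomerID": row["CustomerID"],
            "TransactionDate": tx_date,
            "TransactionAmount": float(amount),
        })
    return cleaned


CLEANED = clean(RAW_ROWS)
```

One caveat worth checking for birth dates: Python's `%y` maps two-digit years 69 to 99 into the 1900s and 00 to 68 into the 2000s, so a two-digit DOB year can land in the wrong century and should be validated before computing ages.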
Performance Optimization:
Using PySpark operations efficiently improved the performance of data-processing tasks. Techniques such as partitioning and choosing appropriate aggregations were key to handling a dataset of this size.
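Why partitioning by key helps an aggregation can be shown with a simplified model. The sketch assumes hypothetical `(CustomerID, amount)` records and hashes each key with `zlib.crc32` to choose a partition. Because every row for a given customer lands in the same partition, each partition can be summed independently and the partial results merged without conflicts. This is a conceptual model only, not how Spark's shuffle is implemented; in the notebook the analogous calls would be `DataFrame.repartition` followed by `groupBy(...).agg(...)`.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, num_partitions):
    """Assign each (key, value) record to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # crc32 is deterministic across runs, unlike Python's salted str hash.
        idx = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[idx].append((key, value))
    return partitions


def aggregate_partition(partition):
    """Sum values per key within one partition (a purely local step)."""
    totals = defaultdict(float)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def partitioned_sum(records, num_partitions=4):
    """Aggregate each partition separately, then merge.

    All rows for a key share one partition, so the per-partition
    results have disjoint keys and a plain dict merge is correct.
    """
    result = {}
    for part in partition_by_key(records, num_partitions):
        result.update(aggregate_partition(part))
    return result


# Hypothetical (CustomerID, TransactionAmount) pairs.
RECORDS = [("C1", 25.0), ("C2", 100.0), ("C1", 75.0), ("C3", 10.0), ("C2", 50.0)]
TOTALS = partitioned_sum(RECORDS)
```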
Complex Operations in PySpark:
Implementing window-based operations such as moving averages and ranking functions provided deeper insight into customer behavior.
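The window operations above can be sketched in plain Python to show what they compute. The functions below mimic `RANK() OVER (PARTITION BY ... ORDER BY ... DESC)` and a trailing moving average over a fixed number of rows per customer; the sample rows, the `day` ordering column, and the two-row window are hypothetical. In PySpark these would typically be written with `Window.partitionBy(...).orderBy(...)` together with `F.rank()`, or `F.avg(...)` over a `rowsBetween` frame. The sketch models only the semantics, not Spark's execution.

```python
from collections import defaultdict


def _group(rows, partition_key):
    """Split rows into partitions keyed by partition_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return groups


def rank_within_partition(rows, partition_key, order_key):
    """RANK() per partition, ordered descending: ties share a rank and leave gaps."""
    ranked = []
    for group in _group(rows, partition_key).values():
        group.sort(key=lambda r: r[order_key], reverse=True)
        prev_value, rank = None, 0
        for position, row in enumerate(group, start=1):
            if row[order_key] != prev_value:
                rank, prev_value = position, row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


def moving_average(rows, partition_key, order_key, value_key, window):
    """Average of the current row and up to window-1 preceding rows per partition."""
    result = []
    for group in _group(rows, partition_key).values():
        group.sort(key=lambda r: r[order_key])
        for i, row in enumerate(group):
            frame = group[max(0, i - window + 1): i + 1]
            avg = sum(r[value_key] for r in frame) / len(frame)
            result.append({**row, "moving_avg": avg})
    return result


# Hypothetical transactions.
TX = [
    {"CustomerID": "C1", "CustLocation": "MUMBAI", "day": 1, "amount": 100.0},
    {"CustomerID": "C1", "CustLocation": "MUMBAI", "day": 2, "amount": 200.0},
    {"CustomerID": "C1", "CustLocation": "MUMBAI", "day": 3, "amount": 300.0},
    {"CustomerID": "C2", "CustLocation": "MUMBAI", "day": 1, "amount": 300.0},
    {"CustomerID": "C3", "CustLocation": "DELHI", "day": 1, "amount": 50.0},
]

# Rank transactions by amount within each location.
RANKED = rank_within_partition(TX, "CustLocation", "amount")
# Two-row trailing moving average of amount per customer, ordered by day.
MA = moving_average(TX, "CustomerID", "day", "amount", window=2)
```

With `RANK`, the two MUMBAI transactions of 300.0 both get rank 1 and the next one gets rank 3; `dense_rank` would assign 2 instead.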
Visualization Techniques:
Visualizing the data with Matplotlib and Seaborn made the results easier to interpret and the findings easier to communicate.
Business Insights and Decision Making:
Deriving actionable business insights from data analysis is critical.
By using PySpark for data processing and Python libraries for visualization, we derived meaningful insights that support decision-making. This analysis not only highlighted the current state of customer transactions but also laid a foundation for future data-driven strategies.