This project demonstrates the integration of Python with MySQL databases for data analysis workflows. The system illustrates the complete process of importing data from CSV files into a relational database, creating appropriate table structures, and performing increasingly sophisticated SQL queries to derive meaningful insights.
(Demo video: PythonMySQLCSV.webm)
Designed as both a learning tool and a practical reference implementation, this project showcases real-world data processing techniques that are commonly used in data analysis, business intelligence, and academic research settings.
This project serves multiple educational purposes:
- Database Integration: Demonstrates how Python can interact with MySQL databases using the PyMySQL library
- Data Processing Pipeline: Shows a complete ETL (Extract, Transform, Load) workflow from raw CSV data to structured database storage
- SQL Query Development: Provides incrementally challenging SQL exercises from basic CRUD operations to complex analytical queries
- Data Visualization: Illustrates how to present query results in a readable, formatted manner in the terminal environment
- Real-world Application: Models a simplified version of academic record management and analysis
By working through this project, you'll gain practical experience with:
- Establishing secure database connections with credential management
- Parsing and validating CSV data
- Creating robust database queries
- Implementing data categorization and grading systems
- Analyzing relative performance using gap analysis
- Formatting and presenting data analysis results
The project follows a modular architecture with a clear separation of concerns, making it easier to understand each component's responsibility and how the components interact to form a complete system.
- db_config.py:
- Manages database connection configuration and security
- Loads credentials from environment variables for better security practices
- Implements a connection factory pattern with error handling
- Creates the necessary database table structure with appropriate constraints
- Provides a clean connection closure mechanism
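As a rough sketch of what such a connection factory can look like (assuming PyMySQL and python-dotenv, with illustrative function and variable names rather than the exact contents of db_config.py):

```python
# Sketch of a PyMySQL connection factory driven by environment variables.
# Names and defaults are illustrative; see db_config.py for the real code.
import os

import pymysql
from dotenv import load_dotenv

load_dotenv()  # pull DB_* settings from the .env file into the environment

def get_connection():
    """Open a MySQL connection using credentials from environment variables."""
    try:
        return pymysql.connect(
            host=os.getenv("DB_HOST", "localhost"),
            port=int(os.getenv("DB_PORT", "3306")),
            user=os.getenv("DB_USER"),
            password=os.getenv("DB_PASSWORD"),
            database=os.getenv("DB_NAME"),
        )
    except pymysql.MySQLError as exc:
        raise RuntimeError(f"Could not connect to MySQL: {exc}") from exc
```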
- csv_importer.py:
- Handles the extraction and transformation of data from CSV sources
- Implements CSV parsing with validation checks
- Performs data cleaning and type conversion
- Executes SQL INSERT operations for database loading
- Includes detailed logging of the import process
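A minimal sketch of that import flow, assuming the CSV headers shown in the sample dataset later in this README (ID_NO, NAME, CGPA); the function name and messages are illustrative, not the repository's exact code:

```python
# Illustrative CSV-to-MySQL loader: validate each row, convert types, and
# insert with a parameterized query. Not the repository's exact implementation.
import csv

INSERT_SQL = "INSERT INTO student_records (id_no, name, cgpa) VALUES (%s, %s, %s)"

def import_students(connection, csv_path="data/students.csv"):
    imported = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        with connection.cursor() as cursor:
            for row in reader:
                try:
                    cgpa = float(row["CGPA"])  # type conversion doubles as validation
                except (KeyError, ValueError):
                    print(f"Skipping invalid row: {row}")
                    continue
                cursor.execute(INSERT_SQL, (row["ID_NO"].strip(), row["NAME"].strip(), cgpa))
                imported += 1
    connection.commit()
    print(f"Imported {imported} records")
    return imported
```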
- data_analyzer.py:
- Contains the core analytical functions for student performance analysis
- Implements database retrieval functions with optimized queries
- Formats and displays data in human-readable tables
- Calculates summary statistics for student performance metrics
- Identifies top performers for recognition
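For orientation, a summary-statistics function in that style might be sketched as follows, using rich for the table output; the function name and table layout are assumptions:

```python
# Sketch of a CGPA summary display using rich; names are illustrative.
from rich.console import Console
from rich.table import Table

def show_cgpa_summary(connection):
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT MIN(cgpa), MAX(cgpa), ROUND(AVG(cgpa), 2) FROM student_records"
        )
        min_cgpa, max_cgpa, avg_cgpa = cursor.fetchone()

    table = Table(title="CGPA Summary")
    for heading in ("Min", "Max", "Average"):
        table.add_column(heading, justify="right")
    table.add_row(str(min_cgpa), str(max_cgpa), str(avg_cgpa))
    Console().print(table)
```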
- main.py:
- Orchestrates the entire data processing workflow
- Manages the execution sequence of the application
- Provides user feedback throughout the process
- Handles exception management and error reporting
- Serves as the entry point for the application
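The orchestration could plausibly reduce to something like the sketch below; the imported helper names follow the sketches above and are assumptions, not necessarily the modules' real API:

```python
# Hypothetical orchestration flow; actual function names may differ.
from db_config import get_connection            # assumed helper names
from csv_importer import import_students
from data_analyzer import show_cgpa_summary

def main():
    connection = None
    try:
        connection = get_connection()
        # In the real flow, the student_records table is also created
        # (see db_config.py) before the import runs.
        import_students(connection)
        show_cgpa_summary(connection)
    except Exception as exc:
        print(f"Error: {exc}")
    finally:
        if connection is not None:
            connection.close()

if __name__ == "__main__":
    main()
```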
- additional_queries_exercise.py:
- Contains structured SQL query exercises of increasing complexity
- Provides clear instructions and expected outcomes for each exercise
- Includes placeholder comments guiding implementation
- Features built-in visualization of query results
- Offers a practical platform for SQL skill development
- data/students.csv:
- Contains structured student record data featuring Star Wars character names
- Includes student identification, names, and CGPA values
- Serves as the raw data source for the database import process
- Provides a balanced distribution of grade values for meaningful analysis
Before running this project, you need:
- Python 3.10 or higher
- MySQL Server 8.0 or higher
- Access to create databases and tables on your MySQL server
- Basic understanding of SQL and Python
- Clone or download the repository:

  ```bash
  git clone https://github.com/prakash-aryan/python_mysql_csv_analyzer.git
  ```
- Create a virtual environment (recommended):

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```
- Install required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
This will install:
- pymysql: For MySQL database connectivity
- python-dotenv: For secure environment variable management
- pandas: For data manipulation (optional, used in some advanced queries)
- rich: For terminal output formatting
- Database Configuration: Create and edit the `.env` file with your MySQL connection details:

  ```
  # MySQL Database Configuration
  DB_HOST=your_database_host   # e.g., localhost
  DB_PORT=3306                 # Default MySQL port
  DB_NAME=your_database_name   # Must exist on the server
  DB_USER=your_username        # Must have CREATE TABLE permissions
  DB_PASSWORD=your_password    # Store securely
  ```
- Database Preparation:
- Ensure your MySQL server is running
- Create the database specified in your `.env` file:

  ```sql
  CREATE DATABASE your_database_name;
  ```

- The application will automatically create the table structure when run
- Verify Setup: Check that:
- Your virtual environment is activated
- Your MySQL server is accessible with the provided credentials
- The specified database exists
- The data/students.csv file is present in the correct location
The project is designed to guide you through a progressive learning experience, starting with fundamental database operations and moving towards more complex analytical queries.
Start by running the main application to establish your database, import the data, and perform basic analysis:
```bash
python main.py
```
What this does:
- Database Connection:
- Establishes a secure connection to your MySQL server
- Outputs connection details including host and port
- Verifies database accessibility
- Table Creation:
- Checks if the `student_records` table exists
- Creates the table with appropriate column types and constraints if needed
- Confirms table structure is ready for data import
- Data Import Process:
- Opens and validates the CSV file structure
- Parses the header row and confirms expected columns
- Performs data type validation for each record
- Handles potential errors with appropriate messaging
- Truncates any existing data to ensure a fresh import
- Imports each record with SQL parameterization for security
- Provides a count of successfully imported records
- Data Analysis and Display:
- Retrieves all records with formatted SQL query
- Presents data in a well-structured table format
- Calculates key statistical measures (min, max, average CGPA)
- Identifies and highlights top-performing students
- Summarizes the overall class performance
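As an illustration of the last step, a "top performers" lookup might be a query of roughly this shape (the LIMIT of 3 is an arbitrary choice for the sketch, not necessarily what data_analyzer.py uses):

```python
# One plausible query behind the "top performers" step; LIMIT is an assumption.
TOP_PERFORMERS_SQL = """
SELECT id_no, name, cgpa
FROM student_records
ORDER BY cgpa DESC
LIMIT 3
"""
```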
Expected Output: You will see a detailed, step-by-step progress output with formatted tables showing:
- Connection and initialization status
- Import progress and confirmation
- Complete student records in tabular form
- Statistical analysis of the CGPA distribution
- Identification of top performers
This output allows you to verify the entire data pipeline is functioning correctly before proceeding to the exercises.
Once you've run the main application and verified that the data pipeline is functioning correctly, you can proceed to the SQL query exercises, which represent the core educational component of this project:
```bash
python additional_queries_exercise.py
```
The exercise file contains three progressively challenging query tasks, each designed to develop specific SQL skills and analytical thinking. Currently, these contain placeholder comments that you'll need to replace with working SQL statements.
Exercise 1: Performance Category Analysis

Learning Objectives:
- Implementing SQL CASE statements for data categorization
- Grouping and aggregating data with GROUP BY
- Applying multiple aggregate functions in a single query
- Sorting grouped results meaningfully
Task Description: Write a query that:
- Creates a performance category label using CGPA thresholds:
- Excellent: CGPA ≥ 3.8
- Very Good: CGPA ≥ 3.5
- Good: CGPA ≥ 3.3
- Satisfactory: CGPA ≥ 3.0
- Needs Improvement: CGPA < 3.0
- Groups students by these categories
- Calculates for each category:
- Student count
- Minimum CGPA
- Maximum CGPA
- Average CGPA
- Orders results by minimum CGPA in descending order
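One possible shape for this query is sketched below, as it might appear inside additional_queries_exercise.py; treat it as a starting point rather than the reference answer, and note that grouping by a SELECT alias is a MySQL convenience:

```python
# A possible solution sketch for Exercise 1, not the reference answer.
PERFORMANCE_CATEGORIES_SQL = """
SELECT
    CASE
        WHEN cgpa >= 3.8 THEN 'Excellent'
        WHEN cgpa >= 3.5 THEN 'Very Good'
        WHEN cgpa >= 3.3 THEN 'Good'
        WHEN cgpa >= 3.0 THEN 'Satisfactory'
        ELSE 'Needs Improvement'
    END AS performance_category,
    COUNT(*) AS student_count,
    MIN(cgpa) AS min_cgpa,
    MAX(cgpa) AS max_cgpa,
    ROUND(AVG(cgpa), 2) AS avg_cgpa
FROM student_records
GROUP BY performance_category   -- MySQL allows grouping by a SELECT alias
ORDER BY min_cgpa DESC
"""
```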
Visualization: The results will be displayed in a formatted table showing the distribution of students across performance categories with associated statistics.
Exercise 2: Letter Grade Assignment and Distribution

Learning Objectives:
- Creating complex CASE expressions with multiple conditions
- Implementing academic grading standards in SQL
- Constructing multiple related queries that share logic
- Calculating percentage distributions
Task Description: This exercise requires two related queries:
Query 1:
- Select student ID, name, and CGPA
- Create a letter grade column using this grading scale:
- A+: CGPA ≥ 3.9
- A: CGPA ≥ 3.7
- A-: CGPA ≥ 3.5
- B+: CGPA ≥ 3.3
- B: CGPA ≥ 3.0
- B-: CGPA ≥ 2.7
- C+: CGPA ≥ 2.3
- C: CGPA ≥ 2.0
- F: CGPA < 2.0
- Order results by CGPA in descending order
Query 2:
- Use the identical CASE logic to ensure consistency
- Count how many students received each letter grade
- Group the results by letter grade
- Order by minimum CGPA in descending order
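One way to keep the CASE logic identical across both queries is to define it once and reuse it, for example via a Python string as sketched below; this is an illustrative approach, not a required structure:

```python
# Sketch: share one CASE expression between the two Exercise 2 queries.
GRADE_CASE = """
    CASE
        WHEN cgpa >= 3.9 THEN 'A+'
        WHEN cgpa >= 3.7 THEN 'A'
        WHEN cgpa >= 3.5 THEN 'A-'
        WHEN cgpa >= 3.3 THEN 'B+'
        WHEN cgpa >= 3.0 THEN 'B'
        WHEN cgpa >= 2.7 THEN 'B-'
        WHEN cgpa >= 2.3 THEN 'C+'
        WHEN cgpa >= 2.0 THEN 'C'
        ELSE 'F'
    END
"""

# Query 1: per-student letter grades, highest CGPA first.
STUDENT_GRADES_SQL = f"""
SELECT id_no, name, cgpa, {GRADE_CASE} AS letter_grade
FROM student_records
ORDER BY cgpa DESC
"""

# Query 2: grade distribution, reusing the identical CASE logic.
GRADE_DISTRIBUTION_SQL = f"""
SELECT {GRADE_CASE} AS letter_grade,
       COUNT(*) AS student_count,
       MIN(cgpa) AS min_cgpa
FROM student_records
GROUP BY letter_grade
ORDER BY min_cgpa DESC
"""
```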
Visualization: Results will be displayed as two tables:
- Individual student grades showing ID, name, CGPA, and assigned letter grade
- Grade distribution summary showing counts and percentages for each grade
Exercise 3: CGPA Gap Analysis

Learning Objectives:
- Writing queries that enable sequential data analysis
- Understanding how to prepare data for gap calculations
- Implementing ranking and ordering in SQL
- Identifying significant performance thresholds
Task Description: Write a query that:
- Selects student ID, name, and CGPA
- Orders the results precisely by CGPA in descending order
- Provides the data needed for the Python code to calculate gaps between consecutive students
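The SQL itself stays simple here; the interesting work happens in Python afterwards. A minimal sketch of both halves, with illustrative names only:

```python
# Sketch: ordered query plus a Python pass that computes consecutive CGPA gaps.
RANKED_STUDENTS_SQL = """
SELECT id_no, name, cgpa
FROM student_records
ORDER BY cgpa DESC
"""

def compute_gaps(rows):
    """rows: sequence of (id_no, name, cgpa) tuples, already sorted by CGPA desc."""
    ranked = []
    for rank, (id_no, name, cgpa) in enumerate(rows, start=1):
        next_cgpa = rows[rank][2] if rank < len(rows) else None
        gap = round(cgpa - next_cgpa, 2) if next_cgpa is not None else None
        ranked.append((rank, id_no, name, cgpa, gap))
    return ranked
```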
Visualization: The results will be displayed as a ranked table showing:
- Student rank by CGPA
- Student ID and name
- CGPA value
- Gap to the next ranked student
- Highlighting of the largest performance gap
- Analysis of where the significant performance jumps occur
To successfully complete these exercises:
- Understand the Requirements:
- Carefully read each exercise description
- Note the specific columns needed in the result set
- Pay attention to ordering requirements
- Understand how the result will be used by the display code
- Implement the Queries:
- Open `additional_queries_exercise.py` in your preferred code editor
- Locate the placeholder comments for each query
- Write SQL statements that fulfill all requirements
- Ensure your column names match exactly what the display code expects
- Test Incrementally:
- Implement one query at a time
- Run the script after each implementation to test your solution
- Debug any errors by checking column names and syntax
- Verify Results:
- Check that your output tables look correct and well-formatted
- Verify that the data makes sense (e.g., grade assignments match CGPA values)
- Confirm that ordering and grouping work as expected
- Refine Your Solutions:
- Look for opportunities to improve query efficiency
- Ensure consistent CASE logic between related queries
- Consider edge cases in your data categorization
The `student_records` table is designed with a straightforward but effective schema optimized for academic record storage:
```sql
CREATE TABLE IF NOT EXISTS student_records (
    id_no VARCHAR(20) PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    cgpa FLOAT NOT NULL
)
```
Schema Design Rationale:
- `id_no` as VARCHAR(20): Allows for alphanumeric student IDs with prefix coding (e.g., S001)
- `name` as VARCHAR(100): Accommodates full names of various lengths with room for special characters
- `cgpa` as FLOAT: Stores cumulative grade point average with decimal precision
- PRIMARY KEY constraint: Ensures each student record is unique and optimizes lookup queries
- NOT NULL constraints: Maintain data integrity by preventing missing critical information
The system uses a curated dataset featuring Star Wars character names to make the learning experience more engaging:
| ID_NO | NAME | CGPA |
|-------|------|------|
| S001 | Luke Skywalker | 3.7 |
| S002 | Princess Leia | 3.9 |
| S003 | Han Solo | 3.5 |
| S004 | Rey Skywalker | 4.0 |
| S005 | Obi-Wan Kenobi | 3.2 |
| S006 | Padmé Amidala | 3.8 |
| S007 | Anakin Skywalker | 3.1 |
| S008 | Ahsoka Tano | 3.6 |
| S009 | Din Djarin | 3.3 |
| S010 | Grogu | 3.4 |
Dataset Characteristics:
- 10 student records providing sufficient data for meaningful analysis
- CGPA values ranging from 3.1 to 4.0, representing a realistic distribution
- Strategic distribution of values to demonstrate interesting patterns in analysis
- Balanced representation across different performance categories
- Sequential student IDs for easy reference
This data structure provides a foundation for practicing SQL queries ranging from simple selections to complex analytical grouping operations.
When implementing your solutions for the exercises, consider these SQL best practices and techniques:
- Effective CASE Statements:

  ```sql
  CASE
      WHEN condition1 THEN result1
      WHEN condition2 THEN result2
      ...
      ELSE default_result
  END AS column_name
  ```
- Use logical progression in condition order (often highest to lowest)
- Ensure conditions don't overlap to avoid unexpected results
- Include an ELSE clause to handle all possible values
- Give meaningful aliases to derived columns
- Aggregate Functions:
- `COUNT(*)`: Count all rows in a group
- `MIN(column)`: Find the minimum value
- `MAX(column)`: Find the maximum value
- `AVG(column)`: Calculate the average value
- `SUM(column)`: Total the values in a column
- Consider using `ROUND(AVG(column), 2)` for cleaner decimal display
- Grouping and Sorting:
- `GROUP BY`: Must include all non-aggregated columns in SELECT
- `ORDER BY`: Can reference columns by position number (e.g., `ORDER BY 1 DESC`)
- Multiple grouping: `GROUP BY column1, column2`
- Multiple sorting: `ORDER BY column1 DESC, column2 ASC`
- Readability Practices:
- Use consistent indentation for nested statements
- Place each major clause (SELECT, FROM, WHERE, etc.) on a new line
- Use table aliases for complex queries (`FROM table_name AS t`)
- Comment complex logic, particularly in CASE statements
- Common Pitfalls to Avoid:
- Forgetting to include columns in GROUP BY that appear in SELECT
- Using ambiguous column names when they could appear in multiple tables
- Missing edge cases in CASE statement conditions
- Not handling NULL values appropriately
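On the last point: cgpa is declared NOT NULL in this schema, so it cannot be NULL here, but in tables where the column is nullable, a CASE expression without an explicit check silently routes NULL rows into the ELSE branch. A hypothetical guard looks like this:

```python
# Hypothetical example: handle NULL explicitly before the range checks.
NULL_SAFE_STATUS_SQL = """
SELECT id_no,
       CASE
           WHEN cgpa IS NULL THEN 'No grade recorded'
           WHEN cgpa >= 3.0 THEN 'Pass'
           ELSE 'Review'
       END AS status
FROM student_records
"""
```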