A comprehensive collection of hands-on data analysis exercises and projects using Python, Pandas, and Jupyter Notebooks.
This repository contains a structured learning path for aspiring data analysts, covering fundamental to advanced data manipulation, cleaning, and analysis techniques. Each activity builds upon previous concepts, providing a progressive learning experience.
Data_Analyst_BootCamp/
│
├── 📓 Activity Notebooks
│ ├── Activity_1.ipynb # Python fundamentals & basic operations
│ ├── 2nd_Activty.ipynb # Data types, lists, tuples, and DataFrames
│ ├── Lab3.ipynb # Conditional logic and data type operations
│ ├── Activity_4.ipynb # Data exploration with Iris dataset
│ ├── Activity_6 (1).ipynb # Data cleaning and missing value handling
│ ├── Activity_7.ipynb # Working with multiple datasets (Netflix, Spotify, etc.)
│ ├── Activity_8.ipynb # Advanced data cleaning and preprocessing
│ ├── Activty_11.ipynb # Data organization: sorting, filtering, slicing
│ └── Activty_12.ipynb # Advanced data manipulation with multiple datasets
│
├── 📁 Sample Datasets
│ ├── netflix_sample.csv
│ ├── online_retail_sample.csv
│ ├── sales_data2.csv
│ ├── spotify_sample.csv
│ ├── superstore_sample.csv
│ ├── train_and_test2.csv
│ └── world_population_sample.csv
│
└── 📄 Documentation
    └── Student_Quick_Reference.md
- Python 3.7 or higher
- Jupyter Notebook or JupyterLab
- Basic understanding of Python syntax
- Clone the repository: `git clone https://github.com/asadjan4611/Data_Analyst_BootCamp.git`, then `cd Data_Analyst_BootCamp`
- Install required packages: `pip install pandas numpy seaborn openpyxl jupyter`
- Launch Jupyter Notebook: `jupyter notebook`
- Basic Python operations
- Working with lists and tuples
- Introduction to Pandas DataFrames
- Data manipulation basics
- Lists, tuples, and dictionaries
- Creating and manipulating DataFrames
- Column operations and renaming
- Data filtering and merging
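The DataFrame topics above (creating, renaming, filtering, merging) can be sketched as follows. This is a minimal illustration, not code from the notebooks; the `students` and `cities` frames and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, not taken from the notebooks.
students = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "score": [82, 67, 91, 74],
})
cities = pd.DataFrame({
    "student_id": [1, 2, 3],
    "city": ["Lahore", "Karachi", "Multan"],
})

# Rename a column so both frames share the same key name.
cities = cities.rename(columns={"student_id": "id"})

# Column operation: derive a boolean column from an existing one.
students["passed"] = students["score"] >= 70

# Filter rows with a boolean mask.
high_scorers = students[students["score"] > 80]

# A left merge keeps every student, even one without a matching city.
merged = students.merge(cities, on="id", how="left")
print(merged)
```

With `how="left"`, the student without a city row ends up with a missing value in `city`, which ties in with the missing-value topics later in the path.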
- Boolean operations
- Conditional statements
- Date and time operations
- String manipulation
- Working with different data types
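As a rough sketch of the conditional-logic and data-type topics above, the snippet below uses only the standard library. The `grade` function, thresholds, and sample values are assumptions made for illustration.

```python
from datetime import date, timedelta


def grade(score):
    """Map a numeric score to a letter with chained conditionals."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"


# Boolean operators combine simple conditions.
age, has_ticket = 20, True
can_enter = age >= 18 and has_ticket

# Date arithmetic: add a 30-day window to a start date.
start = date(2024, 1, 15)
deadline = start + timedelta(days=30)

# String manipulation: trim whitespace and normalize capitalization.
raw = "  data ANALYST bootcamp  "
title = raw.strip().title()

# Type conversion: a numeric string must be converted before arithmetic.
count = int("42") + 1

print(grade(82), can_enter, deadline.isoformat(), title, count)
```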
- Loading and exploring datasets (Iris dataset)
- Data inspection techniques (`head()`, `info()`, `describe()`)
- Working with JSON data
- File system operations
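The inspection methods and JSON handling listed above might look like this in practice. The records are hypothetical rows shaped like the Iris dataset, built inline so the sketch runs without downloading anything.

```python
import io
import json

import pandas as pd

# Hypothetical records shaped like a few Iris rows (not the real dataset).
records = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "species": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "species": "versicolor"},
    {"sepal_length": 6.3, "sepal_width": 3.3, "species": "virginica"},
]
json_text = json.dumps(records)

# read_json accepts a file-like object, so wrap the string in StringIO.
df = pd.read_json(io.StringIO(json_text))

print(df.head(2))        # first rows
df.info()                # column dtypes and non-null counts
summary = df.describe()  # statistics for the numeric columns only
print(summary)
```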
- Handling missing values
- Filling null values with mean/median
- Dropping rows and columns
- Data quality assessment
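One possible way to combine the missing-value steps above is shown below. The frame and the choice of mean for `age` and median for `income` are assumptions; the median is used for `income` because a single extreme value would distort the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps and one entirely empty column.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [50000, 62000, np.nan, 1000000],
    "notes": [np.nan, np.nan, np.nan, np.nan],
})

# Quality assessment: count missing values per column.
missing_per_column = df.isna().sum()

# Drop columns that contain no data at all.
cleaned = df.dropna(axis=1, how="all")

# Fill remaining gaps: mean for age, median for the skewed income column.
cleaned = cleaned.assign(
    age=cleaned["age"].fillna(cleaned["age"].mean()),
    income=cleaned["income"].fillna(cleaned["income"].median()),
)

# Alternative: keep only the rows that were complete to begin with.
complete_rows = df.drop(columns="notes").dropna()

print(missing_per_column)
print(cleaned)
```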
- Working with Netflix, Spotify, Superstore datasets
- Missing value detection and removal
- Data filtering and aggregation
- Exporting data to various formats (CSV, Excel, JSON, XML)
- ETL pipeline basics
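A small extract-transform-load flow covering the steps above could be sketched like this. The CSV text and its column names are hypothetical stand-ins for a file such as `netflix_sample.csv`, whose real columns may differ, and output goes to a temporary directory.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Extract: hypothetical CSV text standing in for a sample file.
raw_csv = io.StringIO(
    "title,type,release_year\n"
    "Show A,TV Show,2019\n"
    "Movie B,Movie,\n"
    "Movie C,Movie,2021\n"
)
df = pd.read_csv(raw_csv)

# Transform: drop rows missing the year, then store it as an integer.
df = df.dropna(subset=["release_year"]).assign(
    release_year=lambda d: d["release_year"].astype(int)
)

# Aggregate: count titles per type.
counts = df.groupby("type").size()

# Load: write the cleaned data in two formats.
out_dir = Path(tempfile.mkdtemp())
df.to_csv(out_dir / "clean.csv", index=False)
df.to_json(out_dir / "clean.json", orient="records")
# df.to_excel(...) additionally needs an engine such as openpyxl, and
# df.to_xml(...) is available from pandas 1.3 onward.

print(counts)
```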
- String manipulation and cleaning
- Removing special characters
- Case standardization
- Email validation
- Outlier detection and treatment
- Data type conversion
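The cleaning steps above can be illustrated with the sketch below. The data is hypothetical, the email pattern is a deliberately simple check rather than a full validator, and the 1.5 × IQR rule is one common outlier convention that the notebooks may or may not use.

```python
import pandas as pd

# Hypothetical messy data.
df = pd.DataFrame({
    "name": ["  alice#", "BOB!!", "cHaRlie ", "dana", "Eve$", "frank"],
    "email": [
        "alice@example.com", "bob_at_example.com", "charlie@example.org",
        "dana@example", "eve@example.net", "frank@example.com",
    ],
    "amount": ["10", "12", "11", "13", "12", "500"],
})

# Remove special characters, trim whitespace, standardize case.
df["name"] = (
    df["name"]
    .str.replace(r"[^A-Za-z ]", "", regex=True)
    .str.strip()
    .str.title()
)

# Simple email check: text, "@", a domain, and at least one dotted suffix.
email_pattern = r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$"
df["valid_email"] = df["email"].str.match(email_pattern)

# Type conversion: numeric strings to numbers.
df["amount"] = pd.to_numeric(df["amount"])

# Outlier detection with the 1.5 * IQR rule, then capping as treatment.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["amount"].between(lower, upper)
df["amount_capped"] = df["amount"].clip(lower, upper)

print(df)
```

Capping keeps the row while limiting its influence; dropping the row is the other common treatment when the value is clearly an entry error.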
- Sorting data (single and multiple columns)
- Filtering with conditions
- Slicing data (`iloc` vs `loc`)
- Transposing and pivoting
- Appending and concatenating datasets
- Truncating data
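The organization operations above are sketched below on a hypothetical sales frame. Note that `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used here for appending rows.

```python
import pandas as pd

# Hypothetical sales data with string index labels.
df = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West"],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "sales": [100, 150, 120, 90],
    },
    index=["a", "b", "c", "d"],
)

# Multi-column sort: region ascending, sales descending within each region.
by_region_sales = df.sort_values(["region", "sales"], ascending=[True, False])

# Filter with a condition.
big_sales = df[df["sales"] >= 120]

# iloc slices by position (end excluded); loc slices by label (end included).
first_two_by_position = df.iloc[0:2]
a_to_c_by_label = df.loc["a":"c"]

# Pivot long data into a region-by-quarter table, then transpose it.
wide = df.pivot(index="region", columns="quarter", values="sales")
transposed = wide.T

# Append rows by concatenating frames.
extra = pd.DataFrame(
    {"region": ["North"], "quarter": ["Q1"], "sales": [70]}, index=["e"]
)
combined = pd.concat([df, extra])

# Truncate keeps only the rows between two labels of a sorted index.
truncated = combined.truncate(before="b", after="d")

print(wide)
```

The `iloc` slice returns two rows while the `loc` slice returns three, which is the main practical difference between the two indexers.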
- Working with Iris, Tips, and Flights datasets
- Complex sorting and filtering operations
- Data slicing and transposition
- Appending and truncating operations
- Multi-dataset workflows
- ✅ Data Loading: Reading CSV, Excel, and JSON files
- ✅ Data Exploration: Understanding dataset structure and statistics
- ✅ Data Cleaning: Handling missing values, duplicates, and outliers
- ✅ Data Transformation: Sorting, filtering, grouping, and aggregating
- ✅ Data Manipulation: Merging, concatenating, and reshaping data
- ✅ Data Export: Saving processed data in multiple formats
- ✅ String Operations: Cleaning and standardizing text data
- ✅ Data Validation: Email validation, data type conversion
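Two of the skills listed above, removing duplicates and grouping with aggregation, are shown together in this short sketch. The `orders` frame is hypothetical and only loosely modeled on the retail sample data.

```python
import pandas as pd

# Hypothetical orders; the second "Tech" row is an exact duplicate.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "category": ["Office", "Tech", "Tech", "Office", "Tech"],
    "amount": [20.0, 300.0, 300.0, 35.0, 150.0],
})

# Remove fully duplicated rows before aggregating.
deduped = orders.drop_duplicates()

# Group by category and compute several statistics at once.
summary = deduped.groupby("category")["amount"].agg(["sum", "mean", "count"])
print(summary)
```

Deduplicating first matters here: left in place, the repeated row would inflate the "Tech" total.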
| Dataset | Description | Use Case |
|---|---|---|
| Iris | Classic classification dataset | Data exploration and manipulation |
| Tips | Restaurant tipping data | Aggregation and grouping exercises |
| Flights | Airline passenger data | Time series and sorting operations |
| Netflix | Movie/show catalog | Missing value handling |
| Spotify | Music popularity data | Data transformation |
| Superstore | Sales transaction data | Filtering and aggregation |
| Sales Data | E-commerce transactions | Advanced cleaning and preprocessing |
- Beginner → Start with `Activity_1.ipynb` and `2nd_Activty.ipynb`
- Intermediate → Progress to `Activity_4.ipynb`, `Activity_6`, and `Activity_7`
- Advanced → Master `Activity_8`, `Activty_11`, and `Activty_12`
- Code Organization: Clean, well-commented code
- Data Validation: Checking data quality before processing
- Error Handling: Graceful handling of missing data
- Documentation: Clear explanations and markdown cells
- Reproducibility: Consistent data processing workflows
- Pandas documentation team
- Seaborn for providing sample datasets
- Jupyter project for the excellent notebook environment
- Add data visualization exercises
- Include machine learning basics
- Add SQL integration examples
- Create video tutorials
- Add more real-world datasets
⭐ If you find this repository helpful, please consider giving it a star! ⭐
Made with ❤️ for the data analysis community