This project analyzes a movie dataset to explore the relationships between budget, gross earnings, and ratings. The analysis includes correlation studies, visualizations, and the identification of key trends and top-performing companies.
- Data Exploration: Initial inspection, handling missing values, and outlier detection.
- Correlation Analysis: Calculation and visualization of correlations using Pearson, Kendall, and Spearman methods.
- Data Visualization: Scatter plots, box plots, and heatmaps to illustrate data trends.
- Revenue Analysis: Identification of top-performing companies based on gross revenue.
- Tools Used: Python libraries including Pandas, Seaborn, and Matplotlib.
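The correlation and revenue steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the contents of analysis.py: the column names (`company`, `budget`, `gross`, `score`), the helper names, and the sample rows are all assumptions made for the example, so adjust them to match the real movies.csv.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not the real schema.
sample = pd.DataFrame(
    {
        "company": ["A", "A", "B", "B", "C"],
        "budget": [10.0, 20.0, 30.0, 40.0, 50.0],
        "gross": [15.0, 45.0, 40.0, 90.0, 70.0],
        "score": [6.1, 7.4, 5.9, 8.0, 6.8],
    }
)


def correlation_matrices(df, methods=("pearson", "kendall", "spearman")):
    """Return one correlation matrix per method over the numeric columns."""
    numeric = df.select_dtypes(include="number")
    return {m: numeric.corr(method=m) for m in methods}


def top_companies(df, n=3, company_col="company", gross_col="gross"):
    """Sum gross revenue per company and keep the n largest totals."""
    return df.groupby(company_col)[gross_col].sum().nlargest(n)
```

With a real dataset, the same helpers could be called as `correlation_matrices(pd.read_csv("movies.csv"))` and `top_companies(pd.read_csv("movies.csv"), n=10)`. Note that pandas delegates the Kendall method to SciPy, so that call may need `scipy` installed in addition to the libraries listed in this README.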
- Python 3.x
- Pandas
- NumPy
- Seaborn
- Matplotlib
- Clone the repository:
git clone https://github.com/yourusername/comprehensive-movie-data-analysis.git
- Navigate to the project directory:
cd comprehensive-movie-data-analysis
- Install the required libraries:
pip install pandas numpy seaborn matplotlib
- Place your dataset in the project directory. The dataset should be named movies.csv.
- Run the analysis script:
python analysis.py
This script performs data exploration and correlation analysis, and generates visualizations.
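As a rough idea of what the data exploration step might involve, the sketch below counts missing values and flags outliers with the common 1.5 × IQR rule. The IQR rule is an assumption chosen for illustration; this README does not say which outlier method the script uses. The helper names and the `budget` values are also hypothetical.

```python
import pandas as pd


def missing_counts(df):
    """Number of missing values in each column."""
    return df.isna().sum()


def iqr_outliers(series, k=1.5):
    """Boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up values for illustration only; None becomes a missing value (NaN).
demo = pd.DataFrame({"budget": [10.0, 12.0, 11.0, 13.0, 500.0, None]})
```

Calling `demo[iqr_outliers(demo["budget"])]` would return only the rows flagged as outliers. Missing values compare as False on both sides, so they are never flagged and need to be handled separately.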
- analysis.py: Python script containing data analysis and visualization code.
- movies.csv: Dataset containing movie information (budget, gross earnings, ratings, etc.).
- README.md: This file.
Feel free to open issues or submit pull requests if you have suggestions or improvements.
This project is licensed under the MIT License - see the LICENSE file for details.