techschools_project

Global Tech School Data Collection and Analysis - Python and MySQL

Project Title: Global Tech School Data Collection and Analysis

Introduction: This project focuses on analyzing data from technical schools worldwide. The work covers expanding the data pool, understanding the data frame structures, creating an SQL database, transferring data from Jupyter to SQL, and cleaning and manipulating the data in both Python and SQL.

The primary objectives of this project are as follows:

  • Adding School IDs: Incorporate 7 new school IDs into the web scraping code to collect more data that aligns with our project objectives.
  • Comprehending Data Frame Structures: Understand the structure of the resulting data frames through data cleaning and manipulation techniques.
  • Creating the New SQL Database: Establish an SQL database named "project_4" to store and manage the collected data.
  • Transferring Data from Jupyter to SQL: Move data from Jupyter notebooks into the SQL database.
  • Data Cleaning and Manipulation: Clean and manipulate the data in both Python and SQL to prepare it for analysis.
  • Queries and Insights: Execute queries to derive insights and conclusions from the data (a query sketch follows this list).
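As an illustration, here is a minimal sketch of how such a query could be run from a notebook against the project_4 database. The schools table and its columns are hypothetical; the real schema depends on the scraped data.

```python
import pandas as pd
from getpass import getpass
from sqlalchemy import create_engine

# Hypothetical table and column names; adjust to the actual project_4 schema.
password = getpass("MySQL password: ")
engine = create_engine(f"mysql+mysqlconnector://root:{password}@localhost/project_4")

query = """
    SELECT country, COUNT(*) AS school_count
    FROM schools
    GROUP BY country
    ORDER BY school_count DESC;
"""
insights = pd.read_sql(query, con=engine)
print(insights.head())
```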

Libraries and Imports:

-re: used to perform pattern matching and string manipulation when extracting data from web pages. Regular expressions are essential for identifying and capturing specific patterns or information within the text, which is crucial for web scraping.
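A minimal sketch of this kind of pattern matching; the HTML snippet and the captured fields below are hypothetical, not the actual pages we scraped.

```python
import re

# Hypothetical fragment of scraped page text; the real pages and fields differ.
html = '<span class="tuition">$14,950</span> <span class="weeks">12 weeks</span>'

# Capture the tuition figure and the course length with regular expressions.
tuition = re.search(r'"tuition">\$([\d,]+)<', html)
weeks = re.search(r'(\d+)\s*weeks', html)

if tuition and weeks:
    print(tuition.group(1), weeks.group(1))  # 14,950 12
```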

-pandas: was employed to efficiently handle and manipulate data. We used DataFrames and Series provided by pandas to structure and organize the collected data for analysis.

-json_normalize: the json_normalize function in pandas was utilized to flatten and normalize JSON data.
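A small sketch of how pandas and json_normalize fit together; the nested records and field names are illustrative only.

```python
import pandas as pd

# Illustrative nested records; the real API responses contain different fields.
records = [
    {"school": "Example Academy", "location": {"city": "Berlin", "country": "Germany"}},
    {"school": "Sample Bootcamp", "location": {"city": "Austin", "country": "USA"}},
]

# json_normalize flattens the nested "location" object into separate columns.
df = pd.json_normalize(records)
print(df.columns.tolist())  # ['school', 'location.city', 'location.country']
```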

-requests: was essential for making HTTP requests to websites and APIs, enabling us to retrieve data for our project. It facilitated web scraping by fetching web pages and data from online sources.
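A sketch of a typical request; the endpoint URL and parameters are placeholders, not the actual source we queried.

```python
import requests

# Hypothetical endpoint; the real site/API and parameters differ.
url = "https://api.example.com/schools"
response = requests.get(url, params={"school_id": 123}, timeout=30)
response.raise_for_status()

data = response.json()  # parsed JSON payload, ready for json_normalize
```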

-mysql.connector: used to interact with MySQL databases. This library allowed our team to create a connection to the MySQL database and to insert, query, and manage data. It played a key role in storing and retrieving data for analysis.

-getpass: was used for securely inputting database passwords.
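A minimal sketch combining mysql.connector and getpass as described above; the host and user are assumptions about a local setup.

```python
import mysql.connector
from getpass import getpass

# Connect to a local MySQL server; host and user are assumptions.
cnx = mysql.connector.connect(
    host="localhost",
    user="root",
    password=getpass("MySQL password: "),
)
cursor = cnx.cursor()

# Create the project database used to store the scraped data.
cursor.execute("CREATE DATABASE IF NOT EXISTS project_4")
cursor.close()
cnx.close()
```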

-create_engine from sqlalchemy: was employed to establish a connection to the SQL database.
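A short sketch of the Jupyter-to-SQL transfer using create_engine and DataFrame.to_sql; the connection string, table name, and sample rows are assumptions, not the project's actual values.

```python
import pandas as pd
from getpass import getpass
from sqlalchemy import create_engine

# Connection string is an assumption (local server, root user, project_4 database).
password = getpass("MySQL password: ")
engine = create_engine(f"mysql+mysqlconnector://root:{password}@localhost/project_4")

# Write a DataFrame from the notebook straight into a SQL table.
df = pd.DataFrame({"school": ["Example Academy"], "country": ["Germany"]})
df.to_sql("schools", con=engine, if_exists="replace", index=False)
```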

-matplotlib.pyplot: was used for data visualization. This library allowed our group to create charts that visually represent our data analysis results, making it easier to convey insights and findings.
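An illustrative chart sketch; the country counts below are made-up placeholder values, not project results.

```python
import matplotlib.pyplot as plt

# Illustrative aggregate; real values would come from the SQL queries.
countries = ["USA", "Germany", "Canada"]
school_counts = [42, 11, 9]

plt.bar(countries, school_counts)
plt.title("Tech schools per country (illustrative data)")
plt.ylabel("Number of schools")
plt.show()
```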

Conclusion: We carefully planned a comprehensive data model that allows us to extract valuable insights and answer our questions about technical schools worldwide. This methodical approach gave us a well-structured, professionally executed solution for our data analysis needs.
