aref98/WordSearch-and-JiraScraper

WordSearch-and-JiraScraper

This project consists of two main parts. The first part is a Python script that implements a word search algorithm on a 2D board. The second part is a web scraper that extracts issue reports from the Apache Camel project on Jira and stores them in a .csv file. The web scraper uses the BeautifulSoup and Selenium libraries to handle static and dynamic web content, respectively.
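For readers unfamiliar with the first part, the following is a minimal sketch of one common form of word search on a 2D board. It is not taken from the repository's script. It assumes the word must be traced through horizontally or vertically adjacent cells, with each cell used at most once, and the function name `word_exists` is hypothetical.

```python
from typing import List, Set, Tuple


def word_exists(board: List[List[str]], word: str) -> bool:
    """Return True if `word` can be traced through adjacent cells of `board`.

    Assumption (not confirmed by the README): moves are up, down, left, or
    right, and a cell may not be reused within a single path.
    """
    if not word:
        return True
    if not board or not board[0]:
        return False

    rows, cols = len(board), len(board[0])
    visited: Set[Tuple[int, int]] = set()

    def dfs(r: int, c: int, i: int) -> bool:
        # The current cell must match the i-th letter of the word.
        if board[r][c] != word[i]:
            return False
        if i == len(word) - 1:
            return True
        visited.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and (nr, nc) not in visited and dfs(nr, nc, i + 1):
                visited.discard((r, c))
                return True
        # Backtrack so other paths may pass through this cell.
        visited.discard((r, c))
        return False

    return any(dfs(r, c, 0) for r in range(rows) for c in range(cols))


board = [
    ["A", "B", "C", "E"],
    ["S", "F", "C", "S"],
    ["A", "D", "E", "E"],
]
print(word_exists(board, "ABCCED"))  # True
print(word_exists(board, "ABCB"))    # False: the "B" cell cannot be reused
```

The search is a depth-first traversal with backtracking: each starting cell is tried in turn, and the `visited` set is unwound on the way back so that a failed path does not block later ones.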

Justification for Using Both BeautifulSoup and Selenium

In this project, BeautifulSoup parses the static HTML of each page, while Selenium handles the dynamic content (the comments), which is loaded by JavaScript and is therefore absent from the response to a plain HTTP request. The two tools are used side by side to demonstrate the difference between static and dynamic content in web scraping and to show which tool suits each. In a real project, it would be more efficient to load the entire page (static and dynamic content alike) with Selenium and then parse the resulting page source with BeautifulSoup. That would reduce the number of requests sent to the server and could speed up scraping. For demonstration purposes, however, the split approach is adequate.
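The single-pass alternative described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the function names, the Chrome driver choice, and the `.comment-body` CSS selector are all hypothetical placeholders, and the real Jira markup would need to be inspected to pick a working selector.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical selector; the actual class used for Jira comments may differ.
COMMENT_SELECTOR = ".comment-body"


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load `url` in a browser and return the HTML after JavaScript has run."""
    # Selenium is imported here so the parsing helper below can be used
    # (and tested) without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until at least one dynamically loaded comment is in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, COMMENT_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_comments(html: str, selector: str = COMMENT_SELECTOR) -> List[str]:
    """Parse already-rendered HTML and return the text of each comment."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(strip=True) for node in soup.select(selector)]


# Parsing works on any HTML string, whether static or rendered by Selenium.
sample = '<div class="comment-body">First</div><div class="comment-body"> Second </div>'
print(extract_comments(sample))  # ['First', 'Second']
```

With this split, one browser load supplies the full page source, and BeautifulSoup does all of the parsing, so no second HTTP request for the static content is needed.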

Important Note

This project respects the robots.txt file of the Apache Jira website. Please note that the robots.txt file can change over time, and this crawler was designed in accordance with the robots.txt file as of January 12, 2024. Future users of this code should check the current robots.txt file and adjust the crawler behavior as necessary to respect any changes.
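One way to re-check the rules before running the crawler is Python's standard `urllib.robotparser` module. The sketch below is a minimal example, not the project's own check: the sample rules, the user agent name `my-crawler`, the `example.org` URLs, and the helper name `build_parser` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Illustrative rules only; these are NOT the rules of the Apache Jira site.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_RULES)
print(parser.can_fetch("my-crawler", "https://example.org/browse/ITEM-1"))  # True
print(parser.can_fetch("my-crawler", "https://example.org/private/page"))   # False
```

To check a live site instead, `RobotFileParser.set_url()` followed by `read()` fetches and parses the current file, after which `can_fetch()` can guard each request the crawler is about to make.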
