This project consists of two main parts. The first is a Python script that implements a word search algorithm on a 2D board. The second is a web scraper that extracts issue reports from the Apache Camel project's Jira and stores them in a CSV file. The scraper uses the BeautifulSoup and Selenium libraries to handle static and dynamic web content, respectively.
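The word search part can be sketched as a depth-first search that starts from every cell and backtracks when the path stops matching. This is a hedged sketch of the standard technique, not necessarily the script's exact implementation; the function name `exist` is an assumption.

```python
def exist(board, word):
    """Return True if `word` can be traced through orthogonally adjacent cells,
    using each cell at most once (standard word-search DFS with backtracking)."""
    rows, cols = len(board), len(board[0])

    def dfs(r, c, i):
        if i == len(word):
            return True
        if r < 0 or r >= rows or c < 0 or c >= cols or board[r][c] != word[i]:
            return False
        saved, board[r][c] = board[r][c], "#"  # mark the cell as visited
        found = any(dfs(r + dr, c + dc, i + 1)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
        board[r][c] = saved  # restore the cell when backtracking
        return found

    return any(dfs(r, c, 0) for r in range(rows) for c in range(cols))
```

Marking visited cells in place (and restoring them on backtrack) avoids allocating a separate visited matrix for each search path.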
In this project, BeautifulSoup parses the static HTML of each issue page, while Selenium handles the dynamic content (the comments), which is loaded by JavaScript and cannot be retrieved with a plain HTTP request. This split was chosen deliberately to illustrate the difference between static and dynamic content in web scraping, and how different tools suit each. In a real project, it would be more efficient to load the entire page (both static and dynamic content) with Selenium and then parse the page source with BeautifulSoup; this reduces the number of requests to the server and can speed up scraping. For the purpose of this demonstration, however, the two-tool approach works fine.
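The static side of that split might look like the sketch below. The CSS selectors (`#summary-val`, `#description-val`) are assumptions based on Jira's typical markup, not verified against the live site, and `parse_issue` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

def parse_issue(html):
    """Extract the summary and description from the static HTML of an issue page.
    Selectors are assumed from common Jira markup and may need adjusting."""
    soup = BeautifulSoup(html, "html.parser")
    summary = soup.select_one("#summary-val")
    description = soup.select_one("#description-val")
    return {
        "summary": summary.get_text(strip=True) if summary else None,
        "description": description.get_text(strip=True) if description else None,
    }

# Comments are injected by JavaScript, so parsing the raw HTTP response would
# miss them; that is where Selenium comes in, e.g.:
#   driver.get(url)
#   parse_issue(driver.page_source)
```

Passing `driver.page_source` into the same parser is the "more efficient" combined approach described above: one browser-rendered fetch, then pure BeautifulSoup parsing.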
This project respects the robots.txt file of the Apache Jira website. Please note that the robots.txt file can change over time, and this crawler was designed in accordance with the robots.txt file as of January 12, 2024. Future users of this code should check the current robots.txt file and adjust the crawler behavior as necessary to respect any changes.
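One way to guard against robots.txt changing after the January 12, 2024 snapshot is to check permissions at run time with the standard library. This is a sketch; the `allowed` helper and the wildcard user agent are assumptions, not part of the original crawler.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# In the crawler, robots_txt would be fetched once from the site
# (e.g. https://issues.apache.org/robots.txt) before any scraping begins.
```

Calling `allowed(...)` before each request keeps the crawler compliant even if the rules change between runs.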