Lyrics: Refactor Genius, Google backends, and consolidate common functionality (#5474)

### Bug Fixes

- Fixed #4791: Resolved an issue with the Genius backend where it couldn't match lyrics if there was a slight variation in the artist's name.

### Plugin Enhancements

* **Session Management**: Introduced a `TimeoutSession` to enable connection pooling and maintain consistent configuration across requests.
* **Error Handling**: Centralized error handling logic in a new `RequestsHandler` class, which includes methods for retrieving either HTML text or JSON data.
* **Logging**: Added methods to ensure the backend name is included in log messages.

### Configuration Changes

* Added a new `dist_thresh` field to the configuration, allowing users to control the maximum tolerable mismatch between the artist and title of the lyrics search result and their item. Interestingly, this field was previously available (though undocumented) and used in the `Tekstowo` backend. Now, this threshold has also been applied to **Genius** and **Google** search logic.

### Backend Updates

* All backends that perform searches now validate each result against the configured `dist_thresh`.

#### Genius

* Removed the need to scrape HTML tags for lyrics; instead, lyrics are now parsed from the JSON data embedded in the HTML. This change should reduce our vulnerability to Genius' frequent alterations in their HTML structure.
* Documented the structure of their search JSON data.

#### Google

* Typed the response data returned by the Google Custom Search API.
* Excluded certain pages under **https://letras.mus.br** that do not contain lyrics.
* Excluded all results from MusiXmatch, as we cannot access their pages.
* Improved parsing of URL titles (used for matching item/lyrics artist/title):
  - Handled results from long search queries where URL titles are truncated with an ellipsis.
  - Enhanced URL title cleanup logic.
  - Added functionality to determine (or rather, guess) not only the track title but also the artist from the URL title.
* Similar to #5406, search results are now compared to the original item and sorted by distance. Results exceeding the configured `dist_thresh` value are discarded. The previous functionality simply selected the first result containing the track's title in its URL, which often led to returning lyrics for the wrong artist, particularly for short track titles.
* Since we now fetch lyrics confidently, redundant checks for valid lyrics and credits cleanup have been removed.

### HTML Cleanup

* Organized regex patterns into a new `Html` class.
* Adjusted patterns to ensure new lines between blocks of lyrics text scraped from `letras.mus.br` and `musica.com`.
* Modified patterns to scrape missing lyrics text on `paroles.net` and `lacoccinelle.net`.

See the diff in `test/plugins/lyrics_page.py`.
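The `dist_thresh` validation and distance-based sorting described above can be illustrated with a minimal sketch. It is not the plugin's code: `SearchResult`, `string_distance`, `result_distance`, and `pick_results` are hypothetical names, `difflib.SequenceMatcher` stands in for whatever string distance the plugin actually uses, and taking the larger of the artist and title distances is an assumption made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class SearchResult:
    """Hypothetical search hit: the artist/title guessed for a result, plus its URL."""
    artist: str
    title: str
    url: str


def string_distance(a: str, b: str) -> float:
    """Return 0.0 for equal strings (ignoring case), approaching 1.0 as they diverge."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def result_distance(result: SearchResult, artist: str, title: str) -> float:
    """Assumption: a result is only as good as its worse-matching field."""
    return max(
        string_distance(result.artist, artist),
        string_distance(result.title, title),
    )


def pick_results(
    results: List[SearchResult], artist: str, title: str, *, dist_thresh: float
) -> List[SearchResult]:
    """Discard results farther than dist_thresh, then sort the rest by distance."""
    scored = [(result_distance(r, artist, title), r) for r in results]
    kept = [(dist, r) for dist, r in scored if dist <= dist_thresh]
    kept.sort(key=lambda pair: pair[0])  # closest match first
    return [r for _, r in kept]


candidates = [
    SearchResult("Someone Else", "Help!", "u3"),  # right title, wrong artist
    SearchResult("Beatles", "Help", "u2"),        # slight variations
    SearchResult("The Beatles", "Help!", "u1"),   # exact match
]
best = pick_results(candidates, "The Beatles", "Help!", dist_thresh=0.3)
```

With this stand-in metric, the wrong-artist result is dropped even though its title matches exactly, which is the failure mode the old "first result containing the title" logic was prone to.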
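The Genius change above, reading lyrics from JSON embedded in the page rather than from HTML tags, follows a general pattern that can be sketched as below. The page layout here is purely an assumption: the `<script type="application/json">` container, the `lyrics` key, and the `extract_lyrics` helper are hypothetical and do not describe Genius' real markup or the plugin's implementation.

```python
import json
import re
from typing import Optional

# Hypothetical layout: the page carries its data in a JSON <script> tag.
EMBEDDED_JSON = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(?P<payload>.*?)</script>',
    re.DOTALL,
)


def extract_lyrics(html: str) -> Optional[str]:
    """Return the lyrics string from the embedded JSON, or None if unavailable."""
    match = EMBEDDED_JSON.search(html)
    if match is None:
        return None
    try:
        data = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    lyrics = data.get("lyrics")  # hypothetical key name
    return lyrics.strip() if isinstance(lyrics, str) else None


page = (
    r'<html><script id="d" type="application/json">'
    r'{"lyrics": "Line one\nLine two\n"}'
    r'</script><div class="x">markup that may change at any time</div></html>'
)
lyrics = extract_lyrics(page)
```

Because the lookup depends only on the data payload, changes to the surrounding tags and class names do not affect it, which is the robustness the commit message is aiming for.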