* Type the response data that the Google Custom Search API returns.
* Exclude some 'letras.mus.br' pages that do not contain lyrics.
* Exclude results from Musixmatch as we cannot access their pages.
* Improve parsing of the URL title:
  - Handle long URL titles that get truncated (end with ellipsis) for
    long searches
  - Remove domains starting with 'www'
  - Parse the title AND the artist. Previously this would only parse the
    title, and fetch lyrics even when the artist did not match.
* Remove now redundant credits cleanup and checks for valid lyrics.
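
The URL title handling listed above could look roughly like the sketch below. It is only an illustration: `clean_url_title`, `split_artist_title`, and the assumed `Artist - Title` layout are hypothetical and not taken from the plugin.

```python
import re
from typing import Optional, Tuple


def clean_url_title(url_title: str) -> str:
    """Strip 'www' domains and a trailing ellipsis from a search result title."""
    # Assumption: a domain shows up as a token such as 'www.example.com',
    # optionally preceded by a separator like '|', '-' or ':'.
    title = re.sub(r"\s*[|\-:]?\s*www\.\S+", "", url_title)
    # Truncated titles end with '...' or the single-character ellipsis.
    title = re.sub(r"\s*(?:\.{3}|…)\s*$", "", title)
    return title.strip()


def split_artist_title(
    url_title: str, separator: str = " - "
) -> Optional[Tuple[str, str]]:
    """Return (artist, title), or None when the separator is missing."""
    artist, sep, title = clean_url_title(url_title).partition(separator)
    if not sep:
        return None
    return artist.strip(), title.strip()
```

Returning both parts lets the caller compare the artist as well as the title before fetching the page.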
Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify'
method which was a duplicate of 'slug'.
Since 'GeniusFetchTest' only tested whether the artist name is cleaned
up (the rest of the functionality is patched), remove it and move its
test cases to the 'test_slug' test.
Having removed it, I found that only the Genius lyrics changed: they had an
extra new line. Thus I defined a function 'collapse_newlines' which now
gets called for the Genius lyrics.
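
A minimal sketch of what such a helper might look like. The exact rule is an assumption here (runs of three or more newlines are reduced to a single blank line); the actual 'collapse_newlines' may differ.

```python
import re


def collapse_newlines(text: str) -> str:
    """Reduce runs of three or more newlines to exactly two."""
    # Assumption: one blank line between verses is intended, so two
    # consecutive newlines are left alone.
    return re.sub(r"\n{3,}", "\n\n", text)
```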
Improve requests performance with requests.Session which uses connection
pooling for repeated requests to the same host.
Additionally, this centralizes request configuration, making sure that
we use the same timeout and provide beets user agent for all requests.
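
One way to centralize this is sketched below. `TimeoutSession`, `USER_AGENT`, and `DEFAULT_TIMEOUT` are placeholder names and values chosen for illustration, not the ones used in the plugin.

```python
import requests

USER_AGENT = "beets-example/0.0"  # placeholder value, not the real agent string
DEFAULT_TIMEOUT = 10  # seconds; an assumed default


class TimeoutSession(requests.Session):
    """Session that applies a default timeout and user agent to every request."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT) -> None:
        super().__init__()
        self.timeout = timeout
        self.headers["User-Agent"] = USER_AGENT

    def request(self, method, url, **kwargs):
        # Callers can still override the timeout per request.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)
```

Reusing a single session instance across backends is what lets connection pooling take effect for repeated requests to the same host.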
This commit introduces a distance threshold mechanism for the Genius and
Google backends.
- Create a new `SearchBackend` base class with a method `check_match`
that performs checking.
- Start using the previously undocumented `dist_thresh` configuration
option in earnest, and mention it in the docs. It controls the maximum
allowable distance for matching artist and title names.
These changes aim to improve the accuracy of lyrics matching, especially
when there are slight variations in artist or title names, see #4791.
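
The threshold check can be pictured as in the sketch below. It is not the plugin's code: `string_distance` uses `difflib` as a stand-in metric, and the `check_match` signature and the `0.11` default are assumptions for illustration.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Return a distance in [0, 1]; 0 means the strings are identical."""
    # Stand-in metric: the real backend may use a different measure.
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


class SearchBackend:
    """Base class for backends that must validate search results."""

    def __init__(self, dist_thresh: float = 0.11) -> None:
        # Hypothetical default; in practice this comes from the config.
        self.dist_thresh = dist_thresh

    def check_match(
        self, target_artist: str, target_title: str, artist: str, title: str
    ) -> bool:
        """Accept a result only if both artist and title are close enough."""
        max_dist = max(
            string_distance(target_artist, artist),
            string_distance(target_title, title),
        )
        return max_dist <= self.dist_thresh
```

Taking the larger of the two distances means a perfect title match cannot compensate for a wrong artist.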
## Description
Added a quick checkpoint to ensure the config file is set up correctly
before users import their music library. I only discovered this step
after running into an issue with my own config file, and I hope it helps
new users avoid the same problems.
## New config option `keep_existing` (#4982)
- Fix the behavior of the `force` option. Previously, disabling the option
  had "incomplete" behaviour:
  - If content was found, a whitelist check was issued and, if valid, the
    plugin exited early and logged ("keep").
  - This whitelist check was not aware of multiple genres (typically
    separated by a string like `, `), so it failed, erased all existing
    genres, and overwrote them with new ones.
_**This didn't feel like a typical behaviour of a `force` option, which
this PR tries to improve as follows...**_
- String-separated multi-genres are now compiled into a list and
depending on the `whitelist` option are kept and enriched with freshly
fetched last.fm genres.
- If force is off, pre-populated tags are not touched.
- A lot of refactoring was done, some absolutely required, some as a
preparation for future work on the plugin.
- The main processing function `_get_genre` was massively overhauled and
got a new `pytest.mark.parametrize` test which includes many more test
cases.
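
The multi-genre handling described in the list above might be sketched as follows. This is a simplified illustration, not the plugin's `_get_genre`: the function names, the lowercase whitelist set, and the exact interplay of `force` and `whitelist` are assumptions.

```python
from typing import Iterable, List, Optional, Set


def split_genres(genre_string: str, separator: str = ", ") -> List[str]:
    """Turn a separator-joined genre string into a list of genres."""
    return [g.strip() for g in genre_string.split(separator) if g.strip()]


def combine_genres(
    existing: str,
    fetched: Iterable[str],
    whitelist: Optional[Set[str]] = None,
    force: bool = True,
    separator: str = ", ",
) -> str:
    """Merge existing genres with freshly fetched ones."""
    # With force off, pre-populated tags are left untouched.
    if not force and existing:
        return existing
    kept = split_genres(existing, separator)
    if whitelist is not None:
        # Assumption: the whitelist holds lowercase genre names.
        kept = [g for g in kept if g.lower() in whitelist]
    # dict.fromkeys removes duplicates while preserving order.
    merged = list(dict.fromkeys(kept + list(fetched)))
    return separator.join(merged)
```

Splitting first means each genre is checked against the whitelist on its own, rather than the whole joined string failing the check.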