Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify'
method which was a duplicate of 'slug'.
Since 'GeniusFetchTest' only tested whether the artist name is cleaned
up (the rest of the functionality is patched), remove it and move its
test cases to the 'test_slug' test.
Having removed it, I found that only the Genius lyrics changed: they had an
extra newline. Thus I defined a function 'collapse_newlines' which now
gets called for the Genius lyrics.
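A minimal sketch of such a helper follows. It assumes the helper collapses runs of three or more newlines into a single blank line. The regular expression is an illustration and not necessarily the one the plugin uses.

```python
import re
from functools import partial

# Assumption: any run of 3+ newlines becomes exactly one blank line ("\n\n").
# Binding the replacement with partial yields a one-argument function: text -> text.
collapse_newlines = partial(re.compile(r"\n{3,}").sub, "\n\n")

lyrics = "Verse one\n\n\n\nVerse two\nline"
print(collapse_newlines(lyrics))
```

Single newlines and existing blank lines are left alone, so only the extra spacing is removed.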
Improve requests performance with requests.Session which uses connection
pooling for repeated requests to the same host.
Additionally, this centralizes request configuration, making sure that
we use the same timeout and send the beets user agent with all requests.
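A minimal sketch of a shared session follows. It assumes a subclass of `requests.Session` injects the default timeout, because requests has no session-wide timeout setting. `TimeoutSession`, `make_session`, `DEFAULT_TIMEOUT` and the `USER_AGENT` string are hypothetical placeholders, not beets' actual names or values.

```python
import requests

# Hypothetical placeholders; the real values used by beets may differ.
USER_AGENT = "beets https://beets.io/"
DEFAULT_TIMEOUT = 10  # seconds


class TimeoutSession(requests.Session):
    """A Session that applies a default timeout to every request.

    Helper methods such as get() and post() all route through request(),
    so setting the default here covers them too.
    """

    def request(self, *args, **kwargs):
        # Respect an explicit timeout; otherwise fall back to the default.
        kwargs.setdefault("timeout", DEFAULT_TIMEOUT)
        return super().request(*args, **kwargs)


def make_session() -> requests.Session:
    session = TimeoutSession()
    # Session headers are merged into every request made through it.
    session.headers.update({"User-Agent": USER_AGENT})
    return session


# One shared session reuses pooled connections to the same host.
session = make_session()
# response = session.get("https://example.com/")  # would hit the network
```

Because every backend calls through the same object, connection reuse, the timeout and the user agent are configured in one place.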
This commit introduces a distance threshold mechanism for the Genius and
Google backends.
- Create a new `SearchBackend` base class with a method `check_match`
that checks whether a found artist and title are close enough to the
ones being searched for.
- Start using the previously undocumented `dist_thresh` configuration
option for good, and mention it in the docs. This option controls the
maximum allowable distance for matching artist and title names.
These changes aim to improve the accuracy of lyrics matching, especially
when there are slight variations in artist or title names, see #4791.
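The check can be sketched as follows. This is a hypothetical illustration. `string_dist` swaps in a `difflib.SequenceMatcher` ratio for whatever distance function the plugin actually uses, and the default threshold of 0.1 is arbitrary. A result matches only if both the artist distance and the title distance stay within `dist_thresh`.

```python
from difflib import SequenceMatcher


def string_dist(a: str, b: str) -> float:
    """Distance in [0, 1]: 0.0 for identical strings, 1.0 for no overlap.

    Stand-in metric built on difflib, not beets' own string distance.
    """
    a, b = a.lower().strip(), b.lower().strip()
    return 1.0 - SequenceMatcher(None, a, b).ratio()


class SearchBackend:
    """Hypothetical base class sharing match checking between backends."""

    def __init__(self, dist_thresh: float = 0.1) -> None:
        # Maximum distance allowed for both artist and title (illustrative).
        self.dist_thresh = dist_thresh

    def check_match(
        self, target_artist: str, target_title: str, artist: str, title: str
    ) -> bool:
        """Accept a search result only if artist and title are close enough."""
        max_dist = max(
            string_dist(target_artist, artist),
            string_dist(target_title, title),
        )
        return max_dist <= self.dist_thresh


backend = SearchBackend(dist_thresh=0.1)
print(backend.check_match("Adele", "Hello", "adele", "Hello"))
```

Taking the maximum of the two distances means a near-perfect title cannot compensate for a wrong artist, and vice versa.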
## Description
Added a quick checkpoint to ensure the config file is set up correctly
before users import their music library. I only discovered the need for
this after running into an issue with my own config file, and I hope it
helps new users avoid the problems I had.
## New config option `keep_existing` (#4982)
- Fix the behavior of the `force` option. Previously, disabling the option
  had "incomplete" behaviour:
  - If content was found, a whitelist check was issued and, if valid, the
    plugin exited early and logged ("keep").
  - This whitelist check was not aware of multiple genres (typically
    separated by a string like `, `), so it failed, erased all existing
    genres and overwrote them with new ones.
_**This didn't feel like typical behaviour for a `force` option, which
this PR tries to improve as follows...**_
- Multiple genres stored as a separator-delimited string are now split
into a list and, depending on the `whitelist` option, kept and enriched
with freshly fetched last.fm genres.
- If force is off, pre-populated tags are not touched.
- A lot of refactoring was done, some absolutely required, some as a
preparation for future work on the plugin.
- The main processing function `_get_genre` was massively overhauled and
got a new `pytest.mark.parametrize` test that includes many more test
cases.
- Rename the method `_dedup_genre`, since it's only used for
finalizing/polishing existing genres.
- Return the separator-delimited string directly.
- Decide against passing `separator` to methods, since it's a config
setting available throughout the plugin. Still, assign it to a variable
where that helps readability.
- In the `force` branch, stop reassigning `keep_genres` to an empty list.
- Fix a test. Existing genres are now "polished", which means the
configured `title_case` is applied.
- Fix/add type hints on all touched and new methods
- If the `keep_existing` option is set, just remember everything for now.
- Deduplication happens later on via _combine... _resolve_genres...
- Even knowing whether the whitelist is enabled is not important at this point.
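The resulting flow can be sketched roughly as follows. This sketch rests on several assumptions:

- All function names are hypothetical and do not reproduce the plugin's own methods.
- `whitelist` is a set of lower-case genre names, or `None` when the whitelist is disabled.
- `keep_existing` only matters when `force` is on.
- Polishing is reduced to case-insensitive deduplication plus `str.title()`.

```python
from typing import Optional


def split_genres(value: str, separator: str) -> list[str]:
    """Split a separator-delimited genre string into a list of genres."""
    return [g.strip() for g in value.split(separator) if g.strip()]


def polish(genres: list[str], title_case: bool, separator: str) -> str:
    """Deduplicate case-insensitively, optionally title-case, then join."""
    seen: set[str] = set()
    result: list[str] = []
    for genre in genres:
        key = genre.lower()
        if key not in seen:
            seen.add(key)
            result.append(genre.title() if title_case else genre)
    return separator.join(result)


def resolve_genre(
    existing: str,
    fetched: list[str],
    *,
    force: bool,
    keep_existing: bool,
    whitelist: Optional[set[str]],
    title_case: bool = True,
    separator: str = ", ",
) -> str:
    # force off and the tag already populated: leave it untouched.
    if existing and not force:
        return existing

    # keep_existing: remember every existing genre for now; filtering and
    # deduplication happen afterwards, in one place.
    keep = split_genres(existing, separator) if keep_existing else []
    candidates = keep + fetched

    if whitelist is not None:
        # Same case-insensitive check for existing and fetched genres.
        candidates = [g for g in candidates if g.lower() in whitelist]

    return polish(candidates, title_case, separator)


print(resolve_genre("rock, pop", ["Jazz"], force=True, keep_existing=True, whitelist=None))
```

Deferring all filtering and deduplication to the end keeps existing and freshly fetched genres on the same code path, in line with the "remember everything for now" approach described above.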