trying to get a little order into the chaos. Reordering some methods
and/or moving them out of the main plugin logic might be a better idea,
but no further refactoring goes into this PR, to keep it readable.
- Handle genre combination logic in a well-documented helper function
that also includes type hints.
- Throughout the _get_genre function rename the result variable to
new_genres to make it clearly descriptive.
- Rewrite the _get_genre function's docstring.
- Retrieving, filtering and deduplicating present genres of Items/Albums
via separate methods.
- Implement all four cases of behaviour as described in PR#4982
- Issues:
  - There is quite some unnecessary splitting of genres from strings
    into lists and the other way round happening throughout the plugin.
  - In the case where existing genres get "augmented" with last.fm
    genres, we might end up with _more_ genres than the configured
    limit.
- Default to False.
- During PR#4982 discussions we came to the conclusion that the
following behaviour would be a good new default choice:
  - Keep whitelisted existing genres.
  - Only fetch last.fm genres for empty tags.
- To get this we also have to change the default of the force option.
- Resulting in "force: no" and "keep_allowed: yes"; see Case 4 in the
PR#4982 description.
- Options are not put to use yet, just defined and defaults set!
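The new default described above (`force: no`, `keep_allowed: yes`) could be sketched roughly as follows. The names `dedup` and `resolve_default_genres`, their parameters, and the `fetch` callback are hypothetical stand-ins, not the plugin's actual API. Fetching whenever no whitelisted genre remains, and truncating to the limit at the very end, are assumptions of this sketch.

```python
from typing import Callable, Iterable


def dedup(genres: Iterable[str]) -> list[str]:
    """Drop empty and duplicate genres (case-insensitive), keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for genre in genres:
        key = genre.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(genre.strip())
    return result


def resolve_default_genres(
    existing: Iterable[str],
    whitelist: set[str],
    fetch: Callable[[], Iterable[str]],
    limit: int,
) -> list[str]:
    """Hypothetical helper for the force=no / keep_allowed=yes case."""
    # Keep only existing genres that are on the (lowercase) whitelist.
    kept = [g for g in dedup(existing) if g.lower() in whitelist]
    if kept:
        return kept[:limit]
    # Assumption: nothing allowed is left, so treat the tag as empty
    # and fall back to freshly fetched genres.
    return dedup(fetch())[:limit]
```

Applying the limit after combining is one way to avoid ending up with more genres than configured.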
When `lastgenre.source: track` is configured,
- `lastgenre -a` _should not_ fall back to the album level genre (by
making use of the with_album=False kwarg of the Library's get method).
- `lastgenre -a`, when finally storing the genres of _an album_, should
_not_ also write the tracks' genres (by making use of the inherit=False
kwarg of the Album's store method).
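The effect of the two kwargs can be illustrated with toy stand-ins. `ToyAlbum` and `ToyItem` below are hypothetical classes written only for this sketch; they imitate the fallback and inheritance semantics described above and are not the beets classes themselves.

```python
class ToyAlbum:
    """Hypothetical stand-in for an album with attached tracks."""

    def __init__(self, genre=""):
        self.genre = genre
        self.items = []

    def store(self, inherit=True):
        # With inherit=True the album-level genre is pushed to all tracks;
        # with inherit=False the tracks are left untouched.
        if inherit:
            for item in self.items:
                item.fields["genre"] = self.genre


class ToyItem:
    """Hypothetical stand-in for a track belonging to an album."""

    def __init__(self, album, **fields):
        self.album = album
        self.fields = dict(fields)
        album.items.append(self)

    def get(self, key, default=None, with_album=True):
        if key in self.fields:
            return self.fields[key]
        # Fall back to the album-level value only when allowed.
        if with_album:
            return getattr(self.album, key, default)
        return default
```

In this toy model, `item.get("genre", with_album=False)` returns the default for a track without its own genre, and `album.store(inherit=False)` leaves the track's fields unchanged.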
I found that the `/get` endpoint often returns incorrect or unsynced
lyrics, while the `/search` endpoint returns more accurate options.
Thus I reversed the change in the previous commit to prioritize
searching first.
Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).
Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
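A minimal sketch of such a ranking follows. The function name `rank_candidates` is hypothetical, the candidate dictionaries are assumed to carry `duration` and `syncedLyrics` keys, and sorting by duration difference first and synced lyrics second is an assumption about the priority, not a statement of the actual implementation.

```python
def rank_candidates(candidates, target_duration):
    """Sort lyrics candidates from best to worst match."""

    def sort_key(candidate):
        # Smaller duration difference ranks first.
        duration_diff = abs((candidate.get("duration") or 0) - target_duration)
        # False sorts before True, so synced candidates win ties.
        has_synced = bool(candidate.get("syncedLyrics"))
        return (duration_diff, not has_synced)

    return sorted(candidates, key=sort_key)
```

The first element of the returned list is then the preferred candidate.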
Since at least one backend requires `album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.
Since they need to be provided, we enforce this by defining them as
positional arguments.
Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.
When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: the `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.
Thus I adjusted the `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
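As a rough sketch of this idea, the query parameters could be built as below. The helper name `build_params` is hypothetical, and the parameter names (`artist_name`, `track_name`, `album_name`, `duration`) are assumed to follow the LRCLib documentation; rounding the duration to whole seconds is also an assumption.

```python
def build_params(artist, title, album, duration):
    """Build LRCLib query parameters; all four arguments are positional."""
    params = {"artist_name": artist, "track_name": title}
    # Only filter by album and duration when a truthy value is available,
    # since these filters are strict and would otherwise exclude matches.
    if album:
        params["album_name"] = album
    if duration:
        params["duration"] = round(duration)
    return params
```

With an empty album and a zero duration, only the artist and title filters are sent.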
Modified `search_pairs` function in `lyrics.py` to:
* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
case) before appending it to the artists
Extended tests to cover the changes.
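The three rules above can be sketched in simplified form. `candidate_pairs` is a hypothetical, reduced stand-in for `search_pairs`: it only covers the stripping and `artist_sort` handling and leaves out any other alternatives the real function may generate.

```python
def candidate_pairs(artist, artist_sort, title):
    """Return (artist, title) pairs to try, per the rules described above."""
    artist, artist_sort, title = artist.strip(), artist_sort.strip(), title.strip()
    # No alternatives unless both artist and title are non-empty.
    if not artist or not title:
        return []
    artists = [artist]
    # Add artist_sort only if it is non-empty and differs from artist,
    # ignoring case.
    if artist_sort and artist_sort.lower() != artist.lower():
        artists.append(artist_sort)
    return [(a, title) for a in artists]
```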
Two Google sources failed to return the expected output. I looked into
why parsing failed in each case:
- lyrics on musica.com contain <aside> Google Ads
- each lyrics line on lacoccinelle.net is wrapped within alternating
<em> and <strong> tags
Thus remove these tags as part of the HTML cleanup logic.
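One way to express this cleanup is with two regular expressions. This is a sketch under assumptions: the name `clean_html` is hypothetical, `<aside>` elements are dropped together with their content (since they hold ads), and `<em>`/`<strong>` tags are unwrapped so that their text is kept.

```python
import re

# Remove <aside> elements entirely, including their content.
ASIDE_RE = re.compile(r"<aside\b.*?</aside>", re.IGNORECASE | re.DOTALL)
# Remove opening and closing <em>/<strong> tags but keep the inner text.
INLINE_TAG_RE = re.compile(r"</?(?:em|strong)\b[^>]*>", re.IGNORECASE)


def clean_html(html):
    """Strip ad blocks and inline emphasis tags from scraped lyrics HTML."""
    html = ASIDE_RE.sub("", html)
    return INLINE_TAG_RE.sub("", html)
```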
- Printing out album/item in default format could lead to unreadable
clutter depending on the user's configured formats.
- The album's name and the individual tracks' titles should be just
sufficient to provide context as well as readability.
- Log like this while importing as well as in standalone runs.
It was rather confusing that the lastgenre plugin, when handling
singletons, sometimes showed that it applied genres from last.fm and
sometimes didn't (it did only in debug log). This streamlines the
behaviour:
- Change debug to info log.
- Streamline wording.
- Display details about the track.
This utilises regex substitution in the substitute plugin. The previous
approach only used regex to match the pattern, then replaced it with a
static string. This change allows more complex substitutions, where the
output depends on the input.
### Example use case
Say we want to keep only the first artist of a multi-artist credit, as
in the following list:
```
Neil Young & Crazy Horse -> Neil Young
Michael Hurley, The Holy Modal Rounders, Jeffrey Frederick & The Clamtones -> Michael Hurley
James Yorkston and the Athletes -> James Yorkston
```
This would previously have required three separate rules, one for each
resulting artist. By using a regex substitution, we can get the desired
behaviour in a single rule:
```yaml
substitute:
  ^(.*?)(,| &| and).*: \1
```
(Capture the text up to the first `,`, ` &` or ` and`, then use that
capture group as the output.)
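The rule can be tried out with Python's `re` module. The helper `apply_rules` is a hypothetical sketch, not the plugin's code: it assumes rules are `(pattern, replacement)` pairs, matched case-sensitively with `re.search`, and that only the first matching rule is applied.

```python
import re


def apply_rules(text, rules):
    """Apply the first matching (pattern, replacement) rule to text."""
    for pattern, replacement in rules:
        compiled = re.compile(pattern)
        if compiled.search(text):
            # Backreferences such as \1 in the replacement are expanded.
            return compiled.sub(replacement, text)
    # No rule matched: leave the text unchanged.
    return text


rules = [(r"^(.*?)(,| &| and).*", r"\1")]
print(apply_rules("Neil Young & Crazy Horse", rules))
```

A name without any of the separators, such as `Neil Young`, passes through unchanged.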
### Notes
I've kept the previous behaviour of only applying the first matching
rule, but I'm not 100% sure it's the ideal approach.
I can imagine both cases where you want to apply several rules in
sequence and cases where you want to stop after the first match.
- Refactored Tekstowo backend to fetch lyrics directly from song pages.
- Added `encode` method to convert artist and title to their URL format,
where non-alphanumeric characters are replaced with underscores.
- Removed the now redundant search functionality and associated tests.
- Simplified `extract_lyrics` method to directly parse lyrics without
any checks.
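The encoding step might look roughly like this. It is a sketch only: the regular expression is an assumption, and whether the real `encode` method also lowercases the text or collapses repeated underscores is not stated above, so neither is done here.

```python
import re


def encode(text):
    """Replace every non-alphanumeric character with an underscore."""
    # \W matches anything that is not a letter, digit or underscore,
    # so accented letters are preserved.
    return re.sub(r"\W", "_", text)
```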