This commit introduces a distance threshold mechanism for the Genius and
Google backends.
- Create a new `SearchBackend` base class with a method `check_match`
that performs checking.
- Start using undocumented `dist_thresh` configuration option for good,
and mention it in the docs. This controls the maximum allowable
distance for matching artist and title names.
These changes aim to improve the accuracy of lyrics matching, especially
when there are slight variations in artist or title names, see #4791.
- Rename method _dedup_genre, since it's only used for
finalizing/polishing existing genres.
- Return separator-delimited string already.
- Decide on not passing "separator" to methods, it's a config
setting available throughout the plugin. Assign to variable where
useful for readability though.
- In the force branch, remove re-assigning keep_genres to empty list.
- Fix a test. Existing genres are "polished" now, which means:
configured title_case is applied.
- Fix/add type hints on all touched and new methods
- Adapt tests to _resolve_genres returning a list with not yet formatted genres.
- Rename and adapt test_count -> test_to_delimited_string. Note that the
new function does not apply whitelist, prefer anything. It just cuts
to count and formats!
- No idea where a missing separator (which is default) could
happen...just set it explicitely.
- Since we now refactored fetch_genre to returning a list we can add
mock multiple fetched gernes easier.
I found that the `/get` endpoint often returns incorrect or unsynced
lyrics, while results returned by the `/search` more accurate options.
Thus I reversed the change in the previous commit to prioritize
searching first.
Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).
Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
Add explicit checks for lyrics texts fetched from the tested sources.
- Introduced `LyricsPage` class to represent lyrics pages for integrated
tests.
- Configured expected lyrics for each of the URLs that are being
fetched.
- Consolidated integrated tests in a new `TestLyricsSources` class.
- Mocked Google Search API to return the lyrics page under test.
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.
Since they need to provided, we need to enforce this by defining them as
positional arguments.
Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.
When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.
Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
Modified `search_pairs` function in `lyrics.py` to:
* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
case) before appending it to the artists
Extended tests to cover the changes.
- Consolidated multiple test cases into parameterized tests for better
readability and maintainability.
- Simplified assertions by comparing lists of actual and expected
artists/titles.
- Added `unexpected_empty_artist` marker to handle cases which
unexpectedly return an empty artist. This seems to be happen when
`artist_sort` field is empty.
- Replaced unittest.mock with pytest fixtures for better test isolation and readability.
- Simplified test cases by using parameterized tests.
- Added `requests-mock` dependency to `pyproject.toml` and `poetry.lock`.
- Removed redundant helper functions and classes.
In order to include the table name for fields in this query, use the
`field_query` method.
Since `AnyFieldQuery` is just an `OrQuery` under the hood, remove it and
construct `OrQuery` explicitly instead.
Unify query creation logic from
- queryparse.py:construct_query_part,
- Model.field_query,
- DefaultTemplateFunctions._tmpl_unique
to a single implementation under `LibModel.field_query` class method.
This method should be used for query resolution for model fields.
Ubuntu version in GitHub Actions has recently been upgraded to 24.04:
https://github.com/actions/runner-images/issues/10636)
This meant that pandoc was upgraded and it changed the way markdown is
formatted by default.
# Improve release notes formatting / changelog conversion from rst to md
During our last release, we discovered issues with changelog formatting.
This PR improves and fixes several aspects:
## Changes
- Rewrite the changelog conversion logic to be more robust and
maintainable
- Fix indentation issues with nested bullet points
- Improve handling of long section headers
- Order bullet points alphabetically within sections for better
readability
- Use Sphinx `objects.inv` to resolve references and include links to
the documentation in _Markdown_
- Add tests to prevent formatting regressions
- Add pandoc as a dependency for Ubuntu CI builds
- Ensure documentation is built before generating changelog
## Problem
A regression was introduced when adjusting the track matching logic to
use `lapjv` instead of `munkres`. The `lapjv` algorithm returns `-1` for
unmatched items, which wasn't being handled correctly in the matching
logic. This caused incorrect track assignments when importing new music.
## Solution
- Modified the mapping creation to filter out unmatched items (where
index is `-1`)
- Updated test case to properly catch this scenario
These tests depend on certain `track_length_grace` and
`track_length_max` configuration which was set by other tests in this
module.
I discovered this issue when I tried to run
`test_order_works_when_track_names_are_entirely_wrong` test only
- I found that my local configuration was read and the test failed.
I had previously tested the `munkres` -> `lapjv` replacement
extensively, so I was today surprised to find that nothing gets matched
correctly when I tried importing some new tracks.
On the other hand I now remember making a small adjustment in the logic
to make autotagging tests pass which is when I introduced a bug: I did
not realize that `lapjv` returns index '-1' for each unmatched item.
This issue did not get caught by tests because this 'unmatched' item
index '-1' anecdotally ended up pointing to the last (expected) item in
the test making it pass.
This commit adjusts the aforementioned test to catch this issue and
fixes the logic to correctly identify unmatched tracks.
This utilises regex substitution in the substitute plugin. The previous
approach only used regex to match the pattern, then replaced it with a
static string. This change allows more complex substitutions, where the
output depends on the input.
### Example use case
Say we want to keep only the first artist of a multi-artist credit, as
in the following list:
```
Neil Young & Crazy Horse -> Neil Young
Michael Hurley, The Holy Modal Rounders, Jeffrey Frederick & The Clamtones -> Michael Hurley
James Yorkston and the Athletes -> James Yorkston
````
This would previously have required three separate rules, one for each
resulting artist. By using a regex substitution, we can get the desired
behaviour in a single rule:
```yaml
substitute:
^(.*?)(,| &| and).*: \1
```
(Capture the text until the first `,` ` &` or ` and`, then use that
capture group as the output)
### Notes
I've kept the previous behaviour of only applying the first matching
rule, but I'm not 100% sure it's the ideal approach.
I can imagine both cases where you want to apply several rules in
sequence and cases where you want to stop after the first match.
This PR refactors the test codebase by removing redundant functions and
simplifying item and album creation. Key changes include:
- Removed redundant `_item_ident` index tracker from `_common.py`.
- Removed `album` function from `_common.py` replacing it with direct
`library.Album` invocations.
- Removed `generate_album_info` and `generate_track_info` functions,
replacing them directly with `TrackInfo` and `AlbumInfo`.
- Updated `setup.cfg` to exclude test helper files from coverage
reports.
- Adjusted the tests regarding the changes, and simplified
`test_mbsync.py`.
These functions were used to generate mock data for tests but have been
replaced with direct instantiation of AlbumInfo and TrackInfo objects.
This change simplifies the test code and removes unnecessary helper
functions.
- Refactored Tekstowo backend to fetch lyrics directly from song pages.
- Added `encode` method to convert artist and title to their URL format,
where non-alphanumeric characters are replaced with underscores.
- Removed the now redundant search functionality and associated tests.
- Simplified `extract_lyrics` method to directly parse lyrics without
any checks.
Since the `for_artist` keyword has been removed from
`ftintitle.contains_feat`, the unit tests need to be updated.
This includes the deletion of the test cases that test the
`for_artist=True` delimiters.
The previous version of the `plugins.feat_tokens` regular expression
only matched "feat. X" parts if preceded by a space. This caused missed
detections in the `ftintitle.contains_feat` function.
This commit adds unit tests for the updated regex that also matches
"feat. X" parts within parentheses and brackets
The variable in `test_ordered_enum` was flagged for naming issues, and
I noticed that `OrderedEnum` is essentially `enum.IntEnum`.
I guess `OrderedEnum` exists because it was created before
`enum.IntEnum` was made available in Python 3.4. We do not need it
anymore though, so it's now gone.
* Replace `noqa` comments in `assert...` method definitions with
a configuration option to ignore these names.
* Use the `__all__` variable to specify importable items from the
module, replacing `*` imports and `noqa` comments for unused imports.
* Address issues with poorly named variables and methods by renaming
them appropriately.
- Fix imports
- Fix pytest issues
- Do not assign lambda as variable
- Use isinstance instead of type to check type
- Rename ambiguously named variables
- Name custom errors with Error suffix