Commit graph

173 commits

Author SHA1 Message Date
Šarūnas Nejus
734bcc28a8
Append source to the lyrics 2025-01-27 10:56:53 +00:00
Šarūnas Nejus
bdc564a573
Tidy up handling of backends 2025-01-27 10:56:53 +00:00
Šarūnas Nejus
70554640e5
Create Html class for cleaning up the html text
Additionally, improve HTML pre-processing:

* Ensure a new line between blocks of lyrics text from letras.mus.br.
* Parse a missing last block of lyrics text from lacocinelle.net.
* Parse a missing last block of lyrics text from paroles.net.
* Fix encoding issues with AZLyrics by setting response encoding to
  None, allowing `requests` to handle it.
2025-01-27 10:56:52 +00:00
Šarūnas Nejus
c5c4138d66
Google: Refactor and improve
* Type the response data that Google Custom Search API return.
* Exclude some 'letras.mus.br' pages that do not contain lyric.
* Exclude results from Musixmatch as we cannot access their pages.
* Improve parsing of the URL title:
  - Handle long URL titles that get truncated (end with ellipsis) for
    long searches
  - Remove domains starting with 'www'
  - Parse the title AND the artist. Previously this would only parse the
    title, and fetch lyrics even when the artist did not match.
* Remove now redundant credits cleanup and checks for valid lyrics.
2025-01-27 10:56:52 +00:00
Šarūnas Nejus
12c5eaae5e
Unite Genius, Tekstowo and Google backends under the same interface 2025-01-27 10:56:52 +00:00
Šarūnas Nejus
745c5eb9f0
Genius: refactor and simplify 2025-01-27 10:56:52 +00:00
Šarūnas Nejus
f94d2767f9
Use a single slug implementation
Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify'
method which was a duplicate of 'slug'.

Since 'GeniusFetchTest' only tested whether the artist name is cleaned
up (the rest of the functionality is patched), remove it and move its
test cases to the 'test_slug' test.
2025-01-27 08:50:50 +00:00
Šarūnas Nejus
dd9f178fff
Do not try to strip cruft from the parsed lyrics text.
Having removed it I fuond that only the Genius lyrics changed: it had en
extra new line. Thus I defined a function 'collapse_newlines' which now
gets called for the Genius lyrics.
2025-01-27 08:50:50 +00:00
Šarūnas Nejus
7c2fb31136
Leave a single chef in the kitchen 2025-01-27 08:50:50 +00:00
Šarūnas Nejus
cb29605bfd
Include class name in the log messages 2025-01-27 08:50:50 +00:00
Šarūnas Nejus
283c513c72
Centralise request error handling 2025-01-27 08:50:49 +00:00
Šarūnas Nejus
2ff57505d8
Apply dist_thresh to Genius and Google backends
This commit introduces a distance threshold mechanism for the Genius and
Google backends.

- Create a new `SearchBackend` base class with a method `check_match`
  that performs checking.
- Start using undocumented `dist_thresh` configuration option for good,
  and mention it in the docs. This controls the maximum allowable
  distance for matching artist and title names.

These changes aim to improve the accuracy of lyrics matching, especially
when there are slight variations in artist or title names, see #4791.
2025-01-27 08:50:48 +00:00
J0J0 Todos
5d9d8840ae Fix a lastgenre test 2025-01-22 18:20:41 +01:00
J0J0 Todos
261379f395 Add test_get_genre case proving no-force keeps any
because _get_existing_genres does not rely on configured separator.
2025-01-22 18:07:43 +01:00
J0J0 Todos
34b9021772 Fix test_get_genrer - configure first
otherwise self.whitelist is "set up" before the test cases config is
set.
2025-01-22 18:07:43 +01:00
J0J0 Todos
1219b43af4 Fix a test: no-force always early returns 2025-01-21 17:48:21 +01:00
J0J0 Todos
d703a7f712 Fix tests for log-label changes 2025-01-21 17:14:15 +01:00
J0J0 Todos
6e3f5b3127 Fix type hints, Refactor existing genres method
- Rename method _dedup_genre, since it's only used for
  finalizing/polishing existing genres.
- Return separator-delimited string already.
- Decide on not passing "separator" to methods, it's a config
  setting available throughout the plugin. Assign to variable where
  useful for readability though.
- In the force branch, remove re-assigning keep_genres to empty list.
- Fix a test. Existing genres are "polished" now, which means:
  configured title_case is applied.
- Fix/add type hints on all touched and new methods
2025-01-21 17:04:03 +01:00
J0J0 Todos
6cd000750a _resolve_genre as list tests, add test_to_delimited_string
- Adapt tests to _resolve_genres returning a list with not yet formatted genres.
- Rename and adapt test_count -> test_to_delimited_string. Note that the
  new function does not apply whitelist, prefer anything. It just cuts
  to count and formats!
2025-01-21 17:04:02 +01:00
J0J0 Todos
dd3a74ca4d Fix lastgenre limit to count test
- No idea where a missing separator (which is default) could
  happen...just set it explicitely.
- Since we now refactored fetch_genre to returning a list we can add
  mock multiple fetched gernes easier.
2025-01-21 17:04:02 +01:00
J0J0 Todos
b4590ec3e0 Fix/add lastgenre fallback tests 2025-01-21 17:04:02 +01:00
J0J0 Todos
50e2619ed2 Refactor lastgenre mocked fetchers return list 2025-01-21 17:04:02 +01:00
J0J0 Todos
3e452e7b76 lastgenre test _get_genre for renamed keep_existing
and decide to use the original default whitelist instead of trying to
mock it. Some of the existing tests do it that way as well.
2025-01-21 17:04:02 +01:00
J0J0 Todos
1c14574e85 Add lastgenre testcase with unicode \0 separator 2025-01-21 17:04:02 +01:00
J0J0 Todos
a434ecfe7b Refactor _get_genre test to parametrized pytest 2025-01-21 17:04:02 +01:00
J0J0 Todos
70d641c556 Experiment with test_lastgenre 2025-01-21 17:04:02 +01:00
Šarūnas Nejus
bb5f3e0593
lyrics: sort lrclib lyrics by synced field and query search first
I found that the `/get` endpoint often returns incorrect or unsynced
lyrics, while results returned by the `/search` more accurate options.

Thus I reversed the change in the previous commit to prioritize
searching first.
2025-01-20 13:14:37 +00:00
Šarūnas Nejus
618c3a21a6
Try to GET LRCLib lyrics before searching 2025-01-19 18:39:54 +00:00
Šarūnas Nejus
2fb72c65a5
lyrics/LRCLib: handle instrumental lyrics 2025-01-19 15:19:44 +00:00
Šarūnas Nejus
a398fbe62d
LRCLib: Improve exception handling 2025-01-19 15:19:44 +00:00
Šarūnas Nejus
8d4a569291
Fix fetching lyrics from lrclib
Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).

Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
2025-01-19 15:19:41 +00:00
Šarūnas Nejus
e5c006d99d
Test lyrics texts explicitly
Add explicit checks for lyrics texts fetched from the tested sources.

- Introduced `LyricsPage` class to represent lyrics pages for integrated
  tests.
- Configured expected lyrics for each of the URLs that are being
  fetched.
- Consolidated integrated tests in a new `TestLyricsSources` class.
- Mocked Google Search API to return the lyrics page under test.
2025-01-19 01:54:53 +00:00
Šarūnas Nejus
c250bfa724
Google: test the entire fetch method 2025-01-19 01:48:04 +00:00
Šarūnas Nejus
334bbde826
Make album, duration required for LyricsPlugin.fetch
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.

Since they need to provided, we need to enforce this by defining them as
positional arguments.

Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.

When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.

Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
2025-01-19 01:48:04 +00:00
Šarūnas Nejus
0a12d07a94
Do not attempt to fetch lyrics with empty data
Modified `search_pairs` function in `lyrics.py` to:

* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
  case) before appending it to the artists

Extended tests to cover the changes.
2025-01-19 01:48:04 +00:00
Šarūnas Nejus
767a83fbe6
Refactor utils test cases to use pytest.mark.parametrize 2025-01-19 01:48:04 +00:00
Šarūnas Nejus
f674d65a65
Refactor search_pairs tests to use pytest parametrize
- Consolidated multiple test cases into parameterized tests for better
  readability and maintainability.
- Simplified assertions by comparing lists of actual and expected
  artists/titles.
- Added `unexpected_empty_artist` marker to handle cases which
  unexpectedly return an empty artist. This seems to be happen when
  `artist_sort` field is empty.
2025-01-19 01:48:04 +00:00
Šarūnas Nejus
14fd151f80
Refactor test_slug to pytest 2025-01-19 01:48:04 +00:00
Šarūnas Nejus
67e0af526c
Remove outdated GeniusLyrics test
The test for GeniusLyrics was heavily patched and no longer provided
useful coverage. It has been removed to clean up the test suite.
2025-01-19 01:48:04 +00:00
Šarūnas Nejus
35dcfe508a
Configure integrated lyrics tests to only run on lyrics code changes 2025-01-19 01:48:03 +00:00
Šarūnas Nejus
fc49902f3a
Refactor lyrics backend tests to use pytest fixtures
- Replaced unittest.mock with pytest fixtures for better test isolation and readability.
- Simplified test cases by using parameterized tests.
- Added `requests-mock` dependency to `pyproject.toml` and `poetry.lock`.
- Removed redundant helper functions and classes.
2025-01-19 01:33:15 +00:00
Šarūnas Nejus
b9bc2cbc04
lyrics: isolate test configuration
(#5102) Refactor lyrics tests which depended on local developer beets
configuration.
2025-01-19 01:33:14 +00:00
Šarūnas Nejus
29a3dd5084
Remove redundant lyrics test files 2025-01-19 01:32:17 +00:00
Šarūnas Nejus
e99d457c9d
Rewrite lyrics integration tests 2025-01-19 01:32:17 +00:00
Stefano Rivera
bcc79a5b09 Future proof BucketPluginTest.test_year_single_year_last_folder
2025 won't be in the future, forever.

Fixes: https://bugs.debian.org/1091495
2024-12-27 16:28:38 -04:00
Šarūnas Nejus
5c81f94cf7
Move imports required for typing under the TYPE_CHECKING block 2024-12-10 06:10:04 +00:00
Šarūnas Nejus
161b0522bb
Update deprecated imports 2024-12-10 06:10:04 +00:00
Šarūnas Nejus
51f9dd229e
Use PEP585 lowercase collections typing annotations 2024-12-10 06:10:03 +00:00
Šarūnas Nejus
7be8f9c97a
Update CI config, minimum ruff version, docs and add changelog note 2024-12-10 06:10:03 +00:00
Edgars Supe
09360259cc lyrics: Fallback to plain lyrics if synced not available 2024-12-07 19:08:37 +02:00