trying to get a little order in the chaos. Maybe reordering and/or
moving out of the main plugin logic would be a better idea for some
methods but don't put much more refactoring into this PR to keep it
readable.
- Handle genre combination logic in a well documented helper function
that also include type hints.
- Throughout the _get_genre function rename the result variable to
new_genres to make it clearly descriptive.
- Rewrite thze _get_genre function's docstring.
- Retrieving, filtering and deduplicating present genres of Items/Albums
via separate methods.
- Implement all four cases of behaviour as described in PR#4982
- Issues:
- There is quite some unnecessary spliting of genres from strings into
lists and the other way round happening throughout the plugin.
- In the case where existing genres get "augmented" with last.fm
genres, we might end up with _more_ genres than the configured
limit.
Keep both options' "Configuration" chapter texts as compact as possible,
while linking to a new chapter that describes all 4 possible
combinations in detail.
- Default to False.
- During PR#4982 discussions we came to the conclusion that the
following behaviour would be a good new default choice:
- Keep whitelisted existing genres
- Only Fetch last.fm genres for empty tags.
- To get this we also have to change the default of the force
option!!!
- Resulting in "force: no" and "keep_allowed: yes"; see Case 4 in
PR#4982 description
- Options are not put to use yet, just defined and defaults set!
When `lastgenre.source: track` is configured,
- `lastgenre -a` _should not_ fall back to the album level genre (by
making use of the with_album=False kwarg of the Libary's get method).
- `lastgenre -a`, when finally storing the genres of _an album_, should
_not_ also write the tracks genres (by making use of the inherit=False
kwarg of the Album's store method.
Fixes#5102
### LRCLib lyrics backend fixes
#### Bug Fixes
- Fixed fetching lyrics from `lrclib` source. If lyrics for a specific
album, artist, and title combination are not found, the plugin now
searches for the artist and title and picks the most relevant result,
scoring them by
1. Duration similarity to the target item
1. Availability of synced lyrics
- Updated the default `sources` configuration to prioritize `lrclib`
over other sources for faster and more reliable results.
#### Code Improvements
- Added type annotations to `fetch` method in all backends.
- Introduced `LRCLyrics` and `LRCLibItem` classes to encapsulate lyrics
data and improve code structure.
- Enhanced error handling and logging enchancements to the `LRCLib`
backend. These will be added to the rest of the backends in a separate
PR.
I found that the `/get` endpoint often returns incorrect or unsynced
lyrics, while results returned by the `/search` more accurate options.
Thus I reversed the change in the previous commit to prioritize
searching first.
Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).
Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
## Description
Fixes#2635Fixes#5133
I realised that #5406 has gotten too big, thus I'm splitting it into
several smaller PRs.
This PR refactors lyrics plugin tests and fixes an empty metadata issue
in the lyrics logic.
#### CI
- Added `--extras=lyrics` to the Poetry install command to include the
lyrics plugin dependencies.
- In the main task which measures coverage, set `LYRICS_UPDATED`
environment variable based on changes detected in the lyrics files.
#### Test setup
- Introduced `ConfigMixin` to centralize configuration setup for tests,
reducing redundancy. This can be used by tests based on `pytest`.
#### Lyrics logic
- Trimmed whitespace from `item.title`, `item.artist`, and
`item.artist_sort` in `search_pairs` function.
- Added checks to avoid searching for lyrics if either the artist or
title is missing.
- Improved `_scrape_strip_cruft` function to remove Google Ads tags and
unnecessary HTML tags.
#### Lyrics tests overhaul
- Migrated lyrics tests to use `pytest` for better isolation and
configuration management.
- Deleted redundant lyrics text files and some unused utils.
- Marked tests that should only run when lyrics source code is updated
(`LYRICS_UPDATED` is set from the CI) using the `on_lyrics_update`
marker.
#### Documentation and Dependencies
- Added `requests-mock` version `1.12.1` to `pyproject.toml` and
`poetry.lock` for mocking HTTP requests in tests.
- Updated `setup.cfg` to include a new marker `on_lyrics_update`.
Add explicit checks for lyrics texts fetched from the tested sources.
- Introduced `LyricsPage` class to represent lyrics pages for integrated
tests.
- Configured expected lyrics for each of the URLs that are being
fetched.
- Consolidated integrated tests in a new `TestLyricsSources` class.
- Mocked Google Search API to return the lyrics page under test.
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.
Since they need to provided, we need to enforce this by defining them as
positional arguments.
Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.
When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.
Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
Modified `search_pairs` function in `lyrics.py` to:
* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
case) before appending it to the artists
Extended tests to cover the changes.
- Consolidated multiple test cases into parameterized tests for better
readability and maintainability.
- Simplified assertions by comparing lists of actual and expected
artists/titles.
- Added `unexpected_empty_artist` marker to handle cases which
unexpectedly return an empty artist. This seems to be happen when
`artist_sort` field is empty.
- Replaced unittest.mock with pytest fixtures for better test isolation and readability.
- Simplified test cases by using parameterized tests.
- Added `requests-mock` dependency to `pyproject.toml` and `poetry.lock`.
- Removed redundant helper functions and classes.
Two google sources failed to return the expected output. I looked into
each case why parsing failed:
- lyrics on musica.com contain <aside> Google Ads
- each lyrics line on lacoccinelle.net is wrapped within alternating
<em> and <strong> tags
Thus remove these tags as part of the HTML cleanup logic.
## Description
Fixes a `sqlite3.OperationalError` that occurs when querying *any* field
and a field that only exists in the relation table. For exaple, trying
to list albums that contain **keyword** and contain a track with **foo**
in its title:
```
beet list -a keyword title:foo
```
## Root Cause
SQLite fails when JOINs contain ambiguous column references. This
happened because:
- *any* Album field search looks at `album`, `albumartist` and `genre`
fields.
- The second part of the query `title:foo` queries a field in the
`items` table, which got
JOINed with `albums`
- Some fields (like `album`) exist in both `items` and `albums` tables,
thus SQLite couldn't resolve which table's column to use
## Changes
- Centralize query construction in `LibModel` with consistent table
qualification
- Add methods:
- `field_query()` - Creates table-qualified field queries
- `any_field_query()` - Creates multi-field OR queries
- `any_writable_media_field_query()` - Similar to the above but for BPD
/ media files
- `match_all_query()` - Creates multi-field AND queries
- Remove `AnyFieldQuery` in favor of composed `OrQuery`
- Add tests for shared field querying
In order to include the table name for fields in this query, use the
`field_query` method.
Since `AnyFieldQuery` is just an `OrQuery` under the hood, remove it and
construct `OrQuery` explicitly instead.
Unify query creation logic from
- queryparse.py:construct_query_part,
- Model.field_query,
- DefaultTemplateFunctions._tmpl_unique
to a single implementation under `LibModel.field_query` class method.
This method should be used for query resolution for model fields.
Adjust Sphinx configuration and release script
Changes:
- Update `source_suffix` in docs/conf.py to use dict format as required
by newer Sphinx
versions
- Fix typo in theme config: `pygment_light_style` ->
`pygments_light_style`
- Improve release script to handle Sphinx intersphinx output formatting:
- Replace tabs with spaces for consistent parsing
- Add reference to sphinx.ext.intersphinx command in docstring
- Update line parsing to match space-indented format
- Restructure CI workflow to ensure relevant dependencies are installed
and changelog formatting is tested.
These changes ensure compatibility with newer Sphinx versions and
improve the robustness
of the release script's reference parsing.
Edit:
While working on this PR GitHub decided to upgrade their linux/ubuntu
runners with `Ubuntu 24.04`. Ubuntu upgrades are the worst: imagine not
updating your system for 3 years and then suddenly upgrading
**everything** to the most recent versions. At which point you start
realizing that the final **S** in their **LTS** means __Suffering__
instead of __Support__.
We use just a few system dependencies in our builds and as expected,
things broke:
* `libcairo2-dev` now needs to be installed for pygobject.
* `pandoc` `rst` -> `markdown` conversion output has changed
* Completion tests are unhappy about `bash` / `bash-completion` upgrade,
and I could not figure out why so I'm just `xfail`ing that test in CI.
Ubuntu version in GitHub Actions has recently been upgraded to 24.04:
https://github.com/actions/runner-images/issues/10636)
This meant that pandoc was upgraded and it changed the way markdown is
formatted by default.