Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).
Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
Since at least one Backend requires album` and `duration` arguments
(`LRCLib`), the caller (`LyricsPlugin.fetch_item_lyrics`) must always
provide them.
Since they need to provided, we need to enforce this by defining them as
positional arguments.
Why is this important? I found that integrated `LRCLib` tests have been
passing, but they called `LRCLib.fetch` with values for `artist` and
`title` fields only, while the actual functionality *always* provides
values for `album` and `duration` fields too.
When I adjusted the test to provide values for the missing fields,
I found that it failed. This makes sense: Lib `album` and `duration`
filters are strict on LRCLib, so I was not surprised the lyrics could
not be found.
Thus I adjusted `LRCLib` backend implementation to only filter by each
of these fields when their values are truthy.
Modified `search_pairs` function in `lyrics.py` to:
* Firstly strip each of `artist`, `artist_sort` and `title` fields
* Only generate alternatives if both `artist` and `title` are not empty
* Ensure that `artist_sort` is not empty and not equal to artist (ignoring
case) before appending it to the artists
Extended tests to cover the changes.
Two google sources failed to return the expected output. I looked into
each case why parsing failed:
- lyrics on musica.com contain <aside> Google Ads
- each lyrics line on lacoccinelle.net is wrapped within alternating
<em> and <strong> tags
Thus remove these tags as part of the HTML cleanup logic.
- Refactored Tekstowo backend to fetch lyrics directly from song pages.
- Added `encode` method to convert artist and title to their URL format,
where non-alphanumeric characters are replaced with underscores.
- Removed the now redundant search functionality and associated tests.
- Simplified `extract_lyrics` method to directly parse lyrics without
any checks.
- Fix imports
- Fix pytest issues
- Do not assign lambda as variable
- Use isinstance instead of type to check type
- Rename ambiguously named variables
- Name custom errors with Error suffix
The previous code had the potential to crash if (when?) Tekstowo changes
their website structure sufficiently.
The new code is rather ugly due to the explicit checks after each and
every function call. Unfortunately, the alternative would be to catch a
bunch of very generic Exceptions (AttributeError, ...), since there's no
such thing as a `BeautifulSoupNotFoundError`.
I experienced a failure to parse Tekstowo for song lyrics.
This patch allowed the lyrics plugin to fetch the lyrics from another provider as opposed to failing.
* clean-up code & add tests for genius lyrics backend
* add genius fetch tests
* organize imports: standard lib -> pip -> local
* check in sample genius lyrics page
* fix mock import
* force utf-8 encoding for opened files
* use io.open to force utf-8 encoding w/ python2.7
* Skip ReST writing & sphinx info messages if query doesn't yield anything
* `writerest` into `appendrest` and `writerest`, don't call `writerest(item=None)` to flush state at the end.
Searching only for the title and just verifying the artist afterwards leads to songs with very common titles not being found, since Genius limits the amount of returned hits.
An example would be 'Saviour' by 'Circa Waves'.
This will handle cases where the artist name includes some
characters which can cause a search failure for some
websites - mainly unicode.
Fixes https://github.com/beetbox/beets/issues/3340.
we just look for the bad string in the HTML. this has the downside
that we may consider songs that have those exact lyrics (you never
know, really) may trigger this warning as well and we would fail to
fetch those songs.
we also fail if lyrics contain another magic string that seems to come
up when you do fill in the CAPTCHA after being blocked.
this is necessary because otherwise artists with different string
representations but the same slug would overwrite one another
this outlines more clearly the code duplication between the rst code
and the slugify function, something which can be fixed later.