- Add `LyricsMetadataInFlexFieldsMigration` to extract legacy source
URLs and language metadata from lyrics text into flex attributes
- Add `Lyrics.from_legacy_text` to parse legacy lyrics format
- Move `with_row_factory` context manager up to base `Migration` class
- Rename `migrate_table` to `migrate_model` and pass model class
instead of table name string. This is so that the migration can access
both `_table` and `_flex_table` attributes.
- Make `langdetect` import optional in `Lyrics.__post_init__`: users may
not have have the dependency installed, and we do not want the
migration to fail because of that.
- Move `BACKEND_BY_NAME` to module level for use outside plugin class
* Introduce a `Lyrics` dataclass to carry text, source URL, and language
metadata through fetch, translation, and storage paths.
* Return `Lyrics` from backends and plugin lookup methods instead of raw
tuples/strings.
* Store backend name in `lyrics_source` derived from fetched URL root
domain.
* Simplify translator flow to operate on `Lyrics`, reuse line splitting,
append translations in-place, and record translation language
metadata.
When lyrics.synced is enabled, avoid replacing existing synced lyrics with
newly fetched unsynced lyrics, even with force enabled.
Allow replacement when the new lyrics are also synced, or when synced mode
is disabled.
Introduce a new RequestHandler base class to introduce a shared session,
centralize HTTP request management and error handling across plugins.
Key changes:
- Add RequestHandler base class with a shared/cached session
- Convert TimeoutSession to use SingletonMeta for proper resource
management
- Create LyricsRequestHandler subclass with lyrics-specific error
handling
- Update MusicBrainzAPI to inherit from RequestHandler
I found that the translator would sometimes replace the pipe character
with another symbol (maybe it got confused thinking the character is
part of the text?).
Added spaces around the pipe to make it more clear that it's definitely
the separator.
If we get caught by Cloudfare, it forwards our request somewhere else
and returns some validation text response. To make sure that this text
does not get assumed for lyrics, we can disable redirects for the Google
backend, check the response code and raise if there's a redirect
attempt. This source will then be skipped and the backend continues with
the next one.
Additionally, improve HTML pre-processing:
* Ensure a new line between blocks of lyrics text from letras.mus.br.
* Parse a missing last block of lyrics text from lacocinelle.net.
* Parse a missing last block of lyrics text from paroles.net.
* Fix encoding issues with AZLyrics by setting response encoding to
None, allowing `requests` to handle it.
* Type the response data that Google Custom Search API return.
* Exclude some 'letras.mus.br' pages that do not contain lyric.
* Exclude results from Musixmatch as we cannot access their pages.
* Improve parsing of the URL title:
- Handle long URL titles that get truncated (end with ellipsis) for
long searches
- Remove domains starting with 'www'
- Parse the title AND the artist. Previously this would only parse the
title, and fetch lyrics even when the artist did not match.
* Remove now redundant credits cleanup and checks for valid lyrics.
Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify'
method which was a duplicate of 'slug'.
Since 'GeniusFetchTest' only tested whether the artist name is cleaned
up (the rest of the functionality is patched), remove it and move its
test cases to the 'test_slug' test.