Backup_Repos/beets

mirror of https://github.com/beetbox/beets.git synced 2025-12-07 00:53:08 +01:00

Author	SHA1	Message	Date
Sebastian Mohr	34101c2d13	Merge branch 'master' into importer-typehints-and-refactor	2025-02-07 16:22:34 +01:00
Sebastian Mohr	23f4f8261c	Added some more typehints	2025-02-04 16:38:24 +01:00
valrus	99d2da66dc	use actual value of matcher, not typo'd one	2025-02-03 19:32:27 -08:00
valrus	d298738612	add missing space in comment	2025-02-03 19:32:27 -08:00
valrus	f520790713	s/macthin/matching/	2025-02-03 19:32:27 -08:00
Sebastian Mohr	ed92f9b997	Do not skip tests in ci. The return type of the stage decorator should in theory be `T\|None` but the return of task types is not consistent in its usage. Would need some bigger changes for which I'm not ready at the moment.	2025-02-03 11:37:33 +01:00
Šarūnas Nejus	731519b0a3	Use up-to-date namespace package setup for plugins (#5505) Refactor `beetsplug` to use native namespace packages by removing `__init__.py`. Update documentation and `setup.cfg` to support namespace packages. ### Motivation Adopt PEP 420 native namespace packages to simplify plugin management and eliminate the need for `__init__.py`. See https://realpython.com/python-namespace-package. This setup is backwards-compatible, so plugins using the old pkgutil-based setup will continue working fine. The advantage with this setup is that external plugins will now be able to import modules from 'beetsplug' package for typing purposes. Previously, mypy could not resolve these modules due to the presence of `__init__.py`.	2025-02-02 16:32:53 +00:00
Sebastian Mohr	04aa1b8ddb	Added last missing typehints in importer and fixed some typehint issues in util. Also run poe format	2025-02-01 15:52:11 +01:00
Sebastian Mohr	09b15aaf52	Added type hints for pipeline stage decorators	2025-02-01 15:08:41 +01:00
Sebastian Mohr	c83f2e4e71	Recreating importstate class to imitate previous code, otherwise we have slightly different behavior.	2025-02-01 13:45:18 +01:00
Sebastian Mohr	6f2ee5c614	Removed abc as the importer class is used directly sometimes...	2025-02-01 13:37:24 +01:00
Sebastian Mohr	a7ea60356b	Init taghistory before loading	2025-02-01 13:26:23 +01:00
Sebastian Mohr	c81a2c9b18	Using 3.9 unions instead of new 3.10 style unions for typehints	2025-02-01 13:25:25 +01:00
Sebastian Mohr	435864cb50	Removed import state functions in favor of an import state dataclass. Makes this more readable in my opinion, we also now have typehints for the import state.	2025-02-01 13:16:04 +01:00
Max Goltzsche	5d96509cfe	smartplaylist: change encoding of additional field URL-encode additional item `fields` within generated EXTM3U playlists instead of JSON-encoding them. This is because JSON-encoding additional fields/attributes made it difficult to parse the `EXTINF` line but using URL-encoding for these values makes parsing easy (because URL-encoded values cannot contain commas, quotation marks and spaces). I introduced the generation of additional EXTM3U item fields earlier this year and I want to correct that now. Design/definition background: Unfortunately, I didn't find a clear definition of how additional playlist item attributes should be encoded - apparently there is none. Given that item URIs within an M3U playlist can be URL-encoded already, defining the values of additional attributes to be URL-encoded is consistent design. I didn't find examples of additional EXTM3U item attributes in the web where the attribute value contains a comma, space or quotation mark but examples that specified numeric IDs and URLs as attribute values. Because the URL attribute examples I found didn't contain URL-encoded characters and because it is more readable and unproblematic for parsing, I've let the attribute URL encoding treat `:` and `/` as safe characters. Breaking change: While this is a breaking change in theory, in practice it is not since afaik all integrations of the smartplaylist plugin's additional EXTM3U item attribute generation feature (beets-webm3u) work with simple attribute values such as the item ID (numeric) whose formatting/encoding is not affected when changing from JSON to URL-encoding. In other words the change is backward-compatible with the beets-webm3u plugin (which I'll adjust correspondingly after this beets PR was merged).	2025-02-01 01:14:27 +01:00
J0J0 Todos	f4d41482e8	Fix musicbrainz genres fetching - genres are now called tags - tags needs to be in "mb fetch includes" - release-group has them - release has them - and recording as well but we don't use them - not sure what this outdated check was doing - see musicbrainz.VALID_INCLUDES for reference	2025-01-31 23:27:17 +01:00
Šarūnas Nejus	89f1ef4d2f	Add documentation links	2025-01-30 12:20:11 +00:00
Šarūnas Nejus	916d40f86f	Remove outdated namespace package definition and update docs See https://realpython.com/python-namespace-package. This setup is backwards-compatible, so plugins using the old pkgutil-based setup will continue working fine. This setup has an advantage where external plugins will now be able to import modules from 'beetsplug' package for typing purposes. Previously, mypy could not resolve these modules due to presence of `__init__.py`.	2025-01-30 12:20:11 +00:00
Šarūnas Nejus	a1c0ebdeef	Lyrics: Refactor Genius, Google backends, and consolidate common functionality (#5474) ### Bug Fixes - Fixed #4791: Resolved an issue with the Genius backend where it couldn't match lyrics if there was a slight variation in the artist's name. ### Plugin Enhancements * Session Management: Introduced a `TimeoutSession` to enable connection pooling and maintain consistent configuration across requests. * Error Handling: Centralized error handling logic in a new `RequestsHandler` class, which includes methods for retrieving either HTML text or JSON data. * Logging: Added methods to ensure the backend name is included in log messages. ### Configuration Changes * Added a new `dist_thresh` field to the configuration, allowing users to control the maximum tolerable mismatch between the artist and title of the lyrics search result and their item. Interestingly, this field was previously available (though undocumented) and used in the `Tekstowo` backend. Now, this threshold has also been applied to Genius and Google search logic. ### Backend Updates * All backends that perform searches now validate each result against the configured `dist_thresh`. #### Genius * Removed the need to scrape HTML tags for lyrics; instead, lyrics are now parsed from the JSON data embedded in the HTML. This change should reduce our vulnerability to Genius' frequent alterations in their HTML structure. * Documented the structure of their search JSON data. #### Google * Typed the response data returned by the Google Custom Search API. * Excluded certain pages under https://letras.mus.br that do not contain lyrics. * Excluded all results from MusiXmatch, as we cannot access their pages. * Improved parsing of URL titles (used for matching item/lyrics artist/title): - Handled results from long search queries where URL titles are truncated with an ellipsis. - Enhanced URL title cleanup logic. - Added functionality to determine (or rather, guess) not only the track title but also the artist from the URL title. * Similar to #5406, search results are now compared to the original item and sorted by distance. Results exceeding the configured `dist_thresh` value are discarded. The previous functionality simply selected the first result containing the track's title in its URL, which often led to returning lyrics for the wrong artist, particularly for short track titles. * Since we now fetch lyrics confidently, redundant checks for valid lyrics and credits cleanup have been removed. ### HTML Cleanup * Organized regex patterns into a new `Html` class. * Adjusted patterns to ensure new lines between blocks of lyrics text scraped from `letras.mus.br` and `musica.com`. * Modified patterns to scrape missing lyrics text on `paroles.net` and `lacoccinelle.net`. See the diff in `test/plugins/lyrics_page.py`.	2025-01-27 11:10:14 +00:00
Šarūnas Nejus	dab9a0d7c4	Bring back Tekstowo search It was my mistake to remove search earlier - I found that in many cases it works fine.	2025-01-27 10:56:54 +00:00
Šarūnas Nejus	7389f241f4	Do not search for Various Artists, split titles by ' / '	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	39c479fcab	Google: add support for dainuzodziai.lt	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	858c13558c	Xfail Songlyrics source	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	734bcc28a8	Append source to the lyrics	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	bdc564a573	Tidy up handling of backends	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	04054cac5c	Remove dependency existence checks I think we can make our life easier by removing these checks assuming that users follow the instructions in the docs.	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	b2402b1634	Google: make sure we do not return the captcha text If we get caught by Cloudflare, it forwards our request somewhere else and returns some validation text response. To make sure that this text does not get mistaken for lyrics, we can disable redirects for the Google backend, check the response code and raise if there's a redirect attempt. This source will then be skipped and the backend continues with the next one.	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	07d372c13d	Google: prioritise Songlyrics and AZlyrics sources	2025-01-27 10:56:53 +00:00
Šarūnas Nejus	70554640e5	Create Html class for cleaning up the html text Additionally, improve HTML pre-processing: * Ensure a new line between blocks of lyrics text from letras.mus.br. * Parse a missing last block of lyrics text from lacoccinelle.net. * Parse a missing last block of lyrics text from paroles.net. * Fix encoding issues with AZLyrics by setting response encoding to None, allowing `requests` to handle it.	2025-01-27 10:56:52 +00:00
Šarūnas Nejus	c5c4138d66	Google: Refactor and improve * Type the response data that the Google Custom Search API returns. * Exclude some 'letras.mus.br' pages that do not contain lyrics. * Exclude results from Musixmatch as we cannot access their pages. * Improve parsing of the URL title: - Handle long URL titles that get truncated (end with ellipsis) for long searches - Remove domains starting with 'www' - Parse the title AND the artist. Previously this would only parse the title, and fetch lyrics even when the artist did not match. * Remove now redundant credits cleanup and checks for valid lyrics.	2025-01-27 10:56:52 +00:00
Šarūnas Nejus	12c5eaae5e	Unite Genius, Tekstowo and Google backends under the same interface	2025-01-27 10:56:52 +00:00
Šarūnas Nejus	745c5eb9f0	Genius: refactor and simplify	2025-01-27 10:56:52 +00:00
Šarūnas Nejus	54fc67b30a	Remove extract_text_between	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	55b7824948	Replace custom unescape implementation by html.unescape	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	8a1ce27421	lyrics: Do not write item unless lyrics have changed	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	8bdc2c6cf0	lyrics: Add symbols for better visual feedback in the logs	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	f94d2767f9	Use a single slug implementation Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify' method which was a duplicate of 'slug'. Since 'GeniusFetchTest' only tested whether the artist name is cleaned up (the rest of the functionality is patched), remove it and move its test cases to the 'test_slug' test.	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	dd9f178fff	Do not try to strip cruft from the parsed lyrics text. Having removed it I found that only the Genius lyrics changed: it had an extra new line. Thus I defined a function 'collapse_newlines' which now gets called for the Genius lyrics.	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	7c2fb31136	Leave a single chef in the kitchen	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	cb29605bfd	Include class name in the log messages	2025-01-27 08:50:50 +00:00
Šarūnas Nejus	283c513c72	Centralise request error handling	2025-01-27 08:50:49 +00:00
Šarūnas Nejus	06eac79c0d	Centralize requests setup with requests.Session Improve requests performance with requests.Session which uses connection pooling for repeated requests to the same host. Additionally, this centralizes request configuration, making sure that we use the same timeout and provide beets user agent for all requests.	2025-01-27 08:50:49 +00:00
Šarūnas Nejus	c40db1034a	Make lyrics plugin documentation slightly more clear	2025-01-27 08:50:49 +00:00
Šarūnas Nejus	2ff57505d8	Apply dist_thresh to Genius and Google backends This commit introduces a distance threshold mechanism for the Genius and Google backends. - Create a new `SearchBackend` base class with a method `check_match` that performs checking. - Start using undocumented `dist_thresh` configuration option for good, and mention it in the docs. This controls the maximum allowable distance for matching artist and title names. These changes aim to improve the accuracy of lyrics matching, especially when there are slight variations in artist or title names, see #4791.	2025-01-27 08:50:48 +00:00
J0J0 Todos	80bc539705	Checkpoint to make sure config file is set up correctly added to the getting started guide #4820 ## Description Added a quick checkpoint to ensure the config file is set up correctly prior to users importing their music library. This was something I discovered later after running into an issue with my config file and hope it helps new users avoid the issues I had.	2025-01-24 14:05:46 +01:00
J0J0 Todos	6161b449f6	Apply config-sanity-check suggestion in docs	2025-01-24 11:43:10 +01:00
Ashleigh	1ae4677d93	Added a quick checkpoint to ensure the config file is set up correctly prior to users importing their music library	2025-01-24 11:43:10 +01:00
J0J0 Todos	346071c04a	Add missing changelog for #4982	2025-01-24 10:44:18 +01:00
J0J0 Todos	9682f24c07	Lastgenre: New config option `keep_existing` (#4982) ## New config option `keep_existing` (#4982) - Fix the behavior of the `force` option. Previously disabling the option had "incomplete" behaviour: - If content was found, a whitelist check was issued and if valid the plugin exited early and logged ("keep"). - This whitelist check was not aware of multiple genres (separated typically by a string like `, `), thus it failed, erased all existing genres and overwrote them with new ones. _This didn't feel like a typical behaviour of a `force` option, which this PR tries to improve as follows..._ - String-separated multi-genres are now compiled into a list and depending on the `whitelist` option are kept and enriched with freshly fetched last.fm genres. - If force is off, pre-populated tags are not touched. - A lot of refactoring was done, some absolutely required, some as a preparation for future work on the plugin. - The main processing function `_get_genre` was massively overhauled and got a new `pytest.mark.parametrize` test which includes many more test cases.	2025-01-23 09:58:27 +01:00
J0J0 Todos	9d4653f92f	Final lastgenre docstring nitpicks and a tiny docs fix.	2025-01-23 09:04:06 +01:00
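
Commit 5d96509cfe describes URL-encoding additional EXTM3U item attribute values while treating `:` and `/` as safe characters. A minimal sketch of that encoding rule with the standard library follows; the helper name `encode_attr` is hypothetical and is not taken from the smartplaylist plugin's source.

```python
from urllib.parse import quote


def encode_attr(value: object) -> str:
    """URL-encode an attribute value, leaving ':' and '/' unescaped.

    Hypothetical helper illustrating the rule from the commit message;
    the plugin's real function name and call site may differ.
    """
    return quote(str(value), safe=":/")


# Commas, spaces and quotation marks are percent-encoded, so they
# cannot collide with the separators used on an EXTINF line.
print(encode_attr('Hello, "World"'))
# Numeric IDs are unaffected, which is why the commit message calls
# the change backward-compatible for simple values.
print(encode_attr(42))
# URLs stay readable because ':' and '/' are declared safe.
print(encode_attr("http://example.com/a b"))
```

Because simple values such as numeric IDs contain no characters that need escaping, they encode to themselves under this scheme.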


12509 commits
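
Commits 2ff57505d8 and a1c0ebdeef describe validating each lyrics search result against a configured `dist_thresh`, the maximum tolerable mismatch between the result's artist and title and the item's. The sketch below only illustrates the idea: it substitutes `difflib.SequenceMatcher` for the string distance beets actually uses, and the function names, signature and threshold values are assumptions, not the `SearchBackend.check_match` API.

```python
from difflib import SequenceMatcher


def distance(a: str, b: str) -> float:
    """Return a distance in [0, 1]; 0 means identical (case-insensitive).

    Stand-in metric: beets has its own string distance, not shown here.
    """
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def check_match(
    want_artist: str,
    want_title: str,
    got_artist: str,
    got_title: str,
    dist_thresh: float,
) -> bool:
    """Accept a search result only if both artist and title are close enough.

    Hypothetical signature; the worse of the two distances must not
    exceed dist_thresh.
    """
    worst = max(
        distance(want_artist, got_artist),
        distance(want_title, got_title),
    )
    return worst <= dist_thresh


# Identical names (ignoring case) have distance 0 and always pass.
print(check_match("The Beatles", "Help!", "the beatles", "help!", 0.1))
# A completely different artist is rejected even when the title matches.
print(check_match("abc", "Help!", "xyz", "Help!", 0.1))
```

Taking the worse of the two distances reflects the problem described in the log: a matching title alone is not enough when the artist differs.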