Francesco Grillo a79a86d5d6
Fix lyrics Unicode corruption and escaped quotes in Genius plugin
## Problem
The lyrics plugin has two bugs that corrupt fetched lyrics:

1. **Unicode corruption**: Characters like `ò`, `è`, `à` are corrupted to `√≤`, `√®`, etc.
2. **Escaped quotes**: Quotes appear as `\"` instead of `"` in lyrics

## Root Causes

### Issue 1: MacRoman encoding misdetection
- **Location**: `RequestHandler.fetch_text()` line 220
- **Cause**: Setting `r.encoding = None` forces requests to use `apparent_encoding`
- **Problem**: For Genius.com (and others), the charset detection behind `apparent_encoding` incorrectly guesses MacRoman instead of UTF-8
- **Result**: UTF-8 bytes `c3 b2` (ò) decoded as MacRoman produces "√≤" (U+221A U+2264)
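
The mis-decoding described above can be reproduced offline with Python's built-in `mac_roman` codec. This is only a sketch of the failure mode; the helper name `misdecode_as_macroman` is illustrative and not part of the plugin.

```python
def misdecode_as_macroman(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as MacRoman (the bug)."""
    return text.encode("utf-8").decode("mac_roman")


# "ò" is the two UTF-8 bytes c3 b2; MacRoman maps them to U+221A and U+2264.
corrupted = misdecode_as_macroman("ò")
print(corrupted, [f"U+{ord(ch):04X}" for ch in corrupted])

# The damage is reversible as long as no byte was lost along the way.
print(corrupted.encode("mac_roman").decode("utf-8"))
```

Because the round trip is lossless, the corruption comes purely from choosing the wrong codec at decode time, which is why fixing the declared encoding is sufficient.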

### Issue 2: Incomplete JSON unescape
- **Location**: `Genius.scrape()` line 576
- **Cause**: The `remove_backslash` regex doesn't handle all escape patterns in JSON
- **Problem**: Genius embeds lyrics in JSON with patterns like `\\"` and `\\\\"` 
- **Result**: After BeautifulSoup processing, escaped quotes remain in final text

## Solution

### Fix 1: Trust server encoding, fallback to UTF-8
```python
# OLD: r.encoding = None
# NEW:
if not r.encoding:
    r.encoding = 'utf-8'
```
- Respects server's declared encoding (UTF-8 for Genius)
- Falls back to UTF-8 if no encoding specified (safer than apparent_encoding)
- Preserves original intent of handling misconfigured servers
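
A minimal sketch of the fallback in isolation, assuming only that the response object exposes a writable `encoding` attribute as `requests.Response` does. `FakeResponse` and `ensure_encoding` are hypothetical names used for illustration, not code from the plugin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response; only `encoding` matters."""

    encoding: Optional[str] = None


def ensure_encoding(r: FakeResponse, default: str = "utf-8") -> FakeResponse:
    """Keep a declared encoding; fall back to `default` only when none is set."""
    if not r.encoding:
        r.encoding = default
    return r


# A declared encoding is left untouched.
print(ensure_encoding(FakeResponse("ISO-8859-1")).encoding)
# A missing encoding falls back to UTF-8 instead of being guessed.
print(ensure_encoding(FakeResponse(None)).encoding)
```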

### Fix 2: Iteratively clean escaped quotes
```python
while '\\"' in lyrics:
    lyrics = lyrics.replace('\\"', '"')
```
- Handles variable escape levels (`\"`, `\\\"`, `\\\\\"`)
- Minimal change - keeps original `remove_backslash` regex
- Applied after BeautifulSoup to avoid interfering with HTML parsing
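
The loop can be exercised on its own to see how it copes with different escape depths. In this sketch the wrapper name `unescape_quotes` and the sample strings are illustrative assumptions; only the loop body comes from the fix above.

```python
def unescape_quotes(lyrics: str) -> str:
    """Strip one backslash before a double quote per pass until none remain."""
    while '\\"' in lyrics:
        lyrics = lyrics.replace('\\"', '"')
    return lyrics


# Raw strings make the number of literal backslashes explicit:
# one, two, and three backslashes before each quote.
for sample in (r'\"hi\"', r'\\"hi\\"', r'\\\"hi\\\"'):
    print(sample, "->", unescape_quotes(sample))
```

Each pass removes exactly one backslash in front of a quote, so the loop ends after as many passes as the deepest escape level. Backslashes that are not followed by a quote are left alone.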

## Testing

Tested with:
- Caparezza - "Argenti Vive" (Italian, many accented characters)
- WestsideGunn - "Heel Cena" (escaped quotes in lyrics)

Before:
```
mi si par√≤ davanti
\\"I got big moves\\"
```

After:
```
mi si parò davanti
"I got big moves"
```

## Impact
- Fixes lyrics for all languages with non-ASCII characters
- Fixes Genius lyrics with quotes
- No breaking changes - maintains backward compatibility
- Minimal code changes (14 lines total)
2025-12-23 22:31:21 +02:00
.github Make musicbrainzngs dependency optional and requests required 2025-12-20 01:35:52 +00:00
beets fix: Sanitize log messages by removing control characters 2025-12-02 15:27:24 +05:00
beetsplug Fix lyrics Unicode corruption and escaped quotes in Genius plugin 2025-12-23 22:31:21 +02:00
docs Merge branch 'master' into gabepush-test-fix 2025-12-23 15:34:47 +01:00
extra pyupgrade Python 3.10 2025-11-08 12:09:52 +00:00
test Merge branch 'master' into gabepush-test-fix 2025-12-23 15:34:47 +01:00
.git-blame-ignore-revs Add missing blame ignore revs from musicbrainz plugin 2025-12-20 01:35:51 +00:00
.gitignore fix transaction context manager signature 2025-10-19 15:07:17 +02:00
.pre-commit-config.yaml Configure docstrfmt 2025-08-10 16:25:04 +01:00
.readthedocs.yaml Fix path typo 2023-09-22 15:29:39 -04:00
CODE_OF_CONDUCT.rst Reformat all docs using docstrfmt 2025-08-10 16:25:05 +01:00
codecov.yml Make cov setup a bit more useful and upgrade cov upload action 2025-08-09 15:11:59 +01:00
CONTRIBUTING.rst Update python version references 2025-11-08 12:09:52 +00:00
LICENSE Update copyright dates to 2016 2015-12-30 15:42:06 +00:00
poetry.lock Make musicbrainzngs dependency optional and requests required 2025-12-20 01:35:52 +00:00
pyproject.toml Make musicbrainzngs dependency optional and requests required 2025-12-20 01:35:52 +00:00
README.rst docs: Fix link to plugin development docs 2025-12-02 11:40:18 +01:00
README_kr.rst docs: Fix link to plugin development docs 2025-12-02 11:40:18 +01:00
SECURITY.md Create security policy 2021-12-22 09:34:41 -08:00
setup.cfg Upload test results to codecov 2025-08-09 15:27:17 +01:00

.. image:: https://img.shields.io/pypi/v/beets.svg
    :target: https://pypi.python.org/pypi/beets

.. image:: https://img.shields.io/codecov/c/github/beetbox/beets.svg
    :target: https://codecov.io/github/beetbox/beets

.. image:: https://img.shields.io/github/actions/workflow/status/beetbox/beets/ci.yaml
    :target: https://github.com/beetbox/beets/actions

.. image:: https://repology.org/badge/tiny-repos/beets.svg
    :target: https://repology.org/project/beets/versions

beets
=====

Beets is the media library management system for obsessive music geeks.

The purpose of beets is to get your music collection right once and for all. It
catalogs your collection, automatically improving its metadata as it goes. It
then provides a suite of tools for manipulating and accessing your music.

Here's an example of beets' brainy tag corrector doing its thing:

::

    $ beet import ~/music/ladytron
    Tagging:
        Ladytron - Witching Hour
    (Similarity: 98.4%)
     * Last One Standing      -> The Last One Standing
     * Beauty                 -> Beauty*2
     * White Light Generation -> Whitelightgenerator
     * All the Way            -> All the Way...

Because beets is designed as a library, it can do almost anything you can
imagine for your music collection. Via plugins_, beets becomes a panacea:

- Fetch or calculate all the metadata you could possibly need: `album art`_,
  lyrics_, genres_, tempos_, ReplayGain_ levels, or `acoustic fingerprints`_.
- Get metadata from MusicBrainz_, Discogs_, and Beatport_. Or guess metadata
  using songs' filenames or their acoustic fingerprints.
- `Transcode audio`_ to any format you like.
- Check your library for `duplicate tracks and albums`_ or for `albums that are
  missing tracks`_.
- Clean up crufty tags left behind by other, less-awesome tools.
- Embed and extract album art from files' metadata.
- Browse your music library graphically through a Web browser and play it in any
  browser that supports `HTML5 Audio`_.
- Analyze music files' metadata from the command line.
- Listen to your library with a music player that speaks the MPD_ protocol and
  works with a staggering variety of interfaces.

If beets doesn't do what you want yet, `writing your own plugin`_ is shockingly
simple if you know a little Python.

.. _acoustic fingerprints: https://beets.readthedocs.org/page/plugins/chroma.html

.. _album art: https://beets.readthedocs.org/page/plugins/fetchart.html

.. _albums that are missing tracks: https://beets.readthedocs.org/page/plugins/missing.html

.. _beatport: https://www.beatport.com

.. _discogs: https://www.discogs.com/

.. _duplicate tracks and albums: https://beets.readthedocs.org/page/plugins/duplicates.html

.. _genres: https://beets.readthedocs.org/page/plugins/lastgenre.html

.. _html5 audio: https://html.spec.whatwg.org/multipage/media.html#the-audio-element

.. _lyrics: https://beets.readthedocs.org/page/plugins/lyrics.html

.. _mpd: https://www.musicpd.org/

.. _musicbrainz: https://musicbrainz.org/

.. _musicbrainz music collection: https://musicbrainz.org/doc/Collections/

.. _plugins: https://beets.readthedocs.org/page/plugins/

.. _replaygain: https://beets.readthedocs.org/page/plugins/replaygain.html

.. _tempos: https://beets.readthedocs.org/page/plugins/acousticbrainz.html

.. _transcode audio: https://beets.readthedocs.org/page/plugins/convert.html

.. _writing your own plugin: https://beets.readthedocs.org/page/dev/plugins/index.html

Install
-------

You can install beets by typing ``pip install beets`` or directly from GitHub
(see details here_). Beets has also been packaged in the `software
repositories`_ of several distributions. Check out the `Getting Started`_ guide
for more information.

.. _getting started: https://beets.readthedocs.org/page/guides/main.html

.. _here: https://beets.readthedocs.io/en/latest/faq.html#run-the-latest-source-version-of-beets

.. _software repositories: https://repology.org/project/beets/versions

Contribute
----------

Thank you for considering contributing to ``beets``! Whether you're a programmer
or not, you should be able to find all the info you need at CONTRIBUTING.rst_.

.. _contributing.rst: https://github.com/beetbox/beets/blob/master/CONTRIBUTING.rst

Read More
---------

Learn more about beets at `its Web site`_. Follow `@b33ts`_ on Mastodon for news
and updates.

.. _@b33ts: https://fosstodon.org/@beets

.. _its web site: https://beets.io/

Contact
-------

- Encountered a bug you'd like to report? Check out our `issue tracker`_!

  - If your issue hasn't already been reported, please `open a new ticket`_ and
    we'll be in touch with you shortly.
  - If you'd like to vote on a feature/bug, simply give a :+1: on issues you'd
    like to see prioritized over others.
  - Need help/support, would like to start a discussion, have an idea for a new
    feature, or would just like to introduce yourself to the team? Check out
    `GitHub Discussions`_!

.. _github discussions: https://github.com/beetbox/beets/discussions

.. _issue tracker: https://github.com/beetbox/beets/issues

.. _open a new ticket: https://github.com/beetbox/beets/issues/new/choose

Authors
-------

Beets is by `Adrian Sampson`_ with a supporting cast of thousands.

.. _adrian sampson: https://www.cs.cornell.edu/~asampson/