mirror of https://github.com/beetbox/beets.git (synced 2026-01-06 16:02:53 +01:00)
Fix LyricsWiki scraping code
LyricsWiki now escapes song lyrics using HTML entities (presumably to prevent scraping), so we now unescape these before parsing. LyricsWiki has also added a script tag inside the div we are scraping, so we have to remove this using `scrape_lyrics_from_html`.
parent 8a0b18c960
commit 60148918d9
1 changed file with 2 additions and 1 deletion
@@ -321,7 +321,8 @@ class LyricsWiki(SymbolsReplaced):
         html = self.fetch_url(url)
         if not html:
             return
-        lyrics = extract_text_in(html, u"<div class='lyricbox'>")
+        lyrics = extract_text_in(unescape(html), u"<div class='lyricbox'>")
+        lyrics = scrape_lyrics_from_html(lyrics)
         if lyrics and 'Unfortunately, we are not licensed' not in lyrics:
             return lyrics
 
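
The two steps in the commit message (unescape the HTML entities, then strip the embedded script tag) can be illustrated in isolation. This is a minimal sketch, not the beets implementation: `clean_lyricbox` and the regular expressions are hypothetical stand-ins for `extract_text_in` and `scrape_lyrics_from_html`, whose bodies are not shown in this diff, and it assumes Python 3's `html.unescape` rather than whatever `unescape` the plugin imports.

```python
import html
import re

# Hypothetical patterns for illustration only; the real plugin's helpers
# (extract_text_in, scrape_lyrics_from_html) are not reproduced here.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>", re.DOTALL | re.IGNORECASE)
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def clean_lyricbox(fragment):
    """Decode entity-escaped lyrics, then drop scripts and leftover tags."""
    text = html.unescape(fragment)   # e.g. "&#72;" becomes "H"
    text = SCRIPT_RE.sub("", text)   # remove the injected <script> element
    text = BR_RE.sub("\n", text)     # keep line breaks as newlines
    text = TAG_RE.sub("", text)      # strip any remaining markup
    return text.strip()


sample = "<script>track();</script>&#72;&#101;&#108;&#108;&#111;<br />world"
print(clean_lyricbox(sample))
```

Unescaping first, as the commit does, means the later text extraction sees ordinary characters instead of numeric entities. One trade-off of this order is that any escaped angle brackets in the lyrics would be decoded and then treated as markup.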