Fix LyricsWiki scraping code

LyricsWiki now encodes song lyrics as HTML entities (presumably to deter
scraping), so we unescape the page before parsing it.
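To illustrate what unescaping recovers, Python's standard library resolves both numeric and named entities with `html.unescape` (Python 3.4+). This is a minimal sketch. The plugin calls its own `unescape` helper, which may handle a narrower set of entities, and the sample strings are made up rather than taken from a real LyricsWiki page.

```python
import html

# Made-up example of entity-encoded lyric text (assumed encoding).
encoded = "&#73;&#39;&#109; &#104;&#101;&#114;&#101;"
decoded = html.unescape(encoded)  # numeric entities -> characters
print(decoded)  # I'm here

# Named entities are resolved as well.
print(html.unescape("Rock &amp; Roll"))  # Rock & Roll
```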

LyricsWiki has also added a script tag inside the div we scrape, so we
now strip it out with `scrape_lyrics_from_html`.
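Removing an inline script from the scraped fragment can be sketched with a non-greedy regular expression. `strip_scripts` is a hypothetical helper used only for illustration. It is not `scrape_lyrics_from_html`, which may instead rely on a full HTML parser and perform additional cleanup.

```python
import re

# Hypothetical helper: drops <script ...>...</script> blocks,
# case-insensitively and across newlines (DOTALL).
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(fragment):
    """Return fragment with every script element removed."""
    return SCRIPT_RE.sub("", fragment)


sample = "Line one<br /><script type='text/javascript'>\nshowAd();\n</script>Line two"
print(strip_scripts(sample))  # Line one<br />Line two
```

The non-greedy `.*?` stops at the first closing tag, so two separate scripts in one fragment are removed individually instead of swallowing the lyrics between them.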
Jack Wilsdon 2016-03-17 17:47:50 +00:00
parent 8a0b18c960
commit 60148918d9


@@ -321,7 +321,8 @@ class LyricsWiki(SymbolsReplaced):
         html = self.fetch_url(url)
         if not html:
             return
-        lyrics = extract_text_in(html, u"<div class='lyricbox'>")
+        lyrics = extract_text_in(unescape(html), u"<div class='lyricbox'>")
+        lyrics = scrape_lyrics_from_html(lyrics)
         if lyrics and 'Unfortunately, we are not licensed' not in lyrics:
             return lyrics
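The patched flow (unescape the page, extract the `lyricbox` div, clean the fragment, then reject the licensing notice) can be sketched end to end with the standard library. `extract_div_contents`, `clean_lyrics` and `get_lyrics` are hypothetical stand-ins for `extract_text_in`, `scrape_lyrics_from_html` and the `LyricsWiki` method. The real helpers may behave differently, for example by using an HTML parser instead of regular expressions.

```python
import html
import re

# Hypothetical patterns; not taken from the plugin.
DIV_RE = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def extract_div_contents(page, start_tag):
    """Return the inner HTML of the div opened by start_tag, or None."""
    start = page.find(start_tag)
    if start == -1:
        return None
    body_start = start + len(start_tag)
    depth = 1  # already inside the opening div
    for match in DIV_RE.finditer(page, body_start):
        if match.group(1):  # closing </div>
            depth -= 1
            if depth == 0:
                return page[body_start:match.start()]
        else:  # nested <div ...>
            depth += 1
    return None  # no matching close: unbalanced markup


def clean_lyrics(fragment):
    """Drop scripts, turn <br> into newlines, strip remaining tags."""
    text = SCRIPT_RE.sub("", fragment)
    text = BR_RE.sub("\n", text)
    text = TAG_RE.sub("", text)
    return text.strip()


def get_lyrics(page):
    # Unescape first, as the patch does, so entity-encoded lyrics become text.
    fragment = extract_div_contents(html.unescape(page), "<div class='lyricbox'>")
    if fragment is None:
        return None
    lyrics = clean_lyrics(fragment)
    if lyrics and "Unfortunately, we are not licensed" not in lyrics:
        return lyrics
    return None


page = "<div class='lyricbox'><script>ad()</script>&#72;&#105;<br />there</div>"
print(get_lyrics(page))
```

Unescaping the whole page before extraction mirrors the patch. One possible trade-off is that a literal `&lt;div&gt;` inside the lyrics would become a real tag after unescaping and could confuse a balanced-div walk like the one above.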