lyrics: detect MusixMatch blocking

we just look for the bad string in the HTML. this has the downside
that songs which happen to contain those exact lyrics (you never
know, really) will trigger this warning as well, and we would fail to
fetch those songs.

we also fail if lyrics contain another magic string that seems to come
up when you do fill in the CAPTCHA after being blocked.
Antoine Beaupré 2017-07-17 12:21:55 -04:00
parent b303d5beb0
commit 5e8d17a4fc
GPG key ID: 792152527B75921E

@@ -301,9 +301,19 @@ class MusiXmatch(SymbolsReplaced):
         html = self.fetch_url(url)
         if not html:
             return
+        if "We detected that your IP is blocked" in html:
+            self._log.warning(u'we are blocked at MusixMatch: url %s failed'
+                              % url)
+            return
         html_part = html.split('<p class="mxm-lyrics__content')[-1]
         lyrics = extract_text_between(html_part, '>', '</p>')
-        return lyrics.strip(',"').replace('\\n', '\n')
+        lyrics = lyrics.strip(',"').replace('\\n', '\n')
+        # another odd case: sometimes only that string remains, for
+        # missing songs. this seems to happen after being blocked
+        # above, when filling in the CAPTCHA.
+        if "Instant lyrics for all your music." in lyrics:
+            return
+        return lyrics
 
 
 class Genius(Backend):
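
for illustration only, the two substring checks from this commit can be pulled out into a standalone function. this is a minimal sketch, not code from the repository: `classify_page` and the two constant names are hypothetical, and only the two marker strings are taken from the diff.

```python
# Hypothetical helper, not part of the plugin: it isolates the two
# substring checks so they can be exercised without fetching a page.
BLOCKED_MARKER = "We detected that your IP is blocked"
PLACEHOLDER_MARKER = "Instant lyrics for all your music."


def classify_page(html, lyrics):
    """Return 'blocked', 'placeholder' or 'ok' for a fetched page.

    `html` is the raw page source; `lyrics` is the text already
    extracted from it.
    """
    # The block notice is searched for in the whole page source.
    if BLOCKED_MARKER in html:
        return "blocked"
    # The placeholder is searched for in the extracted text only.
    if PLACEHOLDER_MARKER in lyrics:
        return "placeholder"
    return "ok"
```

because both checks are plain substring tests, a song whose real lyrics contain either marker is classified the same way as a blocked or placeholder page. that is the false positive described in the commit message.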