180 commits

Author SHA1 Message Date
Adrian Sampson
d389ac15e1 Use HTTPS for MS translator API (from #2247) 2017-01-02 21:00:01 -05:00
Adrian Sampson
fbc0f322f6 Merge branch 'tigranl-https_fix' 2017-01-02 20:54:17 -05:00
Adrian Sampson
f941fd42de Always use SSL on servers that don't require SNI
I did a little audit using the `openssl` command-line tool to find the servers
that don't require SNI. Here's what I found:

musicbrainz.org: SNI
images.weserv.nl: inconclusive, but docs say yes SNI
coverartarchive.org: SNI
webservice.fanart.tv: *no* SNI
dbpedia.org: *no* SNI
en.wikipedia.org: *no* SNI
ws.audioscrobbler.com: *no* SNI
api.microsofttranslator.com: *no* SNI

In summary, *only* MusicBrainz and CoverArtArchive were found to require SNI.
So I'm using SSL unconditionally on all the other sites.
2017-01-02 20:39:10 -05:00
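
The policy described in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `SNI_REQUIRED`, `scheme_for`, and `base_url` are hypothetical, and falling back to plain HTTP when the client lacks SNI is an assumed behavior, not something the commit message states. The host list comes from the audit above; `ssl.HAS_SNI` is the standard-library flag for client-side SNI support.

```python
import ssl

# Hosts the audit found to require SNI (per the commit message summary).
SNI_REQUIRED = {"musicbrainz.org", "coverartarchive.org"}


def scheme_for(host, client_supports_sni=ssl.HAS_SNI):
    """Pick a URL scheme for `host`.

    Assumption: hosts that need SNI fall back to plain HTTP when the
    local TLS stack cannot send it; every other host always gets HTTPS.
    """
    if host in SNI_REQUIRED and not client_supports_sni:
        return "http"
    return "https"


def base_url(host, client_supports_sni=ssl.HAS_SNI):
    """Build a base URL such as 'https://dbpedia.org/'."""
    return "{}://{}/".format(scheme_for(host, client_supports_sni), host)
```

With a client that lacks SNI, `base_url("dbpedia.org", False)` still yields an HTTPS URL, while `base_url("musicbrainz.org", False)` does not.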
Adrian Sampson
8bb24e3134 lyrics: Set User-Agent header (fix #2357) 2016-12-30 10:55:24 -05:00
tigranl
dd115b1310 Add ui import 2016-12-11 00:35:51 +03:00
tigranl
5ca664e4aa Fix typos 2016-12-11 00:25:37 +03:00
tigranl
6ba5099034 Python version check for lyrics.py 2016-12-06 16:17:25 +03:00
Adrian Sampson
62e9a15f4d Fix a copy n' paste error found by flake8 2016-11-16 12:03:07 -05:00
Fabrice Laporte
7226624405 replace strip_part() by generate_alternatives()
Delegate the update of titles and artists lists to the helper
generate_alternatives() function.
2016-09-25 19:37:14 +02:00
Fabrice Laporte
e2703b9a7c always yield item artist and title first
Rather than using an unordered set for storing pairs, append to a list
and build an OrderedDict from it to filter duplicated strings while
keeping order.
2016-09-25 15:46:22 +02:00
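
The order-preserving deduplication this commit mentions might look something like the sketch below. The function name `unique_in_order` is hypothetical and the plugin's real code may be structured differently.

```python
from collections import OrderedDict


def unique_in_order(pairs):
    """Drop duplicate items while keeping first-seen order.

    OrderedDict.fromkeys() keeps one key per distinct item, in
    insertion order, which a plain set would not guarantee.
    """
    return list(OrderedDict.fromkeys(pairs))
```

Because the item's own artist and title are appended first, they remain first in the result even when later alternatives repeat them.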
Fabrice Laporte
8b4f39da42 lyrics: search for song title part preceding colon. fix #2205 2016-09-23 22:23:32 +02:00
Fabrice Laporte
4b702b338e lyrics: reduce code duplication in search_pairs() 2016-09-23 22:21:00 +02:00
Johnny Robeson
7a2bdf502f s/utf8/utf-8/ in all encoding/decoding contexts
This matches up with the python documentation.
2016-09-06 23:10:24 -04:00
Johnny Robeson
fcbfce3984 replace deprecated log.warn() with log.warning() 2016-08-09 00:33:38 -04:00
Johnny Robeson
be08d4b129 replace unichr with six.unichr in lyrics plugin 2016-07-02 02:36:05 -04:00
Adrian Sampson
5efd5b21c5 Use new as_str method
Instead of `get(six.text_type)`, which was a surprisingly large portion of our
uses of six.
2016-06-25 19:16:14 -07:00
Adrian Sampson
e16cc58cb9 Walk back some six.iter* uses
In places where it doesn't much matter whether we use an iterator or the old
Python 2 list way, using the six name just hurts legibility.
2016-06-25 18:29:55 -07:00
Johnny Robeson
78334876c3 treat HTMLParseError as a noop when missing
Strict mode no longer exists in html.parser on python >= 3.5, and no longer means anything on python >= 3.3
2016-06-24 05:53:56 -04:00
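
One way to treat a missing `HTMLParseError` as a no-op is sketched below. It uses only the Python 3 standard library rather than `six.moves`, and the helper names `_TextCollector` and `safe_text` are hypothetical; the actual change may differ.

```python
from html.parser import HTMLParser

try:
    # Removed from html.parser in Python 3.5.
    from html.parser import HTMLParseError
except ImportError:
    class HTMLParseError(Exception):
        """Placeholder so `except HTMLParseError` stays valid.

        The modern parser never raises it, so the handler is a no-op.
        """


class _TextCollector(HTMLParser):
    """Collect the text nodes of a document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def safe_text(html):
    """Return the text content of `html`, or None on a parse error."""
    parser = _TextCollector()
    try:
        parser.feed(html)
        parser.close()
    except HTMLParseError:
        return None
    return "".join(parser.chunks)
```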
Johnny Robeson
edb1cbc5fc replace iter{items|values} with six.iter{items|values} 2016-06-24 05:53:55 -04:00
Johnny Robeson
e8afcbe7ec replace unicode with six.text_type 2016-06-24 05:53:49 -04:00
Johnny Robeson
4649226b9b use urllib from six.moves 2016-06-23 04:40:18 -04:00
Johnny Robeson
129e140015 use html_parser (really html.parser) from six.moves 2016-06-23 04:40:18 -04:00
Johnny Robeson
8fa71f78fe decode bytes from .encode() in lyrics plugin 2016-06-14 00:44:43 -04:00
Adrian Sampson
0051bdb506 lyrics: Avoid a spurious warning 2016-06-02 21:33:33 -07:00
Adrian Sampson
581fba6288 lyrics: Avoid crash when enabling google
If you *both* haven't set an API key *and* BeautifulSoup wasn't
installed, the list.remove() call would crash. (This came up when
running the tests on a fresh machine without many dependencies.)
2016-06-02 11:58:14 -07:00
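
The crash comes from calling `list.remove()` for an item that is already gone, which raises `ValueError`. A guarded removal could look like the sketch below; `drop_source` is a hypothetical helper, and the commit itself may have fixed the problem another way.

```python
def drop_source(sources, name):
    """Remove `name` from `sources` if present; do nothing otherwise.

    A bare sources.remove(name) raises ValueError when an earlier
    check has already removed the entry.
    """
    if name in sources:
        sources.remove(name)
    return sources
```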
wordofglass
1dd6739218 lyrics: fix a bug where the lyricswiki fetcher would try to unescape an empty (None) response and crash 2016-04-30 01:25:02 +02:00
wordofglass
c3c7da8061 lyrics: simplify source handling a little 2016-04-28 18:31:22 +02:00
wordofglass
2928a16bd5 lyrics: actually disable translation when there's no langdetect 2016-04-28 17:22:55 +02:00
wordofglass
c4b11f889f lyrics: clean up import handling and source removal 2016-04-28 17:15:25 +02:00
Jack Wilsdon
c5e2334fb5 Remove useless unescape
Remove useless unescape as `_scrape_strip_cruft` does it for us.
2016-04-25 19:24:26 +01:00
Jack Wilsdon
1be9c3003e Use different method to remove junk from LyricsWiki
Use `_scrape_strip_cruft` instead of `scrape_lyrics_from_html` so that
LyricsWiki does not depend on Beautiful Soup.
2016-04-25 19:14:30 +01:00
wordofglass
607f41be43 Fix the previous fix... 2016-04-24 00:42:31 +02:00
wordofglass
4a5b886944 Fix two non-guarded import statements in the lyrics plugin
These could make the import process crash with a traceback.
2016-04-24 00:35:15 +02:00
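
A guarded import keeps a missing optional dependency from aborting plugin loading. The sketch below shows the general pattern only; `optional_import` is a hypothetical helper, and the commit message does not say which modules were affected.

```python
import importlib


def optional_import(name):
    """Import module `name`, returning None if it is not installed.

    Callers check for None and disable the dependent feature instead
    of letting ImportError propagate as a traceback.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```

A caller could then write `bs4 = optional_import("bs4")` and skip the sources that need it when the result is `None`.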
Guilherme Danno
bf1b06f0c7 don't print entire lyrics during import 2016-04-22 17:30:06 -03:00
Fabrice Laporte
05970e8a93 re-query token when it has expired 2016-04-14 22:57:41 +02:00
Fabrice Laporte
56d7e5dfa0 send as little text as possible to bing api
The Bing API has a limit of 2M chars/month. Lyrics often repeat
sentences, so to reduce the number of chars sent per song, store the
sentences in a set and send that instead of the whole lyrics.
2016-04-14 22:57:17 +02:00
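
The saving can be illustrated with a small sketch. It assumes that "sentences" means non-empty lines of the lyrics, which the commit message does not spell out, and `unique_lines` and `chars_to_send` are hypothetical names.

```python
def unique_lines(lyrics):
    """Return the set of distinct, non-empty, stripped lines."""
    return {line.strip() for line in lyrics.splitlines() if line.strip()}


def chars_to_send(lyrics):
    """Count the characters submitted when each line is sent once."""
    return sum(len(line) for line in unique_lines(lyrics))
```

For a chorus repeated many times, `chars_to_send` counts it once, so the monthly quota is consumed more slowly than when the full text is sent.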
Fabrice Laporte
6cfc106b8a better docs and debug msg 2016-04-14 08:31:55 +02:00
Fabrice Laporte
58df77e2cb langdetect conditional import 2016-04-14 08:31:14 +02:00
Fabrice Laporte
e03c3af91f don't translate lyrics already in the target language 2016-04-14 01:11:14 +02:00
Fabrice Laporte
66a627fed8 restore module docstring 2016-04-14 00:58:42 +02:00
Fabrice Laporte
3c2479ab49 translate lyrics using Bing API
By subscribing to the Microsoft Translator API, one can now activate the
translation of lyrics from a set of source languages to a target
language.
Translations are appended to each original sentence using '/' as
separator.
2016-04-14 00:53:58 +02:00
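
Appending translations to the original lines might be sketched as follows. The helper name `append_translations`, the dictionary lookup, and the spaces around the separator are assumptions made for illustration; only the use of '/' comes from the commit message.

```python
def append_translations(lyrics, translations, sep=" / "):
    """Append each line's translation after the original line.

    `translations` maps an original (stripped) line to its translated
    text; lines without an entry are left unchanged.
    """
    out = []
    for line in lyrics.splitlines():
        translated = translations.get(line.strip())
        if translated:
            out.append(line + sep + translated)
        else:
            out.append(line)
    return "\n".join(out)
```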
Fabrice Laporte
d67950cdcc pep8 2016-04-14 00:45:55 +02:00
Adrian Sampson
d1753b341e lyrics: Some comments and better naming 2016-03-21 10:28:30 -07:00
Adrian Sampson
f684f29a25 lyrics: Tolerate pages without text (fix #1914) 2016-03-21 10:24:13 -07:00
Adrian Sampson
c9be5bc7d1 Merge pull request #1911 from jackwilsdon/fix-musixmatch-url
Fix MusixMatch issues
2016-03-18 11:51:27 -04:00
Jack Wilsdon
60148918d9 Fix LyricsWiki scraping code
LyricsWiki now escapes song lyrics using HTML entities (presumably to
prevent scraping), so we now unescape these before parsing.

LyricsWiki has also added a script tag inside the div we are scraping,
so we have to remove this using `scrape_lyrics_from_html`.
2016-03-17 17:49:41 +00:00
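
The two steps described here, unescaping HTML entities and dropping the embedded script tag, can be sketched with the standard library alone. This is not the plugin's `scrape_lyrics_from_html`; `clean_fragment` and `SCRIPT_RE` are hypothetical names, and a regular expression is a simplification that a real HTML parser would handle more robustly.

```python
import html
import re

# Matches a <script> element together with its contents.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.DOTALL | re.IGNORECASE)


def clean_fragment(fragment):
    """Decode HTML entities, then strip any <script> elements."""
    text = html.unescape(fragment)
    return SCRIPT_RE.sub("", text)
```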
Jack Wilsdon
c417003184 Add missing newline 2016-03-16 21:07:28 +00:00
Jack Wilsdon
1ec06e14c5 Fix lyrics extraction from MusiXmatch
Remove "lyrics_" prefix from extract_text_between arguments to reflect
changes made to the MusiXmatch website.
2016-03-16 20:48:57 +00:00
Jack Wilsdon
44c799320f Improve URL generation in lyrics plugin
Allow custom replacements to be defined in subclasses of
SymbolsReplaced.

Replace spaces with hyphens when the source is MusiXmatch, instead of
(incorrectly) using underscores. This fixes #1880.
2016-03-16 20:46:36 +00:00
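
Subclass-specific replacements could be structured roughly like the sketch below. Only the class name `SymbolsReplaced` comes from the commit message; the `REPLACEMENTS` attribute, the `encode` method, and the `HyphenatedSpaces` subclass are hypothetical and may not match the plugin's actual code.

```python
import re


class SymbolsReplaced(object):
    """Base class: replace whitespace runs with underscores (assumed default)."""

    # Maps a regular expression to its replacement string.
    REPLACEMENTS = {r"\s+": "_"}

    @classmethod
    def encode(cls, text):
        """Apply every replacement defined on the class to `text`."""
        for pattern, repl in cls.REPLACEMENTS.items():
            text = re.sub(pattern, repl, text)
        return text


class HyphenatedSpaces(SymbolsReplaced):
    """Hypothetical subclass for a site that expects hyphens in URLs."""

    REPLACEMENTS = dict(SymbolsReplaced.REPLACEMENTS)
    REPLACEMENTS[r"\s+"] = "-"
```

Keeping the table as a class attribute lets each source override a single entry without reimplementing the encoding loop.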
Adrian Sampson
e54c7eec3d Standardize __future__ imports without parentheses
Since the list is short enough now, we don't need parentheses for the line
wrap. This is a little less ugly.
2016-02-28 15:03:51 -08:00