Fabrice Laporte
d4d5c085fa
lyrics: remove empty divs before scraping
...
it may result in \n being inserted that we will strip in
_scrape_strip_cruft
2014-12-30 23:37:23 +01:00
Fabrice Laporte
802d1521ed
lyrics: don't throw when extraction fails
2014-12-21 14:38:19 +01:00
Fabrice Laporte
dfc1aa20b3
lyrics: musixmatch, disable https verification
2014-12-21 14:35:16 +01:00
Fabrice Laporte
39584a8b69
fix flake8
2014-12-19 00:19:59 +01:00
Fabrice Laporte
c3f82c65a4
fix lyrics.com extraction markers
2014-12-18 23:56:33 +01:00
Adrian Sampson
b3bf70c11d
Privatize global SOURCES in lyrics
2014-12-18 11:13:02 +00:00
Adrian Sampson
ac3f0824b0
Merge pull request #1148 from Kraymer/lyrics_musixmatch
...
lyrics: add 'musixmatch' source
Conflicts:
beetsplug/lyrics.py
2014-12-18 10:36:16 +00:00
Adrian Sampson
20db9bb1a6
lyrics: Connect force option to CLI (#1150)
2014-12-18 04:03:15 +00:00
Fabrice Laporte
544d6dbe47
lyrics: add 'force' option
2014-12-17 22:40:44 +01:00
Fabrice Laporte
474adffe63
move helper functions from utils to plugins
2014-12-17 22:00:00 +01:00
Fabrice Laporte
9d0ca15ace
lyrics: preserve default order of sources
2014-12-17 01:05:58 +01:00
Fabrice Laporte
e7a4b92de5
lyrics: add 'sources' option
2014-12-17 00:42:11 +01:00
Fabrice Laporte
0f2f43ca9b
lyrics: add musixmatch source
2014-12-17 00:41:21 +01:00
Adrian Sampson
a984c1aa44
Use a non-capturing group by default (#1060)
...
Now clients don't have to decide whether they need parentheses or not.
2014-12-16 11:37:40 +00:00
Fabrice Laporte
829b623665
remove capturing parentheses
2014-12-15 22:48:01 +01:00
Fabrice Laporte
b62f15d9d9
feat_tokens: change argument name, fix regex flag
2014-12-14 22:46:51 +01:00
Fabrice Laporte
91a998df3c
fix #1060
2014-12-13 23:34:50 +01:00
Adrian Sampson
165ee80f81
lyrics: Handle requests exceptions (#1136)
2014-12-11 16:03:49 -08:00
Fabrice Laporte
d31a7c6b28
remove str decoding as input sources are unicode
2014-12-11 00:14:43 +01:00
Fabrice Laporte
321f862f23
fix #1135
2014-12-09 23:37:42 +01:00
Alberto Leal
5883ee0b76
Default value for link title for page searches.
...
Google API may not return results with a title attribute.
2014-11-16 16:11:25 -05:00
Fabrice Laporte
b143ad7e3e
fix #1035 do scraping tests on mock data
...
don’t store scraped pages with licensed lyrics in repo
2014-11-06 22:10:15 +01:00
Adrian Sampson
9137b5c2f3
Fix another lyrics scraper regression (#1034)
...
Along with a test.
2014-10-24 20:08:32 -07:00
Adrian Sampson
0325fe2225
lyrics: Remove script tags (fix #1034)
2014-10-24 17:33:11 -07:00
Fabrice Laporte
a6f0649c40
return no lyrics when HtmlParseError occurred
2014-10-09 08:22:51 +02:00
Fabrice Laporte
c0c474b20f
lyrics: strip title excerpt before matching
...
improve the extraction of lyrics title from url title and increase the
matching threshold as a consequence.
2014-10-08 14:49:09 +02:00
Fabrice Laporte
3ef52e8ead
lyrics.py: remove unnecessary re compile step
2014-09-26 07:08:54 +02:00
Fabrice Laporte
a6a83be434
fix flake8
2014-09-24 23:30:38 +02:00
Fabrice Laporte
76b658b14a
use beautiful soup strainer for a x20 performance gain!
2014-09-24 18:04:16 +02:00
Fabrice Laporte
8ef7837d22
merge strip_cruft() and _scrape_normalize_eol() into _scrape_strip_cruft
2014-09-24 16:51:54 +02:00
Fabrice Laporte
333591fd78
no html entities in _scrape_streamline_soup output
2014-09-24 00:25:50 +02:00
Fabrice Laporte
91a7eb249c
add _scrape_merge_paragraphs lyrics scraping step + other scraping enhancements
2014-09-23 17:58:58 +02:00
Fabrice Laporte
a938e68c98
refactor scrape_lyrics_from_url into smaller functions
2014-09-23 13:21:31 +02:00
Fabrice L.
151ee87d8d
remove a log.info()
2014-09-22 17:32:15 +02:00
Fabrice Laporte
aea640d241
lyrics.py: fix regexes used by strip_cruft (make them case insensitive)
...
strip_cruft() should now correctly replace all <br> with \n thus making
insert_line_feeds() and sanitize_lyrics() functions superfluous (they have been
removed).
2014-09-22 17:20:25 +02:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
816e4fb152
clean up after rebase
2014-09-09 11:53:44 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
020ee2b1ed
Fix Travis errors
...
I was overzealous on the brackets for formatting
2014-09-09 11:31:43 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
65de93941d
flake8 cleanup
...
Cleanup after cleanup
2014-09-09 11:28:43 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
66aee8094f
Clean up of logging messages as described here
...
All logging now prefers the ' (single quote) over the " (double quote)
https://github.com/sampsyo/beets/wiki/Hacking
2014-09-09 11:28:43 +10:00
Adrian Sampson
c0ce8c3e54
Changelog for #927
2014-09-02 21:45:35 -07:00
Padraic O'Donoghue
5b57032981
Remove scripts from lyrics
2014-09-03 03:15:34 +01:00
Thomas Scholtes
b512a0ce37
lyrics: Use multiple lyrics search strings.
...
In particular we use the original artist and title before stripping
*and* and *featuring* suffixes.
Fixes #914.
2014-08-24 16:17:21 +02:00
Fabrice Laporte
74898347c0
is_page_candidate() handle more languages
...
Add translations in Spanish and German for tokens to
ignore when comparing a URL title with a song title.
2014-04-26 19:25:25 +02:00
Fabrice Laporte
8ba91a49c6
change chars replacements done by slugify()
...
adapt slugify() to make it return strings that can be used
as yaml keys (no spaces, etc.)
adapt is_page_candidate() accordingly
2014-04-26 19:22:40 +02:00
Fabrice Laporte
567e6300fd
fix flake8
2014-04-26 07:27:13 +02:00
Fabrice Laporte
117d16f2ad
lyrics: add tests to track which websites can be scraped by our algo and be
...
used as sources for the Google custom search engine.
2014-04-26 07:26:50 +02:00
Adrian Sampson
e5d28e2171
lyrics is flake8-clean
2014-04-12 13:32:46 -07:00
Adrian Sampson
7fcd7daf7c
lyrics: minor style/doc cleanup
2014-04-12 13:08:24 -07:00
Fabrice Laporte
42747797cd
better handling of songs with featured artists, song variants (live
...
etc), and songs combinations (lyrics are then appended)
2014-04-12 12:29:20 +02:00
Thomas Scholtes
c3ea1ded30
Add item.try_write() to log errors
...
Many commands and plugins use `item.write()` to update tags. Since the success
of the call is not critical to the functionality of most consumers we want to
catch any exceptions, log an error and continue with our task. The new method
encapsulates this logic.
This fixes #675.
2014-04-10 15:26:05 +02:00
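
The `item.try_write()` entry above describes a catch, log, and continue pattern. A minimal Python sketch of that idea follows. The `Item` class, the `WriteError` exception, and the logger name are hypothetical stand-ins for illustration, not the actual beets code.

```python
import logging

log = logging.getLogger("example")  # hypothetical logger name


class WriteError(Exception):
    """Hypothetical error raised when tags cannot be written."""


class Item:
    """Hypothetical stand-in for a library item backed by a file."""

    def __init__(self, path, fail=False):
        self.path = path
        self._fail = fail  # simulates a write failure for this demo

    def write(self):
        # Strict variant: raises when the tags cannot be written.
        if self._fail:
            raise WriteError("could not write tags to {}".format(self.path))

    def try_write(self):
        # Lenient variant: log the error and report failure instead of raising.
        try:
            self.write()
        except WriteError as exc:
            log.error("%s", exc)
            return False
        return True


# One failing item does not interrupt processing of the others.
items = [Item("a.mp3"), Item("b.mp3", fail=True), Item("c.mp3")]
results = [item.try_write() for item in items]
```

Callers that can tolerate a failed write use the lenient method and carry on, while callers that need to react to the failure can still call `write()` directly.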