Fabrice Laporte
d31a7c6b28
remove str decoding as input sources are unicode
2014-12-11 00:14:43 +01:00
Fabrice Laporte
321f862f23
fix #1135
2014-12-09 23:37:42 +01:00
Alberto Leal
5883ee0b76
Default value for link title for page searches.
...
Google API may not return results with a title attribute.
2014-11-16 16:11:25 -05:00
Fabrice Laporte
b143ad7e3e
fix #1035 do scraping tests on mock data
...
don’t store scraped pages with licensed lyrics in repo
2014-11-06 22:10:15 +01:00
Adrian Sampson
9137b5c2f3
Fix another lyrics scraper regression ( #1034 )
...
Along with a test.
2014-10-24 20:08:32 -07:00
Adrian Sampson
0325fe2225
lyrics: Remove script tags ( fix #1034 )
2014-10-24 17:33:11 -07:00
Fabrice Laporte
a6f0649c40
return no lyrics when HtmlParseError occurred
2014-10-09 08:22:51 +02:00
Fabrice Laporte
c0c474b20f
lyrics: strip title excerpt before matching
...
improve the extraction of lyrics title from url title and increase the
matching threshold as a consequence.
2014-10-08 14:49:09 +02:00
Fabrice Laporte
3ef52e8ead
lyrics.py: remove unnecessary re compile step
2014-09-26 07:08:54 +02:00
Fabrice Laporte
a6a83be434
fix flake8
2014-09-24 23:30:38 +02:00
Fabrice Laporte
76b658b14a
use beautiful soup strainer for a 20x performance gain!
2014-09-24 18:04:16 +02:00
Fabrice Laporte
8ef7837d22
merge strip_cruft() and _scrape_normalize_eol() into _scrape_strip_cruft
2014-09-24 16:51:54 +02:00
Fabrice Laporte
333591fd78
no html entities in _scrape_streamline_soup output
2014-09-24 00:25:50 +02:00
Fabrice Laporte
91a7eb249c
add _scrape_merge_paragraphs lyrics scraping step + other scraping enhancements
2014-09-23 17:58:58 +02:00
Fabrice Laporte
a938e68c98
refactor scrape_lyrics_from_url into smaller functions
2014-09-23 13:21:31 +02:00
Fabrice L.
151ee87d8d
remove a log.info()
2014-09-22 17:32:15 +02:00
Fabrice Laporte
aea640d241
lyrics.py: fix regexes used by strip_cruft (make them case insensitive)
...
strip_cruft() should now correctly replace all <br> with \n thus making
insert_line_feeds() and sanitize_lyrics() functions superfluous (they have been
removed).
2014-09-22 17:20:25 +02:00
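The case-insensitive `<br>` replacement described in the commit above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the name `replace_br` and the exact pattern are assumptions.

```python
import re

# Hypothetical pattern: matches <br>, <BR>, <br/> and <br /> in any letter case.
BR_RE = re.compile(r'<br\s*/?>', re.IGNORECASE)


def replace_br(html):
    """Replace every <br> variant with a newline (illustrative helper)."""
    return BR_RE.sub('\n', html)
```

Without `re.IGNORECASE`, upper-case or mixed-case tags such as `<BR>` would survive and need a later clean-up pass, which is presumably why the extra functions could be dropped once the flag was added.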
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
816e4fb152
clean up after rebase
2014-09-09 11:53:44 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
020ee2b1ed
Fix Travis errors
...
I was overzealous on the brackets for formatting
2014-09-09 11:31:43 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
65de93941d
flake8 cleanup
...
Cleanup after cleanup
2014-09-09 11:28:43 +10:00
e5e4eaeacd39c5cfba4d7c852c48277ae50331e6
66aee8094f
Clean up of logging messages as described here
...
All logging now prefers the ' (single quote) over the " (double quote)
https://github.com/sampsyo/beets/wiki/Hacking
2014-09-09 11:28:43 +10:00
Adrian Sampson
c0ce8c3e54
Changelog for #927
2014-09-02 21:45:35 -07:00
Padraic O'Donoghue
5b57032981
Remove scripts from lyrics
2014-09-03 03:15:34 +01:00
Thomas Scholtes
b512a0ce37
lyrics: Use multiple lyrics search strings.
...
In particular we use the original artist and title before stripping
*and* and *featuring* suffixes.
Fixes #914 .
2014-08-24 16:17:21 +02:00
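One way to read the commit above: try the original string first, then a variant with the *and* / *featuring* suffix removed. The sketch below assumes a hypothetical helper `search_strings` and an illustrative suffix pattern; neither is taken from the plugin.

```python
import re

# Assumed pattern, for illustration only: a "feat.", "featuring", "ft.",
# "and" or "&" separator followed by the rest of the string.
SUFFIX_RE = re.compile(r'\s+(?:feat\.?|featuring|ft\.?|and|&)\s+.*$',
                       re.IGNORECASE)


def search_strings(text):
    """Return the original string, then a suffix-stripped variant if it
    differs from the original."""
    candidates = [text]
    stripped = SUFFIX_RE.sub('', text).strip()
    if stripped and stripped != text:
        candidates.append(stripped)
    return candidates
```

Keeping the original first means artists whose real name contains "and" are still searched under their full name before the shortened form is tried.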
Fabrice Laporte
74898347c0
is_page_candidate() handle more languages
...
Add translations in Spanish and German for tokens to
ignore when comparing a URL title with a song title.
2014-04-26 19:25:25 +02:00
Fabrice Laporte
8ba91a49c6
change chars replacements done by slugify()
...
adapt slugify() to make it return strings that can be used
as yaml keys (no spaces, etc.)
adapt is_page_candidate() accordingly
2014-04-26 19:22:40 +02:00
Fabrice Laporte
567e6300fd
fix flake8
2014-04-26 07:27:13 +02:00
Fabrice Laporte
117d16f2ad
lyrics: add tests to track which websites can be scraped by our algo and be
...
used as sources for the google custom search engine.
2014-04-26 07:26:50 +02:00
Adrian Sampson
e5d28e2171
lyrics is flake8-clean
2014-04-12 13:32:46 -07:00
Adrian Sampson
7fcd7daf7c
lyrics: minor style/doc cleanup
2014-04-12 13:08:24 -07:00
Fabrice Laporte
42747797cd
better handling of songs with featured artists, variant versions of songs (live,
...
etc.), and song combinations (lyrics are then appended)
2014-04-12 12:29:20 +02:00
Thomas Scholtes
c3ea1ded30
Add item.try_write() to log errors
...
Many commands and plugins use `item.write()` to update tags. Since the success
of the call is not critical to the functionality of most consumers we want to
catch any exceptions, log an error and continue with our task. The new method
encapsulates this logic.
This fixes #675 .
2014-04-10 15:26:05 +02:00
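The "catch, log, continue" logic described in the commit above can be sketched as follows. The `Item` class here is a stand-in, not the real beets class, and returning a boolean from `try_write()` is an assumption made for the example.

```python
import logging

log = logging.getLogger('example')


class Item(object):
    """Stand-in for a library item; not the real beets class."""

    def __init__(self, path, fail=False):
        self.path = path
        self.fail = fail

    def write(self):
        # Simulated tag write that may raise.
        if self.fail:
            raise IOError('could not write {0}'.format(self.path))

    def try_write(self):
        """Call write(), log any error, and report success as a bool
        (the return value is an assumption of this sketch)."""
        try:
            self.write()
        except Exception as exc:
            log.error('%s', exc)
            return False
        return True
```

A caller that only wants best-effort tag updates can then loop over items and call `item.try_write()` without wrapping each call in its own `try` block.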
Filipe Fortes
0824ccb85f
Don't try to parse None as HTML when fetching lyrics
...
Fixes #479
2013-12-18 07:37:01 -08:00
Adrian Sampson
38ecb35718
lyrics -f ( #455 , closes #414 ): style, changelog
2013-11-25 15:58:53 -08:00
Pedro Silva
9b75db8326
merge Bitdemon-master
...
- minor style changes
- synchronize with documentation
2013-11-17 12:34:53 +01:00
Bitdemon
f87606869c
Update lyrics.py
2013-11-10 23:18:53 +01:00
Bitdemon
de0f0792ef
Update lyrics.py
2013-11-10 23:13:35 +01:00
Bitdemon
f5f73be2b5
Update lyrics.py
2013-11-10 22:58:36 +01:00
Bitdemon
86d74f7d39
Update lyrics.py
...
added -f option to lyrics plugin
first draft
2013-11-10 22:44:37 +01:00
Adrian Sampson
c7fe017752
remove Library.{move,store} methods
...
These methods are now provided by LibModel, which makes dealing with items and
albums symmetric.
2013-08-21 15:34:45 -07:00
Fabrice Laporte
995d75f3f3
Logging: remove match ratio, add source website name
2013-06-29 14:24:41 +02:00
Fabrice Laporte
9780be270c
Some tweaking to yield better results by not
...
rejecting valid lyrics.
2013-06-29 14:23:53 +02:00
Fabrice Laporte
c6f935ac4c
Don't consider text between parentheses when
...
matching url title with song title.
2013-06-29 14:21:55 +02:00
Fabrice L.
6c8f45c7f7
Update lyrics.py
2013-06-12 01:20:08 +03:00
Fabrice Laporte
b3747189e5
lyrics: google backend should turn up more results
...
bs4 scraping routine has been made more generic,
relying less on specific markup tags.
Better algorithm to detect which url titles match
song titles: domain names are now removed from url
titles.
Use regex to decimate \n in fetched lyrics.
2013-06-12 00:07:01 +02:00
Fabrice Laporte
09e721efe6
rename "section" markup
2013-06-02 22:35:36 +02:00
Adrian Sampson
a0ef886801
lyrics: substitute more punctuation ( fixes #270 )
2013-05-12 12:45:49 -07:00
Adrian Sampson
a5cb34360d
lyrics: fix encoding for Lyrics.com
2013-05-12 12:38:48 -07:00
Adrian Sampson
2a9afd3908
misc. style cleanup for #243
2013-04-15 10:52:17 -07:00
Fabrice Laporte
479b25bac3
Code style + fix doc typo
2013-04-08 18:35:02 +02:00
Fabrice Laporte
7b13edee40
lyrics: restore tags write and fix extract_text()
2013-04-06 18:24:30 +02:00
Fabrice Laporte
cfb6735e43
Add a lyrics backend that scrapes results from the Google Custom Search API.
...
Add a 'fallback' option to facilitate working around the 100 queries/day google
limit by marking files as 'visited' so they are not considered for lyrics search
on the next beet run.
I've put my own google_engine_ID as the default value in the code. This could
be reconsidered, but this engine contains databases known to be scrapable by
the plugin algorithm.
2013-04-06 15:22:04 +02:00
Fabrice Laporte
cf73d7cb08
replace \r by \n
2013-02-02 09:38:49 +01:00
Adrian Sampson
d6c7cfa4e3
lyrics: replace apostrophes with ' (GC-498)
2013-01-11 10:51:22 -08:00
Adrian Sampson
7a410f636b
happy new year ✨
...
For future reference, this command did the trick:
ack -l 'Copyright 201' | xargs perl -pi -E 's/Copyright 201./Copyright 2013/'
2013-01-11 10:43:41 -08:00
Adrian Sampson
14b5170aec
GH-72: some cleanup and changelog note
2013-01-05 17:20:39 -08:00
Adrian Sampson
6d68a4855e
per-plugin configuration defaults in __init__()
...
This uses the new BeetsPlugin.config convenience view heavily. Things are
slowly getting less verbose.
2012-12-18 22:35:44 -08:00
Adrian Sampson
3ef9e006f4
finish confit-ifying all the plugins
2012-12-13 17:14:19 -08:00
Adrian Sampson
729a89cff3
lyrics: possibly address a Unicode error
2012-11-09 00:01:36 -08:00
Adrian Sampson
dcb9ad7373
fix several non-unicode logging statements
...
A user reported a problem with one of the logging statements where .format()
tried to convert a Unicode string to bytes because the log message was '', not
u''. As a rule, we should ensure that all logging statements use Unicode
literals.
2012-10-24 15:14:33 -07:00
Adrian Sampson
420c78ff1b
lyrics: fix UnicodeDecodeError with non-ASCII text
2012-08-19 13:42:43 -07:00
Adrian Sampson
174824c570
lyrics: suppress not-found message
...
(in Lyrics.com results)
2012-07-24 15:33:44 -07:00
Adrian Sampson
ec849a3f88
chroma & lyrics: crash due to name change
...
I changed ImportTask.all_items to ImportTask.imported_items but forgot to change
the calls in the chroma and lyrics plugins.
2012-07-03 17:18:23 -07:00
Adrian Sampson
c5424dce05
lastgenre and lyrics: use new pluggable import stages
...
This solves a problem where files were copied before the genre field was
updated, resulting in problems when $genre was used in a path (GC-357).
2012-06-08 15:17:49 -07:00
Adrian Sampson
429af42e14
use print_function __future__ import
...
All code should now use Python 3-style "print"s.
2012-05-13 21:08:27 -07:00
Adrian Sampson
b68e87b92c
The Great Trailing Whitespace Purge of 2012
...
What can I say? I used to use TextMate!
2012-05-13 20:22:17 -07:00
Adrian Sampson
6ce08c4ce6
merge
2012-05-08 11:59:41 -07:00
Adrian Sampson
8b25a86ee3
use 2.6-compatible format strings
2012-05-08 11:46:08 -07:00
Adrian Sampson
a28f930c52
transaction objects to control DB access
...
In an attempt to finally address the longstanding SQLite locking issues, I'm
introducing a way to explicitly, lexically scope transactions. The Transaction
class is a context manager that always fully fetches after SELECTs and
automatically commits on exit. No direct access to the library is allowed, so
all changes will eventually be committed and all queries will be completed. This
will also provide a debugging mechanism to show where concurrent transactions
are beginning and ending.
To support composition (transaction reentrancy), an internal, per-Library stack
of transactions is maintained. Commits only happen when the outermost
transaction exits. This means that, while it's possible to introduce atomicity
bugs by invoking Library methods outside of a transaction, you can conveniently
call them *without* a currently-active transaction to get a single atomic
action.
Note that this "transaction stack" concept assumes a single Library object per
thread. Because we need to duplicate Library objects for concurrent access due
to sqlite3 limitations already, this is fine for now. Later, the interface should
provide one transaction stack per thread for shared Library objects.
2012-05-06 23:24:05 -07:00
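The transaction-stack idea in the commit above can be illustrated with a small sketch. This is a toy under stated assumptions, not the project's implementation: the class layout, the `_tx_stack` attribute, and the `query`/`mutate` method names are all hypothetical.

```python
import sqlite3


class Library(object):
    """Toy library: one connection plus a stack of open transactions."""

    def __init__(self, path=':memory:'):
        self.conn = sqlite3.connect(path)
        self._tx_stack = []  # hypothetical per-Library transaction stack

    def transaction(self):
        return Transaction(self)


class Transaction(object):
    """Context manager; only the outermost exit commits."""

    def __init__(self, lib):
        self.lib = lib

    def __enter__(self):
        self.lib._tx_stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.lib._tx_stack.pop()
        if not self.lib._tx_stack:
            # The outermost transaction is ending: commit everything at once.
            self.lib.conn.commit()

    def query(self, sql, args=()):
        # Fetch all rows eagerly so no cursor is left open after a SELECT.
        return self.lib.conn.execute(sql, args).fetchall()

    def mutate(self, sql, args=()):
        self.lib.conn.execute(sql, args)
```

Nesting `with lib.transaction():` blocks then composes naturally: the inner block's exit only pops the stack, and the single commit happens when the outer block exits.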
Adrian Sampson
2c11855b1e
catch URL fetch exceptions in lyrics plugin
2012-04-10 21:05:01 -07:00
Adrian Sampson
3ffbe171e5
lyrics: detect missing lyrics in lyrics.com result
2012-03-10 12:40:19 +00:00
Adrian Sampson
c65b237b99
lyrics: resolve entity
2012-03-10 12:37:57 +00:00
Adrian Sampson
5befad8ba0
lyrics safely decoded from bytes
2012-03-10 12:33:19 +00:00
Adrian Sampson
8d9c324b61
fix unicode encoding for lyrics requests ( #350 )
2012-02-26 12:54:27 -08:00
Adrian Sampson
a58253b79c
"lyrics -p" prints out lyrics
2012-01-19 12:43:29 -08:00
Adrian Sampson
248a433d1d
lyrics auto-fetch on import ( #137 )
2012-01-19 12:39:20 -08:00
Adrian Sampson
01a54e2e0e
first stab at revamped lyrics plugin ( #137 )
2012-01-19 12:25:11 -08:00