David Lynch
cfd073fb5c
Fix an error in _soup if parsed content doesn't have a <head>
2025-03-06 22:33:32 -06:00
David Lynch
4d9c31b6ac
Make the parser used for BeautifulSoup configurable, still default lxml
Refs #98
2025-03-04 23:14:51 -06:00
David Lynch
9ed2d54db7
Make the _soup method able to cope with being given a html string
2025-03-04 23:14:51 -06:00
Max Isom
53bc2045f0
Use lxml (>40% faster)
2025-03-04 22:23:50 -06:00
David Lynch
d49d7891c3
Fix some images not having srcset and sizes removed
2024-11-23 22:34:46 -06:00
David Lynch
746ec1b994
Fix image enabling by default
Follow-up to 6ecb1d8942
2024-11-23 22:10:19 -06:00
David Lynch
0cac7ff945
New spoilers behavior: --spoilers [include/inline/skip]
Fixes #75
2024-11-23 21:39:54 -06:00
David Lynch
a39e1e9f89
Use the newer syntax for attrs
2024-11-23 19:42:35 -06:00
David Lynch
b6310658e8
Command-line flag to enable/disable fetching images
2024-11-23 16:33:01 -06:00
David Lynch
21834bb5ed
_clean takes a base argument and reformats image srcs into absolute urls
2024-11-23 15:30:57 -06:00
David Lynch
a0a057c48c
_soup always returns a base URL
2024-11-23 15:15:29 -06:00
Idan Dor
1edde92a9d
Fixed whitespacing for flake8.
2024-11-23 13:22:53 -06:00
Idan Dor
31f663c6e0
Added image embedding support for epub
Specifically, added image_selector for arbitrary sites that allows
selecting img tags from chapters, downloading them
and embedding them within the resulting epub.
In the case of Pale, this means that the character banners and
extra materials do not require an internet connection to view.
Also made the two pale.json's more consistent (pale.json now correctly
includes the title of the chapters).
2024-11-23 13:22:53 -06:00
David Lynch
64d77b62db
Improve cloudflare email decoding
New format for the protected emails, wrapping a span in an a.
2024-01-28 13:26:34 -06:00
David Lynch
f57db3e1a8
Helper for extracting form data from a soup
2022-05-13 11:04:05 -05:00
David Lynch
4242aa6f63
Strip colors on all sites, not just xenforo
2021-11-07 11:16:26 -06:00
David Lynch
f1bd28e942
Fanfiction.net: experiment with falling back to the wayback machine
2021-07-19 15:17:39 -05:00
David Lynch
d1caf85883
Extract tags when present
Supported currently on Xenforo and AO3
2021-05-01 16:35:49 -05:00
David Lynch
77cc334bcf
Merge pull request #60 from ClaasJG/master
Stable seed generation for Sections
2021-03-27 19:16:11 -05:00
ClaasJG
5b39c73904
Add stable Section id based on URL
Remove Chapter id
2021-03-28 00:41:03 +01:00
David Lynch
bf315d06fe
Grab the much more-pythonic CF email decode from #37
2021-03-27 11:20:01 -05:00
David Lynch
f25befc237
Decode cloudflare email address protection
Makes a generic _clean function on Site that can be called. Will
probably want to migrate some other generic bits into there after
analysis of what's *really* generic.
2021-03-27 10:46:39 -05:00
David Lynch
d50f23d07b
Special exception for hitting a cloudflare captcha page
Fanfiction.net is currently doing this, so let's at least acknowledge it
Refs #53
2021-02-12 16:02:55 -06:00
David Lynch
7208cfdaaf
Minor readability improvement: use f-strings
2019-10-15 11:14:27 -05:00
David Lynch
c584988994
Update dependencies
2019-10-14 00:40:34 -05:00
David Lynch
2bd5d77715
Helper for URL-joining
2019-05-29 01:55:35 -05:00
David Lynch
40b4856a14
Optimize AO3: use full_work URL
2019-05-25 15:31:39 -05:00
David Lynch
0a81069d24
Slightly more verbose logging of load failures
2018-12-29 20:46:55 -06:00
David Lynch
e78ffdb85b
Method to get a site-key for config
Means that things like XenForoIndex and AO3Series don't require separate
config entries.
2018-10-11 15:42:59 -05:00
Alex Raubach
fe76b5427b
Add cover_url attribute
2018-09-02 22:08:36 -04:00
Will Oursler
d1842e2bf1
Adds a system for site options to be included as click.options on commands.
2018-04-14 12:56:31 -04:00
Will Oursler
ecebf1de58
Merge branch 'master' into clickify
2018-04-13 17:52:37 -04:00
David Lynch
7d2c1647e2
Safer check on retry-after
2018-02-28 20:54:37 -06:00
David Lynch
6d52c72c99
Use logging instead of print
Fixes #10
2017-11-04 00:09:09 -05:00
David Lynch
43599aceb5
Merge branch 'master' into clickify
2017-11-03 15:21:44 -05:00
David Lynch
f1ac7c8bda
Retry failed site-requests
2017-10-31 00:27:54 -05:00
Will Oursler
9b4d2a0998
Adds a more sensible default for options in the Site base class.
2017-10-13 19:43:38 -04:00
Will Oursler
c702337040
Reworks how site-specific options work.
2017-10-13 19:37:13 -04:00
Will Oursler
db48233cf4
Switch from using raw argparser to using click. Preserves the existing interface, except leech --flush becomes leech flush
2017-10-12 13:00:24 -04:00
Will Oursler
5bd07a5b90
Splits out ebook generation logic into a separate module, in anticipation of maybe supporting multiple output formats.
2017-10-12 09:49:32 -04:00
David Lynch
5b4b9a0dc3
Canonicalize URLs
2017-02-23 15:03:23 -06:00
David Lynch
f066fc663d
Use attrs
2017-02-02 23:18:21 -06:00
David Lynch
e6343cb1c9
Stories are now made of nested sections/chapters
This is prep-work for improving epub TOC generation a bit.
2017-01-10 00:23:24 -08:00
David Lynch
24fa9aa22d
Use a namedtuple for chapters
2016-09-23 13:11:52 -05:00
David Lynch
574cea3fc8
Make the sites system not require editing __init__.py
2016-09-23 12:51:03 -05:00
David Lynch
86f02812d2
Use requests-cache
2016-08-29 10:59:20 -05:00
David Lynch
d9e65e5b6a
Add a little documentation on the extract method
2016-04-04 09:58:47 -05:00
David Lynch
9eb5b270ab
Ignore the linting on my sites import
2016-04-04 09:45:45 -05:00
David Lynch
008eb8e63d
Support ArchiveOfOurOwn
2016-04-03 21:30:29 -05:00
David Lynch
aa4ba528b7
Let sites define their own custom arguments
Use this to let xenforo force the inclusion of the index-post
2015-12-05 01:34:20 -06:00