David Lynch
14e7d5bf1b
Don't break trying to de-cloudflare links without an href
2026-01-04 22:41:46 -06:00
David Lynch
a71ac62f8b
Stop xenforo from over-stripping styles
2025-08-04 13:31:57 -05:00
David Lynch
b3489d5016
Add a basic Patreon site definition
...
Works for getting *all* posts from an author, or (more usefully) getting
all posts within a tag from an author
2025-08-01 21:23:26 -05:00
David Lynch
5f72f23e72
Note to self about royalroad chapter URLs
2025-08-01 19:43:31 -05:00
David Lynch
6fddf628fb
Add a bit more messaging around logging in to sites
2025-06-09 20:06:54 -05:00
David Lynch
5cb887f767
Move image processing into sites
...
The epub-builder still downloads the image, but all the html-mangling
is done in the extraction process now.
Turns footnotes into a chapter-object, for easier processing later on.
2025-03-22 19:39:16 -05:00
David Lynch
81189f4e1d
xenforo: minor fixes around images in spoilers
2025-03-22 00:16:11 -05:00
David Lynch
3c5a4bb75a
Merge pull request #100 from kpedro88/multiple-next-items
...
Handle multiple entries in next_link
2025-03-18 20:07:16 -05:00
Kevin Pedro
de6913a9af
simplify algorithm
2025-03-08 09:48:32 -06:00
Kevin Pedro
d4e1214be3
return to loop-based algorithm
2025-03-08 09:40:42 -06:00
David Lynch
cfd073fb5c
Fix an error in _soup if parsed content doesn't have a <head>
2025-03-06 22:33:32 -06:00
Kevin Pedro
b2f15eb76c
satisfy linter
2025-03-05 21:03:35 -06:00
Kevin Pedro
280b242a27
stop loop once a new link is found
2025-03-05 20:56:47 -06:00
Kevin Pedro
0066a148bb
process all next_link items
2025-03-05 20:56:47 -06:00
David Lynch
4d9c31b6ac
Make the parser used for BeautifulSoup configurable, still defaulting to lxml
...
Refs #98
2025-03-04 23:14:51 -06:00
David Lynch
9ed2d54db7
Make the _soup method able to cope with being given an HTML string
2025-03-04 23:14:51 -06:00
Max Isom
53bc2045f0
Use lxml (>40% faster)
2025-03-04 22:23:50 -06:00
David Lynch
9a2b574b4b
Missed a call to _soup in ao3
2024-12-23 21:02:09 -06:00
David Lynch
acce8138a9
Also pass the base through to the super clean for royalroad
2024-12-02 11:01:34 -06:00
David Lynch
31154ed8d4
Fix a call to _clean for royalroad
2024-12-02 00:00:58 -06:00
David Lynch
ffb8e54e91
Better error for an Arbitrary story that fetches no content
2024-11-23 23:07:16 -06:00
David Lynch
d49d7891c3
Fix some images not having srcset and sizes removed
2024-11-23 22:34:46 -06:00
David Lynch
746ec1b994
Fix image enabling by default
...
Follow-up to 6ecb1d8942
2024-11-23 22:10:19 -06:00
David Lynch
bf248bbfc8
Remove unused register import in xenforo.py
2024-11-23 21:48:46 -06:00
David Lynch
ef43295c25
AlternateHistory is on XenForo2 now
...
...this was the last site I had in old XenForo, so I will probably want
to clean that up soon.
2024-11-23 21:41:36 -06:00
David Lynch
0cac7ff945
New spoilers behavior: --spoilers [include/inline/skip]
...
Fixes #75
2024-11-23 21:39:54 -06:00
David Lynch
a39e1e9f89
Use the newer syntax for attrs
2024-11-23 19:42:35 -06:00
David Lynch
b6310658e8
Command-line flag to enable/disable fetching images
2024-11-23 16:33:01 -06:00
David Lynch
9510a22cb0
Remove arbitrary's special-case image loading, since the default works
2024-11-23 16:33:01 -06:00
David Lynch
21834bb5ed
_clean takes a base argument and reformats image srcs into absolute urls
2024-11-23 15:30:57 -06:00
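Note on 21834bb5ed: the commit itself is not shown here, so the following is only a minimal sketch of how an image src can be resolved against a base URL using the standard library. The function name `absolute_src` and the example URLs are hypothetical, not taken from the project.

```python
from urllib.parse import urljoin


def absolute_src(base: str, src: str) -> str:
    """Resolve an <img> src against the page's base URL.

    Hypothetical helper for illustration; the real _clean method may
    work differently.
    """
    # urljoin leaves already-absolute URLs untouched and resolves
    # root-relative and path-relative references against the base.
    return urljoin(base, src)


if __name__ == "__main__":
    base = "https://example.com/fiction/1/chapter/2"
    for src in ("/img/a.png", "img/b.png", "https://cdn.example.org/c.png"):
        print(absolute_src(base, src))
```

Resolving at extraction time means later steps, such as the epub builder downloading the image, do not need to know which page the src came from.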
David Lynch
a0a057c48c
_soup always returns a base URL
2024-11-23 15:15:29 -06:00
Emmanuel Jemeni
4e9ad1ed7e
feat: Leech can now download images in xenforo spoilers. This requires passing the --include-spoilers flag.
2024-11-23 13:22:54 -06:00
Idan Dor
1edde92a9d
Fixed whitespace for flake8.
2024-11-23 13:22:53 -06:00
Idan Dor
31f663c6e0
Added image embedding support for epub
...
Specifically, added image_selector for arbitrary sites that allows
selecting img tags from chapters, downloading them
and embedding them within the resulting epub.
In the case of Pale, this means that the character banners and
extra materials do not require an internet connection to view.
Also made the two pale.json's more consistent (pale.json now correctly
includes the title of the chapters).
2024-11-23 13:22:53 -06:00
David Lynch
7967c59636
Support 2fa for xenforo logins
2024-10-13 00:52:50 -05:00
David Lynch
249221f5d7
Fix questionable questing, which has moved to xenforo2
2024-05-14 22:08:22 -05:00
David Lynch
1f57cd6f07
Basic success-testing on logins
2024-05-14 22:07:05 -05:00
David Lynch
ef9309eb66
Fix xenforo login
2024-05-14 22:06:09 -05:00
David Lynch
cc423f62bb
Fix the royalroad stolen-content removal
...
They added speak:none to the CSS, and I was strictly checking for a rule
that only contained display:none.
2024-02-10 20:05:49 -06:00
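Note on cc423f62bb: the message describes a check that was too strict, matching only rules whose sole declaration was display:none. A rough sketch of a looser check follows, assuming simple single-class selectors in an inline stylesheet. The regex, the function name `hidden_classes`, and the sample CSS are all hypothetical and not the project's actual code.

```python
import re

# Matches simple rules of the form ".classname { ... }".
# Assumption: the hidden-content rules use a single class selector.
RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")


def hidden_classes(css: str) -> set:
    """Return class names whose rule sets display:none, even if the
    rule also carries other declarations such as speak:none."""
    hidden = set()
    for name, body in RULE.findall(css):
        declarations = {}
        for decl in body.split(";"):
            if ":" not in decl:
                continue  # skip empty or malformed fragments
            prop, value = decl.split(":", 1)
            declarations[prop.strip().lower()] = value.strip().lower()
        if declarations.get("display") == "none":
            hidden.add(name)
    return hidden


if __name__ == "__main__":
    sample = ".cjA1 { display: none; speak: none } .ok { color: red }"
    print(hidden_classes(sample))
```

Parsing each rule into its declarations, rather than comparing the whole rule body to a fixed string, keeps the check working when extra properties are added alongside display:none.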
David Lynch
64d77b62db
Improve cloudflare email decoding
...
New format for the protected emails, wrapping a span in an a.
2024-01-28 13:26:34 -06:00
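Note on 64d77b62db: for context, Cloudflare's email protection stores the address as a hex string in which the first byte is an XOR key for the remaining bytes. The sketch below shows only that decoding step. The function name `decode_cfemail` is hypothetical, and how the project locates the encoded value inside the new a-wrapping-a-span markup is not shown in this log.

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email hex string.

    Hypothetical helper for illustration: the first byte is the XOR
    key, and every following byte is one character of the address
    XORed with that key.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


if __name__ == "__main__":
    # Build an encoded value locally so the example is self-checking.
    key = 0x5A
    email = "user@example.com"
    encoded = (bytes([key]) + bytes(c ^ key for c in email.encode())).hex()
    print(decode_cfemail(encoded))
```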
David Lynch
d30e56a518
Strip out the new stolen-content warnings on royalroad
...
They might make these harder to work out in the future, but for now...
2024-01-19 21:34:39 -06:00
David Lynch
6c692968a4
Use isinstance rather than direct type comparison
2023-08-06 17:56:13 -05:00
David Lynch
03e9d3844f
Add the-sietch.com to xenforo sites
2023-08-06 17:43:51 -05:00
David Lynch
5ddbb310b3
Let xenforo sites cope with index.php URLs
2023-08-06 17:43:28 -05:00
David Lynch
7230f65a68
Add offset/limit options to royalroad
2023-05-04 09:42:13 -05:00
KeinNiemand
356bae9a7a
Don't prettify royalroad soup. Fixes #92
2023-05-04 13:17:28 +02:00
David Lynch
6895a0eb61
AO3 single-chapter story bugs
2023-03-31 23:51:27 -05:00
David Lynch
fe5ca86d87
Royalroad's markup has changed slightly, fix so title and summary work
2023-03-17 16:06:52 -05:00
David Lynch
d81eefa7f3
AO3: use new form helper so this shouldn't break again if fields change
2022-05-13 11:04:25 -05:00
David Lynch
f57db3e1a8
Helper for extracting form data from a soup
2022-05-13 11:04:05 -05:00