mirror of https://github.com/kemayo/leech synced 2025-12-06 08:22:56 +01:00
Commit graph

385 commits

Author SHA1 Message Date
David Lynch
5bfd1b40a0 Give image downloading a timeout 2025-03-26 20:55:19 -05:00
David Lynch
5cb887f767 Move image processing into sites
The epub-builder still downloads the image, but all the html-mangling
is done in the extraction process now.

Turns footnotes into a chapter-object, for easier processing later on.
2025-03-22 19:39:16 -05:00
David Lynch
81189f4e1d xenforo: minor fixes around images in spoilers 2025-03-22 00:16:11 -05:00
David Lynch
3c5a4bb75a
Merge pull request #100 from kpedro88/multiple-next-items
Handle multiple entries in next_link
2025-03-18 20:07:16 -05:00
Kevin Pedro
de6913a9af simplify algorithm 2025-03-08 09:48:32 -06:00
Kevin Pedro
d4e1214be3 return to loop-based algorithm 2025-03-08 09:40:42 -06:00
David Lynch
cfd073fb5c Fix an error in _soup if parsed content doesn't have a <head> 2025-03-06 22:33:32 -06:00
Kevin Pedro
b2f15eb76c satisfy linter 2025-03-05 21:03:35 -06:00
Kevin Pedro
280b242a27 stop loop once a new link is found 2025-03-05 20:56:47 -06:00
Kevin Pedro
0066a148bb process all next_link items 2025-03-05 20:56:47 -06:00
David Lynch
5213ec2632 Update dependencies to latest versions, remove html5lib 2025-03-04 23:14:51 -06:00
David Lynch
4d9c31b6ac Make the parser used for BeautifulSoup configurable, still default lxml
Refs #98
2025-03-04 23:14:51 -06:00
David Lynch
9ed2d54db7 Make the _soup method able to cope with being given a html string 2025-03-04 23:14:51 -06:00
Max Isom
53bc2045f0 Use lxml (>40% faster) 2025-03-04 22:23:50 -06:00
Kevin Pedro
52213725c9 update docker recipe for python changes 2025-03-04 22:14:38 -06:00
David Lynch
9a2b574b4b Missed a call to _soup in ao3 2024-12-23 21:02:09 -06:00
David Lynch
3fdbae5851 Pass through some more headers in the session 2024-12-17 16:34:57 -06:00
David Lynch
204807add6 Don't hardcode a story ID into a path before it's needed 2024-12-04 17:20:11 -06:00
David Lynch
bb1fcc0e50 Always process images if they're included in the chapter object 2024-12-04 17:15:09 -06:00
David Lynch
5392593621 Image options in an options-object pattern, like cover options 2024-12-04 17:12:06 -06:00
David Lynch
bedaec9989 Avoid potential image overlaps with nested sections 2024-12-04 16:51:21 -06:00
David Lynch
1fe907bec2 Pass image arguments to nested sections 2024-12-04 16:42:47 -06:00
David Lynch
acce8138a9 Also pass the base through to the super clean for royalroad 2024-12-02 11:01:34 -06:00
David Lynch
31154ed8d4 Fix a call to _clean for royalroad 2024-12-02 00:00:58 -06:00
David Lynch
e3c63bce3c New config option: allow_spaces
Determines whether spaces in filenames will be replaced with underscores
2024-11-30 14:07:40 -06:00
David Lynch
2f21280d76 Adjust option loading so it's easier to override 2024-11-30 14:07:40 -06:00
David Lynch
91d2c4fd4b Fully cancel if the story extraction fails 2024-11-30 13:52:43 -06:00
David Lynch
ffb8e54e91 Better error for an Arbitrary story that fetches no content 2024-11-23 23:07:16 -06:00
acestronautical
85da618cb2 Fix selectors for the Dungeon Keeper Ami example 2024-11-23 22:57:50 -06:00
David Lynch
7f91f1cc43 Some general readme updates 2024-11-23 22:44:34 -06:00
David Lynch
59923e0f63 Add note about alt="" behavior 2024-11-23 22:35:55 -06:00
David Lynch
6988fc8ccc Add output mentioning when an image is cached 2024-11-23 22:34:46 -06:00
David Lynch
d49d7891c3 Fix some images not having srcset and sizes removed 2024-11-23 22:34:46 -06:00
David Lynch
746ec1b994 Fix image enabling by default
Follow-up to 6ecb1d8942
2024-11-23 22:10:19 -06:00
David Lynch
bf248bbfc8 Remove unused register import in xenforo.py 2024-11-23 21:48:46 -06:00
David Lynch
ef43295c25 AlternateHistory is on XenForo2 now
...this was the last site I had in old XenForo, so I will probably want
to clean that up soon.
2024-11-23 21:41:36 -06:00
David Lynch
0cac7ff945 New spoilers behavior: --spoilers [include/inline/skip]
Fixes #75
2024-11-23 21:39:54 -06:00
David Lynch
3f6fd401ad Update the readme with the current python version requirement 2024-11-23 19:43:56 -06:00
David Lynch
a39e1e9f89 Use the newer syntax for attrs 2024-11-23 19:42:35 -06:00
David Lynch
d6d23e4c60 Bump dependency versions and required python version 2024-11-23 17:38:52 -06:00
David Lynch
4f15e0517f Change how the build a cover test runs 2024-11-23 16:59:25 -06:00
David Lynch
3fbe181b12 In no-images case, replace with alt if present rather than decomposing
Putting a placeholder there for the altless, to avoid confusion.
2024-11-23 16:48:09 -06:00
David Lynch
740a41f4ef Avoid refetching images that're repeated across chapters 2024-11-23 16:33:01 -06:00
David Lynch
6ecb1d8942 Make downloading images the default behavior 2024-11-23 16:33:01 -06:00
David Lynch
400c5cc801 Configurable whether to always convert images 2024-11-23 16:33:01 -06:00
David Lynch
b6310658e8 Command-line flag to enable/disable fetching images 2024-11-23 16:33:01 -06:00
David Lynch
e2bc6eba1c Change order of config loading so site-specific overrides of cover/image work 2024-11-23 16:33:01 -06:00
David Lynch
4856649424 Be less verbose when downloading images 2024-11-23 16:33:01 -06:00
David Lynch
9510a22cb0 Remove arbitrary's special-case image loading, since the default works 2024-11-23 16:33:01 -06:00
David Lynch
21834bb5ed _clean takes a base argument and reformats image srcs into absolute urls 2024-11-23 15:30:57 -06:00