mirror of https://github.com/kemayo/leech synced 2025-12-06 08:22:56 +01:00

43 commits

Author SHA1 Message Date
David Lynch
5cb887f767 Move image processing into sites
The epub-builder still downloads the image, but all the html-mangling
is done in the extraction process now.

Turns footnotes into a chapter-object, for easier processing later on.
2025-03-22 19:39:16 -05:00
David Lynch
4d9c31b6ac Make the parser used for BeautifulSoup configurable, still default lxml
Refs #98
2025-03-04 23:14:51 -06:00
Max Isom
53bc2045f0 Use lxml (>40% faster) 2025-03-04 22:23:50 -06:00
David Lynch
3fdbae5851 Pass through some more headers in the session 2024-12-17 16:34:57 -06:00
David Lynch
204807add6 Don't hardcode a story ID into a path before it's needed 2024-12-04 17:20:11 -06:00
David Lynch
bb1fcc0e50 Always process images if they're included in the chapter object 2024-12-04 17:15:09 -06:00
David Lynch
5392593621 Image options in an options-object pattern, like cover options 2024-12-04 17:12:06 -06:00
David Lynch
bedaec9989 Avoid potential image overlaps with nested sections 2024-12-04 16:51:21 -06:00
David Lynch
1fe907bec2 Pass image arguments to nested sections 2024-12-04 16:42:47 -06:00
David Lynch
e3c63bce3c New config option: allow_spaces
Determines whether spaces in filenames will be replaced with underscores
2024-11-30 14:07:40 -06:00
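A minimal sketch of what the `allow_spaces` option in e3c63bce3c describes: spaces in a filename are replaced with underscores unless the option is enabled. The function name `output_filename`, the `.epub` suffix, and the default of `False` are assumptions made for illustration, not taken from the leech source.

```python
def output_filename(title: str, allow_spaces: bool = False) -> str:
    """Build an output filename from a story title.

    Hypothetical helper: the name, signature, and default value are
    illustrative assumptions, not leech's actual code.
    """
    name = f"{title}.epub"
    if not allow_spaces:
        # Replace spaces with underscores, as the commit message describes.
        name = name.replace(" ", "_")
    return name


print(output_filename("My Long Story"))
print(output_filename("My Long Story", allow_spaces=True))
```

With the option off, the title `My Long Story` yields `My_Long_Story.epub`; with it on, the spaces are kept.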
David Lynch
59923e0f63 Add note about alt="" behavior 2024-11-23 22:35:55 -06:00
David Lynch
6988fc8ccc Add output mentioning when an image is cached 2024-11-23 22:34:46 -06:00
David Lynch
a39e1e9f89 Use the newer syntax for attrs 2024-11-23 19:42:35 -06:00
David Lynch
3fbe181b12 In no-images case, replace with alt if present rather than decomposing
Putting a placeholder there for the altless, to avoid confusion.
2024-11-23 16:48:09 -06:00
David Lynch
740a41f4ef Avoid refetching images that're repeated across chapters 2024-11-23 16:33:01 -06:00
David Lynch
400c5cc801 Configurable whether to always convert images 2024-11-23 16:33:01 -06:00
David Lynch
4856649424 Be less verbose when downloading images 2024-11-23 16:33:01 -06:00
David Lynch
9508b00bcb Rearrange the image options to match cover options 2024-11-23 14:54:12 -06:00
Emmanuel Jemeni
34bf962df6 feat: Leech can now compress images to a specific target size 2024-11-23 13:22:54 -06:00
Emmanuel Jemeni
e6ad77a9fc fix: Completely fixes #2 ! 2024-11-23 13:22:54 -06:00
Emmanuel Jemeni
63ac765e41 fix(ebook/__init__.py): Leech will now ignore empty image tags (because apparently that's a thing).
feat(ebook/__init__.py): Leech prints out more information about the images it is downloading: the number of images in each chapter and the image currently being downloaded.
2024-11-23 13:22:54 -06:00
Emmanuel Jemeni
dca26e95ea feat(ebook/__init__.py): leech checks if an image has an alt attribute and adds one if it doesn't 2024-11-23 13:22:54 -06:00
Emmanuel Jemeni
71345b2658 fix(Partial-Fix-to-Issue-#2): Leech can now download images however there is no way of disabling this option and this was only tested with stories from fiction.live
BREAKING CHANGE:
2024-11-23 13:22:54 -06:00
Idan Dor
31f663c6e0 Added image embedding support for epub
Specifically, added image_selector for arbitrary sites that allows
selecting img tags from chapters, downloading them
and embedding them within the resulting epub.

In the case of Pale, this means that the character banners and
extra materials do not require an internet connection to view.

Also made the two pale.json's more consistent (pale.json now correctly
includes the title of the chapters).
2024-11-23 13:22:53 -06:00
David Lynch
7eb48c872d Clean up epub file list generation 2022-02-13 12:25:01 -06:00
David Lynch
bb9491cb96 Config option: output_dir
Can be provided on the command line as `--output-dir`, or in leech.json
as `output_dir` (also in the `site_options` in leech.json).

Refs #67
2021-09-04 15:46:16 -05:00
David Lynch
25312736d4 Use namedtuple in the epub generator so it's easier to understand 2021-07-21 10:32:55 -05:00
David Lynch
9b80a112d0 Output summary and tags in the Front Matter 2021-05-01 16:36:08 -05:00
David Lynch
0d0bdf470e Escape chapter titles when building templates
Unescaped ampersands cause validation errors...

TODO: should move away from string substitution to build XHTML

Refs #56
2021-03-02 02:30:05 -06:00
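To illustrate the problem 0d0bdf470e addresses: a chapter title containing a bare `&` produces invalid XHTML when substituted into a template as-is. The sketch below escapes the title first using the standard library's `html.escape`. The function name `render_chapter_heading` and the `<h1>` template are hypothetical; the commit does not say which escaping call or template leech uses.

```python
import html


def render_chapter_heading(title: str) -> str:
    """Substitute an escaped chapter title into an XHTML fragment.

    Hypothetical helper for illustration only. html.escape converts
    &, <, and > (and quotes, by default) into character references.
    """
    return f"<h1>{html.escape(title)}</h1>"


print(render_chapter_heading("Fire & Ice"))
```

This still builds markup by string substitution, which the commit's TODO notes should eventually be replaced; escaping only makes the substitution safe for titles with special characters.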
David Lynch
533c14f0d7 Normalize fancy unicode characters by default
Kindle can't display the "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊" mathematical bold fraktur codepoints
so NFKC normalize them (and anything else) into its plain equivalent.

Can be disabled by running with `--no-normalize` if needed.
2021-02-05 01:59:20 -06:00
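The NFKC normalization mentioned in 533c14f0d7 can be sketched with Python's standard `unicodedata` module. The function name `normalize_text` and its `normalize` flag (standing in for the effect of `--no-normalize`) are assumptions for illustration; only the use of NFKC comes from the commit message.

```python
import unicodedata


def normalize_text(text: str, normalize: bool = True) -> str:
    """Fold compatibility characters into their plain equivalents.

    Hypothetical helper: passing normalize=False mimics what running
    with --no-normalize is described as doing, leaving text untouched.
    """
    if not normalize:
        return text
    # NFKC applies compatibility decomposition, then canonical composition.
    return unicodedata.normalize("NFKC", text)


fancy = "𝖙𝖍𝖚𝖌 𝖑𝖎𝖋𝖊"
print(normalize_text(fancy))  # → thug life
```

NFKC also folds other compatibility forms, such as ligatures and fullwidth letters, which is the "anything else" the commit message refers to.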
David Lynch
7208cfdaaf Minor readability improvement: use f-strings 2019-10-15 11:14:27 -05:00
David Lynch
c584988994 Update dependencies 2019-10-14 00:40:34 -05:00
David Lynch
ac3ba8db77 Fix sparse object passed for cover options 2019-10-14 00:29:40 -05:00
David Lynch
61f3bb1a6e Filter down the cover options to valid ones 2018-10-01 15:00:53 -05:00
David Lynch
02bd6ae0c6
Merge pull request #16 from AlexRaubach/covers
Download cover art from RR and arbitrary sites
2018-10-01 12:18:39 -05:00
David Lynch
8f8d7b1edd Better fallback for no-title case on chapters 2018-10-01 11:12:52 -05:00
Alex Raubach
60084534a8 Create empty dict when leech.json not present 2018-09-15 11:03:52 -04:00
Alex Raubach
1f57305e11 Download cover image if cover_url is in json 2018-09-10 23:13:26 -04:00
Alex Raubach
f2fc2c11db Capture cover options from leech.json and pass them to generate_epub() 2018-09-10 23:03:08 -04:00
Alex Raubach
ea60ac5122 Download cover images for RoyalRoad Stories 2018-09-02 22:08:36 -04:00
David Lynch
8ac1aa8bb0 Add cover config to leech.json 2017-10-12 11:20:45 -05:00
Will Oursler
1c577b6f67 Fix lint errors 2017-10-12 10:07:22 -04:00
Will Oursler
5bd07a5b90 Splits out ebook generation logic into a separate module, in anticipation of maybe supporting multiple output formats. 2017-10-12 09:49:32 -04:00