Backup_Repos/leech

Mirror of https://github.com/kemayo/leech, synced 2025-12-06 08:22:56 +01:00

Author	SHA1	Message	Date
David Lynch	a0a057c48c	_soup always returns a base URL	2024-11-23 15:15:29 -06:00
Idan Dor	31f663c6e0	Added image embedding support for epub. Specifically, added image_selector for arbitrary sites, which allows selecting img tags from chapters, downloading them, and embedding them within the resulting epub. In the case of Pale, this means that the character banners and extra materials do not require an internet connection to view. Also made the two pale.json files more consistent (pale.json now correctly includes the title of the chapters).	2024-11-23 13:22:53 -06:00
David Lynch	f25befc237	Decode Cloudflare email address protection. Makes a generic _clean function on Site that can be called. Will probably want to migrate some other generic bits into there after analysis of what's really generic.	2021-03-27 10:46:39 -05:00
claasjg	d4f3986515	Detect URL loop with next selector	2021-03-19 14:49:38 +01:00
David Lynch	28cc1fbcc7	Arbitrary should store contents as a string, not a bs4 Tag. It coincidentally works by being string-like for previous uses, but it's not string-like enough for the new unicode stuff. Fixes #54	2021-02-05 19:58:47 -06:00
IdanDor	6d7b5ffcf0	Removed trailing whitespace.	2021-01-23 13:30:03 +02:00
IdanDor	1afac50437	Made arbitrary sites no longer leak memory and fixed worm epub. Each `Chapter` object had a reference to the entire page tree, meaning that the program's RAM usage rose considerably. Transformed Worm to use next_selector so the chapters are correctly ordered, E.2 is not skipped, and the download does not crash due to the `?share=twitter` url matched before. Fixed Worm titles.	2021-01-23 12:12:48 +02:00
David Lynch	c208e33752	Arbitrary: strip all namespaced elements. This is `fb:like` and similar, which break some epub readers. Refs: #41, #43	2020-09-08 23:04:47 -05:00
David Lynch	6fbdc8843d	Make arbitrary site chapter-title selectors more resilient	2020-04-29 17:55:20 -05:00
David Lynch	532a7c6682	Fix typo of title_element in arbitrary. Fixes #25	2019-07-30 09:37:03 -05:00
David Lynch	2bd5d77715	Helper for URL-joining	2019-05-29 01:55:35 -05:00
David Lynch	02bd6ae0c6	Merge pull request #16 from AlexRaubach/covers: Download cover art from RR and arbitrary sites	2018-10-01 12:18:39 -05:00
David Lynch	929284b67d	New features for arbitrary sites: next_selector (find next content page, if not using chapter selector); content_title_selector (pull a chapter title from the content); content_text_selector (pull specific text from the content element). `content_selector` will now fetch all content elements on the page, each as a Chapter, not just the first one that matches.	2018-10-01 11:18:39 -05:00
Alex Raubach	ff568eef10	Allow arbitrary sites to include a cover url	2018-09-02 22:08:36 -04:00
Alex Raubach	1bfc9b75f7	Remove unneeded whitespace	2018-08-28 23:24:59 -04:00
Alex Raubach	2019616505	Check that the chapter has content before parsing. Trying to select the first element in line 87 will throw a list index out of range error if there is no content matching the selector.	2018-08-28 21:59:16 -04:00
David Lynch	6d52c72c99	Use logging instead of print. Fixes #10	2017-11-04 00:09:09 -05:00
David Lynch	257ab69394	Arbitrary handler: canonicalize URLs	2017-10-22 17:31:10 -05:00
David Lynch	dc0d2162fb	Arbitrary handler had misplaced url arg	2017-10-22 17:06:40 -05:00
Will Oursler	5bd07a5b90	Splits out ebook generation logic into a separate module, in anticipation of maybe supporting multiple output formats.	2017-10-12 09:49:32 -04:00
David Lynch	d60c21cae3	Remove TODO from arbitrary. `529b85c7` implemented this, so it's good.	2017-10-06 14:08:18 -05:00
David Lynch	529b85c7a6	Adjust Arbitrary so it can handle non-chapter works	2017-04-29 20:59:04 -05:00
David Lynch	17664125f3	Changed mind for arbitrary: JSON definitions	2017-04-24 22:02:16 -05:00
David Lynch	7171d2c9ea	Add an arbitrary-site handler	2017-04-24 01:09:43 -05:00

24 commits