David Lynch
28cc1fbcc7
Arbitrary should store contents as a string, not a bs4 Tag
...
It coincidentally works by being string-like for previous uses, but it's
not string-like enough for the new unicode stuff.
Fixes #54
2021-02-05 19:58:47 -06:00
IdanDor
6d7b5ffcf0
Removed trailing whitespace.
2021-01-23 13:30:03 +02:00
IdanDor
1afac50437
Made arbitrary sites no longer leak memory and fixed worm epub.
...
Each `Chapter` object held a reference to the entire page tree, so the program's RAM usage grew substantially.
Switched Worm to use next_selector so that chapters are correctly ordered, E.2 is not skipped, and the download does not crash on the `?share=twitter` URL that was matched before.
Fixed Worm titles.
2021-01-23 12:12:48 +02:00
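A minimal sketch of the memory issue described in the commit above, assuming a stand-in `PageTree` class rather than the project's real parsed-page type. `PageTree`, `make_chapter`, and the `Chapter` fields are hypothetical names chosen for illustration. The point is only that a chapter which stores a plain string does not keep the whole tree alive.

```python
import gc
import weakref
from dataclasses import dataclass


class PageTree:
    """Hypothetical stand-in for a large parsed page."""

    def __init__(self, html):
        self.html = html

    def extract(self):
        # Return the chapter markup as a plain string.
        return self.html


@dataclass
class Chapter:
    title: str
    contents: str  # plain string, no back-reference to the tree


def make_chapter(tree, title):
    # Copy out only the text that is needed; do not store `tree` itself.
    return Chapter(title=title, contents=str(tree.extract()))


tree = PageTree("<p>text</p>")
tree_ref = weakref.ref(tree)
chapter = make_chapter(tree, "1.1")

# Drop the only strong reference to the tree and check it was released.
del tree
gc.collect()
tree_freed = tree_ref() is None
```

If `Chapter` stored the tree (or any node that points back to it) instead of a string, `tree_freed` would stay false for as long as the chapter existed.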
David Lynch
c208e33752
Arbitrary: strip all namespaced elements
...
This is `fb:like` and similar, which break some epub readers.
Refs: #41, #43
2020-09-08 23:04:47 -05:00
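A rough illustration of what stripping namespaced elements means for markup like `fb:like`. This is a simplified regex stand-in, not the project's method, which presumably operates on a parsed tree. `strip_namespaced` is a hypothetical helper, and a regex like this will not cope with nested elements of the same name.

```python
import re

# Paired namespaced element, e.g. <fb:like ...>...</fb:like>
PAIRED = re.compile(
    r"<([A-Za-z][\w-]*:[\w-]+)\b[^>]*>.*?</\1\s*>", re.DOTALL
)
# Leftover self-closing or unpaired opening tag, e.g. <g:plusone/>
STRAY = re.compile(r"<[A-Za-z][\w-]*:[\w-]+\b[^>]*/?>")


def strip_namespaced(html):
    """Remove elements whose tag name contains a namespace prefix."""
    html = PAIRED.sub("", html)
    return STRAY.sub("", html)


sample = '<p>Hi</p><fb:like href="x"></fb:like><g:plusone/><p>Bye</p>'
cleaned = strip_namespaced(sample)
```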
David Lynch
6fbdc8843d
Make arbitrary site chapter-title selectors more resilient
2020-04-29 17:55:20 -05:00
David Lynch
532a7c6682
Fix typo of title_element in arbitrary
...
Fixes #25
2019-07-30 09:37:03 -05:00
David Lynch
2bd5d77715
Helper for URL-joining
2019-05-29 01:55:35 -05:00
David Lynch
02bd6ae0c6
Merge pull request #16 from AlexRaubach/covers
...
Download cover art from RR and arbitrary sites
2018-10-01 12:18:39 -05:00
David Lynch
929284b67d
New features for arbitrary sites
...
* next_selector: find next content page, if not using chapter selector
* content_title_selector: pull a chapter title from the content
* content_text_selector: pull specific text from the content element
`content_selector` will now fetch all content elements on the page, each
as a Chapter, not just the first one that matches.
2018-10-01 11:18:39 -05:00
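The behaviour described in the commit above (follow a "next" link from page to page, and treat every matching content element as its own chapter) can be sketched as follows. This assumes an in-memory `PAGES` table in place of real HTTP fetching and selector matching; `PAGES`, `walk`, and `fetch` are hypothetical names, not the project's API.

```python
from urllib.parse import urljoin

# Hypothetical site: url -> (content blocks on the page, next href or None)
PAGES = {
    "https://example.com/a": (["First"], "b"),
    "https://example.com/b": (["Second", "Third"], None),
}


def walk(start_url, fetch):
    """Collect (url, content) pairs by following next links in order."""
    chapters = []
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)  # guard against next links that loop back
        contents, next_href = fetch(url)
        # Every content block on the page becomes its own chapter.
        for block in contents:
            chapters.append((url, block))
        # Resolve a relative next href against the current page.
        url = urljoin(url, next_href) if next_href else None
    return chapters


chapters = walk("https://example.com/a", PAGES.__getitem__)
```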
Alex Raubach
ff568eef10
Allow arbitrary sites to include a cover url
2018-09-02 22:08:36 -04:00
Alex Raubach
1bfc9b75f7
Remove unneeded whitespace
2018-08-28 23:24:59 -04:00
Alex Raubach
2019616505
Check that the chapter has content before parsing
...
Trying to select the first element in line 87 will throw a list index out of range error if there is no content matching the selector.
2018-08-28 21:59:16 -04:00
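The guard described in the commit above amounts to checking the result list before indexing it. A minimal sketch, with `first_match` as a hypothetical helper and plain lists standing in for selector results:

```python
def first_match(matches):
    """Return the first selector match, or None if nothing matched."""
    if not matches:
        # Indexing an empty list here would raise IndexError.
        return None
    return matches[0]


empty_result = first_match([])
found_result = first_match(["<div>chapter</div>", "<div>other</div>"])
```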
David Lynch
6d52c72c99
Use logging instead of print
...
Fixes #10
2017-11-04 00:09:09 -05:00
David Lynch
257ab69394
Arbitrary handler: canonicalize URLs
2017-10-22 17:31:10 -05:00
David Lynch
dc0d2162fb
Arbitrary handler had misplaced url arg
2017-10-22 17:06:40 -05:00
Will Oursler
5bd07a5b90
Splits out ebook generation logic into a separate module, in anticipation of maybe supporting multiple output formats.
2017-10-12 09:49:32 -04:00
David Lynch
d60c21cae3
Remove TODO from arbitrary
...
529b85c7 implemented this, so it's good.
2017-10-06 14:08:18 -05:00
David Lynch
529b85c7a6
Adjust Arbitrary so it can handle non-chapter works
2017-04-29 20:59:04 -05:00
David Lynch
17664125f3
Changed mind for arbitrary: JSON definitions
2017-04-24 22:02:16 -05:00
David Lynch
7171d2c9ea
Add an arbitrary-site handler
2017-04-24 01:09:43 -05:00