David Lynch
6895a0eb61
AO3 single-chapter story bugs
2023-03-31 23:51:27 -05:00
David Lynch
fe5ca86d87
Royalroad's markup has changed slightly, fix so title and summary work
2023-03-17 16:06:52 -05:00
David Lynch
d81eefa7f3
AO3: use new form helper so this shouldn't break again if fields change
2022-05-13 11:04:25 -05:00
David Lynch
f57db3e1a8
Helper for extracting form data from a soup
2022-05-13 11:04:05 -05:00
David Lynch
e9f704716a
Xenforo: change some of the style-removal
...
It was causing some formatting issues, particularly on Worm fics which
did forum-style sections. (Also, indented text done via margin-left on
divs, which entirely removed the div and ran lines together.)
2022-04-27 11:07:16 -05:00
David Lynch
56bc2b941c
AO3: utf8 field no longer in login form
2022-04-16 18:26:26 -05:00
David Lynch
08abe54e79
Switch out use of :=, forgot I wasn't requiring 3.8 yet
2022-03-06 10:46:13 -06:00
David Lynch
172877410b
Xenforo: if fetching a specific threadmark category, add it to the title
...
Unless it's 1, since that's always "threadmarks" and the main story.
Refs #79
2022-03-06 10:42:39 -06:00
David Lynch
29589a0886
RoyalRoad: don't error when covers are relative URLs
...
Only happens when the work has no set cover, because it gets a /dist/
URL rather than a CDN URL.
Fixes #77
2022-02-22 12:19:58 -06:00
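
A minimal sketch of the kind of fix described above, assuming the cover `src` is resolved against the page URL before fetching. The function name and the example URLs are hypothetical, not taken from the repository:

```python
from urllib.parse import urljoin


def absolute_cover_url(page_url: str, cover_src: str) -> str:
    """Resolve a cover image src against the page it was found on.

    Relative paths (e.g. a /dist/ placeholder) are joined onto the page's
    origin; URLs that are already absolute (e.g. a CDN URL) pass through
    unchanged.
    """
    return urljoin(page_url, cover_src)


# Hypothetical URLs, for illustration only.
page = "https://www.royalroad.com/fiction/1/example"
print(absolute_cover_url(page, "/dist/img/nocover.png"))
print(absolute_cover_url(page, "https://cdn.example.com/covers/1.jpg"))
```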
David Lynch
f204dcd928
Add a class to generated spoiler divs
2022-02-13 11:44:36 -06:00
David Lynch
697e4c0bf9
Royalroad: don't crash on malformed spoiler tags
...
Fixes #74
2022-02-03 11:08:40 -06:00
David Lynch
dc9c9dbe57
Pull summary and tags for royalroad
2021-11-07 13:16:59 -06:00
David Lynch
4242aa6f63
Strip colors on all sites, not just xenforo
2021-11-07 11:16:26 -06:00
David Lynch
f05bfb51ef
AO3: work if www is present in the URL
2021-08-10 17:15:18 -05:00
David Lynch
f1bd28e942
Fanfiction.net: experiment with falling back to the wayback machine
2021-07-19 15:17:39 -05:00
David Lynch
d1caf85883
Extract tags when present
...
Supported currently on Xenforo and AO3
2021-05-01 16:35:49 -05:00
David Lynch
37cb0332b7
AO3: fix issue that could occur if the work had gaps in chapter numbers
2021-04-05 19:55:46 -05:00
David Lynch
77cc334bcf
Merge pull request #60 from ClaasJG/master
...
Stable seed generation for Sections
2021-03-27 19:16:11 -05:00
ClaasJG
5b39c73904
Add stable Section id based on URL
...
Remove Chapter id
2021-03-28 00:41:03 +01:00
David Lynch
bf315d06fe
Grab the much more-pythonic CF email decode from #37
2021-03-27 11:20:01 -05:00
David Lynch
f25befc237
Decode cloudflare email address protection
...
Makes a generic _clean function on Site that can be called. Will
probably want to migrate some other generic bits into there after
analysis of what's *really* generic.
2021-03-27 10:46:39 -05:00
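
For reference, Cloudflare's email protection stores the address as a hex string whose first byte is an XOR key for the remaining bytes. A minimal sketch of the decode step follows; the function name is hypothetical and this is not necessarily how the repository's `_clean` implements it:

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated email (the hex in a data-cfemail attribute).

    The first byte is the XOR key; each following byte XORed with that key
    yields one byte of the original address.
    """
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


# Hand-built sample: key 0x42 followed by "a@b.c" XORed with 0x42.
print(decode_cfemail("422302206c21"))  # a@b.c
```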
David Lynch
dfa298dd3b
Better error message for restricted AO3 stories
2021-03-21 23:17:29 -05:00
claasjg
d4f3986515
Detect URL loop with next selector
2021-03-19 14:49:38 +01:00
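
A minimal sketch of loop detection while following "next" links, assuming a set of already-visited URLs is enough to detect the loop. `follow_next_links` and `get_next` are hypothetical names; the real handler would derive the next URL from the page via its next selector:

```python
from typing import Callable, List, Optional


def follow_next_links(
    start_url: str, get_next: Callable[[str], Optional[str]]
) -> List[str]:
    """Walk a chain of next-page URLs, stopping at the end or on a repeat."""
    seen = set()
    order = []
    url: Optional[str] = start_url
    # Stop when there is no next link, or when the link points back to a
    # page that was already visited (a URL loop).
    while url and url not in seen:
        seen.add(url)
        order.append(url)
        url = get_next(url)
    return order


# Toy link graph where the last page links back to the first.
links = {"a": "b", "b": "c", "c": "a"}
print(follow_next_links("a", links.get))  # ['a', 'b', 'c']
```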
David Lynch
ce998c84c3
Extract spoilers to footnotes on royalroad
2021-03-07 11:28:49 -06:00
David Lynch
d50f23d07b
Special exception for hitting a cloudflare captcha page
...
Fanfiction.net is currently doing this, so let's at least acknowledge it
Refs #53
2021-02-12 16:02:55 -06:00
David Lynch
28cc1fbcc7
Arbitrary should store contents as a string, not a bs4 Tag
...
It coincidentally works by being string-like for previous uses, but it's
not string-like enough for the new unicode stuff.
Fixes #54
2021-02-05 19:58:47 -06:00
David Lynch
ae1b77da2f
Wattpad: use API instead
...
Their on-page HTML sometimes uses JS to load parts of the story
2021-01-26 13:11:56 -06:00
David Lynch
23c7a1496c
Quick take on wattpad
2021-01-26 01:56:41 -06:00
IdanDor
6d7b5ffcf0
Removed trailing whitespace.
2021-01-23 13:30:03 +02:00
IdanDor
1afac50437
Made arbitrary sites no longer leak memory and fixed worm epub.
...
Each `Chapter` object held a reference to the entire page tree, so the program's RAM usage grew considerably.
Switched Worm to use next_selector so the chapters are correctly ordered, E.2 is not skipped, and the download does not crash on the `?share=twitter` URL that was matched before.
Fixed Worm titles.
2021-01-23 12:12:48 +02:00
David Lynch
c208e33752
Arbitrary: strip all namespaced elements
...
This is `fb:like` and similar, which break some epub readers.
Refs: #41, #43
2020-09-08 23:04:47 -05:00
David Lynch
988368bb66
Better xenforo blockquote chrome removal
2020-08-18 13:21:01 -05:00
David Lynch
2103f37cfb
AO3: fallback for single-chapter works
2020-05-04 00:31:19 -05:00
David Lynch
6fbdc8843d
Make arbitrary site chapter-title selectors more resilient
2020-04-29 17:55:20 -05:00
David Lynch
6631095726
Fiction.live: niche URLs
...
* occasional stories with "Sci-fi" in the URL instead of "stories"
* rare cases of `-` in the work id
Fixes #31
2019-11-14 14:45:19 -06:00
David Lynch
a856f9d0f8
Fiction.live: account for a weird rare bug/possibility in votes
...
Also, add a bunch of error handling / logging to the section-parsing to
avoid this in the future.
Fixes #30
2019-11-07 09:34:39 -06:00
David Lynch
f89f5163b5
Fiction.live: Fix choices array check
...
Fixes #29
2019-11-05 15:02:09 -06:00
David Lynch
4861ffbd7e
Fiction.live can have votes for absent choices
...
Fixes #28.
2019-10-29 08:17:01 -05:00
David Lynch
dc10e4cf17
FFN: less-destructive attribute clearing
2019-10-17 22:29:01 -05:00
David Lynch
7208cfdaaf
Minor readability improvement: use f-strings
2019-10-15 11:14:27 -05:00
David Lynch
c584988994
Update dependencies
2019-10-14 00:40:34 -05:00
David Lynch
9d0b5f1d3a
Merge pull request #26 from thegrinner/no-vote-fictionlive
...
Fix FictionLive download failure on missing vote node
2019-10-14 00:07:34 -05:00
David Lynch
d782928e0e
Spacebattles is now on XenForo2
2019-10-12 10:51:22 -05:00
thegrinner
4e4f16e7cc
Appease flake8
2019-10-03 17:48:45 -04:00
thegrinner
d0402daa7b
Add handling for votes that don't have a votes kvp
2019-10-03 17:36:43 -04:00
David Lynch
5e034a7d65
Xenforo: let non-first-category threadmarks work
...
Currently this just requires passing a link to the reader view of a particular
category. In the future I might want to support more variants on this -- a
flag to pull down all the threadmark categories, for instance.
2019-08-06 17:29:53 -05:00
David Lynch
532a7c6682
Fix typo of title_element in arbitrary
...
Fixes #25
2019-07-30 09:37:03 -05:00
David Lynch
f002064352
Xenforo2 title labels
2019-07-24 23:29:12 -05:00
David Lynch
a148fa8c43
Flake8 errors
2019-07-13 13:17:54 -05:00
David Lynch
3443304ab1
XenForo: handle SV's XenForo2 changes
2019-07-13 11:42:22 -05:00
David Lynch
b1b51bdc8f
Xenforo: clean out title prefixes
2019-06-17 16:13:09 -05:00
David Lynch
c8f5b3f8d8
XenForo should use reader-view if available
...
Much like 40b4856 greatly sped up AO3, this greatly speeds up XenForo
2019-05-29 01:56:39 -05:00
David Lynch
2bd5d77715
Helper for URL-joining
2019-05-29 01:55:35 -05:00
David Lynch
66576048da
Fix flake8 errors
2019-05-25 20:03:17 -05:00
David Lynch
40b4856a14
Optimize AO3: use full_work URL
2019-05-25 15:31:39 -05:00
David Lynch
f64fce0286
AO3: login form changed
2018-12-29 21:00:02 -06:00
David Lynch
0a81069d24
Slightly more verbose logging of load failures
2018-12-29 20:46:55 -06:00
David Lynch
e78ffdb85b
Method to get a site-key for config
...
Means that things like XenForoIndex and AO3Series don't require separate
config entries.
2018-10-11 15:42:59 -05:00
David Lynch
cdcd110c50
AO3: change title detection for logged-in only
2018-10-11 15:42:36 -05:00
David Lynch
0c771ee767
Merge pull request #17 from AlexRaubach/rr_notes
...
Place post-chapter RR author notes at the end of the chapter
2018-10-01 12:19:35 -05:00
David Lynch
02bd6ae0c6
Merge pull request #16 from AlexRaubach/covers
...
Download cover art from RR and arbitrary sites
2018-10-01 12:18:39 -05:00
David Lynch
929284b67d
New features for arbitrary sites
...
* next_selector: find next content page, if not using chapter selector
* content_title_selector: pull a chapter title from the content
* content_text_selector: pull specific text from the content element
`content_selector` will now fetch all content elements on the page, each
as a Chapter, not just the first one that matches.
2018-10-01 11:18:39 -05:00
David Lynch
f17b040f64
Fix spacing
2018-09-29 14:32:38 -05:00
David Lynch
b3f4e720d0
Include AlternateHistory as a xenforo site
...
Fixes #18. Well, makes it unnecessary. Strictly, it'd maybe still be useful to
show how to do a XenForo site via `arbitrary`.
2018-09-29 12:00:41 -05:00
Alex Raubach
1ff009f893
Improve Prechapter author note detection
2018-09-27 13:48:33 -04:00
Alex Raubach
a9dfdb5dd3
Add a null check to RR author note placement
2018-09-17 20:22:13 -04:00
Alex Raubach
cf62faf5dd
Support two RR author notes in one chapter
2018-09-17 20:03:01 -04:00
Alex Raubach
94900cb126
Simplify Royal Road chapter scraper
2018-09-17 00:05:47 -04:00
Alex Raubach
d71184ae8b
Place post-chapter RR author notes at the end of the chapter
2018-09-16 17:36:07 -04:00
David Lynch
18c9d68617
Xenforo: cope with ThreadmarksPro's fetchers
2018-09-15 00:18:07 -05:00
Alex Raubach
ff568eef10
Allow arbitrary sites to include a cover url
2018-09-02 22:08:36 -04:00
Alex Raubach
571e262735
Find RR cover img src and assign to cover_url
2018-09-02 22:08:36 -04:00
Alex Raubach
fe76b5427b
Add cover_url attribute
2018-09-02 22:08:36 -04:00
David Lynch
8273ca1a77
Fix spacing
2018-08-29 23:39:39 -05:00
David Lynch
17cd0ea4e2
Royalroad domain name fiddliness
2018-08-29 23:07:06 -05:00
David Lynch
6c8ac39d64
fromtimestamp still needed
...
My bad.
2018-08-29 23:04:17 -05:00
David Lynch
a151f02c84
Fix spacing
...
...I'm bad at the web interface.
2018-08-29 23:01:43 -05:00
David Lynch
69c9c21f47
Avoid double-fetching the chapter contents
...
Doesn't matter hugely if caching is enabled, but it's still suboptimal.
2018-08-29 23:00:45 -05:00
random human
23b76d2aac
Fix royalroadl.com chapter dates
...
Since the timestamp provided with the chapter list is approximate, fetch
the actual chapter in order to get unixtime.
2018-08-30 04:03:29 +05:30
Alex Raubach
1bfc9b75f7
Remove unneeded whitespace
2018-08-28 23:24:59 -04:00
Alex Raubach
2019616505
Check that the chapter has content before parsing
...
Trying to select the first element in line 87 will throw a list index out of range error if there is no content matching the selector.
2018-08-28 21:59:16 -04:00
David Lynch
fb8d6cf0d6
Merge pull request #9 from Zomega/clickify
...
Switch from using raw argparser to using click.
2018-08-17 21:33:23 -05:00
David Lynch
499530993c
Royalroad seems to need www now
2018-07-11 21:24:22 -05:00
Will Oursler
d1842e2bf1
Adds a system for site options to be included as click.options on commands.
2018-04-14 12:56:31 -04:00
Will Oursler
ecebf1de58
Merge branch 'master' into clickify
2018-04-13 17:52:37 -04:00
David Lynch
868ef4b157
Handle mobile links for FFN
2018-03-30 15:18:57 -05:00
David Lynch
7d2c1647e2
Safer check on retry-after
2018-02-28 20:54:37 -06:00
David Lynch
2042f813d0
Allow AO3 logins for member-only stories
2018-01-19 14:15:43 -06:00
David Lynch
f8d494283c
Proper URL normalization for AO3 chapters
2018-01-19 13:19:45 -06:00
David Lynch
e9dab9ab7d
Fix linting on royalroad
2017-11-17 22:57:54 -06:00
David Lynch
e099f47e66
Support: RoyalRoad
2017-11-17 21:37:13 -06:00
David Lynch
7bb6da382c
Oh hey, another missing Section URL
2017-11-04 00:30:59 -05:00
David Lynch
6d52c72c99
Use logging instead of print
...
Fixes #10
2017-11-04 00:09:09 -05:00
David Lynch
43599aceb5
Merge branch 'master' into clickify
2017-11-03 15:21:44 -05:00
David Lynch
f1ac7c8bda
Retry failed site-requests
2017-10-31 00:27:54 -05:00
David Lynch
27b677a444
Fix no-threadmarks autodetect
2017-10-29 19:50:19 -05:00
David Lynch
257ab69394
Arbitrary handler: canonicalize URLs
2017-10-22 17:31:10 -05:00
David Lynch
dc0d2162fb
Arbitrary handler had misplaced url arg
2017-10-22 17:06:40 -05:00
Will Oursler
9b4d2a0998
Adds a more sensible default for options in the Site base class.
2017-10-13 19:43:38 -04:00
Will Oursler
c702337040
Reworks how site-specific options work.
2017-10-13 19:37:13 -04:00