Compare commits

...

418 commits

Author SHA1 Message Date
Jim Miller
a172a7bd2b Bump Test Version 4.57.7 2026-05-07 13:54:08 -05:00
Jim Miller
ab103dce6e browsercache_sqldb: Better share_open and read-only. #1341 2026-05-07 13:54:02 -05:00
Jim Miller
892e9207f0 Bump Test Version 4.57.6 2026-05-06 19:53:58 -05:00
Jim Miller
b4e392fae1 browsercache_sqldb: Use share_open for windows file locking. #1341 2026-05-06 19:53:44 -05:00
Jim Miller
d9525d9726 Bump Test Version 4.57.5 2026-05-06 13:22:28 -05:00
Jim Miller
cb77b12754 Adding browsercache_sqldb for Yet Another caching scheme in Chrome. #1341 2026-05-06 13:22:22 -05:00
Jim Miller
b41a633821 Bump Test Version 4.57.4 2026-05-05 08:11:07 -05:00
Jim Miller
50c8db2992 browsercache_simple: Tweak index file size check. #1341 2026-05-05 08:10:59 -05:00
Jim Miller
ef6dd99bfe Bump Test Version 4.57.3 2026-05-04 15:05:25 -05:00
Jim Miller
59796ff537 Add debug out to Browser Cache cache dir checking #1341 2026-05-04 15:05:13 -05:00
Jim Miller
8ee0a6e898 Bump Test Version 4.57.2 2026-05-03 09:06:51 -05:00
Jim Miller
c53fc362bd Include genre/category in defaults.ini when include_in_X for extragenres/extracategories 2026-05-03 09:06:44 -05:00
Jim Miller
c87cfc1057 adapter_fanficauthorsnet: Domains changed from .nsns to -nsns 2026-05-01 10:10:37 -05:00
Jim Miller
6ee151c90a Bump Release Version 4.57.0 2026-05-01 09:38:27 -05:00
Jim Miller
db01c828a0 Update translations. 2026-05-01 09:37:13 -05:00
Jim Miller
4d03874f06 Fix a bad comment-out 2026-04-29 15:42:59 -05:00
Jim Miller
36f56483e6 Bump Test Version 4.56.10 2026-04-29 13:01:28 -05:00
Jim Miller
18e45a403b PI Anthology: Reuse epub cover if there is one. 2026-04-29 13:01:22 -05:00
Jim Miller
2e25172ba3 adapter_scribblehubcom: Update ajax call for chapters data. Didn't fix #1339 but change noted 3+ years ago 2026-04-29 10:15:26 -05:00
Jim Miller
65e3fd562b Update translations. 2026-04-27 16:53:06 -05:00
Jim Miller
7089bf6689 Bump Test Version 4.56.9 2026-04-21 15:02:05 -05:00
Jim Miller
061dc1333f PI: Correct Series field url link when setanthologyseries 2026-04-21 15:01:58 -05:00
Jim Miller
0a7fb5c090 Bump Test Version 4.56.8 2026-04-19 14:08:29 -05:00
Jim Miller
cf02f729ae adapter_literotica: Fix for numeric tag value from json. #1336 2026-04-19 14:08:21 -05:00
Jim Miller
730c4f77f9 Bump Test Version 4.56.7 2026-04-19 09:33:07 -05:00
Jim Miller
c02da29cbd Added strings for translation 2026-04-19 09:33:00 -05:00
Jim Miller
b87d796221 PI: Add Fix Series Case setting for #1338 2026-04-19 09:30:15 -05:00
Jim Miller
436370fe5b Done profiling for now 2026-04-19 09:03:10 -05:00
Jim Miller
ac77f31bc2 Move NotGoingToDownload to exceptions.py #1337 2026-04-19 09:02:32 -05:00
Jim Miller
16f2c74e4b Bump Test Version 4.56.6 2026-04-18 13:47:51 -05:00
praschke
af5c2aa0bc adapter_kakuyomujp: site update 2026-04-18 13:47:14 -05:00
Jim Miller
31dec5b62d Bump Test Version 4.56.5 2026-04-18 12:58:56 -05:00
Jim Miller
97d37fcfc1 fix_relative_text_links: Allow hrefs to name anchors as well as id. 2026-04-18 12:58:46 -05:00
Jim Miller
c730aa2f68 Bump Test Version 4.56.4 2026-04-17 10:22:20 -05:00
Jim Miller
4e2e359dee PI Anthologies: Only put status in tags if in include_subject_tags. Closes #1332 2026-04-17 10:22:13 -05:00
Jim Miller
bb96049934 Remove some debug 2026-04-16 14:27:48 -05:00
Jim Miller
84965ef25f Bump Test Version 4.56.3 2026-04-12 21:20:09 -05:00
Jim Miller
348d129a1e adapter_ficwadcom: Detect missing username as well as failed login #1330 2026-04-12 21:05:42 -05:00
Jim Miller
4794e9bc51 Bump Test Version 4.56.2 2026-04-10 21:56:43 -05:00
Jim Miller
d46dc76ae1 Somewhat better consolidated perf profiling 2026-04-10 21:56:43 -05:00
Jim Miller
08bae8d9be Imperfect, but working perf profiling 2026-04-10 16:49:17 -05:00
Jim Miller
405c37aeb5 Remove some dead code. 2026-04-10 16:43:49 -05:00
Jim Miller
270e01c3c7 Cache config values for performance improvement. 2026-04-10 16:24:37 -05:00
Jim Miller
12d57f5950 Bump Test Version 4.56.1 2026-04-06 12:07:14 -05:00
Jim Miller
562b3a4ecd Unnew Perf Improvement w/profiling 2026-04-06 12:07:05 -05:00
Jim Miller
e69045fd98 Bump Release Version 4.56.0 2026-04-02 10:03:42 -05:00
Jim Miller
747bde3394 Update (commented out) profiling code. 2026-04-02 10:02:58 -05:00
Jim Miller
aa00c7ae03 Bump Test Version 4.55.4 2026-03-27 11:54:50 -05:00
Jim Miller
0539f818f3 Add top menu items for Add/Edit Reject URLs. 2026-03-27 11:54:44 -05:00
Jim Miller
41a6f56f44 Remove fanficfare_macmenuhack. 2026-03-27 11:43:53 -05:00
Jim Miller
e3832245e6 Add Reject URLs: Accept story URLs drag/drop & paste like Add Stories by URL 2026-03-27 10:52:30 -05:00
Jim Miller
909b64c83c Remove some image processing debug output 2026-03-27 10:51:29 -05:00
Jim Miller
732f5e2571 Bump Test Version 4.55.3 2026-03-19 13:03:11 -05:00
Jim Miller
d9dd04396e Epub Update: Don't cache cover image with others, trips dedup. 2026-03-19 13:03:03 -05:00
Jim Miller
36e2183d45 Bump Test Version 4.55.2 2026-03-12 15:13:01 -05:00
Jim Miller
040b7205b8 adapter_literotica: Fix for site change (#1318) 2026-03-12 15:11:26 -05:00
Jim Miller
d8ed180eb1 Bump Test Version 4.55.1 2026-03-09 13:04:56 -05:00
Jim Miller
2a6c1e74db Make seriesUrl mutable again. 2026-03-09 13:04:50 -05:00
Jim Miller
b7c8c96153 Put download list at start of BG job too 2026-03-09 13:04:24 -05:00
Jim Miller
a16096592c Bump Release Version 4.55.0 2026-03-01 09:25:11 -06:00
Jim Miller
bb34eecc7c Remove a line of unused code. 2026-02-23 13:08:57 -06:00
Jim Miller
ceed7ef1a8 Bump Test Version 4.54.5 2026-02-10 08:45:34 -06:00
Jim Miller
1d2a887c2d Epub Update: Skip missing chapter, image and css files instead of failing. 2026-02-10 08:45:20 -06:00
Jim Miller
a3f3302312 Plugin only: In Skip mode, don't do initial metadata fetch if already matched in library. #1309 2026-02-10 08:30:02 -06:00
Jim Miller
ecf005b145 Bump Test Version 4.54.4 2026-02-05 16:09:00 -06:00
Jim Miller
3bd074fa2c Additional checks for svg images to reject--Calibre only. Related to #1298 2026-02-05 16:08:54 -06:00
Jim Miller
0fd95daa8e Bump Test Version 4.54.3 2026-02-05 13:46:42 -06:00
Jim Miller
1b57e49d98 Ignore CSS url() when ttf/otf/woff/woff2 font files 2026-02-05 13:46:24 -06:00
Jim Miller
db0d39c9cd Bump Test Version 4.54.2 2026-02-02 13:12:56 -06:00
Jim Miller
cbde66cf41 adapter_fimfictionnet/adapter_royalroadcom: Better handling of cover image size fall back #1306 2026-02-02 13:12:42 -06:00
Jim Miller
17331e9eb3 Bump Test Version 4.54.1 2026-02-01 13:51:23 -06:00
Jim Miller
9b96c151a5 adapter_adultfanfictionorg: Fixes for site changes #1305 2026-02-01 13:51:22 -06:00
Jim Miller
1b65a30798 Making some metadata entries immutable 2026-02-01 13:51:22 -06:00
Jim Miller
c9a47877f7 Allow for language getting changed by replace_metadata not breaking langcode 2026-02-01 09:15:31 -06:00
Jim Miller
bdc77ad0f6 Remove Site: swi.org.ru No DNS for site. 2026-02-01 09:15:31 -06:00
Jim Miller
719971c76c Don't set numChapters--it's done automatically. 2026-02-01 09:15:31 -06:00
Jim Miller
c74dba472a Fixes for mutable metadata entries used in code 2026-02-01 09:15:31 -06:00
Jim Miller
c1fb7f0fc5 Refactor metadata entry and settings name code a bit 2026-02-01 09:15:31 -06:00
Jim Miller
94c932cd2f Bump Release Version 4.54.0 2026-02-01 09:04:34 -06:00
Jim Miller
27fb765c0d Update translations. 2026-02-01 09:04:08 -06:00
Jim Miller
06ce46f64a Bump Test Version 4.53.15 2026-01-30 08:52:46 -06:00
Jim Miller
c04d85fa97 Plugin BG settings: Remove 'old' vs 'new' BG handling verbiage 2026-01-29 13:16:56 -06:00
Jim Miller
b6cdc30db5 Bump Test Version 4.53.14 2026-01-29 11:23:03 -06:00
Jim Miller
9bbb5e8b01 adapter_ficbooknet: Change how replace_text_formatting converts to text. 2026-01-29 11:22:40 -06:00
Jim Miller
18ce6e6fba BrowserCache: Add comment about py2 and gzip.decompress 2026-01-29 11:20:42 -06:00
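For context on the py2/`gzip.decompress` caveat in the commit above: `gzip.decompress()` exists only on Python 3. A minimal portable sketch (an assumption for illustration, not FanFicFare's actual code) decodes a gzip-framed stream with `zlib` on both Python 2 and 3:

```python
import zlib

# Sketch only (not FanFicFare's actual code): zlib can decode a
# gzip-framed stream when wbits is offset by 16, which tells it to
# expect a gzip header and trailer instead of a raw zlib stream.
def gunzip(data):
    return zlib.decompress(data, 16 + zlib.MAX_WBITS)
```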
Jim Miller
507910f5da Don't give format section warnings for fix_excess_space 2026-01-29 09:28:45 -06:00
Jim Miller
ccf7801a89 Bump Test Version 4.53.13 2026-01-27 11:24:25 -06:00
Jim Miller
9a52a10626 adapter_ficbooknet: Add replace_text_formatting option to replace CSS paragraphing with tags, for txt output. 2026-01-27 11:24:15 -06:00
Jim Miller
6963153aac adapter_storiesonlinenet: Site changed, get series number from series page now. 2026-01-27 10:10:52 -06:00
Jim Miller
ee357cd5b4 Bump Test Version 4.53.12 2026-01-24 09:32:26 -06:00
Jim Miller
b84e3d2858 adapter_royalroadcom: Fix login failure reporting #1302 2026-01-24 09:32:09 -06:00
Jim Miller
9377fc6671 Bump Test Version 4.53.11 2026-01-22 13:33:43 -06:00
Jim Miller
aaa0fa613a Image Handling: Fix tidy cover caching when no cover. 2026-01-22 13:33:36 -06:00
Jim Miller
eac5acfbfa Bump Test Version 4.53.10 2026-01-22 12:13:37 -06:00
Jim Miller
8dca1ef343 Image Handling: Remove unused images properly with dedup_img_files 2026-01-22 12:11:45 -06:00
Jim Miller
28e8f61cf8 Image Handling: Tidy cover caching 2026-01-22 11:29:20 -06:00
Jim Miller
78abf476ea Image Handling: Rename dedup'ed images on first pass, too. 2026-01-22 11:20:12 -06:00
Jim Miller
2b1f9446dd Bump Test Version 4.53.9 2026-01-20 10:09:19 -06:00
Jim Miller
9815736b4e Fix dedup_img_files - changes <img longdesc= to deduped URL. 2026-01-20 10:09:19 -06:00
Jim Miller
3f54cce9a1 Don't record longdesc on img fails. 2026-01-20 10:09:19 -06:00
Jim Miller
223138b8e5 Image Handling: Cache fails w/in download (but not between), keep full src URL with failedtodownload marker 2026-01-20 10:09:12 -06:00
Jim Miller
4aa47c8bab Bump Test Version 4.53.8 2026-01-15 18:06:47 -06:00
Jim Miller
a97a85f357 epub update: Read all images for oldimgs after reading chapters to keep longdesc=origurl 2026-01-15 18:03:54 -06:00
Jim Miller
ffc3696d84 Bump Test Version 4.53.7 2026-01-15 15:14:38 -06:00
Jim Miller
86c4e1974b Skip CSS url() handling on empty tags by content instead of stripHTML 2026-01-15 15:14:23 -06:00
Jim Miller
b6fd7c2ca4 Fix additional_images 2026-01-15 13:23:01 -06:00
Jim Miller
326300b40e Correct comment. 2026-01-15 13:22:40 -06:00
Jim Miller
282bafe514 Bump Test Version 4.53.6 2026-01-15 12:20:53 -06:00
Jim Miller
061a8feccf CSS url() processing only when include_images:true 2026-01-15 12:20:46 -06:00
Jim Miller
26c9b6d2ce Bump Test Version 4.53.5 2026-01-15 09:10:13 -06:00
Jim Miller
ed02d61953 epubutils: Load all images, not just referenced. uuid5 will still allow use. 2026-01-15 09:10:07 -06:00
Jim Miller
b58d54b8ea Bump Test Version 4.53.4 2026-01-14 16:53:53 -06:00
Jim Miller
1bc3ffc269 base_xenforo2forum_adapter: Add ytimg.com to default cover_exclusion_regexp 2026-01-14 16:53:46 -06:00
Jim Miller
cbd295f911 Bump Test Version 4.53.3 2026-01-14 13:55:33 -06:00
Jim Miller
35653f533f base_xenforo2forum_adapter: Add link_embedded_media option 2026-01-14 13:55:23 -06:00
Jim Miller
ea7afea8c2 Fix XF sites lists in configurable.py 2026-01-14 13:35:51 -06:00
Jim Miller
384a2fe8b7 CSS url() style attr--don't do when tag is empty. 2026-01-14 13:18:51 -06:00
Jim Miller
b278cac620 Bump Test Version 4.53.2 2026-01-13 16:45:35 -06:00
Jim Miller
e23de49fb5 uuid5 converts to bytes but gets unhappy about getting bytes to start on
Calibre?
2026-01-13 16:45:00 -06:00
Jim Miller
f64f041546 Adding CSS url() image inclusion, name all images by uuid5 2026-01-13 14:20:11 -06:00
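The uuid5 naming in the commit above can be sketched like this (an illustrative sketch only — the `ffdl-` prefix and `NAMESPACE_URL` are my assumptions, not necessarily the plugin's actual constants). Deriving the filename from the image's source URL makes naming deterministic, so the same URL maps to the same file on every download:

```python
import uuid

# Illustrative sketch: derive a stable image filename from its source URL.
# The "ffdl-" prefix and NAMESPACE_URL are assumptions for illustration.
def image_filename(url, ext):
    # uuid5 is deterministic: same namespace + name always yields the same UUID.
    return "ffdl-%s.%s" % (uuid.uuid5(uuid.NAMESPACE_URL, url), ext)
```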
Jim Miller
1d53c506c9 writer_epub: Pretty print epub meta files 2026-01-13 13:47:56 -06:00
Jim Miller
c8d6ce8004 Add webp as a known image type. 2026-01-13 13:43:57 -06:00
Jim Miller
3f08417c04 writer_epub: Don't dup image ids in content.opf on update with old cover. 2026-01-10 15:16:00 -06:00
Jim Miller
79ebf6a02b Bump Test Version 4.53.1 2026-01-08 10:04:59 -06:00
Jim Miller
41dfb8eab8 base_xenforo2forum_adapter: Fix include_nonauthor_poster: Had left testing conditional 2026-01-08 09:10:40 -06:00
Jim Miller
590b663170 Bump Release Version 4.53.0 2026-01-01 09:18:34 -06:00
Jim Miller
9bb408c8b3 Bump Test Version 4.52.9 2025-12-31 10:01:20 -06:00
Jim Miller
5d6a63a8ca Fix for rare 'false' as INI list corner case 2025-12-31 09:59:53 -06:00
Jim Miller
4078ccfdb1 Bump Test Version 4.52.8 2025-12-29 12:49:57 -06:00
Jim Miller
79c29121c3 writer_epub: Add <spine page-progression-direction=rtl> option as page_progression_direction_rtl 2025-12-29 12:49:40 -06:00
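For context on the `page_progression_direction_rtl` option above: EPUB 3 defines a `page-progression-direction` attribute on the `<spine>` element of the package document (`content.opf`). A minimal fragment (illustrative, not the writer's exact output) looks like:

```xml
<!-- content.opf fragment: tells reading systems pages flow right-to-left -->
<spine page-progression-direction="rtl">
  <itemref idref="chapter1"/>
</spine>
```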
Jim Miller
dea48d9e07 adapter_storiesonlinenet: Improve inject_chapter_title for #1294 2025-12-29 12:25:27 -06:00
Jim Miller
c165196a35 base_xenforo2forum_adapter: Add include_nonauthor_poster option 2025-12-29 12:10:26 -06:00
Jim Miller
c385013db9 adapter_literotica: Remove unused chapter_categories_use_all option, fix other site options for better defaults.ini #1292 2025-12-29 10:48:36 -06:00
Jim Miller
8780aa3105 Bump Test Version 4.52.7 2025-12-26 11:53:04 -06:00
Jim Miller
12c7bfe29c adapter_literotica: Remove unused chapter_categories_use_all option, fix other site options for better defaults.ini #1292 2025-12-26 11:52:51 -06:00
Jim Miller
08d0b8a4e0 Changes for #1292 for normalizing different series URL forms. 2025-12-26 11:45:26 -06:00
Jim Miller
1d401f8dba Bump Test Version 4.52.6 2025-12-20 19:41:22 -06:00
Jim Miller
193bb3ed61 AO3: Site changed 'don't have permission' string 2025-12-20 19:40:54 -06:00
Jim Miller
63fd8cd660 Calc words_added even if not in logpage_entries. 2025-12-14 19:49:45 -06:00
Jim Miller
26a1152390 Bump Test Version 4.52.5 2025-12-11 11:20:26 -06:00
WWeapn
e0907147f7 adapter_literotica: Get series ID from data object (#1290) 2025-12-11 11:20:02 -06:00
Jim Miller
99bba3ff12 Bump Test Version 4.52.4 2025-12-10 09:57:34 -06:00
Jim Miller
3fdb6630fb Remove dup of remove_class_chapter from get_valid_set_options() 2025-12-10 09:57:28 -06:00
dbhmw
0d6b789c9f adapter_literotica: Add chapter descriptions to summary (#1287) 2025-12-10 09:56:15 -06:00
Jim Miller
edaa03ef42 Bump Test Version 4.52.3 2025-12-07 11:06:43 -06:00
Jim Miller
4e17a10792 adapter_literotica: Don't require tags_from_chapters for old eroticatags collection. From #1280 2025-12-07 11:06:37 -06:00
Jim Miller
9fd48e0168 Bump Test Version 4.52.2 2025-12-04 14:04:35 -06:00
Jim Miller
818e990184 adapter_fictionlive: create self.chapter_id_to_api earlier for normalize_chapterurl 2025-12-04 14:04:24 -06:00
Jim Miller
9bb7b54023 Bump Test Version 4.52.1 2025-12-04 09:26:48 -06:00
Jim Miller
af6695e27f adapter_literotica: Fix for one-shot aver_rating #1285 2025-12-04 09:26:32 -06:00
Jim Miller
46293f2d02 Bump Release Version 4.52.0 2025-12-01 08:25:22 -06:00
Jim Miller
7f968ba102 Bump Test Version 4.51.7 2025-11-30 11:02:57 -06:00
Jim Miller
1e5cb9b184 Update translations. 2025-11-30 11:02:30 -06:00
Jim Miller
9627e6e62c Remove site: www.wuxiaworld.xyz - DN parked somewhere questionable for +2 years 2025-11-30 10:58:18 -06:00
Jim Miller
5e644098f9 Remove Site: sinful-dreams.com/whispered/muse - broken for 6+ years even though other two sites on same DN work 2025-11-30 10:37:42 -06:00
Jim Miller
fa3a56d096 adapter_fanfictionsfr: Site SSL requires www now 2025-11-30 10:33:40 -06:00
Jim Miller
ba18216ef8 Bump Test Version 4.51.6 2025-11-28 12:48:32 -06:00
Jim Miller
f207e31b3b Add standard metadata entry marked_new_chapters for epub updated '(new)' chapters count 2025-11-28 12:48:25 -06:00
Jim Miller
0e1ace18e4 Bump Test Version 4.51.5 2025-11-28 09:05:21 -06:00
Jim Miller
b17a632640 adapter_literotica: fix tags_from_chapters for #1283 2025-11-25 10:48:46 -06:00
Jim Miller
485d4631f9 adapter_literotica: Partial fix for #1283, chapters from JSON fetch 2025-11-24 13:20:38 -06:00
Jim Miller
30929bc38e Better handling for no chapters found (#1283) 2025-11-24 12:24:44 -06:00
Jim Miller
ae4311f4dd Bump Test Version 4.51.4 2025-11-19 09:56:07 -06:00
MacaroonRemarkable
3a3c35ea1f Made it possible to use human-readable URLs in addition to api urls for ignore_chapter_url_list 2025-11-19 09:54:57 -06:00
MacaroonRemarkable
19dd89fb4d Fixed missing setting in plugin defaults 2025-11-19 09:54:57 -06:00
MacaroonRemarkable
b247a7465b Added include_appendices config option for fiction.live 2025-11-19 09:54:57 -06:00
albyofdoom
d5c20db681 Implement Alternate Tagging and Date calculation for Literotica 2025-11-19 09:54:40 -06:00
MacaroonRemarkable
a599ff6ad2 Added missing line to plugin-defaults 2025-11-19 09:54:13 -06:00
MacaroonRemarkable
e21c6604a1 Update QQ reader_posts_per_page default 2025-11-19 09:54:13 -06:00
Jim Miller
273c1931f4 Bump Test Version 4.51.3 2025-11-13 08:27:08 -06:00
Jim Miller
fdf29eeade adapter_royalroadcom: New status Inactive 2025-11-13 08:26:54 -06:00
Jim Miller
06e55728d0 Bump Test Version 4.51.2 2025-11-11 20:09:20 -06:00
Jim Miller
0a3ab4bc9d Fix for add_chapter_numbers:toconly and unnew. Closes #1274 2025-11-11 20:08:57 -06:00
Jim Miller
a4a91b373f Bump Test Version 4.51.1 2025-11-10 08:50:28 -06:00
Jim Miller
a68e771026 Don't issue flaresolverr image warning unless include_images:true 2025-11-10 08:50:11 -06:00
Jim Miller
d7c79fcb3b Bump Release Version 4.51.0 2025-11-07 09:53:24 -06:00
Jim Miller
5cc05ed96d Update translations. 2025-11-07 09:33:20 -06:00
Jim Miller
e5b5768f11 Perf improvement for unnew 2025-11-04 12:20:39 -06:00
Jim Miller
6cf2519ef9 Bump Test Version 4.50.5 2025-11-02 20:09:20 -06:00
Jim Miller
f4f98e0877 Don't include default_cover_image with use_old_cover with a different name. 2025-11-02 20:08:16 -06:00
Jim Miller
bb8fb9efa5 writer_epub: More epub3 - prefix & prop cover-image 2025-11-02 18:38:29 -06:00
Jim Miller
be38778d72 Bump Test Version 4.50.4 2025-11-02 09:50:15 -06:00
Jim Miller
55d8efbdcd writer_epub: Only do svg check for epub3 2025-11-02 09:49:51 -06:00
Jim Miller
9df7822e32 Bump Test Version 4.50.3 2025-11-01 14:12:45 -05:00
Jim Miller
69e6a3d2cf writer_epub: Rearrange to detect and flag files containing svg tags for epub3. 2025-11-01 14:12:40 -05:00
Jim Miller
8ea03be5f3 epub3 - Flag the cover *page*--epub3 only flags cover *img* 2025-11-01 13:03:08 -05:00
Jim Miller
75a213beb9 Find and use epub3 cover on update--relies on Calibre's calibre:title-page property. 2025-11-01 12:48:03 -05:00
Jim Miller
ead830c60a adapter_storiesonlinenet: Set authorUrl to site homepage when (Hidden) author for #1272 2025-11-01 09:09:31 -05:00
Brian
20681315e7 Update adapter_storiesonlinenet.py
Removed extraneous parens on conditional 'if' statements
2025-10-31 22:50:56 -07:00
Brian
e2961eaadf adapter_storiesonlinenet.py - tolerate contest stories
Contest stories have author="(Hidden)" which breaks the code to get story info from author's page.
Added checks for this and also checks to verify soup actually found results before trying to blindly use the results.
2025-10-31 15:01:45 -07:00
Jim Miller
7f0d7f70be Bump Test Version 4.50.2 2025-10-29 13:48:06 -05:00
dbhmw
c5264c2147 adapter_ficbooknet: Collect numWords 2025-10-29 13:47:46 -05:00
MacaroonRemarkable
ff402c16ca Preserve original titles for Reader Post blocks from fiction.live (#1269)
* Preserve original titles for Reader Post blocks from fiction.live

* Update adapter_fictionlive.py

Changed for py2 backward compatibility

* Update adapter_fictionlive.py

Switched to concatenation rather than .format

* Update adapter_fictionlive.py

Missing space -_-
2025-10-29 13:47:26 -05:00
Jim Miller
4a9da1c02e Bump Test Version 4.50.1 2025-10-19 22:14:16 -05:00
Jim Miller
c14f1014b8 OTW/AO3: Don't apply series page handling to non-series pages 2025-10-19 22:14:08 -05:00
Jim Miller
74bc398994 Bump Release Version 4.50.0 2025-10-19 19:00:10 -05:00
Jim Miller
6e8e74fc55 Bump Test Version 4.49.6 2025-10-18 09:29:20 -05:00
Jim Miller
68ad4c87aa OTW: Fix for site change breaking logged in detection. Closes #1263 2025-10-18 09:29:14 -05:00
Jim Miller
fe82aed91d Bump Test Version 4.49.5 2025-10-12 09:26:37 -05:00
Jim Miller
7d14bf6e90 base_otw_adapter: Fix for markedforlater site change 2025-10-12 09:26:20 -05:00
Brian
39500a9386 Update adapter_storiesonlinenet.py
Add check for SOL accounts in renewal warning period to verbosely explain to users why their downloads don't work
2025-10-12 09:15:38 -05:00
dbhmw
d5f8891e4f adapter_literotica: Site change, regex outdated. 2025-10-12 09:08:12 -05:00
Jim Miller
edce6949ae Bump Test Version 4.49.4 2025-10-10 11:12:09 -05:00
Jim Miller
bec6fac2ea base_otw_adapter: Use download link for chapter->work conversion #1258 2025-10-10 11:11:58 -05:00
Jim Miller
a9bd19a079 Bump Test Version 4.49.3 2025-10-07 10:35:46 -05:00
Jim Miller
7135ba5892 OTW(AO3): Accept /chapter/999 URLs without /works/999 for #1258 2025-10-07 10:35:38 -05:00
Jim Miller
9ba4c100ca Bump Test Version 4.49.2 2025-10-02 13:38:44 -05:00
Jim Miller
fe565149ba Fix tuple vs grouping vs list, closes #1254 2025-10-02 13:38:26 -05:00
Jim Miller
624f60a5c1 Bump Test Version 4.49.1 2025-10-01 11:55:08 -05:00
Jim Miller
5c79ac0b5c New site: althistory.com (NOT alternatehistory.com) for #1252 2025-10-01 11:55:08 -05:00
Jim Miller
615711f904 Comment some debugs 2025-10-01 11:55:08 -05:00
kilandra
2f77bd9e97 Spiritfanfiction login, closes #1247
Add login functionality to Spiritfanfiction.com
2025-10-01 09:05:09 -05:00
Jim Miller
abdc881812 Bump Release Version 4.49.0 2025-10-01 08:50:15 -05:00
Jim Miller
1ba73bf316 Update translations. 2025-09-30 09:22:34 -05:00
Jim Miller
a359c6b326 adapter_storiesonlinenet: Change page not found error reporting 2025-09-23 10:04:29 -05:00
Jim Miller
ff64356e85 Bump Test Version 4.48.7 2025-09-11 09:09:46 -05:00
Jim Miller
0271b14f6c adapter_literotica: Yet another site change, addresses #1245 2025-09-11 09:09:28 -05:00
Jim Miller
bf845e200f Bump Test Version 4.48.6 2025-09-10 13:47:45 -05:00
Jim Miller
e94ff6e1e8 base_otw: Add collectionsUrl and collectionsHTML metadata--keep in order 2025-09-10 13:47:39 -05:00
Jim Miller
07313d2744 Bump Test Version 4.48.5 2025-09-10 13:40:29 -05:00
Jim Miller
bd2026df7e base_otw: Add collectionsUrl and collectionsHTML metadata 2025-09-10 13:40:23 -05:00
Jim Miller
0fa177ff79 Bump Test Version 4.48.4 2025-09-10 08:40:01 -05:00
Jim Miller
d84c72a215 adapter_literotica: Site change 2025-09-10 08:39:55 -05:00
Jim Miller
c319857da0 Bump Test Version 4.48.3 2025-09-08 21:41:18 -05:00
Jim Miller
df586e9bb7 browsercache_simple: Code for 0 length stream in cache file, only seen in Mac 2025-09-08 21:41:11 -05:00
Jim Miller
354a5708ce Bump Test Version 4.48.2 2025-08-27 11:13:15 -05:00
Jim Miller
096face5d2 Add continue_on_chapter_error_try_limit setting 2025-08-27 11:13:07 -05:00
Jim Miller
02e3bddd5c Bump Test Version 4.48.1 2025-08-22 11:19:06 -05:00
Jim Miller
9dadef1905 adapter_fireflyfansnet: Allow for missing authorId. 2025-08-22 11:19:01 -05:00
Jim Miller
2e8a899d8c Bump Release Version 4.48.0 2025-08-07 11:42:37 -05:00
Jim Miller
623915f623 Update translations. 2025-08-07 11:42:36 -05:00
Jim Miller
57865ca53d scribblehub: slow_down_sleep_time:5 per user recommendation 2025-08-07 11:32:03 -05:00
Jim Miller
e9c4b9ef30 Bump Test Version 4.47.4 2025-08-05 08:41:54 -05:00
Jim Miller
0ad088b663 adapter_ficwadcom: Fix for site change. 2025-08-05 08:41:48 -05:00
Jim Miller
e37a7f72be Tweak a few defaults.ini settings. 2025-08-05 08:41:27 -05:00
Jim Miller
9befe122dd Bump Test Version 4.47.3 2025-07-20 12:17:29 -05:00
Jim Miller
e6d6227ff1 Improve error reporting for open_pages_in_browser_tries_limit #1231 2025-07-20 12:17:24 -05:00
Jim Miller
d854a6efe7 Bump Test Version 4.47.2 2025-07-09 10:50:08 -05:00
Jim Miller
a97af94f8a OTW/AO3 - change to 'need to login' text, accept both old and new and another string. #1229 2025-07-09 10:49:45 -05:00
Jim Miller
e2ea97e99a Bump Test Version 4.47.1 2025-07-05 08:41:20 -05:00
Jim Miller
215f6dd8ff OTW/AO3 - change to 'need to login' text 2025-07-05 08:41:09 -05:00
Jim Miller
687aa9c3ba Bump Release Version 4.47.0 2025-07-03 08:21:33 -05:00
Jim Miller
523cf78640 Update strings for translation. 2025-07-03 08:19:59 -05:00
Jim Miller
90e50964b6 Bump Test Version 4.46.11 2025-06-25 08:42:33 -05:00
Jim Miller
a83823ea13 adapter_ashwindersycophanthexcom: http to https 2025-06-25 08:41:47 -05:00
Jim Miller
727aa6f1bc Bump Test Version 4.46.10 2025-06-22 20:15:59 -05:00
Jim Miller
072d929298 adapter_fimfictionnet: New img attr and class. #1226 2025-06-22 20:15:19 -05:00
Jim Miller
992c5a1378 Bump Test Version 4.46.9 2025-06-22 11:59:56 -05:00
Jim Miller
f8937c1af3 Report BG job failed entirely as individual books failed instead of just exception. For #1225 2025-06-22 10:45:05 -05:00
Jim Miller
af5c78e2e9 Remove some unused imports 2025-06-22 09:38:40 -05:00
Jim Miller
4a26dfdfff Plugin BG Jobs: Remove old multi-process code 2025-06-16 19:24:46 -05:00
Jim Miller
a82ef5dbae Bump Test Version 4.46.8 2025-06-16 19:16:18 -05:00
snoonan
6adc995fa5 Update defaults.ini per PR 2025-06-16 19:11:43 -05:00
snoonan
f534efd3df Support for logging into royal road to keep chapter progress (and count as page views) 2025-06-16 19:11:43 -05:00
Jim Miller
f41e64141a Add SB favicons to cover_exclusion_regexp. 2025-06-15 17:30:47 -05:00
Jim Miller
94036e3fbb Send refresh_screen=True when updating Reading Lists in case of series column updates. 2025-06-13 21:07:42 -05:00
Jim Miller
9142609c61 Bump Test Version 4.46.7 2025-06-12 22:05:11 -05:00
Jim Miller
f9d7b893ee Fix images from existing epub being discarded during update. 2025-06-12 22:02:35 -05:00
Jim Miller
4e2ae7441d Bump Test Version 4.46.6 2025-06-11 15:29:12 -05:00
Jim Miller
87dbef980f Mildly kludgey fix for status bar notifications. 2025-06-11 10:47:09 -05:00
Jim Miller
921f8c287b Shutdown IMAP connection when done with it. 2025-06-10 17:42:07 -05:00
Jim Miller
637c6e3cc3 Change default base_xenforoforum minimum_threadmarks:1. See #1218 2025-06-10 16:36:21 -05:00
Jim Miller
ba90ff9f3a Bump Test Version 4.46.5 2025-06-10 12:56:26 -05:00
Jim Miller
34e84b2942 PI BG Jobs: Fix split without reconsolidate. 2025-06-10 12:56:16 -05:00
Jim Miller
31eb7f421a Bump Test Version 4.46.4 2025-06-08 09:45:01 -05:00
Jim Miller
85d4656005 alternatehistory needs at least cloudscraper now, it seems. 2025-06-08 09:45:01 -05:00
Jim Miller
006b8873a5 Fix xenforo2 prefixtags, some still using tags in title 2025-06-08 09:44:48 -05:00
Jim Miller
3246036f88 Bump Test Version 4.46.3 2025-06-08 08:39:04 -05:00
Jim Miller
6d114532e2 Py2 fix for split BG jobs, closes #1214 2025-06-08 08:38:24 -05:00
Jim Miller
2edb1d58d5 Bump Test Version 4.46.2 2025-06-07 13:42:29 -05:00
Jim Miller
8dc3c5d3d8 Skip OTW(AO3) login when open_pages_in_browser AND use_browser_cache AND use_browser_cache_only 2025-06-07 13:22:30 -05:00
Jim Miller
2ec8c97e28 Bump Test Version 4.46.1 2025-06-07 12:51:24 -05:00
Rae Knowler
c51161c3d1 Include Accept:image/* header when requesting an image url 2025-06-07 12:50:12 -05:00
Jim Miller
bd645a97c7 Add use_flaresolverr_session and flaresolverr_session settings for #1211 2025-06-07 12:49:08 -05:00
Jim Miller
f7cbfa56bb Bump Release Version 4.46.0 2025-06-06 20:02:47 -05:00
Jim Miller
07fd16813f Bump Test Version 4.45.15 2025-06-05 16:56:16 -05:00
Jim Miller
2fe971c79f OTW(AO3): Don't attempt login with use_archive_transformativeworks_org or open_pages_in_browser #1210 2025-06-05 16:56:10 -05:00
Jim Miller
e4082c6235 Bump Test Version 4.45.14 2025-06-05 08:59:03 -05:00
Jim Miller
960d5ba11a Ignore use_browser_cache_only when URL scheme is file 2025-06-05 08:57:39 -05:00
Jim Miller
066539793d Update translations. 2025-06-04 22:14:33 -05:00
Jim Miller
5b312494fb Bump Test Version 4.45.13 2025-05-27 19:16:33 -05:00
Jim Miller
e628b10247 adapter_literotica: Fix date parsing. See #1208 2025-05-27 19:16:23 -05:00
dbhmw
61c063ed72 adapter_ficbooknet: Site changes 2025-05-27 19:11:54 -05:00
Jim Miller
11d3f601c9 Add Ctrl-Enter to AddDialog, consolidating code with INIEdit 2025-05-24 13:05:05 -05:00
Jim Miller
3b8d0f63d4 Bump Test Version 4.45.12 2025-05-23 11:46:28 -05:00
Jim Miller
b8b30c6a78 adapter_literotica: Update for site change #1208 2025-05-23 11:46:17 -05:00
Jim Miller
b007f68a88 Bump Test Version 4.45.11 2025-05-23 10:19:17 -05:00
Jim Miller
6d8a67ef2e adapter_literotica: Update for site change #1208 2025-05-23 10:19:05 -05:00
Jim Miller
ab66e9e285 Bump Test Version 4.45.10 2025-05-23 10:02:15 -05:00
Jim Miller
b3f7add5a1 Split BG: Fixes for error column & showing meta collection errors 2025-05-23 10:02:09 -05:00
Jim Miller
800be43d24 Bump Test Version 4.45.9 2025-05-22 12:31:02 -05:00
Jim Miller
70f77e17e2 adapter_literotica: Update for site change 2025-05-22 12:07:16 -05:00
Jim Miller
caf46ba421 Bump Test Version 4.45.8 2025-05-19 15:38:40 -05:00
Jim Miller
686ed80230 Update BG Job changes settings verbiage and defaults 2025-05-19 15:38:27 -05:00
Jim Miller
56689a10c4 Bump Test Version 4.45.7 2025-05-18 10:13:45 -05:00
Jim Miller
065d077752 Improve job 'reconsolidate' for failed jobs and setting changing. 2025-05-18 10:10:02 -05:00
Jim Miller
c8f817e830 Bump Test Version 4.45.6 2025-05-17 13:53:49 -05:00
Jim Miller
1432241319 Single proc bg processing, optionally split by site & accumulate results -- experimental 2025-05-17 13:53:27 -05:00
Jim Miller
0e9f60f8a6 Bump Test Version 4.45.5 2025-05-12 17:02:59 -05:00
Jim Miller
74de62385f Fix remove_empty_p regexp to work with nested <br> tags and whitespace. 2025-05-12 17:02:51 -05:00
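The fix above concerns paragraphs that look empty but still contain `<br>` tags and whitespace. A minimal sketch of that kind of pattern (an assumption for illustration — not the project's actual `remove_empty_p` regexp):

```python
import re

# Illustrative sketch (not FanFicFare's actual pattern): strip <p> blocks
# whose content is nothing but whitespace and <br>/<br/> tags.
EMPTY_P = re.compile(r"<p[^>]*>(?:\s|<br\s*/?>)*</p>", re.IGNORECASE)

def remove_empty_p(html):
    return EMPTY_P.sub("", html)
```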
Jim Miller
d2f69eb5d5 Bump Test Version 4.45.4 2025-05-10 09:29:20 -05:00
Jim Miller
c3655d59ca AO3 make use_(domain) options not replace media.archiveofourown.org 2025-05-10 09:29:14 -05:00
Emmanuel Ferdman
aca07bbf59 Migrate to new bs4 API
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-06 17:38:14 -05:00
Jim Miller
3edd3c3e7b Bump Test Version 4.45.3 2025-05-06 16:17:58 -05:00
Jim Miller
61ba096c6e Fix 'Add New Book' dialog when multiple existing found on update. 2025-05-06 16:17:51 -05:00
Jim Miller
47fd71c4b9 XF2: Allow extra / before threads in story URL. 2025-05-05 12:59:38 -05:00
Jim Miller
e1d0bed52d Bump Test Version 4.45.2 2025-05-05 09:46:15 -05:00
Jim Miller
acb88cbefc Include 'Add New Book' dialog when multiple existing found on update. 2025-05-05 09:45:33 -05:00
Jim Miller
f1e7cabf6a Bump Test Version 4.45.1 2025-05-04 10:15:28 -05:00
kilandra
21ec27ffd4 Fix for adapter_spiritfanfictioncom.py
Commenters are being identified as authors since webpage change.
2025-05-04 10:14:27 -05:00
Matěj Cepl
5567e6417d fix(pyproject): replace license by file to using SPDX keyword
As per https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license
2025-05-02 16:17:35 -05:00
Jim Miller
af352a480c Bump Release Version 4.45.0 2025-05-01 10:49:01 -05:00
Jim Miller
92069dc638 Add comment 2025-05-01 10:48:49 -05:00
Jim Miller
76e9421858 Bump Test Version 4.44.11 2025-04-30 09:17:23 -05:00
Jim Miller
70558bf444 Transition finestories.com to storyroom.com 2025-04-30 09:16:07 -05:00
Brian
b60dfdcc28 Update configurable.py
Add support for WPC/WLPC/SOL sub-site "storyroom.com" to replace "finestories.com"
2025-04-29 19:51:02 -07:00
Brian
b976439669 Create adapter_storyroomcom.py
Add support for WPC/WLPC/SOL sub-site "storyroom.com" which replaces "finestories.com"
2025-04-29 19:49:15 -07:00
Brian
6de50509ed Update __init__.py
Add support for new WPC/WLPC/SOL subsite "storyroom.com" to duplicate/replace "finestories.com"
2025-04-29 19:38:48 -07:00
Jim Miller
4d9c38d3c2 Bump Test Version 4.44.10 2025-04-28 20:27:42 -05:00
Jim Miller
90ecb63be4 Fix for alternatehistory.com changing threadmark date attr. 2025-04-28 20:26:07 -05:00
Jim Miller
bd49f8e8fa XF2: Add threadmarks_per_page setting 2025-04-28 19:40:53 -05:00
Jim Miller
21c0315e60 Bump Test Version 4.44.9 2025-04-28 09:22:39 -05:00
dbhmw
fc97fa6d5c adapter_literotica: get_urls_from_page - series have urls 2025-04-28 09:22:22 -05:00
Jim Miller
2c3bf3c642 Update translations. 2025-04-28 09:22:01 -05:00
Jim Miller
a9c725d32a Bump Test Version 4.44.8 2025-04-26 22:14:30 -05:00
Jim Miller
f936c5b0fb Remove base_xenforoforum_adapter, consolidate into base_xenforo2forum_adapter 2025-04-26 22:14:24 -05:00
Jim Miller
53344afa49 Merge branch 'main' of https://github.com/JimmXinu/FanFicFare 2025-04-25 12:25:18 -05:00
Jim Miller
d5addfa2fd Complete impl of use_archiveofourown_gay 2025-04-25 12:25:01 -05:00
Jim Miller
6d8375a9f3 Bump Test Version 4.44.7 2025-04-23 09:47:10 -05:00
Jim Miller
7bc03ac798 adapter_archiveofourownorg: Add use_archiveofourown_gay, allow archiveofourown.gay input for story URLs. 2025-04-23 09:44:18 -05:00
Jim Miller
05d62a5343
Update CLI test version link 2025-04-21 12:06:49 -05:00
Jim Miller
31115f9245 Bump Test Version 4.44.6 2025-04-19 16:40:51 -05:00
dbhmw
26ee692208 adapter_fanfictionnet: Make get_urls_from_page work. 2025-04-19 16:35:01 -05:00
Jim Miller
dd43d25f76 Bump Test Version 4.44.5 2025-04-12 12:32:54 -05:00
dbhmw
fffd15d7ea adapter_ficbooknet: Add series collection & fix downloads 2025-04-12 12:32:30 -05:00
Jim Miller
7c2700c8ea Bump Test Version 4.44.4 2025-04-11 16:00:13 -05:00
Jim Miller
94518c4f25 adapter_fictionmaniatv: Update for ancient stories 2025-04-11 16:00:07 -05:00
Jim Miller
531b965b22 Bump Test Version 4.44.3 2025-04-09 09:46:28 -05:00
Jim Miller
658b637716 adapter_fictionmaniatv: Updates for site change 2025-04-09 09:46:22 -05:00
Jim Miller
44f5feacfb Remove some debugs. 2025-04-09 09:45:55 -05:00
Jim Miller
52451a3eba Bump Test Version 4.44.2 2025-04-03 10:00:46 -05:00
dbhmw
7123f7dd6f Reject HTML sites in no_convert_image 2025-04-03 10:00:32 -05:00
Jim Miller
08a0f9b5fc Bump Test Version 4.44.1 2025-04-01 22:32:36 -05:00
Jim Miller
74ac96a67e base_xenforoforum: Add timeperiodtags and better handle unexpected typed tags 2025-04-01 22:32:26 -05:00
Jim Miller
9eed0340e9 Bump Release Version 4.44.0 2025-04-01 09:54:58 -05:00
Jim Miller
73b90c0291 Additional translation strings 2025-04-01 09:52:17 -05:00
Jim Miller
c33a6e6b05 Bump Test Version 4.43.12 2025-03-29 17:17:51 -05:00
Jim Miller
d77cc15586 adapter_storiesonlinenet(et al): Add always_login option. Closes #1185 2025-03-29 17:17:44 -05:00
Jim Miller
21483f7227 Bump Test Version 4.43.11 2025-03-24 13:26:48 -05:00
Jim Miller
6c0df42fe7 Implementing Timed One Time Password(TOTP) 2FA Exception and collection 2025-03-24 13:22:26 -05:00
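The TOTP collection above follows the standard RFC 6238 scheme. A minimal stdlib-only sketch of generating such a code from a base32 secret (illustrative only, not FanFicFare's actual implementation):

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, digits=6, period=30, now=None):
    """Compute an RFC 6238 time-based one-time password (SHA-1)."""
    # Base32 secrets are often written unpadded; restore padding.
    key = base64.b32decode(secret_b32.upper() + '=' * (-len(secret_b32) % 8))
    counter = int((now if now is not None else time.time()) // period)
    digest = hmac.new(key, struct.pack('>Q', counter), hashlib.sha1).digest()
    # Dynamic truncation per RFC 4226.
    offset = digest[-1] & 0x0F
    code = struct.unpack('>I', digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)
```

Checked against the RFC 6238 Appendix B test vectors, `totp('GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ', now=59)` yields `'287082'`.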
Jim Miller
c3a90a8914 Improve logpage updating 2025-03-24 10:47:05 -05:00
Jim Miller
e7f66d293a Bump Test Version 4.43.10 2025-03-21 12:24:46 -05:00
Jim Miller
e49b3a6be0 adapter_asianfanficscom: Add inject_chapter_image option. Closes #1143 2025-03-21 12:20:42 -05:00
Jim Miller
ae72efdc00 Note on open_pages_in_browser for MacOS users linking to #1142 2025-03-21 11:10:44 -05:00
Jim Miller
bc935e213a Bump Test Version 4.43.9 2025-03-21 10:51:29 -05:00
dbhmw
a8e0eabbd8 adapter_literotica: Fixed incorrect parsing for get url from webpage option. 2025-03-21 10:51:06 -05:00
Jim Miller
81b84a8133 Bump Test Version 4.43.8 2025-03-18 20:51:16 -05:00
Jim Miller
a973b8c926 ffnet only: try_shortened_title_urls option #1166 2025-03-18 20:50:12 -05:00
Jim Miller
08ccc659ca open_pages_in_browser_tries_limit is an int 2025-03-17 13:03:04 -05:00
Jim Miller
fb610de27a Revert "adapter_fanfictionnet: Attempt chapter from m. (vs www) when chapter not found"
This reverts commit 370be379f0.
2025-03-17 12:17:56 -05:00
Jim Miller
29d2e3734b base_otw sites list for ini settings. 2025-03-16 21:37:31 -05:00
Jim Miller
48cf17c7b7 Bump Test Version 4.43.7 2025-03-16 13:52:52 -05:00
Jim Miller
ac61c2bb68 AO3 use_archive_transformativeworks_org option 2025-03-16 13:52:52 -05:00
Nicolas SAPA
a12d2a688b Document the new 'directimages' for BrowserCache feature
Explain that this feature is useful for images delivered by a website with
a no-cache attribute when `use_browser_cache_only` is true (currently AO3).

Signed-off-by: Nicolas SAPA <nico@ByMe.at>
2025-03-16 13:52:19 -05:00
Nicolas SAPA
52027eac46 Silence a spammy debug
Silence a debug in addImgUrl that was spammy.

Signed-off-by: Nicolas SAPA <nico@ByMe.at>
2025-03-16 13:52:19 -05:00
Nicolas SAPA
a1d4fba728 Add support for 'directimages' with use_browser_cache
Hook the configurable into the direct_fetcher logic that already exists for flaresolverr

Signed-off-by: Nicolas SAPA <nico@ByMe.at>
2025-03-16 13:52:19 -05:00
Nicolas SAPA
69872b922c Convert 'use_browser_cache' to bool+
Permit the 'use_browser_cache' configurable to take 'directimage'
so we can later use the default fetcher for image (only).

Signed-off-by: Nicolas SAPA <nico@ByMe.at>
2025-03-16 13:52:19 -05:00
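The 'bool+' idea above extends a true/false setting with an extra keyword value. A hedged sketch of such a parser (the `directimage` value comes from the commit message; the function name and accepted spellings are illustrative assumptions):

```python
def parse_bool_plus(value, extra_keywords=('directimage',)):
    """Parse a setting that accepts booleans plus named keyword values,
    e.g. use_browser_cache: true / false / directimage."""
    v = str(value).strip().lower()
    if v in ('true', 'yes', 'on', '1'):
        return True
    if v in ('false', 'no', 'off', '0', ''):
        return False
    if v in extra_keywords:
        return v  # keyword value preserved for later dispatch
    raise ValueError('Unrecognized setting value: %r' % (value,))
```

Callers can then test `result is True` for the plain boolean case and compare against the keyword for the special mode.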
dbhmw
7bd1a1acfc adapter_ficbooknet: Fix additional metadata collection 2025-03-14 18:58:36 -05:00
Jim Miller
80e5a22f0d Bump Test Version 4.43.5 2025-03-10 20:17:19 -05:00
Jim Miller
3cd4188bd8 Add remove_empty_p option, usually for AO3/OTW. #1177 2025-03-10 20:17:04 -05:00
Jim Miller
21d16dbe90 Bump Test Version 4.43.4 2025-03-09 09:56:47 -05:00
Brian
5ce7875851
Update adapter_storiesonlinenet.py
Moved soup.find for article below the chapter search code, as it breaks when the description/details contains an extraneous /div tag.
2025-03-08 15:20:01 -08:00
Jim Miller
35be14a168 Bump Test Version 4.43.3 2025-03-06 16:04:46 -06:00
dbhmw
930940c7fd adapter_fimfictionnet: Correct the config 2025-03-06 15:57:04 -06:00
dbhmw
f001f19a47 adapter_fimfictionnet: Fetch only the stories in the bookshelf. 2025-03-06 15:57:04 -06:00
Jim Miller
fd7382fb56 Bump Test Version 4.43.2 2025-03-06 13:08:46 -06:00
praschke
c69e940d2a
adapter_syosetucom: remove warningtags from ini 2025-03-06 18:58:39 +00:00
praschke
31dcd8e6ff
adapter_syosetucom: site update 2025-03-06 18:58:26 +00:00
Jim Miller
0bd85c10a8 Bump Test Version 4.43.1 2025-03-05 11:09:54 -06:00
Jim Miller
b075c22261 BrowserCache Chrome Block: Treat entry missing headers same as not found. #1167 #1169 2025-03-05 11:03:32 -06:00
Jim Miller
87b3e04fa1 Bump Release Version 4.43.0 2025-03-01 15:27:40 -06:00
Jim Miller
630f09e644 Bump Test Version 4.42.14 2025-02-28 20:03:09 -06:00
Jim Miller
a0463fc85b base_xenforoforum: Add details_spoiler option for #1165 2025-02-28 20:00:54 -06:00
Jim Miller
de7d8079d9 Add [base_otw] with use_basic_cache:true to defaults.ini 2025-02-26 13:42:04 -06:00
Jim Miller
4aad0ec913 Bump Test Version 4.42.13 2025-02-24 21:24:55 -06:00
Jim Miller
c379b45cb9 BrowserCache: Better handle cache file changing/failing while reading. 2025-02-24 21:24:43 -06:00
Jim Miller
82825d1b16 Bump Test Version 4.42.12 2025-02-24 20:26:13 -06:00
Jim Miller
11b2d5643e Fix BrowserCache for image--cache partitioned by parent(story) page. 2025-02-24 20:26:05 -06:00
Jim Miller
06dc2add8f Bump Test Version 4.42.11 2025-02-24 11:46:50 -06:00
Jim Miller
ab7198bb8f base_otw_adapter: Detect & report 'This site is in beta' page 2025-02-24 11:05:38 -06:00
Jim Miller
d854733ffa AO3: Double default slow_down_sleep_time 2025-02-24 11:05:07 -06:00
Jim Miller
a2cc6bcdd3 Bump Test Version 4.42.10 2025-02-23 20:46:13 -06:00
Jim Miller
c9accda3f8 adapter_mcstoriescom: Suppress site URLs that look like stories but aren't. #1160 2025-02-23 20:46:03 -06:00
Jim Miller
8e55d1e6f4 More direct way for /../ in Get Story URLs from web page; the previous approach broke other sites. #1160 2025-02-23 20:45:47 -06:00
Jim Miller
9b8eb547fc Use urljoin() to remove /../ and /./ from Get Story URLs from web page 2025-02-23 15:22:27 -06:00
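`urljoin()` performs RFC 3986 dot-segment removal when resolving a relative path, which is the behavior the commit above leans on. A sketch of the idea (function name is illustrative, not FanFicFare's code):

```python
from urllib.parse import urljoin, urlsplit

def strip_dot_segments(url):
    """Remove /../ and /./ from a URL path by re-resolving it
    against the site root (RFC 3986 dot-segment removal)."""
    parts = urlsplit(url)
    root = '%s://%s/' % (parts.scheme, parts.netloc)
    rel = parts.path.lstrip('/')
    if parts.query:
        rel += '?' + parts.query
    return urljoin(root, rel)
```

For example, `strip_dot_segments('https://example.com/stories/../story.php?sid=5')` resolves to `https://example.com/story.php?sid=5`.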
Jim Miller
62b3c9264e Bump Test Version 4.42.9 2025-02-22 10:00:41 -06:00
Jim Miller
370be379f0 adapter_fanfictionnet: Attempt chapter from m. (vs www) when chapter not found 2025-02-22 10:00:22 -06:00
Jim Miller
1addfe14fc Strip leading m./www. from domain for browser cache partition. 2025-02-22 10:00:18 -06:00
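Stripping the mobile/desktop prefix as described above can be done with one anchored regex. A hypothetical helper (name and scope are assumptions, not the actual commit's code):

```python
import re

def cache_partition_domain(domain):
    # Drop a leading "m." or "www." so mobile and desktop fetches of
    # the same story share one browser-cache partition key.
    return re.sub(r'^(?:m|www)\.', '', domain)
```

Only a single leading label is removed, so subdomains like `forums.example.com` are left untouched.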
Jim Miller
e510fb027e Bump Test Version 4.42.8 2025-02-20 19:27:48 -06:00
dbhmw
86b807805f adapter_literotica: Implements get_urls_from_page 2025-02-20 19:27:25 -06:00
Jim Miller
0ace02ee75 six fix for py2/Calibre2 2025-02-19 20:28:40 -06:00
Jim Miller
38ad74af68 Bump Test Version 4.42.7 2025-02-19 10:10:49 -06:00
Jim Miller
6c70a60cdb Add include_tocpage:always option. 2025-02-19 10:10:42 -06:00
Jim Miller
80ee0ca9b9 adapter_fimfictionnet: Further cover fix 2025-02-19 09:58:28 -06:00
Jim Miller
8b143a0c1b Bump Test Version 4.42.6 2025-02-17 20:23:03 -06:00
Jim Miller
9fb86da341 adapter_fimfictionnet: Fix cover images and use data-source attr for img src. 2025-02-17 20:22:55 -06:00
Jim Miller
5c703122ec Bump Test Version 4.42.5 2025-02-14 20:43:59 -06:00
Jim Miller
75f89beab1 adapter_storiesonlinenet: Remove some code that broke parsing when 'author' was in the title. 2025-02-14 20:43:51 -06:00
Jim Miller
fc9d184f20 Bump Test Version 4.42.4 2025-02-13 12:46:19 -06:00
Jim Miller
6c411e054a adapter_literotica: Site changes for non-www domains. 2025-02-13 12:46:12 -06:00
Jim Miller
dbef4719d9 adapter_literotica: http->https 2025-02-13 10:22:06 -06:00
Jim Miller
da6b4c25f2 Bump Test Version 4.42.3 2025-02-12 09:00:13 -06:00
Jim Miller
23004e3953 Make plugin use own copy of six only--including in Smarten Punc 2025-02-12 09:00:06 -06:00
Jim Miller
4a15c2a7d5 Bump Test Version 4.42.2 2025-02-09 08:39:01 -06:00
Hazel Shanks
84dad2ec43 fix bounds check in vote accumulation. resolves JimmXinu#1154 2025-02-09 08:37:52 -06:00
Jim Miller
5ac38fc327 Bump Test Version 4.42.1 2025-02-05 17:15:32 -06:00
Jim Miller
35e0ada643 Make plugin use own copy of six only. 2025-02-05 17:15:08 -06:00
Alexandre Detiste
a9533364ec make plugin work without system "six" 2025-02-05 21:48:17 +01:00
135 changed files with 20169 additions and 14563 deletions


@@ -53,7 +53,7 @@ Test versions are available at:
- The [test plugin] is posted at MobileRead.
- The test version of CLI for pip install is uploaded to the testpypi repository and can be installed with:
```
pip install --extra-index-url https://testpypi.python.org/pypi --upgrade FanFicFare
pip install --extra-index-url https://test.pypi.org/simple/ --upgrade FanFicFare
```
### Other Releases


@@ -33,7 +33,7 @@ except NameError:
from calibre.customize import InterfaceActionBase
# pulled out from FanFicFareBase for saving in prefs.py
__version__ = (4, 42, 0)
__version__ = (4, 57, 7)
## Apparently the name for this class doesn't matter--it was still
## 'demo' for the first few versions.


@@ -2,7 +2,6 @@
from __future__ import (unicode_literals, division, absolute_import,
print_function)
import six
__license__ = 'GPL v3'
__copyright__ = '2011, Grant Drake <grant.drake@gmail.com>, 2018, Jim Miller'
@@ -22,7 +21,9 @@ from calibre.gui2.actions import menu_action_unique_name
from calibre.gui2.keyboard import ShortcutConfig
from calibre.utils.config import config_dir
from calibre.utils.date import now, format_date, qt_to_dt, UNDEFINED_DATE
from fanficfare.six import text_type as unicode
import fanficfare.six as six
from six import text_type as unicode
# Global definition of our plugin name. Used for common functions that require this.
plugin_name = None


@@ -2,7 +2,6 @@
from __future__ import (unicode_literals, division, absolute_import,
print_function)
import six
__license__ = 'GPL v3'
__copyright__ = '2021, Jim Miller'
@@ -24,7 +23,8 @@ from PyQt5.Qt import (QWidget, QVBoxLayout, QHBoxLayout, QGridLayout, QLabel,
from calibre.gui2 import dynamic, info_dialog
from calibre.gui2.complete2 import EditWithComplete
from calibre.gui2.dialogs.confirm_delete import confirm
from fanficfare.six import text_type as unicode
import fanficfare.six as six
from six import text_type as unicode
try:
from calibre.ebooks.covers import generate_cover as cal_generate_cover
@@ -371,6 +371,7 @@ class ConfigWidget(QWidget):
prefs['suppresstitlesort'] = self.std_columns_tab.suppresstitlesort.isChecked()
prefs['authorcase'] = self.std_columns_tab.authorcase.isChecked()
prefs['titlecase'] = self.std_columns_tab.titlecase.isChecked()
prefs['seriescase'] = self.std_columns_tab.seriescase.isChecked()
prefs['setanthologyseries'] = self.std_columns_tab.setanthologyseries.isChecked()
prefs['set_author_url'] =self.std_columns_tab.set_author_url.isChecked()
@@ -416,6 +417,10 @@
prefs['auto_reject_from_email'] = self.imap_tab.auto_reject_from_email.isChecked()
prefs['update_existing_only_from_email'] = self.imap_tab.update_existing_only_from_email.isChecked()
prefs['download_from_email_immediately'] = self.imap_tab.download_from_email_immediately.isChecked()
prefs['site_split_jobs'] = self.other_tab.site_split_jobs.isChecked()
prefs['reconsolidate_jobs'] = self.other_tab.reconsolidate_jobs.isChecked()
prefs.save_to_db()
self.plugin_action.set_popup_mode()
@@ -756,6 +761,7 @@ class BasicTab(QWidget):
tooltip=_("One URL per line:\n<b>http://...,note</b>\n<b>http://...,title by author - note</b>"),
rejectreasons=rejecturllist.get_reject_reasons(),
reasonslabel=_('Add this reason to all URLs added:'),
accept_storyurls=True,
save_size_name='fff:Add Reject List')
d.exec_()
if d.result() == d.Accepted:
@@ -1274,6 +1280,31 @@ class OtherTab(QWidget):
self.l = QVBoxLayout()
self.setLayout(self.l)
groupbox = QGroupBox()
self.l.addWidget(groupbox)
groupl = QVBoxLayout()
groupbox.setLayout(groupl)
label = QLabel("<h3>"+
_("Background Job Settings")+
"</h3>"
)
label.setWordWrap(True)
groupl.addWidget(label)
self.site_split_jobs = QCheckBox(_('Split downloads into separate background jobs by site'),self)
self.site_split_jobs.setToolTip(_("Launches a separate background Job for each site in the list of stories to download/update. Otherwise, there will be only one background job."))
self.site_split_jobs.setChecked(prefs['site_split_jobs'])
groupl.addWidget(self.site_split_jobs)
self.reconsolidate_jobs = QCheckBox(_('Reconsolidate split downloads before updating library'),self)
self.reconsolidate_jobs.setToolTip(_("Hold all downloads/updates launched together until they all finish. Otherwise, there will be a 'Proceed to update' dialog for each site."))
self.reconsolidate_jobs.setChecked(prefs['reconsolidate_jobs'])
groupl.addWidget(self.reconsolidate_jobs)
self.l.addSpacing(5)
label = QLabel(_("These controls aren't plugin settings as such, but convenience buttons for setting Keyboard shortcuts and getting all the FanFicFare confirmation dialogs back again."))
label.setWordWrap(True)
self.l.addWidget(label)
@@ -1607,6 +1638,11 @@ class StandardColumnsTab(QWidget):
self.setanthologyseries.setChecked(prefs['setanthologyseries'])
row.append(self.setanthologyseries)
self.seriescase = QCheckBox(_('Fix Series Case?'),self)
self.seriescase.setToolTip(_("If checked, Calibre's routine for correcting the capitalization of title will be applied.")
+"\n"+_("This effects Calibre metadata only, not FanFicFare metadata in title page."))
self.seriescase.setChecked(prefs['seriescase'])
row.append(self.seriescase)
grid = QGridLayout()
for rownum, row in enumerate(rows):
for colnum, col in enumerate(row):


@@ -38,6 +38,7 @@ from calibre.gui2 import gprefs
show_download_options = 'fff:add new/update dialogs:show_download_options'
from calibre.gui2.dialogs.confirm_delete import confirm
from calibre.gui2.complete2 import EditWithComplete
from fanficfare.exceptions import NotGoingToDownload
from fanficfare.six import text_type as unicode, ensure_text
# pulls in translation files for _() strings
@@ -155,15 +156,6 @@ class RejectUrlEntry:
return retval
class NotGoingToDownload(Exception):
def __init__(self,error,icon='dialog_error.png',showerror=True):
self.error=error
self.icon=icon
self.showerror=showerror
def __str__(self):
return self.error
class DroppableQTextEdit(QTextEdit):
def __init__(self,parent):
QTextEdit.__init__(self,parent)
@@ -189,12 +181,32 @@ class DroppableQTextEdit(QTextEdit):
else:
return QTextEdit.insertFromMimeData(self, mime_data)
class AddNewDialog(SizePersistedDialog):
class HotKeyedSizePersistedDialog(SizePersistedDialog):
def __init__(self, gui, save_size_name):
super(HotKeyedSizePersistedDialog,self).__init__(gui, save_size_name)
self.keys=dict()
def addCtrlKeyPress(self,key,func):
# print("addKeyPress: key(0x%x)"%key)
# print("control: 0x%x"%QtCore.Qt.ControlModifier)
self.keys[key]=func
def keyPressEvent(self, event):
# print("event: key(0x%x) modifiers(0x%x)"%(event.key(),event.modifiers()))
if (event.modifiers() & QtCore.Qt.ControlModifier) and event.key() in self.keys:
func = self.keys[event.key()]
return func()
else:
return super(HotKeyedSizePersistedDialog,self).keyPressEvent(event)
class AddNewDialog(HotKeyedSizePersistedDialog):
go_signal = pyqtSignal(object, object, object, object)
def __init__(self, gui, prefs, icon):
SizePersistedDialog.__init__(self, gui, 'fff:add new dialog')
super(AddNewDialog,self).__init__(gui, 'fff:add new dialog')
self.prefs = prefs
self.setMinimumWidth(300)
@@ -333,6 +345,9 @@ class AddNewDialog(SizePersistedDialog):
self.button_box.rejected.connect(self.reject)
self.l.addWidget(self.button_box)
self.addCtrlKeyPress(QtCore.Qt.Key_Return,self.ok_clicked)
self.addCtrlKeyPress(QtCore.Qt.Key_Enter,self.ok_clicked) # num pad
def click_show_download_options(self,x):
self.gbf.setVisible(x)
gprefs[show_download_options] = x
@@ -498,7 +513,6 @@ class AddNewDialog(SizePersistedDialog):
def get_urlstext(self):
return unicode(self.url.toPlainText())
class FakeLineEdit():
def __init__(self):
pass
@@ -620,6 +634,48 @@ class UserPassDialog(QDialog):
self.status=False
self.hide()
class TOTPDialog(QDialog):
'''
Need to collect Timebased One Time Password(TOTP) for some sites.
'''
def __init__(self, gui, site, exception=None):
QDialog.__init__(self, gui)
self.status=False
self.l = QVBoxLayout()
self.setLayout(self.l)
grid = QGridLayout()
self.l.addLayout(grid)
self.setWindowTitle(_('Time-based One Time Password(TOTP)'))
grid.addWidget(QLabel(_("Site requires a Time-based One Time Password(TOTP) for this url:\n%s")%exception.url),0,0,1,2)
grid.addWidget(QLabel(_("TOTP:")),2,0)
self.totp = QLineEdit(self)
grid.addWidget(self.totp,2,1)
horz = QHBoxLayout()
self.l.addLayout(horz)
self.ok_button = QPushButton(_('OK'), self)
self.ok_button.clicked.connect(self.ok)
horz.addWidget(self.ok_button)
self.cancel_button = QPushButton(_('Cancel'), self)
self.cancel_button.clicked.connect(self.cancel)
horz.addWidget(self.cancel_button)
self.resize(self.sizeHint())
def ok(self):
self.status=True
self.hide()
def cancel(self):
self.status=False
self.hide()
def LoopProgressDialog(gui,
book_list,
foreach_function,
@@ -1264,6 +1320,7 @@ class EditTextDialog(SizePersistedDialog):
icon=None, title=None, label=None, tooltip=None,
read_only=False,
rejectreasons=[],reasonslabel=None,
accept_storyurls=False,
save_size_name='fff:edit text dialog',
):
SizePersistedDialog.__init__(self, parent, save_size_name)
@@ -1277,7 +1334,10 @@ class EditTextDialog(SizePersistedDialog):
self.setWindowIcon(icon)
self.l.addWidget(self.label)
self.textedit = QTextEdit(self)
if accept_storyurls:
self.textedit = DroppableQTextEdit(self)
else:
self.textedit = QTextEdit(self)
self.textedit.setLineWrapMode(QTextEditNoWrap)
self.textedit.setReadOnly(read_only)
self.textedit.setText(text)
@@ -1332,7 +1392,7 @@ class QTextEditPlainPaste(QTextEdit):
else:
QTextEdit.insertFromMimeData(self, mimeData)
class IniTextDialog(SizePersistedDialog):
class IniTextDialog(HotKeyedSizePersistedDialog):
def __init__(self, parent, text,
icon=None, title=None, label=None,
@@ -1340,9 +1400,7 @@ class IniTextDialog(SizePersistedDialog):
read_only=False,
save_size_name='fff:ini text dialog',
):
SizePersistedDialog.__init__(self, parent, save_size_name)
self.keys=dict()
super(IniTextDialog,self).__init__(parent, save_size_name)
self.l = QVBoxLayout()
self.setLayout(self.l)
@@ -1443,19 +1501,6 @@
# print("call parent accept")
return SizePersistedDialog.accept(self)
def addCtrlKeyPress(self,key,func):
# print("addKeyPress: key(0x%x)"%key)
# print("control: 0x%x"%QtCore.Qt.ControlModifier)
self.keys[key]=func
def keyPressEvent(self, event):
# print("event: key(0x%x) modifiers(0x%x)"%(event.key(),event.modifiers()))
if (event.modifiers() & QtCore.Qt.ControlModifier) and event.key() in self.keys:
func = self.keys[event.key()]
return func()
else:
return SizePersistedDialog.keyPressEvent(self, event)
def get_plain_text(self):
return unicode(self.textedit.toPlainText())


@@ -2,29 +2,14 @@
from __future__ import (unicode_literals, division, absolute_import,
print_function)
import six
from six.moves import range
__license__ = 'GPL v3'
__copyright__ = '2021, Jim Miller'
__docformat__ = 'restructuredtext en'
import fanficfare.six as six
from fanficfare.six import ensure_text, string_types, text_type as unicode
# import cProfile
# def do_cprofile(func):
# def profiled_func(*args, **kwargs):
# profile = cProfile.Profile()
# try:
# profile.enable()
# result = func(*args, **kwargs)
# profile.disable()
# return result
# finally:
# profile.print_stats()
# return profiled_func
import logging
logger = logging.getLogger(__name__)
@@ -32,6 +17,7 @@ import os
import re
import sys
import threading
import copy
from io import BytesIO
from functools import partial
from datetime import datetime, time
@@ -39,7 +25,7 @@ from string import Template
import traceback
from collections import defaultdict
from PyQt5.Qt import (QApplication, QMenu, QTimer, QToolButton, pyqtSignal)
from PyQt5.Qt import (QApplication, QMenu, QTimer, QToolButton, pyqtSignal, QEventLoop)
from calibre.ptempfile import PersistentTemporaryFile, PersistentTemporaryDirectory, remove_dir
from calibre.ebooks.metadata import MetaInformation
@@ -78,12 +64,14 @@ from fanficfare import adapters, exceptions
from fanficfare.epubutils import (
get_dcsource, get_dcsource_chaptercount, get_story_url_from_epub_html,
get_story_url_from_zip_html, reset_orig_chapters_epub, get_cover_data)
get_story_url_from_zip_html, reset_orig_chapters_epub, get_cover_img)
from fanficfare.geturls import (
get_urls_from_page, get_urls_from_text,get_urls_from_imap,
get_urls_from_mime)
from fanficfare.fff_profile import do_cprofile
from calibre_plugins.fanficfare_plugin.fff_util import (
get_fff_adapter, get_fff_config, get_fff_personalini,
get_common_elements)
@@ -109,9 +97,10 @@ from calibre_plugins.fanficfare_plugin.prefs import (
from calibre_plugins.fanficfare_plugin.dialogs import (
AddNewDialog, UpdateExistingDialog,
LoopProgressDialog, UserPassDialog, AboutDialog, CollectURLDialog,
RejectListDialog, EmailPassDialog,
RejectListDialog, EmailPassDialog, TOTPDialog,
save_collisions, question_dialog_all,
NotGoingToDownload, RejectUrlEntry, IniTextDialog)
RejectUrlEntry, IniTextDialog,
EditTextDialog)
# because calibre immediately transforms html into zip and don't want
# to have an 'if html'. db.has_format is cool with the case mismatch,
@@ -194,6 +183,7 @@ class FanFicFarePlugin(InterfaceAction):
self.menu.aboutToShow.connect(self.about_to_show_menu)
self.imap_pass = None
self.download_job_manager = DownloadJobManager()
def initialization_complete(self):
# otherwise configured hot keys won't work until the menu's
@@ -204,20 +194,6 @@ class FanFicFarePlugin(InterfaceAction):
prefs,
self.qaction.icon())
## Kludgey, yes, but with the real configuration inside the
## library now, how else would a user be able to change this
## setting if it's crashing calibre?
def check_macmenuhack(self):
try:
return self.macmenuhack
except:
file_path = os.path.join(calibre_config_dir,
*("plugins/fanficfare_macmenuhack.txt".split('/')))
file_path = os.path.abspath(file_path)
logger.debug("Plugin %s macmenuhack file_path:%s"%(self.name,file_path))
self.macmenuhack = os.access(file_path, os.F_OK)
return self.macmenuhack
accepts_drops = True
def accept_enter_event(self, event, mime_data):
@@ -442,30 +418,38 @@ class FanFicFarePlugin(InterfaceAction):
self.reject_list_action = self.create_menu_item_ex(self.menu, _('Reject Selected Books'),
unique_name='Reject Selected Books', image='rotate-right.png',
triggered=self.reject_list_urls)
# self.menu.addSeparator()
# print("platform.system():%s"%platform.system())
# print("platform.mac_ver()[0]:%s"%platform.mac_ver()[0])
if not self.check_macmenuhack(): # not platform.mac_ver()[0]: # Some macs crash on these menu items for unknown reasons.
self.menu.addSeparator()
self.editpersonalini_action = self.create_menu_item_ex(self.menu, _('Edit personal.ini'),
image= 'config.png',
unique_name='Edit personal.ini',
shortcut_name=_('Edit personal.ini'),
triggered=self.editpersonalini)
self.add_reject_urls_action = self.create_menu_item_ex(self.menu, _('Add Reject URLs'),
image='rotate-right.png',
unique_name='Add Reject URLs',
shortcut_name=_('Add Reject URLs'),
triggered=self.add_reject_urls)
self.config_action = self.create_menu_item_ex(self.menu, _('&Configure FanFicFare'),
image= 'config.png',
unique_name='Configure FanFicFare',
shortcut_name=_('Configure FanFicFare'),
triggered=do_user_config)
self.edit_reject_urls_action = self.create_menu_item_ex(self.menu, _('Edit Reject URLs'),
image='rotate-right.png',
unique_name='Edit Reject URLs',
shortcut_name=_('Edit Reject URLs'),
triggered=self.edit_reject_urls)
self.about_action = self.create_menu_item_ex(self.menu, _('About FanFicFare'),
image= 'images/icon.png',
unique_name='About FanFicFare',
shortcut_name=_('About FanFicFare'),
triggered=self.about)
self.menu.addSeparator()
self.editpersonalini_action = self.create_menu_item_ex(self.menu, _('Edit personal.ini'),
image= 'config.png',
unique_name='Edit personal.ini',
shortcut_name=_('Edit personal.ini'),
triggered=self.editpersonalini)
self.config_action = self.create_menu_item_ex(self.menu, _('&Configure FanFicFare'),
image= 'config.png',
unique_name='Configure FanFicFare',
shortcut_name=_('Configure FanFicFare'),
triggered=do_user_config)
self.about_action = self.create_menu_item_ex(self.menu, _('About FanFicFare'),
image= 'images/icon.png',
unique_name='About FanFicFare',
shortcut_name=_('About FanFicFare'),
triggered=self.about)
self.gui.keyboard.finalize()
def about(self,checked):
@@ -501,6 +485,35 @@ class FanFicFarePlugin(InterfaceAction):
prefs['personal.ini'] = get_resources('plugin-example.ini')
prefs.save_to_db()
def add_reject_urls(self):
d = EditTextDialog(self.gui,
"http://example.com/story.php?sid=5,"+_("Reason why I rejected it")+"\nhttp://example.com/story.php?sid=6,"+_("Title by Author")+" - "+_("Reason why I rejected it"),
# icon=self.windowIcon(),
title=_("FanFicFare"),
label=_("Add Reject URLs. Use: <b>http://...,note</b> or <b>http://...,title by author - note</b><br>Invalid story URLs will be ignored."),
tooltip=_("One URL per line:\n<b>http://...,note</b>\n<b>http://...,title by author - note</b>"),
rejectreasons=rejecturllist.get_reject_reasons(),
reasonslabel=_('Add this reason to all URLs added:'),
accept_storyurls=True,
save_size_name='fff:Add Reject List')
d.exec_()
if d.result() == d.Accepted:
rejecturllist.add_text(d.get_plain_text(),d.get_reason_text())
def edit_reject_urls(self):
with busy_cursor():
d = RejectListDialog(self.gui,
rejecturllist.get_list(),
rejectreasons=rejecturllist.get_reject_reasons(),
header=_("Edit Reject URLs List"),
show_delete=False,
show_all_reasons=False)
d.exec_()
if d.result() != d.Accepted:
return
with busy_cursor():
rejecturllist.add(d.get_reject_list(),clear=True)
def create_menu_item_ex(self, parent_menu, menu_text, image=None, tooltip=None,
shortcut=None, triggered=None, is_checked=None, shortcut_name=None,
unique_name=None):
@@ -540,11 +553,11 @@ class FanFicFarePlugin(InterfaceAction):
def update_lists(self,checked,add=True):
if prefs['addtolists'] or prefs['addtoreadlists']:
if not self.is_library_view():
self.gui.status_bar.show_message(_('Cannot Update Reading Lists from Device View'), 3000)
self.do_status_message(_('Cannot Update Reading Lists from Device View'), 3000)
return
if len(self.gui.library_view.get_selected_ids()) == 0:
self.gui.status_bar.show_message(_('No Selected Books to Update Reading Lists'), 3000)
self.do_status_message(_('No Selected Books to Update Reading Lists'), 3000)
return
self.update_reading_lists(self.gui.library_view.get_selected_ids(),add)
@@ -589,7 +602,7 @@ class FanFicFarePlugin(InterfaceAction):
try:
with busy_cursor():
self.gui.status_bar.show_message(_('Fetching Story URLs from Email...'),6000)
self.do_status_message(_('Fetching Story URLs from Email...'),1000)
url_list = get_urls_from_imap(prefs['imapserver'],
prefs['imapuser'],
imap_pass,
@@ -633,7 +646,7 @@ class FanFicFarePlugin(InterfaceAction):
notupdate_list = set([x for x in url_list if not self.do_id_search(adapters.getNormalStoryURL(x))])
url_list = url_list - notupdate_list
self.gui.status_bar.show_message(_('No Valid Story URLs Found in Unread Emails.'),3000)
self.do_status_message(_('No Valid Story URLs Found in Unread Emails.'),3000)
if prefs['download_from_email_immediately']:
## do imap fetch w/o GUI elements
@@ -649,7 +662,7 @@ class FanFicFarePlugin(InterfaceAction):
'add_tag':prefs['imaptags'],
},"\n".join(url_list))
else:
self.gui.status_bar.show_message(_('Finished Fetching Story URLs from Email.'),3000)
self.do_status_message(_('Finished Fetching Story URLs from Email.'),3000)
else:
if url_list:
@@ -706,12 +719,12 @@ class FanFicFarePlugin(InterfaceAction):
return
with busy_cursor():
self.gui.status_bar.show_message(_('Fetching Story URLs from Page...'))
self.do_status_message(_('Fetching Story URLs from Page...'))
frompage = self.get_urls_from_page(url)
url_list = frompage.get('urllist',[])
self.gui.status_bar.show_message(_('Finished Fetching Story URLs from Page.'),3000)
self.do_status_message(_('Finished Fetching Story URLs from Page.'),3000)
if url_list:
# make a copy before adding to avoid changing passed param
@@ -736,7 +749,7 @@ class FanFicFarePlugin(InterfaceAction):
def list_story_urls(self,checked):
'''Get list of URLs from existing books.'''
if not self.gui.current_view().selectionModel().selectedRows() :
self.gui.status_bar.show_message(_('No Selected Books to Get URLs From'),
self.do_status_message(_('No Selected Books to Get URLs From'),
3000)
return
@@ -783,12 +796,12 @@ class FanFicFarePlugin(InterfaceAction):
def unnew_books(self,checked):
'''Get list of URLs from existing books.'''
if not self.is_library_view():
self.gui.status_bar.show_message(_('Can only UnNew books in library'),
self.do_status_message(_('Can only UnNew books in library'),
3000)
return
if not self.gui.current_view().selectionModel().selectedRows() :
self.gui.status_bar.show_message(_('No Selected Books to Get URLs From'),
self.do_status_message(_('No Selected Books to Get URLs From'),
3000)
return
@@ -849,7 +862,7 @@ class FanFicFarePlugin(InterfaceAction):
changed_ids = [ x['calibre_id'] for x in book_list if x['changed'] ]
if changed_ids:
logger.debug(_('Starting auto conversion of %d books.')%(len(changed_ids)))
self.gui.status_bar.show_message(_('Starting auto conversion of %d books.')%(len(changed_ids)), 3000)
self.do_status_message(_('Starting auto conversion of %d books.')%(len(changed_ids)), 3000)
self.gui.iactions['Convert Books'].auto_convert_auto_add(changed_ids)
def reject_list_urls(self,checked):
@ -864,7 +877,7 @@ class FanFicFarePlugin(InterfaceAction):
book_list = [ self.make_book_from_device_row(x) for x in rows ]
if len(book_list) == 0 :
self.gui.status_bar.show_message(_('No Selected Books have URLs to Reject'), 3000)
self.do_status_message(_('No Selected Books have URLs to Reject'), 3000)
return
# Progbar because fetching urls from device epubs can be slow.
@@ -940,15 +953,15 @@ class FanFicFarePlugin(InterfaceAction):
def update_anthology(self,checked,extraoptions={}):
self.check_valid_collision(extraoptions)
if not self.get_epubmerge_plugin():
self.gui.status_bar.show_message(_('Cannot Make Anthologys without %s')%'EpubMerge 1.3.1+', 3000)
self.do_status_message(_('Cannot Make Anthologys without %s')%'EpubMerge 1.3.1+', 3000)
return
if not self.is_library_view():
self.gui.status_bar.show_message(_('Cannot Update Books from Device View'), 3000)
self.do_status_message(_('Cannot Update Books from Device View'), 3000)
return
if len(self.gui.library_view.get_selected_ids()) != 1:
self.gui.status_bar.show_message(_('Can only update 1 anthology at a time'), 3000)
self.do_status_message(_('Can only update 1 anthology at a time'), 3000)
return
db = self.gui.current_db
@@ -958,13 +971,13 @@ class FanFicFarePlugin(InterfaceAction):
try:
with busy_cursor():
self.gui.status_bar.show_message(_('Fetching Story URLs for Series...'))
self.do_status_message(_('Fetching Story URLs for Series...'))
book_id = self.gui.library_view.get_selected_ids()[0]
mergebook = self.make_book_id_only(book_id)
self.populate_book_from_calibre_id(mergebook, db)
if not db.has_format(book_id,'EPUB',index_is_id=True):
self.gui.status_bar.show_message(_('Can only Update Epub Anthologies'), 3000)
self.do_status_message(_('Can only Update Epub Anthologies'), 3000)
return
tdir = PersistentTemporaryDirectory(prefix='fff_anthology_')
@@ -995,7 +1008,7 @@ class FanFicFarePlugin(InterfaceAction):
url_list_text = "\n".join(url_list)
self.gui.status_bar.show_message(_('Finished Fetching Story URLs for Series.'),3000)
self.do_status_message(_('Finished Fetching Story URLs for Series.'),3000)
except NotAnthologyException:
# using an exception purely to get outside 'with busy_cursor:'
info_dialog(self.gui, _("Cannot Update Anthology"),
@@ -1070,14 +1083,14 @@ class FanFicFarePlugin(InterfaceAction):
def update_dialog(self,checked,id_list=None,extraoptions={}):
if not self.is_library_view():
self.gui.status_bar.show_message(_('Cannot Update Books from Device View'), 3000)
self.do_status_message(_('Cannot Update Books from Device View'), 3000)
return
if not id_list:
id_list = self.gui.library_view.get_selected_ids()
if len(id_list) == 0:
self.gui.status_bar.show_message(_('No Selected Books to Update'), 3000)
self.do_status_message(_('No Selected Books to Update'), 3000)
return
self.check_valid_collision(extraoptions)
@@ -1140,9 +1153,9 @@ class FanFicFarePlugin(InterfaceAction):
## Aug2024 moved site specific search changes to adapters as
## classmethod
regexp = adapters.get_url_search(url)
logger.debug(regexp)
# logger.debug(regexp)
retval = self.gui.current_db.search_getting_ids(regexp,None,use_virtual_library=False)
logger.debug(retval)
# logger.debug(retval)
return retval
def prep_downloads(self, options, books, merge=False, extrapayload=None):
@@ -1182,7 +1195,7 @@ class FanFicFarePlugin(InterfaceAction):
win_title=_("Downloading metadata for stories")
status_prefix=_("Fetched metadata for")
self.gui.status_bar.show_message(status_bar, 3000)
self.do_status_message(status_bar, 3000)
LoopProgressDialog(self.gui,
books,
partial(self.prep_download_loop, options = options, merge=merge),
@@ -1191,7 +1204,7 @@ class FanFicFarePlugin(InterfaceAction):
win_title=win_title,
status_prefix=status_prefix)
else:
self.gui.status_bar.show_message(_('No valid story URLs entered.'), 3000)
self.do_status_message(_('No valid story URLs entered.'), 3000)
# LoopProgressDialog calls prep_download_loop for each 'good' story,
# prep_download_loop updates book object for each with metadata from site,
# LoopProgressDialog calls start_download_job at the end which goes
@@ -1241,9 +1254,9 @@ class FanFicFarePlugin(InterfaceAction):
def get_story_metadata_only(self,adapter):
url = adapter.url
## three tries, that's enough if both user/pass & is_adult needed,
## or a couple tries of one or the other
for x in range(0,2):
## 5 tries, should be enough if user/pass, totp & is_adult
## needed, or a couple tries of one or the other
for x in [0,1,2,3,4]:
try:
adapter.getStoryMetadataOnly(get_cover=False)
except exceptions.FailedToLogin as f:
@@ -1254,6 +1267,13 @@ class FanFicFarePlugin(InterfaceAction):
adapter.username = userpass.user.text()
adapter.password = userpass.passwd.text()
except exceptions.NeedTimedOneTimePassword as e:
logger.warn("Login Failed, Need Timed One Time Password.")
totpdlg = TOTPDialog(self.gui,url,e)
totpdlg.exec_() # exec_ will make it act modal
if totpdlg.status:
adapter.totp = totpdlg.totp.text()
except exceptions.AdultCheckRequired:
if question_dialog_all(self.gui, _('Are You an Adult?'), '<p>'+
_("%s requires that you be an adult. Please confirm you are an adult in your locale:")%url,
@@ -1265,7 +1285,7 @@ class FanFicFarePlugin(InterfaceAction):
# let other exceptions percolate up.
return adapter.getStoryMetadataOnly(get_cover=False)
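The widened retry loop above gives enough passes for each credential prompt (username/password, TOTP, adult check) to fire once and still leave room for a retry. A minimal sketch of the pattern, using simplified stand-in exceptions and a fake adapter rather than FanFicFare's real classes:

```python
# Hypothetical stand-ins for FanFicFare's exceptions and adapter.
class NeedLogin(Exception): pass
class NeedTOTP(Exception): pass

class FakeAdapter:
    def __init__(self):
        self.username = self.totp = None
    def get_metadata(self):
        if not self.username:
            raise NeedLogin()
        if not self.totp:
            raise NeedTOTP()
        return 'metadata'

adapter = FakeAdapter()
for _attempt in range(5):  # 5 tries: enough for user/pass + totp + adult
    try:
        adapter.get_metadata()
    except NeedLogin:
        adapter.username = 'user'   # in the plugin, a dialog asks for this
    except NeedTOTP:
        adapter.totp = '123456'     # in the plugin, TOTPDialog supplies this
    else:
        break                       # succeeded; stop retrying
assert adapter.get_metadata() == 'metadata'
```

Each exception handler fills in exactly one missing credential, so the worst case consumes one attempt per prompt plus the final successful pass.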
# @do_cprofile
@do_cprofile
def prep_download_loop(self,book,
options={'fileform':'epub',
'collision':ADDNEW,
@@ -1299,9 +1319,16 @@ class FanFicFarePlugin(InterfaceAction):
if self.reject_url(merge,book):
return
## Check existing for SKIP mode. Again, redundant with below
## for when story URL changes, but also kept here to avoid
## network hit.
identicalbooks = self.do_id_search(url)
if collision == SKIP and identicalbooks:
raise exceptions.NotGoingToDownload(_("Skipping duplicate story."),"list_remove.png")
# Dialogs should prevent this case now.
if collision in (UPDATE,UPDATEALWAYS) and fileform != 'epub':
raise NotGoingToDownload(_("Cannot update non-epub format."))
raise exceptions.NotGoingToDownload(_("Cannot update non-epub format."))
if not book['good']:
# book has already been flagged bad for whatever reason.
@@ -1429,6 +1456,7 @@ class FanFicFarePlugin(InterfaceAction):
book['is_adult'] = adapter.is_adult
book['username'] = adapter.username
book['password'] = adapter.password
book['totp'] = adapter.totp
book['icon'] = 'plus.png'
book['status'] = _('Add')
@@ -1482,6 +1510,7 @@ class FanFicFarePlugin(InterfaceAction):
# try to find by identifier url or uri first.
identicalbooks = self.do_id_search(url)
# logger.debug("identicalbooks:%s"%identicalbooks)
mi = None
if len(identicalbooks) < 1 and prefs['matchtitleauth']:
# find dups
mi = MetaInformation(book['title'],book['author'])
@@ -1493,10 +1522,38 @@ class FanFicFarePlugin(InterfaceAction):
logger.debug("existing found by identifier URL")
if collision == SKIP and identicalbooks:
raise NotGoingToDownload(_("Skipping duplicate story."),"list_remove.png")
raise exceptions.NotGoingToDownload(_("Skipping duplicate story."),"list_remove.png")
if len(identicalbooks) > 1:
raise NotGoingToDownload(_("More than one identical book by Identifier URL or title/author(s)--can't tell which book to update/overwrite."),"minusminus.png")
identicalbooks_msg = _("More than one identical book by Identifier URL or title/author(s)--can't tell which book to update/overwrite.")
identicalwhy_msg = _('<b>%(url)s</b> is already in your library more than once.')%{'url':url}
if mi:
identicalwhy_msg = _('<b>%(title)s</b> by <b>%(author)s</b> is already in your library more than once with different source URLs.')%{'title':mi.title,'author':', '.join(mi.authors)}
if question_dialog_all(self.gui,
_('Download as New Book?'),'''
<h3>%s</h3>
<p>%s</p>
<p>%s</p>
<p>%s</p>
<p>%s</p>
<p>%s</p>
<p>%s</p>'''%(_('Download as New Book?'),
identicalbooks_msg,
identicalwhy_msg,
_('Do you want to add a new book for this URL?'),
_('New URL: <a href="%(newurl)s">%(newurl)s</a>')%{'newurl':book['url']},
_("Click '<b>Yes</b>' to add a new book with the new URL."),
_("Click '<b>No</b>' to skip URL.")),
show_copy_button=False,
question_name='download_new_dup',
question_cache=self.question_cache):
book_id = None
mi = None
book['calibre_id'] = None
identicalbooks = []
collision = book['collision'] = ADDNEW
else:
raise exceptions.NotGoingToDownload(identicalbooks_msg,"minusminus.png")
## changed: add new book when CALIBREONLY if none found.
if collision in (CALIBREONLY, CALIBREONLYSAVECOL) and not identicalbooks:
@@ -1583,11 +1640,11 @@ class FanFicFarePlugin(InterfaceAction):
# returns int adjusted for start-end range.
urlchaptercount = story.getChapterCount()
if chaptercount == urlchaptercount and collision == UPDATE:
raise NotGoingToDownload(_("Already contains %d chapters.")%chaptercount,'edit-undo.png',showerror=False)
raise exceptions.NotGoingToDownload(_("Already contains %d chapters.")%chaptercount,'edit-undo.png',showerror=False)
elif chaptercount > urlchaptercount and not (collision == UPDATEALWAYS and adapter.getConfig('force_update_epub_always')):
raise NotGoingToDownload(_("Existing epub contains %d chapters, web site only has %d. Use Overwrite or force_update_epub_always to force update.") % (chaptercount,urlchaptercount),'dialog_error.png')
raise exceptions.NotGoingToDownload(_("Existing epub contains %d chapters, web site only has %d. Use Overwrite or force_update_epub_always to force update.") % (chaptercount,urlchaptercount),'dialog_error.png')
elif chaptercount == 0:
raise NotGoingToDownload(_("FanFicFare doesn't recognize chapters in existing epub, epub is probably from a different source. Use Overwrite to force update."),'dialog_error.png')
raise exceptions.NotGoingToDownload(_("FanFicFare doesn't recognize chapters in existing epub, epub is probably from a different source. Use Overwrite to force update."),'dialog_error.png')
if collision == OVERWRITE and \
db.has_format(book_id,formmapping[fileform],index_is_id=True):
@@ -1604,7 +1661,7 @@ class FanFicFarePlugin(InterfaceAction):
# updated does have time, use full timestamps.
if (lastupdated.time() == time.min and fileupdated.date() > lastupdated.date()) or \
(lastupdated.time() != time.min and fileupdated > lastupdated):
raise NotGoingToDownload(_("Not Overwriting, web site is not newer."),'edit-undo.png',showerror=False)
raise exceptions.NotGoingToDownload(_("Not Overwriting, web site is not newer."),'edit-undo.png',showerror=False)
# For update, provide a tmp file copy of the existing epub so
# it can't change underneath us. Now also overwrite for logpage preserve.
@@ -1709,12 +1766,7 @@ class FanFicFarePlugin(InterfaceAction):
calonly = False
break
if calonly:
class NotJob(object):
def __init__(self,result):
self.failed=False
self.result=result
notjob = NotJob(book_list)
self.download_list_completed(notjob,options=options)
self._do_download_list_completed(book_list,options=options)
return
self.do_mark_series_anthologies(options.get('mark_anthology_ids',set()))
@@ -1744,6 +1796,20 @@ class FanFicFarePlugin(InterfaceAction):
msgl)
return
### *Don't* split anthology.
if merge:
self.dispatch_bg_job(_("Anthology"), book_list, copy.copy(options), merge)
elif prefs['site_split_jobs']: ### YYY Split list into sites, one BG job per site
sites_lists = defaultdict(list)
[ sites_lists[x['site']].append(x) for x in book_list ]
for site in sites_lists.keys():
site_list = sites_lists[site]
self.dispatch_bg_job(site, site_list, copy.copy(options), merge)
else:
self.dispatch_bg_job(None, book_list, copy.copy(options), merge)
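With `site_split_jobs` enabled, the book list is bucketed by site and one background job is dispatched per bucket. The grouping step can be sketched with a plain loop instead of the side-effect list comprehension (the book dicts here are hypothetical):

```python
from collections import defaultdict

# Hypothetical book list; only the 'site' key matters for the split.
book_list = [
    {'site': 'archiveofourown.org', 'url': 'a1'},
    {'site': 'fanfiction.net',      'url': 'f1'},
    {'site': 'archiveofourown.org', 'url': 'a2'},
]

sites_lists = defaultdict(list)
for book in book_list:          # same grouping, written as a plain loop
    sites_lists[book['site']].append(book)

assert sorted(sites_lists) == ['archiveofourown.org', 'fanfiction.net']
assert [b['url'] for b in sites_lists['archiveofourown.org']] == ['a1', 'a2']
```

Each key of `sites_lists` then becomes one `dispatch_bg_job` call, so stories from the same site download serially within one job while different sites run in parallel.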
def dispatch_bg_job(self, site, book_list, options, merge):
options['site'] = site
basic_cachefile = PersistentTemporaryFile(suffix='.basic_cache',
dir=options['tdir'])
options['basic_cache'].save_cache(basic_cachefile.name)
@@ -1763,18 +1829,30 @@ class FanFicFarePlugin(InterfaceAction):
# get libs from plugin zip.
options['plugin_path'] = self.interface_action_base_plugin.plugin_path
func = 'arbitrary_n'
cpus = self.gui.job_manager.server.pool_size
args = ['calibre_plugins.fanficfare_plugin.jobs', 'do_download_worker',
(book_list, options, cpus, merge)]
desc = _('Download %s FanFiction Book(s)') % sum(1 for x in book_list if x['good'])
args = ['calibre_plugins.fanficfare_plugin.jobs',
'do_download_worker_single',
(site, book_list, options, merge)]
if site:
desc = _('Download %s FanFiction Book(s) for %s') % (sum(1 for x in book_list if x['good']),site)
else:
desc = _('Download %s FanFiction Book(s)') % sum(1 for x in book_list if x['good'])
job = self.gui.job_manager.run_job(
self.Dispatcher(partial(self.download_list_completed,options=options,merge=merge)),
func, args=args,
self.Dispatcher(partial(self.download_list_completed,
options=options,merge=merge)),
'arbitrary_n',
args=args,
description=desc)
self.download_job_manager.get_batch(options['tdir']).add_job(site,job)
job.tdir=options['tdir']
job.site=site
job.orig_book_list = book_list
# set as part of job, otherwise *changing* reconsolidate_jobs
# after launch could cause job results to be ignored.
job.reconsolidate=prefs['reconsolidate_jobs'] # YYY batch update
self.gui.jobs_pointer.start()
self.gui.status_bar.show_message(_('Starting %d FanFicFare Downloads')%len(book_list),3000)
self.do_status_message(_('Starting %d FanFicFare Downloads')%len(book_list),3000)
def do_mark_series_anthologies(self,mark_anthology_ids):
if prefs['mark_series_anthologies'] and mark_anthology_ids:
@@ -1803,6 +1881,7 @@ class FanFicFarePlugin(InterfaceAction):
else:
return None
@do_cprofile
def update_books_loop(self,book,db=None,
options={'fileform':'epub',
'collision':ADDNEW,
@@ -1919,9 +1998,14 @@ class FanFicFarePlugin(InterfaceAction):
self.gui.library_view.sort_by_named_field('marked', True)
logger.debug(_('Finished Adding/Updating %d books.')%(len(update_list) + len(add_list)))
self.gui.status_bar.show_message(_('Finished Adding/Updating %d books.')%(len(update_list) + len(add_list)), 3000)
remove_dir(options['tdir'])
logger.debug("removed tdir")
self.do_status_message(_('Finished Adding/Updating %d books.')%(len(update_list) + len(add_list)), 3000)
batch = self.download_job_manager.get_batch(options['tdir'])
batch.finish_job(options.get('site',None))
if batch.all_done():
remove_dir(options['tdir'])
logger.debug("removed tdir(%s)"%options['tdir'])
else:
logger.debug("DIDN'T remove tdir(%s)"%options['tdir'])
if 'Count Pages' in self.gui.iactions and len(prefs['countpagesstats']) and len(all_ids):
cp_plugin = self.gui.iactions['Count Pages']
@@ -1950,28 +2034,62 @@ class FanFicFarePlugin(InterfaceAction):
if prefs['autoconvert'] and all_not_calonly_ids:
logger.debug(_('Starting auto conversion of %d books.')%(len(all_ids)))
self.gui.status_bar.show_message(_('Starting auto conversion of %d books.')%(len(all_ids)), 3000)
self.do_status_message(_('Starting auto conversion of %d books.')%(len(all_ids)), 3000)
self.gui.iactions['Convert Books'].auto_convert_auto_add(all_not_calonly_ids)
def download_list_completed(self, job, options={},merge=False):
if job.failed:
self.gui.job_exception(job, dialog_title='Failed to Download Stories')
return
tdir = job.tdir
site = job.site
logger.debug("Batch Job:%s %s"%(tdir,site))
batch = self.download_job_manager.get_batch(tdir)
if job.failed:
# logger.debug(job.orig_book_list)
## I don't *think* there would be any harm to modifying
## the original book list, but I elect not to chance it.
failedjobresult = copy.deepcopy(job.orig_book_list)
for x in failedjobresult:
if x['good']:
## may have failed before reaching BG job.
x['good'] = False
x['status'] = _('Error')
x['added'] = False
x['reportorder'] = x['listorder']+10000000 # force to end.
x['comment'] = _('Background Job Failed, see Calibre Jobs log.')
x['showerror'] = True
self.gui.job_exception(job, dialog_title=_('Background Job Failed to Download Stories for (%s)')%job.site)
job.result = failedjobresult
if job.reconsolidate: # YYY batch update
logger.debug("batch.finish_job(%s)"%site)
batch.finish_job(site)
showsite = None
# set as part of job, otherwise *changing* reconsolidate_jobs
# after launch could cause job results to be ignored.
if job.reconsolidate: # YYY batch update
if batch.all_done():
book_list = batch.get_results()
else:
return
else:
showsite = site
book_list = job.result
return self._do_download_list_completed(book_list, options, merge, showsite)
def _do_download_list_completed(self, book_list, options={},merge=False,showsite=None):
self.previous = self.gui.library_view.currentIndex()
db = self.gui.current_db
book_list = job.result
good_list = [ x for x in book_list if x['good'] ]
bad_list = [ x for x in book_list if not x['good'] ]
chapter_error_list = [ x for x in book_list if 'chapter_error_count' in x ]
try:
good_list = sorted(good_list,key=lambda x : x['reportorder'])
bad_list = sorted(bad_list,key=lambda x : x['reportorder'])
except KeyError:
good_list = sorted(good_list,key=lambda x : x['listorder'])
bad_list = sorted(bad_list,key=lambda x : x['listorder'])
#print("book_list:%s"%book_list)
sort_func = lambda x : x.get('reportorder',x['listorder'])
good_list = sorted(good_list,key=sort_func)
bad_list = sorted(bad_list,key=sort_func)
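The replacement sort key collapses the old try/except into a single fallback: books that never reached a background job have no `reportorder`, so `listorder` is used instead. A small sketch with hypothetical book dicts:

```python
books = [
    {'url': 'a', 'listorder': 2},                      # no reportorder yet
    {'url': 'b', 'listorder': 1, 'reportorder': 30001},
    {'url': 'c', 'listorder': 3, 'reportorder': 10003},
]
# Same fallback key as above: reportorder when present, else listorder.
sort_func = lambda x: x.get('reportorder', x['listorder'])
ordered = [b['url'] for b in sorted(books, key=sort_func)]
assert ordered == ['a', 'c', 'b']
```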
payload = (good_list, bad_list, options)
msgl = [ _('FanFicFare found <b>%s</b> good and <b>%s</b> bad updates.')%(len(good_list),len(bad_list)) ]
@@ -2012,6 +2130,8 @@ class FanFicFarePlugin(InterfaceAction):
do_update_func = self.do_download_merge_update
else:
if showsite:
msgl.append(_('Downloading from %s')%showsite)
msgl.extend([
_('See log for details.'),
_('Proceed with updating your library?')])
@@ -2032,6 +2152,15 @@ class FanFicFarePlugin(InterfaceAction):
htmllog,
msgl)
def do_status_message(self,message,timeout=0):
self.gui.status_bar.show_message(message,timeout)
try:
QApplication.processEvents(QEventLoop.ProcessEventsFlag.ExcludeUserInputEvents)
except:
## older versions of qt don't have ExcludeUserInputEvents.
## but they also don't need the processEvents() call
pass
def do_proceed_question(self, update_func, payload, htmllog, msgl):
msg = '<p>'+'</p>\n<p>'.join(msgl)+ '</p>\n'
def proceed_func(*args, **kwargs):
@@ -2059,7 +2188,7 @@ class FanFicFarePlugin(InterfaceAction):
good_list = sorted(good_list,key=lambda x : x['listorder'])
bad_list = sorted(bad_list,key=lambda x : x['listorder'])
self.gui.status_bar.show_message(_('Merging %s books.')%total_good)
self.do_status_message(_('Merging %s books.')%total_good)
existingbook = None
if 'mergebook' in options:
@@ -2084,30 +2213,45 @@ class FanFicFarePlugin(InterfaceAction):
## start with None. If no subbook covers, don't force one
## here. User can configure FFF to always create/polish a
## cover if they want. This is about when we force it.
coverpath = None
coverimgpath = None
coverimgtype = None
had_cover = False
## first, look for covers inside the subbooks. Stop at the
## first one, which will be used if there isn't a pre-existing
# epubmerge wants a path to cover img on disk
def write_image(imgtype,imgdata):
tmp = PersistentTemporaryFile(prefix='cover_',
suffix='.'+imagetypes[imgtype],
dir=options['tdir'])
tmp.write(imgdata)
tmp.flush()
tmp.close()
return tmp.name
## if prior epub had a cover, we should use it again.
if mergebook['calibre_id'] and db.has_format(mergebook['calibre_id'],'EPUB',index_is_id=True):
(covertype,coverdata) = get_cover_img(db.format(mergebook['calibre_id'],'EPUB',index_is_id=True,as_file=True))
if coverdata:
had_cover = True
coverimgpath = write_image(covertype,coverdata)
coverimgtype = covertype
logger.debug("prior anthology cover found")
## look for covers inside the subbooks. Stop at the first
## one, which will be used if there isn't a pre-existing
## calibre cover.
if not coverpath:
if not coverimgpath:
for book in good_list:
coverdata = get_cover_data(book['outfile'])
(covertype,coverdata) = get_cover_img(book['outfile'])
if coverdata: # found a cover.
(coverimgtype,coverimgdata) = coverdata[4:6]
# logger.debug('coverimgtype:%s [%s]'%(coverimgtype,imagetypes[coverimgtype]))
tmpcover = PersistentTemporaryFile(suffix='.'+imagetypes[coverimgtype],
dir=options['tdir'])
tmpcover.write(coverimgdata)
tmpcover.flush()
tmpcover.close()
coverpath = tmpcover.name
coverimgpath = write_image(covertype,coverdata)
coverimgtype = covertype
logger.debug('from subbook coverimgpath:%s'%coverimgpath)
break
# logger.debug('coverpath:%s'%coverpath)
## if updating an existing book and there is at least one
## subbook cover:
if coverpath and mergebook['calibre_id']:
if not had_cover and coverimgpath and mergebook['calibre_id']:
logger.debug("anth cover: using cal cover")
# Couldn't find a better way to get the cover path.
calcoverpath = os.path.join(db.library_path,
db.path(mergebook['calibre_id'], index_is_id=True),
@@ -2115,9 +2259,11 @@ class FanFicFarePlugin(InterfaceAction):
## if there's an existing cover, use it. Calibre will set
## it for us during lots of different actions anyway.
if os.path.exists(calcoverpath):
coverpath = calcoverpath
coverimgpath = calcoverpath
# logger.debug('coverpath:%s'%coverpath)
## Note that this cover will be replaced if 'inject
## generated' cover is on
logger.debug('coverimgpath:%s'%coverimgpath)
mrg_args = [tmp.name,
[ x['outfile'] for x in good_list ],]
mrg_kwargs = {
@@ -2125,7 +2271,7 @@ class FanFicFarePlugin(InterfaceAction):
'titleopt':mergebook['title'],
'keepmetadatafiles':True,
'source':mergebook['url'],
'coverjpgpath':coverpath
'coverjpgpath':coverimgpath
}
logger.debug('anthology_merge_keepsingletocs:%s'%
mergebook['anthology_merge_keepsingletocs'])
@@ -2154,11 +2300,10 @@ class FanFicFarePlugin(InterfaceAction):
good_list = sorted(good_list,key=lambda x : x['listorder'])
bad_list = sorted(bad_list,key=lambda x : x['listorder'])
self.gui.status_bar.show_message(_('FanFicFare Adding/Updating books.'))
self.do_status_message(_('FanFicFare Adding/Updating books.'))
errorcol_label = self.get_custom_col_label(prefs['errorcol'])
lastcheckedcol_label = self.get_custom_col_label(prefs['lastcheckedcol'])
columns = self.gui.library_view.model().custom_columns
if good_list or prefs['mark'] or (bad_list and errorcol_label) or lastcheckedcol_label:
LoopProgressDialog(self.gui,
good_list+bad_list,
@@ -2504,7 +2649,6 @@ class FanFicFarePlugin(InterfaceAction):
db.new_api.set_link_for_authors(author_id_to_link_map)
# set series link if found.
logger.debug("has link_map:%s"%(hasattr(db.new_api,'set_link_map')))
## new_api.set_link_map added in Calibre v6.15
if hasattr(db.new_api,'set_link_map') and \
prefs['set_series_url'] and \
@@ -2513,6 +2657,7 @@ class FanFicFarePlugin(InterfaceAction):
series = book['series']
if '[' in series: # a few can have a series w/o number
series = series[:series.rindex(' [')]
logger.debug("Setting series link:%s"%book['all_metadata']['seriesUrl'])
db.new_api.set_link_map('series',{series:
book['all_metadata']['seriesUrl']})
@@ -2652,7 +2797,7 @@ class FanFicFarePlugin(InterfaceAction):
addremovefunc(l,
book_ids,
display_warnings=False,
refresh_screen=False)
refresh_screen=True)
else:
if l != '':
message="<p>"+_("You configured FanFicFare to automatically update Reading List '%s', but you don't have a list of that name?")%l+"</p>"
@@ -2671,7 +2816,7 @@ class FanFicFarePlugin(InterfaceAction):
#add_book_ids,
book_ids,
display_warnings=False,
refresh_screen=False)
refresh_screen=True)
else:
if l != '':
message="<p>"+_("You configured FanFicFare to automatically update Reading List '%s', but you don't have a list of that name?")%l+"</p>"
@@ -2702,6 +2847,9 @@ class FanFicFarePlugin(InterfaceAction):
mi.pubdate = book['pubdate']
mi.timestamp = book['timestamp']
mi.comments = book['comments']
if prefs['seriescase']:
from calibre.ebooks.metadata.sources.base import fixcase
book['series'] = fixcase(book['series'])
mi.series = book['series']
return mi
@@ -3053,6 +3201,7 @@ The previously downloaded book is still in the anthology, but FFF doesn't have t
if prefs['setanthologyseries'] and book['title'] == series:
book['series'] = series+' [0]'
book['all_metadata']['seriesUrl'] = options.get('anthology_url','')
# logger.debug("anthology_title_pattern:%s"%configuration.getConfig('anthology_title_pattern'))
if configuration.getConfig('anthology_title_pattern'):
@@ -3073,7 +3222,9 @@ The previously downloaded book is still in the anthology, but FFF doesn't have t
s = options.get('frompage',{}).get('status','')
if s:
book['all_metadata']['status'] = s
book['tags'].append(s)
## status into tags only if in include_subject_tags
if 'status' in configuration.getConfigList('include_subject_tags'):
book['tags'].append(s)
book['tags'].extend(configuration.getConfigList('anthology_tags'))
book['all_metadata']['anthology'] = "true"
@@ -3111,9 +3262,53 @@ def pretty_book(d, indent=0, spacer=' '):
# return '\n'.join([(pretty_book(v, indent, spacer)) for v in d])
if isinstance(d, dict):
for k in ('password','username'):
for k in ('password','username','totp'):
if k in d and d[k]:
d[k]=_('(was set, removed for security)')
return '\n'.join(['%s%s:\n%s' % (kindent, k, pretty_book(v, indent + 1, spacer))
for k, v in d.items()])
return "%s%s"%(kindent, d)
class DownloadBatch():
def __init__(self,tdir=None):
self.runningjobs = dict() # keyed by site
self.jobsorder = []
self.tdir = tdir
def add_job(self,site,job):
self.runningjobs[site]=job
self.jobsorder.append(job)
def finish_job(self,site):
try:
self.runningjobs.pop(site)
except:
pass
def all_done(self):
return len(self.runningjobs) == 0
def get_results(self):
retlist = []
for j in self.jobsorder:
## failed / no result
try:
iter(j.result)
except TypeError:
# not iterable; an isinstance(abc.Iterable) check only works in newer pythons
pass
else:
retlist.extend(j.result)
return retlist
class DownloadJobManager():
def __init__(self):
self.batches = {}
def get_batch(self,batch):
if batch not in self.batches:
self.batches[batch] = DownloadBatch()
return self.batches[batch]
def remove_batch(self,batch):
del self.batches[batch]
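The two classes added above give the GUI side a way to know when every per-site job in a batch has finished before it removes the shared temp dir and shows consolidated results. A self-contained sketch of the same pattern; `FakeJob` is a hypothetical stand-in for calibre's job object, of which only the `result` attribute is read:

```python
class DownloadBatch:
    """Mirrors the DownloadBatch added above: one running BG job per site."""
    def __init__(self, tdir=None):
        self.runningjobs = {}   # keyed by site
        self.jobsorder = []     # preserves dispatch order for results
        self.tdir = tdir
    def add_job(self, site, job):
        self.runningjobs[site] = job
        self.jobsorder.append(job)
    def finish_job(self, site):
        self.runningjobs.pop(site, None)
    def all_done(self):
        return not self.runningjobs
    def get_results(self):
        retlist = []
        for j in self.jobsorder:
            try:
                iter(j.result)  # failed jobs can have a non-iterable result
            except TypeError:
                pass
            else:
                retlist.extend(j.result)
        return retlist

class FakeJob:  # hypothetical stand-in for a calibre job object
    def __init__(self, result):
        self.result = result

batch = DownloadBatch(tdir='/tmp/fff_batch')
batch.add_job('site-a', FakeJob([{'url': 'a1'}]))
batch.add_job('site-b', FakeJob(None))      # failed job: no result list
batch.finish_job('site-a')
assert not batch.all_done()                 # site-b still running
batch.finish_job('site-b')
assert batch.all_done()
assert batch.get_results() == [{'url': 'a1'}]
```

This is why `update_books_completed` only calls `remove_dir(options['tdir'])` once `batch.all_done()` holds: the temp dir is shared by every per-site job in the batch.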


@@ -2,7 +2,6 @@
from __future__ import (unicode_literals, division, absolute_import,
print_function)
import six
__license__ = 'GPL v3'
__copyright__ = '2020, Jim Miller, 2011, Grant Drake <grant.drake@gmail.com>'
@@ -17,8 +16,6 @@ from io import StringIO
from collections import defaultdict
import sys
from calibre.utils.ipc.server import Empty, Server
from calibre.utils.ipc.job import ParallelJob
from calibre.utils.date import local_tz
# pulls in translation files for _() strings
@@ -33,21 +30,11 @@ except NameError:
#
# ------------------------------------------------------------------------------
def do_download_worker(book_list,
options,
cpus,
merge=False,
notification=lambda x,y:x):
'''
Coordinator job, to launch child jobs to do downloads.
This is run as a worker job in the background to keep the UI more
responsive and get around any memory leak issues as it will launch
a child job for each book as a worker process
'''
## Now running one BG proc per site, which downloads for the same
## site in serial.
logger.info("CPUs:%s"%cpus)
server = Server(pool_size=cpus)
def do_download_worker_single(site,
book_list,
options,
merge,
notification=lambda x,y:x):
logger.info(options['version'])
@@ -56,142 +43,87 @@ def do_download_worker(book_list,
from calibre.debug import print_basic_debug_info
print_basic_debug_info(sys.stderr)
sites_lists = defaultdict(list)
[ sites_lists[x['site']].append(x) for x in book_list if x['good'] ]
totals = {}
# can't do direct assignment in list comprehension? I'm sure it
# makes sense to some pythonista.
# [ totals[x['url']]=0.0 for x in book_list if x['good'] ]
[ totals.update({x['url']:0.0}) for x in book_list if x['good'] ]
# logger.debug(sites_lists.keys())
# Queue all the jobs
jobs_running = 0
for site in sites_lists.keys():
site_list = sites_lists[site]
logger.info(_("Launch background process for site %s:")%site + "\n" +
"\n".join([ x['url'] for x in site_list ]))
# logger.debug([ x['url'] for x in site_list])
args = ['calibre_plugins.fanficfare_plugin.jobs',
'do_download_site',
(site,site_list,options,merge)]
job = ParallelJob('arbitrary_n',
"site:(%s)"%site,
done=None,
args=args)
job._site_list = site_list
job._processed = False
server.add_job(job)
jobs_running += 1
# This server is an arbitrary_n job, so there is a notifier available.
# Set the % complete to a small number to avoid the 'unavailable' indicator
notification(0.01, _('Downloading FanFiction Stories'))
from calibre_plugins.fanficfare_plugin import FanFicFareBase
fffbase = FanFicFareBase(options['plugin_path'])
with fffbase: # so the sys.path was modified while loading the
# plug impl.
from fanficfare.fff_profile import do_cprofile
# dequeue the job results as they arrive, saving the results
count = 0
while True:
job = server.changed_jobs_queue.get()
# logger.debug("job get job._processed:%s"%job._processed)
# A job can 'change' when it is not finished, for example if it
# produces a notification.
msg = None
try:
## msg = book['url']
(percent,msg) = job.notifications.get_nowait()
# logger.debug("%s<-%s"%(percent,msg))
if percent == 10.0: # Only when signaling d/l done.
count += 1
totals[msg] = 1.0/len(totals)
# logger.info("Finished: %s"%msg)
else:
## extra function just so I can easily use the same
## @do_cprofile decorator
@do_cprofile
def profiled_func():
count = 0
totals = {}
# can't do direct assignment in list comprehension? I'm sure it
# makes sense to some pythonista.
# [ totals[x['url']]=0.0 for x in book_list if x['good'] ]
[ totals.update({x['url']:0.0}) for x in book_list if x['good'] ]
# logger.debug(sites_lists.keys())
def do_indiv_notif(percent,msg):
totals[msg] = percent/len(totals)
notification(max(0.01,sum(totals.values())), _('%(count)d of %(total)d stories finished downloading')%{'count':count,'total':len(totals)})
except Empty:
pass
# without update, is_finished will never be set. however, we
# do want to get all the notifications for status so we don't
# miss the 'done' ones.
job.update(consume_notifications=False)
notification(max(0.01,sum(totals.values())), _('%(count)d of %(total)d stories finished downloading')%{'count':count,'total':len(totals)})
# if not job._processed:
# sleep(0.5)
## Can have a race condition where job.is_finished before
## notifications for all downloads have been processed.
## Or even after the job has been finished.
# logger.debug("job.is_finished(%s) or job._processed(%s)"%(job.is_finished, job._processed))
if not job.is_finished:
continue
## only process each job once. We can get more than one loop
## after job.is_finished.
if not job._processed:
# sleep(1)
# A job really finished. Get the information.
## This is where bg proc details end up in GUI log.
## job.details is the whole debug log for each proc.
logger.info("\n\n" + ("="*80) + " " + job.details.replace('\r',''))
# logger.debug("Finished background process for site %s:\n%s"%(job._site_list[0]['site'],"\n".join([ x['url'] for x in job._site_list ])))
for b in job._site_list:
book_list.remove(b)
book_list.extend(job.result)
job._processed = True
jobs_running -= 1
## Can't use individual count--I've seen stories all reported
## finished before results of all jobs processed.
if jobs_running == 0:
book_list = sorted(book_list,key=lambda x : x['listorder'])
logger.info("\n"+_("Download Results:")+"\n%s\n"%("\n".join([ "%(status)s %(url)s %(comment)s" % book for book in book_list])))
good_lists = defaultdict(list)
bad_lists = defaultdict(list)
do_list = []
done_list = []
logger.info("\n\n"+_("Downloading FanFiction Stories")+"\n%s\n"%("\n".join([ "%(status)s %(url)s %(comment)s" % book for book in book_list])))
## pass failures from metadata through bg job so all results are
## together.
for book in book_list:
if book['good']:
good_lists[book['status']].append(book)
do_list.append(book)
else:
bad_lists[book['status']].append(book)
done_list.append(book)
for book in do_list:
# logger.info("%s"%book['url'])
done_list.append(do_download_for_worker(book,options,merge,do_indiv_notif))
count += 1
return finish_download(done_list)
return profiled_func()
order = [_('Add'),
_('Update'),
_('Meta'),
_('Different URL'),
_('Rejected'),
_('Skipped'),
_('Bad'),
_('Error'),
]
j = 0
for d in [ good_lists, bad_lists ]:
for status in order:
if d[status]:
l = d[status]
logger.info("\n"+status+"\n%s\n"%("\n".join([book['url'] for book in l])))
for book in l:
book['reportorder'] = j
j += 1
del d[status]
# just in case a status is added but doesn't appear in order.
for status in d.keys():
logger.info("\n"+status+"\n%s\n"%("\n".join([book['url'] for book in d[status]])))
break
def finish_download(donelist):
book_list = sorted(donelist,key=lambda x : x['listorder'])
logger.info("\n"+_("Download Results:")+"\n%s\n"%("\n".join([ "%(status)s %(url)s %(comment)s" % book for book in book_list])))
server.close()
good_lists = defaultdict(list)
bad_lists = defaultdict(list)
for book in book_list:
if book['good']:
good_lists[book['status']].append(book)
else:
bad_lists[book['status']].append(book)
order = [_('Add'),
_('Update'),
_('Meta'),
_('Different URL'),
_('Rejected'),
_('Skipped'),
_('Bad'),
_('Error'),
]
stnum = 0
for d in [ good_lists, bad_lists ]:
for status in order:
stnum += 1
if d[status]:
l = d[status]
logger.info("\n"+status+"\n%s\n"%("\n".join([book['url'] for book in l])))
for book in l:
# Add prior listorder to 10000 * status num for
# ordering of accumulated results with multiple bg
# jobs
book['reportorder'] = stnum*10000 + book['listorder']
del d[status]
# just in case a status is added but doesn't appear in order.
for status in d.keys():
logger.info("\n"+status+"\n%s\n"%("\n".join([book['url'] for book in d[status]])))
# return the book list as the job result
return book_list
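The new `finish_download` orders accumulated results by banding each status into blocks of 10000, so books from multiple background jobs group by status first and keep their original list position within a status. A minimal sketch of that scheme (statuses and book dicts here are hypothetical):

```python
from collections import defaultdict

order = ['Add', 'Update', 'Error']
good_lists = defaultdict(list)
good_lists['Add'].append({'url': 'x', 'listorder': 5})
good_lists['Update'].append({'url': 'y', 'listorder': 2})

# Same banding as finish_download: status number * 10000 + listorder.
stnum = 0
for status in order:
    stnum += 1
    for book in good_lists[status]:
        book['reportorder'] = stnum * 10000 + book['listorder']

reported = sorted(good_lists['Add'] + good_lists['Update'],
                  key=lambda b: b['reportorder'])
assert [b['url'] for b in reported] == ['x', 'y']
```

'x' lands at 10005 and 'y' at 20002, so 'x' sorts first despite its larger `listorder`; the status band dominates.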
def do_download_site(site,book_list,options,merge,notification=lambda x,y:x):
# logger.info(_("Started job for %s")%site)
retval = []
for book in book_list:
# logger.info("%s"%book['url'])
retval.append(do_download_for_worker(book,options,merge,notification))
notification(10.0,book['url'])
return retval
def do_download_for_worker(book,options,merge,notification=lambda x,y:x):
'''
Child job, to download story when run as a worker job
@ -201,13 +133,13 @@ def do_download_for_worker(book,options,merge,notification=lambda x,y:x):
fffbase = FanFicFareBase(options['plugin_path'])
with fffbase: # so the sys.path was modified while loading the
# plugin impl.
from calibre_plugins.fanficfare_plugin.dialogs import NotGoingToDownload
from calibre_plugins.fanficfare_plugin.prefs import (
SAVE_YES, SAVE_YES_UNLESS_SITE, OVERWRITE, OVERWRITEALWAYS, UPDATE,
UPDATEALWAYS, ADDNEW, SKIP, CALIBREONLY, CALIBREONLYSAVECOL)
from calibre_plugins.fanficfare_plugin.wordcount import get_word_count
from fanficfare import adapters, writers
from fanficfare.epubutils import get_update_data
from fanficfare.exceptions import NotGoingToDownload
from fanficfare.six import text_type as unicode
from calibre_plugins.fanficfare_plugin.fff_util import get_fff_config
@ -236,6 +168,7 @@ def do_download_for_worker(book,options,merge,notification=lambda x,y:x):
adapter.is_adult = book['is_adult']
adapter.username = book['username']
adapter.password = book['password']
adapter.totp = book['totp']
adapter.setChaptersRange(book['begin'],book['end'])
## each site download job starts with a new copy of the
@ -426,7 +359,7 @@ def do_download_for_worker(book,options,merge,notification=lambda x,y:x):
data = {'smarten_punctuation':True}
opts = ALL_OPTS.copy()
opts.update(data)
O = namedtuple('Options', ' '.join(six.iterkeys(ALL_OPTS)))
O = namedtuple('Options', ' '.join(ALL_OPTS.keys()))
opts = O(**opts)
log = Log(level=Log.DEBUG)
@ -459,7 +392,8 @@ def inject_cal_cols(book,story,configuration):
if 'calibre_columns' in book:
injectini = ['[injected]']
extra_valid = []
for k, v in six.iteritems(book['calibre_columns']):
for k in book['calibre_columns'].keys():
v = book['calibre_columns'][k]
story.setMetadata(k,v['val'])
injectini.append('%s_label:%s'%(k,v['label']))
extra_valid.append(k)


@ -124,6 +124,10 @@ include_titlepage: true
## include a TOC page before the story text
include_tocpage: true
## When set to 'true', tocpage is only included if there is more than
## one chapter in the story. If set to 'always', tocpage will be
## included even if the story only has one chapter.
#include_tocpage: always
## website encoding(s). In theory, each website reports the character
## encoding they use for each page. In practice, some sites report it
@ -315,6 +319,13 @@ conditionals_use_lists:true
## br paragraphs with p tags while preserving scene breaks.
#replace_br_with_p: false
## Some sites/authors/stories (notably AO3/OTW) end up with empty p
## tags the author never intended, introduced during document upload,
## and not all authors know how, or take the time, to fix it. This
## feature removes all "empty" <p> tags, i.e., those containing only
## whitespace or <br> tags.
#remove_empty_p: false
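The effect of remove_empty_p can be approximated with a small regex pass. This is a rough stdlib sketch of the idea only, not FanFicFare's implementation:

```python
import re

# Drop <p> elements whose content is only whitespace (including
# &nbsp;) or <br> tags -- an approximation of remove_empty_p.
EMPTY_P = re.compile(
    r'<p[^>]*>(?:\s|&nbsp;|<br\s*/?>)*</p>', re.IGNORECASE)

def remove_empty_p(html):
    return EMPTY_P.sub('', html)

sample = '<p>Text</p><p> <br/> </p><p>&nbsp;</p><p>More</p>'
cleaned = remove_empty_p(sample)
```

Paragraphs containing real text are untouched; only whitespace/&nbsp;/<br> shells are removed.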
## If you have the Generate Cover plugin installed, you can use the
## generate_cover_settings parameter to intelligently decide which GC
## setting to run. There are three parts 1) a template of which
@ -502,6 +513,13 @@ mark_new_chapters:false
## (new) marks in TOC when mark_new_chapters:true
#anthology_merge_keepsingletocs:false
## The count of how many chapters are marked '(new)' will be in
## metadata entry marked_new_chapters
marked_new_chapters_label:Chapters Marked New
# Add comma separators for numeric reads. Eg 10000 becomes 10,000
add_to_comma_entries:,marked_new_chapters
## chapter title patterns use python template substitution. The
## ${number} is the 'chapter' number and ${title} is the chapter
## title, after applying chapter_title_strip_pattern. ${index04} is
@ -620,6 +638,8 @@ browser_cache_age_limit:4.0
## can't already find in browser cache in your default browser, then
## check for it in the cache again. Note that your browser_cache_path
## setting *must* use your default browser for this to work.
## MacOS Users: You may also need to set Calibre's openers_by_scheme
## tweak. See https://github.com/JimmXinu/FanFicFare/issues/1142
#open_pages_in_browser:false
## As a (second) work around for certain sites blocking automated
@ -660,6 +680,15 @@ browser_cache_age_limit:4.0
## an error message in the ebook for that chapter.
continue_on_chapter_error:false
## When continue_on_chapter_error:true, after
## continue_on_chapter_error_try_limit chapters have failed, continue
## processing, but stop trying to download chapters. Mark all such
## chapters with chapter_title_error_mark, but chapter text will
## explain that no download attempt was made because
## continue_on_chapter_error_try_limit was exceeded. Set to -1 for
## infinite chapter errors.
continue_on_chapter_error_try_limit:5
## Append this to chapter titles that errored. Only used with
## continue_on_chapter_error:true
## Set empty to not mark failed chapters.
@ -719,6 +748,9 @@ storynotes_label:Story Notes
add_to_extra_titlepage_entries:,storynotes
[base_xenforoforum]
## NOTE: There are no supported XenForo1 sites anymore, only XenForo2
## sites. The [base_xenforoforum] section is kept for backward
## compatibility.
use_basic_cache:true
## Some sites require login for some stories
#username:YourName
@ -744,7 +776,7 @@ max_fg_sleep:4.0
max_fg_sleep_at_downloads:4
## exclude emoji and default avatars.
cover_exclusion_regexp:(/styles/|xenforo/avatars/avatar.*\.png|https://cdn\.jsdelivr\.net/gh/|https://cdn\.jsdelivr\.net/emojione)
cover_exclusion_regexp:(/styles/|xenforo/avatars/avatar.*\.png|https://cdn\.jsdelivr\.net/gh/|https://cdn\.jsdelivr\.net/emojione|/data/svg/2/1/\d+/2022_favicon_[^.]*\.png|ytimg.com)
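A cover_exclusion_regexp is an ordinary regular expression matched against candidate cover image URLs. The sketch below checks the pattern above against made-up URLs, assuming a plain unanchored search (the URLs are hypothetical examples, not taken from any real forum):

```python
import re

# The xenforo cover-exclusion pattern from above; candidate cover
# image URLs matching it are skipped.
pattern = re.compile(
    r'(/styles/|xenforo/avatars/avatar.*\.png|'
    r'https://cdn\.jsdelivr\.net/gh/|'
    r'https://cdn\.jsdelivr\.net/emojione|'
    r'/data/svg/2/1/\d+/2022_favicon_[^.]*\.png|ytimg.com)')

# Hypothetical URLs for illustration:
excluded = bool(pattern.search(
    'https://forum.example.com/data/avatars/xenforo/avatars/avatar_m.png'))
kept = not pattern.search(
    'https://forum.example.com/attachments/cover.jpg')
```

Testing a pattern this way before putting it in personal.ini avoids accidentally excluding every image on a site.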
## use author(original poster)'s avatar as cover image when true.
author_avatar_cover:false
@ -753,7 +785,7 @@ author_avatar_cover:false
strip_chapter_numbers:false
## Copy title to tagsfromtitle for parsing tags.
add_to_extra_valid_entries:,tagsfromtitledetect,tagsfromtitle,forumtags,prefixtags,contenttags,formattags,parentforums
add_to_extra_valid_entries:,tagsfromtitledetect,tagsfromtitle,forumtags,prefixtags,contenttags,formattags,timeperiodtags,parentforums
## '.NOREPL' tells the system to *not* apply title's
## in/exclude/replace_metadata -- Only works on include_in_ lines.
@ -767,6 +799,7 @@ forumtags_label:Tags from Forum
prefixtags_label:Prefix Tags from Forum
contenttags_label:Content Tags from Forum
formattags_label:Format Tags from Forum
timeperiodtags_label:Time Period Tags from Forum
parentforums_label:Parent Forums
keep_in_order_parentforums:true
@ -825,7 +858,7 @@ add_to_replace_metadata:
# Remove [] from prefixtags ala [NSFW] on QQ
prefixtags=>[\[\]]+=>
add_to_extra_titlepage_entries:,tagsfromtitle,forumtags
#add_to_extra_titlepage_entries:,tagsfromtitle,forumtags
## XenForo tags are all lowercase everywhere that I've seen. This
## makes the first letter of each word uppercase. Applied before
@ -852,7 +885,9 @@ description_limit:500
## there are at least this many threadmarks. A number of older
## threads have a single threadmark to an 'index' post. Set to 1 to
## use threadmarks whenever they exist.
minimum_threadmarks:2
## Update Jun2025: Default changed to 1, index posts are not a common
## thing anymore.
minimum_threadmarks:1
## When 'first post' (or post URL) is being added as a chapter, give
## the chapter this title.
@ -900,8 +935,17 @@ always_use_forumtags:false
## spoiler blocks with the original spoiler button text as a label
## using fieldset and legend HTML tags. For a simple box, see the
## add_to_output_css example for [base_xenforoforum:epub] below.
## remove_spoilers overrides legend_spoilers
#legend_spoilers:false
## This option if uncommented and set true, will change the tags
## around spoiler blocks to a <details> tag with <summary> tag
## containing the original spoiler button text. For a simple line
## box, see the add_to_output_css example for [base_xenforoforum:epub]
## below.
## remove_spoilers and legend_spoilers override details
#details_spoilers:false
## True by built-in default, but only applied if using threadmarks for
## chapters and a 'reader' URL is found in the thread. 'Reader mode'
## will reduce the number of pages fetched by roughly 10 to 1 for a
@ -912,6 +956,12 @@ always_use_forumtags:false
## 10.
#reader_posts_per_page:10
## XF2 sites let you change the number of threadmarks shown per page.
## This reduces the number of network requests considerably for
## stories with lots of threadmarks. Site default is 25, FFF default
## is 200, the common maximum.
#threadmarks_per_page:200
## xenforoforum has categories of threadmarks. This setting allows
## you to leave out categories you don't want. Skipping categories
## will also speed downloads as categories other than 'Threadmarks'
@ -1032,13 +1082,19 @@ use_threadmark_wordcounts:true
[base_xenforoforum:epub]
## See remove_spoilers above for more about 'spoilers'. This example
## See remove_spoilers/etc above for more about 'spoilers'. This example
## shows how to put a simple line around spoiler blocks. Uncomment
## all three lines, keep the leading space before .bbCodeSpoilerContainer.
#add_to_keep_html_attrs:,style
#add_to_output_css:
# .bbCodeSpoilerContainer { border: 1px solid black; padding: 2px; }
## This example shows how to put a simple line around
## 'details_spoilers' blocks. Uncomment both lines, keep the leading
## space before .bbCodeSpoilerContainer.
#add_to_output_css:
# .bbCodeSpoilerContainer { border: 1px solid black; padding: 2px; }
## When reveal_invisible_text:true, you can style the class
## invisible_text as you like for forum "invisible text". See
## reveal_invisible_text above. This is just one example. Note that
@ -1102,6 +1158,17 @@ skip_sticky_first_posts:true
## NOTE: SV requires login (always_login:true) to see dice rolls.
#include_dice_rolls:false
## If the poster is *not* the author (IE, original poster), include a
## poster link prepended to the chapter text.
## Eg: "<p>Chapter by: <a...>ThePoster</a></p>"
#include_nonauthor_poster:false
## Recognize 'data-s9e-mediaembed' embedded media that use a CSS
## background image and rewrite them as an <a> tag linking to the
## video, wrapped around an <img> tag of the CSS background image.
## Only seen with YouTube so far. On by default.
#link_embedded_media:true
[epub]
## Each output format has a section that overrides [defaults]
@ -1119,6 +1186,10 @@ zip_output: false
## epub carries the TOC in metadata.
## mobi generated from epub by calibre will have a TOC at the end.
include_tocpage: false
## When set to 'true', tocpage is only included if there is more than
## one chapter in the story. If set to 'always', tocpage will be
## included even if the story only has one chapter.
#include_tocpage: always
## include a Update Log page before the story text. If 'true', the
## log will be updated each time the epub is and all the metadata
@ -1331,7 +1402,7 @@ remove_transparency: true
## will break any other readers, but in case it does, the fix can be
## turned off. This setting is not used if replace_br_with_p is
## true--replace_br_with_p also fixes the problem.
nook_img_fix:true
#nook_img_fix:false
## Some ebook readers (Moon+ Reader was reported) read <meta
## name="calibre:series"...> and <meta name="calibre:series_index"...>
@ -1351,6 +1422,10 @@ nook_img_fix:true
## under [defaults] or [epub].
#force_update_epub_always:false
## Mark epub as having right-to-left page progression direction.
## Useful for RtL languages such as Japanese.
#page_progression_direction_rtl:false
[html]
## include images from img tags in the body and summary of
@ -1524,20 +1599,28 @@ chaptertitles:Prologue,Chapter 1\, Xenos on Cinnabar,Chapter 2\, Sinmay on Kinti
[adult-fanfiction.org]
use_basic_cache:true
extra_valid_entries:eroticatags,disclaimer
eroticatags_label:Erotica Tags
disclaimer_label:Disclaimer
extra_titlepage_entries:eroticatags,disclaimer
[althistory.com]
## Note this is NOT the same as www.alternatehistory.com
## see [base_xenforoforum]
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
## NOTE: You will probably need to have recently logged in using your
## browser to solve a captcha before this will work in FFF.
#username:YourName
#password:yourpassword
[archiveofourown.org]
use_basic_cache:true
## This is a OTW-archive site.
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
@ -1590,7 +1673,8 @@ use_basic_cache:true
## series04,series04Url etc.
extra_valid_entries:fandoms, freeformtags, freefromtags,
ao3categories, comments, chapterslashtotal, chapterstotal, kudos,
hits, bookmarks, collections, byline, bookmarked, bookmarktags,
hits, bookmarks, collections, collectionsUrl, collectionsHTML,
byline, bookmarked, bookmarktags,
bookmarksummary, bookmarkprivate, bookmarkrec, subscribed,
markedforlater, restricted, series00, series01, series02, series03,
series00Url, series01Url, series02Url, series03Url, series00HTML,
@ -1605,6 +1689,7 @@ chapterstotal_label:Total Chapters
kudos_label:Kudos
hits_label:Hits
collections_label:Collections
collectionsHTML_label:Collections
## Count of bookmarks on story by all users
bookmarks_label:Bookmarks
## Tags & Summary from *your* bookmark on the story. Only collected
@ -1622,21 +1707,25 @@ series01HTML_label:Additional Series
series02HTML_label:Additional Series
series03HTML_label:Additional Series
## have to keep in order for name and URL to line up.
keep_in_order_collections:true
keep_in_order_collectionsUrl:true
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:series00,series01,series02,series03
make_linkhtml_entries:series00,series01,series02,series03,collections
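make_linkhtml_entries builds an entryHTML value from the entry name, its entryUrl, and the "<a class='%slink' href='%s'>%s</a>" template quoted above. Roughly (the series name and URL here are illustrative values, not real AO3 data):

```python
# Roughly how an entryHTML value is assembled from the entry name
# and its entryUrl, using the template quoted in the comment above.
template = "<a class='%slink' href='%s'>%s</a>"

entry = 'series00'
value = 'My Series'                                   # hypothetical
url = 'https://archiveofourown.org/series/12345'      # hypothetical
html = template % (entry, url, value)
```

This is why keep_in_order_collections / keep_in_order_collectionsUrl matter: each name must line up with its URL by position.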
## AO3 doesn't have anything it calls 'genre'. The adapter used to be
## hardcoded to include the site specific metadata freeformtags &
## ao3categories in the standard metadata field genre. By making it
## configurable, users can change it.
include_in_genre: freeformtags, ao3categories
include_in_genre: genre, freeformtags, ao3categories
## AO3 uses the word 'category' differently than most sites. The
## adapter used to be hardcoded to include the site specific metadata
## fandom in the standard metadata field category. By making it
## configurable, users can change it.
include_in_category:fandoms
include_in_category:category,fandoms
## freeformtags was previously typo'ed as freefromtags. This way,
## freefromtags will still work for people who've used it.
@ -1678,7 +1767,7 @@ add_to_replace_metadata:
## AO3 is blocking people more aggressively. If you download fewer
## stories less often you can likely get by with reducing this sleep.
slow_down_sleep_time:2
slow_down_sleep_time:4
## AO3 allows users to archive stories they didn't write in certain
## cases. These are indicated by showing a byline such as:
@ -1720,6 +1809,9 @@ extraships:Severus Snape/Hermione Granger
website_encodings:Windows-1252,utf8
[base_otw]
use_basic_cache:true
[bloodshedverse.com]
use_basic_cache:true
## website encoding(s). In theory, each website reports the character
@ -1842,7 +1934,7 @@ make_linkhtml_entries:translators,betas
## For most sites, 'category' is the fandom, but fanfics.me has
## fandoms and a separate category. By making it configurable, users
## can change it.
include_in_category:fandoms
include_in_category:category,fandoms
[fanfictalk.com]
use_basic_cache:true
@ -1920,6 +2012,9 @@ use_basic_cache:true
## browser cache will only be used if use_browser_cache:true and ONLY
## for a few sites. Requires a browser_cache_path set in
## [defaults].
## This setting also accepts 'directimages' if you want to bypass the
## BrowserCache when downloading images, even if use_browser_cache_only
## is true. This is useful for sites that deliver images with a
## no-cache attribute.
#use_browser_cache:false
## use_browser_cache_only:true prevents FFF from falling through to
@ -1941,7 +2036,7 @@ add_to_output_css:
white-space: pre-wrap;
}
extra_valid_entries:dedication,authorcomment,likes,follows,reviews,numCollections,pages,numAwards,classification
extra_valid_entries:dedication,authorcomment,likes,follows,reviews,numCollections,pages,numAwards,classification,universe
dedication_label:Dedication
authorcomment_label:Author Comment
@ -1953,6 +2048,7 @@ pages_label:Pages
numAwards_label:Awards
awards_label:Awards
classification_label:FBN Category
universe_label:Universe
add_to_wide_titlepage_entries:,dedication,authorcomment
add_to_comma_entries:,likes,follows,reviews,numAwards,numCollections
@ -1960,6 +2056,22 @@ add_to_comma_entries:,likes,follows,reviews,numAwards,numCollections
datePublished_format:%%Y-%%m-%%d %%H:%%M
dateUpdated_format:%%Y-%%m-%%d %%H:%%M
## Ficbook chapters can include headnotes and footnotes. We've
## traditionally included them all in the chapter text, but this
## allows you to customize which you include. Copy this parameter to
## your personal.ini and list the ones you don't want.
#exclude_notes:headnotes,footnotes
[ficbook.net:txt]
## ficbook uses CSS white-space: pre-wrap instead of tags for
## paragraph breaks. This doesn't carry over to txt output. This
## site-specific feature replaces \n in chapter text and
## headnotes,footnotes only. Only applied if the known white-space
## classes are present and removes those classes.
## Can also be used in [ficbook.net] or [ficbook.net:epub] if you want
## to download epub and convert to text.
replace_text_formatting:true
[fiction.live]
## Recommended if you include images, fiction.live tends to have many
## duplicated images.
@ -1989,6 +2101,9 @@ likes_label:Likes
reader_input_label:Reader Input
keep_in_order_tags:true
# Choose whether to include Appendix chapters
include_appendices:true
add_to_keep_html_attrs:,style
add_to_output_css:
@ -2037,7 +2152,9 @@ datePublished_format:%%Y-%%m-%%d %%H:%%M:%%S
[fictionmania.tv]
use_basic_cache:true
slow_down_sleep_time:10
website_encodings:ISO-8859-1,auto
user_agent:
## Extra metadata that this adapter knows about. See [archiveofourown.org]
## for examples of how to use them.
@ -2068,66 +2185,6 @@ use_basic_cache:true
#username:YourName
#password:yourpassword
[finestories.com]
use_basic_cache:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
## finestories.com has started requiring login by email rather than
## pen name.
#username:youremail@yourdomain.dom
#password:yourpassword
## dateUpdated/datePublished don't usually have time, but they do on this site.
## http://docs.python.org/library/datetime.html#strftime-strptime-behavior
## Note that ini format requires % to be escaped as %%.
dateUpdated_format:%%Y-%%m-%%d %%H:%%M:%%S
datePublished_format:%%Y-%%m-%%d %%H:%%M:%%S
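The %% escaping mentioned above is required because ini-style interpolation treats a bare % specially; once the value is read back, %%Y collapses to %Y for strftime. A stdlib sketch:

```python
import configparser
from datetime import datetime

# '%' must be doubled in the ini because configparser interpolation
# treats bare '%' specially; reading the value back yields '%Y-...'.
cfg = configparser.ConfigParser()
cfg.read_string("""
[finestories.com]
dateUpdated_format:%%Y-%%m-%%d %%H:%%M:%%S
""")
fmt = cfg.get('finestories.com', 'dateUpdated_format')
stamp = datetime(2026, 5, 1, 9, 38, 27).strftime(fmt)
```

A single unescaped % in the ini would instead raise an interpolation error when the value is fetched.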
## Clear FanFiction from defaults, site is original fiction.
extratags:
extra_valid_entries:size,universe,universeUrl,universeHTML,sitetags,notice,codes,score
#extra_titlepage_entries:size,universeHTML,sitetags,notice,score
include_in_codes:sitetags
## adds to include_subject_tags instead of replacing it.
#extra_subject_tags:sitetags
size_label:Size
universe_label:Universe
universeUrl_label:Universe URL
universeHTML_label:Universe
sitetags_label:Site Tags
notice_label:Notice
score_label:Score
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:universe
## finestories.com stories can be in a series and/or a universe. By
## default, series will be populated with the universe if there is
## universe but not series.
universe_as_series: true
## some sites include images that we don't ever want becoming the
## cover image. This lets you exclude them.
cover_exclusion_regexp:/css/bir.png
## This site uses shortened chapter titles in chapter lists. When set
## true, this will inject the site's full-length chapter title into
## the chapter text in a smaller h4 tag.
#inject_chapter_title:false
## append_datepublished_to_storyurl literally appends
## datePublished(-%Y-%m-%d) to storyUrl. This is an ugly kludge to
## (hopefully) help address the site's unfortunate habit of
## *reusing* storyId numbers. Off by default to *not* cause weirdness
## for those not expecting it.
#append_datepublished_to_storyurl:false
[forum.questionablequesting.com]
## see [base_xenforoforum]
@ -2138,6 +2195,13 @@ cover_exclusion_regexp:/css/bir.png
#username:YourName
#password:yourpassword
## forum.questionablequesting.com allows a larger maximum
## threadmarks_per_page than other XF2 sites.
threadmarks_per_page:400
## forum.questionablequesting.com shows more posts per reader page than other XF2 sites.
reader_posts_per_page:30
[forums.spacebattles.com]
## see [base_xenforoforum]
@ -2248,20 +2312,33 @@ extracategories:Lois & Clark: The New Adventures of Superman
[literotica.com]
use_basic_cache:true
slow_down_sleep_time:1
user_agent:
extra_valid_entries:eroticatags,averrating
eroticatags_label:Erotica Tags
averrating_label:Average Rating
extra_titlepage_entries:eroticatags,averrating
## Chapters can be in different categories. Default to not using all
## to be consistent with previous version.
chapter_categories_use_all: false
## For multiple chapter stories, attempt to clean up the chapter title. This will
## remove the story title and change "Ch. 01" to "Chapter 1", "Pt. 01" to "Part 1"
## or just use the text. If this can't be done, the full title is used.
clean_chapter_titles: false
#clean_chapter_titles: false
## For stories, collect tags from individual chapter pages in addition to the
## series page tags. This allows collection of tags beyond the top 10 on the series but
## if the author updates tags on a chapter and not the series, those tags may persist even if
## the chapter is not fetched during an update.
## Default is false to maintain previous behavior.
#tags_from_chapters: false
## For multi-chapter stories (series), use the chapter approval dates for datePublished
## and dateUpdated instead of the series metadata dates. This provides more accurate dates
## based on actual posting dates rather than just when the series metadata changes. This
## method can provide wildly different dates if chapters were written long before being
## approved, if chapters are approved out of order, or if the works were approved/updated
## before literotica's current series system was implemented.
## Default is false to maintain previous behavior.
#dates_from_chapters: false
## Some stories mistakenly include 'Ch' or 'Pt' at the end of the
## story title. Appears to be a site bug or common author error. Copy
@ -2270,7 +2347,13 @@ clean_chapter_titles: false
# title=> (Ch|Pt)$=>
## Add the chapter description at the start of each chapter.
description_in_chapter: false
#description_in_chapter: false
## Author's stories are now hidden behind a 'Show More' button.
## This option will attempt to fetch the stories.
#fetch_stories_from_api: true
#include_chapter_descriptions_in_summary: false
## Clear FanFiction from defaults, site is original fiction.
extratags:Erotica
@ -2322,7 +2405,7 @@ averrating_label:Average Rating
## Clear FanFiction from defaults, site is original fiction.
extratags:
slow_down_sleep_time:2
slow_down_sleep_time:5
[novelonlinefull.com]
use_basic_cache:true
@ -2445,6 +2528,11 @@ use_basic_cache:true
#username:youremail@yourdomain.dom
#password:yourpassword
## In order to see some protected stories, login is required,
## but not indicated in any way we can detect. Requires valid
## username and password when true.
#always_login:true
## dateUpdated/datePublished don't usually have time, but they do on this site.
## http://docs.python.org/library/datetime.html#strftime-strptime-behavior
## Note that ini format requires % to be escaped as %%.
@ -2576,7 +2664,8 @@ extracategories:Buffy the Vampire Slayer
## series04,series04Url etc.
extra_valid_entries:fandoms, freeformtags, freefromtags,
ao3categories, comments, chapterslashtotal, chapterstotal, kudos,
hits, bookmarks, collections, byline, bookmarked, bookmarktags,
hits, bookmarks, collections, collectionsUrl, collectionsHTML,
byline, bookmarked, bookmarktags,
bookmarksummary, bookmarkprivate, bookmarkrec, subscribed,
markedforlater, restricted, series00, series01, series02, series03,
series00Url, series01Url, series02Url, series03Url, series00HTML,
@ -2591,6 +2680,7 @@ chapterstotal_label:Total Chapters
kudos_label:Kudos
hits_label:Hits
collections_label:Collections
collectionsHTML_label:Collections
## Count of bookmarks on story by all users
bookmarks_label:Bookmarks
## Tags & Summary from *your* bookmark on the story. Only collected
@ -2608,21 +2698,25 @@ series01HTML_label:Additional Series
series02HTML_label:Additional Series
series03HTML_label:Additional Series
## have to keep in order for name and URL to line up.
keep_in_order_collections:true
keep_in_order_collectionsUrl:true
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:series00,series01,series02,series03
make_linkhtml_entries:series00,series01,series02,series03,collections
## OTW doesn't have anything it calls 'genre'. The adapter used to be
## hardcoded to include the site specific metadata freeformtags &
## ao3categories in the standard metadata field genre. By making it
## configurable, users can change it.
include_in_genre: freeformtags, ao3categories
include_in_genre: genre, freeformtags, ao3categories
## OTW uses the word 'category' differently than most sites. The
## adapter used to be hardcoded to include the site specific metadata
## fandom in the standard metadata field category. By making it
## configurable, users can change it.
include_in_category:fandoms
include_in_category:category,fandoms
## freeformtags was previously typo'ed as freefromtags. This way,
## freefromtags will still work for people who've used it.
@ -2706,6 +2800,11 @@ slow_down_sleep_time:1
#username:youremail@yourdomain.dom
#password:yourpassword
## In order to see some protected stories, login is required,
## but not indicated in any way we can detect. Requires valid
## username and password when true.
#always_login:true
## dateUpdated/datePublished don't usually have time, but they do on this site.
## http://docs.python.org/library/datetime.html#strftime-strptime-behavior
## Note that ini format requires % to be escaped as %%.
@ -2755,6 +2854,71 @@ cover_exclusion_regexp:/css/bir.png
## for those not expecting it.
#append_datepublished_to_storyurl:false
[storyroom.com]
use_basic_cache:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
## storyroom.com has started requiring login by email rather than
## pen name.
#username:youremail@yourdomain.dom
#password:yourpassword
## In order to see some protected stories, login is required,
## but not indicated in any way we can detect. Requires valid
## username and password when true.
#always_login:true
## dateUpdated/datePublished don't usually have time, but they do on this site.
## http://docs.python.org/library/datetime.html#strftime-strptime-behavior
## Note that ini format requires % to be escaped as %%.
dateUpdated_format:%%Y-%%m-%%d %%H:%%M:%%S
datePublished_format:%%Y-%%m-%%d %%H:%%M:%%S
## Clear FanFiction from defaults, site is original fiction.
extratags:
extra_valid_entries:size,universe,universeUrl,universeHTML,sitetags,notice,codes,score
#extra_titlepage_entries:size,universeHTML,sitetags,notice,score
include_in_codes:sitetags
## adds to include_subject_tags instead of replacing it.
#extra_subject_tags:sitetags
size_label:Size
universe_label:Universe
universeUrl_label:Universe URL
universeHTML_label:Universe
sitetags_label:Site Tags
notice_label:Notice
score_label:Score
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:universe
## storyroom.com stories can be in a series and/or a universe. By
## default, series will be populated with the universe if there is
## universe but not series.
universe_as_series: true
## some sites include images that we don't ever want becoming the
## cover image. This lets you exclude them.
cover_exclusion_regexp:/css/bir.png
## This site uses shortened chapter titles in chapter lists. When set
## true, this will inject the site's full-length chapter title into
## the chapter text in a smaller h4 tag.
#inject_chapter_title:false
## append_datepublished_to_storyurl literally appends
## datePublished(-%Y-%m-%d) to storyUrl. This is an ugly kludge to
## (hopefully) help address the site's unfortunate habit of
## *reusing* storyId numbers. Off by default to *not* cause weirdness
## for those not expecting it.
#append_datepublished_to_storyurl:false
[superlove.sayitditto.net]
## This is a OTW-archive site. Note that ao3categories is still used,
## but labeled "superlove Categories".
@ -2807,7 +2971,8 @@ cover_exclusion_regexp:/css/bir.png
## series04,series04Url etc.
extra_valid_entries:fandoms, freeformtags, freefromtags,
ao3categories, comments, chapterslashtotal, chapterstotal, kudos,
hits, bookmarks, collections, byline, bookmarked, bookmarktags,
hits, bookmarks, collections, collectionsUrl, collectionsHTML,
byline, bookmarked, bookmarktags,
bookmarksummary, bookmarkprivate, bookmarkrec, subscribed,
markedforlater, restricted, series00, series01, series02, series03,
series00Url, series01Url, series02Url, series03Url, series00HTML,
@ -2822,6 +2987,7 @@ chapterstotal_label:Total Chapters
kudos_label:Kudos
hits_label:Hits
collections_label:Collections
collectionsHTML_label:Collections
## Count of bookmarks on story by all users
bookmarks_label:Bookmarks
## Tags & Summary from *your* bookmark on the story. Only collected
@ -2839,21 +3005,25 @@ series01HTML_label:Additional Series
series02HTML_label:Additional Series
series03HTML_label:Additional Series
## have to keep in order for name and URL to line up.
keep_in_order_collections:true
keep_in_order_collectionsUrl:true
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:series00,series01,series02,series03
make_linkhtml_entries:series00,series01,series02,series03,collections
## OTW doesn't have anything it calls 'genre'. The adapter used to be
## hardcoded to include the site specific metadata freeformtags &
## ao3categories in the standard metadata field genre. By making it
## configurable, users can change it.
include_in_genre: freeformtags, ao3categories
include_in_genre: genre, freeformtags, ao3categories
## OTW uses the word 'category' differently than most sites. The
## adapter used to be hardcoded to include the site specific metadata
## fandom in the standard metadata field category. By making it
## configurable, users can change it.
include_in_category:fandoms
include_in_category:category,fandoms
## freeformtags was previously typo'ed as freefromtags. This way,
## freefromtags will still work for people who've used it.
@ -2949,7 +3119,7 @@ extratags:
## here to build up composite metadata entries.
extra_valid_entries: fullgenre, biggenre, smallgenre,
imprint, warningtags, freeformtags, comments, reviews,
imprint, freeformtags, comments, reviews,
bookmarks, ratingpoints, overallpoints, bookmarked,
bookmarkcategory, bookmarkmemo, bookmarkprivate, subscribed
## Genres are only present on general stories.
@ -2965,14 +3135,6 @@ smallgenre_label:小ジャンル
## ムーンライトノベルズ - female demographic
## ミッドナイトノベルズ - other
imprint_label:掲載サイト
## Warnings are required flags but are displayed as tags
## R15 - some adult stories have this anyway
## ボーイズラブ - boy's love
## ガールズラブ - girl's love
## 残酷な描写あり - graphic depictions (of violence, bullying, etc)
## 異世界転生 - reincarnation in another world
## 異世界転移 - transmigration to another world
warningtags_label:必須キーワード
freeformtags_label:キーワード
comments_label:感想
reviews_label:レビュー
@ -2990,16 +3152,14 @@ bookmarkmemo_label:ブックマークメモ
bookmarkprivate_label:非公開ブックマーク
subscribed_label:更新通知
include_in_warnings: warningtags
include_in_genre: fullgenre
#include_in_genre: biggenre, smallgenre
include_in_genre: genre, fullgenre
#include_in_genre: genre, biggenre, smallgenre
## adds to titlepage_entries instead of replacing it.
#extra_titlepage_entries: fullgenre,biggenre,smallgenre,imprint,warningtags,freeformtags,comments,reviews,bookmarks,ratingpoints,overallpoints,bookmarked,bookmarkcategory,bookmarkmemo,bookmarkprivate,subscribed
#extra_titlepage_entries: fullgenre,biggenre,smallgenre,imprint,freeformtags,comments,reviews,bookmarks,ratingpoints,overallpoints,bookmarked,bookmarkcategory,bookmarkmemo,bookmarkprivate,subscribed
## adds to include_subject_tags instead of replacing it.
#extra_subject_tags: warningtags,freeformtags
#extra_subject_tags: freeformtags
## syosetu.com allows authors to group chapters (episodes) by titled sections.
## true prepends the section title to every episode title.
@ -3190,7 +3350,8 @@ extracategories:Star Trek
## series04,series04Url etc.
extra_valid_entries:fandoms, freeformtags, freefromtags,
ao3categories, comments, chapterslashtotal, chapterstotal, kudos,
hits, bookmarks, collections, byline, bookmarked, bookmarktags,
hits, bookmarks, collections, collectionsUrl, collectionsHTML,
byline, bookmarked, bookmarktags,
bookmarksummary, bookmarkprivate, bookmarkrec, subscribed,
markedforlater, restricted, series00, series01, series02, series03,
series00Url, series01Url, series02Url, series03Url, series00HTML,
@ -3205,6 +3366,7 @@ chapterstotal_label:Total Chapters
kudos_label:Kudos
hits_label:Hits
collections_label:Collections
collectionsHTML_label:Collections
## Count of bookmarks on story by all users
bookmarks_label:Bookmarks
## Tags & Summary from *your* bookmark on the story. Only collected
@ -3222,21 +3384,25 @@ series01HTML_label:Additional Series
series02HTML_label:Additional Series
series03HTML_label:Additional Series
## have to keep in order for name and URL to line up.
keep_in_order_collections:true
keep_in_order_collectionsUrl:true
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:series00,series01,series02,series03
make_linkhtml_entries:series00,series01,series02,series03,collections
## OTW doesn't have anything it calls 'genre'. The adapter used to be
## hardcoded to include the site specific metadata freeformtags &
## ao3categories in the standard metadata field genre. By making it
## configurable, users can change it.
include_in_genre: freeformtags, ao3categories
include_in_genre: genre, freeformtags, ao3categories
## OTW uses the word 'category' differently than most sites. The
## adapter used to be hardcoded to include the site specific metadata
## fandom in the standard metadata field category. By making it
## configurable, users can change it.
include_in_category:fandoms
include_in_category:category,fandoms
## freeformtags was previously typo'ed as freefromtags. This way,
## freefromtags will still work for people who've used it.
@ -3301,6 +3467,7 @@ slow_down_sleep_time:2
#datechapter_format:%%Y-%%m-%%d
[www.alternatehistory.com]
## Note this is NOT the same as althistory.com
## see [base_xenforoforum]
## Some sites require login (or login for some rated stories) The
@ -3310,6 +3477,19 @@ slow_down_sleep_time:2
#username:YourName
#password:yourpassword
## alternatehistory.com allows a smaller maximum threadmarks_per_page
## than other XF2 sites.
threadmarks_per_page:50
## Using cloudscraper can satisfy the first couple levels of
## Cloudflare bot-proofing, but not all levels. Older versions of
## OpenSSL will also raise problems, so versions of Calibre older than
## v5 will probably fail. Only a few sites are configured with
## use_cloudscraper:true by default, but it can be applied in other
## sites' ini sections. user_agent setting is ignored when
## use_cloudscraper:true
use_cloudscraper:true
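As the comment notes, the setting can also be applied in other sites' ini sections. A hypothetical example (the section name below is illustrative, not a site configured this way by default):

```ini
[www.example-fiction-site.com]
use_cloudscraper:true
## user_agent would be ignored while use_cloudscraper is true.
```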
[www.aneroticstory.com]
use_basic_cache:true
## Some sites do not require a login, but do require the user to
@ -3353,15 +3533,23 @@ upvotes_label:Upvotes
subscribers_label:Subscribers
views_label:Views
include_in_category:tags
include_in_category:category,tags
#extra_titlepage_entries:upvotes,subscribers,views
## This site uses shortened title chapters in chapter lists. When set
## true, this will inject the site's full length chapter title into
## the chapter text in a smaller h4 tag.
## the chapter text in a smaller h3 tag.
#inject_chapter_title:false
## This site allows a specific image per chapter. Many (most?)
## stories have the same image for every chapter, so it was originally
## not included. By setting inject_chapter_image:true you can include the
## chapter images. If both inject_chapter_title and
## inject_chapter_image are true, the title will appear above the
## image.
#inject_chapter_image:false
## This website removes certain HTML tags and portions of the story
## from subscriber-only stories. It is strongly recommended to turn
## this option on. This will automatically subscribe you to such
@ -3435,7 +3623,8 @@ keep_style_attr: false
## series04,series04Url etc.
extra_valid_entries:fandoms, freeformtags, freefromtags,
ao3categories, comments, chapterslashtotal, chapterstotal, kudos,
hits, bookmarks, collections, byline, bookmarked, bookmarktags,
hits, bookmarks, collections, collectionsUrl, collectionsHTML,
byline, bookmarked, bookmarktags,
bookmarksummary, bookmarkprivate, bookmarkrec, subscribed,
markedforlater, restricted, series00, series01, series02, series03,
series00Url, series01Url, series02Url, series03Url, series00HTML,
@ -3450,6 +3639,7 @@ chapterstotal_label:Total Chapters
kudos_label:Kudos
hits_label:Hits
collections_label:Collections
collectionsHTML_label:Collections
## Count of bookmarks on story by all users
bookmarks_label:Bookmarks
## Tags & Summary from *your* bookmark on the story. Only collected
@ -3467,21 +3657,25 @@ series01HTML_label:Additional Series
series02HTML_label:Additional Series
series03HTML_label:Additional Series
## have to keep in order for name and URL to line up.
keep_in_order_collections:true
keep_in_order_collectionsUrl:true
## Assume entryUrl, apply to "<a class='%slink' href='%s'>%s</a>" to
## make entryHTML.
make_linkhtml_entries:series00,series01,series02,series03
make_linkhtml_entries:series00,series01,series02,series03,collections
## OTW doesn't have anything it calls 'genre'. The adapter used to be
## hardcoded to include the site specific metadata freeformtags &
## ao3categories in the standard metadata field genre. By making it
## configurable, users can change it.
include_in_genre: freeformtags, ao3categories
include_in_genre: genre, freeformtags, ao3categories
## OTW uses the word 'category' differently than most sites. The
## adapter used to be hardcoded to include the site specific metadata
## fandom in the standard metadata field category. By making it
## configurable, users can change it.
include_in_category:fandoms
include_in_category:category,fandoms
## freeformtags was previously typo'ed as freefromtags. This way,
## freefromtags will still work for people who've used it.
@ -3793,6 +3987,9 @@ dateUpdated_format:%%Y-%%m-%%d %%H:%%M:%%S
[www.fimfiction.net]
use_basic_cache:true
slow_down_sleep_time: 2
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
@ -3826,6 +4023,14 @@ fix_fimf_blockquotes:true
## chapter text.
#include_author_notes:false
## This option aims to allow easy fetching of the stories on the
## bookshelf and exclude stories that are not in there.
## There are 3 options:
## false - fetch only the one page of stories provided by the user
## true - fetch all the stories from the point provided by the user
## legacy - fetch the stories the old way.
## legacy - fetch the stories the old way.
#scrape_bookshelf: false
## some sites include images that we don't ever want becoming the
## cover image. This lets you exclude them.
cover_exclusion_regexp:/images/emoticons/
@ -4102,6 +4307,12 @@ extracategories:Psych
use_basic_cache:true
extra_valid_entries:stars
## royalroad is a little unusual--it doesn't require user/pass, but the site
## keeps track of which chapters you've read. This way, on download,
## it thinks you're up to date.
#username:YourName
#password:yourpassword
#add_to_extra_titlepage_entries:,stars
## some sites include images that we don't ever want becoming the
@ -4149,6 +4360,9 @@ add_to_titlepage_entries:,views, averageWords, fandoms
## parameter to your personal.ini and list the ones you don't want.
#exclude_notes:authornotes,newsboxes,spoilers,footnotes
## this site is rate limiting
slow_down_sleep_time:5
[www.siye.co.uk]
use_basic_cache:true
## Site dedicated to these categories/characters/ships
@ -4159,6 +4373,13 @@ extraships:Harry Potter/Ginny Weasley
website_encodings:Windows-1252,utf8
[www.spiritfanfiction.com]
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
use_basic_cache:true
extra_valid_entries: freeformtags,bookmarks,comments,hits,kudos
@ -4207,15 +4428,16 @@ extracategories:Buffy: The Vampire Slayer
extracharacters:Buffy, Spike
extraships:Spike/Buffy
[www.swi.org.ru]
use_basic_cache:true
[www.the-sietch.com]
## see [base_xenforoforum]
## the-sietch.com shows more posts per reader page than other XF sites.
reader_posts_per_page:15
## the-sietch.com allows a smaller maximum threadmarks_per_page than
## other XF2 sites.
threadmarks_per_page:50
[www.thedelphicexpanse.com]
## Site dedicated to these categories/characters/ships
extracategories:Star Trek: Enterprise
@ -4336,11 +4558,3 @@ extracharacters:Wolverine,Rogue
website_encodings:Windows-1252,utf8
[www.wuxiaworld.xyz]
use_basic_cache:true
## Was wuxiaworld.co
## Note that wuxiaworld.co != wuxiaworld.com
## When dedup_order_chapter_list:true, use a heuristic algorithm
## specific to wuxiaworld.xyz order and dedup chapters.
dedup_order_chapter_list:false


@ -126,6 +126,7 @@ default_prefs['suppressauthorsort'] = False
default_prefs['suppresstitlesort'] = False
default_prefs['authorcase'] = False
default_prefs['titlecase'] = False
default_prefs['seriescase'] = False
default_prefs['setanthologyseries'] = False
default_prefs['mark'] = False
default_prefs['mark_success'] = True
@ -197,6 +198,11 @@ default_prefs['auto_reject_from_email'] = False
default_prefs['update_existing_only_from_email'] = False
default_prefs['download_from_email_immediately'] = False
#default_prefs['single_proc_jobs'] = True # setting and code removed
default_prefs['site_split_jobs'] = True
default_prefs['reconsolidate_jobs'] = True
def set_library_config(library_config,db,setting=PREFS_KEY_SETTINGS):
db.prefs.set_namespaced(PREFS_NAMESPACE,
setting,

File diff suppressed because it is too large (20 files)


@ -33,6 +33,9 @@ from .. import configurable as configurable
from . import base_adapter
from . import base_efiction_adapter
from . import adapter_test1
from . import adapter_test2
from . import adapter_test3
from . import adapter_test4
from . import adapter_fanfictionnet
from . import adapter_fictionalleyarchiveorg
from . import adapter_fictionpresscom
@ -65,7 +68,7 @@ from . import adapter_fanfiktionde
from . import adapter_themasquenet
from . import adapter_pretendercentrecom
from . import adapter_darksolaceorg
from . import adapter_finestoriescom
from . import adapter_storyroomcom
from . import adapter_dracoandginnycom
from . import adapter_wolverineandroguecom
from . import adapter_thehookupzonenet
@ -103,7 +106,6 @@ from . import adapter_fireflyfansnet
from . import adapter_trekfanfictionnet
from . import adapter_wwwutopiastoriescom
from . import adapter_sinfuldreamscomunicornfic
from . import adapter_sinfuldreamscomwhisperedmuse
from . import adapter_sinfuldreamscomwickedtemptation
from . import adapter_asianfanficscom
from . import adapter_mttjustoncenet
@ -116,10 +118,8 @@ from . import adapter_alternatehistorycom
from . import adapter_wattpadcom
from . import adapter_novelonlinefullcom
from . import adapter_wwwnovelallcom
from . import adapter_wuxiaworldxyz
from . import adapter_hentaifoundrycom
from . import adapter_mugglenetfanfictioncom
from . import adapter_swiorgru
from . import adapter_fanficsme
from . import adapter_fanfictalkcom
from . import adapter_scifistoriescom
@ -140,6 +140,7 @@ from . import adapter_touchfluffytail
from . import adapter_spiritfanfictioncom
from . import adapter_superlove
from . import adapter_cfaa
from . import adapter_althistorycom
## This bit of complexity allows adapters to be added by just adding
## importing. It eliminates the long if/else clauses we used to need
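The import-driven registration the comment describes can be sketched as follows. This is a hedged illustration, not FanFicFare's real registry code; the module and class names are invented, and only the `getClass()` convention is taken from the diff above.

```python
import types

def make_adapter_module(name, site):
    """Simulate an adapter module exposing the getClass() convention."""
    class Adapter:
        SITE = site
    mod = types.ModuleType(name)
    mod.getClass = lambda: Adapter
    return mod

# "Importing" an adapter module is all it takes to make it discoverable:
imported = [
    make_adapter_module('adapter_examplecom', 'example.com'),
    make_adapter_module('adapter_althistorycom', 'althistory.com'),
]

# Build the site -> adapter-class map by scanning whatever was imported,
# instead of maintaining a long if/else chain over site names.
registry = {m.getClass().SITE: m.getClass()
            for m in imported if hasattr(m, 'getClass')}
```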


@ -68,9 +68,7 @@ class AdultFanFictionOrgAdapter(BaseSiteAdapter):
# The date format will vary from site to site.
# http://docs.python.org/library/datetime.html#strftime-strptime-behavior
self.dateformat = "%Y-%m-%d"
self.dateformat = "%B %d, %Y"
## Added because adult-fanfiction.org does send you to
## www.adult-fanfiction.org when you go to it and it also moves
@ -139,91 +137,45 @@ class AdultFanFictionOrgAdapter(BaseSiteAdapter):
def getSiteURLPattern(self):
return r'https?://(anime|anime2|bleach|books|buffy|cartoon|celeb|comics|ff|games|hp|inu|lotr|manga|movies|naruto|ne|original|tv|xmen|ygo|yuyu)\.adult-fanfiction\.org/story\.php\?no=\d+$'
##This is not working right now, so I'm commenting it out, but leaving it for future testing
## Login seems to be reasonably standard across eFiction sites.
#def needToLoginCheck(self, data):
##This adapter will always require a login
# return True
# <form name="login" method="post" action="">
# <div class="top">E-mail: <span id="sprytextfield1">
# <input name="email" type="text" id="email" size="20" maxlength="255" />
# <span class="textfieldRequiredMsg">Email is required.</span><span class="textfieldInvalidFormatMsg">Invalid E-mail.</span></span></div>
# <div class="top">Password: <span id="sprytextfield2">
# <input name="pass1" type="password" id="pass1" size="20" maxlength="32" />
# <span class="textfieldRequiredMsg">password is required.</span><span class="textfieldMinCharsMsg">Minimum 8 characters8.</span><span class="textfieldMaxCharsMsg">Exceeded 32 characters.</span></span></div>
# <div class="top"><br /> <input name="loginsubmittop" type="hidden" id="loginsubmit" value="TRUE" />
# <input type="submit" value="Login" />
# </div>
# </form>
##This is not working right now, so I'm commenting it out, but leaving it for future testing
#def performLogin(self, url, soup):
# params = {}
# if self.password:
# params['email'] = self.username
# params['pass1'] = self.password
# else:
# params['email'] = self.getConfig("username")
# params['pass1'] = self.getConfig("password")
# params['submit'] = 'Login'
# # copy all hidden input tags to pick up appropriate tokens.
# for tag in soup.findAll('input',{'type':'hidden'}):
# params[tag['name']] = tag['value']
# logger.debug("Will now login to URL {0} as {1} with password: {2}".format(url, params['email'],params['pass1']))
# d = self.post_request(url, params, usecache=False)
# d = self.post_request(url, params, usecache=False)
# soup = self.make_soup(d)
#if not (soup.find('form', {'name' : 'login'}) == None):
# logger.info("Failed to login to URL %s as %s" % (url, params['email']))
# raise exceptions.FailedToLogin(url,params['email'])
# return False
#else:
# return True
## Getting the chapter list and the meta data, plus 'is adult' checking.
def doExtractChapterUrlsAndMetadata(self, get_cover=True):
## You need to have your is_adult set to true to get this story
if not (self.is_adult or self.getConfig("is_adult")):
raise exceptions.AdultCheckRequired(self.url)
else:
d = self.post_request('https://www.adult-fanfiction.org/globals/ajax/age-verify.php', {"verify":"1"})
if "Age verified successfully" not in d:
raise exceptions.FailedToDownload("Failed to Verify Age: {0}".format(d))
url = self.url
logger.debug("URL: "+url)
data = self.get_request(url)
# logger.debug(data)
if "The dragons running the back end of the site can not seem to find the story you are looking for." in data:
raise exceptions.StoryDoesNotExist("{0}.{1} says: The dragons running the back end of the site can not seem to find the story you are looking for.".format(self.zone, self.getBaseDomain()))
soup = self.make_soup(data)
##This is not working right now, so I'm commenting it out, but leaving it for future testing
#self.performLogin(url, soup)
## Title
## Some of the titles have a backslash on the story page, but not on the Author's page
## So I am removing it from the title, so it can be found on the Author's page further in the code.
## Also, some titles may have extra spaces ' ', and the search on the Author's page removes them,
## so I have to here as well. I used multiple replaces to make sure, since I did the same below.
a = soup.find('a', href=re.compile(r'story.php\?no='+self.story.getMetadata('storyId')+"$"))
self.story.setMetadata('title',stripHTML(a).replace('\\','').replace('  ',' ').replace('  ',' ').replace('  ',' ').strip())
h1 = soup.find('h1')
# logger.debug("Title:%s"%h1)
self.story.setMetadata('title',stripHTML(h1).replace('\\','').replace('  ',' ').replace('  ',' ').replace('  ',' ').strip())
# Find the chapters:
chapters = soup.find('ul',{'class':'dropdown-content'})
for i, chapter in enumerate(chapters.findAll('a')):
self.add_chapter(chapter,self.url+'&chapter='+unicode(i+1))
# Find the chapters from first list only
chapters = soup.select_one('select.chapter-select').select('option')
for chapter in chapters:
self.add_chapter(chapter,self.url+'&chapter='+chapter['value'])
# Find authorid and URL from... author url.
a = soup.find('a', href=re.compile(r"profile.php\?no=\d+"))
a = soup.find('a', href=re.compile(r"profile.php\?id=\d+"))
if a == None:
# I know that the original author of fanficfare wants to always have metadata,
# but I posit that if the story is there, even if we can't get the metadata from the
@ -232,140 +184,56 @@ class AdultFanFictionOrgAdapter(BaseSiteAdapter):
self.story.setMetadata('authorUrl','https://www.adult-fanfiction.org')
self.story.setMetadata('author','Unknown')
logger.warning('There was no author found for the story... Metadata will not be retrieved.')
self.setDescription(url,'>>>>>>>>>> No Summary Given <<<<<<<<<<')
self.setDescription(url,'>>>>>>>>>> No Summary Given, Unknown Author <<<<<<<<<<')
else:
self.story.setMetadata('authorId',a['href'].split('=')[1])
self.story.setMetadata('authorUrl',a['href'])
self.story.setMetadata('author',stripHTML(a))
##The story page does not give much Metadata, so we go to the Author's page
## The story page does not give much Metadata, so we go to
## the Author's page. Except it's actually a sub-request for the
## list of the author's stories for that subdomain
author_Url = 'https://members.{0}/load-user-stories.php?subdomain={1}&uid={2}'.format(
self.getBaseDomain(),
self.zone,
self.story.getMetadata('authorId'))
##Get the first Author page to see if there are multiple pages.
##AFF doesn't care if the page number is larger than the actual pages,
##it will continue to show the last page even if the variable is larger than the actual page
author_Url = '{0}&view=story&zone={1}&page=1'.format(self.story.getMetadata('authorUrl'), self.zone)
#author_Url = self.story.getMetadata('authorUrl')+'&view=story&zone='+self.zone+'&page=1'
##I'm resetting the author page to the zone for this story
self.story.setMetadata('authorUrl',author_Url)
logger.debug('Getting the author page: {0}'.format(author_Url))
logger.debug('Getting the load-user-stories page: {0}'.format(author_Url))
adata = self.get_request(author_Url)
if "The member you are looking for does not exist." in adata:
raise exceptions.StoryDoesNotExist("{0}.{1} says: The member you are looking for does not exist.".format(self.zone, self.getBaseDomain()))
#raise exceptions.StoryDoesNotExist(self.zone+'.'+self.getBaseDomain() +" says: The member you are looking for does not exist.")
none_found = "No stories found in this category."
if none_found in adata:
raise exceptions.StoryDoesNotExist("{0}.{1} says: {2}".format(self.zone, self.getBaseDomain(), none_found))
asoup = self.make_soup(adata)
# logger.debug(asoup)
##Getting the number of author pages
pages = 0
pagination=asoup.find('ul',{'class' : 'pagination'})
if pagination:
pages = pagination.findAll('li')[-1].find('a')
if not pages == None:
pages = pages['href'].split('=')[-1]
else:
pages = 0
story_card = asoup.select_one('div.story-card:has(a[href="{0}"])'.format(url))
# logger.debug(story_card)
storya = None
##If there is only 1 page of stories, check it to get the Metadata,
if pages == 0:
a = asoup.findAll('li')
for lc2 in a:
if lc2.find('a', href=re.compile(r'story.php\?no='+self.story.getMetadata('storyId')+"$")):
storya = lc2
break
## otherwise go through the pages
else:
page=1
i=0
while i == 0:
##We already have the first page, so if this is the first time through, skip getting the page
if page != 1:
author_Url = '{0}&view=story&zone={1}&page={2}'.format(self.story.getMetadata('authorUrl'), self.zone, unicode(page))
logger.debug('Getting the author page: {0}'.format(author_Url))
adata = self.get_request(author_Url)
##This will probably never be needed, since AFF doesn't seem to care what number you put as
## the page number, it will default to the last page, even if you use 1000, for an author
## that only has 5 pages of stories, but I'm keeping it in to appease Saint Justin Case (just in case).
if "The member you are looking for does not exist." in adata:
raise exceptions.StoryDoesNotExist("{0}.{1} says: The member you are looking for does not exist.".format(self.zone, self.getBaseDomain()))
# we look for the li element that has the story here
asoup = self.make_soup(adata)
## Category
## I've only seen one category per story so far, but just in case:
for cat in story_card.select('div.story-card-category'):
# remove Category:, old code suggests Located: is also
# possible, so removing by <strong>
cat.find("strong").decompose()
self.story.addToList('category',stripHTML(cat))
a = asoup.findAll('li')
for lc2 in a:
if lc2.find('a', href=re.compile(r'story.php\?no='+self.story.getMetadata('storyId')+"$")):
i=1
storya = lc2
break
page = page + 1
if page > int(pages):
break
self.setDescription(url,story_card.select_one('div.story-card-description'))
##Split the Metadata up into a list
##We have to change the soup type to a string, then remove the newlines, and double spaces,
##then change the <br/> to '-:-', which separates the different elements.
##Then we strip the HTML elements from the string.
##There is also a double <br/>, so we have to fix that, then remove the leading and trailing '-:-'.
##They are always in the same order.
## EDIT 09/26/2016: Had some trouble with unicode errors... so I had to put in the decode/encode parts to fix it
liMetadata = unicode(storya).replace('\n','').replace('\r','').replace('\t',' ').replace('  ',' ').replace('  ',' ').replace('  ',' ')
liMetadata = stripHTML(liMetadata.replace(r'<br/>','-:-').replace('<!-- <br /-->','-:-'))
liMetadata = liMetadata.strip('-:-').strip('-:-').encode('utf-8')
for i, value in enumerate(liMetadata.decode('utf-8').split('-:-')):
if i == 0:
# The value for the title has been manipulated, so may not be the same as gotten at the start.
# I'm going to use the href from the storya retrieved from the author's page to determine if it is correct.
if storya.find('a', href=re.compile(r'story.php\?no='+self.story.getMetadata('storyId')+"$"))['href'] != url:
raise exceptions.StoryDoesNotExist('Did not find story in author story list: {0}'.format(author_Url))
elif i == 1:
##Get the description
self.setDescription(url,stripHTML(value.strip()))
else:
# the rest of the values can be missing, so instead of hardcoding the numbers, we search for them.
if 'Located :' in value:
self.story.setMetadata('category',value.replace(r'&gt;',r'>').replace(r'Located :',r'').strip())
elif 'Category :' in value:
# Get the Category
self.story.setMetadata('category',value.replace(r'&gt;',r'>').replace(r'Located :',r'').strip())
elif 'Content Tags :' in value:
# Get the Erotic Tags
value = stripHTML(value.replace(r'Content Tags :',r'')).strip()
for code in re.split(r'\s',value):
self.story.addToList('eroticatags',code)
elif 'Posted :' in value:
# Get the Posted Date
value = value.replace(r'Posted :',r'').strip()
if value.startswith('008'):
# It is unknown how the 200 became 008, but I'm going to change it back here
value = value.replace('008','200')
elif value.startswith('0000'):
# Since the date is showing as 0000,
# I'm going to put the memberdate here
value = asoup.find('div',{'id':'contentdata'}).find('p').get_text(strip=True).replace('Member Since','').strip()
self.story.setMetadata('datePublished', makeDate(stripHTML(value), self.dateformat))
elif 'Edited :' in value:
# Get the 'Updated' Edited date
# AFF has the time for the Updated date, and we only want the date,
# so we take the first 10 characters only
value = value.replace(r'Edited :',r'').strip()[0:10]
if value.startswith('008'):
# It is unknown how the 200 became 008, but I'm going to change it back here
value = value.replace('008','200')
self.story.setMetadata('dateUpdated', makeDate(stripHTML(value), self.dateformat))
elif value.startswith('0000') or '-00-' in value:
# Since the date is showing as 0000,
# or there is -00- in the date,
# I'm going to put the Published date here
self.story.setMetadata('dateUpdated', self.story.getMetadata('datePublished'))
else:
self.story.setMetadata('dateUpdated', makeDate(stripHTML(value), self.dateformat))
else:
# This catches the blank elements, and the Review and Dragon Prints.
# I am not interested in these, so do nothing
zzzzzzz=0
for tag in story_card.select('span.story-tag'):
self.story.addToList('eroticatags',stripHTML(tag))
## created/updates share formatting
for meta in story_card.select('div.story-card-meta-item span:last-child'):
meta = stripHTML(meta)
if 'Created: ' in meta:
meta = meta.replace('Created: ','')
self.story.setMetadata('datePublished', makeDate(meta, self.dateformat))
if 'Updated: ' in meta:
meta = meta.replace('Updated: ','')
self.story.setMetadata('dateUpdated', makeDate(meta, self.dateformat))
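The adapter's `dateformat` was switched to `"%B %d, %Y"` earlier in this diff, and the loop above parses the `Created:`/`Updated:` meta strings with it. A minimal sketch, assuming `makeDate` wraps strptime-style parsing (the helper below and the meta string are stand-ins, not FanFicFare's code):

```python
from datetime import datetime

def make_date(text, fmt):
    # stand-in for FanFicFare's makeDate helper (assumption)
    return datetime.strptime(text, fmt)

dateformat = '%B %d, %Y'   # the format the adapter now expects
meta = 'Created: March 5, 2024'
if 'Created: ' in meta:
    published = make_date(meta.replace('Created: ', ''), dateformat)
```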
# grab the text for an individual chapter.
def getChapterText(self, url):
@ -373,10 +241,11 @@ class AdultFanFictionOrgAdapter(BaseSiteAdapter):
logger.debug('Getting chapter text from: %s' % url)
soup = self.make_soup(self.get_request(url))
chaptertag = soup.find('ul',{'class':'pagination'}).parent.parent.parent.findNextSibling('li')
chaptertag = soup.select_one('div.chapter-body')
if None == chaptertag:
raise exceptions.FailedToDownload("Error downloading Chapter: {0}! Missing required element!".format(url))
# Change td to a div.
chaptertag.name='div'
## chapter text includes a copy of story title, author,
## chapter title, & eroticatags specific to the chapter. Did
## before, too.
return self.utf8FromSoup(url,chaptertag)


@ -0,0 +1,40 @@
# -*- coding: utf-8 -*-
# Copyright 2026 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from __future__ import absolute_import
import re
from .base_xenforo2forum_adapter import BaseXenForo2ForumAdapter
def getClass():
return AltHistoryComAdapter
## NOTE: This is a different site than www.alternatehistory.com.
class AltHistoryComAdapter(BaseXenForo2ForumAdapter):
def __init__(self, config, url):
BaseXenForo2ForumAdapter.__init__(self, config, url)
# Each adapter needs to have a unique site abbreviation.
self.story.setMetadata('siteabbrev','ahc')
@staticmethod # must be @staticmethod, don't remove it.
def getSiteDomain():
# The site domain. Does have www here, if it uses it.
return 'althistory.com'


@ -49,8 +49,21 @@ class ArchiveOfOurOwnOrgAdapter(BaseOTWAdapter):
return ['archiveofourown.org',
'archiveofourown.com',
'archiveofourown.net',
'archiveofourown.gay',
'download.archiveofourown.org',
'download.archiveofourown.com',
'download.archiveofourown.net',
'ao3.org',
]
def mod_url_request(self, url):
return url
def mod_url_request(self, url):
## add / to *not* replace media.archiveofourown.org
if self.getConfig("use_archive_transformativeworks_org",False):
return url.replace("/archiveofourown.org","/archive.transformativeworks.org")
elif self.getConfig("use_archiveofourown_gay",False):
return url.replace("/archiveofourown.org","/archiveofourown.gay")
else:
return url
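The comment above explains why the replaced string starts with `/`: it keeps subdomains like media.archiveofourown.org from being rewritten. A small standalone demonstration of that behavior (the simplified function below covers only the transformativeworks.org branch):

```python
def mod_url_request(url):
    # Leading '/' means only the bare domain matches; a subdomain like
    # media.archiveofourown.org is preceded by '.', not '/', so it is
    # left untouched.
    return url.replace('/archiveofourown.org',
                       '/archive.transformativeworks.org')

main = mod_url_request('https://archiveofourown.org/works/123')
media = mod_url_request('https://media.archiveofourown.org/img.png')
```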


@ -92,7 +92,7 @@ class ASexStoriesComAdapter(BaseSiteAdapter):
self.story.setMetadata('title', title.string)
# Author
author = soup1.find('div',{'class':'story-info'}).findAll('div',{'class':'story-info-bl'})[1].find('a')
author = soup1.find('div',{'class':'story-info'}).find_all('div',{'class':'story-info-bl'})[1].find('a')
authorurl = author['href']
self.story.setMetadata('author', author.string)
self.story.setMetadata('authorUrl', authorurl)
@ -112,7 +112,7 @@ class ASexStoriesComAdapter(BaseSiteAdapter):
### add it before the rest of the pages, if any
self.add_chapter('1', self.url)
chapterTable = soup1.find('div',{'class':'pages'}).findAll('a')
chapterTable = soup1.find('div',{'class':'pages'}).find_all('a')
if chapterTable is not None:
# Multi-chapter story
@ -124,7 +124,7 @@ class ASexStoriesComAdapter(BaseSiteAdapter):
self.add_chapter(chapterTitle, chapterUrl)
rated = soup1.find('div',{'class':'story-info'}).findAll('div',{'class':'story-info-bl5'})[0].find('img')['title'].replace('- Rate','').strip()
rated = soup1.find('div',{'class':'story-info'}).find_all('div',{'class':'story-info-bl5'})[0].find('img')['title'].replace('- Rate','').strip()
self.story.setMetadata('rating',rated)
self.story.setMetadata('dateUpdated', makeDate('01/01/2001', '%m/%d/%Y'))


@ -48,7 +48,7 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
# normalized story URL.
self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))
self._setURL('https://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))
# Each adapter needs to have a unique site abbreviation.
self.story.setMetadata('siteabbrev','asph')
@@ -64,10 +64,10 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
@classmethod
def getSiteExampleURLs(cls):
return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"
return "https://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"
def getSiteURLPattern(self):
return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"
return r"https?://"+re.escape(self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"
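The rewritten pattern accepts either scheme while `re.escape` still protects the literal `.` and `?` in the path. A quick check, using a stand-in domain (illustrative, not asserted to be the adapter's real domain):

```python
import re

domain = "ashwinder.sycophanthex.com"  # hypothetical example domain
pattern = r"https?://" + re.escape(domain + "/viewstory.php?sid=") + r"\d+$"

assert re.match(pattern, "http://" + domain + "/viewstory.php?sid=1234")
assert re.match(pattern, "https://" + domain + "/viewstory.php?sid=1234")
# the escaped '?' no longer acts as a regex quantifier:
assert not re.match(pattern, "https://" + domain + "/viewstory.phpsid=1234")
```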
## Login seems to be reasonably standard across eFiction sites.
def needToLoginCheck(self, data):
@@ -92,7 +92,7 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
params['intent'] = ''
params['submit'] = 'Submit'
loginUrl = 'http://' + self.getSiteDomain() + '/user.php'
loginUrl = 'https://' + self.getSiteDomain() + '/user.php'
logger.debug("Will now login to URL (%s) as (%s)" % (loginUrl,
params['penname']))
@@ -130,20 +130,20 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
# Find authorid and URL from... author url.
a = soup.find('a', href=re.compile(r"viewuser.php\?uid=\d+"))
self.story.setMetadata('authorId',a['href'].split('=')[1])
self.story.setMetadata('authorUrl','http://'+self.host+'/'+a['href'])
self.story.setMetadata('authorUrl','https://'+self.host+'/'+a['href'])
self.story.setMetadata('author',a.string)
asoup = self.make_soup(self.get_request(self.story.getMetadata('authorUrl')))
try:
# in case link points somewhere other than the first chapter
a = soup.findAll('option')[1]['value']
a = soup.find_all('option')[1]['value']
self.story.setMetadata('storyId',a.split('=',)[1])
url = 'http://'+self.host+'/'+a
url = 'https://'+self.host+'/'+a
soup = self.make_soup(self.get_request(url))
except:
pass
for info in asoup.findAll('table', {'width' : '100%', 'bordercolor' : re.compile(r'#')}):
for info in asoup.find_all('table', {'width' : '100%', 'bordercolor' : re.compile(r'#')}):
a = info.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
if a != None:
self.story.setMetadata('title',stripHTML(a))
@@ -151,13 +151,13 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
# Find the chapters:
chapters=soup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1$'))
chapters=soup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1$'))
if len(chapters) == 0:
self.add_chapter(self.story.getMetadata('title'),url)
else:
for chapter in chapters:
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href'])
self.add_chapter(chapter,'https://'+self.host+'/'+chapter['href'])
# eFiction sites don't help us out a lot with their meta data
@@ -170,7 +170,7 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
except:
return ""
cats = info.findAll('a',href=re.compile('categories.php'))
cats = info.find_all('a',href=re.compile('categories.php'))
for cat in cats:
self.story.addToList('category',cat.string)
@@ -188,7 +188,7 @@ class AshwinderSycophantHexComAdapter(BaseSiteAdapter):
## <td><span class="sb"><b>Published:</b> 04/08/2007</td>
## one story had <b>Updated...</b> in the description. Restrict to sub-table
labels = info.find('table').findAll('b')
labels = info.find('table').find_all('b')
for labelspan in labels:
value = labelspan.nextSibling
label = stripHTML(labelspan)


@@ -147,7 +147,7 @@ class AsianFanFicsComAdapter(BaseSiteAdapter):
# Find authorid and URL from... author url.
mainmeta = soup.find('footer', {'class': 'main-meta'})
alist = mainmeta.find('span', string='Author(s)')
alist = alist.parent.findAll('a', href=re.compile(r"/profile/u/[^/]+"))
alist = alist.parent.find_all('a', href=re.compile(r"/profile/u/[^/]+"))
for a in alist:
self.story.addToList('authorId',a['href'].split('/')[-1])
self.story.addToList('authorUrl','https://'+self.host+a['href'])
@@ -159,10 +159,10 @@ class AsianFanFicsComAdapter(BaseSiteAdapter):
chapters=soup.find('select',{'name':'chapter-nav'})
hrefattr=None
if chapters:
chapters=chapters.findAll('option')
chapters=chapters.find_all('option')
hrefattr='value'
else: # didn't find <select name='chapter-nav', look for alternative
chapters=soup.find('div',{'class':'widget--chapters'}).findAll('a')
chapters=soup.find('div',{'class':'widget--chapters'}).find_all('a')
hrefattr='href'
for index, chapter in enumerate(chapters):
if chapter.text != 'Foreword' and 'Collapse chapters' not in chapter.text:
@@ -202,7 +202,7 @@ class AsianFanFicsComAdapter(BaseSiteAdapter):
# story tags
a = mainmeta.find('span',string='Tags')
if a:
tags = a.parent.findAll('a')
tags = a.parent.find_all('a')
for tag in tags:
self.story.addToList('tags', tag.text)
@@ -230,7 +230,7 @@ class AsianFanFicsComAdapter(BaseSiteAdapter):
# upvote, subs, and views
a = soup.find('div',{'class':'title-meta'})
spans = a.findAll('span', recursive=False)
spans = a.find_all('span', recursive=False)
self.story.setMetadata('upvotes', re.search(r'\(([^)]+)', spans[0].find('span').text).group(1))
self.story.setMetadata('subscribers', re.search(r'\(([^)]+)', spans[1].find('span').text).group(1))
if len(spans) > 2: # views can be private
@@ -270,11 +270,21 @@ class AsianFanFicsComAdapter(BaseSiteAdapter):
content = soup.find('div', {'id': 'user-submitted-body'})
if self.getConfig('inject_chapter_image'):
logger.debug("Injecting chapter image")
imgdiv = soup.select_one('div#bodyText div.bot-spacer')
if imgdiv:
content.insert(0, "\n")
content.insert(0, imgdiv)
content.insert(0, "\n")
if self.getConfig('inject_chapter_title'):
logger.debug("Injecting full-length chapter title")
title = soup.find('h1', {'id' : 'chapter-title'}).text
newTitle = soup.new_tag('h3')
newTitle.string = title
content.insert(0, "\n")
content.insert(0, newTitle)
content.insert(0, "\n")
return self.utf8FromSoup(url,content)
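Each `content.insert(0, ...)` call above pushes to the front, so issuing the inserts in the order newline, title, newline leaves them in document order ahead of the body. Illustrated with a plain list standing in for the soup tag:

```python
content = ["original chapter body"]
for piece in ["\n", "<h3>Full Chapter Title</h3>", "\n"]:
    # same call order as the adapter: each insert lands at index 0,
    # pushing earlier inserts (and the body) down one slot
    content.insert(0, piece)

print(content)
# ['\n', '<h3>Full Chapter Title</h3>', '\n', 'original chapter body']
```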


@@ -126,7 +126,7 @@ class BDSMLibraryComSiteAdapter(BaseSiteAdapter):
# Find the chapters:
# The update date is with the chapter links... so we will update it here as well
for chapter in soup.findAll('a', href=re.compile(r'/stories/chapter.php\?storyid='+self.story.getMetadata('storyId')+r"&chapterid=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'/stories/chapter.php\?storyid='+self.story.getMetadata('storyId')+r"&chapterid=\d+$")):
value = chapter.findNext('td').findNext('td').string.replace('(added on','').replace(')','').strip()
self.story.setMetadata('dateUpdated', makeDate(value, self.dateformat))
self.add_chapter(chapter,'https://'+self.getSiteDomain()+chapter['href'])
@@ -134,11 +134,11 @@ class BDSMLibraryComSiteAdapter(BaseSiteAdapter):
# Get the MetaData
# Erotica Tags
tags = soup.findAll('a',href=re.compile(r'/stories/search.php\?selectedcode'))
tags = soup.find_all('a',href=re.compile(r'/stories/search.php\?selectedcode'))
for tag in tags:
self.story.addToList('eroticatags',tag.text)
for td in soup.findAll('td'):
for td in soup.find_all('td'):
if len(td.text)>0:
if 'Added on:' in td.text and '<table' not in unicode(td):
value = td.text.replace('Added on:','').strip()
@@ -169,20 +169,20 @@ class BDSMLibraryComSiteAdapter(BaseSiteAdapter):
raise exceptions.FailedToDownload("Error downloading Chapter: {0}! Missing required element!".format(url))
#strip comments from soup
[comment.extract() for comment in chaptertag.findAll(string=lambda text:isinstance(text, Comment))]
[comment.extract() for comment in chaptertag.find_all(string=lambda text:isinstance(text, Comment))]
# BDSM Library basically wraps its own html around the document,
# so we will be removing the script, title and meta content from the
# storyblock
for tag in chaptertag.findAll('head') + chaptertag.findAll('style') + chaptertag.findAll('title') + chaptertag.findAll('meta') + chaptertag.findAll('o:p') + chaptertag.findAll('link'):
for tag in chaptertag.find_all('head') + chaptertag.find_all('style') + chaptertag.find_all('title') + chaptertag.find_all('meta') + chaptertag.find_all('o:p') + chaptertag.find_all('link'):
tag.extract()
for tag in chaptertag.findAll('o:smarttagtype'):
for tag in chaptertag.find_all('o:smarttagtype'):
tag.name = 'span'
## I'm going to take the attributes off all of the tags
## because they usually refer to the style that we removed above.
for tag in chaptertag.findAll(True):
for tag in chaptertag.find_all(True):
tag.attrs = None
return self.utf8FromSoup(url,chaptertag)


@@ -157,9 +157,6 @@ class BloodshedverseComAdapter(BaseSiteAdapter):
self.story.addToList('warnings', warning)
elif key == 'Chapters':
self.story.setMetadata('numChapters', int(value))
elif key == 'Words':
# Apparently only numChapters needs to be an integer for
# some strange reason. Remove possible ',' characters as to
@@ -174,7 +171,7 @@ class BloodshedverseComAdapter(BaseSiteAdapter):
# ugly %p(am/pm) hack moved into makeDate so other sites can use it.
self.story.setMetadata('dateUpdated', date)
if self.story.getMetadata('rating') == 'NC-17' and not (self.is_adult or self.getConfig('is_adult')):
if self.story.getMetadataRaw('rating') == 'NC-17' and not (self.is_adult or self.getConfig('is_adult')):
raise exceptions.AdultCheckRequired(self.url)
def getChapterText(self, url):


@@ -116,7 +116,7 @@ class ChaosSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('rating', rating)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@@ -134,7 +134,7 @@ class ChaosSycophantHexComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
value = labels[0].previousSibling
svalue = ""
@@ -154,22 +154,22 @@ class ChaosSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value.split(' -')[0])
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -194,7 +194,7 @@ class ChaosSycophantHexComAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@@ -88,8 +88,8 @@ class ChireadsComSiteAdapter(BaseSiteAdapter):
intro = stripHTML(info.select_one('.inform-inform-txt').span)
self.setDescription(self.url, intro)
for content in soup.findAll('div', {'id': 'content'}):
for a in content.findAll('a'):
for content in soup.find_all('div', {'id': 'content'}):
for a in content.find_all('a'):
self.add_chapter(a.get_text(), a['href'])


@@ -98,7 +98,7 @@ class ChosenTwoFanFicArchiveAdapter(BaseSiteAdapter):
## Title
## Some stories have a banner that has its own a tag before the actual text title...
## so I'm checking the pagetitle div for all a tags that match the criteria, then taking the last.
a = soup.find('div',{'id':'pagetitle'}).findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))[-1]
a = soup.find('div',{'id':'pagetitle'}).find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))[-1]
self.story.setMetadata('title',stripHTML(a))
# Find authorid and URL from... author url.
@@ -110,7 +110,7 @@ class ChosenTwoFanFicArchiveAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
#self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href'])
self.add_chapter(chapter,'https://{0}/{1}{2}'.format(self.host, chapter['href'],addURL))
@@ -127,7 +127,7 @@ class ChosenTwoFanFicArchiveAdapter(BaseSiteAdapter):
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
val = labelspan.nextSibling
value = unicode('')
@@ -149,27 +149,27 @@ class ChosenTwoFanFicArchiveAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', stripHTML(value))
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Pairing' in label:
ships = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=4'))
ships = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=4'))
for ship in ships:
self.story.addToList('ships',ship.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -196,7 +196,7 @@ class ChosenTwoFanFicArchiveAdapter(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# this site has several links to each story.


@@ -95,7 +95,7 @@ class DokugaComAdapter(BaseSiteAdapter):
params['Submit'] = 'Submit'
# copy all hidden input tags to pick up appropriate tokens.
for tag in soup.findAll('input',{'type':'hidden'}):
for tag in soup.find_all('input',{'type':'hidden'}):
params[tag['name']] = tag['value']
loginUrl = 'http://' + self.getSiteDomain() + '/fanfiction'
@@ -153,7 +153,7 @@ class DokugaComAdapter(BaseSiteAdapter):
self.story.setMetadata('title',stripHTML(a))
# Find the chapters:
chapters = soup.find('select').findAll('option')
chapters = soup.find('select').find_all('option')
if len(chapters)==1:
self.add_chapter(self.story.getMetadata('title'),'http://'+self.host+'/'+self.section+'/story/'+self.story.getMetadata('storyId')+'/1')
else:
@@ -168,7 +168,7 @@ class DokugaComAdapter(BaseSiteAdapter):
asoup=asoup.find('div', {'id' : 'cb_tabid_52'}).find('div')
#grab the rest of the metadata from the author's page
for div in asoup.findAll('div'):
for div in asoup.find_all('div'):
nav=div.find('a', href=re.compile(r'/fanfiction/story/'+self.story.getMetadata('storyId')+"/1$"))
if nav != None:
break
@@ -208,7 +208,7 @@ class DokugaComAdapter(BaseSiteAdapter):
else:
asoup=asoup.find('div', {'id' : 'maincol'}).find('div', {'class' : 'padding'})
for div in asoup.findAll('div'):
for div in asoup.find_all('div'):
nav=div.find('a', href=re.compile(r'/spark/story/'+self.story.getMetadata('storyId')+"/1$"))
if nav != None:
break


@@ -161,7 +161,7 @@ class DracoAndGinnyComAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@@ -181,13 +181,13 @@ class DracoAndGinnyComAdapter(BaseSiteAdapter):
self.setDescription(url,content.find('blockquote'))
for genre in content.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')):
for genre in content.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')):
self.story.addToList('genre',genre.string)
for warning in content.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')):
for warning in content.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')):
self.story.addToList('warnings',warning.string)
labels = content.findAll('b')
labels = content.find_all('b')
for labelspan in labels:
value = labelspan.nextSibling
@@ -208,22 +208,22 @@ class DracoAndGinnyComAdapter(BaseSiteAdapter):
self.story.setMetadata('rating', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -247,7 +247,7 @@ class DracoAndGinnyComAdapter(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@@ -138,7 +138,7 @@ class EFPFanFicNet(BaseSiteAdapter):
# no selector found, so it's a one-chapter story.
self.add_chapter(self.story.getMetadata('title'),url)
else:
allOptions = select.findAll('option', {'value' : re.compile(r'viewstory')})
allOptions = select.find_all('option', {'value' : re.compile(r'viewstory')})
for o in allOptions:
url = u'https://%s/%s' % ( self.getSiteDomain(),
o['value'])
@@ -170,14 +170,14 @@ class EFPFanFicNet(BaseSiteAdapter):
if authsoup != None:
# last author link with offset should be the 'next' link.
authurl = u'https://%s/%s' % ( self.getSiteDomain(),
authsoup.findAll('a',href=re.compile(r'viewuser\.php\?uid=\d+&catid=&offset='))[-1]['href'] )
authsoup.find_all('a',href=re.compile(r'viewuser\.php\?uid=\d+&catid=&offset='))[-1]['href'] )
# Need author page for most of the metadata.
logger.debug("fetching author page: (%s)"%authurl)
authsoup = self.make_soup(self.get_request(authurl))
#print("authsoup:%s"%authsoup)
storyas = authsoup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r'&i=1$'))
storyas = authsoup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r'&i=1$'))
for storya in storyas:
#print("======storya:%s"%storya)
storyblock = storya.findParent('div',{'class':'storybloc'})
@@ -194,7 +194,7 @@ class EFPFanFicNet(BaseSiteAdapter):
# Tipo di coppia: Het | Personaggi: Akasuna no Sasori , Akatsuki, Nuovo Personaggio | Note: OOC | Avvertimenti: Tematiche delicate<br />
# Categoria: <a href="categories.php?catid=1&amp;parentcatid=1">Anime & Manga</a> > <a href="categories.php?catid=108&amp;parentcatid=108">Naruto</a> | Contesto: Naruto Shippuuden | Leggi le <a href="reviews.php?sid=1331275&amp;a=">3</a> recensioni</div>
cats = noteblock.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = noteblock.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
@@ -262,7 +262,7 @@ class EFPFanFicNet(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId'))+'&i=1':
@@ -288,11 +288,11 @@ class EFPFanFicNet(BaseSiteAdapter):
raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)
# remove any header and 'o:p' tags.
for tag in div.findAll("head") + div.findAll("o:p"):
for tag in div.find_all("head") + div.find_all("o:p"):
tag.extract()
# change any html and body tags to div.
for tag in div.findAll("html") + div.findAll("body"):
for tag in div.find_all("html") + div.find_all("body"):
tag.name='div'
# remove extra bogus doctype.


@@ -126,7 +126,7 @@ class ErosnSapphoSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('rating', rating)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@@ -144,7 +144,7 @@ class ErosnSapphoSycophantHexComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
value = labels[0].previousSibling
svalue = ""
@@ -164,22 +164,22 @@ class ErosnSapphoSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value.split(' -')[0])
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -204,7 +204,7 @@ class ErosnSapphoSycophantHexComAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@@ -53,6 +53,9 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
#Setting the 'Zone' for each "Site"
self.zone = self.parsedUrl.netloc.replace('.fanficauthors.net','')
# site changed .nsns to -nsns
self.zone = self.zone.replace('.nsns','-nsns')
# normalized story URL.
self._setURL('https://{0}.{1}/{2}/'.format(
self.zone, self.getBaseDomain(), self.story.getMetadata('storyId')))
@@ -79,7 +82,10 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
@classmethod
def getAcceptDomains(cls):
# need both .nsns(old) and -nsns(new) because it's a domain
# change, not just URL change.
return ['aaran-st-vines.nsns.fanficauthors.net',
'aaran-st-vines-nsns.fanficauthors.net',
'abraxan.fanficauthors.net',
'bobmin.fanficauthors.net',
'canoncansodoff.fanficauthors.net',
@@ -95,9 +101,12 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
'jeconais.fanficauthors.net',
'kinsfire.fanficauthors.net',
'kokopelli.nsns.fanficauthors.net',
'kokopelli-nsns.fanficauthors.net',
'ladya.nsns.fanficauthors.net',
'ladya-nsns.fanficauthors.net',
'lorddwar.fanficauthors.net',
'mrintel.nsns.fanficauthors.net',
'mrintel-nsns.fanficauthors.net',
'musings-of-apathy.fanficauthors.net',
'ruskbyte.fanficauthors.net',
'seelvor.fanficauthors.net',
@@ -108,7 +117,7 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
################################################################################################
@classmethod
def getSiteExampleURLs(self):
return ("https://aaran-st-vines.nsns.fanficauthors.net/A_Story_Name/ "
return ("https://aaran-st-vines-nsns.fanficauthors.net/A_Story_Name/ "
+ "https://abraxan.fanficauthors.net/A_Story_Name/ "
+ "https://bobmin.fanficauthors.net/A_Story_Name/ "
+ "https://canoncansodoff.fanficauthors.net/A_Story_Name/ "
@@ -123,10 +132,10 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
+ "https://jbern.fanficauthors.net/A_Story_Name/ "
+ "https://jeconais.fanficauthors.net/A_Story_Name/ "
+ "https://kinsfire.fanficauthors.net/A_Story_Name/ "
+ "https://kokopelli.nsns.fanficauthors.net/A_Story_Name/ "
+ "https://ladya.nsns.fanficauthors.net/A_Story_Name/ "
+ "https://kokopelli-nsns.fanficauthors.net/A_Story_Name/ "
+ "https://ladya-nsns.fanficauthors.net/A_Story_Name/ "
+ "https://lorddwar.fanficauthors.net/A_Story_Name/ "
+ "https://mrintel.nsns.fanficauthors.net/A_Story_Name/ "
+ "https://mrintel-nsns.fanficauthors.net/A_Story_Name/ "
+ "https://musings-of-apathy.fanficauthors.net/A_Story_Name/ "
+ "https://ruskbyte.fanficauthors.net/A_Story_Name/ "
+ "https://seelvor.fanficauthors.net/A_Story_Name/ "
@@ -136,8 +145,16 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
################################################################################################
def getSiteURLPattern(self):
## .nsns kept here to match both . and -
return r'https?://(aaran-st-vines.nsns|abraxan|bobmin|canoncansodoff|chemprof|copperbadge|crys|deluded-musings|draco664|fp|frenchsession|ishtar|jbern|jeconais|kinsfire|kokopelli.nsns|ladya.nsns|lorddwar|mrintel.nsns|musings-of-apathy|ruskbyte|seelvor|tenhawk|viridian|whydoyouneedtoknow)\.fanficauthors\.net/([a-zA-Z0-9_]+)/'
@classmethod
def get_section_url(cls,url):
## only changing .nsns to -nsns and only when part of the
## domain.
url = url.replace('.nsns.fanficauthors.net','-nsns.fanficauthors.net')
return url
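Because the whole suffix `.nsns.fanficauthors.net` is the search string, only the domain is rewritten, never a `.nsns` that happens to appear elsewhere in the URL. A standalone sketch of the classmethod's logic (self-contained, no `cls`):

```python
def get_section_url(url):
    # Narrow replacement: '.nsns' becomes '-nsns' only when it is
    # part of the fanficauthors.net domain.
    return url.replace('.nsns.fanficauthors.net', '-nsns.fanficauthors.net')

assert get_section_url('https://mrintel.nsns.fanficauthors.net/A_Story_Name/') == \
    'https://mrintel-nsns.fanficauthors.net/A_Story_Name/'
# a '.nsns' outside the domain is left alone:
assert get_section_url('https://example.com/x.nsns/') == 'https://example.com/x.nsns/'
```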
################################################################################################
def doExtractChapterUrlsAndMetadata(self, get_cover=True):
@@ -163,7 +180,7 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
# Find the chapters:
# The published and update dates are with the chapter links...
# so we have to get them from there.
chapters = soup.findAll('a', href=re.compile('/'+self.story.getMetadata(
chapters = soup.find_all('a', href=re.compile('/'+self.story.getMetadata(
'storyId')+'/([a-zA-Z0-9_]+)/'))
# Here we are getting the published date. It is the date the first chapter was "updated"
@@ -202,7 +219,7 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
## Raising AdultCheckRequired after collecting chapters gives
## a double chapter list. So does genre, but it de-dups
## automatically.
if( self.story.getMetadata('rating') == 'Mature'
if( self.story.getMetadataRaw('rating') in ['Mature','Adult Only']
and not (self.is_adult or self.getConfig("is_adult")) ):
raise exceptions.AdultCheckRequired(self.url)
@@ -226,7 +243,7 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
# grab the text for an individual chapter.
def getChapterText(self, url):
logger.debug('Getting chapter text from: %s' % url)
if( self.story.getMetadata('rating') == 'Mature' and
if( self.story.getMetadataRaw('rating') in ['Mature','Adult Only'] and
(self.is_adult or self.getConfig("is_adult")) ):
addurl = "?bypass=1"
else:
@@ -241,8 +258,8 @@ class FanficAuthorsNetAdapter(BaseSiteAdapter):
"Error downloading Chapter: '{0}'! Missing required element!".format(url))
#Now, there are a lot of extraneous tags within the story division... so we will be removing them.
for tag in story.findAll('ul',{'class':'pager'}) + story.findAll(
'div',{'class':'alert'}) + story.findAll('div', {'class':'btn-group'}):
for tag in story.find_all('ul',{'class':'pager'}) + story.find_all(
'div',{'class':'alert'}) + story.find_all('div', {'class':'btn-group'}):
tag.extract()
return self.utf8FromSoup(url,story)


@@ -150,7 +150,7 @@ class FanFicsMeAdapter(BaseSiteAdapter):
self.story.setMetadata('rating',stripHTML(get_meta_content(u'Рейтинг')))
## Need to login for any rating higher than General.
if self.story.getMetadata('rating') != 'General' and self.needToLoginCheck(data):
if self.story.getMetadataRaw('rating') != 'General' and self.needToLoginCheck(data):
self.performLogin(url)
# reload after login.
data = self.get_request(url,usecache=False)


@@ -110,6 +110,31 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
return re.sub(r"https?://(www|m)\.(?P<keep>fanfiction\.net/s/\d+/\d+/).*",
r"https://www.\g<keep>",url)+self.urltitle
def get_request(self,url,usecache=True):
## use super version if not set or isn't a chapter URL with a
## title.
if( not self.getConfig("try_shortened_title_urls") or
not re.match(r"https?://www\.fanfiction\.net/s/\d+/\d+/(?P<title>[^/]+)$", url) ):
return super(getClass(), self).get_request(url,usecache)
## kludgey way to attempt more than one URL variant by
## removing title one letter at a time. Note that network and
## open_pages_in_browser retries still happen first.
titlelen = len(url.split('/')[-1])
maxcut = min([4,titlelen])
j = 0
while j < maxcut: # should actually leave loop either by
# return or exception raise.
try:
useurl = url
if j: # j==0, full URL, then remove letters.
useurl = url[:-j]
return super(getClass(), self).get_request(useurl,usecache)
except exceptions.HTTPErrorFFF as fffe:
if j >= maxcut or 'Page not found or expired' not in unicode(fffe):
raise
j = j+1
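The retry loop above tries the full URL first, then strips trailing title letters one at a time (up to three). The candidate sequence it walks can be sketched as a pure function (the request/exception plumbing is omitted):

```python
def candidate_urls(url):
    # mirrors the loop: j == 0 uses the full URL; j = 1..maxcut-1
    # drop that many trailing characters from the title segment
    titlelen = len(url.split('/')[-1])
    maxcut = min(4, titlelen)
    return [url if j == 0 else url[:-j] for j in range(maxcut)]

print(candidate_urls("https://www.fanfiction.net/s/1234/1/Story"))
# ['https://www.fanfiction.net/s/1234/1/Story',
#  'https://www.fanfiction.net/s/1234/1/Stor',
#  'https://www.fanfiction.net/s/1234/1/Sto',
#  'https://www.fanfiction.net/s/1234/1/St']
```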
def doExtractChapterUrlsAndMetadata(self,get_cover=True):
# fetch the chapter. From that we will get almost all the
@@ -142,7 +167,7 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
## the first chapter. It generates another server request and
## doesn't seem to be needed lately, so now default it to off.
try:
chapcount = len(soup.find('select', { 'name' : 'chapter' } ).findAll('option'))
chapcount = len(soup.find('select', { 'name' : 'chapter' } ).find_all('option'))
# get chapter part of url.
except:
chapcount = 1
@@ -187,7 +212,7 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
## For 1, use the second link.
## For 2, fetch the crossover page and pull the two categories from there.
pre_links = soup.find('div',{'id':'pre_story_links'})
categories = pre_links.findAll('a',{'class':'xcontrast_txt'})
categories = pre_links.find_all('a',{'class':'xcontrast_txt'})
#print("xcontrast_txt a:%s"%categories)
if len(categories) > 1:
# Strangely, the ones with *two* links are the
@@ -226,7 +251,7 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
grayspan = gui_table1i.find('span', {'class':'xgray xcontrast_txt'})
# for b in grayspan.findAll('button'):
# for b in grayspan.find_all('button'):
# b.extract()
metatext = stripHTML(grayspan).replace('Hurt/Comfort','Hurt-Comfort')
#logger.debug("metatext:(%s)"%metatext)
@@ -265,7 +290,7 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
# Updated: <span data-xutime='1368059198'>5/8</span> - Published: <span data-xutime='1278984264'>7/12/2010</span>
# Published: <span data-xutime='1384358726'>8m ago</span>
dates = soup.findAll('span',{'data-xutime':re.compile(r'^\d+$')})
dates = soup.find_all('span',{'data-xutime':re.compile(r'^\d+$')})
if len(dates) > 1 :
# updated get set to the same as published upstream if not found.
self.story.setMetadata('dateUpdated',datetime.fromtimestamp(float(dates[0]['data-xutime'])))
@@ -370,7 +395,7 @@ class FanFictionNetSiteAdapter(BaseSiteAdapter):
# no selector found, so it's a one-chapter story.
self.add_chapter(self.story.getMetadata('title'),url)
else:
allOptions = select.findAll('option')
allOptions = select.find_all('option')
for o in allOptions:
## title URL will be put back on chapter URL during
## normalize_chapterurl() anyway, but also here for


@@ -52,11 +52,11 @@ class FanfictionsFrSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('storyId', story_id)
fandom_name = match.group('fandom')
self._setURL('https://www.%s/fanfictions/%s/%s/chapters.html' % (self.getSiteDomain(), fandom_name, story_id))
self._setURL('https://%s/fanfictions/%s/%s/chapters.html' % (self.getSiteDomain(), fandom_name, story_id))
@staticmethod
def getSiteDomain():
return 'fanfictions.fr'
return 'www.fanfictions.fr'
@classmethod
def getSiteExampleURLs(cls):


@@ -134,7 +134,7 @@ class FanFiktionDeAdapter(BaseSiteAdapter):
self.story.setMetadata('author',stripHTML(a))
# Find the chapters:
for chapter in soup.find('select').findAll('option'):
for chapter in soup.find('select').find_all('option'):
self.add_chapter(chapter,'https://'+self.host+'/s/'+self.story.getMetadata('storyId')+'/'+chapter['value'])
## title="Wörter" failed with max_zalgo:1
@@ -181,13 +181,13 @@ class FanFiktionDeAdapter(BaseSiteAdapter):
# #find metadata on the author's page
# asoup = self.make_soup(self.get_request("https://"+self.getSiteDomain()+"?a=q&a1=v&t=nickdetailsstories&lbi=stories&ar=0&nick="+self.story.getMetadata('authorId')))
# tr=asoup.findAll('tr')
# tr=asoup.find_all('tr')
# for i in range(1,len(tr)):
# a = tr[i].find('a')
# if '/s/'+self.story.getMetadata('storyId')+'/1/' in a['href']:
# break
# td = tr[i].findAll('td')
# td = tr[i].find_all('td')
# self.story.addToList('category',stripHTML(td[2]))
# self.story.setMetadata('rating', stripHTML(td[5]))
# self.story.setMetadata('numWords', stripHTML(td[6]))
@@ -204,7 +204,7 @@ class FanFiktionDeAdapter(BaseSiteAdapter):
soup = self.make_soup(self.get_request(url))
div = soup.find('div', {'id' : 'storytext'})
for a in div.findAll('script'):
for a in div.find_all('script'):
a.extract()
if None == div:


@@ -16,16 +16,15 @@
#
from __future__ import absolute_import,unicode_literals
import datetime
# import datetime
import logging
import json
logger = logging.getLogger(__name__)
import re
from .. import translit
# from .. import translit
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions
from .. import exceptions# as exceptions
# py2 vs py3 transition
@@ -59,7 +58,7 @@ class FicBookNetAdapter(BaseSiteAdapter):
# The date format will vary from site to site.
# http://docs.python.org/library/datetime.html#strftime-strptime-behavior
self.dateformat = "%d %m %Y %H:%M"
self.dateformat = u"%d %m %Y г., %H:%M"
@staticmethod # must be @staticmethod, don't remove it.
def getSiteDomain():
@@ -87,9 +86,8 @@ class FicBookNetAdapter(BaseSiteAdapter):
if 'Войти используя аккаунт на сайте' in d:
raise exceptions.FailedToLogin(url,params['login'])
return False
else:
return True
return True
## Getting the chapter list and the meta data, plus 'is adult' checking.
def extractChapterUrlsAndMetadata(self,get_cover=True):
@@ -109,11 +107,7 @@ class FicBookNetAdapter(BaseSiteAdapter):
try:
a = soup.find('section',{'class':'chapter-info'}).find('h1')
except AttributeError:
# Handle 404 in a nicer way when using nsapa proxy
if re.search(r'404 — Страница не найдена', soup.find('title').text):
raise exceptions.StoryDoesNotExist(url)
else:
raise exceptions.FailedToDownload("Error collecting meta: %s! Missing required element!" % url)
raise exceptions.FailedToDownload("Error collecting meta: %s! Missing required element!" % url)
# kill '+' marks if present.
sup = a.find('sup')
if sup:
@@ -145,8 +139,8 @@ class FicBookNetAdapter(BaseSiteAdapter):
# Find the chapters:
pubdate = None
chapters = soup.find('ul', {'class' : 'list-of-fanfic-parts'})
if chapters != None:
for chapdiv in chapters.findAll('li', {'class':'part'}):
if chapters is not None:
for chapdiv in chapters.find_all('li', {'class':'part'}):
chapter=chapdiv.find('a',href=re.compile(r'/readfic/'+self.story.getMetadata('storyId')+r"/\d+#part_content$"))
churl='https://'+self.host+chapter['href']
@@ -158,12 +152,11 @@ class FicBookNetAdapter(BaseSiteAdapter):
self.add_chapter(chapter,churl,
{'date':chapterdate.strftime(self.getConfig("datechapter_format",self.getConfig("datePublished_format",self.dateformat)))})
if pubdate == None and chapterdate:
if pubdate is None and chapterdate:
pubdate = chapterdate
update = chapterdate
else:
self.add_chapter(self.story.getMetadata('title'),url)
self.story.setMetadata('numChapters',1)
date_str = soup.find('div', {'class' : 'part-date'}).find('span', {'title': True})['title'].replace(u"\u202fг. в", "")
for month_name, month_num in fullmon.items():
date_str = date_str.replace(month_name, month_num)
@@ -175,22 +168,18 @@ class FicBookNetAdapter(BaseSiteAdapter):
self.story.setMetadata('datePublished', pubdate)
self.story.setMetadata('language','Russian')
## after site change, I don't see word count anywhere.
# pr=soup.find('a', href=re.compile(r'/printfic/\w+'))
# pr='https://'+self.host+pr['href']
# pr = self.make_soup(self.get_request(pr))
# pr=pr.findAll('div', {'class' : 'part_text'})
# i=0
# for part in pr:
# i=i+len(stripHTML(part).split(' '))
# self.story.setMetadata('numWords', unicode(i))
dlinfo = soup.select_one('header.d-flex.flex-column.gap-12.word-break')
# dlinfo = soup.find('div',{'class':'fanfic-main-info'})
dlinfo = soup.select_one('div.d-flex.flex-column.gap-8')
series_label = dlinfo.select_one('div.description.word-break').find('strong', string='Серия:')
logger.debug('Series: %s'%str(series_label))
if series_label:
series_div = series_label.find_next_sibling("div")
# No accurate series number; getting it would require an additional request
self.setSeries(stripHTML(series_div.a), 1)
self.story.setMetadata('seriesUrl','https://' + self.getSiteDomain() + series_div.a.get('href'))
i=0
fandoms = dlinfo.select_one('div:not([class])').findAll('a', href=re.compile(r'/fanfiction/\w+'))
fandoms = dlinfo.select_one('div:not([class])').find_all('a', href=re.compile(r'/fanfiction/\w+'))
for fandom in fandoms:
self.story.addToList('category',fandom.string)
i=i+1
@@ -199,13 +188,16 @@ class FicBookNetAdapter(BaseSiteAdapter):
tags = soup.find('div',{'class':'tags'})
if tags:
for genre in tags.findAll('a',href=re.compile(r'/tags/')):
for genre in tags.find_all('a',href=re.compile(r'/tags/')):
self.story.addToList('genre',stripHTML(genre))
logger.debug("category: (%s)"%self.story.getMetadata('category'))
logger.debug("genre: (%s)"%self.story.getMetadata('genre'))
ratingdt = dlinfo.find('div',{'class':re.compile(r'badge-rating-.*')})
self.story.setMetadata('rating', stripHTML(ratingdt.find('span')))
# meta=table.findAll('a', href=re.compile(r'/ratings/'))
# meta=table.find_all('a', href=re.compile(r'/ratings/'))
# i=0
# for m in meta:
# if i == 0:
@@ -223,6 +215,11 @@ class FicBookNetAdapter(BaseSiteAdapter):
else:
self.story.setMetadata('status', 'In-Progress')
try:
self.story.setMetadata('universe', stripHTML(dlinfo.find('a', href=re.compile('/fandom_universe/'))))
except AttributeError:
pass
paircharsdt = soup.find('strong',string='Пэйринг и персонажи:')
# site keeps both ships and indiv chars in /pairings/ links.
if paircharsdt:
@@ -245,7 +242,7 @@ class FicBookNetAdapter(BaseSiteAdapter):
stats = soup.find('div', {'class':'hat-actions-container'})
targetdata = stats.find_all('span', {'class' : 'main-info'})
for data in targetdata:
svg_class = data.find('svg')['class'][0] if data.find('svg') else None
svg_class = data.find('svg')['class'][1] if data.find('svg') else None
value = int(stripHTML(data)) if stripHTML(data).isdigit() else 0
if svg_class == 'ic_thumbs-up' and value > 0:
@@ -258,15 +255,23 @@ class FicBookNetAdapter(BaseSiteAdapter):
self.story.setMetadata('numCollections', value)
logger.debug("numCollections: (%s)"%self.story.getMetadata('numCollections'))
# Grab the amount of pages
# Grab the amount of pages and words
targetpages = soup.find('strong',string='Размер:').find_next('div')
if targetpages:
pages_raw = re.search(r'(.+)\s+(?:страницы|страниц)', targetpages.text, re.UNICODE)
pages = int(re.sub(r'[^\d]', '', pages_raw.group(1)))
targetpages_text = re.sub(r"(?<!\,)\s| ", "", targetpages.text, flags=re.UNICODE | re.MULTILINE)
pages_raw = re.search(r'(\d+)(?:страницы|страниц)', targetpages_text, re.UNICODE)
pages = int(pages_raw.group(1))
if pages > 0:
self.story.setMetadata('pages', pages)
logger.debug("pages: (%s)"%self.story.getMetadata('pages'))
numWords_raw = re.search(r"(\d+)(?:слова|слов)", targetpages_text, re.UNICODE)
numWords = int(numWords_raw.group(1))
if numWords > 0:
self.story.setMetadata('numWords', numWords)
logger.debug("numWords: (%s)"%self.story.getMetadata('numWords'))
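The two searches above only work after the digit grouping is collapsed; the extraction can be sketched standalone, assuming the site's "N страниц, M слов" wording (the helper name is ours):

```python
import re

def parse_size(text):
    """Parse a ficbook.net 'Размер' line such as
    '10 страниц, 2 000 слов' into (pages, words)."""
    # drop whitespace (incl. NBSP) except right after a comma
    compact = re.sub(r"(?<!,)\s|\u00a0", "", text, flags=re.UNICODE)
    pages = re.search(r"(\d+)(?:страницы|страниц)", compact)
    words = re.search(r"(\d+)(?:слова|слов)", compact)
    return (int(pages.group(1)) if pages else 0,
            int(words.group(1)) if words else 0)
```
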
# Grab FBN Category
class_tag = soup.select_one('div[class^="badge-with-icon direction"]').find('span', {'class' : 'badge-text'}).text
if class_tag:
@@ -275,7 +280,7 @@ class FicBookNetAdapter(BaseSiteAdapter):
# Find dedication.
ded = soup.find('div', {'class' : 'js-public-beta-dedication'})
if ded != None:
if ded:
ded['class'].append('part_text')
self.story.setMetadata('dedication',ded)
@@ -285,11 +290,7 @@ class FicBookNetAdapter(BaseSiteAdapter):
comm['class'].append('part_text')
self.story.setMetadata('authorcomment',comm)
# When using nsapa proxy the required elements are not returned.
try:
follows = stats.find('fanfic-follow-button')[':follow-count']
except TypeError:
follows = stripHTML(stats.find('button', {'class': 'btn btn-with-description btn-primary jsVueComponent', 'type': 'button'}).span)
follows = stats.find('fanfic-follow-button')[':follow-count']
if int(follows) > 0:
self.story.setMetadata('follows', int(follows))
logger.debug("follows: (%s)"%self.story.getMetadata('follows'))
@@ -302,15 +303,9 @@ class FicBookNetAdapter(BaseSiteAdapter):
numAwards = int(len(award_list))
# Grab the awards, but if multiple awards have the same name, only one will be kept; only an issue with hundreds of them.
self.story.extendList('awards', [str(award['user_text']) for award in award_list])
#logger.debug("awards (%s)"%self.story.getMetadata('awards'))
#logger.debug("awards (%s)"%self.story.getMetadata('awards'))
except (TypeError, KeyError):
awards_section = soup.find('section', {'class':'fanfic-author-actions__column mt-5 jsVueComponent'})
if awards_section is not None:
awards = awards_section.select('div:not([class])')
numAwards = int(len(awards))
naward = awards_section.find('span', {'class':'js-span-link'})
if naward is not None:
numAwards = numAwards + int(re.sub(r'[^\d]', '', naward.text))
logger.debug("Could not grab the awards")
if numAwards > 0:
self.story.setMetadata('numAwards', numAwards)
@@ -318,16 +313,19 @@ class FicBookNetAdapter(BaseSiteAdapter):
if get_cover:
cover = soup.find('fanfic-cover', {'class':"jsVueComponent"})
if cover is None:
# When using nsapa proxy the element is replaced by different one.
cover = soup.find('picture', {'class':"fanfic-hat-cover-picture"})
if cover is not None:
cover = re.sub('/fanfic-covers/(?:m_|d_)', '/fanfic-covers/', cover.img['src'])
logger.debug("Cover url (%s)"%cover)
self.setCoverImage(url,cover)
else:
if cover is not None:
self.setCoverImage(url,cover['src-original'])
def replace_formatting(self,tag):
tname = tag.name
## operating on plain text because BS4 is hard to work on
## text with.
## stripHTML() discards whitespace around other tags, like <i>
txt = tag.get_text()
txt = txt.replace("\n","<br/>")
soup = self.make_soup("<"+tname+">"+txt+"</"+tname+">")
return soup.find(tname)
# grab the text for an individual chapter.
def getChapterText(self, url):
@@ -336,29 +334,36 @@ class FicBookNetAdapter(BaseSiteAdapter):
soup = self.make_soup(self.get_request(url))
chapter = soup.find('div', {'id' : 'content'})
if chapter == None: ## still needed?
if chapter is None: ## still needed?
chapter = soup.find('div', {'class' : 'public_beta_disabled'})
if None == chapter:
if chapter is None:
raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)
# Remove ads that show up when using NSAPA proxy.
if self.getConfig("use_nsapa_proxy",True):
for ads in chapter.find_all('div', {'class' : 'ads-in-text'}):
ads.extract()
## ficbook uses weird CSS white-space: pre-wrap; for
## paragraphing. Doesn't work with txt output
if 'part_text' in chapter['class'] and self.getConfig('replace_text_formatting'):
## copy classes, except part_text
divclasses = chapter['class']
divclasses.remove('part_text')
chapter = self.replace_formatting(chapter)
chapter['class'] = divclasses
exclude_notes=self.getConfigList('exclude_notes')
if 'headnotes' not in exclude_notes:
# Find the headnote
head_note = soup.find('div', {'class': 'part-comment-top'})
head_note = soup.select_one("div.part-comment-top div.js-public-beta-comment-before")
if head_note:
head_notes_content = head_note.find('div', {'class': 'js-public-beta-comment-before'}).get_text(strip=True)
# Create the structure for the headnote
head_notes_div_tag = soup.new_tag('div', attrs={'class': 'fff_chapter_notes fff_head_notes'})
head_b_tag = soup.new_tag('b')
head_b_tag.string = 'Примечания:'
head_blockquote_tag = soup.new_tag('blockquote')
head_blockquote_tag.string = head_notes_content
if 'text-preline' in head_note['class'] and self.getConfig('replace_text_formatting'):
head_blockquote_tag = self.replace_formatting(head_note)
head_blockquote_tag.name = 'blockquote'
else:
head_blockquote_tag = soup.new_tag('blockquote')
head_blockquote_tag.string = stripHTML(head_note)
head_notes_div_tag.append(head_b_tag)
head_notes_div_tag.append(head_blockquote_tag)
# Prepend the headnotes to the chapter, <hr> to mimic the site
@@ -367,15 +372,18 @@ class FicBookNetAdapter(BaseSiteAdapter):
if 'footnotes' not in exclude_notes:
# Find the endnote
end_note = soup.find('div', {'class': 'part-comment-bottom'})
end_note = soup.select_one("div.part-comment-bottom div.js-public-beta-comment-after")
if end_note:
end_notes_content = end_note.find('div', {'class': 'js-public-beta-comment-after'}).get_text(strip=True)
# Create the structure for the footnote
end_notes_div_tag = soup.new_tag('div', attrs={'class': 'fff_chapter_notes fff_foot_notes'})
end_b_tag = soup.new_tag('b')
end_b_tag.string = 'Примечания:'
end_blockquote_tag = soup.new_tag('blockquote')
end_blockquote_tag.string = end_notes_content
if 'text-preline' in end_note['class'] and self.getConfig('replace_text_formatting'):
end_blockquote_tag = self.replace_formatting(end_note)
end_blockquote_tag.name = 'blockquote'
else:
end_blockquote_tag = soup.new_tag('blockquote')
end_blockquote_tag.string = stripHTML(end_note)
end_notes_div_tag.append(end_b_tag)
end_notes_div_tag.append(end_blockquote_tag)
# Append the endnotes to the chapter, <hr> to mimic the site


@@ -201,10 +201,10 @@ class FictionAlleyArchiveOrgSiteAdapter(BaseSiteAdapter):
# epubutils.py
# Yes, this still applies to fictionalley-archive.
for tag in chaptext.findAll('head') + chaptext.findAll('meta') + chaptext.findAll('script'):
for tag in chaptext.find_all('head') + chaptext.find_all('meta') + chaptext.find_all('script'):
tag.extract()
for tag in chaptext.findAll('body') + chaptext.findAll('html'):
for tag in chaptext.find_all('body') + chaptext.find_all('html'):
tag.name = 'div'
if self.getConfig('include_author_notes'):


@@ -55,6 +55,8 @@ class FictionLiveAdapter(BaseSiteAdapter):
self.story_id = self.parsedUrl.path.split('/')[3]
self.story.setMetadata('storyId', self.story_id)
self.chapter_id_to_api = {}
# normalize URL. omits title in the url
self._setURL("https://fiction.live/stories//{s_id}".format(s_id = self.story_id));
@@ -171,7 +173,7 @@ class FictionLiveAdapter(BaseSiteAdapter):
tags = data['ta'] if 'ta' in data else []
if (self.story.getMetadata('rating') in {"nsfw", "adult"} or 'smut' in tags) and \
if (self.story.getMetadataRaw('rating') in {"nsfw", "adult"} or 'smut' in tags) and \
not (self.is_adult or self.getConfig("is_adult")):
raise exceptions.AdultCheckRequired(self.url)
@@ -239,6 +241,17 @@ class FictionLiveAdapter(BaseSiteAdapter):
a, b = itertools.tee(iterable, 2)
next(b, None)
return list(zip(a, b))
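The pair() helper above is the standard itertools "pairwise" recipe (itertools.pairwise in Python 3.10+): it turns the list of chunk timestamps into overlapping (start, end) ranges:

```python
import itertools

def pair(iterable):
    # s -> (s0, s1), (s1, s2), (s2, s3), ...
    a, b = itertools.tee(iterable, 2)
    next(b, None)
    return list(zip(a, b))

# each chapter's range runs from its timestamp to the next one
ranges = pair([100, 250, 400, 999])
```
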
def map_chap_ids_to_api(chapter_ids, route_ids, times):
for index, bounds in enumerate(times):
start, end = bounds
end -= 1
chapter_url = chunkrange_url.format(s_id = data['_id'], start = start, end = end)
self.chapter_id_to_api[chapter_ids[index]] = chapter_url
for route_id in route_ids:
chapter_url = route_chunkrange_url.format(c_id = route_id)
self.chapter_id_to_api[route_id] = chapter_url
## first thing to do is separate out the appendices
appendices, maintext, routes = [], [], []
@@ -260,22 +273,25 @@ class FictionLiveAdapter(BaseSiteAdapter):
## main-text chapter extraction processing. *should* now handle all the edge cases.
## relies on fanficfare ignoring empty chapters!
titles = [c['title'] for c in maintext]
titles = ["Home"] + titles
titles = ["Home"] + [c['title'] for c in maintext]
chapter_ids = ['home'] + [c['id'] for c in maintext]
times = [data['ct']] + [c['ct'] for c in maintext] + [self.most_recent_chunk + 2] # need to be 1 over, and add_url etc does -1
times = pair(times)
times = [c['ct'] for c in maintext]
times = [data['ct']] + times + [self.most_recent_chunk + 2] # need to be 1 over, and add_url etc does -1
if self.getConfig('include_appendices', True): # Add appendices after main text if desired
titles = titles + ["Appendix: " + a['title'][9:] for a in appendices]
chapter_ids = chapter_ids + [a['id'] for a in appendices]
times = times + [(a['ct'], a['ct'] + 2) for a in appendices]
route_ids = [r['id'] for r in routes]
map_chap_ids_to_api(chapter_ids, route_ids, times) # Map chapter ids to API URLs for use when comparing the two
# doesn't actually run without the call to list.
list(map(add_chapter_url, titles, pair(times)))
for a in appendices: # add appendices afterwards
chapter_start = a['ct']
chapter_title = "Appendix: " + a['title'][9:] # 'Appendix: ' rather than '#special' at beginning of name
add_chapter_url(chapter_title, (chapter_start, chapter_start + 2)) # 1 msec range = this one chunk only
list(map(add_chapter_url, titles, times))
for r in routes: # add route at the end, after appendices
route_id = r['id'] # to get route chapter content, the route id is needed, not the timestamp
route_id = r['id'] # to get route chapter content, the route id is needed, not the timestamp
chapter_title = "Route: " + r['title'] # 'Route: ' at beginning of name, since it's a multiroute chapter
add_route_chapter_url(chapter_title, route_id)
@@ -418,7 +434,7 @@ class FictionLiveAdapter(BaseSiteAdapter):
# so let's just ignore non-int values here
if not isinstance(v, int):
continue
if 0 <= v <= len(choices):
if 0 <= v < len(choices):
output[v] += 1
return output
@@ -502,8 +518,10 @@ class FictionLiveAdapter(BaseSiteAdapter):
# now matches the site and does *not* include dicerolls as posts!
num_votes = str(len(posts)) + " posts" if len(posts) != 0 else "be the first to post."
posts_title = chunk['b'] if 'b' in chunk else "Reader Posts"
output = ""
output += u"<h4><span>Reader Posts — <small> Posting " + closed
output += u"<h4><span>" + posts_title + " — <small> Posting " + closed
output += u"" + num_votes + "</small></span></h4>\n"
## so. a voter can roll with their post. these rolls are in a separate dict, but have the **same uid**.
@@ -529,6 +547,35 @@ class FictionLiveAdapter(BaseSiteAdapter):
return output
def normalize_chapterurl(self, url):
if url.startswith(r'https://fiction.live/api/anonkun/chapters'):
return url
pattern = None
if url.startswith(r'https://fiction.live/api/anonkun/route'):
pattern = r"https?://(?:beta\.)?fiction\.live/[^/]*/[^/]*/[a-zA-Z0-9]+/routes/([a-zA-Z0-9]+)"
elif url.startswith(r'https://fiction.live/'):
pattern = r"https?://(?:beta\.)?fiction\.live/[^/]*/[^/]*/[a-zA-Z0-9]+/[^/]*(/[a-zA-Z0-9]+|home)"
# regex101 rocks
if not pattern:
return url
match = re.match(pattern, url)
if not match:
return url
chapter_id = match.group(1)
if chapter_id.startswith('/'):
chapter_id = chapter_id[1:]
if chapter_id and chapter_id in self.chapter_id_to_api:
return self.chapter_id_to_api[chapter_id]
return url
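The two URL patterns in normalize_chapterurl can be exercised directly; the story, route, and chapter ids below are made up for illustration:

```python
import re

# patterns as in normalize_chapterurl above
ROUTE_RE = (r"https?://(?:beta\.)?fiction\.live/"
            r"[^/]*/[^/]*/[a-zA-Z0-9]+/routes/([a-zA-Z0-9]+)")
CHAP_RE = (r"https?://(?:beta\.)?fiction\.live/"
           r"[^/]*/[^/]*/[a-zA-Z0-9]+/[^/]*(/[a-zA-Z0-9]+|home)")

route_id = re.match(
    ROUTE_RE,
    "https://fiction.live/stories/Title/abc123/routes/def456").group(1)
# chapter capture keeps the leading slash, stripped afterwards
chapter_id = re.match(
    CHAP_RE,
    "https://fiction.live/stories/Title/abc123/chapter-1/xyz789"
).group(1).lstrip("/")
```
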
def format_unknown(self, chunk):
raise NotImplementedError("Unknown chunk type ({}) in fiction.live story.".format(chunk))


@@ -40,10 +40,6 @@ class FictionManiaTVAdapter(BaseSiteAdapter):
self._setURL(self.READ_TEXT_STORY_URL_TEMPLATE % story_id)
self.story.setMetadata('siteabbrev', self.SITE_ABBREVIATION)
# Always single chapters, probably should use the Anthology feature to
# merge chapters of a story
self.story.setMetadata('numChapters', 1)
@staticmethod
def getSiteDomain():
return FictionManiaTVAdapter.SITE_DOMAIN
@@ -167,14 +163,30 @@ class FictionManiaTVAdapter(BaseSiteAdapter):
# <div style="margin-left:10ex;margin-right:10ex">
## fetching SWI version now instead of text.
htmlurl = url.replace('readtextstory','readhtmlstory')
soup = self.make_soup(self.get_request(htmlurl))
div = soup.find('div',style="margin-left:10ex;margin-right:10ex")
if div:
return self.utf8FromSoup(htmlurl,div)
else:
## Used to find by style, but it's inconsistent now. we've seen:
## margin-left:10ex;margin-right:10ex
## margin-right: 5%; margin-left: 5%
## margin-left:5%; margin-right:5%
## margin-left:5%; margin-right:5%; background: white
## And there's some without a <div> tag (or an unclosed div)
## Only the comments appear to be consistent.
beginmarker='<!--Read or display the file-->'
endmarker='''<hr size=1 noshade>
<!--review add read, top and bottom-->
'''
data = self.get_request(htmlurl)
try:
## if both markers are found, assume whatever is in between
## is the chapter text.
soup = self.make_soup(data[data.index(beginmarker):data.index(endmarker)])
return self.utf8FromSoup(htmlurl,soup)
except Exception as e:
# logger.debug(e)
# logger.debug(soup)
logger.debug("Story With Images(SWI) not found, falling back to HTML.")
## fetching html version now instead of text.
## Note that html and SWI pages are *not* formatted the same.
soup = self.make_soup(self.get_request(url.replace('readtextstory','readxstory')))
# logger.debug(soup)
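The marker-based extraction above is plain string slicing; note the adapter keeps the begin marker itself inside the slice. A general helper that drops both markers might look like this (the helper name is ours, not the adapter's):

```python
def between_markers(data, begin, end):
    """Return the text strictly between `begin` and `end`,
    raising ValueError if either marker is absent."""
    start = data.index(begin) + len(begin)
    return data[start:data.index(end, start)]
```
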


@@ -66,7 +66,8 @@ class FicwadComSiteAdapter(BaseSiteAdapter):
params['username']))
d = self.post_request(loginUrl,params,usecache=False)
if "Login attempt failed..." in d:
if "Login attempt failed..." in d or \
'<div id="error">Please enter your username and password.</div>' in d:
logger.info("Failed to login to URL %s as %s" % (loginUrl,
params['username']))
raise exceptions.FailedToLogin(url,params['username'])
@@ -114,7 +115,7 @@ class FicwadComSiteAdapter(BaseSiteAdapter):
titleh4 = soup.find('div',{'class':'storylist'}).find('h4')
self.story.setMetadata('title', stripHTML(titleh4.a))
if 'Deleted story' in self.story.getMetadata('title'):
if 'Deleted story' in self.story.getMetadataRaw('title'):
raise exceptions.StoryDoesNotExist("This story was deleted. %s"%self.url)
# Find authorid and URL from... author url.
@@ -129,14 +130,14 @@ class FicwadComSiteAdapter(BaseSiteAdapter):
#self.story.setMetadata('description', storydiv.find("blockquote",{'class':'summary'}).p.string)
# most of the meta data is here:
metap = storydiv.find("p",{"class":"meta"})
metap = storydiv.find("div",{"class":"meta"})
self.story.addToList('category',metap.find("a",href=re.compile(r"^/category/\d+")).string)
# warnings
# <span class="req"><a href="/help/38" title="Medium Spoilers">[!!] </a> <a href="/help/38" title="Rape/Sexual Violence">[R] </a> <a href="/help/38" title="Violence">[V] </a> <a href="/help/38" title="Child/Underage Sex">[Y] </a></span>
spanreq = metap.find("span",{"class":"story-warnings"})
if spanreq: # can be no warnings.
for a in spanreq.findAll("a"):
for a in spanreq.find_all("a"):
self.story.addToList('warnings',a['title'])
## perhaps not the most efficient way to parse this, using
@@ -186,7 +187,7 @@ class FicwadComSiteAdapter(BaseSiteAdapter):
# no list found, so it's a one-chapter story.
self.add_chapter(self.story.getMetadata('title'),url)
else:
chapterlistlis = storylistul.findAll('li')
chapterlistlis = storylistul.find_all('li')
for chapterli in chapterlistlis:
if "blocked" in chapterli['class']:
# paranoia check. We should already be logged in by now.


@@ -99,6 +99,17 @@ class FimFictionNetSiteAdapter(BaseSiteAdapter):
params['username']))
raise exceptions.FailedToLogin(url,params['username'])
def make_soup(self,data):
soup = super(FimFictionNetSiteAdapter, self).make_soup(data)
for img in soup.select('img.lazy-img, img.user_image'):
## FimF has started a 'camo' mechanism for images that
## gets blocked by CF. attr data-source is original source.
if img.has_attr('data-source'):
img['src'] = img['data-source']
elif img.has_attr('data-src'):
img['src'] = img['data-src']
return soup
def doExtractChapterUrlsAndMetadata(self,get_cover=True):
if self.is_adult or self.getConfig("is_adult"):
@@ -140,7 +151,8 @@ class FimFictionNetSiteAdapter(BaseSiteAdapter):
self.story.setMetadata("authorId", author['href'].split('/')[2])
self.story.setMetadata("authorUrl", "https://%s/user/%s/%s" % (self.getSiteDomain(),
self.story.getMetadata('authorId'),
self.story.getMetadata('author')))
# meta entry author can be changed by the user.
stripHTML(author)))
#Rating text is replaced with full words for historical compatibility after the site changed
#on 2014-10-27
@@ -168,12 +180,13 @@ class FimFictionNetSiteAdapter(BaseSiteAdapter):
# Cover image
if get_cover:
storyImage = storyContentBox.find('img', {'class':'lazy-img'})
storyImage = soup.select_one('div.story_container__story_image img')
if storyImage:
coverurl = storyImage['data-fullsize']
# try setting from data-fullsize, if fails, try using data-src
if self.setCoverImage(self.url,coverurl)[0] == "failedtoload":
coverurl = storyImage['data-src']
cover_set = self.setCoverImage(self.url,coverurl)[0]
if not cover_set or cover_set.startswith("failedtoload"):
coverurl = storyImage['src']
self.setCoverImage(self.url,coverurl)
coverSource = storyImage.parent.find('a', {'class':'source'})
@@ -395,3 +408,33 @@ class FimFictionNetSiteAdapter(BaseSiteAdapter):
# data = self.get_request(url)
if self.getConfig("is_adult"):
self.set_adult_cookie()
def get_urls_from_page(self,url,normalize):
iterate = self.getConfig('scrape_bookshelf', default=False)
if not re.search(r'fimfiction\.net/bookshelf/(?P<listid>.+?)/',url) or iterate == 'legacy':
return super().get_urls_from_page(url,normalize)
self.before_get_urls_from_page(url,normalize)
final_urls = list()
while True:
data = self.get_request(url,usecache=True)
soup = self.make_soup(data)
paginator = soup.select_one('div.paginator-container > div.page_list > ul').find_all('li')
logger.debug("Paginator: " + str(len(paginator)))
stories_container = soup.select_one('div.content > div.two-columns > div.left').find_all('article', recursive=False)
x = 0
logger.debug("Container "+str(len(stories_container)))
for story_raw in stories_container:
x += 1
story_url = story_raw.select_one('div.story_content_box > header.title > div > a.story_name').get('href')
url_story = ('https://' + self.getSiteDomain() + story_url)
#logger.debug(url_story)
final_urls.append(url_story)
logger.debug("Discovered %s new stories."%str(x))
next_button = paginator[-1].select_one('a')
logger.debug("Next button: " + next_button.get_text())
if next_button.get_text() or not iterate:
return {'urllist': final_urls}
url = ('https://' + self.getSiteDomain() + next_button.get('href'))
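The bookshelf walk above loops page by page, following the paginator's last entry until the next link runs out. Its control flow reduces to the sketch below, where fetch_page is a hypothetical stand-in for the request-plus-parse step:

```python
def walk_pages(first_url, fetch_page):
    """Collect story URLs across a paginated listing.
    fetch_page(url) -> (story_urls, next_url_or_None)."""
    urls, url = [], first_url
    while url:
        page_urls, url = fetch_page(url)
        urls.extend(page_urls)
    return urls
```
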


@@ -93,6 +93,9 @@ class FireFlyFansNetSiteAdapter(BaseSiteAdapter):
a = soup.find('a', href=re.compile(r"profileshow.aspx\?u="))
self.story.setMetadata('authorId', a['href'].split('=')[1])
if not self.story.getMetadata('authorId'):
logger.warning("Site authorUrl missing authorId, using SiteMissingAuthorId")
self.story.setMetadata('authorId', 'SiteMissingAuthorId')
self.story.setMetadata('authorUrl', 'http://' +
self.host + '/' + a['href'])
self.story.setMetadata('author', a.string)
@@ -102,7 +105,6 @@ class FireFlyFansNetSiteAdapter(BaseSiteAdapter):
# to download them one at a time yourself. I'm also setting the status to
# complete
self.add_chapter(self.story.getMetadata('title'), self.url)
self.story.setMetadata('numChapters', 1)
self.story.setMetadata('status', 'Completed')
## some stories do not have a summary listed, so I'm setting it here.


@@ -161,7 +161,7 @@ class ImagineEFicComAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/'+chapter['href']+addurl)
@@ -178,7 +178,7 @@ class ImagineEFicComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@@ -199,22 +199,22 @@ class ImagineEFicComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -238,7 +238,7 @@ class ImagineEFicComAdapter(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@@ -125,7 +125,7 @@ class InkBunnyNetSiteAdapter(BaseSiteAdapter):
soup = self.make_soup(self.get_request(url,usecache=False))
# removing all of the scripts
for tag in soup.findAll('script'):
for tag in soup.find_all('script'):
tag.extract()


@@ -163,7 +163,7 @@ class KakuyomuJpAdapter(BaseSiteAdapter):
titles = []
nestingLevel = 0
newSection = False
for tocNodeRef in info[workKey]['tableOfContents']:
for tocNodeRef in info[workKey]['tableOfContentsV2']:
tocNode = info[tocNodeRef['__ref']]
if tocNode['chapter'] is not None:
@@ -197,8 +197,6 @@ class KakuyomuJpAdapter(BaseSiteAdapter):
self.add_chapter(epTitle, epUrl)
newSection = False
self.story.setMetadata('numChapters', numEpisodes)
logger.debug("Story: <%s>", self.story)
return


@@ -144,13 +144,13 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
# Find authorid and URL from... author urls.
pagetitle = soup.find('div',id='pagetitle')
for a in pagetitle.findAll('a', href=re.compile(r"viewuser.php\?uid=\d+")):
for a in pagetitle.find_all('a', href=re.compile(r"viewuser.php\?uid=\d+")):
self.story.addToList('authorId',a['href'].split('=')[1])
self.story.addToList('authorUrl','https://'+self.host+'/'+a['href'])
self.story.addToList('author',stripHTML(a))
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/'+chapter['href']+addurl)
@@ -166,7 +166,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = stripHTML(labelspan)
@@ -193,7 +193,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [stripHTML(cat) for cat in cats]
for cat in catstext:
# ran across one story with an empty <a href="browse.php?type=categories&amp;catid=1"></a>
@@ -204,7 +204,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
if 'Characters' in label:
self.story.addToList('characters','Kirk')
self.story.addToList('characters','Spock')
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [stripHTML(char) for char in chars]
for char in charstext:
self.story.addToList('characters',stripHTML(char))
@@ -213,7 +213,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genrestext = [stripHTML(genre) for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@@ -223,7 +223,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
## has 'Story Type', which is much more what most sites
## call genre.
if 'Story Type' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=5')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=5')) # XXX
genrestext = [stripHTML(genre) for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@@ -233,21 +233,21 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warningstext = [stripHTML(warning) for warning in warnings]
self.warning = ', '.join(warningstext)
for warning in warningstext:
self.story.addToList('warnings',stripHTML(warning))
if 'Universe' in label:
universes = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=3')) # XXX
universes = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=3')) # XXX
universestext = [stripHTML(universe) for universe in universes]
self.universe = ', '.join(universestext)
for universe in universestext:
self.story.addToList('universe',stripHTML(universe))
if 'Crossover Fandom' in label:
crossoverfandoms = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=4')) # XXX
crossoverfandoms = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=4')) # XXX
crossoverfandomstext = [stripHTML(crossoverfandom) for crossoverfandom in crossoverfandoms]
self.crossoverfandom = ', '.join(crossoverfandomstext)
for crossoverfandom in crossoverfandomstext:
@@ -274,7 +274,7 @@ class KSArchiveComAdapter(BaseSiteAdapter): # XXX
series_url = 'https://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@@ -19,6 +19,7 @@ from __future__ import absolute_import
import logging
logger = logging.getLogger(__name__)
import re
import json
from bs4.element import Comment
from ..htmlcleanup import stripHTML
@@ -37,7 +38,7 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
def __init__(self, config, url):
BaseSiteAdapter.__init__(self, config, url)
logger.debug("LiteroticaComAdapter:__init__ - url='%s'" % url)
#logger.debug("LiteroticaComAdapter:__init__ - url='%s'" % url)
# Each adapter needs to have a unique site abbreviation.
self.story.setMetadata('siteabbrev','litero')
@@ -53,7 +54,7 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## have been keeping the language when 'normalizing' to first
## chapter.
url = re.sub(r"^(https?://)"+LANG_RE+r"(\.i)?",
r"\1\2",
r"https://\2",
url)
url = url.replace('/beta/','/') # to allow beta site URLs.
@@ -77,7 +78,7 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
@classmethod
def getSiteExampleURLs(cls):
return "http://www.literotica.com/s/story-title https://www.literotica.com/series/se/9999999 https://www.literotica.com/s/story-title https://www.literotica.com/i/image-or-comic-title https://www.literotica.com/p/poem-title http://portuguese.literotica.com/s/story-title http://german.literotica.com/s/story-title"
return "https://www.literotica.com/s/story-title https://www.literotica.com/series/se/9999999 https://www.literotica.com/s/story-title https://www.literotica.com/i/image-or-comic-title https://www.literotica.com/p/poem-title https://portuguese.literotica.com/s/story-title https://german.literotica.com/s/story-title"
def getSiteURLPattern(self):
# also https://www.literotica.com/series/se/80075773
@@ -95,6 +96,49 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('storyId',self.parsedUrl.path.split('/',)[-1])
# logger.debug("language:%s"%self.story.getMetadata('language'))
## apply clean_chapter_titles
def add_chapter(self,chapter_title,url,othermeta={}):
if self.getConfig("clean_chapter_titles"):
storytitle = self.story.getMetadataRaw('title').lower()
chapter_name_type = None
# strip trailing ch or pt before doing the chapter clean.
# doesn't remove from story title metadata
storytitle = re.sub(r'^(.*?)( (ch|pt))?$',r'\1',storytitle)
if chapter_title.lower().startswith(storytitle):
chapter = chapter_title[len(storytitle):].strip()
# logger.debug('\tChapter: "%s"' % chapter)
if chapter == '':
chapter_title = 'Chapter %d' % (self.num_chapters() + 1)
# Sometimes the first chapter does not have type of chapter
if self.num_chapters() == 0:
# logger.debug('\tChapter: first chapter without chapter type')
chapter_name_type = None
else:
separater_char = chapter[0]
# logger.debug('\tseparater_char: "%s"' % separater_char)
chapter = chapter[1:].strip() if separater_char in [":", "-"] else chapter
# logger.debug('\tChapter: "%s"' % chapter)
if chapter.lower().startswith('ch.'):
chapter = chapter[len('ch.'):].strip()
try:
chapter_title = 'Chapter %d' % int(chapter)
except:
chapter_title = 'Chapter %s' % chapter
chapter_name_type = 'Chapter' if chapter_name_type is None else chapter_name_type
# logger.debug('\tChapter: chapter_name_type="%s"' % chapter_name_type)
elif chapter.lower().startswith('pt.'):
chapter = chapter[len('pt.'):].strip()
try:
chapter_title = 'Part %d' % int(chapter)
except:
chapter_title = 'Part %s' % chapter
chapter_name_type = 'Part' if chapter_name_type is None else chapter_name_type
# logger.debug('\tChapter: chapter_name_type="%s"' % chapter_name_type)
elif separater_char in [":", "-"]:
chapter_title = chapter
# logger.debug('\tChapter: taking chapter text as whole')
super(LiteroticaSiteAdapter, self).add_chapter(chapter_title,url,othermeta)
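The `clean_chapter_titles` normalization moved into `add_chapter` above can be sketched as a standalone function. This is a simplified illustration, not the adapter's actual API; the function name and signature are hypothetical:

```python
import re

def clean_chapter_title(story_title, chapter_title, num_existing_chapters):
    """Simplified sketch of the adapter's clean_chapter_titles logic."""
    # strip a trailing ' ch' or ' pt' from the story title before comparing
    storytitle = re.sub(r'^(.*?)( (ch|pt))?$', r'\1', story_title.lower())
    if not chapter_title.lower().startswith(storytitle):
        return chapter_title
    chapter = chapter_title[len(storytitle):].strip()
    if chapter == '':
        # chapter title is a bare repeat of the story title: number it by position
        return 'Chapter %d' % (num_existing_chapters + 1)
    sep = chapter[0] if chapter[0] in (':', '-') else None
    if sep:
        chapter = chapter[1:].strip()
    for prefix, word in (('ch.', 'Chapter'), ('pt.', 'Part')):
        if chapter.lower().startswith(prefix):
            rest = chapter[len(prefix):].strip()
            try:
                return '%s %d' % (word, int(rest))  # int() drops leading zeros
            except ValueError:
                return '%s %s' % (word, rest)
    # a ':' or '-' separator alone: keep just the remainder as the title
    return chapter if sep else chapter_title
```

So "My Story Ch. 02" becomes "Chapter 2" while an unrelated title passes through untouched.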
def extractChapterUrlsAndMetadata(self):
"""
In April 2024, site introduced significant changes, including
@@ -122,13 +166,22 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
if "This submission is awaiting moderator's approval" in data:
raise exceptions.StoryDoesNotExist("This submission is awaiting moderator's approval. %s"%self.url)
## 2025Feb - domains other than www now use different HTML.
## Need to look for two different versions of basically
## everything.
## not series URL, assumed to be a chapter. Look for Story
## Info block of post-beta page. I don't think it should happen?
if '/series/se' not in self.url:
if not soup.select_one('div.page__aside'):
#logger.debug(data)
## looking for /series/se URL to indicate this is a
## chapter.
if not soup.select_one('div.page__aside') and not soup.select_one('div.sidebar') and not soup.select_one('div[class^="_sidebar_"]'):
raise exceptions.FailedToDownload("Missing Story Info block, Beta turned off?")
storyseriestag = soup.select_one('a.bn_av')
if not storyseriestag:
storyseriestag = soup.select_one('a[class^="_files__link_"]')
# logger.debug("Story Series Tag:%s"%storyseriestag)
if storyseriestag:
@@ -142,8 +195,18 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
isSingleStory = '/series/se' not in self.url
## common between one-shots and multi-chapters
if not isSingleStory:
# Normalize the url?
state = re.findall(r"prefix\=\"/series/\",state='(.+?)'</script>", data)
json_state = json.loads(state[0].replace("\\'","'").replace("\\\\","\\"))
url_series_id = unicode(re.match(self.getSiteURLPattern(),self.url).group('storyseriesid'))
json_series_id = unicode(json_state['series']['data']['id'])
if json_series_id != url_series_id:
res = re.sub(url_series_id, json_series_id, unicode(self.url))
logger.debug("Normalized url: %s"%res)
self._setURL(res)
## common between one-shots and multi-chapters
# title
self.story.setMetadata('title', stripHTML(soup.select_one('h1')))
# logger.debug(self.story.getMetadata('title'))
@@ -157,6 +220,8 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## Should change to /authors/ if/when it starts appearing.
## Assuming it's in the same place.
authora = soup.find("a", class_="y_eU")
if not authora:
authora = soup.select_one('a[class^="_author__title"]')
authorurl = authora['href']
if authorurl.startswith('//'):
authorurl = self.parsedUrl.scheme+':'+authorurl
@@ -171,19 +236,29 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
else: # if all else fails
self.story.setMetadata('authorId', stripHTML(authora))
self.story.extendList('eroticatags', [ stripHTML(t).title() for t in soup.select('div#tabpanel-tags a.av_as') ])
if soup.select('div#tabpanel-tags'):
# logger.debug("tags1")
self.story.extendList('eroticatags', [ stripHTML(t).title() for t in soup.select('div#tabpanel-tags a.av_as') ])
if soup.select('div[class^="_widget__tags_"]'):
# logger.debug("tags2")
self.story.extendList('eroticatags', [ stripHTML(t).title() for t in soup.select('div[class^="_widget__tags_"] a[class^="_tag_item_"]') ])
# logger.debug(self.story.getList('eroticatags'))
## look first for 'Series Introduction', then Info panel short desc
## series can have either, so put in common code.
introtag = soup.select_one('div.bp_rh p')
descdiv = soup.select_one('div#tabpanel-info div.bn_B')
desc = []
introtag = soup.select_one('div.bp_rh')
descdiv = soup.select_one('div#tabpanel-info div.bn_B') or \
soup.select_one('div[class^="_tab__pane_"] div[class^="_widget__info_"]')
if introtag and stripHTML(introtag):
# make sure there's something in the tag.
self.setDescription(self.url,introtag)
# logger.debug("intro %s"%introtag)
desc.append(unicode(introtag))
elif descdiv and stripHTML(descdiv):
# make sure there's something in the tag.
self.setDescription(self.url,descdiv)
else:
# logger.debug("desc %s"%descdiv)
desc.append(unicode(descdiv))
if not desc or self.getConfig("include_chapter_descriptions_in_summary"):
## Only for backward compatibility with 'stories' that
## don't have an intro or short desc.
descriptions = []
@@ -193,7 +268,9 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
descriptions.append("%d. %s" % (i + 1, stripHTML(chapterdesctag)))
# now put it back--it's used below
chapterdesctag.append(a)
self.setDescription(authorurl,"<p>"+"</p>\n<p>".join(descriptions)+"</p>")
desc.append(unicode("<p>"+"</p>\n<p>".join(descriptions)+"</p>"))
self.setDescription(self.url,u''.join(desc))
if isSingleStory:
## one-shots don't *display* date info, but they have it
@@ -203,16 +280,30 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## multichap also have "date_approve", but they have
## several and they're more than just the story chapters.
date = re.search(r'"date_approve":"(\d\d/\d\d/\d\d\d\d)"',data)
if not date:
date = re.search(r'date_approve:"(\d\d/\d\d/\d\d\d\d)"',data)
if date:
dateval = makeDate(date.group(1), self.dateformat)
self.story.setMetadata('datePublished', dateval)
self.story.setMetadata('dateUpdated', dateval)
## one-shots don't have same json data to get aver_rating
## from below. This kludge matches the data_approve
rateall = re.search(r'rate_all:([\d\.]+)',data)
if rateall:
self.story.setMetadata('averrating', '%4.2f' % float(rateall.group(1)))
## one-shots assumed completed.
self.story.setMetadata('status','Completed')
# Add the category from the breadcumb.
self.story.addToList('category', soup.find('div', id='BreadCrumbComponent').findAll('a')[1].string)
breadcrumbs = soup.find('div', id='BreadCrumbComponent')
if not breadcrumbs:
breadcrumbs = soup.select_one('ul[class^="_breadcrumbs_list_"]')
if not breadcrumbs:
# _breadcrumbs_18u7l_1
breadcrumbs = soup.select_one('nav[class^="_breadcrumbs_"]')
self.story.addToList('category', breadcrumbs.find_all('a')[1].string)
## one-shot chapter
self.add_chapter(self.story.getMetadata('title'), self.url)
@@ -221,7 +312,8 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## Multi-chapter stories. AKA multi-part 'Story Series'.
bn_antags = soup.select('div#tabpanel-info p.bn_an')
# logger.debug(bn_antags)
if bn_antags:
if bn_antags and not self.getConfig("dates_from_chapters"):
## Use dates from series metadata unless dates_from_chapters is enabled
dates = []
for datetag in bn_antags[:2]:
datetxt = stripHTML(datetag)
@@ -243,52 +335,11 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## category from chapter list
self.story.extendList('category',[ stripHTML(t) for t in soup.select('a.br_rl') ])
storytitle = self.story.getMetadata('title').lower()
chapter_name_type = None
for chapteratag in soup.select('a.br_rj'):
chapter_title = stripHTML(chapteratag)
# logger.debug('\tChapter: "%s"' % chapteratag)
if self.getConfig("clean_chapter_titles"):
# strip trailing ch or pt before doing the chapter clean.
# doesn't remove from story title metadata
storytitle = re.sub(r'^(.*?)( (ch|pt))?$',r'\1',storytitle)
if chapter_title.lower().startswith(storytitle):
chapter = chapter_title[len(storytitle):].strip()
# logger.debug('\tChapter: "%s"' % chapter)
if chapter == '':
chapter_title = 'Chapter %d' % (self.num_chapters() + 1)
# Sometimes the first chapter does not have type of chapter
if self.num_chapters() == 0:
# logger.debug('\tChapter: first chapter without chapter type')
chapter_name_type = None
else:
separater_char = chapter[0]
# logger.debug('\tseparater_char: "%s"' % separater_char)
chapter = chapter[1:].strip() if separater_char in [":", "-"] else chapter
# logger.debug('\tChapter: "%s"' % chapter)
if chapter.lower().startswith('ch.'):
chapter = chapter[len('ch.'):].strip()
try:
chapter_title = 'Chapter %d' % int(chapter)
except:
chapter_title = 'Chapter %s' % chapter
chapter_name_type = 'Chapter' if chapter_name_type is None else chapter_name_type
# logger.debug('\tChapter: chapter_name_type="%s"' % chapter_name_type)
elif chapter.lower().startswith('pt.'):
chapter = chapter[len('pt.'):].strip()
try:
chapter_title = 'Part %d' % int(chapter)
except:
chapter_title = 'Part %s' % chapter
chapter_name_type = 'Part' if chapter_name_type is None else chapter_name_type
# logger.debug('\tChapter: chapter_name_type="%s"' % chapter_name_type)
elif separater_char in [":", "-"]:
chapter_title = chapter
# logger.debug('\tChapter: taking chapter text as whole')
# /series/se does include full URLs currently.
chapurl = chapteratag['href']
# logger.debug("Chapter URL: " + chapurl)
self.add_chapter(chapter_title, chapurl)
@@ -298,6 +349,7 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
self.setCoverImage(self.url,coverimg['src'])
#### Attempting averrating from JS metadata.
#### also alternate chapters from json
try:
state_start="state='"
state_end="'</script>"
@@ -306,20 +358,48 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
state = data[i+len(state_start):data.index(state_end,i)].replace("\\'","'").replace("\\\\","\\")
if state:
# logger.debug(state)
import json
json_state = json.loads(state)
# logger.debug(json.dumps(json_state, sort_keys=True,indent=2, separators=(',', ':')))
all_rates = []
## one-shot
if 'story' in json_state:
all_rates = [ float(json_state['story']['data']['rate_all']) ]
## series
elif 'series' in json_state:
if 'series' in json_state:
all_rates = [ float(x['rate_all']) for x in json_state['series']['works'] ]
## Extract dates from chapter approval dates if dates_from_chapters is enabled
if self.getConfig("dates_from_chapters"):
date_approvals = []
for work in json_state['series']['works']:
if 'date_approve' in work:
try:
date_approvals.append(makeDate(work['date_approve'], self.dateformat))
except:
pass
if date_approvals:
# Oldest date is published, newest is updated
date_approvals.sort()
self.story.setMetadata('datePublished', date_approvals[0])
self.story.setMetadata('dateUpdated', date_approvals[-1])
if all_rates:
self.story.setMetadata('averrating', '%4.2f' % (sum(all_rates) / float(len(all_rates))))
## alternate chapters from JSON
if self.num_chapters() < 1:
logger.debug("Getting Chapters from series JSON")
seriesid = json_state.get('series',{}).get('data',{}).get('id',None)
if seriesid:
logger.info("Fetching chapter data from JSON")
logger.debug(seriesid)
series_json = json.loads(self.get_request('https://literotica.com/api/3/series/%s/works'%seriesid))
# logger.debug(json.dumps(series_json, sort_keys=True,indent=2, separators=(',', ':')))
for chap in series_json:
self.add_chapter(chap['title'], 'https://www.literotica.com/s/'+chap['url'])
## Collect tags from series/story page if tags_from_chapters is enabled
if self.getConfig("tags_from_chapters"):
self.story.extendList('eroticatags', [ unicode(t['tag']).title() for t in chap['tags'] ])
except Exception as e:
logger.debug("Processing JSON to find averrating failed. (%s)"%e)
logger.warning("Processing JSON failed. (%s)"%e)
## Features removed because not supportable by new site form:
## averrating metadata entry
@@ -328,14 +408,13 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
return
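The `dates_from_chapters` branch added above sorts the chapters' `date_approve` values and takes the oldest as `datePublished` and the newest as `dateUpdated`. A minimal sketch of that step, assuming the site's MM/DD/YYYY date strings (the function name is illustrative):

```python
from datetime import datetime

def published_updated(approval_dates, fmt="%m/%d/%Y"):
    # oldest approval date becomes datePublished, newest becomes dateUpdated
    parsed = sorted(datetime.strptime(d, fmt) for d in approval_dates)
    return parsed[0], parsed[-1]
```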
def getPageText(self, raw_page, url):
# logger.debug('Getting page text')
# logger.debug(soup)
logger.debug('Getting page text')
raw_page = raw_page.replace('<div class="b-story-body-x x-r15"><div><p>','<div class="b-story-body-x x-r15"><div>')
# logger.debug("\tChapter text: %s" % raw_page)
# logger.debug("\tChapter text: %s" % raw_page)
page_soup = self.make_soup(raw_page)
[comment.extract() for comment in page_soup.findAll(string=lambda text:isinstance(text, Comment))]
[comment.extract() for comment in page_soup.find_all(string=lambda text:isinstance(text, Comment))]
fullhtml = ""
for aa_ht_div in page_soup.find_all('div', 'aa_ht'):
for aa_ht_div in page_soup.find_all('div', 'aa_ht') + page_soup.select('div[class^="_article__content_"]'):
if aa_ht_div.div:
html = unicode(aa_ht_div.div)
# Strip some starting and ending tags,
@@ -353,6 +432,13 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
raw_page = self.get_request(url)
page_soup = self.make_soup(raw_page)
pages = page_soup.find('div',class_='l_bH')
if not pages:
pages = page_soup.select_one('div._pagination_h0sum_1')
if not pages:
pages = page_soup.select_one('div.clearfix.panel._pagination_1400x_1')
if not pages:
pages = page_soup.select_one('div[class^="panel clearfix _pagination_"]')
# logger.debug(pages)
fullhtml = ""
chapter_description = ''
@@ -365,7 +451,10 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
## look for highest numbered page, they're not all listed
## when there are many.
last_page_link = pages.find_all('a', class_='l_bJ')[-1]
last_page_links = pages.find_all('a', class_='l_bJ')
if not last_page_links:
last_page_links = pages.select('a[class^="_pagination__item_"]')
last_page_link = last_page_links[-1]
last_page_no = int(urlparse.parse_qs(last_page_link['href'].split('?')[1])['page'][0])
# logger.debug(last_page_no)
for page_no in range(2, last_page_no+1):
@@ -374,7 +463,7 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
raw_page = self.get_request(page_url)
fullhtml += self.getPageText(raw_page, url)
# logger.debug(fullhtml)
#logger.debug(fullhtml)
page_soup = self.make_soup(fullhtml)
fullhtml = self.utf8FromSoup(url, self.make_soup(fullhtml))
fullhtml = chapter_description + fullhtml
@@ -382,6 +471,123 @@ class LiteroticaSiteAdapter(BaseSiteAdapter):
return fullhtml
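The pagination handling above pulls the highest page number out of the last pager link's query string via `parse_qs`. With a hypothetical href it works like this (the adapter itself goes through its py2/py3 `urlparse` compatibility import):

```python
from urllib.parse import parse_qs

# hypothetical pager link href, as it might appear on a chapter page
href = "/s/story-title?page=7"
# parse_qs returns {'page': ['7']}; take the first value and convert
last_page_no = int(parse_qs(href.split('?')[1])['page'][0])
```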
def get_urls_from_page(self,url,normalize):
from ..geturls import get_urls_from_html
## hook for logins, etc.
self.before_get_urls_from_page(url,normalize)
# this way it uses User-Agent or other special settings.
data = self.get_request(url,usecache=False)
soup = self.make_soup(data)
page_urls = get_urls_from_html(soup, url, configuration=self.configuration, normalize=normalize)
if not self.getConfig("fetch_stories_from_api",True):
logger.debug('fetch_stories_from_api Not enabled')
return {'urllist': page_urls}
user_story_list = re.search(r'literotica\.com/authors/.+?/lists\?listid=(?P<list_id>\d+)', url)
fav_authors = re.search(r'literotica\.com/authors/.+?/favorites', url)
written = re.search(r'literotica.com/authors/.+?/works/', url)
logger.debug((bool(user_story_list), bool(fav_authors), bool(written)))
# If the url is not supported
if not user_story_list and not fav_authors and not written:
logger.debug('No supported link. %s', url)
return {'urllist':page_urls}
# Grabbing the main list where chapters are contained.
if user_story_list:
js_story_list = re.search(r';\$R\[\d+?\]\(\$R\[\d+?\],\$R\[\d+?\]\);\$R\[\d+?\]\(\$R\[\d+?\],\$R\[\d+?\]=\{success:!\d,current_page:(?P<current_page>\d+?),last_page:(?P<last_page>\d+?),total:\d+?,per_page:\d+,(has_series:!\d)?data:\$R\[\d+?\]=\[\$R\[\d+?\]=(?P<data>.+)\}\]\}\);', data) # }] } } }); \$R\[\d+?\]\(\$R\[\d+?\],\$R\[\d+?\]\);\$R\[\d+?]\(\$R\[\d+?\],\$R\[\d+?\]=\{sliders:
logger.debug('user_story_list ID [%s]'%user_story_list.group('list_id'))
else:
js_story_list = re.search(r'\$R\[\d+?\]\(\$R\[\d+?\],\$R\[\d+?\]={current_page:(?P<current_page>\d+?),last_page:(?P<last_page>\d+?),total:\d+?,per_page:\d+,(has_series:!\d,)?data:\$R\[\d+\]=\[\$R\[\d+\]=\{(?!aim)(?P<data>.+)\}\);_\$HY\.r\[', data)
# In case the regex becomes outdated
if not js_story_list:
logger.debug('Failed to grab data from the js.')
return {'urllist':page_urls}
user = None
script_tags = soup.find_all('script')
for script in script_tags:
if not script.string:
continue
# Getting author from the js.
user = re.search(r'_\$HY\.r\[\"AuthorQuery\[\\\"(?P<author>.+?)\\\"\]\"\]', script.string)
if user != None:
logger.debug("User: [%s]"%user.group('author'))
break
else:
logger.debug('Failed to get a username')
return {'urllist': page_urls}
# Extract the current (should be 1) and last page numbers from the js.
logger.debug("Pages %s/%s"%(js_story_list.group('current_page'), js_story_list.group('last_page')))
urls = []
# Necessary to format a proper link, as there is no visible data specifying what kind of link it should be.
cat_to_link = {'adult-comics': 'i', 'erotic-art': 'i', 'illustrated-poetry': 'p', 'erotic-audio-poetry': 'p', 'erotic-poetry': 'p', 'non-erotic-poetry': 'p'}
stories_found = re.findall(r"category_info:\$R\[.*?type:\".+?\",pageUrl:\"(.+?)\"}.+?,type:\"(.+?)\",url:\"(.+?)\",", js_story_list.group('data'))
for story in stories_found:
story_category, story_type, story_url = story
urls.append('https://www.literotica.com/%s/%s'%(cat_to_link.get(story_category, 's'), story_url))
# Removes the duplicates
seen = set()
urls = [x for x in (page_urls + urls) if not (x in seen or seen.add(x))]
logger.debug("Found [%s] stories so far."%len(urls))
# Sometimes the rest of the stories are buried in the js, so no fetching is necessary.
if js_story_list.group('last_page') == js_story_list.group('current_page'):
return {'urllist': urls}
user = urlparse.quote(user.group(1))
logger.debug("Escaped user: [%s]"%user)
if written:
category = re.search(r"_\$HY\.r\[\"AuthorSeriesAndWorksQuery\[\\\".+?\\\",\\\"\D+?\\\",\\\"(?P<type>\D+?)\\\"\]\"\]=\$R\[\d+?\]=\$R\[\d+?\]\(\$R\[\d+?\]=\{", data)
elif fav_authors:
category = re.search(r"_\$HY\.r\[\"AuthorFavoriteWorksQuery\[\\\".+?\\\",\\\"(?P<type>\D+?)\\\",\d\]\"\]=\$R\[\d+?\]=\$R\[\d+?\]\(\$R\[\d+?\]={", data)
if not user_story_list and not category:
logger.debug("Type of works not found")
return {'urllist': urls}
last_page = int(js_story_list.group('last_page'))
current_page = int(js_story_list.group('current_page')) + 1
# Fetching the remaining urls from the api. Can't trust the page count reported by the website; sometimes even the api returns an outdated number of pages.
while current_page <= last_page:
i = len(urls)
logger.debug("Pages %s/%s"%(current_page, int(last_page)))
if fav_authors:
jsn = self.get_request('https://literotica.com/api/3/users/{}/favorite/works?params=%7B%22page%22%3A{}%2C%22pageSize%22%3A50%2C%22type%22%3A%22{}%22%2C%22withSeriesDetails%22%3Atrue%7D'.format(user, current_page, category.group('type')))
elif user_story_list:
jsn = self.get_request('https://literotica.com/api/3/users/{}/list/{}?params=%7B%22page%22%3A{}%2C%22pageSize%22%3A50%2C%22withSeriesDetails%22%3Atrue%7D'.format(user, user_story_list.group('list_id'), current_page))
else:
jsn = self.get_request('https://literotica.com/api/3/users/{}/series_and_works?params=%7B%22page%22%3A{}%2C%22pageSize%22%3A50%2C%22sort%22%3A%22date%22%2C%22type%22%3A%22{}%22%2C%22listType%22%3A%22expanded%22%7D'.format(user, current_page, category.group('type')))
urls_data = json.loads(jsn)
last_page = urls_data["last_page"]
current_page = int(urls_data["current_page"]) + 1
for story in urls_data['data']:
#logger.debug('parts' in story)
if story['url'] and story.get('work_count') == None:
urls.append('https://www.literotica.com/%s/%s'%(cat_to_link.get(story["category_info"]["pageUrl"], 's'), str(story['url'])))
continue
# Most of the time a series has no url specified and contains all of the story links belonging to the series
urls.append('https://www.literotica.com/series/se/%s'%str(story['id']))
for series_story in story['parts']:
urls.append('https://www.literotica.com/%s/%s'%(cat_to_link.get(series_story["category_info"]["pageUrl"], 's'), str(series_story['url'])))
logger.debug("Found [%s] stories."%(len(urls) - i))
# Again removing duplicates.
seen = set()
urls = [x for x in urls if not (x in seen or seen.add(x))]
logger.debug("Found total of [%s] stories"%len(urls))
return {'urllist':urls}
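The duplicate removal used twice in `get_urls_from_page` relies on `set.add()` returning `None`: each URL fails the `x in seen` test only once, so order is preserved while repeats are dropped. Extracted as a small helper (name is illustrative):

```python
def dedupe_keep_order(items):
    # seen.add(x) returns None (falsy), so each item passes the filter
    # exactly once while the original order is preserved
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]
```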
def getClass():
return LiteroticaSiteAdapter


@@ -116,7 +116,7 @@ class LumosSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('rating', rating)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@@ -134,7 +134,7 @@ class LumosSycophantHexComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
value = labels[0].previousSibling
svalue = ""
@@ -154,22 +154,22 @@ class LumosSycophantHexComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value.split(' -')[0])
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@@ -194,7 +194,7 @@ class LumosSycophantHexComAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@@ -162,7 +162,7 @@ class MassEffect2InAdapter(BaseSiteAdapter):
self.story.extendList('authorId', [authorId])
self.story.extendList('authorUrl', [authorUrl])
if not self.story.getMetadata('rating'):
if not self.story.getMetadataRaw('rating'):
ratingTitle = chapter.getRatingTitle()
if ratingTitle:
self.story.setMetadata('rating', ratingTitle)
@ -204,7 +204,6 @@ class MassEffect2InAdapter(BaseSiteAdapter):
self.story.setMetadata('datePublished', datePublished)
self.story.setMetadata('dateUpdated', dateUpdated)
self.story.setMetadata('numWords', unicode(wordCount))
self.story.setMetadata('numChapters', len(chapters))
# Site-specific metadata.
self.story.setMetadata('language', self.SITE_LANGUAGE)
@ -678,7 +677,7 @@ class Chapter(object):
def _excludeEditorSignature(self, root):
"""Exclude editor signature from within `root' element."""
for stringNode in root.findAll(string=True):
for stringNode in root.find_all(string=True):
if re.match(self.SIGNED_PATTERN, textNode.string):
editorLink = textNode.findNext('a')
if editorLink:


@ -64,7 +64,9 @@ class MCStoriesComSiteAdapter(BaseSiteAdapter):
return "https://mcstories.com/StoryTitle/ https://mcstories.com/StoryTitle/index.html https://mcstories.com/StoryTitle/StoryTitle1.html"
def getSiteURLPattern(self):
return r"https?://(www\.)?mcstories\.com/([a-zA-Z0-9_-]+)/"
## Note that this uses a regular expression *negative*
## lookahead--story URLs *can't* have /Titles/ /Authors/ etc.
return r"https?://(www\.)?mcstories\.com(?!/(Titles|Authors|Tags|ReadersPicks)/)/[a-zA-Z0-9_-]+/"
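The negative lookahead in the new pattern can be demonstrated directly: story URLs match, while the site's listing pages (`/Titles/`, `/Authors/`, etc.) are rejected. A minimal check, using the same pattern as above:

```python
import re

# Same URL pattern as the adapter returns above; the (?!...) group
# rejects listing pages before the story-directory part is tried.
PATTERN = re.compile(
    r"https?://(www\.)?mcstories\.com"
    r"(?!/(Titles|Authors|Tags|ReadersPicks)/)"
    r"/[a-zA-Z0-9_-]+/")

assert PATTERN.match("https://mcstories.com/StoryTitle/")
assert not PATTERN.match("https://mcstories.com/Titles/")
```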
def extractChapterUrlsAndMetadata(self):
"""


@ -148,12 +148,12 @@ class MediaMinerOrgSiteAdapter(BaseSiteAdapter):
# category
# <a href="/fanfic/src.php/a/567">Ranma 1/2</a>
for a in soup.findAll('a',href=re.compile(r"^/fanfic/a/")):
for a in soup.find_all('a',href=re.compile(r"^/fanfic/a/")):
self.story.addToList('category',a.string)
# genre
# <a href="/fanfic/src.php/g/567">Ranma 1/2</a>
for a in soup.findAll('a',href=re.compile(r"^/fanfic/src.php/g/")):
for a in soup.find_all('a',href=re.compile(r"^/fanfic/src.php/g/")):
self.story.addToList('genre',a.string)
metasoup = soup.find("div",{"class":"post-meta"})


@ -154,7 +154,7 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/'+chapter['href']+addurl)
@ -170,7 +170,7 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -191,13 +191,13 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
@ -206,7 +206,7 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
genrestext = [genre.string for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@ -216,7 +216,7 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warningstext = [warning.string for warning in warnings]
self.warning = ', '.join(warningstext)
for warning in warningstext:
@ -243,7 +243,7 @@ class MidnightwhispersAdapter(BaseSiteAdapter): # XXX
series_url = 'https://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@ -195,7 +195,7 @@ class LightNovelGateSiteAdapter(BaseSiteAdapter):
[a.extract() for a in story.find_all('a')]
# Some tags have non-standard tag name.
for tag in story.findAll(recursive=True):
for tag in story.find_all(recursive=True):
if tag.name not in HTML_TAGS:
tag.name = 'span'


@ -137,14 +137,14 @@ class OcclumencySycophantHexComAdapter(BaseSiteAdapter):
try:
# in case link points somewhere other than the first chapter
a = soup.findAll('option')[1]['value']
a = soup.find_all('option')[1]['value']
self.story.setMetadata('storyId',a.split('=',)[1])
url = 'http://'+self.host+'/'+a
soup = self.make_soup(self.get_request(url))
except:
pass
for info in asoup.findAll('table', {'class' : 'border'}):
for info in asoup.find_all('table', {'class' : 'border'}):
a = info.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
if a != None:
self.story.setMetadata('title',stripHTML(a))
@ -152,7 +152,7 @@ class OcclumencySycophantHexComAdapter(BaseSiteAdapter):
# Find the chapters:
chapters=soup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1$'))
chapters=soup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+&i=1$'))
if len(chapters) == 0:
self.add_chapter(self.story.getMetadata('title'),url)
else:
@ -171,7 +171,7 @@ class OcclumencySycophantHexComAdapter(BaseSiteAdapter):
except:
return ""
cats = info.findAll('a',href=re.compile('categories.php'))
cats = info.find_all('a',href=re.compile('categories.php'))
for cat in cats:
self.story.addToList('category',cat.string)
@ -188,7 +188,7 @@ class OcclumencySycophantHexComAdapter(BaseSiteAdapter):
self.setDescription(url,svalue)
# <span class="label">Rated:</span> NC-17<br /> etc
labels = info.findAll('b')
labels = info.find_all('b')
for labelspan in labels:
value = labelspan.nextSibling
label = stripHTML(labelspan)


@ -93,26 +93,26 @@ class PhoenixSongNetAdapter(BaseSiteAdapter):
chapters = soup.find('select')
if chapters == None:
self.add_chapter(self.story.getMetadata('title'),url)
for b in soup.findAll('b'):
for b in soup.find_all('b'):
if b.text == "Updated":
date = b.nextSibling.string.split(': ')[1].split(',')
self.story.setMetadata('datePublished', makeDate(date[0]+date[1], self.dateformat))
self.story.setMetadata('dateUpdated', makeDate(date[0]+date[1], self.dateformat))
else:
i = 0
chapters = chapters.findAll('option')
chapters = chapters.find_all('option')
for chapter in chapters:
self.add_chapter(chapter,'https://'+self.host+chapter['value'])
if i == 0:
self.story.setMetadata('storyId',chapter['value'].split('/')[3])
head = self.make_soup(self.get_request('https://'+self.host+chapter['value'])).findAll('b')
head = self.make_soup(self.get_request('https://'+self.host+chapter['value'])).find_all('b')
for b in head:
if b.text == "Updated":
date = b.nextSibling.string.split(': ')[1].split(',')
self.story.setMetadata('datePublished', makeDate(date[0]+date[1], self.dateformat))
if i == (len(chapters)-1):
head = self.make_soup(self.get_request('https://'+self.host+chapter['value'])).findAll('b')
head = self.make_soup(self.get_request('https://'+self.host+chapter['value'])).find_all('b')
for b in head:
if b.text == "Updated":
date = b.nextSibling.string.split(': ')[1].split(',')
@ -160,20 +160,20 @@ class PhoenixSongNetAdapter(BaseSiteAdapter):
soup = self.make_soup(self.get_request(url))
chapter=self.make_soup('<div class="story"></div>')
for p in soup.findAll(['p','blockquote']):
for p in soup.find_all(['p','blockquote']):
if "This is for problems with the formatting or the layout of the chapter." in stripHTML(p):
break
chapter.append(p)
for a in chapter.findAll('div'):
for a in chapter.find_all('div'):
a.extract()
for a in chapter.findAll('table'):
for a in chapter.find_all('table'):
a.extract()
for a in chapter.findAll('script'):
for a in chapter.find_all('script'):
a.extract()
for a in chapter.findAll('form'):
for a in chapter.find_all('form'):
a.extract()
for a in chapter.findAll('textarea'):
for a in chapter.find_all('textarea'):
a.extract()


@ -80,7 +80,7 @@ class PotionsAndSnitchesOrgSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/fanfiction/'+chapter['href'])
@ -92,7 +92,7 @@ class PotionsAndSnitchesOrgSiteAdapter(BaseSiteAdapter):
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -116,13 +116,13 @@ class PotionsAndSnitchesOrgSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('reads', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
if "Snape and Harry (required)" in char:
@ -132,27 +132,27 @@ class PotionsAndSnitchesOrgSiteAdapter(BaseSiteAdapter):
self.story.addToList('characters',char.string)
if 'Warning' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
for warning in warnings:
self.story.addToList('warnings',stripHTML(warning))
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
for genre in genres:
self.story.addToList('genre',stripHTML(genre))
if 'Takes Place' in label:
takesplaces = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
takesplaces = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
for takesplace in takesplaces:
self.story.addToList('takesplaces',stripHTML(takesplace))
if 'Snape flavour' in label:
snapeflavours = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
snapeflavours = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
for snapeflavour in snapeflavours:
self.story.addToList('snapeflavours',stripHTML(snapeflavour))
if 'Tags' in label:
sitetags = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
sitetags = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
for sitetag in sitetags:
self.story.addToList('sitetags',stripHTML(sitetag))
@ -176,7 +176,7 @@ class PotionsAndSnitchesOrgSiteAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/fanfiction/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -121,7 +121,7 @@ class PretenderCenterComAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/missingpieces/'+chapter['href']+addurl)
@ -138,7 +138,7 @@ class PretenderCenterComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -159,22 +159,22 @@ class PretenderCenterComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
for warning in warnings:
self.story.addToList('warnings',warning.string)
@ -198,7 +198,7 @@ class PretenderCenterComAdapter(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links


@ -111,7 +111,7 @@ class PsychFicComAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@ -126,7 +126,7 @@ class PsychFicComAdapter(BaseSiteAdapter):
except:
return ""
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -147,22 +147,22 @@ class PsychFicComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@ -186,7 +186,7 @@ class PsychFicComAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -104,6 +104,42 @@ class RoyalRoadAdapter(BaseSiteAdapter):
def getSiteURLPattern(self):
return "https?"+re.escape("://")+r"(www\.|)royalroadl?\.com/fiction/\d+(/.*)?$"
# rr won't send you future updates if you aren't 'caught up'
# on the story. Login isn't required but logging in will
# mark stories you've downloaded as 'read' on rr.
def performLogin(self):
params = {}
if self.password:
params['Email'] = self.username
params['password'] = self.password
else:
params['Email'] = self.getConfig("username")
params['password'] = self.getConfig("password")
if not params['password']:
return
loginUrl = 'https://' + self.getSiteDomain() + '/account/login'
logger.debug("Will now login to URL (%s) as (%s)" % (loginUrl,
params['Email']))
## need to pull empty login page first to get request token
soup = self.make_soup(self.get_request(loginUrl))
## FYI, this will fail if cookiejar is shared, but
## use_basic_cache is false.
params['__RequestVerificationToken']=soup.find('input', {'name':'__RequestVerificationToken'})['value']
d = self.post_request(loginUrl, params)
if "Sign in" in d : #Member Account
logger.info("Failed to login to URL %s as %s (requires Email not name)" % (loginUrl,
params['Email']))
raise exceptions.FailedToLogin(self.url,"Failed to login as %s (RoyalRoad requires Email not name)" % params['Email'])
return False
else:
return True
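The key step in this login flow is fetching the empty login page first to harvest the `__RequestVerificationToken` hidden input, then echoing it back in the POST body. The adapter does this with `soup.find`; a stdlib regex stand-in (for illustration only, run against static HTML rather than the live page) looks like this:

```python
import re

# Hypothetical stand-in for the token-harvesting step above:
# pull the hidden __RequestVerificationToken value out of the
# login form so it can be included in the POST parameters.
# (Assumes name= precedes value=, as in this sample markup.)
def extract_verification_token(html):
    m = re.search(
        r'name="__RequestVerificationToken"[^>]*value="([^"]+)"', html)
    return m.group(1) if m else None

login_page = ('<form><input name="__RequestVerificationToken" '
              'type="hidden" value="abc123"/></form>')
```

Here `extract_verification_token(login_page)` returns `"abc123"`; in the real flow that value is placed into `params['__RequestVerificationToken']` before `post_request`.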
## RR chapter URL only requires the chapter ID number field to be correct, story ID and title values are ignored
## URL format after the domain /fiction/ is long form, storyID/storyTitle/chapter/chapterID/chapterTitle
## short form has /fiction/chapter/chapterID both forms have optional final /
@ -160,6 +196,9 @@ class RoyalRoadAdapter(BaseSiteAdapter):
url = self.url
logger.debug("URL: "+url)
# Log in so site will mark the chapers as read
self.performLogin()
data = self.get_request(url)
soup = self.make_soup(data)
@ -187,7 +226,7 @@ class RoyalRoadAdapter(BaseSiteAdapter):
chapters = soup.find('table',{'id':'chapters'}).find('tbody')
tds = [tr.findAll('td') for tr in chapters.findAll('tr')]
tds = [tr.find_all('td') for tr in chapters.find_all('tr')]
if not tds:
raise exceptions.FailedToDownload(
@ -227,6 +266,8 @@ class RoyalRoadAdapter(BaseSiteAdapter):
self.story.setMetadata('status', 'Stub')
elif 'DROPPED' == label:
self.story.setMetadata('status', 'Dropped')
elif 'INACTIVE' == label:
self.story.setMetadata('status', 'Inactive')
elif 'Fan Fiction' == label:
self.story.addToList('category', 'FanFiction')
elif 'Original' == label:
@ -248,7 +289,8 @@ class RoyalRoadAdapter(BaseSiteAdapter):
if img:
cover_url = img['src']
# usually URL is for thumbnail. Try expected URL for larger image, if fails fall back to the original URL
if self.setCoverImage(url,cover_url.replace('/covers-full/', '/covers-large/'))[0] == "failedtoload":
cover_set = self.setCoverImage(url,cover_url.replace('/covers-full/', '/covers-large/'))[0]
if not cover_set or cover_set.startswith("failedtoload"):
self.setCoverImage(url,cover_url)
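The fallback above tries a larger cover derived from the thumbnail URL and only falls back to the original when the derived URL fails to load. The candidate-ordering part can be isolated as a small pure function (a sketch, not FanFicFare code; the actual load attempt and `failedtoload` check stay in the adapter):

```python
# Sketch of the cover-URL preference above: prefer the large-cover
# URL derived from the thumbnail path, keeping the original URL as
# a second candidate in case the derived one fails to load.
def cover_candidates(cover_url):
    large = cover_url.replace('/covers-full/', '/covers-large/')
    # If the substitution changed nothing, there is only one candidate.
    return [large, cover_url] if large != cover_url else [cover_url]
```

The adapter then attempts each candidate in order, stopping at the first that loads.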
# some content is show as tables, this will preserve them


@ -193,7 +193,7 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
# Find authorid and URL from... author url.
# (fetch multiple authors)
alist = soup.findAll('a', href=re.compile(r"viewuser.php\?uid=\d+"))
alist = soup.find_all('a', href=re.compile(r"viewuser.php\?uid=\d+"))
for a in alist:
self.story.addToList('authorId',a['href'].split('=')[1])
self.story.addToList('authorUrl','http://'+self.host+'/fanfics/'+a['href'])
@ -201,11 +201,11 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
# Reviews
reviewdata = soup.find('div', {'id' : 'sort'})
a = reviewdata.findAll('a', href=re.compile(r'reviews.php\?type=ST&(amp;)?item='+self.story.getMetadata('storyId')+"$"))[1] # second one.
a = reviewdata.find_all('a', href=re.compile(r'reviews.php\?type=ST&(amp;)?item='+self.story.getMetadata('storyId')+"$"))[1] # second one.
self.story.setMetadata('reviews',stripHTML(a))
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/fanfics/'+chapter['href']+addurl)
@ -222,7 +222,7 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -237,13 +237,13 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
@ -252,7 +252,7 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genrestext = [genre.string for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@ -262,7 +262,7 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warningstext = [warning.string for warning in warnings]
self.warning = ', '.join(warningstext)
for warning in warningstext:
@ -291,7 +291,7 @@ class SamAndJackNetAdapter(BaseSiteAdapter): # XXX
series_url = 'http://'+self.host+'/fanfics/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -197,33 +197,20 @@ class ScribbleHubComAdapter(BaseSiteAdapter): # XXX
# Get the contents list from scribblehub, iterate through and add to chapters
# Can be fairly certain this will not 404 - we know the story id is valid
contents_payload = {"action": "wi_gettocchp",
"strSID": self.story.getMetadata('storyId'),
"strmypostid": 0,
"strFic": "yes"}
# 14/12/22 - Looks like it should follow this format now (below), but still returns a 400
# but not a 403. tested in browser getting rid of all other cookies to try and get a 400 and nopes.
# contents_payload = {"action": "wi_getreleases_pagination",
# "pagenum": 1,
# "mypostid": 421879}
# contents_payload = "action=wi_getreleases_pagination&pagenum=1&mypostid=421879"
contents_payload = {"action": "wi_getreleases_pagination",
"pagenum": -1,
"mypostid": self.story.getMetadata('storyId')}
contents_data = self.post_request("https://www.scribblehub.com/wp-admin/admin-ajax.php", contents_payload)
# logger.debug(contents_data)
contents_soup = self.make_soup(contents_data)
for i in range(1, int(contents_soup.find('ol',{'id':'ol_toc'}).get('count')) + 1):
chapter_url = contents_soup.find('li',{'cnt':str(i)}).find('a').get('href')
chapter_name = contents_soup.find('li',{'cnt':str(i)}).find('a').get('title')
# logger.debug("Found Chapter " + str(i) + ", name: " + chapter_name + ", url: " + chapter_url)
for toca in contents_soup.select('a.toc_a'):
chapter_url = toca['href']
chapter_name = stripHTML(toca)
# logger.debug("Found Chapter: " + chapter_name + ", url: " + chapter_url)
self.add_chapter(chapter_name, chapter_url)
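The simplified loop above just walks every `a.toc_a` anchor in the AJAX response instead of reconstructing the `<ol>` by its `cnt` attributes. A stdlib stand-in for the `soup.select('a.toc_a')` call (illustrative only, against hypothetical markup; the adapter itself uses BeautifulSoup) can be written with `html.parser`:

```python
from html.parser import HTMLParser

# Collect (title, href) pairs from anchors carrying the toc_a class,
# mirroring what the select('a.toc_a') loop above extracts.
class TocCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chapters = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == 'a' and 'toc_a' in (a.get('class') or '').split():
            self._href = a.get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.chapters.append((''.join(self._text).strip(), self._href))
            self._href = None
```

Feeding it a TOC fragment yields the same chapter name/URL pairs the adapter passes to `add_chapter`.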
# eFiction sites don't help us out a lot with their meta data
# formating, so it's a little ugly.
# utility method
def defaultGetattr(d,k):
try:
@ -240,13 +227,13 @@ class ScribbleHubComAdapter(BaseSiteAdapter): # XXX
# Categories
if soup.find('span',{'class': 'wi_fic_showtags_inner'}):
categories = soup.find('span',{'class': 'wi_fic_showtags_inner'}).findAll('a')
categories = soup.find('span',{'class': 'wi_fic_showtags_inner'}).find_all('a')
for category in categories:
self.story.addToList('category', stripHTML(category))
# Genres
if soup.find('a',{'class': 'fic_genre'}):
genres = soup.findAll('a',{'class': 'fic_genre'})
genres = soup.find_all('a',{'class': 'fic_genre'})
for genre in genres:
self.story.addToList('genre', stripHTML(genre))
@ -258,7 +245,7 @@ class ScribbleHubComAdapter(BaseSiteAdapter): # XXX
# Content Warnings
if soup.find('ul',{'class': 'ul_rate_expand'}):
warnings = soup.find('ul',{'class': 'ul_rate_expand'}).findAll('a')
warnings = soup.find('ul',{'class': 'ul_rate_expand'}).find_all('a')
for warn in warnings:
self.story.addToList('warnings', stripHTML(warn))
@ -312,7 +299,7 @@ class ScribbleHubComAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata(metadata, stripHTML(row.find('td')))
if soup.find('table',{'class': 'table_pro_overview'}):
stats_table = soup.find('table',{'class': 'table_pro_overview'}).findAll('tr')
stats_table = soup.find('table',{'class': 'table_pro_overview'}).find_all('tr')
for row in stats_table:
find_stats_data("Total Views (All)", row, "views")
find_stats_data("Word Count", row, "numWords")


@ -171,7 +171,7 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
# Find authorid and URL from... author url.
# (fetch multiple authors)
alist = soup.findAll('a', href=re.compile(r"viewuser.php\?uid=\d+"))
alist = soup.find_all('a', href=re.compile(r"viewuser.php\?uid=\d+"))
for a in alist:
self.story.addToList('authorId',a['href'].split('=')[1])
self.story.addToList('authorUrl','https://'+self.host+'/fanfics/'+a['href'])
@ -180,12 +180,12 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
# Reviews
reviewdata = soup.find('div', {'id' : 'sort'})
a = reviewdata.findAll('a', href=re.compile(r'reviews.php\?type=ST&(amp;)?item='+self.story.getMetadata('storyId')+"$"))[1] # second one.
a = reviewdata.find_all('a', href=re.compile(r'reviews.php\?type=ST&(amp;)?item='+self.story.getMetadata('storyId')+"$"))[1] # second one.
self.story.setMetadata('reviews',stripHTML(a))
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/fanfics/'+chapter['href']+addurl)
@ -208,7 +208,7 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
self.setDescription(url,self.make_soup(summarydata))
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -220,13 +220,13 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
@ -235,7 +235,7 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
genrestext = [genre.string for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@ -245,7 +245,7 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
## leaving it in. Check to make sure the type_id number
## is correct, though--it's site specific.
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warningstext = [warning.string for warning in warnings]
self.warning = ', '.join(warningstext)
for warning in warningstext:
@ -273,7 +273,7 @@ class SheppardWeirComAdapter(BaseSiteAdapter): # XXX
series_url = 'https://'+self.host+'/fanfics/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -1,47 +0,0 @@
# -*- coding: utf-8 -*-
# Copyright 2011 Fanficdownloader team, 2018 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
from __future__ import absolute_import
from .base_efiction_adapter import BaseEfictionAdapter
class SinfulDreamsComWhisperedMuse(BaseEfictionAdapter):
@staticmethod
def getSiteDomain():
return 'sinful-dreams.com'
@classmethod
def getPathToArchive(self):
return '/whispered/muse'
@classmethod
def getConfigSection(cls):
"Overridden because [domain/path] section for multiple-adapter domain."
return cls.getSiteDomain()+cls.getPathToArchive()
@classmethod
def getSiteAbbrev(self):
return 'snfldrms-wm'
@classmethod
def getDateFormat(self):
return "%m/%d/%Y"
def getClass():
return SinfulDreamsComWhisperedMuse


@ -109,7 +109,7 @@ class SiyeCoUkAdapter(BaseSiteAdapter): # XXX
self.story.setMetadata('title',stripHTML(titlea))
# Find the chapters (from soup, not authsoup):
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/siye/'+chapter['href'])
@ -121,7 +121,7 @@ class SiyeCoUkAdapter(BaseSiteAdapter): # XXX
metatable = soup.find('table',{'width':'95%'})
# Categories
cat_as = metatable.findAll('a', href=re.compile(r'categories.php'))
cat_as = metatable.find_all('a', href=re.compile(r'categories.php'))
for cat_a in cat_as:
self.story.addToList('category',stripHTML(cat_a))
@ -209,7 +209,7 @@ class SiyeCoUkAdapter(BaseSiteAdapter): # XXX
series_url = 'https://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -76,6 +76,43 @@ class SpiritFanfictionComAdapter(BaseSiteAdapter):
return 'spirit'
# Login
def needToLoginCheck(self, data):
if 'nao-logado' in data or 'Acessar sua Conta' in data:
return True
return False
def performLogin(self, url, data):
params = {}
params['Usuario'] = self.getConfig("username")
params['Senha'] = self.getConfig("password")
params['Login'] = 'Fazer Login'
login_url = 'https://' + self.getSiteDomain() + '/login'
logger.info("Will now login to URL (%s) as (%s)" % (login_url,
params['Usuario']))
login_page_html = self.get_request(login_url, usecache=False)
login_page_soup = self.make_soup(login_page_html)
session_input = login_page_soup.find('input', {'name': "SessionHash"})
params['SessionHash'] = session_input['value'] if session_input else ""
return_url_input = login_page_soup.find('input', {'name': 'ReturnUrl'})
params['ReturnUrl'] = return_url_input['value'] if return_url_input else ""
response_html = self.post_request(login_url, params)
if 'nao-logado' in response_html or "Acessar sua Conta" in response_html:
logger.info("Failed to login to URL %s as %s" % (login_url,
params['Usuario']))
raise exceptions.FailedToLogin(login_url,params['Usuario'])
else:
return True
def getStoryId(self, url):
# get storyId from url--url validation guarantees query correct
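The new `performLogin()` above follows the usual scrape-then-post shape: fetch the login page, lift the hidden `SessionHash`/`ReturnUrl` inputs, and send them back with the credentials. A minimal stdlib sketch of the hidden-field scraping step (the HTML sample and values are ours, for illustration):

```python
from html.parser import HTMLParser

# Collect hidden <input> name/value pairs from a login form, as
# performLogin() does via BeautifulSoup's find().
class HiddenInputs(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            a = dict(attrs)
            if a.get('type') == 'hidden':
                self.fields[a.get('name')] = a.get('value', '')

page = ('<form><input type="hidden" name="SessionHash" value="abc123">'
        '<input type="hidden" name="ReturnUrl" value="/conta"></form>')
parser = HiddenInputs()
parser.feed(page)

params = {'Usuario': 'user', 'Senha': 'pw'}   # credentials from config
params.update(parser.fields)                   # plus the hidden fields
```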
@ -89,9 +126,11 @@ class SpiritFanfictionComAdapter(BaseSiteAdapter):
def extractChapterUrlsAndMetadata(self):
data = self.get_request(self.url)
# use BeautifulSoup HTML parser to make everything easier to find.
if self.needToLoginCheck(data):
self.performLogin(self.url, data)
data = self.get_request(self.url,usecache=False)
soup = self.make_soup(data)
# Now go hunting for all the meta data and the chapter list.
# Title
title = soup.find('h1', {'class':'tituloPrincipal'})
@ -99,7 +138,7 @@ class SpiritFanfictionComAdapter(BaseSiteAdapter):
# Authors
# Find authorid and URL
authors = soup.findAll('span', {'class':'usuario'})
authors = (title.find_next('div', {'class':'left'})).find_all('span', {'class':'usuario'})
for author in authors:
self.story.addToList('authorId', author.find('a')['href'].split('/')[-1])
@ -114,10 +153,10 @@ class SpiritFanfictionComAdapter(BaseSiteAdapter):
newestChapter = None
self.newestChapterNum = None # save for comparing during update.
# Find the chapters:
chapters = soup.findAll('table', {'class':'listagemCapitulos espacamentoTop'})
chapters = soup.find_all('table', {'class':'listagemCapitulos espacamentoTop'})
for chapter in chapters:
for row in chapter.findAll('tr', {'class': 'listagem-textoBg1'}): # Find each row with chapter info
for row in chapter.find_all('tr', {'class': 'listagem-textoBg1'}): # Find each row with chapter info
a = row.find('a') # Chapter link
# Datetime
@ -344,3 +383,11 @@ class SpiritFanfictionComAdapter(BaseSiteAdapter):
element.string = decoded_email
return unicode(html_text)
def before_get_urls_from_page(self,url,normalize):
if self.getConfig("username"):
data = self.get_request(url)
if self.needToLoginCheck(data):
self.performLogin(url, data)


@ -93,7 +93,7 @@ class StoriesOfArdaComAdapter(BaseSiteAdapter):
self.story.setMetadata('title',stripHTML(a))
# Find the chapters: chapterview.asp?sid=7000&cid=30919
chapters=soup.findAll('a', href=re.compile(r'chapterview.asp\?sid='+self.story.getMetadata('storyId')+r"&cid=\d+$"))
chapters=soup.find_all('a', href=re.compile(r'chapterview.asp\?sid='+self.story.getMetadata('storyId')+r"&cid=\d+$"))
if len(chapters)==1:
self.add_chapter(self.story.getMetadata('title'),'http://'+self.host+'/'+chapters[0]['href'])
else:
@ -109,14 +109,14 @@ class StoriesOfArdaComAdapter(BaseSiteAdapter):
# no convenient way to get word count
for td in asoup.findAll('td', {'colspan' : '3'}):
for td in asoup.find_all('td', {'colspan' : '3'}):
if td.find('a', href=re.compile(r'chapterlistview.asp\?SID='+self.story.getMetadata('storyId'))) != None:
break
td=td.nextSibling.nextSibling
self.story.setMetadata('dateUpdated', makeDate(stripHTML(td).split(': ')[1], self.dateformat))
try:
tr=td.parent.nextSibling.nextSibling.nextSibling.nextSibling
td=tr.findAll('td')
td=tr.find_all('td')
self.story.setMetadata('rating', td[0].string.split(': ')[1])
self.story.setMetadata('status', td[2].string.split(': ')[1])
self.story.setMetadata('datePublished', makeDate(stripHTML(td[4]).split(': ')[1], self.dateformat))


@ -147,6 +147,21 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
postAction,
'','',''))
data = self.post_request(postUrl,params,usecache=False)
# logger.debug(data)
while '<h2>Enter TOTP Code:</h2>' in data:
if self.totp:
logger.debug("Trying to TOTP with %s code."%self.totp)
params = {}
params['cmd'] = 'finishTotpVerification'
# google auth app at least shows "123 123", but site expects
# "123123". Remove space if user enters it.
params['totp_code'] = self.totp.replace(' ','')
params['action'] = "continue"
data = self.post_request(postUrl,params,usecache=False)
# logger.debug(data)
self.totp = None
else:
raise exceptions.NeedTimedOneTimePassword(url)
if self.needToLoginCheck(data):
logger.info("Failed to login to URL %s as %s" % (loginUrl,
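The TOTP loop above normalizes the code before posting it, since authenticator apps commonly display "123 123" while the site expects "123123". The stripping step in isolation (the helper name is ours, for illustration):

```python
# Mirror of the params['totp_code'] = self.totp.replace(' ','') line:
# drop any spaces the user typed from the authenticator display.
def normalize_totp(code):
    return code.replace(' ', '')

code = normalize_totp('123 123')
```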
@ -158,6 +173,10 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
url = self.url
logger.debug("URL: "+url)
## Some stories give 404 if not logged in now. See #1185
if self.getConfig("always_login"):
self.performLogin(self.url)
## Hit story URL to check for changed title part -- if the
## title has changed or (more likely?) the ID number has
## been reassigned to a different title, this will 404
@ -169,7 +188,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
if e.status_code in (401, 403, 410):
data = 'Log In' # to trip needToLoginCheck
elif e.status_code == 404:
raise exceptions.FailedToDownload("Page Not Found - Story ID Reused? (%s)" % url)
raise exceptions.FailedToDownload("Page Not Found - always_login needed? (%s)" % url)
else:
raise e
if self.needToLoginCheck(data):
@ -177,13 +196,24 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
self.performLogin(url)
data = self.get_request(url,usecache=False)
## SOL adds intermediate page to remind users to renew at 3-30 days before expiration - this breaks the soup 'a' search below
if "Your premier membership is going to expire" in data:
soup = self.make_soup(data)
expire = soup.find(string=re.compile("Your premier membership is going to expire"))
remindurl=(soup.find(href=re.compile("later.php"))).get('href')
raise exceptions.FailedToDownload(self.getSiteDomain() +" says: "+expire+"\n"+"Renew or reduce expiration warning time in account setting\n"+remindurl)
## Premium account might redirect to a chapter, while regular
## account doesn't redirect to the URL with embedded /story-title
## So pull url from <a href="/s/000/story-title" rel="bookmark">
## regardless.
soup = self.make_soup(data)
a = soup.find('a',rel="bookmark")
url = 'https://'+self.host+a['href']
if a:
url = 'https://'+self.host+a['href']
else:
# Contest entries do not have bookmark HREF
logger.info("No Bookmark HREF, using URL="+url)
## Premium has "?ind=1" to force index.
## May not be needed w/o premium
@ -202,6 +232,12 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
raise exceptions.FailedToDownload(self.getSiteDomain() +" says: Error! The story you're trying to access is being filtered by your choice of contents filtering.")
elif "Error! Daily Limit Reached" in data or "Sorry! You have reached your daily limit of" in data:
raise exceptions.FailedToDownload(self.getSiteDomain() +" says: Error! Daily Limit Reached")
elif "by (Hidden)" in data:
#Contest entries have author set to "(Hidden)" which breaks author lookups below
logger.info("Contest entry, setting authorId=(Hidden)")
self.story.addToList('authorId',"(Hidden)")
logger.info("Contest entry, setting author=(Hidden)")
self.story.addToList('author',"(Hidden)")
soup = self.make_soup(data)
# logger.debug(data)
@ -210,22 +246,19 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
a = soup.find('h1')
self.story.setMetadata('title',stripHTML(a))
# Find authorid and URL from... author url. Sometimes in top,
# other times in footer.
authfrom = soup.find('div', {'id':'top-header'})
if authfrom is None or 'author' not in str(authfrom):
authfrom = soup.find('footer')
alist = authfrom.findAll('a', {'rel' : 'author'})
for a in alist:
self.story.addToList('authorId',a['href'].split('/')[2])
self.story.addToList('authorUrl','https://'+self.host+a['href'])
## both 's Page and s Page
self.story.addToList('author',re.sub(r".s Page$","",stripHTML(a)))
# The rest of the metadata is within the article tag.
soup = soup.find('article')
authfrom = soup.find('footer')
alist = authfrom.find_all('a', {'rel' : 'author'})
if alist:
for a in alist:
self.story.addToList('authorId',a['href'].split('/')[2])
self.story.addToList('authorUrl','https://'+self.host+a['href'])
## both 's Page and s Page
self.story.addToList('author',re.sub(r".s Page$","",stripHTML(a)))
else:
logger.info("AuthorList empty. Contest entry?")
# Find the chapters:
# If multiple chapters, they are in "index-list" div.
# <a href="/s/00001/This-is-a-test/1">Chapter 1</a>
# <a href="/n/00001/This-is-a-test/1">Chapter 1</a>
chapters = soup.select('div#index-list a[href*="/s/"],div#index-list a[href*="/n/"]')
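The author-name cleanup in the hunk above strips the trailing "'s Page" with a one-character wildcard so it also catches the apostrophe-less "s Page" variant the comment mentions. In isolation (sample name is ours):

```python
import re

# Same sub() as re.sub(r".s Page$","",stripHTML(a)): the "." eats the
# apostrophe (or whatever character precedes "s Page").
author = re.sub(r".s Page$", "", "Some Author's Page")
```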
@ -238,8 +271,15 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
else:
self.add_chapter(self.story.getMetadata('title'),self.story.getMetadata('storyUrl'))
# The rest of the metadata is within the article tag.
soup = soup.find('article')
if self.story.getList('authorUrl'):
self.getStoryMetadataFromAuthorPage()
else:
logger.info("No authorurl found, setting to homepage. Could be contest story...")
self.story.setMetadata('authorUrl','https://' + self.getSiteDomain() + '/')
self.getStoryMetadataFromAuthorPage()
# Some books have a cover in the index page.
# Samples are:
@ -283,7 +323,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
self.has_universes = False
title_cell = story_row.find('td', {'class' : 'lc2'})
for cat in title_cell.findAll('div', {'class' : 'typediv'}):
for cat in title_cell.find_all('div', {'class' : 'typediv'}):
self.story.addToList('genre',cat.text)
# in lieu of word count.
@ -360,6 +400,16 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
series_name = stripHTML(series_soup.find('h1', {'id' : 'ptitle'}))
series_name = re.sub(r' . a (series by|collection from).*$','',series_name)
# logger.debug("Series name: '%s'" % series_name)
if i == 0:
# find number in series from series page--not
# included in story page anymore.
# ... <a id="t20130r"></a>2 ...
seriesi = series_soup.select_one("a[id='t"+self.story.getMetadata('storyId')+"r']").parent
# logger.debug(seriesi)
try:
i = int(stripHTML(seriesi))
except:
logger.debug("Failed to convert series number(%s)"%seriesi)
self.setSeries(series_name, i)
# Check if series is in a universe
if self.has_universes:
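The series-name cleanup earlier in this hunk trims the " - a series by ..." / " - a collection from ..." tail off the `ptitle` heading; the "." wildcard absorbs whatever dash character the site uses. In isolation (sample string is ours):

```python
import re

# Same sub() applied to series_name in the hunk above.
name = re.sub(r' . a (series by|collection from).*$', '',
              'The Great Test - a series by Some Author')
```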
@ -367,7 +417,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
universes_soup = self.make_soup(self.get_request(universe_url) )
# logger.debug("Universe url='{0}'".format(universe_url))
if universes_soup:
universes = universes_soup.findAll('div', {'class' : 'ser-box'})
universes = universes_soup.find_all('div', {'class' : 'ser-box'})
# logger.debug("Number of Universes: %d" % len(universes))
for universe in universes:
# logger.debug("universe.find('a')={0}".format(universe.find('a')))
@ -462,7 +512,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
return value
def parseOtherAttributes(self, other_attribute_element):
for b in other_attribute_element.findAll('b'):
for b in other_attribute_element.find_all('b'):
#logger.debug('Getting metadata: "%s"' % b)
label = b.text
if label in ['Posted:', 'Concluded:', 'Updated:']:
@ -561,7 +611,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
if pager != None:
urls=pager.findAll('a')
urls=pager.find_all('a')
urls=urls[:len(urls)-1]
# logger.debug("pager urls:%s"%urls)
pager.extract()
@ -588,11 +638,13 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
chapter_title = None
if self.getConfig('inject_chapter_title'):
h2tag = pagetag.find('h2')
if h2tag:
# I'm seeing an h1 now, but it's not logged in?
# Something's broken...
chapter_title = h2tag.extract()
if self.num_chapters() > 1:
cttag = pagetag.find('h2')
else:
## single chapter stories formatted a little differently.
cttag = pagetag.find('h1')
if cttag:
chapter_title = cttag.extract()
# Strip the header section
tag = pagetag.find('header')
@ -615,7 +667,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
# putting a 'conTag' at the *top* now, too. So this
# was nuking every page but the first and last. Now
# only if 'Continues'
for contag in pagetag.findAll('span', {'class' : 'conTag'}):
for contag in pagetag.find_all('span', {'class' : 'conTag'}):
# remove everything after continues...
if 'Continuation' in contag.text:
tag = contag
@ -644,7 +696,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
# If it is a chapter, there are dates at the start for when it was posted or modified. These plus
# everything before them can be discarded.
postedDates = pagetag.findAll('div', {'class' : 'date'})
postedDates = pagetag.find_all('div', {'class' : 'date'})
# logger.debug(postedDates)
if postedDates:
a = postedDates[0].previousSibling
@ -653,7 +705,7 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
b = a.previousSibling
a.extract()
a = b
for a in pagetag.findAll('div', {'class' : 'date'}):
for a in pagetag.find_all('div', {'class' : 'date'}):
a.extract()
# Kill the vote form and everything after it.
@ -674,4 +726,5 @@ class StoriesOnlineNetAdapter(BaseSiteAdapter):
# inject_chapter_title
if chapter_title:
chapter_title.name='h3'
chapter_title['class']='inject_chapter_title'
pagetag.insert(0,chapter_title)


@ -24,17 +24,30 @@ logger = logging.getLogger(__name__)
from .adapter_storiesonlinenet import StoriesOnlineNetAdapter
def getClass():
return FineStoriesComAdapter
return StoryRoomComAdapter
# Class name has to be unique. Our convention is camel case the
# sitename with Adapter at the end. www is skipped.
class FineStoriesComAdapter(StoriesOnlineNetAdapter):
class StoryRoomComAdapter(StoriesOnlineNetAdapter):
@classmethod
def getSiteAbbrev(cls):
return 'fnst'
return 'stryrm'
@staticmethod # must be @staticmethod, don't remove it.
def getSiteDomain():
# The site domain. Does have www here, if it uses it.
return 'finestories.com'
return 'storyroom.com'
@classmethod
def getAcceptDomains(cls):
return ['finestories.com',cls.getSiteDomain()]
@classmethod
def getConfigSections(cls):
"Only needs to be overridden if it has additional ini sections."
return ['finestories.com',cls.getSiteDomain()]
@classmethod
def getSiteURLPattern(self):
return r"https?://("+r"|".join([x.replace('.',r'\.') for x in self.getAcceptDomains()])+r")/(?P<path>s|n|library)/(storyInfo.php\?id=)?(?P<id>\d+)(?P<chapter>:\d+)?(?P<title>/.+)?((;\d+)?$|(:i)?$)?"
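The `getSiteURLPattern()` above folds every accepted domain into one regex alternation, escaping the dots. A reduced version of the same construction (the test URLs are ours, for illustration):

```python
import re

# Join the accepted domains into one alternation, as the adapter does.
domains = ['finestories.com', 'storyroom.com']
pattern = (r"https?://(" +
           r"|".join(d.replace('.', r'\.') for d in domains) +
           r")/(s|n|library)/\d+")

ok = bool(re.match(pattern, 'https://storyroom.com/s/123'))
bad = bool(re.match(pattern, 'https://example.com/s/123'))
```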


@ -1,144 +0,0 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import logging
logger = logging.getLogger(__name__)
import re
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions
# py2 vs py3 transition
from .base_adapter import BaseSiteAdapter, makeDate
def getClass():
return SwiOrgRuAdapter
logger = logging.getLogger(__name__)
class SwiOrgRuAdapter(BaseSiteAdapter):
def __init__(self, config, url):
BaseSiteAdapter.__init__(self, config, url)
self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
self.password = ""
self.is_adult=False
storyId = self.parsedUrl.path.split('/',)[3]
self.story.setMetadata('storyId', storyId)
# normalized story URL.
self._setURL('http://' + self.getSiteDomain() + '/mlp-fim/story/'+self.story.getMetadata('storyId'))
# Each adapter needs to have a unique site abbreviation.
self.story.setMetadata('siteabbrev','swiorgru')
# The date format will vary from site to site.
# http://docs.python.org/library/datetime.html#strftime-strptime-behavior
self.dateformat = "%Y.%m.%d"
@staticmethod # must be @staticmethod, don't remove it.
def getSiteDomain():
return 'www.swi.org.ru'
@classmethod
def getSiteExampleURLs(cls):
return "http://" + cls.getSiteDomain() + "/mlp-fim/story/11341/ http://" + cls.getSiteDomain() + "/mlp-fim/story/11341/chapter1.html"
def getSiteURLPattern(self):
return r"http://" + re.escape(self.getSiteDomain() + "/mlp-fim/story/")+r"\d+"
def extractChapterUrlsAndMetadata(self):
url=self.url
logger.debug("URL: "+url)
data = self.get_request(url)
soup = self.make_soup(data)
title = soup.find('h1')
for tag in title.findAll('sup'):
tag.extract()
self.story.setMetadata('title', stripHTML(title.text))
logger.debug("Title: (%s)"%self.story.getMetadata('title'))
author_title = soup.find('strong', string = re.compile(u"Автор: "))
if author_title == None:
raise exceptions.FailedToDownload("Error downloading page: %s! Missing required author_title element!" % url)
author = author_title.next_sibling
self.story.setMetadata('authorId', author.text) # Author's name is unique
self.story.setMetadata('authorUrl','http://'+self.host + author['href'])
self.story.setMetadata('author', author.text)
logger.debug("Author: (%s)"%self.story.getMetadata('author'))
date_pub = soup.find('em', string = re.compile(r'\d{4}.\d{2}.\d{2}'))
if not date_pub == None:
self.story.setMetadata('datePublished', makeDate(date_pub.text, self.dateformat))
rating_label = soup.find('strong', string = re.compile(u"рейтинг:"))
if not rating_label == None:
rating = rating_label.next_sibling.next_sibling
self.story.setMetadata('rating', stripHTML(rating))
if not self.is_adult or self.getConfig("is_adult"):
if "NC-18" in rating:
raise exceptions.AdultCheckRequired(self.url)
characters = soup.findAll('img', src=re.compile(r"/mlp-fim/img/chars/\d+.png"))
logger.debug("numCharacters: (%s)"%str(len(characters)))
for x in range(0,len(characters)):
character=characters[x]
self.story.addToList('characters', character['title'])
if soup.find('font', color = r"green", string = u"завершен"):
self.story.setMetadata('status', 'Completed')
else:
self.story.setMetadata('status', 'In-Progress')
categories_label = soup.find('strong', string = u"категории:")
if not categories_label == None:
categories_element = categories_label.next_sibling.next_sibling
categories = re.findall(r'"(.+?)"', categories_element.text)
for x in range(0, len(categories)):
category=categories[x]
self.story.addToList('category', category)
chapters_header = soup.find('h2', string = re.compile(u"Главы:"))
if chapters_header==None:
raise exceptions.FailedToDownload("Error downloading page: %s! Missing required chapters_header element!" % url)
chapters_table = chapters_header.next_sibling.next_sibling
self.story.setMetadata('language','Russian')
chapters=chapters_table.findAll('a', href=re.compile(r'/mlp-fim/story/'+self.story.getMetadata('storyId')+r"/chapter\d+"))
self.story.setMetadata('numChapters', len(chapters))
logger.debug("numChapters: (%s)"%str(self.story.getMetadata('numChapters')))
for x in range(0,len(chapters)):
chapter=chapters[x]
churl='http://'+self.host+chapter['href']
self.add_chapter(chapter,churl)
# grab the text for an individual chapter.
def getChapterText(self, url):
logger.debug('Getting chapter text from: %s' % url)
soup = self.make_soup(self.get_request(url))
chapter = soup.find('div', {'id' : 'content'})
chapter_header = chapter.find('h1', id = re.compile("chapter"))
if not chapter_header == None:
chapter_header.decompose()
if chapter == None:
raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)
return self.utf8FromSoup(url,chapter)


@ -37,9 +37,9 @@ def getClass():
def getEntry(soup, *args):
for arg in args:
target = soup.find('th', string=arg)
target = soup.find('dt', string=arg)
if target is not None:
return target.findNext('td')
return target.findNext('dd')
return None
class SyosetuComAdapter(BaseSiteAdapter):
@ -226,10 +226,10 @@ class SyosetuComAdapter(BaseSiteAdapter):
seriesUrl = series.find('a')['href']
seriesSoup = self.make_soup(self.get_request(seriesUrl))
alist = seriesSoup.select('.serieslist .title a')
alist = seriesSoup.select('.p-series-novellist .p-series-novellist__title a')
i = 1
for a in alist:
if a['href'] == '/' + self.storyId + '/':
if self.storyId in a['href']:
self.setSeries(seriesName, i)
self.story.setMetadata('seriesUrl', seriesUrl)
break
@ -245,18 +245,16 @@ class SyosetuComAdapter(BaseSiteAdapter):
# Status and Chapter count
noveltype = (infoSoup.find(id='noveltype')
or infoSoup.find(id='noveltype_notend'))
noveltype = infoSoup.find('span', {'class':'p-infotop-type__type'})
if noveltype.text.strip() == '短編':
numChapters = 1
oneshot = True
completed = True
else:
# '全1,292エピソード\n'
numChapters = int(re.sub(r'[^\d]', '', noveltype.next_sibling.strip()))
numChapters = int(re.sub(r'[^\d]', '', infoSoup.find('span', {'class':'p-infotop-type__allep'}).text.strip()))
oneshot = False
completed = True if noveltype == '完結済' else False
self.story.setMetadata('numChapters', numChapters)
self.story.setMetadata('status', 'Completed' if completed else 'In-Progress')
# Keywords
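The new episode-count parsing above handles the redesigned info page, where the count appears as text like '全1,292エピソード'; stripping every non-digit leaves the number:

```python
import re

# Same extraction as the numChapters line in the hunk above.
num_chapters = int(re.sub(r'[^\d]', '', '全1,292エピソード\n'))
```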
@ -264,12 +262,7 @@ class SyosetuComAdapter(BaseSiteAdapter):
flags = []
# not sure what it looks like if a work has no tags
tagsElement = getEntry(infoSoup, 'キーワード')
if tagsElement.find('span'):
# R15, ボーイズラブ, ガールズラブ, 残酷な描写あり, 異世界転生, 異世界転移
flags = tagsElement.find('span').text.split()
for flag in flags:
self.story.addToList('warningtags', flag)
for tag in tagsElement.contents[-1].split():
for tag in tagsElement.text.split():
self.story.addToList('freeformtags', tag)
# Rating, Genre, and Imprint
@ -332,9 +325,10 @@ class SyosetuComAdapter(BaseSiteAdapter):
if self.getConfig("always_login"):
if infoSoup.find('div', {'data-remodal-id':'setting_bookmark'}) is None:
self.story.setMetadata('bookmarked', False)
self.story.setMetadata('subscribed', False)
else:
self.story.setMetadata('bookmarked', True)
modal = infoSoup.find('div', {'class':'favnovelmain_update'})
modal = infoSoup.find('div', {'data-remodal-id':'setting_bookmark'})
# bookmark category name
bookmarkCategory = modal.find('option', {


@ -131,7 +131,7 @@ class TenhawkPresentsSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/'+chapter['href']+addurl)
@ -143,7 +143,7 @@ class TenhawkPresentsSiteAdapter(BaseSiteAdapter):
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -164,19 +164,19 @@ class TenhawkPresentsSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
genrestext = [genre.string for genre in genres]
self.genre = ', '.join(genrestext)
for genre in genrestext:
@ -203,7 +203,7 @@ class TenhawkPresentsSiteAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):


@ -29,6 +29,13 @@ from ..six import ensure_text
from .base_adapter import BaseSiteAdapter, makeDate
try: # just a way to switch between CLI and PI
## webbrowser.open doesn't work on some linux flavors.
## piggyback Calibre's version.
from calibre.gui2 import safe_open_url as open_url
except :
from webbrowser import open as open_url
class TestSiteAdapter(BaseSiteAdapter):
def __init__(self, config, url):
@ -121,7 +128,7 @@ class TestSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('author',prefix+'Test Author aa')
self.setDescription(self.url,u'<div>Description '+self.crazystring+u''' Done
<p>
Some more longer description. "I suck at summaries!" "Better than it sounds!" "My first fic"
Some more longer description. "I suck at summaries!" "Better than it sounds!" <span>A span!</span> "My first fic"
</div>''')
self.story.setMetadata('datePublished',makeDate("1975-03-15","%Y-%m-%d"))
if idstr == '669':
@ -129,6 +136,9 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
else:
self.story.setMetadata('dateUpdated',makeDate("1975-04-15","%Y-%m-%d"))
if idstr == '675' and self.totp != "123321" :
raise exceptions.NeedTimedOneTimePassword(self.url)
if idstr != '674':
self.story.setMetadata('numWords','123456')
@ -139,20 +149,20 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
# greater than 10, no language or series.
if idnum < 10:
## non-English was changing series sort order which
## confuses me more often than I test other langs.
# langs = {
# 0:"English",
# 1:"Russian",
# 2:"French",
# 3:"German",
# }
# self.story.setMetadata('language',langs[idnum%len(langs)])
self.setSeries('The Great Test',idnum)
self.story.setMetadata('seriesUrl','http://'+self.getSiteDomain()+'/seriesid=1')
elif idnum < 20:
self.setSeries('魔法少女まどか★マギカ',idnum)
self.story.setMetadata('seriesUrl','http://'+self.getSiteDomain()+'/seriesid=1')
elif idnum < 30:
langs = {
0:"English",
1:"Russian",
2:"French",
3:"German",
}
self.story.setMetadata('language',langs[idnum%len(langs)])
if idnum == 0:
self.setSeries("A Nook Hyphen Test "+self.story.getMetadata('dateCreated'),idnum)
self.story.setMetadata('seriesUrl','http://'+self.getSiteDomain()+'/seriesid=0')
@ -318,13 +328,18 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
rt = random.uniform(t*0.5, t*1.5)
logger.debug("random sleep(%0.2f-%0.2f):%0.2f"%(t*0.5, t*1.5,rt))
time.sleep(rt)
# open_url("https://echo.free.beeceptor.com/%s.%s"%(self.story.getMetadata('siteabbrev'),
# self.story.getMetadata('storyId')))
if "chapter=1" in url :
text=u'''
<div>
<h3>Prologue</h3>
<div class='leadpara'>
<p>This is a fake adapter for testing purposes. Different sid's will give different errors:</p>
<p>sid&gt;=1000 will use custom test story data from your configuration(personal.ini)</p>
</div>
<div class='failids'>
<p>Hard coded ids:</p>
<p>http://test1.com?sid=664 - Crazy string title</p>
<p>http://test1.com?sid=665, 711-720 - raises AdultCheckRequired</p>
@ -341,6 +356,7 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
<p>http://test1.com?sid=0 - Succeeds, generates some text specifically for testing hyphenation problems with Nook STR/STRwG</p>
<p>Odd sid's will be In-Progress, evens complete. sid&lt;10 will be assigned one of four languages and included in a series.</p>
</div>
</div>
'''
elif self.story.getMetadata('storyId') == '0':
text=u'''<div>
@ -354,7 +370,7 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
<br />
</div>
'''
elif self.story.getMetadata('storyId') == '667' and "chapter=2" in url:
elif self.story.getMetadata('storyId') == '667' and ("chapter=2" in url or "chapter=3" in url or "chapter=4" in url):
raise exceptions.FailedToDownload("Error downloading Chapter: %s!" % url)
elif self.getSiteDomain() not in url:
## for chapter_urls setting.
@ -399,7 +415,13 @@ Some more longer description. "I suck at summaries!" "Better than it sounds!"
else:
if self.story.getMetadata('storyId') == '92':
imgtext='<a href="http://code.google.com/p/fanficdownloader/wiki/FanFictionDownLoaderPluginWithReadingList" title="Tilt-a-Whirl by Jim &amp; Sarah, on Flickr"><img src="http://i.imgur.com/bo8eD.png"></a>'
imgtext='''
<a href="http://code.google.com/p/fanficdownloader/wiki/FanFictionDownLoaderPluginWithReadingList" title="Tilt-a-Whirl"><img src="http://i.imgur.com/bo8eD.png"></a>
<style>
.loremipsum { background-image: url("https://picsum.photos/2000/1500") }
</style>
<p style="background-image: url('https://picsum.photos/20/10')">Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
'''
else:
imgtext='img goes here when sid=92'
text=u'''
@ -420,7 +442,9 @@ Don't&#8212e;ver&#8212d;o&#8212;that&#8212a;gain, &#27861; &#xE9;
<hr>
horizontal rules
<hr size=1 noshade>
<div class="loremipsum">
<p>"Lorem ipsum dolor sit amet", consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore--et dolore magna aliqua. 'Ut enim ad minim veniam', quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
</div>
<br>
<br>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br/>
@ -432,7 +456,6 @@ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
<br/> <br/>
<br/>
"Lorem ipsum dolor sit amet", consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore--et dolore magna aliqua. 'Ut enim ad minim veniam', quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
</div>
'''%imgtext
soup = self.make_soup(text)
@ -468,6 +491,7 @@ Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
desc = '<div><p>The Great Test Series of '+self.getSiteDomain()+'!</p><p>Now with two lines!</p></div>'
return {'name':'The Great Test',
'desc':desc,
'status':'AStatus',
'urllist':['http://'+self.getSiteDomain()+'?sid=1',
'http://'+self.getSiteDomain()+'?sid=2',
'http://'+self.getSiteDomain()+'?sid=3',

View file

@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
# Copyright 2011 Fanficdownloader team, 2019 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from __future__ import absolute_import
import logging
logger = logging.getLogger(__name__)
from .adapter_test1 import TestSiteAdapter
class Test2SiteAdapter(TestSiteAdapter):

    def __init__(self, config, url):
        TestSiteAdapter.__init__(self, config, url)

    @staticmethod
    def getSiteDomain():
        return 'test2.com'

def getClass():
    return Test2SiteAdapter

View file

@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
# Copyright 2011 Fanficdownloader team, 2019 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from __future__ import absolute_import
import logging
logger = logging.getLogger(__name__)
from .adapter_test1 import TestSiteAdapter
class Test3SiteAdapter(TestSiteAdapter):

    def __init__(self, config, url):
        TestSiteAdapter.__init__(self, config, url)

    @staticmethod
    def getSiteDomain():
        return 'test3.com'

def getClass():
    return Test3SiteAdapter

View file

@ -0,0 +1,35 @@
# -*- coding: utf-8 -*-
# Copyright 2011 Fanficdownloader team, 2019 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
from __future__ import absolute_import
import logging
logger = logging.getLogger(__name__)
from .adapter_test1 import TestSiteAdapter
class Test4SiteAdapter(TestSiteAdapter):

    def __init__(self, config, url):
        TestSiteAdapter.__init__(self, config, url)

    @staticmethod
    def getSiteDomain():
        return 'test4.com'

def getClass():
    return Test4SiteAdapter
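The three new files above are identical but for the domain. A self-contained sketch of the pattern they share: a subclass pins its domain, and a module-level getClass() hands the class to whatever loads adapters. The stub base class and the registry below are illustrative, not FanFicFare's real TestSiteAdapter or its real adapter loader.

```python
# Stub standing in for the real base adapter, for a runnable example.
class TestSiteAdapter(object):
    @staticmethod
    def getSiteDomain():
        return 'test1.com'

class Test2SiteAdapter(TestSiteAdapter):
    @staticmethod
    def getSiteDomain():
        return 'test2.com'

def getClass():
    return Test2SiteAdapter

# A hypothetical registry only needs each module's getClass():
registry = {getClass().getSiteDomain(): getClass()}
print(sorted(registry))  # ['test2.com']
```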

View file

@ -168,7 +168,7 @@ class TheMasqueNetAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host + self.section + chapter['href']+addurl)
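The href regex above pins chapter links to this story's sid plus a trailing chapter number. A stdlib-only sketch of the same filter; the page fragment and the link-extraction regex are made up for illustration, since the adapter itself uses BeautifulSoup:

```python
import re

story_id = '1234'
# Hypothetical page fragment: one real chapter link, two that must not match.
page = ('<a href="viewstory.php?sid=1234&chapter=1">One</a>'
        '<a href="viewstory.php?sid=1234&chapter=2#note">anchored, skipped</a>'
        '<a href="viewstory.php?sid=9999&chapter=1">other story, skipped</a>')

# Same shape as the adapter's pattern: sid must match, and \d+$ rejects
# anything trailing the chapter number (like the #note fragment above).
chapter_re = re.compile(r'viewstory\.php\?sid=' + story_id + r'&chapter=\d+$')
hrefs = re.findall(r'href="([^"]+)"', page)
chapters = [h for h in hrefs if chapter_re.search(h)]
print(chapters)  # ['viewstory.php?sid=1234&chapter=1']
```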
@ -186,7 +186,7 @@ class TheMasqueNetAdapter(BaseSiteAdapter):
# summary, rated, word count, categories, characters, genre, warnings, completed, published, updated, series
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.text
@ -207,22 +207,22 @@ class TheMasqueNetAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
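The label loop above walks each `<span class="label">` tag and reads its nextSibling for the value. A regex stand-in over the sample markup quoted in the diff's own comment (`<span class="label">Rated:</span> NC-17<br />`), purely for illustration:

```python
import re

# Two label/value pairs in the shape the adapter's comment describes.
html = ('<span class="label">Rated:</span> NC-17<br />'
        '<span class="label">Word count:</span> 12345<br />')

# Capture each label and the bare text running up to the next tag, the
# regex equivalent of find_all('span', {'class':'label'}) plus nextSibling.
pairs = dict(re.findall(r'<span class="label">([^<]+):</span>\s*([^<]+)', html))
print(pairs)  # {'Rated': 'NC-17', 'Word count': '12345'}
```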

View file

@ -45,6 +45,9 @@ class TheSietchComAdapter(BaseXenForo2ForumAdapter):
# in case it needs more than just site/
return '/index.php?'
def loginFormMarker(self):
return 'href="/index.php?login/"'
def make_reader_url(self,tmcat_num,reader_page_num):
# https://www.the-sietch.com/index.php?threads/shattered-sphere-the-arcadian-free-march.3243/reader/page-2
# discard tmcat_num -- the-sietch.com doesn't have multiple
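make_reader_url is cut off by the hunk above; going only by the example URL in its comment, a hypothetical reconstruction (not the adapter's actual code) would be:

```python
def make_reader_url(thread_url, reader_page_num):
    # Hypothetical: append XenForo2's reader path to a thread URL, matching
    # the example URL in the comment above. tmcat_num is dropped entirely.
    return thread_url.rstrip('/') + '/reader/page-%s' % reader_page_num

print(make_reader_url(
    'https://www.the-sietch.com/index.php?threads/some-thread.3243', 2))
```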

View file

@ -101,7 +101,6 @@ class TouchFluffyTailAdapter(BaseSiteAdapter):
self.story.setMetadata('status', 'Completed')
self.add_chapter(self.story.getMetadata('title'),url)
self.story.setMetadata('numChapters',1)
avrrate = body.find_all('footer', class_='entry-meta')[1].find('em').span.find_all('strong')
averrating = avrrate[1].text

View file

@ -126,11 +126,6 @@ class TrekFanFictionNetSiteAdapter(BaseSiteAdapter):
## url since we can't get the chapter without this, I'm leaving it in.
self.add_chapter(self.story.getMetadata('title'), url)
## I'm going to comment this out, because there is always only one chapter for each story,
## so this is really not needed
## And I am uncommenting it because the rest of FFF expects
## there to always be numChapters, even if it's one. --Jimm
# getting the rest of the metadata... there isn't much here, and the summary can only be
# gotten on the author's page... so we'll get it to get the information from
adata = self.get_request(self.story.getMetadata('authorUrl'))

View file

@ -199,14 +199,14 @@ class TwistingTheHellmouthSiteAdapter(BaseSiteAdapter):
infodata = self.get_request(infourl)
infosoup = self.make_soup(infodata)
# for a in infosoup.findAll('a',href=re.compile(r"^/Author-\d+")):
# for a in infosoup.find_all('a',href=re.compile(r"^/Author-\d+")):
# self.story.addToList('authorId',a['href'].split('/')[1].split('-')[1])
# self.story.addToList('authorUrl','https://'+self.host+a['href'].replace("/Author-","/AuthorStories-"))
# self.story.addToList('author',stripHTML(a))
# second verticaltable is the chapter list.
table = infosoup.findAll('table',{'class':'verticaltable'})[1]
for a in table.findAll('a',href=re.compile(r"^/Story-"+self.story.getMetadata('storyId'))):
table = infosoup.find_all('table',{'class':'verticaltable'})[1]
for a in table.find_all('a',href=re.compile(r"^/Story-"+self.story.getMetadata('storyId'))):
autha = a.findNext('a',href=re.compile(r"^/Author-\d+"))
self.story.addToList('authorId',autha['href'].split('/')[1].split('-')[1])
self.story.addToList('authorUrl','https://'+self.host+autha['href'].replace("/Author-","/AuthorStories-"))
@ -224,7 +224,7 @@ class TwistingTheHellmouthSiteAdapter(BaseSiteAdapter):
# no selector found, so it's a one-chapter story.
self.add_chapter(self.story.getMetadata('title'),url)
else:
allOptions = select.findAll('option')
allOptions = select.find_all('option')
for o in allOptions:
url = "https://"+self.host+o['value']
# just in case there's tags, like <i> in chapter titles.
@ -237,7 +237,7 @@ class TwistingTheHellmouthSiteAdapter(BaseSiteAdapter):
BtVSNonX = False
char=None
romance=False
for cat in verticaltable.findAll('a', href=re.compile(r"^/Category-")):
for cat in verticaltable.find_all('a', href=re.compile(r"^/Category-")):
# assumes only one -Centered and one Pairing: cat can ever
# be applied to one story.
# Seen at least once: incorrect (empty) cat link, thus "and cat.string"
@ -265,7 +265,7 @@ class TwistingTheHellmouthSiteAdapter(BaseSiteAdapter):
if 'BtVS/AtS Non-Crossover' == cat.string:
BtVSNonX = True
verticaltabletds = verticaltable.findAll('td')
verticaltabletds = verticaltable.find_all('td')
self.story.setMetadata('rating', verticaltabletds[2].string)
self.story.setMetadata('numWords', verticaltabletds[4].string)
@ -279,7 +279,7 @@ class TwistingTheHellmouthSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('datePublished',makeDate(stripHTML(verticaltabletds[8].string), self.dateformat))
self.story.setMetadata('dateUpdated',makeDate(stripHTML(verticaltabletds[9].string), self.dateformat))
for icon in storydiv.find('span',{'class':'storyicons'}).findAll('img'):
for icon in storydiv.find('span',{'class':'storyicons'}).find_all('img'):
if( icon['title'] not in ['Non-Crossover'] ) :
self.story.addToList('genre',icon['title'])
else:

View file

@ -127,7 +127,7 @@ class TwilightedNetSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/'+chapter['href'])
@ -139,7 +139,7 @@ class TwilightedNetSiteAdapter(BaseSiteAdapter):
return ""
# <span class="label">Rated:</span> NC-17<br /> etc
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -159,20 +159,20 @@ class TwilightedNetSiteAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
## twilighted.net doesn't use genre.
# if 'Genre' in label:
# genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class'))
# genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class'))
# genrestext = [genre.string for genre in genres]
# self.genre = ', '.join(genrestext)
# for genre in genrestext:
@ -199,7 +199,7 @@ class TwilightedNetSiteAdapter(BaseSiteAdapter):
series_url = 'https://'+self.host+'/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):

View file

@ -199,9 +199,6 @@ class Voracity2EficComAdapter(BaseSiteAdapter):
self.story.setMetadata('series', a.string)
self.story.setMetadata('seriesUrl', urlparse.urljoin(self.BASE_URL, a['href']))
elif key == 'Chapter':
self.story.setMetadata('numChapters', int(value))
elif key == 'Completed':
self.story.setMetadata('status', 'Completed' if value == 'Yes' else 'In-Progress')
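The removed lines above mapped the site's label/value pairs straight onto story metadata. The same mapping as a tiny standalone helper, with illustrative names:

```python
def metadata_from(key, value):
    # Mirrors the removed branches: Chapter -> numChapters (as int),
    # Completed -> status ('Yes' means finished, anything else in progress).
    if key == 'Chapter':
        return ('numChapters', int(value))
    if key == 'Completed':
        return ('status', 'Completed' if value == 'Yes' else 'In-Progress')
    return None

print(metadata_from('Chapter', '12'))    # ('numChapters', 12)
print(metadata_from('Completed', 'No'))  # ('status', 'In-Progress')
```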

View file

@ -111,7 +111,7 @@ class WalkingThePlankOrgAdapter(BaseSiteAdapter):
self.story.setMetadata('author',a.string)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'http://'+self.host+'/archive/'+chapter['href']+addurl)
@ -126,7 +126,7 @@ class WalkingThePlankOrgAdapter(BaseSiteAdapter):
except:
return ""
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
for labelspan in labels:
value = labelspan.nextSibling
label = labelspan.string
@ -150,24 +150,24 @@ class WalkingThePlankOrgAdapter(BaseSiteAdapter):
self.story.setMetadata('reads', value)
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
for cat in catstext:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
for char in charstext:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
for warning in warnings:
self.story.addToList('warnings',warning.string)
@ -190,7 +190,7 @@ class WalkingThePlankOrgAdapter(BaseSiteAdapter):
series_url = 'http://'+self.host+'/archive/'+a['href']
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):
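The counter loop above (its body falls outside the hunk) locates this story's 1-based position among the series page's links. The same idea with enumerate, using a made-up link list:

```python
# Hypothetical hrefs scraped from a series page, in series order.
hrefs = ['viewstory.php?sid=11',
         'viewstory.php?sid=42',
         'viewstory.php?sid=77']

target = 'viewstory.php?sid=42'

# enumerate(..., start=1) replaces the manual i = 1 counter; None if absent.
series_index = next(
    (i for i, h in enumerate(hrefs, start=1) if h == target), None)
print(series_index)  # 2
```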

View file

@ -80,7 +80,7 @@ class WhoficComSiteAdapter(BaseSiteAdapter):
# no selector found, so it's a one-chapter story.
self.add_chapter(self.story.getMetadata('title'),url)
else:
allOptions = select.findAll('option')
allOptions = select.find_all('option')
for o in allOptions:
url = self.url + "&chapter=%s" % o['value']
# just in case there's tags, like <i> in chapter titles.
@ -178,7 +178,7 @@ class WhoficComSiteAdapter(BaseSiteAdapter):
series_url = 'https://'+self.host+'/'+a['href']
try:
seriessoup = self.make_soup(self.get_request(series_url))
storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
storyas = seriessoup.find_all('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
i=1
for a in storyas:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):

View file

@ -100,7 +100,7 @@ class WolverineAndRogueComAdapter(BaseSiteAdapter):
self.story.setMetadata('rating', rating)
# Find the chapters:
for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
for chapter in soup.find_all('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.add_chapter(chapter,'https://'+self.host+'/wrfa/'+chapter['href'])
@ -110,7 +110,7 @@ class WolverineAndRogueComAdapter(BaseSiteAdapter):
# <span class="label">Rated:</span> NC-17<br /> etc
content=soup.find('div',{'class' : 'content'})
labels = soup.findAll('span',{'class':'label'})
labels = soup.find_all('span',{'class':'label'})
value = labels[0].previousSibling
svalue = ""
@ -134,22 +134,22 @@ class WolverineAndRogueComAdapter(BaseSiteAdapter):
self.story.setMetadata('numWords', value.split(' -')[0])
if 'Categories' in label:
cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
cats = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=categories'))
for cat in cats:
self.story.addToList('category',cat.string)
if 'Characters' in label:
chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
chars = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=characters'))
for char in chars:
self.story.addToList('characters',char.string)
if 'Genre' in label:
genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
genres = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
for genre in genres:
self.story.addToList('genre',genre.string)
if 'Warnings' in label:
warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warnings = labelspan.parent.find_all('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
for warning in warnings:
self.story.addToList('warnings',warning.string)
@ -173,7 +173,7 @@ class WolverineAndRogueComAdapter(BaseSiteAdapter):
seriessoup = self.make_soup(self.get_request(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
storyas = seriessoup.find_all('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links

Some files were not shown because too many files have changed in this diff.