mirror of https://github.com/kemayo/leech synced 2026-02-04 13:51:27 +01:00

David Lynch a92575687c Specify the license (MIT)		2017-10-11 20:20:55 -05:00
sites	Remove TODO from arbitrary	2017-10-06 14:08:18 -05:00
.editorconfig	Set up for Travis	2017-02-08 13:20:14 -06:00
.flake8	Set up for Travis	2017-02-08 13:20:14 -06:00
.gitignore	Changed mind for arbitrary: JSON definitions	2017-04-24 22:02:16 -05:00
.travis.yml	Force latest node in Travis	2017-04-24 01:42:50 -05:00
cover.py	Cover text: outlined	2017-02-23 14:40:55 -06:00
epub.py	Set up for Travis	2017-02-08 13:20:14 -06:00
leech.py	Fix travis eclint call	2017-04-24 01:24:13 -05:00
LICENSE.txt	Specify the license (MIT)	2017-10-11 20:20:55 -05:00
README.markdown	Fix readme typo	2017-10-06 14:13:39 -05:00
requirements.txt	Upgrade requirements	2017-09-08 11:49:30 -05:00

README.markdown

Leech

Let's say you want to read some sort of fiction. You're a fan of it, perhaps. But mobile websites are kind of non-ideal, so you'd like a proper ebook made from whatever you're reading.

Setup

You need Python 3.

My recommended setup process is:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt

...adjust as needed. Just make sure the dependencies from requirements.txt get installed somehow.

Usage

$ python3 leech.py [[URL]]

A new file will appear named Title of the Story.epub.

If you want to put it on a Kindle you'll have to convert it. I'd recommend Calibre, though you could also try using kindlegen directly.

Supports

Fanfiction.net
FictionPress
ArchiveOfOurOwn
- Yes, it has its own built-in EPUB export, but the formatting is horrible
Various XenForo-based sites: SpaceBattles and SufficientVelocity, most notably
DeviantArt galleries/collections
Sta.sh
Completely arbitrary sites, with a bit more work (see below)

Configuration

A very small amount of configuration is possible by creating a file called leech.json in the project directory. Currently you can define login information for sites that support it.

Example:

{
    "logins": {
        "QuestionableQuesting": ["username", "password"]
    }
}
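
As a rough illustration of how this file could be consumed, the sketch below parses a configuration like the one above and looks up the credentials for a site. The helper names `load_logins` and `login_for` are hypothetical and not taken from leech itself; how leech actually reads leech.json may differ.

```python
import json

# Inline copy of the example configuration; leech.json on disk would hold the same text.
EXAMPLE_CONFIG = """
{
    "logins": {
        "QuestionableQuesting": ["username", "password"]
    }
}
"""


def load_logins(config_text):
    """Return the "logins" mapping from configuration text, or {} if absent."""
    config = json.loads(config_text)
    return config.get("logins", {})


def login_for(logins, site_name):
    """Return a (username, password) tuple for site_name, or None if not configured."""
    credentials = logins.get(site_name)
    if credentials is None:
        return None
    username, password = credentials  # each entry is a two-item list
    return (username, password)


logins = load_logins(EXAMPLE_CONFIG)
print(login_for(logins, "QuestionableQuesting"))
print(login_for(logins, "SpaceBattles"))
```

Returning None for an unconfigured site keeps the "no login needed" case separate from a malformed entry, which would raise when unpacked.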

Arbitrary Sites

If you want to just download a one-off story from a site, you can create a definition file to describe it. This requires investigation and understanding of things like CSS selectors, which may take some trial and error.

Example practical.json:

{
    "url": "https://practicalguidetoevil.wordpress.com/table-of-contents/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "chapter_selector": "#main .entry-content > ul > li > a",
    "content_selector": "#main .entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style"
}

Run as:

$ ./leech.py practical.json

This tells leech to load url, follow the links described by chapter_selector, extract the content from those pages as described by content_selector, and remove any content from that which matches filter_selector.

If chapter_selector isn't given, it'll create a single-chapter book by applying content_selector to url.
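
The behavior described above can be summarized as a small planning step. The sketch below is not leech's code: `plan_extraction` is a hypothetical helper that only inspects a definition and reports which steps would run, without fetching anything. It assumes that url and content_selector are required and that filter_selector is optional, which the description implies but does not state outright.

```python
import json

DEFINITION = json.loads("""
{
    "url": "https://practicalguidetoevil.wordpress.com/table-of-contents/",
    "title": "A Practical Guide To Evil: Book 1",
    "author": "erraticerrata",
    "chapter_selector": "#main .entry-content > ul > li > a",
    "content_selector": "#main .entry-content",
    "filter_selector": ".sharedaddy, .wpcnt, style"
}
""")


def plan_extraction(definition):
    """Describe, in order, the steps implied by a definition (hypothetical helper)."""
    # Assumption: url and content_selector are always needed.
    for key in ("url", "content_selector"):
        if key not in definition:
            raise ValueError("definition is missing required key: " + key)

    steps = ["load " + definition["url"]]
    if "chapter_selector" in definition:
        # Multi-chapter book: each matching link becomes one chapter.
        steps.append("follow links matching " + definition["chapter_selector"])
        steps.append("extract " + definition["content_selector"] + " from each linked page")
    else:
        # No chapter_selector: the url itself is the only chapter.
        steps.append("extract " + definition["content_selector"] + " from the url itself")
    if "filter_selector" in definition:
        # Assumption: filtering is skipped when no filter_selector is given.
        steps.append("remove anything matching " + definition["filter_selector"])
    return steps


for step in plan_extraction(DEFINITION):
    print(step)
```

Dropping chapter_selector from the definition switches the plan to the single-chapter branch.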

This is a fairly viable way to extract a story from, say, a random WordPress installation. It will usually get you most of the way to the ebook you want, though some manual editing may still be needed.

If you need more advanced behavior, consider looking at...

Adding new site handlers

To add support for a new site, create a file in the sites directory that implements the Site interface. Take a look at ao3.py for a minimal example of what you have to do.

Contributing

If you submit a pull request to add support for another reasonably-general-purpose site, I will nigh-certainly accept it.

Run EpubCheck on the epubs you generate to make sure they aren't broken.
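
Before running the full validator, a quick structural check can catch grossly broken output. The sketch below is not a substitute for EpubCheck and is not part of leech; `looks_like_epub` is a hypothetical helper that only verifies the EPUB container rule that the archive's first entry is an uncompressed file named mimetype containing application/epub+zip.

```python
import io
import zipfile


def looks_like_epub(data):
    """Cheap structural check on raw bytes; says nothing about the book's contents."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
            # The mimetype entry has to come first in the archive.
            if not names or names[0] != "mimetype":
                return False
            # It also has to be stored without compression.
            if archive.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                return False
            return archive.read("mimetype") == b"application/epub+zip"
    except zipfile.BadZipFile:
        return False


# Build a tiny in-memory archive that satisfies the rule (not a complete EPUB).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:  # default compression is ZIP_STORED
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")

print(looks_like_epub(buffer.getvalue()))
print(looks_like_epub(b"not a zip file"))
```

To check a generated book, read the file's bytes and pass them in, for example `looks_like_epub(open("Title of the Story.epub", "rb").read())`.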