New sites courtesy of GComyn: writing.whimsicalwanderings.net www.giantessworld.net www.looselugs.com www.tomparisdorm.com www.valentchamber.com www.gluttonyfiction.com www.lotrgfic.com

This commit is contained in:
Jim Miller 2016-11-22 13:27:24 -06:00
parent 4981af2447
commit 3d63135217
10 changed files with 1206 additions and 0 deletions


@@ -1351,6 +1351,19 @@ cover_exclusion_regexp:/css/bir.png
[forums.sufficientvelocity.com]
## see [base_xenforoforum]
[gluttonyfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
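Uncommented, the settings above belong in a user's personal.ini rather than defaults.ini; a minimal example (credentials are placeholders):

```ini
[gluttonyfiction.com]
is_adult:true
username:YourName
password:yourpassword
```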
[harem.lucifael.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -1811,6 +1824,22 @@ extra_valid_entries:reviews,readings
reviews_label:Reviews
readings_label:Readings
[writing.whimsicalwanderings.net]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
extra_valid_entries:storynotes
storynotes_label: Story Notes
add_to_titlepage_entries:,storynotes
[www.adastrafanfic.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2072,6 +2101,25 @@ keep_in_order_groupsUrl:true
## make entryHTML.
make_linkhtml_entries:prequel,sequels,groups,coverSource
[www.giantessworld.net]
extra_valid_entries:growth, shrink, sizeroles
growth_label: Growth
shrink_label:Shrink
sizeroles_label:Size Roles
add_to_titlepage_entries:,growth, shrink, sizeroles
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
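The `extra_valid_entries` / `*_label` pairs above define site-specific metadata fields, and the leading comma in `add_to_titlepage_entries` appears to append those names to the default title-page list rather than replace it. Assuming that convention holds, a personal.ini could extend the list the same way ('mood' here is purely hypothetical, not defined by the site):

```ini
[www.giantessworld.net]
## 'mood' is a hypothetical extra entry, shown only to illustrate the pattern.
extra_valid_entries:growth, shrink, sizeroles, mood
mood_label:Mood
add_to_titlepage_entries:,mood
```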
[www.harrypotterfanfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2124,6 +2172,36 @@ extraships:InuYasha/Kagome
## Site dedicated to these categories/characters/ships
extracategories:Lord of the Rings
[www.lotrgfic.com]
extra_valid_entries:places, times
places_label: Places
times_label:Times
add_to_titlepage_entries:,places, times
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.looselugs.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.masseffect2.in]
## Site dedicated to this fandom.
extracategories: Mass Effect
@@ -2306,6 +2384,20 @@ extracategories:Lord of the Rings
#username:YourName
#password:yourpassword
[www.tomparisdorm.com]
extracategories:Star Trek: Voyager
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.tthfanfic.org]
user_agent:
slow_down_sleep_time:2
@@ -2377,6 +2469,19 @@ extracategories:Twilight
## twilighted.net (ab)uses series as personal reading lists.
collect_series: false
[www.valentchamber.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.walkingtheplank.org]
extra_valid_entries:reads
reads_label:Read Count


@@ -148,6 +148,13 @@ import adapter_chosentwofanficcom
import adapter_bdsmlibrarycom
import adapter_ficsitecom
import adapter_asexstoriescom
import adapter_gluttonyfictioncom
import adapter_valentchambercom
import adapter_looselugscom
import adapter_wwwgiantessworldnet
import adapter_lotrgficcom
import adapter_tomparisdormcom
import adapter_writingwhimsicalwanderingsnet
## This bit of complexity allows adapters to be added by just adding
## importing. It eliminates the long if/else clauses we used to need


@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-
# Copyright 2015 FanFicFare team
# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November, 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class GluttonyFictionComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'gluttonyfiction.com'

    @classmethod
    def getSiteAbbrev(self):
        return 'gfcom'

    @classmethod
    def getDateFormat(self):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%d/%m/%y"

##################################################################################
### The Efiction Base Adapter uses the Bulk story to retrieve the metadata, but
### on this site, the Rating is not present in the Bulk page...
### so it is not retrieved.
##################################################################################

def getClass():
    return GluttonyFictionComAdapter
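For context, the string returned by `getDateFormat()` above is a strptime pattern that the eFiction base adapter can feed to Python's date parser. A minimal sketch (the sample date string is invented for illustration):

```python
from datetime import datetime

# "%d/%m/%y" is the format GluttonyFictionComAdapter.getDateFormat() returns;
# "22/11/16" is a hypothetical date in that format.
fmt = "%d/%m/%y"
parsed = datetime.strptime("22/11/16", fmt)
print(parsed.year, parsed.month, parsed.day)  # 2016 11 22
```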


@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-
# Copyright 2015 FanFicFare team
# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November, 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class LooseLugsComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'www.looselugs.com'

    @classmethod
    def getSiteAbbrev(self):
        return 'looselugs'

    @classmethod
    def getDateFormat(self):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%B %d, %Y"

##################################################################################
### The Efiction Base Adapter uses the Bulk story to retrieve the metadata, but
### on this site, the Rating is not present in the Bulk page...
### so it is not retrieved.
##################################################################################

def getClass():
    return LooseLugsComAdapter


@@ -0,0 +1,371 @@
# -*- coding: utf-8 -*-
# Copyright 2011 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
##############################################################################
### Adapted by GComyn
### Completed on November, 22, 2016
##############################################################################
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib
import urllib2
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions
from base_adapter import BaseSiteAdapter, makeDate

class LOTRgficComAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)
        self.story.setMetadata('siteabbrev','lotrgfic')

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.
        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

    @staticmethod
    def getSiteDomain():
        return 'www.lotrgfic.com'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    def use_pagecache(self):
        '''
        adapters that will work with the page cache need to implement
        this and change it to True.
        '''
        return True

    def extractChapterUrlsAndMetadata(self):

        if self.is_adult or self.getConfig("is_adult"):
            addurl = "&warning=3"
        else:
            addurl=""

        url = self.url+'&index=1'+addurl
        logger.debug("URL: "+url)

        try:
            data = self._fetchUrl(url)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(self.url)
            else:
                raise e

        # note: 'adminstrators' (sic) below matches the site's own misspelling.
        if "Content is only suitable for mature adults. May contain explicit language and adult themes. Equivalent of NC-17." in data:
            raise exceptions.AdultCheckRequired(self.url)
        elif "Access denied. This story has not been validated by the adminstrators of this site." in data:
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)

        ### Main Content for the Table Of Contents page.
        div = soup.find('div',{'id':'maincontent'})
        divfooter = div.find('div',{'id':'footer'})
        if divfooter != None:
            divfooter.extract()

        ## Title
        a = div.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = div.find('a', href=re.compile(r"viewuser.php"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+'/'+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in div.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"&chapter=\d+$")):
            # just in case there's tags, like <i> in chapter titles.
            self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']+addurl))

        self.story.setMetadata('numChapters',len(self.chapterUrls))

        ### Metadata is contained (partly) in <span class='label'> elements.
        def defaultGetattr(d,k):
            try:
                return d[k]
            except:
                return ""

        # <span class="label">Rated:</span> NC-17<br /> etc
        ### This site has the metadata formatted all over the place,
        ### so we have to do some very kludgy programming to get it.
        ### If someone can do it better, please do so, and let us know.
        ## I'm going to leave this section in, so we can get those
        ## elements that are "formatted correctly".
        labels = soup.findAll('span',{'class':'label'})
        for labelspan in labels:
            value = labelspan.nextSibling
            label = labelspan.string

            if 'Summary' in label:
                ## the summary is not encased in a span label... so we can't do anything here.
                ## I'm going to leave it here just in case.
                ## Everything until the next span class='label'
                svalue = ''
                while value and 'label' not in defaultGetattr(value,'class'):
                    svalue += unicode(value)
                    value = value.nextSibling
                # sometimes poorly formatted desc (<p> w/o </p>) leads
                # to all labels being included.
                svalue=svalue[:svalue.find('<span class="label">')]
                self.setDescription(url,svalue)

            if 'Rated' in label:
                self.story.setMetadata('rating', value)

            if 'Word count' in label:
                self.story.setMetadata('numWords', value)

            if 'Categories' in label:
                cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
                catstext = [cat.string for cat in cats]
                for cat in catstext:
                    self.story.addToList('category',cat.string)

            if 'Characters' in label:
                chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
                charstext = [char.string for char in chars]
                for char in charstext:
                    self.story.addToList('characters',char.string)

            if 'Genre' in label:
                genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
                genrestext = [genre.string for genre in genres]
                self.genre = ', '.join(genrestext)
                for genre in genrestext:
                    self.story.addToList('genre',genre.string)

            if 'Warnings' in label:
                warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=4'))
                warningstext = [warning.string for warning in warnings]
                self.warning = ', '.join(warningstext)
                for warning in warningstext:
                    self.story.addToList('warnings',warning.string)

            if 'Places' in label:
                places = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
                placestext = [place.string for place in places]
                self.places = ', '.join(placestext) # was self.warning: copy-paste slip
                for place in placestext:
                    self.story.addToList('places',place.string)

            if 'Times' in label:
                times = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=3'))
                timestext = [time.string for time in times]
                self.times = ', '.join(timestext) # was self.warning: copy-paste slip
                for time in timestext:
                    self.story.addToList('times',time.string)

            if 'Completed' in label:
                if 'Yes' in value:
                    self.story.setMetadata('status', 'Completed')
                else:
                    self.story.setMetadata('status', 'In-Progress')

            if 'Published' in label:
                self.story.setMetadata('datePublished', makeDate(value.strip(), "%d %b %Y"))

            if 'Updated' in label:
                # there's a stray [ at the end.
                #value = value[0:-1]
                self.story.setMetadata('dateUpdated', makeDate(value.strip(), "%d %b %Y"))

        try:
            # Find Series name from series URL.
            a = soup.find('a', href=re.compile(r"viewseries.php\?seriesid=\d+"))
            series_name = a.string
            series_url = 'http://'+self.host+'/'+a['href']

            # use BeautifulSoup HTML parser to make everything easier to find.
            seriessoup = self.make_soup(self._fetchUrl(series_url)) # was self_make_soup: typo

            storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
            i=1
            for a in storyas:
                if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):
                    self.setSeries(series_name, i)
                    self.story.setMetadata('seriesUrl',series_url)
                    break
                i+=1
        except:
            # I find it hard to care if the series parsing fails
            pass

        ## Now we are going to kludge together the rest of the metadata
        metad = soup.findAll('p',{'class':'smaller'})

        ## Categories don't have a proper label, but do use links, so...
        cats = soup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
        catstext = [cat.string for cat in cats]
        for cat in catstext:
            if cat != None:
                self.story.addToList('category',cat.string)

        ## Characters don't have a proper label, but do use links, so...
        chars = soup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
        charstext = [char.string for char in chars]
        for char in charstext:
            if char != None:
                self.story.addToList('characters',char.string)

        ### Rating is not enclosed in a label, only in a p tag classed 'smaller' so...
        ratng = metad[0].find('strong').get_text().replace('Rated','').strip()
        self.story.setMetadata('rating', ratng)

        ## Now we try to get the summary... it's not within its own
        ## dedicated tag, so we have to split some hairs...
        ## This may not work every time... but I tested it with 6 stories...
        mdata = metad[0]
        while '<hr/>' not in str(mdata.nextSibling):
            mdata = mdata.nextSibling
        self.setDescription(url,mdata.previousSibling.previousSibling.get_text())

        ### the rest of the metadata are not in tags at all... so we have to be really kludgy.
        ## we don't need the rest of them, so we get rid of all but the last one
        metad = metad[-1]

        ## we also don't need any of the links in here, so we'll get rid of them as well.
        links = metad.findAll('a')
        for link in links:
            link.extract()

        ## and we've already done the labels, so let's remove them
        labels = metad.findAll('span',{'class':'label'})
        for label in labels:
            label.extract()

        ## now we should only have text and <br>'s... something like this:
        #<p class="smaller">Categories:
        #<br/>
        #Characters: , , ,
        #<br/>
        # , <br/> <br/> <br/> None<br/>
        #Challenges: None
        #<br/>
        #Series: None
        #<br/>
        #Chapters: 1    |    Word count: 200    |    Read Count: 767
        #<br/>
        #Completed: Yes    |    Updated: 04/27/13    |    Published: 04/27/13
        #<br/>
        #</p>

        ## we'll have to remove the non-breaking spaces to get this to work.
        metad = str(metad).replace(b"\xc2\xa0",'').replace('\n','')
        for txt in metad.split('<br/>'):
            if 'Challenges:' in txt:
                txt = txt.replace('Challenges:','').strip()
                self.story.setMetadata('challenges', txt)
            elif 'Series:' in txt:
                txt = txt.replace('Series:','').strip()
                self.story.setMetadata('series', txt) # was 'challenges': copy-paste slip
            elif 'Chapters:' in txt:
                for txt2 in txt.split('|'):
                    txt2 = txt2.replace('\n','').strip()
                    if 'Word count:' in txt2:
                        txt2 = txt2.replace('Word count:','').strip()
                        self.story.setMetadata('numWords', txt2) # was 'value' from the label loop
                    elif 'Read Count:' in txt2:
                        txt2= txt2.replace('Read Count:','').strip()
                        self.story.setMetadata('readings', txt2) # was 'value' from the label loop
            elif 'Completed:' in txt:
                for txt2 in txt.split('|'):
                    txt2 = txt2.strip()
                    if 'Completed:' in txt2:
                        if 'Yes' in txt2:
                            self.story.setMetadata('status', 'Completed')
                        else:
                            self.story.setMetadata('status', 'In-Progress')
                    elif 'Updated:' in txt2:
                        txt2= txt2.replace('Updated:','').strip()
                        # dates here are numeric (e.g. 04/27/13), so %m rather than %b
                        self.story.setMetadata('dateUpdated', makeDate(txt2.strip(), "%m/%d/%y"))
                    elif 'Published:' in txt2:
                        txt2= txt2.replace('Published:','').strip()
                        self.story.setMetadata('datePublished', makeDate(txt2.strip(), "%m/%d/%y"))

    def getChapterText(self, url):
        logger.debug('Getting chapter text from: %s' % url)
        data = self._fetchUrl(url)

        # problems with some stories, but only in calibre. I suspect
        # issues with different SGML parsers in python. This is a
        # nasty hack, but it works.
        data = data[data.index("<body"):]
        soup = self.make_soup(data)

        span = soup.find('div', {'id' : 'maincontent'})
        # check for the container before poking at it (the original checked after).
        if None == span:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)

        # Everything is encased in the maincontent section, so we have
        # to remove as much as we can systematically
        tables = span.findAll('table')
        for table in tables:
            table.extract()

        headings = span.findAll('h3')
        for heading in headings:
            heading.extract()

        links = span.findAll('a')
        for link in links:
            link.extract()

        forms = span.findAll('form')
        for form in forms:
            form.extract()

        divs = span.findAll('div')
        for div in divs:
            div.extract()

        return self.utf8FromSoup(url,span)

def getClass():
    return LOTRgficComAdapter
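The `getSiteURLPattern()` used by this adapter anchors on a literal prefix plus a numeric story id. A standalone sketch of how that regex behaves (the URLs are examples only):

```python
import re

# Mirrors the pattern built by getSiteURLPattern(), with the domain inlined.
pattern = re.escape("http://www.lotrgfic.com/viewstory.php?sid=") + r"\d+$"

print(bool(re.match(pattern, "http://www.lotrgfic.com/viewstory.php?sid=1234")))      # True
print(bool(re.match(pattern, "http://www.lotrgfic.com/viewstory.php?sid=1234&i=1")))  # False: $ rejects trailing query parts
```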


@@ -0,0 +1,206 @@
# -*- coding: utf-8 -*-
# Copyright 2012 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib2
import sys
from bs4.element import Comment
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions
from base_adapter import BaseSiteAdapter, makeDate

def getClass():
    return TomParisDormComAdapter

# Class name has to be unique. Our convention is camel case the
# sitename with Adapter at the end. www is skipped.
class TomParisDormComAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.
        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

        # Each adapter needs to have a unique site abbreviation.
        self.story.setMetadata('siteabbrev','tpdorm')

        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        self.dateformat = "%d/%m/%y"

    @staticmethod # must be @staticmethod, don't remove it.
    def getSiteDomain():
        # The site domain. Does have www here, if it uses it.
        return 'www.tomparisdorm.com'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    ## Getting the chapter list and the meta data, plus 'is adult' checking.
    def extractChapterUrlsAndMetadata(self):

        url = self.url
        logger.debug("URL: "+url)

        try:
            data = self._fetchUrl(url)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(url)
            else:
                raise e

        # note: 'adminstrators' (sic) matches the site's own text.
        if "Access denied. This story has not been validated by the adminstrators of this site." in data:
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)
        # print data

        # Now go hunting for all the meta data and the chapter list.

        ## Title
        a = soup.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = soup.find('a', href=re.compile(r"viewuser.php\?uid=\d+"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"&chapter=\d+$")):
            # just in case there's tags, like <i> in chapter titles.
            self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']))

        self.story.setMetadata('numChapters',len(self.chapterUrls))

        # eFiction sites don't help us out a lot with their meta data
        # formatting, so it's a little ugly.

        ## Getting the Summary
        # str.strip('Summary') would strip *characters* from that set, not the
        # label itself, so remove the literal 'Summary' prefix instead.
        value = stripHTML(soup.find('div',{'class' : 'summary'}))
        value = re.sub(r'^\s*Summary\s*:?\s*', '', value)
        self.setDescription(url,value)

        # Get the rest of the Metadata
        mdsoup = soup.find('div',{'id' : 'output'})
        mdstr = str(mdsoup).replace('\n','').replace('\r','').replace('\t',' ')
        # collapse runs of spaces left over from the whitespace stripping
        mdstr = mdstr.replace('  ',' ').replace('  ',' ').replace('  ',' ')
        mdstr = stripHTML(mdstr.replace(r'<br/>',r'-:-').replace('|','-:-'))
        mdstr = mdstr.replace(r'[Rev',r'-:-[Rev').replace(' -:- ','-:-').strip('-:-')

        ## I am using this method, because the summary does not have a 'label',
        ## and the Metadata is always in this order (as far as I've tested)
        for i, value in enumerate(mdstr.split('-:-')):
            if 'Published:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('datePublished', makeDate(val, self.dateformat))
            elif 'Last Updated:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('dateUpdated', makeDate(val, self.dateformat))
            elif 'Rating:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('rating', stripHTML(val))
            elif 'Category:' in value:
                cats = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
                for cat in cats:
                    self.story.addToList('category',cat.string)
            elif 'Characters:' in value:
                chars = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
                for char in chars:
                    self.story.addToList('characters',char.string)
            elif 'Genres:' in value:
                genres = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
                for genre in genres:
                    self.story.addToList('genre',genre.string)
            elif 'Warnings:' in value:
                warnings = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
                for warning in warnings:
                    self.story.addToList('warnings',warning.string)
            elif 'Challenge:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('challenge', stripHTML(val))
            elif 'Series:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('series', stripHTML(val))
            elif 'Chapters:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('numChapters', int(val))
            elif 'Completed:' in value:
                if 'Yes' in value:
                    self.story.setMetadata('status', 'Completed')
                else:
                    self.story.setMetadata('status', 'In-Progress')
            elif 'Word Count:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('numWords', int(stripHTML(val)))
            elif 'Read:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('readings', stripHTML(val))

    # grab the text for an individual chapter.
    def getChapterText(self, url):
        logger.debug('Getting chapter text from: %s' % url)
        soup = self.make_soup(self._fetchUrl(url))

        div = soup.find('div', {'id' : 'story'})
        if None == div:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)

        return self.utf8FromSoup(url,div)
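The metadata scrape above flattens the `output` div, rewrites `<br/>` and `|` separators to a `-:-` sentinel, then splits on it into label/value fields. A standalone sketch of that trick (the sample string is invented, not taken from the site):

```python
# Hypothetical flattened metadata, shaped like the site's output div.
mdstr = "Rating: PG | Word Count: 1200<br/>Published: 01/02/10 | Last Updated: 03/04/11"

# Normalise both separators to one sentinel, then split into label/value pairs.
mdstr = mdstr.replace('<br/>', '-:-').replace('|', '-:-')
fields = [f.strip() for f in mdstr.split('-:-') if ':' in f]
meta = {k.strip(): v.strip() for k, v in (f.split(':', 1) for f in fields)}
print(meta['Word Count'])  # 1200
```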


@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-
# Copyright 2015 FanFicFare team
# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November, 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class ValentChamberComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'www.valentchamber.com'

    @classmethod
    def getSiteAbbrev(self):
        return 'vccom'

    @classmethod
    def getDateFormat(self):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%B %d %Y"

##################################################################################
### The Efiction Base Adapter uses the Bulk story to retrieve the metadata, but
### on this site, the Rating is not present in the Bulk page...
### so it is not retrieved.
##################################################################################

def getClass():
    return ValentChamberComAdapter


@@ -0,0 +1,227 @@
# -*- coding: utf-8 -*-
# Copyright 2012 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Software: eFiction
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib2
import sys
from bs4.element import Comment
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions
from base_adapter import BaseSiteAdapter, makeDate

def getClass():
    return WritingWhimsicalwanderingsNetAdapter

# Class name has to be unique. Our convention is camel case the
# sitename with Adapter at the end. www is skipped.
class WritingWhimsicalwanderingsNetAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.
        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

        # Each adapter needs to have a unique site abbreviation.
        self.story.setMetadata('siteabbrev','wwnet')

        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        self.dateformat = "%m/%d/%y"

    @staticmethod # must be @staticmethod, don't remove it.
    def getSiteDomain():
        # The site domain. Does have www here, if it uses it.
        return 'writing.whimsicalwanderings.net'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    ## Getting the chapter list and the meta data, plus 'is adult' checking.
    def extractChapterUrlsAndMetadata(self):

        # index=1 makes sure we see the story chapter index. Some
        # sites skip that for one-chapter stories.
        url = self.url+'&index=1'
        logger.debug("URL: "+url)

        if self.is_adult or self.getConfig("is_adult"):
            addurl = '&ageconsent=ok&warning=4'
        else:
            addurl= ''

        try:
            data = self._fetchUrl(url+addurl)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(url)
            else:
                raise e

        # note: 'adminstrators' (sic) matches the site's own text.
        if "Access denied. This story has not been validated by the adminstrators of this site." in data:
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)

        # Now go hunting for all the meta data and the chapter list.

        ## Title
        a = soup.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = soup.find('a', href=re.compile(r"viewuser.php\?uid=\d+"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"&chapter=\d+$")):
# just in case there's tags, like <i> in chapter titles.
self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']))
self.story.setMetadata('numChapters',len(self.chapterUrls))
        ## This site's metadata is not very well formatted, so we have to kludge a bit.
        ## The only fields with proper labels are Relationships and Warnings.
## However, the categories, characters, and warnings are all links, so we can get them easier
## Categories don't have a proper label, but do use links, so...
cats = soup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
catstext = [cat.string for cat in cats]
        for cat in catstext:
            if cat is not None:
                self.story.addToList('category',cat)
## Characters don't have a proper label, but do use links, so...
chars = soup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
charstext = [char.string for char in chars]
        for char in charstext:
            if char is not None:
                self.story.addToList('characters',char)
## Warnings do have a proper label, but we will use links anyway
warnings = soup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
warningstext = [warning.string for warning in warnings]
        for warning in warningstext:
            if warning is not None:
                self.story.addToList('warnings',warning)
## Relationships do have a proper label, but we will use links anyway
## this is actually tag information ... m/f, gen, m/m and such.
## so I'm putting them in the extratags section.
relationships = soup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
relationshipstext = [relationship.string for relationship in relationships]
        for relationship in relationshipstext:
            if relationship is not None:
                self.story.addToList('ships',relationship)
        ## I know I'm replacing a lot of <br>'s here, but I want to make sure that they are all
## the same, so we can split the string correctly.
metad = soup.find('div',{'class':'listbox'})
metad = str(metad.renderContents()).replace('\n',' ').replace('<br>','|||||||').replace('<br/>','|||||||').replace('<br />','|||||||').strip()
while '||||||||' in metad:
metad = metad.replace('||||||||','|||||||')
metad = stripHTML(metad)
for mdata in metad.split('|||||||'):
mdata = mdata.strip()
if mdata.startswith('Summary:'):
self.setDescription(url,mdata[8:].strip())
elif mdata.startswith('Rating'):
temp = mdata[:mdata.find('[')].replace('Rating:','')
self.story.setMetadata('rating', temp)
elif mdata.startswith('Series'):
pass
elif mdata.startswith('Chapters'):
temp = mdata.split('Completed:')[1]
if 'Yes' in stripHTML(temp):
self.story.setMetadata('status', 'Completed')
else:
self.story.setMetadata('status', 'In-Progress')
elif mdata.startswith('Word Count'):
self.story.setMetadata('numWords',mdata.replace('Word Count:','').strip())
elif mdata.startswith('Published'):
temp = mdata.split('Updated:')
self.story.setMetadata('datePublished', makeDate(temp[0].replace('Published:','').strip(), self.dateformat))
self.story.setMetadata('dateUpdated', makeDate(temp[1].strip(), self.dateformat))
# Find Series name from series URL.
a = soup.find('a', href=re.compile(r"viewseries.php\?seriesid=\d+"))
        if a is not None:
series_name = a.string
try:
series_url = 'http://'+self.host+'/'+a['href']
# use BeautifulSoup HTML parser to make everything easier to find.
seriessoup = self.make_soup(self._fetchUrl(series_url))
# can't use ^viewstory...$ in case of higher rated stories with javascript href.
storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
i=1
for a in storyas:
# skip 'report this' and 'TOC' links
if 'contact.php' not in a['href'] and 'index' not in a['href']:
if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):
self.setSeries(series_name, i)
self.story.setMetadata('seriesUrl',series_url)
break
i+=1
            except Exception:
                self.setSeries(series_name,0)
storynotes = soup.find('blockquote')
        if storynotes is not None:
storynotes = stripHTML(storynotes).replace('Story Notes:','')
self.story.setMetadata('storynotes',storynotes)
# grab the text for an individual chapter.
def getChapterText(self, url):
logger.debug('Getting chapter text from: %s' % url)
soup = self.make_soup(self._fetchUrl(url))
div = soup.find('div', {'id' : 'story'})
        if div is None:
raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)
return self.utf8FromSoup(url,div)


@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-
# Copyright 2012 Fanficdownloader team, 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
###########################################################################
### Adapted by GComyn - November 18, 2016
###########################################################################
# Software: eFiction
from base_efiction_adapter import BaseEfictionAdapter
class WWWGiantessworldNetAdapter(BaseEfictionAdapter):
@staticmethod
def getSiteDomain():
return 'www.giantessworld.net'
    @classmethod
    def getSiteAbbrev(cls):
        return 'gwnet'
    @classmethod
    def getDateFormat(cls):
        return "%B %d %Y"
def getClass():
return WWWGiantessworldNetAdapter


@@ -1370,6 +1370,19 @@ cover_exclusion_regexp:/css/bir.png
[forums.sufficientvelocity.com]
## see [base_xenforoforum]
[gluttonyfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[harem.lucifael.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -1830,6 +1843,22 @@ extra_valid_entries:reviews,readings
reviews_label:Reviews
readings_label:Readings
[writing.whimsicalwanderings.net]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
extra_valid_entries:storynotes
storynotes_label:Story Notes
add_to_titlepage_entries:,storynotes
[www.adastrafanfic.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2085,6 +2114,25 @@ keep_in_order_groupsUrl:true
## make entryHTML.
make_linkhtml_entries:prequel,sequels,groups,coverSource
[www.giantessworld.net]
extra_valid_entries:growth,shrink,sizeroles
growth_label:Growth
shrink_label:Shrink
sizeroles_label:Size Roles
add_to_titlepage_entries:,growth,shrink,sizeroles
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.harrypotterfanfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2137,6 +2185,36 @@ extraships:InuYasha/Kagome
## Site dedicated to these categories/characters/ships
extracategories:Lord of the Rings
[www.lotrgfic.com]
extra_valid_entries:places,times
places_label:Places
times_label:Times
add_to_titlepage_entries:,places,times
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.looselugs.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.masseffect2.in]
## Site dedicated to this fandom.
extracategories: Mass Effect
@@ -2319,6 +2397,20 @@ extracategories:Lord of the Rings
#username:YourName
#password:yourpassword
[www.tomparisdorm.com]
extracategories:Star Trek: Voyager
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.tthfanfic.org]
user_agent:
slow_down_sleep_time:2
@@ -2390,6 +2482,19 @@ extracategories:Twilight
## twilighted.net (ab)uses series as personal reading lists.
collect_series: false
[www.valentchamber.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true
## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
[www.walkingtheplank.org]
extra_valid_entries:reads
reads_label:Read Count