mirror of https://github.com/JimmXinu/FanFicFare.git
synced 2026-04-30 02:41:15 +02:00

New sites courtesy of GComyn: writing.whimsicalwanderings.net www.giantessworld.net www.looselugs.com www.tomparisdorm.com www.valentchamber.com www.gluttonyfiction.com www.lotrgfic.com

This commit is contained in: parent 4981af2447, commit 3d63135217
10 changed files with 1206 additions and 0 deletions
@@ -1351,6 +1351,19 @@ cover_exclusion_regexp:/css/bir.png
[forums.sufficientvelocity.com]
## see [base_xenforoforum]

[gluttonyfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[harem.lucifael.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
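The comments in this hunk point the commandline user at personal.ini; a hypothetical personal.ini override for one of the new sites (the key names come from the hunk, the values are placeholders) would look like:

```ini
## personal.ini -- user overrides; do not edit defaults.ini
[gluttonyfiction.com]
is_adult:true
username:YourName
password:yourpassword
```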
@@ -1811,6 +1824,22 @@ extra_valid_entries:reviews,readings
reviews_label:Reviews
readings_label:Readings

[writing.whimsicalwanderings.net]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
extra_valid_entries:storynotes
storynotes_label: Story Notes
add_to_titlepage_entries:,storynotes

[www.adastrafanfic.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2072,6 +2101,25 @@ keep_in_order_groupsUrl:true
## make entryHTML.
make_linkhtml_entries:prequel,sequels,groups,coverSource

[www.giantessworld.net]
extra_valid_entries:growth, shrink, sizeroles
growth_label: Growth
shrink_label:Shrink
sizeroles_label:Size Roles
add_to_titlepage_entries:,growth, shrink, sizeroles

## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.harrypotterfanfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2124,6 +2172,36 @@ extraships:InuYasha/Kagome
## Site dedicated to these categories/characters/ships
extracategories:Lord of the Rings

[www.lotrgfic.com]
extra_valid_entries:places, times
places_label: Places
times_label:Times
add_to_titlepage_entries:,places, times
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.looselugs.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.masseffect2.in]
## Site dedicated to this fandom.
extracategories: Mass Effect
@@ -2306,6 +2384,20 @@ extracategories:Lord of the Rings
#username:YourName
#password:yourpassword

[www.tomparisdorm.com]
extracategories:Star Trek: Voyager
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.tthfanfic.org]
user_agent:
slow_down_sleep_time:2
@@ -2377,6 +2469,19 @@ extracategories:Twilight
## twilighted.net (ab)uses series as personal reading lists.
collect_series: false

[www.valentchamber.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories) The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.walkingtheplank.org]
extra_valid_entries:reads
reads_label:Read Count
@@ -148,6 +148,13 @@ import adapter_chosentwofanficcom
import adapter_bdsmlibrarycom
import adapter_ficsitecom
import adapter_asexstoriescom
import adapter_gluttonyfictioncom
import adapter_valentchambercom
import adapter_looselugscom
import adapter_wwwgiantessworldnet
import adapter_lotrgficcom
import adapter_tomparisdormcom
import adapter_writingwhimsicalwanderingsnet

## This bit of complexity allows adapters to be added by just adding
## importing. It eliminates the long if/else clauses we used to need
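The comment in the hunk above says new adapters register themselves through the import alone. A minimal, hypothetical sketch of that pattern (simplified; FanFicFare's real mechanism lives in `fanficfare/adapters/__init__.py` and differs in detail) relies on each adapter module exposing a `getClass()` function:

```python
# Hypothetical sketch of the "register by import" idea: each adapter module
# exposes getClass(), and the package collects them from imported modules.
import types

def collect_adapters(modules):
    """Return the adapter class from every module that defines getClass()."""
    return [m.getClass() for m in modules if hasattr(m, 'getClass')]

# Simulate one adapter module and one unrelated module.
mod = types.ModuleType('adapter_examplecom')
class ExampleComAdapter(object):
    @staticmethod
    def getSiteDomain():
        return 'example.com'
mod.getClass = lambda: ExampleComAdapter

other = types.ModuleType('not_an_adapter')  # ignored: no getClass()

adapters = collect_adapters([mod, other])
print(adapters[0].getSiteDomain())  # example.com
```

Adding a site then really is just one more `import adapter_...` line, as the diff shows.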
49 fanficfare/adapters/adapter_gluttonyfictioncom.py Normal file
@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-

# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class GluttonyFictionComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'gluttonyfiction.com'

    @classmethod
    def getSiteAbbrev(cls):
        return 'gfcom'

    @classmethod
    def getDateFormat(cls):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%d/%m/%y"

##################################################################################
### The Efiction Base Adapter uses the Bulk story to retrieve the metadata, but
### on this site, the Rating is not present in the Bulk page...
### so it is not retrieved.
##################################################################################

def getClass():
    return GluttonyFictionComAdapter
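The format string returned by `getDateFormat()` above can be sanity-checked in isolation. This standalone sketch (the sample date is made up) shows how `strptime` reads `"%d/%m/%y"` — day first, then month, then two-digit year:

```python
from datetime import datetime

# getDateFormat() above returns "%d/%m/%y": day, month, two-digit year.
# An eFiction-style date such as "06/11/16" therefore parses as 6 Nov 2016.
parsed = datetime.strptime("06/11/16", "%d/%m/%y")
print(parsed.year, parsed.month, parsed.day)  # 2016 11 6
```

Getting the day/month order wrong here silently swaps publication dates, which is why each adapter declares its own format.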
49 fanficfare/adapters/adapter_looselugscom.py Normal file
@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-

# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class LooseLugsComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'www.looselugs.com'

    @classmethod
    def getSiteAbbrev(cls):
        return 'looselugs'

    @classmethod
    def getDateFormat(cls):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%B %d, %Y"

##################################################################################
### The Efiction Base Adapter uses the Bulk story to retrieve the metadata, but
### on this site, the Rating is not present in the Bulk page...
### so it is not retrieved.
##################################################################################

def getClass():
    return LooseLugsComAdapter
371 fanficfare/adapters/adapter_lotrgficcom.py Normal file
@@ -0,0 +1,371 @@
# -*- coding: utf-8 -*-

# Copyright 2011 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
##############################################################################
### Adapted by GComyn
### Completed on November 22, 2016
##############################################################################
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib
import urllib2

from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions

from base_adapter import BaseSiteAdapter, makeDate

class LOTRgficComAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)

        self.story.setMetadata('siteabbrev','lotrgfic')

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.

        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

    @staticmethod
    def getSiteDomain():
        return 'www.lotrgfic.com'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    def use_pagecache(self):
        '''
        adapters that will work with the page cache need to implement
        this and change it to True.
        '''
        return True

    def extractChapterUrlsAndMetadata(self):

        if self.is_adult or self.getConfig("is_adult"):
            addurl = "&warning=3"
        else:
            addurl=""

        url = self.url+'&index=1'+addurl
        logger.debug("URL: "+url)

        try:
            data = self._fetchUrl(url)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(self.url)
            else:
                raise e

        if "Content is only suitable for mature adults. May contain explicit language and adult themes. Equivalent of NC-17." in data:
            raise exceptions.AdultCheckRequired(self.url)
        elif "Access denied. This story has not been validated by the adminstrators of this site." in data:
            # ('adminstrators' [sic] -- must match the site's own error text)
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)

        ### Main Content for the Table Of Contents page.
        div = soup.find('div',{'id':'maincontent'})

        divfooter = div.find('div',{'id':'footer'})
        if divfooter != None:
            divfooter.extract()

        ## Title
        a = div.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = div.find('a', href=re.compile(r"viewuser.php"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+'/'+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in div.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
            # just in case there's tags, like <i> in chapter titles.
            self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']+addurl))

        self.story.setMetadata('numChapters',len(self.chapterUrls))

        ### Metadata is contained below.

        def defaultGetattr(d,k):
            try:
                return d[k]
            except:
                return ""

        # <span class="label">Rated:</span> NC-17<br /> etc
        ### This site has the metadata formatted all over the place,
        ### so we have to do some very kludgy programming to get it.
        ### If someone can do it better, please do so, and let us know.
        ## I'm going to leave this section in, so we can get those
        ## elements that are "formatted correctly".
        labels = soup.findAll('span',{'class':'label'})
        for labelspan in labels:
            value = labelspan.nextSibling
            label = labelspan.string

            if 'Summary' in label:
                ## the summary is not encased in a span label... so we can't do anything here.
                ## I'm going to leave it here just in case.
                ## Everything until the next span class='label'
                svalue = ''
                while value and 'label' not in defaultGetattr(value,'class'):
                    svalue += unicode(value)
                    value = value.nextSibling
                # sometimes a poorly formatted desc (<p> w/o </p>) leads
                # to all labels being included.
                svalue=svalue[:svalue.find('<span class="label">')]
                self.setDescription(url,svalue)

            if 'Rated' in label:
                self.story.setMetadata('rating', value)

            if 'Word count' in label:
                self.story.setMetadata('numWords', value)

            if 'Categories' in label:
                cats = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=categories'))
                catstext = [cat.string for cat in cats]
                for cat in catstext:
                    self.story.addToList('category',cat)

            if 'Characters' in label:
                chars = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=characters'))
                charstext = [char.string for char in chars]
                for char in charstext:
                    self.story.addToList('characters',char)

            if 'Genre' in label:
                genres = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
                genrestext = [genre.string for genre in genres]
                self.genre = ', '.join(genrestext)
                for genre in genrestext:
                    self.story.addToList('genre',genre)

            if 'Warnings' in label:
                warnings = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=4'))
                warningstext = [warning.string for warning in warnings]
                self.warning = ', '.join(warningstext)
                for warning in warningstext:
                    self.story.addToList('warnings',warning)

            if 'Places' in label:
                places = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
                placestext = [place.string for place in places]
                self.places = ', '.join(placestext)
                for place in placestext:
                    self.story.addToList('places',place)

            if 'Times' in label:
                times = labelspan.parent.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=3'))
                # don't shadow the time module imported above.
                timestext = [t.string for t in times]
                self.times = ', '.join(timestext)
                for t in timestext:
                    self.story.addToList('times',t)

            if 'Completed' in label:
                if 'Yes' in value:
                    self.story.setMetadata('status', 'Completed')
                else:
                    self.story.setMetadata('status', 'In-Progress')

            if 'Published' in label:
                self.story.setMetadata('datePublished', makeDate(value.strip(), "%d %b %Y"))

            if 'Updated' in label:
                # there's a stray [ at the end.
                #value = value[0:-1]
                self.story.setMetadata('dateUpdated', makeDate(value.strip(), "%d %b %Y"))

        try:
            # Find Series name from series URL.
            a = soup.find('a', href=re.compile(r"viewseries.php\?seriesid=\d+"))
            series_name = a.string
            series_url = 'http://'+self.host+'/'+a['href']

            # use BeautifulSoup HTML parser to make everything easier to find.
            seriessoup = self.make_soup(self._fetchUrl(series_url))
            storyas = seriessoup.findAll('a', href=re.compile(r'^viewstory.php\?sid=\d+$'))
            i=1
            for a in storyas:
                if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):
                    self.setSeries(series_name, i)
                    self.story.setMetadata('seriesUrl',series_url)
                    break
                i+=1

        except:
            # I find it hard to care if the series parsing fails
            pass

        ## Now we are going to kludge together the rest of the metadata
        metad = soup.findAll('p',{'class':'smaller'})
        ## Categories don't have a proper label, but do use links, so...
        cats = soup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
        catstext = [cat.string for cat in cats]
        for cat in catstext:
            if cat != None:
                self.story.addToList('category',cat)

        ## Characters don't have a proper label, but do use links, so...
        chars = soup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
        charstext = [char.string for char in chars]
        for char in charstext:
            if char != None:
                self.story.addToList('characters',char)

        ### Rating is not enclosed in a label, only in a p tag classed 'smaller' so...
        ratng = metad[0].find('strong').get_text().replace('Rated','').strip()
        self.story.setMetadata('rating', ratng)

        ## Now we try to get the summary... it's not within its own
        ## dedicated tag, so we have to split some hairs...
        ## This may not work every time... but I tested it with 6 stories...
        mdata = metad[0]
        while '<hr/>' not in str(mdata.nextSibling):
            mdata = mdata.nextSibling
        self.setDescription(url,mdata.previousSibling.previousSibling.get_text())

        ### the rest of the metadata are not in tags at all... so we have to be really kludgy.
        ## we don't need the rest of them, so we get rid of all but the last one
        metad = metad[-1]
        ## we also don't need any of the links in here, so we'll get rid of them as well.
        links = metad.findAll('a')
        for link in links:
            link.extract()
        ## and we've already done the labels, so let's remove them
        labels = metad.findAll('span',{'class':'label'})
        for label in labels:
            label.extract()
        ## now we should only have text and <br>'s... something like this:
        #<p class="smaller">Categories:
        #<br/>
        #Characters: , , ,
        #<br/>
        # , <br/> <br/> <br/> None<br/>
        #Challenges: None
        #<br/>
        #Series: None
        #<br/>
        #Chapters: 1    |    Word count: 200    |    Read Count: 767
        #<br/>
        #Completed: Yes    |    Updated: 04/27/13    |    Published: 04/27/13
        #<br/>
        #</p>
        ## we'll have to remove the non-breaking spaces to get this to work.
        metad = str(metad).replace("\xc2\xa0",'').replace('\n','')
        for txt in metad.split('<br/>'):
            if 'Challenges:' in txt:
                txt = txt.replace('Challenges:','').strip()
                self.story.setMetadata('challenges', txt)
            elif 'Series:' in txt:
                txt = txt.replace('Series:','').strip()
                self.story.setMetadata('series', txt)
            elif 'Chapters:' in txt:
                for txt2 in txt.split('|'):
                    txt2 = txt2.replace('\n','').strip()
                    if 'Word count:' in txt2:
                        txt2 = txt2.replace('Word count:','').strip()
                        self.story.setMetadata('numWords', txt2)
                    elif 'Read Count:' in txt2:
                        txt2 = txt2.replace('Read Count:','').strip()
                        self.story.setMetadata('readings', txt2)
            elif 'Completed:' in txt:
                for txt2 in txt.split('|'):
                    txt2 = txt2.strip()
                    if 'Completed:' in txt2:
                        if 'Yes' in txt2:
                            self.story.setMetadata('status', 'Completed')
                        else:
                            self.story.setMetadata('status', 'In-Progress')
                    elif 'Updated:' in txt2:
                        txt2 = txt2.replace('Updated:','').strip()
                        # dates look like 04/27/13, so month/day/two-digit-year.
                        self.story.setMetadata('dateUpdated', makeDate(txt2, "%m/%d/%y"))
                    elif 'Published:' in txt2:
                        txt2 = txt2.replace('Published:','').strip()
                        self.story.setMetadata('datePublished', makeDate(txt2, "%m/%d/%y"))

    def getChapterText(self, url):

        logger.debug('Getting chapter text from: %s' % url)

        data = self._fetchUrl(url)
        # problems with some stories, but only in calibre. I suspect
        # issues with different SGML parsers in python. This is a
        # nasty hack, but it works.
        data = data[data.index("<body"):]

        soup = self.make_soup(data)

        span = soup.find('div', {'id' : 'maincontent'})

        # check before using span below.
        if span == None:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s!  Missing required element!" % url)

        # Everything is encased in the maincontent section, so we have
        # to remove as much as we can systematically
        tables = span.findAll('table')
        for table in tables:
            table.extract()

        headings = span.findAll('h3')
        for heading in headings:
            heading.extract()

        links = span.findAll('a')
        for link in links:
            link.extract()

        forms = span.findAll('form')
        for form in forms:
            form.extract()

        divs = span.findAll('div')
        for div in divs:
            div.extract()

        return self.utf8FromSoup(url,span)

def getClass():
    return LOTRgficComAdapter
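`getSiteURLPattern()` in the adapter above builds its regex from `re.escape` on the literal URL prefix plus a `\d+$` tail, so only bare story URLs are accepted. A standalone check of that construction (the story ids in the sample URLs are hypothetical):

```python
import re

# Same construction as getSiteURLPattern() above, reproduced standalone.
pattern = re.escape("http://www.lotrgfic.com/viewstory.php?sid=") + r"\d+$"

# A bare story URL matches; anything after the story id (extra query
# parameters, chapter suffixes) is rejected by the $ anchor.
ok  = re.match(pattern, "http://www.lotrgfic.com/viewstory.php?sid=1234")
bad = re.match(pattern, "http://www.lotrgfic.com/viewstory.php?sid=1234&index=1")
print(bool(ok), bool(bad))  # True False
```

`re.escape` is doing real work here: without it the `?` and `.` in the prefix would be regex metacharacters instead of literals.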
206 fanficfare/adapters/adapter_tomparisdormcom.py Normal file
@@ -0,0 +1,206 @@
# -*- coding: utf-8 -*-

# Copyright 2012 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Software: eFiction
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib2
import sys

from bs4.element import Comment
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions

from base_adapter import BaseSiteAdapter, makeDate

def getClass():
    return TomParisDormComAdapter

# Class name has to be unique.  Our convention is camel case the
# sitename with Adapter at the end.  www is skipped.
class TomParisDormComAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.
        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

        # Each adapter needs to have a unique site abbreviation.
        self.story.setMetadata('siteabbrev','tpdorm')

        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        self.dateformat = "%d/%m/%y"

    @staticmethod # must be @staticmethod, don't remove it.
    def getSiteDomain():
        # The site domain.  Does have www here, if it uses it.
        return 'www.tomparisdorm.com'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    ## Getting the chapter list and the meta data, plus 'is adult' checking.
    def extractChapterUrlsAndMetadata(self):

        # index=1 makes sure we see the story chapter index.  Some
        # sites skip that for one-chapter stories.
        url = self.url
        logger.debug("URL: "+url)

        try:
            data = self._fetchUrl(url)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(url)
            else:
                raise e

        if "Access denied. This story has not been validated by the adminstrators of this site." in data:
            # ('adminstrators' [sic] -- must match the site's own error text)
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)
        # print data

        # Now go hunting for all the meta data and the chapter list.

        ## Title
        a = soup.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = soup.find('a', href=re.compile(r"viewuser.php\?uid=\d+"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+r"&chapter=\d+$")):
            # just in case there's tags, like <i> in chapter titles.
            self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']))

        self.story.setMetadata('numChapters',len(self.chapterUrls))

        # eFiction sites don't help us out a lot with their meta data
        # formatting, so it's a little ugly.

        ## Getting the Summary
        value = stripHTML(soup.find('div',{'class' : 'summary'})).strip('Summary')
        self.setDescription(url,value)

        # Get the rest of the Metadata
        mdsoup = soup.find('div',{'id' : 'output'})

        # collapse whitespace runs, then turn the <br/> and | separators
        # into a single -:- delimiter we can split on.
        mdstr = str(mdsoup).replace('\n','').replace('\r','').replace('\t',' ').replace('  ',' ').replace('  ',' ').replace('  ',' ')
        mdstr = stripHTML(mdstr.replace(r'<br/>',r'-:-').replace('|','-:-'))
        mdstr = mdstr.replace(r'[Rev',r'-:-[Rev').replace(' -:- ','-:-').strip('-:-').strip('-:-')

        ## I am using this method, because the summary does not have a 'label',
        ## and the Metadata is always in this order (as far as I've tested)
        for i, value in enumerate(mdstr.split('-:-')):
            if 'Published:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('datePublished', makeDate(val, self.dateformat))

            elif 'Last Updated:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('dateUpdated', makeDate(val, self.dateformat))

            elif 'Rating:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('rating', stripHTML(val))

            elif 'Category:' in value:
                cats = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
                for cat in cats:
                    self.story.addToList('category',cat.string)

            elif 'Characters:' in value:
                chars = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
                for char in chars:
                    self.story.addToList('characters',char.string)

            elif 'Genres:' in value:
                genres = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1')) # XXX
                for genre in genres:
                    self.story.addToList('genre',genre.string)

            elif 'Warnings:' in value:
                warnings = mdsoup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2')) # XXX
                for warning in warnings:
                    self.story.addToList('warnings',warning.string)

            elif 'Challenge:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('challenge', stripHTML(val))

            elif 'Series:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('series', stripHTML(val))

            elif 'Chapters:' in value:
                val=value.split(':')[1].strip()
                self.story.setMetadata('numChapters', int(val))

            elif 'Completed:' in value:
                if 'Yes' in value:
                    self.story.setMetadata('status', 'Completed')
                else:
                    self.story.setMetadata('status', 'In-Progress')

            elif 'Word Count:' in value:
                val=value.split(':')[1].strip()
|
||||
self.story.setMetadata('numWords', int(stripHTML(val)))
|
||||
|
||||
elif 'Read:' in value:
|
||||
val=value.split(':')[1].strip()
|
||||
self.story.setMetadata('readings', stripHTML(val))
|
||||
|
||||
# grab the text for an individual chapter.
|
||||
def getChapterText(self, url):
|
||||
|
||||
logger.debug('Getting chapter text from: %s' % url)
|
||||
|
||||
soup = self.make_soup(self._fetchUrl(url))
|
||||
|
||||
div = soup.find('div', {'id' : 'story'})
|
||||
|
||||
if None == div:
|
||||
raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)
|
||||
|
||||
return self.utf8FromSoup(url,div)
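The delimiter-split metadata parse above can be sketched in isolation. This is only an illustration: the sample `mdstr` value and the `meta` dict are invented here, and where the adapter uses `value.split(':')[1]`, this sketch uses `partition` so a value containing a colon is not truncated.

```python
# Standalone sketch of the '-:-' delimiter-split metadata parse used above.
# The input string is a made-up sample, not real site output; the real input
# is the site's 'output' div with <br/> and '|' normalized to '-:-'.
mdstr = "Rating: PG-13-:-Chapters: 4-:-Completed: Yes-:-Word Count: 12345"

meta = {}
for value in mdstr.split('-:-'):
    # each piece looks like 'Label: value'; split on the first colon only
    label, _, val = value.partition(':')
    meta[label.strip()] = val.strip()

status = 'Completed' if meta.get('Completed') == 'Yes' else 'In-Progress'
```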
49  fanficfare/adapters/adapter_valentchambercom.py  Normal file
@@ -0,0 +1,49 @@
# -*- coding: utf-8 -*-

# Copyright 2015 FanFicFare team
# Copyright 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Software: eFiction
##################################################################################
### Rewritten by: GComyn on November 06, 2016
### Original was adapter_fannation.py
##################################################################################
from base_efiction_adapter import BaseEfictionAdapter

class ValentChamberComAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'www.valentchamber.com'

    @classmethod
    def getSiteAbbrev(self):
        return 'vccom'

    @classmethod
    def getDateFormat(self):
        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        return "%B %d %Y"

    ##################################################################################
    ### The eFiction base adapter uses the bulk story page to retrieve the metadata,
    ### but on this site the Rating is not present on the bulk page...
    ### so it is not retrieved.
    ##################################################################################

def getClass():
    return ValentChamberComAdapter
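For reference, the `"%B %d %Y"` format string returned by `getDateFormat` parses full-month-name dates via `strptime`; the date value below is a hypothetical sample, not taken from the site.

```python
# Sketch: parsing a site date with the "%B %d %Y" format string
# (full month name, day, four-digit year). Sample date is invented.
from datetime import datetime

posted = datetime.strptime("November 06 2016", "%B %d %Y")
```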
227  fanficfare/adapters/adapter_writingwhimsicalwanderingsnet.py  Normal file
@@ -0,0 +1,227 @@
# -*- coding: utf-8 -*-

# Copyright 2012 Fanficdownloader team, 2015 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Software: eFiction
import time
import logging
logger = logging.getLogger(__name__)
import re
import urllib2
import sys

from bs4.element import Comment
from ..htmlcleanup import stripHTML
from .. import exceptions as exceptions

from base_adapter import BaseSiteAdapter, makeDate

def getClass():
    return WritingWhimsicalwanderingsNetAdapter

# Class name has to be unique.  Our convention is camel case the
# sitename with Adapter at the end.  www is skipped.
class WritingWhimsicalwanderingsNetAdapter(BaseSiteAdapter):

    def __init__(self, config, url):
        BaseSiteAdapter.__init__(self, config, url)

        self.decode = ["utf8",
                       "Windows-1252",
                       "iso-8859-1"] # 1252 is a superset of iso-8859-1.
                                     # Most sites that claim to be
                                     # iso-8859-1 (and some that claim to be
                                     # utf8) are really windows-1252.
        self.username = "NoneGiven" # if left empty, site doesn't return any message at all.
        self.password = ""
        self.is_adult=False

        # get storyId from url--url validation guarantees query is only sid=1234
        self.story.setMetadata('storyId',self.parsedUrl.query.split('=',)[1])

        # normalized story URL.
        self._setURL('http://' + self.getSiteDomain() + '/viewstory.php?sid='+self.story.getMetadata('storyId'))

        # Each adapter needs to have a unique site abbreviation.
        self.story.setMetadata('siteabbrev','wwnet')

        # The date format will vary from site to site.
        # http://docs.python.org/library/datetime.html#strftime-strptime-behavior
        self.dateformat = "%m/%d/%y"

    @staticmethod # must be @staticmethod, don't remove it.
    def getSiteDomain():
        # The site domain.  Does have www here, if it uses it.
        return 'writing.whimsicalwanderings.net'

    @classmethod
    def getSiteExampleURLs(cls):
        return "http://"+cls.getSiteDomain()+"/viewstory.php?sid=1234"

    def getSiteURLPattern(self):
        return re.escape("http://"+self.getSiteDomain()+"/viewstory.php?sid=")+r"\d+$"

    ## Getting the chapter list and the meta data, plus 'is adult' checking.
    def extractChapterUrlsAndMetadata(self):

        # index=1 makes sure we see the story chapter index.  Some
        # sites skip that for one-chapter stories.
        url = self.url+'&index=1'
        logger.debug("URL: "+url)
        if self.is_adult or self.getConfig("is_adult"):
            addurl = '&ageconsent=ok&warning=4'
        else:
            addurl= ''

        try:
            data = self._fetchUrl(url+addurl)
        except urllib2.HTTPError, e:
            if e.code == 404:
                raise exceptions.StoryDoesNotExist(url)
            else:
                raise e

        if "Access denied. This story has not been validated by the adminstrators of this site." in data:
            raise exceptions.AccessDenied(self.getSiteDomain() +" says: Access denied. This story has not been validated by the adminstrators of this site.")

        # use BeautifulSoup HTML parser to make everything easier to find.
        soup = self.make_soup(data)

        # Now go hunting for all the meta data and the chapter list.

        ## Title
        a = soup.find('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"$"))
        self.story.setMetadata('title',stripHTML(a))

        # Find authorid and URL from... author url.
        a = soup.find('a', href=re.compile(r"viewuser.php\?uid=\d+"))
        self.story.setMetadata('authorId',a['href'].split('=')[1])
        self.story.setMetadata('authorUrl','http://'+self.host+a['href'])
        self.story.setMetadata('author',a.string)

        # Find the chapters:
        for chapter in soup.findAll('a', href=re.compile(r'viewstory.php\?sid='+self.story.getMetadata('storyId')+"&chapter=\d+$")):
            # just in case there are tags, like <i>, in chapter titles.
            self.chapterUrls.append((stripHTML(chapter),'http://'+self.host+'/'+chapter['href']))

        self.story.setMetadata('numChapters',len(self.chapterUrls))

        ## This site's metadata is not very well formatted, so we have to kludge a bit.
        ## The only well-labeled entries I see are Relationships and Warnings...
        ## However, the categories, characters, and warnings are all links, so we can get them more easily.

        ## Categories don't have a proper label, but do use links, so...
        cats = soup.findAll('a',href=re.compile(r'browse.php\?type=categories'))
        catstext = [cat.string for cat in cats]
        for cat in catstext:
            if cat != None:
                self.story.addToList('category',cat.string)

        ## Characters don't have a proper label, but do use links, so...
        chars = soup.findAll('a',href=re.compile(r'browse.php\?type=characters'))
        charstext = [char.string for char in chars]
        for char in charstext:
            if char != None:
                self.story.addToList('characters',char.string)

        ## Warnings do have a proper label, but we will use links anyway
        warnings = soup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=2'))
        warningstext = [warning.string for warning in warnings]
        for warning in warningstext:
            if warning != None:
                self.story.addToList('warnings',warning.string)

        ## Relationships do have a proper label, but we will use links anyway.
        ## This is actually tag information... m/f, gen, m/m and such,
        ## so I'm putting them in the extratags section.
        relationships = soup.findAll('a',href=re.compile(r'browse.php\?type=class&type_id=1'))
        relationshipstext = [relationship.string for relationship in relationships]
        for relationship in relationshipstext:
            if relationship != None:
                self.story.addToList('ships',relationship.string)

        ## I know I'm replacing a lot of <br>'s here, but I want to make sure that they are all
        ## the same, so we can split the string correctly.
        metad = soup.find('div',{'class':'listbox'})
        metad = str(metad.renderContents()).replace('\n',' ').replace('<br>','|||||||').replace('<br/>','|||||||').replace('<br />','|||||||').strip()
        while '||||||||' in metad:
            metad = metad.replace('||||||||','|||||||')
        metad = stripHTML(metad)

        for mdata in metad.split('|||||||'):
            mdata = mdata.strip()
            if mdata.startswith('Summary:'):
                self.setDescription(url,mdata[8:].strip())
            elif mdata.startswith('Rating'):
                temp = mdata[:mdata.find('[')].replace('Rating:','')
                self.story.setMetadata('rating', temp)
            elif mdata.startswith('Series'):
                pass
            elif mdata.startswith('Chapters'):
                temp = mdata.split('Completed:')[1]
                if 'Yes' in stripHTML(temp):
                    self.story.setMetadata('status', 'Completed')
                else:
                    self.story.setMetadata('status', 'In-Progress')
            elif mdata.startswith('Word Count'):
                self.story.setMetadata('numWords',mdata.replace('Word Count:','').strip())
            elif mdata.startswith('Published'):
                temp = mdata.split('Updated:')
                self.story.setMetadata('datePublished', makeDate(temp[0].replace('Published:','').strip(), self.dateformat))
                self.story.setMetadata('dateUpdated', makeDate(temp[1].strip(), self.dateformat))

        # Find Series name from series URL.
        a = soup.find('a', href=re.compile(r"viewseries.php\?seriesid=\d+"))
        if a != None:
            series_name = a.string
            try:
                series_url = 'http://'+self.host+'/'+a['href']

                # use BeautifulSoup HTML parser to make everything easier to find.
                seriessoup = self.make_soup(self._fetchUrl(series_url))
                # can't use ^viewstory...$ in case of higher rated stories with javascript href.
                storyas = seriessoup.findAll('a', href=re.compile(r'viewstory.php\?sid=\d+'))
                i=1
                for a in storyas:
                    # skip 'report this' and 'TOC' links
                    if 'contact.php' not in a['href'] and 'index' not in a['href']:
                        if a['href'] == ('viewstory.php?sid='+self.story.getMetadata('storyId')):
                            self.setSeries(series_name, i)
                            self.story.setMetadata('seriesUrl',series_url)
                            break
                        i+=1

            except:
                self.setSeries(series_name,0)
                pass

        storynotes = soup.find('blockquote')
        if storynotes != None:
            storynotes = stripHTML(storynotes).replace('Story Notes:','')
            self.story.setMetadata('storynotes',storynotes)

    # grab the text for an individual chapter.
    def getChapterText(self, url):

        logger.debug('Getting chapter text from: %s' % url)

        soup = self.make_soup(self._fetchUrl(url))

        div = soup.find('div', {'id' : 'story'})

        if None == div:
            raise exceptions.FailedToDownload("Error downloading Chapter: %s! Missing required element!" % url)

        return self.utf8FromSoup(url,div)
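The `<br>` normalization used in extractChapterUrlsAndMetadata above can be sketched on its own. The HTML fragment below is invented for illustration; the real input is the site's `listbox` div.

```python
# Sketch of the <br> normalization above: fold every <br> variant into
# one '|||||||' separator, collapse doubled runs, then split into fields.
# The input fragment is a made-up sample, not real site output.
metad = 'Summary: A story.<br>Word Count: 999<br/><br />Published: 01/02/16 Updated: 03/04/16'

for br in ('<br>', '<br/>', '<br />'):
    metad = metad.replace(br, '|||||||')
while '||||||||' in metad:            # collapse adjacent separators
    metad = metad.replace('||||||||', '|||||||')

fields = [f.strip() for f in metad.split('|||||||') if f.strip()]
```

The unusual seven-pipe separator is chosen so it cannot collide with text that legitimately appears in the metadata block.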
38  fanficfare/adapters/adapter_wwwgiantessworldnet.py  Normal file
@@ -0,0 +1,38 @@
# -*- coding: utf-8 -*-

# Copyright 2012 Fanficdownloader team, 2016 FanFicFare team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
###########################################################################
### Adapted by GComyn - November 18, 2016
###########################################################################
# Software: eFiction
from base_efiction_adapter import BaseEfictionAdapter

class WWWGiantessworldNetAdapter(BaseEfictionAdapter):

    @staticmethod
    def getSiteDomain():
        return 'www.giantessworld.net'

    @classmethod
    def getSiteAbbrev(self):
        return 'gwnet'

    @classmethod
    def getDateFormat(self):
        return "%B %d %Y"

def getClass():
    return WWWGiantessworldNetAdapter
@@ -1370,6 +1370,19 @@ cover_exclusion_regexp:/css/bir.png
[forums.sufficientvelocity.com]
## see [base_xenforoforum]

[gluttonyfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[harem.lucifael.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -1830,6 +1843,22 @@ extra_valid_entries:reviews,readings
reviews_label:Reviews
readings_label:Readings

[writing.whimsicalwanderings.net]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword
extra_valid_entries:storynotes
storynotes_label: Story Notes
add_to_titlepage_entries:,storynotes

[www.adastrafanfic.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2085,6 +2114,25 @@ keep_in_order_groupsUrl:true
## make entryHTML.
make_linkhtml_entries:prequel,sequels,groups,coverSource

[www.giantessworld.net]
extra_valid_entries:growth, shrink, sizeroles
growth_label: Growth
shrink_label:Shrink
sizeroles_label:Size Roles
add_to_titlepage_entries:,growth, shrink, sizeroles

## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.harrypotterfanfiction.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
@@ -2137,6 +2185,36 @@ extraships:InuYasha/Kagome
## Site dedicated to these categories/characters/ships
extracategories:Lord of the Rings

[www.lotrgfic.com]
extra_valid_entries:places, times
places_label: Places
times_label:Times
add_to_titlepage_entries:,places, times
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.looselugs.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.masseffect2.in]
## Site dedicated to this fandom.
extracategories: Mass Effect
@@ -2319,6 +2397,20 @@ extracategories:Lord of the Rings
#username:YourName
#password:yourpassword

[www.tomparisdorm.com]
extracategories:Star Trek: Voyager
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.tthfanfic.org]
user_agent:
slow_down_sleep_time:2
@@ -2390,6 +2482,19 @@ extracategories:Twilight
## twilighted.net (ab)uses series as personal reading lists.
collect_series: false

[www.valentchamber.com]
## Some sites do not require a login, but do require the user to
## confirm they are adult for adult content. In commandline version,
## this should go in your personal.ini, not defaults.ini.
#is_adult:true

## Some sites require login (or login for some rated stories). The
## program can prompt you, or you can save it in config. In
## commandline version, this should go in your personal.ini, not
## defaults.ini.
#username:YourName
#password:yourpassword

[www.walkingtheplank.org]
extra_valid_entries:reads
reads_label:Read Count
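The extra_valid_entries / *_label pairs above are plain key:value INI lines. A minimal sketch of reading one such section with Python's stdlib parser follows; this is not FanFicFare's actual config loader, just an illustration of the format.

```python
# Hypothetical sketch: reading an extra_valid_entries setting and its
# label with configparser; not FanFicFare's real config code.
from configparser import ConfigParser

ini = """
[www.walkingtheplank.org]
extra_valid_entries:reads
reads_label:Read Count
"""

parser = ConfigParser()
parser.read_string(ini)

section = 'www.walkingtheplank.org'
entries = [e.strip() for e in parser.get(section, 'extra_valid_entries').split(',')]
labels = {e: parser.get(section, e + '_label') for e in entries}
```

configparser accepts `:` as well as `=` as a key/value delimiter, which is why these lines parse without modification.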