Commit graph

21 commits

Author SHA1 Message Date
SmallCoccinelle
e513b6ffa5
Cache and reuse the scraper HTTP client (#1855)
* Add Cookies directly to the request

Rather than maintaining a cookie jar on a one-shot HTTP client, maintain
the jar ourselves: make a new jar, then use it to select the right
cookies.

The cookies are set on the request rather than on the client. This will
retain the current behavior as we are always throwing the client away
after each use.

This patch also makes it possible to lift the HTTP client out over time.

* Introduce a cached scraper HTTP client

The scraper cache is augmented with an *http.Client. These are safe for
concurrent use, so the pointer can safely be passed around. Push this
into scraper configurations where applicable, next to the txnManagers.

When we issue a loadUrl request, do so on the cached *http.Client,
which will reuse existing idle connections in the client if any are
present.

* Set MaxIdleConnsPerHost. Closes #1850

We allow for up to 8 idle connections to a single host. This should
make concurrent operation toward the same host reuse connections, even
for sizeable concurrency.

The number isn't bumped excessively high. We should probably limit
concurrency toward a single site anyway, since we'll be able to overrun
a site with queries quite easily if we have many concurrent goroutines
issuing requests at the same time.

* Reinstate driverOptions / useCDP check

Use De Morgan's laws to invert the logic and exit early. Fixes the
breaking tests.

* Documentation fixup.

* Use the scraper http.Client when fetching images

Fold image fetchers onto the cached scraper http.Client as well. This
makes the scraper have a single http.Client cache for all its
operations.

Thread the client upwards to the relevant attachment points: either the
cache, or a stash_box instance, which is extended to include a pointer
to the client.

Style roughly follows that of txnManagers.

* Use the same http Client as the GraphQL client uses

Rather than using http.DefaultClient, use the same client that the
GraphQL client uses in the stash_box subsystem. This localizes the
client used in the subsystem into the constructing New.. call.

* Hoist HTTP client construction

Create a function for initializing the HTTP Client we use. While here,
hoist magic numbers into constants. Introduce a proper static redirect
error and use it in the client code as well.

* Reinstate printCookies

This is a debugging function, and it might still come in handy in the
future at some point.

* Nitpick comment.

* Minor tidy

Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2021-10-20 16:12:24 +11:00
kermieisinthehouse
5ec70ac3e0
Fix List filter styles, fix freeones spam (#1853)
* Fix List filter styles, fix freeones spam
2021-10-15 14:02:49 +11:00
WithoutPants
e9d48683f8
Autotag scraper (#1817)
* Refactor scraper structures
* Move matching code into new package
* Add autotag scraper
* Always check first letter of auto-tag names
* Account for nulls

Co-authored-by: Kermie <kermie@isinthe.house>
2021-10-11 23:06:06 +11:00
julien0221
d673c4ce03
added details, deathdate, hair color, weight to performers and added details to studios (#1274)
* added details to performers and studios
* added deathdate, hair_color and weight to performers
* Simplify performer/studio create mutations
* Add changelog and recategorised

Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2021-04-16 16:06:35 +10:00
bnkai
4299f113e0
Fix Freeones search (#1230) 2021-03-25 10:01:56 +11:00
Belley
86bfb64a0d
Fix freeones scraper (#1091) 2021-02-01 08:15:50 +11:00
Belley
94392c7c4d
Fixing image for Freeones Scrapers (#930) 2020-11-07 09:36:26 +11:00
WithoutPants
70f73ecf4a
Update freeones scraper (#881) 2020-10-24 13:12:21 +11:00
WithoutPants
2b9215702e
Refactor xpath scraper code. Add fixed and map (#616)
* Refactor xpath scraper code
* Make post-process a list
* Add map post-process action
* Add fixed xpath values
* Refactor scrapers into cache
* Refactor into mapped config
* Trim test html
2020-07-21 14:06:25 +10:00
bnkai
a7ac02fb50
freeones fixes (#615) 2020-06-17 11:02:06 +10:00
bnkai
b89956de25
freeones scraper fixes/tweaking (#584) 2020-06-02 09:45:37 +10:00
WithoutPants
215c4e3bde
Change builtin freeones scraper to community yml (#542) 2020-05-15 20:10:20 +10:00
bnkai
0b50e83dbf
freeones scraper tweaks (#509) 2020-05-04 14:11:49 +10:00
WithoutPants
82201e23e0
Make ethnicity freetext and fix freeones ethnicity panic (#431)
* Make ethnicity free text

* Fix panic in freeones scraper for other ethnicity
2020-04-02 08:25:39 +11:00
WithoutPants
92837fe1f7 Add scene metadata scraping functionality (#236)
* Add scene scraping functionality

* Adapt to changed scraper config
2019-12-15 20:35:34 -05:00
WithoutPants
50784025f2 Change scraper config to yaml (#256) 2019-12-12 14:27:44 -05:00
WithoutPants
17247060b6 Generic performer scrapers (#203)
* Generalise scraper API

* Add script performer scraper

* Fixes from testing

* Add context to scrapers and generalise

* Add scraping performer from URL

* Add error handling

* Move log to debug

* Add supported scrape types
2019-11-18 21:49:05 -05:00
bill
9f6888a3d6 fix freeones scraper bugs 2019-10-16 02:05:49 +03:00
Friendly C
7c94262020 Freeones Scrape: Fix scraping by alias 2019-10-10 23:56:06 +02:00
bnkai
bcc70af7e5 Fix minor freeones scraper bug (#41)
Fix minor freeones scraper bug
2019-04-11 11:54:38 -07:00
Stash Dev
b488c1ed7d Reorg 2019-02-14 15:42:52 -08:00
Renamed from scraper/freeones.go