Commit graph

25 commits

Author SHA1 Message Date
SmallCoccinelle
c1f89611e2
Refactor scraper top half (#1893)
* Simplify scraper listing

Introduce an enum, scraper.Kind, which explains what we are looking
for. Make it possible to match this from a scraper struct.

Use the enum to rewrite all the listing code to use the same code path.

* Use a map, nitpick ScrapePerformerList

Let the cache store a map from ID of a scraper to the scraper. This
improves lookups when there are many scrapers, making it practically
O(1) rather than O(n). If many scrapers are stored, this is faster.

Since range expressions work unchanged, we don't have to change much,
and things will still work.

make Kind a Stringer

Rename ScraperPerformerList -> ScraperPerformerQuery since that name
is used in the other scrapers, and we value consistency.

Tune ScraperPerformerQuery:

* Return static errors
* Use the new functionality

* When loading scrapers, do so directly

Rather than first walking the directory structure to obtain file paths,
fold the load directly in the the filepath walk. This makes the code
for more direct.

* Use static ErrNotFound

If a scraper isn't found, return one static error. This paves the way
for eventually doing our own error-presenter in gqlgen.

* Store the cache in the Resolver state

Putting the scraperCache directly in the resolver avoids the need to
call manager.GetInstance() all over the place to get access to the
scraper cache. The cache is stored by pointer, so it should be safe,
since the cache will just update its internal state rather than being
overwritten.

We can now utilize the resolver state to grab the cache where needed.

While here, pass context.Context from the resolver down into a function,
which removes a context.TODO()

* Introduce ScrapedContent

Create a union in the GraphQL schema for all scraped content. This
simplifies the internal implementation because we get variance on
the output content type.

Introduce a new type ScrapedContentType which signifies the scraped
content you want as a caller.

Use these to generalize the List interface and the URL scraping
interface.

* Simplify the scraper API

Introduce a new interface for scraping. This interface is then
used in the upper half of the scraper code, to make the code use one
code flow rather than multiple code flows. Variance is currently at
the old scraper structure.

Add extending interfaces for the different ways of invoking scrapes.
Use interface conversions to convert a scraper from the cache to a
scraper supporting the extra methods.

The return path returns models.ScrapedContent.

Write a general postProcess function in the scraper, handling all
ScrapedContent via type switching. This consolidates all postprocessing
code flows.

Introduce marhsallers in the resolver code for converting ScrapedContent
into the underlying concrete types. Use this to plug the existing
fields in the Query resolver, so everything still works.

* ScrapedContent: add more marshalling functions

Handle all marshalling of ScrapedContent through marhsalling functions.
Removes some hand-rolled early variants of it, and replaces it with
a canonical code flow.

* Support loadByName via scraper_s

In order to temporarily plug a hole in the current implementation, we
use the older implementation as a hook to get the newer implementation
to run.

Later on, this can serve as a guide for how to implement the lower level
bits inside the scrapers themselves. For now, it just enables support.

* Plug the remaining scraper functions for now

Since we would like to have a scraper which works in between refactors,
plug the lower level parts of the scraper for now. It avoids us having
to tackle this part just yet.

* Move postprocessing to its own file

There's enough postprocessing to clutter the main scrapers.go file.

Move all of this into a new file, postprocessing to make the API
simpler. It now lives in scrapers.go.

* Scraper: Invoke API consistency

scraper.Cache.ScrapeByName -> ScrapeName

* Fix scraping scenes by URL

Simple typo. While here, also make a single marshaller nil-aware.

* Introduce scraper groups, consolidate loadByURL

Rename `scraper_s` into `group`. A group is a group of scrapers with
the same identity. This corresponds to a single YAML file for a scraper
configuration. It defines a group which supports different types of
scraping contexts.

Move config into the group, and lift txnManager and globalConfig to
the group.

Because we now return models.ScrapedContent we can use interfaces to
get variance from the different underlying scrapers. Use a type
switch for the URL matcher candidates. And then again for the scrapers.

This consolidates all URL scraping paths into one.

While here, remove the urlMatcher interface which isn't needed. Also
clean up the remaining interfaces for url scraping and delete code
which has no purpose anymore.

* Consolidate fragment scraping in one code path

While here, abide the linters checks.

* Refactor loadByFragment

Give it the same treatment as loadByURL:

Step 1: find a scraperActionImpl which works for the data.
Step 2: use that to scrape

Most of this is simple analysis on the data at hand. It can be pushed
down further in a later commit, but for now we leave it here.

* Remove configScraper, autotag is a scraper

Remove the remains of the configScraper struct. It now lives on in the
group struct. Kill the remaining interfaces from the old implementation
while here.

Remove group.specification since it can now be handled by a simple
func call to spec().

Work through the autotag scraper. It now implements the scraper
interface, so it can be used as a scraper. This also simplifies the
autotag scraper quite a bit since it doens't have to implement a number
of unsupported func calls.

* Simplify the fragment scraper flow

* Pass the context

Eliminate a round of context.TODO() in the scraper code by passing
the calling context down into the subsystem. This will gracefully
allow for termination of remote calls if the client goes away for some
reason in GraphQL requests.

* Improve listScrapers in the schema

Support lists of types we accept.

* Be graceful on nil values in conversion

Supporting nil-values make the API more robust in the
case of partial results in a multi-scrape situation.

* Improve listScrapers: output at-most-once

Use the ID of a scraper to reduce the output set. If a scraper has
been included, don't include it again.

* Consolidate all API level errors into resolver.go
* Reorder files and functions:

scrapers.go -> cache.go:
    It almost contains nothing but the cache code.
    Move errors into scraper.go from here because
    It is a better place to have them living right now
group.go:
    All of the group structure. This can now go from
    scraper.go, making it more lean. Move group create
    from config_scraper to here.
config.go:
    Move the `(c config) spec()` call to here.
config_scraper.go:
    Empty file by now

* Name-update the scraper interfaces

Use 'via' rather than 'loadBy'.

The scrape happens via a given scrape method, so I think this is a nice
name for it.

* Rename scrapers for consistency.

While here, improve the error formatting, so different errors come
back differently.

* Nuke the freeones field from the GraphQL schema

* Fix autotag interfacing, refactor

The autotag scraper uses a pointer receiver, but the rest of the code
we use for scraping doesn't expect a pointer-receiver. Hence, to fix
the autotag scraper, we change it to be a value receiver, like the
rest of the code.

Fix: viaScene, and viaGallery.

While here, remove a couple of pointer-receiver methods which can be
trivially rewritten into plain functions.

* Protect against pointer interfaces

The underlying code can be a bit inconsistent in what it returns.
Introduce pointer-types in the postprocessing layer and handle them
accordingly for now. Once a better understanding of the lower levels
are understood, we can lift this.

* Move ErrConversion into the models package.

The conversion error pertains to the logic of converting models.
Because of this, it should move there, so it is centralized.

* Be consistent in scraper resolver error handling

If we have a static error

    Err = errors.New(..)

Then use it wrapped at the start:

    fmt.Errorf("%w: ...context...", Err)

This reads better.

While here, avoid using the underlying Atoi errors: they are verbose,
and like 99% of the time, the user know what is wrong from the input
string, so just give that back.

Also, remove the scraper id from the error contexts: it is implicit,
and the error wouldn't change if we used a different scraper, which
the error message would imply.

* Mark the list*Scrapers() API as deprecated

The same functionality is now present in listScrapers.

* Improve error formatting

Think about how each error is going to be used and tweak them to be
nicer.

* Return a sorted list of scrapers

This helps testing, it's closer to what we had, caches like stable data,
and it is easier for humans. It also makes the output stable, because
map iteration is randomized.

* Fix listScrapers calls to return in ID-order

Since we need the ordering to be by ID in all situations, it is easier
to just generalize the cache listScrapers call to support multiple
scraper types.

This avoids a de-dupe map up the chain, since every scraper is only
considered once. Sorting now happens in the cache listScrapers call.

Use this generalized function in all resolvers, which are now simple
passthroughs.

* Remove UpdateConfig from the scraper cache.

This isn't needed, so get rid of it.

* Pull a context into identify

Scraping scenes in the identify tasks now use a context from up the
call chain.

* Do not store the scraper cache in the resolver.

Scraper caches are updated through
manager.singleton•RefreshScraperCache, so we can't keep a pointer to
it in the resolver. Instead, solve this by adding a fetcher method to
the resolver type. This keeps it local to the resolver, while handling
the problem of updating caches in the configuration.
2021-11-19 10:55:34 +11:00
SmallCoccinelle
214a15bc40
Fixups + enable the commentFormatting linter (#1866)
* Add a space after // comments

For consistency, the commentFormatting lint checker suggests a space
after each // comment block. This commit handles all the spots in
the code where that is needed.

* Rewrite documentation on functions

Use the Go idiom of commenting:

* First sentence declares the purpose.
* First word is the name being declared

The reason this style is preferred is such that grep is able to find
names the user might be interested in. Consider e.g.,

    go doc -all pkg/ffmpeg | grep -i transcode

in which case a match will tell you the name of the function you are
interested in.

* Remove old code comment-blocks

There are some commented out old code blocks in the code base. These are
either 3 years old, or 2 years old. By now, I don't think their use is
going to come back any time soon, and Git will track old pieces of
deleted code anyway.

Opt for deletion.

* Reorder imports

Split stdlib imports from non-stdlib imports in files we are touching.

* Use a range over an iteration variable

Probably more go-idiomatic, and the code needed comment-fixing anyway.

* Use time.After rather than rolling our own

The idiom here is common enough that the stdlib contains a function for
it. Use the stdlib function over our own variant.

* Enable the commentFormatting linter
2021-10-20 16:10:46 +11:00
SmallCoccinelle
e14bb8432c
Enable gocritic (#1848)
* Don't capitalize local variables

ValidCodecs -> validCodecs

* Capitalize deprecation markers

A deprecated marker should be capitalized.

* Use re.MustCompile for static regexes

If the regex fails to compile, it's a programmer error, and should be
treated as such. The regex is entirely static.

* Simplify else-if constructions

Rewrite

   else { if cond {}}

to

   else if cond {}

* Use a switch statement to analyze formats

Break an if-else chain. While here, simplify code flow.

Also introduce a proper static error for unsupported image formats,
paving the way for being able to check against the error.

* Rewrite ifElse chains into switch statements

The "Effective Go" https://golang.org/doc/effective_go#switch document
mentions it is more idiomatic to write if-else chains as switches when
it is possible.

Find all the plain rewrite occurrences in the code base and rewrite.
In some cases, the if-else chains are replaced by a switch scrutinizer.
That is, the code sequence

  if x == 1 {
      ..
  } else if x == 2 {
      ..
  } else if x == 3 {
      ...
  }

can be rewritten into

  switch x {
  case 1:
    ..
  case 2:
    ..
  case 3:
    ..
  }

which is clearer for the compiler: it can decide if the switch is
better served by a jump-table then a branch-chain.

* Rewrite switches, introduce static errors

Introduce two new static errors:

* `ErrNotImplmented`
* `ErrNotSupported`

And use these rather than forming new generative errors whenever the
code is called. Code can now test on the errors (since they are static
and the pointers to them wont change).

Also rewrite ifElse chains into switches in this part of the code base.

* Introduce a StashBoxError in configuration

Since all stashbox errors are the same, treat them as such in the code
base. While here, rewrite an ifElse chain.

In the future, it might be beneifical to refactor configuration errors
into one error which can handle missing fields, which context the error
occurs in and so on. But for now, try to get an overview of the error
categories by hoisting them into static errors.

* Get rid of an else-block in transaction handling

If we succesfully `recover()`, we then always `panic()`. This means the
rest of the code is not reachable, so we can avoid having an else-block
here.

It also solves an ifElse-chain style check in the code base.

* Use strings.ReplaceAll

Rewrite

    strings.Replace(s, o, n, -1)

into

    strings.ReplaceAll(s, o, n)

To make it consistent and clear that we are doing an all-replace in the
string rather than replacing parts of it. It's more of a nitpick since
there are no implementation differences: the stdlib implementation is
just to supply -1.

* Rewrite via gocritic's assignOp

Statements of the form

    x = x + e

is rewritten into

    x += e

where applicable.

* Formatting

* Review comments handled

Stash-box is a proper noun.

Rewrite a switch into an if-chain which returns on the first error
encountered.

* Use context.TODO() over context.Background()

Patch in the same vein as everything else: use the TODO() marker so we
can search for it later and link it into the context tree/tentacle once
it reaches down to this level in the code base.

* Tell the linter to ignore a section in manager_tasks.go

The section is less readable, so mark it with a nolint for now. Because
the rewrite enables a ifElseChain, also mark that as nolint for now.

* Use strings.ReplaceAll over strings.Replace

* Apply an ifElse rewrite

else { if .. { .. } } rewrite into else if { .. }

* Use switch-statements over ifElseChains

Rewrite chains of if-else into switch statements. Where applicable,
add an early nil-guard to simplify case analysis. Also, in
ScanTask's Start(..), invert the logic to outdent the whole block, and
help the reader: if it's not a scene, the function flow is now far more
local to the top of the function, and it's clear that the rest of the
function has to do with scene management.

* Enable gocritic on the code base.

Disable appendAssign for now since we aren't passing that check yet.

* Document the nolint additions

* Document StashBoxBatchPerformerTagInput
2021-10-18 14:12:40 +11:00
SmallCoccinelle
655d3ae969
Toward better context handling (#1835)
* Use the request context

The code uses context.Background() in a flow where there is a
http.Request. Use the requests context instead.

* Use a true context in the plugin example

Let AddTag/RemoveTag take a context and use that context throughout
the example.

* Avoid the use of context.Background

Prefer context.TODO over context.Background deep in the call chain.

This marks the site as something which we need to context-handle
later, and also makes it clear to the reader that the context is
sort-of temporary in the code base.

While here, be consistent in handling the `act` variable in each
branch of the if .. { .. } .. check.

* Prefer context.TODO over context.Background

For the different scraping operations here, there is a context
higher up the call chain, which we ought to use. Mark the call-sites
as TODO for now, so we can come back later on a sweep of which parts
can be context-lifted.

* Thread context upwards

Initialization requires context for transactions. Thread the context
upward the call chain.

At the intialization call, add a context.TODO since we can't break this
yet. The singleton assumption prevents us from pulling it up into main for
now.

* make tasks context-aware

Change the task interface to understand contexts.

Pass the context down in some of the branches where it is needed.

* Make QueryStashBoxScene context-aware

This call naturally sits inside the request-context. Use it.

* Introduce a context in the JS plugin code

This allows us to use a context for HTTP calls inside the system.

Mark the context with a TODO at top level for now.

* Nitpick error formatting

Use %v rather than %s for error interfaces.
Do not begin an error strong with a capital letter.

* Avoid the use of http.Get in FFMPEG download chain

Since http.Get has no context, it isn't possible to break out or have
policy induced. The call will block until the GET completes. Rewrite
to use a http Request and provide a context.

Thread the context through the call chain for now. provide
context.TODO() at the top level of the initialization chain.

* Make getRemoteCDPWSAddress aware of contexts

Eliminate a call to http.Get and replace it with a context-aware
variant.

Push the context upwards in the call chain, but plug it before the
scraper interface so we don't have to rewrite said interface yet.

Plugged with context.TODO()

* Scraper: make the getImage function context-aware

Use a context, and pass it upwards. Plug it with context.TODO()
up the chain before the rewrite gets too much out of hand for now.

Minor tweaks along the way, remove a call to context.Background()
deep in the call chain.

* Make NOTIFY request context-aware

The call sits inside a Request-handler. So it's natural to use the
requests context as the context for the outgoing HTTP request.

* Use a context in the url scraper code

We are sitting in code which has a context, so utilize it for the
request as well.

* Use a context when checking versions

When we check the version of stash on Github, use a context. Thread
the context up to the initialization routine of the HTTP/GraphQL
server and plug it with a context.TODO() for now.

This paves the way for providing a context to the HTTP server code in a
future patch.

* Make utils func ReadImage context-aware

In almost all of the cases, there is a context in the call chain which
is a natural use. This is true for all the GraphQL mutations.

The exception is in task_stash_box_tag, so plug that task with
context.TODO() for now.

* Make stash-box get context-aware

Thread a context through the call chain until we hit the Client API.
Plug it with context.TODO() there for now.

* Enable the noctx linter

The code is now free of any uncontexted HTTP request. This means we
pass the noctx linter, and we can enable it in the code base.
2021-10-14 15:32:41 +11:00
WithoutPants
4625e1f955
Unify scrape refactor (#1630)
* Unify scraped types
* Make name fields optional
* Unify single scrape queries
* Change UI to use new interfaces
* Add multi scrape interfaces
* Use images instead of image
2021-09-07 11:54:22 +10:00
FleetingOrchard
50cb6a9c79
Add duration statistics to stats page (#1626) 2021-08-26 13:37:08 +10:00
WithoutPants
46bbede9a0
Plugin hooks (#1452)
* Refactor session and plugin code
* Add context to job tasks
* Show hooks in plugins page
* Refactor session management
2021-06-11 17:24:58 +10:00
EnameEtavir
5c4351f338
Cleanup fixes (#1422)
* cleanup: remove dead code

removing some code that does nothing

* cleanup: fixing usage of deprecated gqlgen/graphql api in api/changeset_translator

* cleanup: changing to recommended comparison methods

Changing byte and case-insensitive string comparison to the recommended methods.

* cleanup: making staticcheck happy
2021-05-25 11:03:09 +10:00
InfiniteTF
4fd022a93b
Decouple galleries from scenes (#1057) 2021-02-02 07:56:54 +11:00
WithoutPants
1e04deb3d4
Data layer restructuring (#997)
* Move query builders to sqlite package
* Add transaction system
* Wrap model resolvers in transaction
* Add error return value for StringSliceToIntSlice
* Update/refactor mutation resolvers
* Convert query builders
* Remove unused join types
* Add stash id unit tests
* Use WAL journal mode
2021-01-18 12:23:20 +11:00
InfiniteTF
e84c92355e
Fix integer overflow for scene size on 32bit systems (#994)
* Fix integer overflow for scene size on 32bit systems
* Cast to double in sqlite to prevent potential overflow
* Add migration to reset scene sizes and scan logic to repopulate if empty
2020-12-22 10:29:53 +11:00
WithoutPants
8eda72ad89
Image improvements (#847)
* Fix image performer filtering
* Add performer images tab
* Add studio images tab
* Rename interface
* Add tag images tab
* Add path filtering for images
* Show image stats on stats page
* Fix incorrect scan counts after timeout
* Add gallery filters
* Relax scene gallery selector
2020-10-20 10:11:15 +11:00
WithoutPants
aca2c7c5f4
Images section (#813)
* Add new configuration options
* Refactor scan/clean
* Schema changes
* Add details to galleries
* Remove redundant code
* Refine thumbnail generation
* Gallery overhaul
* Don't allow modifying zip gallery images
* Show gallery card overlays
* Hide zoom slider when not in grid mode
2020-10-13 10:12:46 +11:00
WithoutPants
1fd3fcc6a8
Show and allow creation of unknown performers/tags/studios/movies from scraper dialog (#741)
* Fix toast container z-index
* Make scrape operations network only
* Add CollapseButton component
2020-08-22 18:12:39 +10:00
bnkai
e58c311ddd
Add library size to main stats page (#427) 2020-04-03 13:44:17 +11:00
caustico
5fb8bbf768
Movies Section (#338)
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2020-03-10 14:28:15 +11:00
bnkai
87f0b667b5 Add check version functionality (#296)
* Add check version functionality
 * add backend support
 * add ui support
 * minor fixes
 * cosmetic fixes
 * workaround query refetch not working after network error
 * revert changes to Makefile after testing is complete
 * switch to Releases Github API endpoint, add latest Release URL to UI
 * latest version is only shown in UI when version is available and data is ready
 * resolve conflict , squash rebase
2020-01-07 17:21:23 -05:00
WithoutPants
17247060b6 Generic performer scrapers (#203)
* Generalise scraper API

* Add script performer scraper

* Fixes from testing

* Add context to scrapers and generalise

* Add scraping performer from URL

* Add error handling

* Move log to debug

* Add supported scrape types
2019-11-18 21:49:05 -05:00
WithoutPants
5963844191 Add develop branch releases and display version tag (#216)
* Add releases for develop branch. Show version tag

* Pass version tag to cross-compile
2019-11-17 16:41:08 -05:00
WithoutPants
0655223c38 Bulk update scenes (#92)
* Add bulk update functionality

* Restore multiselect fixes from previous branch

* Prevent unsetting of studios/tags

* Detect when slice fields are omitted and ignore
2019-10-27 09:05:31 -04:00
WithoutPants
3cf4b26f1d Show version info in about page 2019-08-21 14:47:48 +10:00
Stash Dev
4b037e1040 Dependency updates 2019-05-27 12:34:26 -07:00
Stash Dev
d6eb2c2d8e Scenes with a marker missing a primary tag fails to load
Fixes #42
2019-04-20 10:32:01 -07:00
Stash Dev
763424bc40 Update GQLGen and break up the schema.graphql file 2019-03-27 12:47:43 -07:00
Stash Dev
b488c1ed7d Reorg 2019-02-14 15:42:52 -08:00
Renamed from api/resolver.go (Browse further)