stash/pkg/scraper/stashbox/stash_box.go
SmallCoccinelle 655d3ae969
Toward better context handling (#1835)
* Use the request context

The code uses context.Background() in a flow where there is an
http.Request. Use the request's context instead.

* Use a true context in the plugin example

Let AddTag/RemoveTag take a context and use that context throughout
the example.

* Avoid the use of context.Background

Prefer context.TODO over context.Background deep in the call chain.

This marks the site as something which we need to context-handle
later, and also makes it clear to the reader that the context is
sort-of temporary in the code base.

While here, be consistent in handling the `act` variable in each
branch of the if .. { .. } .. check.

* Prefer context.TODO over context.Background

For the different scraping operations here, there is a context
higher up the call chain, which we ought to use. Mark the call-sites
as TODO for now, so we can come back later on a sweep of which parts
can be context-lifted.

* Thread context upwards

Initialization requires context for transactions. Thread the context
up the call chain.

At the initialization call, add a context.TODO since we can't break this
yet. The singleton assumption prevents us from pulling it up into main for
now.

* Make tasks context-aware

Change the task interface to understand contexts.

Pass the context down in some of the branches where it is needed.

* Make QueryStashBoxScene context-aware

This call naturally sits inside the request-context. Use it.

* Introduce a context in the JS plugin code

This allows us to use a context for HTTP calls inside the system.

Mark the context with a TODO at top level for now.

* Nitpick error formatting

Use %v rather than %s for error interfaces.
Do not begin an error string with a capital letter.

* Avoid the use of http.Get in FFMPEG download chain

Since http.Get takes no context, it isn't possible to cancel the call
or impose a policy on it. The call will block until the GET completes.
Rewrite to use an http.Request and provide a context.

Thread the context through the call chain for now. Provide
context.TODO() at the top level of the initialization chain.

* Make getRemoteCDPWSAddress aware of contexts

Eliminate a call to http.Get and replace it with a context-aware
variant.

Push the context upwards in the call chain, but plug it before the
scraper interface so we don't have to rewrite said interface yet.

Plugged with context.TODO()

* Scraper: make the getImage function context-aware

Use a context, and pass it upwards. Plug it with context.TODO()
up the chain before the rewrite gets too much out of hand for now.

Minor tweaks along the way, remove a call to context.Background()
deep in the call chain.

* Make NOTIFY request context-aware

The call sits inside a request handler, so it's natural to use the
request's context as the context for the outgoing HTTP request.

* Use a context in the url scraper code

We are sitting in code which has a context, so utilize it for the
request as well.

* Use a context when checking versions

When we check the version of stash on Github, use a context. Thread
the context up to the initialization routine of the HTTP/GraphQL
server and plug it with a context.TODO() for now.

This paves the way for providing a context to the HTTP server code in a
future patch.

* Make utils func ReadImage context-aware

In almost all of the cases, there is a context in the call chain which
is a natural use. This is true for all the GraphQL mutations.

The exception is in task_stash_box_tag, so plug that task with
context.TODO() for now.

* Make stash-box get context-aware

Thread a context through the call chain until we hit the Client API.
Plug it with context.TODO() there for now.

* Enable the noctx linter

The code is now free of any uncontexted HTTP request. This means we
pass the noctx linter, and we can enable it in the code base.
2021-10-14 15:32:41 +11:00


package stashbox
import (
"context"
"fmt"
"io"
"net/http"
"strconv"
"strings"
"time"
"github.com/Yamashou/gqlgenc/client"
"github.com/stashapp/stash/pkg/logger"
"github.com/stashapp/stash/pkg/match"
"github.com/stashapp/stash/pkg/models"
"github.com/stashapp/stash/pkg/scraper/stashbox/graphql"
"github.com/stashapp/stash/pkg/utils"
)
// Timeout to get the image. Includes transfer time. May want to make this
// configurable at some point.
const imageGetTimeout = time.Second * 30
// Client represents the client interface to a stash-box server instance.
type Client struct {
client *graphql.Client
txnManager models.TransactionManager
}
// NewClient returns a new instance of a stash-box client.
func NewClient(box models.StashBox, txnManager models.TransactionManager) *Client {
authHeader := func(req *http.Request) {
req.Header.Set("ApiKey", box.APIKey)
}
client := &graphql.Client{
Client: client.NewClient(http.DefaultClient, box.Endpoint, authHeader),
}
return &Client{
client: client,
txnManager: txnManager,
}
}
// QueryStashBoxScene queries stash-box for scenes using a query string.
func (c Client) QueryStashBoxScene(ctx context.Context, queryStr string) ([]*models.ScrapedScene, error) {
scenes, err := c.client.SearchScene(ctx, queryStr)
if err != nil {
return nil, err
}
sceneFragments := scenes.SearchScene
var ret []*models.ScrapedScene
for _, s := range sceneFragments {
ss, err := sceneFragmentToScrapedScene(ctx, c.txnManager, s)
if err != nil {
return nil, err
}
ret = append(ret, ss)
}
return ret, nil
}
// FindStashBoxScenesByFingerprints queries stash-box for scenes using every
// scene's MD5/OSHASH checksum, or PHash, and returns results in the same order
// as the input slice.
func (c Client) FindStashBoxScenesByFingerprints(sceneIDs []string) ([][]*models.ScrapedScene, error) {
ctx := context.TODO()
ids, err := utils.StringSliceToIntSlice(sceneIDs)
if err != nil {
return nil, err
}
var fingerprints []string
// map fingerprints to their scene index
fpToScene := make(map[string][]int)
if err := c.txnManager.WithReadTxn(ctx, func(r models.ReaderRepository) error {
qb := r.Scene()
for index, sceneID := range ids {
scene, err := qb.Find(sceneID)
if err != nil {
return err
}
if scene == nil {
return fmt.Errorf("scene with id %d not found", sceneID)
}
if scene.Checksum.Valid {
fingerprints = append(fingerprints, scene.Checksum.String)
fpToScene[scene.Checksum.String] = append(fpToScene[scene.Checksum.String], index)
}
if scene.OSHash.Valid {
fingerprints = append(fingerprints, scene.OSHash.String)
fpToScene[scene.OSHash.String] = append(fpToScene[scene.OSHash.String], index)
}
if scene.Phash.Valid {
phashStr := utils.PhashToString(scene.Phash.Int64)
fingerprints = append(fingerprints, phashStr)
fpToScene[phashStr] = append(fpToScene[phashStr], index)
}
}
return nil
}); err != nil {
return nil, err
}
allScenes, err := c.findStashBoxScenesByFingerprints(ctx, fingerprints)
if err != nil {
return nil, err
}
// set the matched scenes back in their original order
ret := make([][]*models.ScrapedScene, len(sceneIDs))
for _, s := range allScenes {
var addedTo []int
for _, fp := range s.Fingerprints {
sceneIndexes := fpToScene[fp.Hash]
for _, index := range sceneIndexes {
if !utils.IntInclude(addedTo, index) {
addedTo = append(addedTo, index)
ret[index] = append(ret[index], s)
}
}
}
}
return ret, nil
}
// FindStashBoxScenesByFingerprintsFlat queries stash-box for scenes using every
// scene's MD5/OSHASH checksum, or PHash, and returns results as a flat slice.
func (c Client) FindStashBoxScenesByFingerprintsFlat(sceneIDs []string) ([]*models.ScrapedScene, error) {
ctx := context.TODO()
ids, err := utils.StringSliceToIntSlice(sceneIDs)
if err != nil {
return nil, err
}
var fingerprints []string
if err := c.txnManager.WithReadTxn(ctx, func(r models.ReaderRepository) error {
qb := r.Scene()
for _, sceneID := range ids {
scene, err := qb.Find(sceneID)
if err != nil {
return err
}
if scene == nil {
return fmt.Errorf("scene with id %d not found", sceneID)
}
if scene.Checksum.Valid {
fingerprints = append(fingerprints, scene.Checksum.String)
}
if scene.OSHash.Valid {
fingerprints = append(fingerprints, scene.OSHash.String)
}
if scene.Phash.Valid {
phashStr := utils.PhashToString(scene.Phash.Int64)
fingerprints = append(fingerprints, phashStr)
}
}
return nil
}); err != nil {
return nil, err
}
return c.findStashBoxScenesByFingerprints(ctx, fingerprints)
}
func (c Client) findStashBoxScenesByFingerprints(ctx context.Context, fingerprints []string) ([]*models.ScrapedScene, error) {
var ret []*models.ScrapedScene
for i := 0; i < len(fingerprints); i += 100 {
end := i + 100
if end > len(fingerprints) {
end = len(fingerprints)
}
scenes, err := c.client.FindScenesByFingerprints(ctx, fingerprints[i:end])
if err != nil {
return nil, err
}
sceneFragments := scenes.FindScenesByFingerprints
for _, s := range sceneFragments {
ss, err := sceneFragmentToScrapedScene(ctx, c.txnManager, s)
if err != nil {
return nil, err
}
ret = append(ret, ss)
}
}
return ret, nil
}
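// SubmitStashBoxFingerprints submits the MD5, OSHASH and PHash fingerprints of
// the given scenes to stash-box. Scenes without a stash ID for endpoint are
// skipped.
func (c Client) SubmitStashBoxFingerprints(sceneIDs []string, endpoint string) (bool, error) {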
ids, err := utils.StringSliceToIntSlice(sceneIDs)
if err != nil {
return false, err
}
var fingerprints []graphql.FingerprintSubmission
if err := c.txnManager.WithReadTxn(context.TODO(), func(r models.ReaderRepository) error {
qb := r.Scene()
for _, sceneID := range ids {
scene, err := qb.Find(sceneID)
if err != nil {
return err
}
if scene == nil {
continue
}
stashIDs, err := qb.GetStashIDs(sceneID)
if err != nil {
return err
}
sceneStashID := ""
for _, stashID := range stashIDs {
if stashID.Endpoint == endpoint {
sceneStashID = stashID.StashID
}
}
if sceneStashID != "" {
if scene.Checksum.Valid && scene.Duration.Valid {
fingerprint := graphql.FingerprintInput{
Hash: scene.Checksum.String,
Algorithm: graphql.FingerprintAlgorithmMd5,
Duration: int(scene.Duration.Float64),
}
fingerprints = append(fingerprints, graphql.FingerprintSubmission{
SceneID: sceneStashID,
Fingerprint: &fingerprint,
})
}
if scene.OSHash.Valid && scene.Duration.Valid {
fingerprint := graphql.FingerprintInput{
Hash: scene.OSHash.String,
Algorithm: graphql.FingerprintAlgorithmOshash,
Duration: int(scene.Duration.Float64),
}
fingerprints = append(fingerprints, graphql.FingerprintSubmission{
SceneID: sceneStashID,
Fingerprint: &fingerprint,
})
}
if scene.Phash.Valid && scene.Duration.Valid {
fingerprint := graphql.FingerprintInput{
Hash: utils.PhashToString(scene.Phash.Int64),
Algorithm: graphql.FingerprintAlgorithmPhash,
Duration: int(scene.Duration.Float64),
}
fingerprints = append(fingerprints, graphql.FingerprintSubmission{
SceneID: sceneStashID,
Fingerprint: &fingerprint,
})
}
}
}
return nil
}); err != nil {
return false, err
}
return c.submitStashBoxFingerprints(fingerprints)
}
func (c Client) submitStashBoxFingerprints(fingerprints []graphql.FingerprintSubmission) (bool, error) {
for _, fingerprint := range fingerprints {
_, err := c.client.SubmitFingerprint(context.TODO(), fingerprint)
if err != nil {
return false, err
}
}
return true, nil
}
// QueryStashBoxPerformer queries stash-box for performers using a query string.
func (c Client) QueryStashBoxPerformer(queryStr string) ([]*models.StashBoxPerformerQueryResult, error) {
performers, err := c.queryStashBoxPerformer(queryStr)
res := []*models.StashBoxPerformerQueryResult{
{
Query: queryStr,
Results: performers,
},
}
// set the deprecated image field
for _, p := range res[0].Results {
if len(p.Images) > 0 {
p.Image = &p.Images[0]
}
}
return res, err
}
func (c Client) queryStashBoxPerformer(queryStr string) ([]*models.ScrapedPerformer, error) {
performers, err := c.client.SearchPerformer(context.TODO(), queryStr)
if err != nil {
return nil, err
}
performerFragments := performers.SearchPerformer
var ret []*models.ScrapedPerformer
for _, fragment := range performerFragments {
performer := performerFragmentToScrapedScenePerformer(*fragment)
ret = append(ret, performer)
}
return ret, nil
}
// FindStashBoxPerformersByNames queries stash-box for performers using the
// names of the local performers with the given IDs.
func (c Client) FindStashBoxPerformersByNames(performerIDs []string) ([]*models.StashBoxPerformerQueryResult, error) {
ids, err := utils.StringSliceToIntSlice(performerIDs)
if err != nil {
return nil, err
}
var performers []*models.Performer
if err := c.txnManager.WithReadTxn(context.TODO(), func(r models.ReaderRepository) error {
qb := r.Performer()
for _, performerID := range ids {
performer, err := qb.Find(performerID)
if err != nil {
return err
}
if performer == nil {
return fmt.Errorf("performer with id %d not found", performerID)
}
if performer.Name.Valid {
performers = append(performers, performer)
}
}
return nil
}); err != nil {
return nil, err
}
return c.findStashBoxPerformersByNames(performers)
}
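// FindStashBoxPerformersByPerformerNames queries stash-box using the names of
// the local performers with the given IDs and returns one result slice per
// queried performer.
func (c Client) FindStashBoxPerformersByPerformerNames(performerIDs []string) ([][]*models.ScrapedPerformer, error) {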
ids, err := utils.StringSliceToIntSlice(performerIDs)
if err != nil {
return nil, err
}
var performers []*models.Performer
if err := c.txnManager.WithReadTxn(context.TODO(), func(r models.ReaderRepository) error {
qb := r.Performer()
for _, performerID := range ids {
performer, err := qb.Find(performerID)
if err != nil {
return err
}
if performer == nil {
return fmt.Errorf("performer with id %d not found", performerID)
}
if performer.Name.Valid {
performers = append(performers, performer)
}
}
return nil
}); err != nil {
return nil, err
}
results, err := c.findStashBoxPerformersByNames(performers)
if err != nil {
return nil, err
}
var ret [][]*models.ScrapedPerformer
for _, r := range results {
ret = append(ret, r.Results)
}
return ret, nil
}
func (c Client) findStashBoxPerformersByNames(performers []*models.Performer) ([]*models.StashBoxPerformerQueryResult, error) {
var ret []*models.StashBoxPerformerQueryResult
for _, performer := range performers {
if performer.Name.Valid {
performerResults, err := c.queryStashBoxPerformer(performer.Name.String)
if err != nil {
return nil, err
}
result := models.StashBoxPerformerQueryResult{
Query: strconv.Itoa(performer.ID),
Results: performerResults,
}
ret = append(ret, &result)
}
}
return ret, nil
}
func findURL(urls []*graphql.URLFragment, urlType string) *string {
for _, u := range urls {
if u.Type == urlType {
ret := u.URL
return &ret
}
}
return nil
}
func enumToStringPtr(e fmt.Stringer, titleCase bool) *string {
if e != nil {
ret := e.String()
if titleCase {
ret = strings.Title(strings.ToLower(ret))
}
return &ret
}
return nil
}
func formatMeasurements(m graphql.MeasurementsFragment) *string {
if m.BandSize != nil && m.CupSize != nil && m.Hip != nil && m.Waist != nil {
ret := fmt.Sprintf("%d%s-%d-%d", *m.BandSize, *m.CupSize, *m.Waist, *m.Hip)
return &ret
}
return nil
}
func formatCareerLength(start, end *int) *string {
if start == nil && end == nil {
return nil
}
var ret string
if end == nil {
ret = fmt.Sprintf("%d -", *start)
} else if start == nil {
ret = fmt.Sprintf("- %d", *end)
} else {
ret = fmt.Sprintf("%d - %d", *start, *end)
}
return &ret
}
func formatBodyModifications(m []*graphql.BodyModificationFragment) *string {
if len(m) == 0 {
return nil
}
var retSlice []string
for _, f := range m {
if f.Description == nil {
retSlice = append(retSlice, f.Location)
} else {
retSlice = append(retSlice, fmt.Sprintf("%s, %s", f.Location, *f.Description))
}
}
ret := strings.Join(retSlice, "; ")
return &ret
}
func fetchImage(ctx context.Context, url string) (*string, error) {
client := &http.Client{
Timeout: imageGetTimeout,
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
return nil, err
}
resp, err := client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}
// determine the image type and set the base64 type
contentType := resp.Header.Get("Content-Type")
if contentType == "" {
contentType = http.DetectContentType(body)
}
img := "data:" + contentType + ";base64," + utils.GetBase64StringFromData(body)
return &img, nil
}
func performerFragmentToScrapedScenePerformer(p graphql.PerformerFragment) *models.ScrapedPerformer {
id := p.ID
images := []string{}
for _, image := range p.Images {
images = append(images, image.URL)
}
sp := &models.ScrapedPerformer{
Name: &p.Name,
Country: p.Country,
Measurements: formatMeasurements(p.Measurements),
CareerLength: formatCareerLength(p.CareerStartYear, p.CareerEndYear),
Tattoos: formatBodyModifications(p.Tattoos),
Piercings: formatBodyModifications(p.Piercings),
Twitter: findURL(p.Urls, "TWITTER"),
RemoteSiteID: &id,
Images: images,
// TODO - tags are not currently supported; supporting them needs a
// graphql schema change. Leave off for now.
}
if len(sp.Images) > 0 {
sp.Image = &sp.Images[0]
}
if p.Height != nil && *p.Height > 0 {
hs := strconv.Itoa(*p.Height)
sp.Height = &hs
}
if p.Birthdate != nil {
b := p.Birthdate.Date
sp.Birthdate = &b
}
if p.Gender != nil {
sp.Gender = enumToStringPtr(p.Gender, false)
}
if p.Ethnicity != nil {
sp.Ethnicity = enumToStringPtr(p.Ethnicity, true)
}
if p.EyeColor != nil {
sp.EyeColor = enumToStringPtr(p.EyeColor, true)
}
if p.BreastType != nil {
sp.FakeTits = enumToStringPtr(p.BreastType, true)
}
return sp
}
func getFirstImage(ctx context.Context, images []*graphql.ImageFragment) *string {
ret, err := fetchImage(ctx, images[0].URL)
if err != nil {
logger.Warnf("Error fetching image %s: %v", images[0].URL, err)
}
return ret
}
func getFingerprints(scene *graphql.SceneFragment) []*models.StashBoxFingerprint {
fingerprints := []*models.StashBoxFingerprint{}
for _, fp := range scene.Fingerprints {
fingerprint := models.StashBoxFingerprint{
Algorithm: fp.Algorithm.String(),
Hash: fp.Hash,
Duration: fp.Duration,
}
fingerprints = append(fingerprints, &fingerprint)
}
return fingerprints
}
func sceneFragmentToScrapedScene(ctx context.Context, txnManager models.TransactionManager, s *graphql.SceneFragment) (*models.ScrapedScene, error) {
stashID := s.ID
ss := &models.ScrapedScene{
Title: s.Title,
Date: s.Date,
Details: s.Details,
URL: findURL(s.Urls, "STUDIO"),
Duration: s.Duration,
RemoteSiteID: &stashID,
Fingerprints: getFingerprints(s),
// Image
// stash_id
}
if len(s.Images) > 0 {
// TODO - #454 code sorts images by aspect ratio according to a wanted
// orientation. I'm just grabbing the first for now
ss.Image = getFirstImage(ctx, s.Images)
}
if err := txnManager.WithReadTxn(ctx, func(r models.ReaderRepository) error {
pqb := r.Performer()
tqb := r.Tag()
if s.Studio != nil {
studioID := s.Studio.ID
ss.Studio = &models.ScrapedStudio{
Name: s.Studio.Name,
URL: findURL(s.Studio.Urls, "HOME"),
RemoteSiteID: &studioID,
}
err := match.ScrapedStudio(r.Studio(), ss.Studio)
if err != nil {
return err
}
}
for _, p := range s.Performers {
sp := performerFragmentToScrapedScenePerformer(p.Performer)
err := match.ScrapedPerformer(pqb, sp)
if err != nil {
return err
}
ss.Performers = append(ss.Performers, sp)
}
for _, t := range s.Tags {
st := &models.ScrapedTag{
Name: t.Name,
}
err := match.ScrapedTag(tqb, st)
if err != nil {
return err
}
ss.Tags = append(ss.Tags, st)
}
return nil
}); err != nil {
return nil, err
}
return ss, nil
}
// FindStashBoxPerformerByID queries stash-box for the performer with the given
// stash-box ID.
func (c Client) FindStashBoxPerformerByID(id string) (*models.ScrapedPerformer, error) {
performer, err := c.client.FindPerformerByID(context.TODO(), id)
if err != nil {
return nil, err
}
ret := performerFragmentToScrapedScenePerformer(*performer.FindPerformer)
return ret, nil
}
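// FindStashBoxPerformerByName searches stash-box for name and returns a
// performer whose name matches case-insensitively, or nil if none does.
func (c Client) FindStashBoxPerformerByName(name string) (*models.ScrapedPerformer, error) {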
performers, err := c.client.SearchPerformer(context.TODO(), name)
if err != nil {
return nil, err
}
var ret *models.ScrapedPerformer
for _, performer := range performers.SearchPerformer {
if strings.EqualFold(performer.Name, name) {
ret = performerFragmentToScrapedScenePerformer(*performer)
}
}
return ret, nil
}