This update resolves major performance regressions when processing large libraries:
1. Optimized FindMany in both Image and Scene stores to use map-based ID lookups. Previously, this function used slices.Index in a loop, resulting in O(N^2) complexity. On a library with 300k items, this was causing the server to hang indefinitely.
2. Refined the exact image duplicate SQL query to match the scene checker's level of optimization. It now joins the files table and orders results by total duplicate file size, ensuring that the most impactful duplicates are shown first.
3. Removed the temporary LIMIT 1000 from the image duplicate query now that the algorithmic bottlenecks have been resolved.
This update provides additional performance improvements specifically targeted at large image libraries (e.g. 300k+ images):
1. Optimized the exact match SQL query for images:
- Added filtering for zero/empty fingerprints to avoid massive false-positive groups.
- Added a LIMIT of 1000 duplicate groups to prevent excessive memory consumption and serialization overhead.
- Simplified the join structure to ensure better use of the database index.
2. Parallelized the Go comparison loop in pkg/utils/phash.go:
- Utilizes all available CPU cores to perform Hamming distance calculations.
- Uses a lock-free design to minimize synchronization overhead.
- This makes non-zero distance searches significantly faster on multi-core systems.
This update provides significant performance improvements for both image and scene duplicate searching:
1. Optimized the core Hamming distance algorithm in pkg/utils/phash.go:
- Uses native CPU popcount instructions (math/bits) for bit counting.
- Pre-calculates hash values to eliminate object allocations in the hot loop.
- Halves the number of comparisons by leveraging the symmetry of the Hamming distance.
- The loop is now several orders of magnitude faster and allocation-free.
2. Solved the N+1 database query bottleneck:
- Replaced individual database lookups for each duplicate group with a single batched query for all duplicate IDs.
- This optimization was applied to both Image and Scene repositories.
3. Simplified the SQL fast path for exact image matches to remove redundant table joins.
This change adds a specialized SQL query to find exact image duplicate matches (distance 0) directly in the database.
Previously, the image duplicate checker always used an O(N^2) Go-based comparison loop, which caused indefinite loading and timeouts on libraries with a large number of images. The new SQL fast path reduces the time to find exact duplicates from minutes/hours to milliseconds.
This fixes a bug where identical image duplicates were not being detected.
The implementation was incorrectly scanning the phash BLOB into a string and then attempting to parse it as a hex string. Since phashes are stored as 64-bit integers, they were being converted to decimal strings. For phashes with the MSB set (negative when treated as int64), the resulting decimal string started with a '-', which caused the hex parser to fail and skip the image entirely.
Additionally, even for non-negative phashes, parsing a decimal string as hex yielded incorrect hash values.
Scanning directly into the utils.Phash struct (which uses int64) matches how Scene phashes are handled and ensures the hash values are correct.
- Wrap FindDuplicateImages query in r.withReadTxn() to ensure a database transaction in context.
- Use queryFunc instead of queryStruct for fetching multiple hashes, preventing runtime errors.
- Fix N+1 query issue in duplicate grouping by using qb.FindMany() instead of qb.Find() for each duplicate image.
- Revert searchColumns array to exclude "images.details" which was from another PR and remove related failing test.
- Removed unused `strconv` import from `pkg/sqlite/image.go`.
- Added missing `github.com/stashapp/stash/pkg/utils` import to resolve the undefined `utils` reference.
- Fixed pagination prop in ImageDuplicateChecker component.
- Formatted modified go files using gofmt.
- Ran prettier over the UI codebase to resolve the formatting check CI failure.
This change unifies the duplicate detection logic by leveraging the shared phash utility. It also enhances the UI with:
- Pagination for large result sets.
- Sorting duplicate groups by total file size.
- A more detailed table view with image thumbnails, paths, and dimensions.
- Consistency with the existing Scene Duplicate Checker tool.
This change introduces a new tool to identify duplicate images based on their perceptual hash (phash). It includes:
- Backend implementation for phash distance comparison and grouping.
- GraphQL schema updates and API resolvers.
- Frontend UI for the Image Duplicate Checker tool.
- Unit tests for the image search and duplicate detection logic.
* Remove month/year only formats from ParseDateStringAsTime
* Add precision field to Date and handle parsing year/month-only dates
* Add date precision columns for date columns
* Adjust UI to account for fuzzy dates
* Change queryStruct to use tx.Get instead of queryFunc
Using queryFunc meant that the performance logging was inaccurate due to the query actually being executed during the call to Scan.
* Only add join args if join was added
* Omit joins that are only used for sorting when skipping sorting
Should provide some marginal improvement on systems with a lot of items.
* Make all calls to the database pass context.
This means that long queries can be cancelled by navigating to another page. Previously the query would continue to run, impacting on future queries.
* override "name" sort with COALESCE
* tag sort_name frontend
adds `data-sort-name` attribute to tag links prioritizes sort_name value but will default to tag name if not present in the same way that COALESCE will prioritize the same values in the same way
* add sort_name filter, update locale per request
* Include sort name in anonymiser
* Add import/export support
---------
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
* Move filter criterion handlers into separate file
* Add related filters for image filter
* Add related filters for scene filter
* Add related filters to gallery filter
* Add related filters to movie filter
* Add related filters to performer filter
* Add related filters to studio filter
* Add related filters to tag filter
* Add scene filter to scene marker filter
* Add Details, Code, and Photographer to Images
* Add date and details to image card
---------
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
* Add mockery config file
* Move basic file/folder structs to models
* Fix hack due to import loop
* Move file interfaces to models
* Move folder interfaces to models
* Move scene interfaces to models
* Move scene marker interfaces to models
* Move image interfaces to models
* Move gallery interfaces to models
* Move gallery chapter interfaces to models
* Move studio interfaces to models
* Move movie interfaces to models
* Move performer interfaces to models
* Move tag interfaces to models
* Move autotag interfaces to models
* Regenerate mocks
* Remove ID from PerformerPartial
* Separate studio model from sqlite model
* Separate movie model from sqlite model
* Separate tag model from sqlite model
* Separate saved filter model from sqlite model
* Separate scene marker model from sqlite model
* Separate gallery chapter model from sqlite model
* Move ErrNoRows checks into sqlite, improve empty result error messages
* Move SQLiteDate and SQLiteTimestamp to sqlite
* Use changesetTranslator everywhere, refactor for consistency
* Make PerformerStore.DestroyImage private
* Fix rating on movie create
* Fix joined hierarchical filtering
* Fix scene performer tag filter
* Generalise performer tag handler
* Add unit tests
* Add equals handling
* Make performer tags equals/not equals unsupported
* Make tags not equals unsupported
* Make not equals unsupported for performers criterion
* Support equals/not equals for studio criterion
* Fix marker scene tags equals filter
* Fix scene performer tag filter
* Make equals/not equals unsupported for hierarchical criterion
* Use existing studio handler in movie
* Hide unsupported tag modifier options
* Use existing performer tags logic where possible
* Restore old parent/child filter logic
* Disable sub-tags in equals modifier for tags criterion
* Support excludes field
* Refactor studio filter
* Refactor tags filter
* Support excludes in tags
---------
Co-authored-by: Kermie <kermie@isinthe.house>
* initial commit of sort performer by o-count
* work on o_counter filter
* filter working
* sorting, filtering using combined scene+image count
* linting
* fix performer list view
---------
Co-authored-by: jpnsfw <none@none.com>
* graphql: support date and timestamp filter types
* sql: add support for date & timestamp criterions
* ui: add support for date and timestamp criterions
* scenes: add support for filtering by date, created at and updated at
* image: support filtering by created at and updated at
* gallery: support filtering by date, created at and updated at
* movie: support filtering by date, created at and updated at
* studio: support filtering by date, created at and updated at
* tag: support filtering by date, created at and updated at
* performer: support filtering by bitrh & death date and created & updated at
* marker: support filtering by created & updated at and scene date, created & updated at