This fixes an issue where Chrome would become unresponsive and prompt the user to kill the page when a large number of duplicates (e.g. 30,000+ groups) were found.
1. Changed the fetchPolicy on FindDuplicateImages to 'no-cache'. Loading 30k+ complex objects into the Apollo normalized cache blocked the main thread for an extended period. Bypassing the cache for this massive one-off query resolves the blocking.
2. Optimized the sorting algorithm in both Image and Scene duplicate checkers. Previously, the group size was recalculated by iterating over all nested files inside the sort's comparison function, resulting in millions of unnecessary iterations (O(N log N) with a heavy inner loop). Now, group sizes are precalculated into a map (O(N)) before sorting.
This fixes a severe performance bottleneck where the image duplicate checker would hang indefinitely or crash the server when finding many duplicates.
Previously, the GraphQL query requested the full 'ImageData' fragment for every duplicate found, forcing the backend to resolve and serialize all related entities (galleries, studios, tags, performers) for thousands of images at once.
By switching to the 'SlimImageData' fragment (mirroring how the Scene duplicate checker operates), the payload size and resolution time are drastically reduced, allowing the tool to scale correctly.
This update resolves major performance regressions when processing large libraries:
1. Optimized FindMany in both Image and Scene stores to use map-based ID lookups. Previously, this function used slices.Index in a loop, resulting in O(N^2) complexity. On a library with 300k items, this was causing the server to hang indefinitely.
2. Refined the exact image duplicate SQL query to match the scene checker's level of optimization. It now joins the files table and orders results by total duplicate file size, ensuring that the most impactful duplicates are shown first.
3. Removed the temporary LIMIT 1000 from the image duplicate query now that the algorithmic bottlenecks have been resolved.
This update provides additional performance improvements specifically targeted at large image libraries (e.g. 300k+ images):
1. Optimized the exact match SQL query for images:
- Added filtering for zero/empty fingerprints to avoid massive false-positive groups.
- Added a LIMIT of 1000 duplicate groups to prevent excessive memory consumption and serialization overhead.
- Simplified the join structure to ensure better use of the database index.
2. Parallelized the Go comparison loop in pkg/utils/phash.go:
- Utilizes all available CPU cores to perform Hamming distance calculations.
- Uses a lock-free design to minimize synchronization overhead.
- This makes non-zero distance searches significantly faster on multi-core systems.
This update provides significant performance improvements for both image and scene duplicate searching:
1. Optimized the core Hamming distance algorithm in pkg/utils/phash.go:
- Uses native CPU popcount instructions (math/bits) for bit counting.
- Pre-calculates hash values to eliminate object allocations in the hot loop.
- Halves the number of comparisons by leveraging the symmetry of the Hamming distance.
- The loop is now several orders of magnitude faster and allocation-free.
2. Solved the N+1 database query bottleneck:
- Replaced individual database lookups for each duplicate group with a single batched query for all duplicate IDs.
- This optimization was applied to both Image and Scene repositories.
3. Simplified the SQL fast path for exact image matches to remove redundant table joins.
This change adds a specialized SQL query to find exact image duplicate matches (distance 0) directly in the database.
Previously, the image duplicate checker always used an O(N^2) Go-based comparison loop, which caused indefinite loading and timeouts on libraries with a large number of images. The new SQL fast path reduces the time to find exact duplicates from minutes/hours to milliseconds.
This fixes a bug where identical image duplicates were not being detected.
The implementation was incorrectly scanning the phash BLOB into a string and then attempting to parse it as a hex string. Since phashes are stored as 64-bit integers, they were being converted to decimal strings. For phashes with the MSB set (negative when treated as int64), the resulting decimal string started with a '-', which caused the hex parser to fail and skip the image entirely.
Additionally, even for non-negative phashes, parsing a decimal string as hex yielded incorrect hash values.
Scanning directly into the utils.Phash struct (which uses int64) matches how Scene phashes are handled and ensures the hash values are correct.
- Wrap FindDuplicateImages query in r.withReadTxn() to ensure a database transaction in context.
- Use queryFunc instead of queryStruct for fetching multiple hashes, preventing runtime errors.
- Fix N+1 query issue in duplicate grouping by using qb.FindMany() instead of qb.Find() for each duplicate image.
- Revert searchColumns array to exclude "images.details" which was from another PR and remove related failing test.
- Removed unused `strconv` import from `pkg/sqlite/image.go`.
- Added missing `github.com/stashapp/stash/pkg/utils` import to resolve the undefined `utils` reference.
- Fixed pagination prop in ImageDuplicateChecker component.
- Formatted modified go files using gofmt.
- Ran prettier over the UI codebase to resolve the formatting check CI failure.
This adds checkboxes to select duplicate images and integrates the existing EditImagesDialog and DeleteImagesDialog, allowing users to resolve duplicates directly from the tool.
This change unifies the duplicate detection logic by leveraging the shared phash utility. It also enhances the UI with:
- Pagination for large result sets.
- Sorting duplicate groups by total file size.
- A more detailed table view with image thumbnails, paths, and dimensions.
- Consistency with the existing Scene Duplicate Checker tool.
This change introduces a new tool to identify duplicate images based on their perceptual hash (phash). It includes:
- Backend implementation for phash distance comparison and grouping.
- GraphQL schema updates and API resolvers.
- Frontend UI for the Image Duplicate Checker tool.
- Unit tests for the image search and duplicate detection logic.
* fix: support string-based fingerprints in hashes filter
* Fix tests and add phash test
File fingerprints weren't using correct types. Filter test wasn't using correct types. Add phash to general files.
---------
Co-authored-by: hyper440 <hyper440@users.noreply.github.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
* Show scene resolution and duration in tagger
A scene's duration and resolution is often useful to ensure you have
found the right scene. This PR adds the same resolution/duration
overlay from the grid view to the tagger view.
* Show the stash box for each stash ID in the scene merge dialog
Currently, this dialog only shows the ID but not the stash box it
corresponds to. This is not very useful because the ID does not mean
anything to a user.
This renders the ID as "Stashdb | 1234...", mimicing the StashIDPill.
* Use StashIDPill instead
Currently, this dialog just shows a text "Stash-Box Source".
This change instead re-uses the StashIDPill, with the main advantage
that you can immediately tell which stash box is being used.
* Add useDebouncedState hook
* Add basename to folder sort whitelist
* Add parent_folder criterion to gallery
* Add selection on enter if single result
* Fix stale thumbnails after file content changes
When a file's content changed (e.g. after renaming files in a gallery),
the scan handler updated fingerprints but did not bump the entity's
updated_at timestamp. Since thumbnail URLs use updated_at as a cache
buster and are served with immutable/1-year cache headers, browsers
would indefinitely serve the old cached thumbnail.
Update image, scene, and gallery scan handlers to call UpdatePartial
(which sets updated_at to now) whenever file content changes, not only
when a new file association is created.
Add missing .input-group-append .btn border-radius rule to zero out
the left-side radius, matching the existing .input-group-prepend rule.
Fixes#6518
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Skip scanning zip contents when fingerprint is unchanged
When a zip-based gallery's modification time changes but its content
hash (oshash/md5) remains the same, skip walking and rescanning every
file inside the zip. This avoids expensive per-file fingerprint
recalculation when zip metadata changes without actual content changes.
Closes#6512
* Log a debug message when skipping a zip scan due to unchanged
fingerprint
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
* Fix edit modal not opening inside gallery view
The modal element was only rendered in the sidebar layout branch, but
gallery images use the non-sidebar path which returned content without
the modal. Also stabilize onEdit/onDelete with useCallback and add
missing dependency array to the Mousetrap useEffect.
Closes#6624
* Render modal once above sidebar conditional
Move {modal} above the withSidebar ternary so it is rendered
exactly once, avoiding the duplication that caused the original bug.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Replace panic with warning if creating a folder hierarchy where parent is equal to current
* Clean stash paths so that comparison works correctly when creating folder hierarchies
* Add basename field to folder
* Add parent_folders field to folder
* Add basename column to folder table
* Add basename filter field
* Create missing folder hierarchies during migration
* Treat files/folders in zips where path can't be made relative as not found
Addresses an issue during clean where corrupt folder entries in zip files could not be removed due to an error during the call to Rel.
* Add root paths parameter to GetOrCreateFolderHierarchy
Ensures that folders are only created up to the root library paths.
* Create full folder hierarchy when scanning a new folder
During a recursive scan, folders should be created as they are encountered (folders are handled in a single thread). This change applies only during a selective scan. Creates up to the root library folder.
* Create folder hierarchy on new file scan
This should only apply when scanning a specific file, as parent folders should be been created during a recursive scan.
* Fix existing folders with missing parents during scan
* Use effective filter for keybinds/view random
* Refactor ImageList to use sidebar
* Add performer age filter to gallery sidebar
* Port metadata info changes
* Fix incorrect patch component parameter
* Update plugin doc and types
* Show unsupported filter criteria in filter tags
Shows a warning coloured filter tag, with warning icon and text "<type> (unsupported) ...". Cannot be edited, can only be removed. Won't be saved to saved filters.
* Generalise filtered recommendation rows. Include warning popover for unsupported criteria