This update provides significant performance improvements for both image and scene duplicate searching:
1. Optimized the core Hamming distance algorithm in pkg/utils/phash.go:
- Uses native CPU popcount instructions (math/bits) for bit counting.
- Pre-calculates hash values to eliminate object allocations in the hot loop.
- Halves the number of comparisons by leveraging the symmetry of the Hamming distance.
- The loop is now several orders of magnitude faster and allocation-free.
2. Solved the N+1 database query bottleneck:
- Replaced individual database lookups for each duplicate group with a single batched query for all duplicate IDs.
- This optimization was applied to both Image and Scene repositories.
3. Simplified the SQL fast path for exact image matches to remove redundant table joins.
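As a rough illustration of item 1, the sketch below shows a popcount-based Hamming distance and a pairwise loop that visits each unordered pair only once. The names hammingDistance and closePairs are hypothetical and are not taken from pkg/utils/phash.go; hashes are assumed to be plain uint64 values.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance returns the number of bits that differ between two
// 64-bit hashes. bits.OnesCount64 typically compiles to a native
// popcount instruction where the CPU provides one.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// closePairs returns index pairs (i, j) with i < j whose hashes differ
// by at most maxDist bits. Starting j at i+1 uses the symmetry
// d(a, b) == d(b, a), so each unordered pair is compared once.
func closePairs(hashes []uint64, maxDist int) [][2]int {
	var pairs [][2]int
	for i := 0; i < len(hashes); i++ {
		for j := i + 1; j < len(hashes); j++ {
			if hammingDistance(hashes[i], hashes[j]) <= maxDist {
				pairs = append(pairs, [2]int{i, j})
			}
		}
	}
	return pairs
}

func main() {
	hashes := []uint64{0b1111, 0b1110, 0xFFFF0000}
	fmt.Println(closePairs(hashes, 1)) // [[0 1]]
}
```

Working on raw integers inside the inner loop is what keeps it free of allocations; only the result slice grows.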
This change adds a specialized SQL query to find exact image duplicate matches (distance 0) directly in the database.
Previously, the image duplicate checker always used an O(N^2) Go-based comparison loop, which caused indefinite loading and timeouts on libraries with a large number of images. The new SQL fast path reduces the time to find exact duplicates from minutes/hours to milliseconds.
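To show why exact matches do not need the pairwise loop, here is a minimal sketch of the idea. The query text is illustrative only: the table name images_phashes and its columns are assumptions, not the real schema, and groupExact is a hypothetical in-memory equivalent of the same GROUP BY.

```go
package main

import (
	"fmt"
	"sort"
)

// exactDuplicatesSQL is an illustrative query only. The table name
// images_phashes and its columns are assumptions, not the real schema.
const exactDuplicatesSQL = `
SELECT phash, GROUP_CONCAT(image_id)
FROM images_phashes
GROUP BY phash
HAVING COUNT(*) > 1`

// hashRow is a hypothetical (image ID, phash) pair.
type hashRow struct {
	ImageID int
	Phash   int64
}

// groupExact mirrors GROUP BY phash HAVING COUNT(*) > 1 in memory:
// a single pass over the rows with no pairwise comparison.
func groupExact(rows []hashRow) [][]int {
	byHash := make(map[int64][]int)
	for _, r := range rows {
		byHash[r.Phash] = append(byHash[r.Phash], r.ImageID)
	}
	var groups [][]int
	for _, ids := range byHash {
		if len(ids) > 1 {
			sort.Ints(ids)
			groups = append(groups, ids)
		}
	}
	// Order groups by their first ID so the result is deterministic.
	sort.Slice(groups, func(i, j int) bool { return groups[i][0] < groups[j][0] })
	return groups
}

func main() {
	rows := []hashRow{{1, 42}, {2, 7}, {3, 42}, {4, 7}, {5, 99}}
	fmt.Println(groupExact(rows)) // [[1 3] [2 4]]
}
```

Grouping by equal hash is linear in the number of rows, whereas the general distance search has to consider every pair.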
This fixes a bug where identical image duplicates were not being detected.
The implementation was incorrectly scanning the phash BLOB into a string and then attempting to parse it as a hex string. Since phashes are stored as 64-bit integers, they were being converted to decimal strings. For phashes with the MSB set (negative when treated as int64), the resulting decimal string started with a '-', which caused the hex parser to fail and skip the image entirely.
Additionally, even for non-negative phashes, parsing a decimal string as hex yielded incorrect hash values.
Scanning directly into the utils.Phash struct (which uses int64) matches how Scene phashes are handled and ensures the hash values are correct.
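The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not the original code: parseAsHex is a hypothetical helper that mimics the faulty path by formatting an int64 as decimal text and then parsing that text as hex.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAsHex mimics the faulty path: the decimal text of a stored
// int64 is parsed as if it were a hex string.
func parseAsHex(stored int64) (uint64, error) {
	decimal := strconv.FormatInt(stored, 10)
	return strconv.ParseUint(decimal, 16, 64)
}

func main() {
	// MSB set: the decimal text starts with '-', so hex parsing fails.
	_, err := parseAsHex(-1)
	fmt.Println(err != nil) // true

	// Non-negative: parsing succeeds, but "255" read as hex is 0x255.
	v, _ := parseAsHex(255)
	fmt.Println(v) // 597
}
```

Keeping the value as an integer from scan to comparison avoids both problems, since no text round trip is involved.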
- Wrap FindDuplicateImages query in r.withReadTxn() to ensure a database transaction in context.
- Use queryFunc instead of queryStruct for fetching multiple hashes, preventing runtime errors.
- Fix N+1 query issue in duplicate grouping by using qb.FindMany() instead of qb.Find() for each duplicate image.
- Revert the searchColumns array to exclude "images.details", which came from another PR, and remove the related failing test.
- Removed unused `strconv` import from `pkg/sqlite/image.go`.
- Added missing `github.com/stashapp/stash/pkg/utils` import to resolve the undefined `utils` reference.
- Fixed pagination prop in ImageDuplicateChecker component.
- Formatted modified Go files using gofmt.
- Ran prettier over the UI codebase to resolve the formatting check CI failure.
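The N+1 fix mentioned in the list above can be sketched as follows. imageStore, its FindMany method, and loadGroups are hypothetical stand-ins for illustration; they are not the signatures of the real repository code, and the query counter only exists to make the difference visible.

```go
package main

import "fmt"

// Image is a hypothetical, simplified image record.
type Image struct {
	ID   int
	Path string
}

// imageStore is an in-memory stand-in for a repository. It counts
// how many lookups ("queries") are issued against it.
type imageStore struct {
	rows    map[int]Image
	queries int
}

// FindMany returns all images for the given IDs in one lookup.
func (s *imageStore) FindMany(ids []int) []Image {
	s.queries++
	out := make([]Image, 0, len(ids))
	for _, id := range ids {
		if img, ok := s.rows[id]; ok {
			out = append(out, img)
		}
	}
	return out
}

// loadGroups resolves every duplicate group with a single batched
// lookup, then reassembles the groups from an ID-indexed map.
func loadGroups(s *imageStore, idGroups [][]int) [][]Image {
	var all []int
	for _, g := range idGroups {
		all = append(all, g...)
	}
	byID := make(map[int]Image, len(all))
	for _, img := range s.FindMany(all) {
		byID[img.ID] = img
	}
	groups := make([][]Image, 0, len(idGroups))
	for _, g := range idGroups {
		imgs := make([]Image, 0, len(g))
		for _, id := range g {
			if img, ok := byID[id]; ok {
				imgs = append(imgs, img)
			}
		}
		groups = append(groups, imgs)
	}
	return groups
}

func main() {
	s := &imageStore{rows: map[int]Image{1: {1, "a.jpg"}, 2: {2, "b.jpg"}, 3: {3, "c.jpg"}}}
	groups := loadGroups(s, [][]int{{1, 2}, {3}})
	fmt.Println(len(groups), s.queries) // 2 1
}
```

The lookup count stays at one regardless of how many groups there are, which is the property the batched query is meant to provide.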
This adds checkboxes to select duplicate images and integrates the existing EditImagesDialog and DeleteImagesDialog, allowing users to resolve duplicates directly from the tool.
This change unifies the duplicate detection logic by leveraging the shared phash utility. It also enhances the UI with:
- Pagination for large result sets.
- Sorting duplicate groups by total file size.
- A more detailed table view with image thumbnails, paths, and dimensions.
- Consistency with the existing Scene Duplicate Checker tool.
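As a sketch of the sorting and pagination behaviour listed above: the change does not state where these steps run or in which direction the sort goes, so the code below assumes largest total size first and 1-based page numbers. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// file is a hypothetical, simplified image file record.
type file struct {
	Path string
	Size int64
}

// totalSize sums the sizes of all files in one duplicate group.
func totalSize(group []file) int64 {
	var sum int64
	for _, f := range group {
		sum += f.Size
	}
	return sum
}

// sortBySize orders groups by total file size, largest first
// (the direction is an assumption).
func sortBySize(groups [][]file) {
	sort.SliceStable(groups, func(i, j int) bool {
		return totalSize(groups[i]) > totalSize(groups[j])
	})
}

// page returns the groups on the given 1-based page, or nil when the
// page is out of range.
func page(groups [][]file, pageNum, perPage int) [][]file {
	if pageNum < 1 || perPage < 1 {
		return nil
	}
	start := (pageNum - 1) * perPage
	if start >= len(groups) {
		return nil
	}
	end := start + perPage
	if end > len(groups) {
		end = len(groups)
	}
	return groups[start:end]
}

func main() {
	groups := [][]file{
		{{"a.jpg", 10}, {"b.jpg", 20}},
		{{"c.jpg", 100}},
		{{"d.jpg", 5}, {"e.jpg", 5}},
	}
	sortBySize(groups)
	fmt.Println(totalSize(groups[0]), len(page(groups, 2, 2))) // 100 1
}
```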
This change introduces a new tool to identify duplicate images based on their perceptual hash (phash). It includes:
- Backend implementation for phash distance comparison and grouping.
- GraphQL schema updates and API resolvers.
- Frontend UI for the Image Duplicate Checker tool.
- Unit tests for the image search and duplicate detection logic.
* Translated using Weblate (French)
Currently translated at 100.0% (1341 of 1341 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/fr/
* Translated using Weblate (Turkish)
Currently translated at 75.3% (1010 of 1341 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/tr/
* Translated using Weblate (Ukrainian)
Currently translated at 100.0% (1341 of 1341 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/uk/
* Translated using Weblate (French)
Currently translated at 99.9% (1345 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/fr/
* Translated using Weblate (Korean)
Currently translated at 100.0% (1346 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/ko/
* Translated using Weblate (French)
Currently translated at 100.0% (1346 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/fr/
* Translated using Weblate (Ukrainian)
Currently translated at 100.0% (1346 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/uk/
* Translated using Weblate (Chinese (Simplified Han script))
Currently translated at 100.0% (1346 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/zh_Hans/
* Translated using Weblate (Portuguese (Brazil))
Currently translated at 67.3% (906 of 1346 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/pt_BR/
* Update translation files
Updated by "Cleanup translation files" add-on in Weblate.
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/
* Translated using Weblate (French)
Currently translated at 100.0% (1348 of 1348 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/fr/
* Translated using Weblate (Czech)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/cs/
* Translated using Weblate (Spanish)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/es/
* Translated using Weblate (Korean)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/ko/
* Translated using Weblate (Ukrainian)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/uk/
* Translated using Weblate (French)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/fr/
* Translated using Weblate (Spanish)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/es/
* Translated using Weblate (Arabic)
Currently translated at 56.9% (769 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/ar/
* Translated using Weblate (Polish)
Currently translated at 80.1% (1083 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/pl/
* Translated using Weblate (Ukrainian)
Currently translated at 100.0% (1351 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/uk/
* Translated using Weblate (Arabic)
Currently translated at 58.0% (784 of 1351 strings)
Translation: stash/stash
Translate-URL: https://translate.codeberg.org/projects/stash/stash/ar/
---------
Co-authored-by: doodoo <doodoo@noreply.codeberg.org>
Co-authored-by: slickdaddy <slickdaddy@noreply.codeberg.org>
Co-authored-by: Saenko <saenko@noreply.codeberg.org>
Co-authored-by: lugged9922 <lugged9922@noreply.codeberg.org>
Co-authored-by: wql219 <wql219@noreply.codeberg.org>
Co-authored-by: tiagodamian <tiagodamian@noreply.codeberg.org>
Co-authored-by: Codeberg Translate <translate@codeberg.org>
Co-authored-by: NymeriaCZ <nymeriacz@noreply.codeberg.org>
Co-authored-by: donlothario <donlothario@noreply.codeberg.org>
Co-authored-by: gallegonovato <gallegonovato@noreply.codeberg.org>
Co-authored-by: interj4 <interj4@noreply.codeberg.org>
Co-authored-by: brnd <brnd@noreply.codeberg.org>
This is more consistent with other places that stash IDs are shown,
simplifies the code a bit, and lets you see at a glance which stash
box is being used.
* Add shortcuts when only getting zip/folder IDs
* Don't show zip folders when viewing scenes and galleries.
Zip folders have no results for scenes and galleries, but will for images.
* Add StudioLogo component
If no studio image is set, shows the studio icon with the studio name.
* Add option to always show studio text
* Implement studio as text option
* Add studio logo to image
* Clarify existing show studio as text option
* Make modal field/value styling consistent
Fixes URL list in studio list styling
* Add stash id pill to studio and tag modals
* Fix create parent checkbox
* Allow excluding parent studio
Disables the create checkbox if the parent studio is not excluded and does not exist.
* Don't render modal on every studio
* Show dialog when refreshing tags
* Add GUID search for performers in PerformerSelect component
* Refactor and apply to all objects with stash ids
---------
Co-authored-by: KennyG <kennyg@kennyg.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>