Commit graph

1232 commits

Author SHA1 Message Date
notsafeforgit
f2f299307a perf: eliminate O(N^2) bottlenecks in image and scene duplicate detection
This update resolves major performance regressions when processing large libraries:

1. Optimized FindMany in both Image and Scene stores to use map-based ID lookups. Previously, this function used slices.Index in a loop, resulting in O(N^2) complexity. On a library with 300k items, this was causing the server to hang indefinitely.

2. Refined the exact image duplicate SQL query to match the scene checker's level of optimization. It now joins the files table and orders results by total duplicate file size, ensuring that the most impactful duplicates are shown first.

3. Removed the temporary LIMIT 1000 from the image duplicate query now that the algorithmic bottlenecks have been resolved.
2026-03-20 15:29:04 -07:00
notsafeforgit
6de7195ed6 perf: further optimize image duplicate detection
This update provides additional performance improvements specifically targeted at large image libraries (e.g. 300k+ images):

1. Optimized the exact match SQL query for images:
   - Added filtering for zero/empty fingerprints to avoid massive false-positive groups.
   - Added a LIMIT of 1000 duplicate groups to prevent excessive memory consumption and serialization overhead.
   - Simplified the join structure to ensure better use of the database index.

2. Parallelized the Go comparison loop in pkg/utils/phash.go:
   - Utilizes all available CPU cores to perform Hamming distance calculations.
   - Uses a lock-free design to minimize synchronization overhead.
   - This makes non-zero distance searches significantly faster on multi-core systems.
2026-03-20 15:26:18 -07:00
notsafeforgit
b3d002ccf9 fix: remove unused goimagehash import in phash utility 2026-03-20 15:22:19 -07:00
notsafeforgit
acc5438af2 perf: massive optimization for image and scene duplicate detection
This update provides significant performance improvements for both image and scene duplicate searching:

1. Optimized the core Hamming distance algorithm in pkg/utils/phash.go:
   - Uses native CPU popcount instructions (math/bits) for bit counting.
   - Pre-calculates hash values to eliminate object allocations in the hot loop.
   - Halves the number of comparisons by leveraging the symmetry of the Hamming distance.
   - The loop is now several orders of magnitude faster and allocation-free.

2. Solved the N+1 database query bottleneck:
   - Replaced individual database lookups for each duplicate group with a single batched query for all duplicate IDs.
   - This optimization was applied to both Image and Scene repositories.

3. Simplified the SQL fast path for exact image matches to remove redundant table joins.
2026-03-20 15:06:05 -07:00
notsafeforgit
3924acdb0f perf(sqlite): implement SQL-based fast path for exact image duplicate detection
This change adds a specialized SQL query to find exact image duplicate matches (distance 0) directly in the database.

Previously, the image duplicate checker always used an O(N^2) Go-based comparison loop, which caused indefinite loading and timeouts on libraries with a large number of images. The new SQL fast path reduces the time to find exact duplicates from minutes/hours to milliseconds.
2026-03-20 14:35:19 -07:00
notsafeforgit
fa5725e709 fix(sqlite): fix image duplicate detection by scanning phash as integer
This fixes a bug where identical image duplicates were not being detected.

The implementation was incorrectly scanning the phash BLOB into a string and then attempting to parse it as a hex string. Since phashes are stored as 64-bit integers, they were being converted to decimal strings. For phashes with the MSB set (negative when treated as int64), the resulting decimal string started with a '-', which caused the hex parser to fail and skip the image entirely.

Additionally, even for non-negative phashes, parsing a decimal string as hex yielded incorrect hash values.

Scanning directly into the utils.Phash struct (which uses int64) matches how Scene phashes are handled and ensures the hash values are correct.
2026-03-20 04:53:39 -07:00
notsafeforgit
27ab865d70 fix: update image duplicate checker UI and API handling
- Fixes 400 error in ImageDuplicateChecker

- Updates UI and frontend types

- Fixes tools casing
2026-03-16 04:15:28 -07:00
notsafeforgit
9295655d19 fix: resolve image duplicate finder issues
- Wrap FindDuplicateImages query in r.withReadTxn() to ensure a database transaction in context.
- Use queryFunc instead of queryStruct for fetching multiple hashes, preventing runtime errors.
- Fix N+1 query issue in duplicate grouping by using qb.FindMany() instead of qb.Find() for each duplicate image.
- Revert searchColumns array to exclude "images.details" which was from another PR and remove related failing test.
2026-03-13 18:39:34 -07:00
notsafeforgit
8358c794ba fix: resolve unused import and undefined reference in sqlite image repository
- Removed unused `strconv` import from `pkg/sqlite/image.go`.
- Added missing `github.com/stashapp/stash/pkg/utils` import to resolve the undefined `utils` reference.
- Fixed pagination prop in ImageDuplicateChecker component.
- Formatted modified go files using gofmt.
- Ran prettier over the UI codebase to resolve the formatting check CI failure.
2026-03-13 18:11:05 -07:00
notsafeforgit
af75c8c1b4 feat: improve Image Duplicate Checker implementation
This change unifies the duplicate detection logic by leveraging the shared phash utility. It also enhances the UI with:
- Pagination for large result sets.
- Sorting duplicate groups by total file size.
- A more detailed table view with image thumbnails, paths, and dimensions.
- Consistency with the existing Scene Duplicate Checker tool.
2026-03-13 15:35:29 -07:00
notsafeforgit
3b1fccb010 feat: Implement Image Duplicate Checker
This change introduces a new tool to identify duplicate images based on their perceptual hash (phash). It includes:
- Backend implementation for phash distance comparison and grouping.
- GraphQL schema updates and API resolvers.
- Frontend UI for the Image Duplicate Checker tool.
- Unit tests for the image search and duplicate detection logic.
2026-03-13 15:23:02 -07:00
hyper440
300e7edb75
fix: support string-based fingerprints in hashes filter (#6654)
* fix: support string-based fingerprints in hashes filter
* Fix tests and add phash test

File fingerprints weren't using correct types. Filter test wasn't using correct types. Add phash to general files.
---------

Co-authored-by: hyper440 <hyper440@users.noreply.github.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-03-10 15:07:46 +11:00
WithoutPants
717f968a2c
Add folder criteria to scenes, images and galleries and sidebars (#6636)
* Add useDebouncedState hook
* Add basename to folder sort whitelist
* Add parent_folder criterion to gallery
* Add selection on enter if single result
2026-03-05 08:02:13 +11:00
WithoutPants
69e781b0ee
Use ffmpeg as a general fallback when generating phash (#6641) 2026-03-05 07:58:51 +11:00
Matt Stone
cd0980201c
feat: Add .stashignore support for gitignore-style scan exclusions (#6485)
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-03-04 08:17:14 +11:00
puc9
1457ad590d
Add Selective generate (#6621) 2026-03-03 09:11:28 +11:00
puc9
99a0d01371
Fix new panic in IsFsPathCaseSensitive: Use filepath operations to check for file system case sensitivity (#6635)
* Use filepath operations to check for file system case sensitivity
2026-03-03 08:11:55 +11:00
Abdu Dihan
52bd9392fb
Fix stale browser-cached thumbnails after file content changes during scan. (#6622)
* Fix stale thumbnails after file content changes

When a file's content changed (e.g. after renaming files in a gallery),
the scan handler updated fingerprints but did not bump the entity's
updated_at timestamp. Since thumbnail URLs use updated_at as a cache
buster and are served with immutable/1-year cache headers, browsers
would indefinitely serve the old cached thumbnail.

Update image, scene, and gallery scan handlers to call UpdatePartial
(which sets updated_at to now) whenever file content changes, not only
when a new file association is created.
2026-03-02 15:53:02 +11:00
dev-null-life
784795660b
Skip scanning zip contents when fingerprint is unchanged (#6633)
* Skip scanning zip contents when fingerprint is unchanged

When a zip-based gallery's modification time changes but its content
hash (oshash/md5) remains the same, skip walking and rescanning every
file inside the zip. This avoids expensive per-file fingerprint
recalculation when zip metadata changes without actual content changes.

Closes #6512

* Log a debug message when skipping a zip scan due to unchanged
  fingerprint

---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-03-02 15:47:23 +11:00
WithoutPants
09e2b2bd4e Wrap CleanCaptions with database. Refactor AssociateCaptions. 2026-03-02 15:45:37 +11:00
WithoutPants
681ccbf380
Fix caption handling during scan and check before correcting path (#6634)
* Handle case where folder entry exists for corrected path in correctSubFolderHierarchy
* Log scan start
* Handle caption files during scan
2026-03-02 14:44:20 +11:00
Gykes
c874bd560e
Fix: Custom Field Filtering (#6614)
* add tests
* Refactor queryBuilder: split args into per-clause fields
2026-02-28 11:05:13 +11:00
WithoutPants
c7e1c3da69
Fix panic when library path has trailing path separator (#6619)
* Replace panic with warning if creating a folder hierarchy where parent is equal to current
* Clean stash paths so that comparison works correctly when creating folder hierarchies
2026-02-28 10:51:02 +11:00
Gykes
3b8f6bd94c
update logs and fix UNIQUE constraint failure (#6617) 2026-02-28 09:11:13 +11:00
WithoutPants
d8448ba37e
Add basename and parent_folders fields to Folder graphql interface (#6494)
* Add basename field to folder
* Add parent_folders field to folder
* Add basename column to folder table
* Add basename filter field
* Create missing folder hierarchies during migration
* Treat files/folders in zips where path can't be made relative as not found

Addresses an issue during clean where corrupt folder entries in zip files could not be removed due to an error during the call to Rel.
2026-02-27 10:58:11 +11:00
WithoutPants
e52ac14d56
Fix missing folder corruption during scanning (#6608)
* Add root paths parameter to GetOrCreateFolderHierarchy

Ensures that folders are only created up to the root library paths.

* Create full folder hierarchy when scanning a new folder

During a recursive scan, folders should be created as they are encountered (folders are handled in a single thread). This change applies only during a selective scan. Creates up to the root library folder.

* Create folder hierarchy on new file scan

This should only apply when scanning a specific file, as parent folders should be been created during a recursive scan.

* Fix existing folders with missing parents during scan
2026-02-27 07:42:53 +11:00
Gykes
b77abd64e2
FR: Add Missing is-missing Filter Options Across all Object Types (#6565)
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-02-26 16:36:54 +11:00
WithoutPants
5734ee43ff
Add sidebar to scene markers list (#6603)
* Add tag markers filter
* Add marker count and markers filter to performer filter
* Add sidebar to marker list
2026-02-26 07:54:40 +11:00
Gykes
0103fe4751
FR: Tags Tagger (#6559)
* Refactor Tagger components
* condense localization
* add alias and description to model and schema
2026-02-25 11:39:14 +11:00
WithoutPants
86abe7b24c
Backend support for image custom fields (#6598)
* Initialise maps in bulk get custom fields to fix graphql validation error
2026-02-24 07:41:40 +11:00
1509x
9a1b1fb718
[Feature] Reveal file in system file manager from file info panel (#6587)
* Add reveal in file manager button to file info panel

Adds a folder icon button next to the path field in the Scene, Image,
and Gallery file info panels. Clicking it calls a new GraphQL mutation
that opens the file's enclosing directory in the system file manager
(Finder on macOS, Explorer on Windows, xdg-open on Linux).

Also fixes the existing revealInFileManager implementations which were
constructing exec.Command but never calling Run(), making them no-ops:
- darwin: add Run() to open -R
- windows: add Run() and fix flag from \select to /select,<path>
- linux: implement with xdg-open on the parent directory
- desktop.go: use os.Stat instead of FileExists so folders work too

* Disallow reveal operation if request not from loopback
---------
Co-authored-by: 1509x <1509x@users.noreply.github.com>
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-02-23 12:51:35 +11:00
WithoutPants
ca5178f05e
Backend support for Group custom fields (#6596) 2026-02-23 11:53:12 +11:00
WithoutPants
47dcdd439c
Backend support for gallery custom fields (#6592) 2026-02-23 07:39:28 +11:00
WithoutPants
076032ff8b
Custom sprite generation (#6588)
* configurable minimum/maximum number of sprites
* configurable sprite size
---------
Co-authored-by: cacheflush <github.stoneware268@passmail.com>
2026-02-20 15:09:59 +11:00
WithoutPants
843806247d
Add group scene count filter (#6593) 2026-02-20 09:14:25 +11:00
Gykes
3dc86239d2
Feature Request: Add organized flag to studios (#6303) 2026-02-19 09:05:17 +11:00
WithoutPants
b653e91fae
Fix panic in IsFsPathCaseSensitive (#6589)
* Add crashing unit test
* Fix IsFsPathCaseSensitive to use runes
2026-02-19 08:09:06 +11:00
WithoutPants
e289199911
Scene custom field backend support (#6584)
* Add custom fields to scenes
* Generalise set custom fields tests to other object types
2026-02-18 16:50:32 +11:00
Gykes
adaadee368
FR: Change Career Length to Career Start and Career End (#6449)
Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>
2026-02-17 13:44:03 +11:00
WithoutPants
bede849fa6
Add sidebar to group list (#6573)
* Add group filter criteria to tag and studio
* Add sidebar to groups list
* Refactor ListOperations to accept buttons
* Move create new button back to navbar

Having the create new button with a plus icon conflicted with the add sub-group button in the sub-groups view.

* Simplify group-sub-groups view
2026-02-16 17:28:41 +11:00
Gykes
d1479ca4e5
Feature: Scene Duplicate Filter (#6344) 2026-02-11 11:52:44 +11:00
Gykes
7aa7276fa3
Bugfix: AVIF Image PHash Support (#6556)
* AVIF phash support
* add avif check for zips
2026-02-11 11:38:57 +11:00
WithoutPants
5628fbc5d3
Merge tag values dialog (#6552)
* Change tag merge to accept values.

MergeHierarchy is removed as it is no longer needed

* Add tag merge value dialog to choose values when merging
2026-02-11 11:27:57 +11:00
InfiniteStash
5cf41c8c8e
Remove unused stash-box fingerprint queries (#6561)
* Remove unused stash-box fingerprint query
* Remove findSceneByFingerprint
2026-02-11 11:26:05 +11:00
WithoutPants
d64b3b711c
Revamp studio list with sidebar (#6549)
* Add studios_filter to TagFilterType
* Convert studio list to use sidebar
2026-02-06 12:37:38 +11:00
WithoutPants
b278525647
Tag custom fields support for backend (#6546)
* Fix custom field import/export for studio
* Update studio unit tests
* Add tag create and update unit tests
* Add custom fields to tag filter graphql
* Add unit tests for tag filtering
* Add filter unit tests for studio
2026-02-06 12:35:05 +11:00
CJ
f629191b28
Future support for filtering tags list by current filter on Performers page (#6091) 2026-02-05 13:35:58 +11:00
WithoutPants
9eda7c2f60
Studio custom fields backend support (#6156) 2026-02-05 09:01:29 +11:00
WithoutPants
88eb46380c
Refactor scraper package (#6495)
* Remove reflection from mapped value processing
* AI generated unit tests
* Move mappedConfig to separate file
* Rename group to configScraper
* Separate mapped post-processing code into separate file
* Update test after group rename
* Check map entry when returning scraper
* Refactor config into definition
* Support single string for string slice translation
* Rename config.go to definition.go
* Rename configScraper to definedScraper
* Rename config_scraper.go to defined_scraper.go
2026-02-04 11:07:51 +11:00
Hans Evers
ed0fb53ae0
feat: auto-remove duplicate aliases (#6514) 2026-02-04 10:37:15 +11:00