This update speeds up duplicate searching for both images and scenes:
1. Optimized the core Hamming distance algorithm in pkg/utils/phash.go:
- Counts bits with math/bits, which compiles to a native CPU popcount instruction where the hardware provides one.
- Pre-calculates hash values to eliminate object allocations in the hot loop.
- Halves the number of comparisons by exploiting the symmetry of the Hamming distance: d(a, b) = d(b, a), so each pair is compared only once.
- As a result, the comparison loop is allocation-free and several orders of magnitude faster.
2. Eliminated the N+1 database query bottleneck:
- Replaced individual database lookups for each duplicate group with a single batched query for all duplicate IDs.
- This optimization was applied to both Image and Scene repositories.
3. Simplified the SQL fast path for exact image matches to remove redundant table joins.
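
For reviewers unfamiliar with the technique in item 1, here is a minimal sketch of a popcount-based, symmetric pairwise comparison. It is not the code in pkg/utils/phash.go: the names hammingDistance, findNearPairs, and pair are hypothetical, and it assumes each hash has already been reduced to a plain uint64.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance returns the number of bit positions in which a and b differ.
// XOR sets exactly the differing bits; OnesCount64 counts them.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

// pair holds the indices of two hashes that fall within the threshold.
// (Hypothetical type, used only for this illustration.)
type pair struct{ i, j int }

// findNearPairs compares every unordered pair exactly once by starting the
// inner loop at i+1. Because the distance is symmetric, (j, i) never needs
// to be checked after (i, j). The comparison itself allocates nothing;
// only appending a match to the result slice can allocate.
func findNearPairs(hashes []uint64, maxDistance int) []pair {
	var out []pair
	for i := 0; i < len(hashes); i++ {
		for j := i + 1; j < len(hashes); j++ {
			if hammingDistance(hashes[i], hashes[j]) <= maxDistance {
				out = append(out, pair{i, j})
			}
		}
	}
	return out
}

func main() {
	// The first two hashes differ in one bit; the third is far from both.
	hashes := []uint64{0xFF00, 0xFF01, 0x00FF}
	fmt.Println(findNearPairs(hashes, 1)) // prints [{0 1}]
}
```

Working on a slice of precomputed uint64 values, rather than on hash objects, is what keeps the inner loop free of allocations in this sketch.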