Image search

August 2, 2021 — January 5, 2023

computers are awful together
faster pussycat
information provenance
making things
photon choreography
search
standards

Finding images by their qualities or by their similarity to existing images. A.k.a. “reverse” image search, CBIR (content-based image retrieval), similarity search, fuzzy de-duplication.
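To make "fuzzy de-duplication" concrete, here is a minimal sketch of one common family of approaches, a perceptual "average hash" compared by Hamming distance. It is not taken from any of the tools listed below; the function names and the tiny 4x4 pixel grids are assumptions made up for illustration, standing in for real images that have been converted to grayscale and shrunk to a fixed size.

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 grayscale thumbnails (values 0-255).
original = [
    [200, 210, 40, 30],
    [205, 220, 35, 25],
    [50, 45, 190, 180],
    [40, 60, 185, 195],
]
# The same picture, slightly brightened: a near-duplicate.
brightened = [[min(255, v + 10) for v in row] for row in original]
# A different picture: left half dark, right half bright.
different = [[20, 30, 220, 230] for _ in range(4)]

near = hamming_distance(average_hash(original), average_hash(brightened))
far = hamming_distance(average_hash(original), average_hash(different))
print(near, far)  # a small distance suggests a near-duplicate
```

Because the hash only records whether each pixel is above or below the image mean, a uniform brightness change leaves it unchanged, while a structurally different picture flips many bits. Where to put the "duplicate" threshold is a judgment call that depends on the collection.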

I can see there are many interesting theoretical and technical questions here, but I don’t have time to explore them; I just need to find images sometimes.

Figure 1

1 In my image library

1.1 fastdup

visual-layer/fastdup is a tool for gaining insights from a large image collection. It can find anomalies, duplicate and near duplicate images, clusters of similarity, learn the normal behavior and temporal interactions between images. It can be used for smart subsampling of a higher quality dataset, outlier removal, novelty detection of new information to be sent for tagging. FastDup scales to millions of images running on CPU only.

It seems to use modern neural-network methods and to be aimed at curating image datasets for training tasks.

If you are happy to use it from a Python CLI, this looks like the most natural tool for many use cases, including many of mine.

Position piece: Large Image Datasets Today Are a Mess.

1.2 Geeqie

Geeqie is a free open software image viewer and organiser program for Linux, FreeBSD and other Unix-like operating systems

The project is fairly low-key about its image search features, but it would be worth checking out.

1.3 dupeguru

dupeGuru (macOS, Windows, Linux):

dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system.… It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same.

dupeGuru is efficient. Find your duplicate files in minutes, thanks to its quick fuzzy matching algorithm. dupeGuru not only finds filenames that are the same, but it also finds similar filenames.

dupeGuru is good with music. It has a special Music mode that can scan tags and shows music-specific information in the duplicate results window.

dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same.

That last point appears to mean that it searches by comparing blurred versions of pictures, which is elegant but probably not sufficient for all needs.
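As a rough illustration of what comparing blurred versions could look like, here is a minimal sketch. It is a guess at the general idea, not dupeGuru's actual implementation: the function names, the block size, the threshold, and the 4x4 example images are all assumptions. Each image is blurred by averaging non-overlapping tiles, and two images count as similar when their blurred versions differ little on average.

```python
def block_average(pixels, block):
    """Blur a grayscale image by averaging non-overlapping block x block tiles."""
    height, width = len(pixels), len(pixels[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    blurred = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [
                pixels[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            ]
            row.append(sum(tile) / len(tile))
        blurred.append(row)
    return blurred


def mean_abs_difference(image_a, image_b):
    """Average absolute per-pixel difference between two same-sized images."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def looks_similar(image_a, image_b, block=2, threshold=10.0):
    """Compare blurred versions; block and threshold are illustrative defaults."""
    blurred_a = block_average(image_a, block)
    blurred_b = block_average(image_b, block)
    return mean_abs_difference(blurred_a, blurred_b) <= threshold


# Hypothetical 4x4 grayscale images (values 0-255).
clean = [
    [100, 100, 200, 200],
    [100, 100, 200, 200],
    [50, 50, 150, 150],
    [50, 50, 150, 150],
]
# The same picture with pixel-level noise that averages out within each tile.
noisy = [
    [120, 80, 220, 180],
    [80, 120, 180, 220],
    [70, 30, 170, 130],
    [30, 70, 130, 170],
]
# A structurally different picture: the negative of the first.
inverted = [[255 - v for v in row] for row in clean]

print(looks_similar(clean, noisy), looks_similar(clean, inverted))
```

The blur discards fine detail such as noise or compression artefacts, which is what makes the match fuzzy. It also hints at the limitation: a crop, rotation, or mirror of the same picture changes the blurred version substantially, so this kind of comparison would likely miss it.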

1.4 digikam

KDE photo organiser digiKam has fuzzy image search.

1.5 liresolr

dermotte/liresolr: Putting LIRE into Solr

This is a Solr plugin for the LIRE content based image retrieval library, so basically it’s for indexing images and then finding similar (looking) ones. The original library can be found at Github.

NB the LIRE project is officially dead and liresolr is unofficially dead.

1.6 Visipics

Might still be maintained? It looks like there was a release in 2015.

VisiPics

Figure 2

1.7 Spammy-looking ones

2 On the internet

Aric Toler’s Terrifyingly comprehensive guide to reverse image search has a forensic bent.

3 References

Bingham, and Mannila. 2001. “Random Projection in Dimensionality Reduction: Applications to Image and Text Data.” In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’01.
Gordo, Almazan, Revaud, et al. 2016. “End-to-End Learning of Deep Visual Representations for Image Retrieval.” arXiv:1610.07940 [Cs].
Lai, Pan, Liu, et al. 2015. “Simultaneous Feature Learning and Hash Coding with Deep Neural Networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Lin, Yang, Hsiao, et al. 2015. “Deep Learning of Binary Hash Codes for Fast Image Retrieval.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Nagathan, Mungara, and Manimozhi. 2014. “Content-Based Image Retrieval System Using Feed-Forward Backpropagation Neural Network.” International Journal of Computer Science and Network Security (IJCSNS).
Simoncelli, and Olshausen. 2001. “Natural Image Statistics and Neural Representation.” Annual Review of Neuroscience.
Xia, Pan, Lai, et al. 2014. “Supervised Hashing for Image Retrieval via Image Representation Learning.” In AAAI.
Zhang, Lin, Zhang, et al. 2015. “Bit-Scalable Deep Hashing with Regularized Similarity Learning for Image Retrieval and Person Re-Identification.” IEEE Transactions on Image Processing.