Finding images by their qualities or by their similarity to existing images. A.k.a. “reverse” image search, CBIR (content-based image retrieval), similarity search, fuzzy de-duplication.
I can see there are many interesting theoretical and technical questions here, but I don’t have time to explore them; I just need to find images sometimes.
In my image library
visual-layer/fastdup is a tool for gaining insights from a large image collection. It can find anomalies, duplicate and near duplicate images, clusters of similarity, learn the normal behavior and temporal interactions between images. It can be used for smart subsampling of a higher quality dataset, outlier removal, novelty detection of new information to be sent for tagging. FastDup scales to millions of images running on CPU only.
It seems to use modern neural-network methods and to be targeted at curating image datasets for training.
If you are happy to use it from a Python CLI, this looks like the most natural tool for many use cases, including many of mine.
Position piece: Large Image Datasets Today Are a Mess.
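To make “near duplicate” concrete: a common pattern with neural-network features (not necessarily what fastdup does internally) is to turn each image into an embedding vector and flag pairs whose cosine similarity exceeds a threshold. Below is a minimal brute-force sketch, assuming the embeddings have already been computed by some model; the file names, the three-dimensional vectors and the 0.95 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(
    embeddings: dict[str, list[float]], threshold: float = 0.95
) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, similarity) for every pair at or above `threshold`.

    Brute force: compares all n*(n-1)/2 pairs, so it only suits small
    collections; large ones would typically need an approximate
    nearest-neighbour index instead.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((name_a, name_b, sim))
    # Most similar pairs first.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Hypothetical embeddings; real ones would come from a neural network.
embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "cat_resized.jpg": [0.88, 0.12, 0.01],
    "car.jpg": [0.0, 0.2, 0.98],
}
print(near_duplicate_pairs(embeddings))  # a single pair: cat.jpg / cat_resized.jpg
```

The threshold is the main knob: lower it and you drift from “near duplicate” towards “clusters of similarity”.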
Geeqie is a free open software image viewer and organiser program for Linux, FreeBSD and other Unix-like operating systems.
It is fairly low-key about image search, but it would be worth checking out.
dupeGuru (macOS, Windows, Linux):
dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system.… It can scan either filenames or contents. The filename scan features a fuzzy matching algorithm that can find duplicate filenames even when they are not exactly the same.
dupeGuru is efficient. Find your duplicate files in minutes, thanks to its quick fuzzy matching algorithm. dupeGuru not only finds filenames that are the same, but it also finds similar filenames.
dupeGuru is good with music. It has a special Music mode that can scan tags and shows music-specific information in the duplicate results window.
dupeGuru is good with pictures. It has a special Picture mode that can scan pictures fuzzily, allowing you to find pictures that are similar, but not exactly the same.
That last point appears to mean that it compares blurred versions of pictures, which is elegant but probably not sufficient for all needs.
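Here is my guess at the blur-and-compare idea, as a minimal sketch (not dupeGuru’s actual code): shrink each grayscale image to a small grid by averaging blocks of pixels, then call two images similar if their grids differ by less than some tolerance. The images are plain lists of rows of 0–255 values so the sketch needs no imaging library; the function names, the 4×4 grid and the tolerance of 10 are illustrative assumptions.

```python
def block_average(pixels: list[list[float]], grid: int = 4) -> list[list[float]]:
    """Shrink a grayscale image to grid x grid cells by averaging each block (a crude blur)."""
    height, width = len(pixels), len(pixels[0])
    if height < grid or width < grid:
        raise ValueError("image must be at least grid x grid pixels")
    cells = []
    for by in range(grid):
        y0, y1 = by * height // grid, (by + 1) * height // grid
        row = []
        for bx in range(grid):
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        cells.append(row)
    return cells


def blur_distance(a: list[list[float]], b: list[list[float]], grid: int = 4) -> float:
    """Mean absolute difference between the block-averaged versions of two images."""
    cells_a, cells_b = block_average(a, grid), block_average(b, grid)
    diffs = [abs(x - y) for ra, rb in zip(cells_a, cells_b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def looks_similar(a, b, grid: int = 4, tolerance: float = 10.0) -> bool:
    """Fuzzy match: small differences vanish once both images are blurred."""
    return blur_distance(a, b, grid) <= tolerance


# An 8x8 horizontal gradient, a copy with +/-3 pixel noise, and an inverted copy.
base = [[x * 30 for x in range(8)] for y in range(8)]
noisy = [[x * 30 + (3 if (x + y) % 2 else -3) for x in range(8)] for y in range(8)]
inverted = [[255 - x * 30 for x in range(8)] for y in range(8)]
print(looks_similar(base, noisy), looks_similar(base, inverted))
```

Because the comparison is cell by cell, a crop, rotation or mirror image moves content into different cells and will likely not match, which is one reason this approach is probably not sufficient for all needs.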
This is a Solr plugin for the LIRE content based image retrieval library, so basically it’s for indexing images and then finding similar (looking) ones. The original library can be found at Github.
NB the LIRE project is officially dead and liresolr is unofficially dead.
Then again, it might still be maintained? It looks like there was a release in 2015.
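For what “indexing images and then finding similar (looking) ones” means in practice: LIRE’s classic approach extracts global features such as colour histograms (it ships several feature types, e.g. CEDD), stores them in an index, and ranks stored images by feature distance to the query. Below is a toy in-memory version of that idea. It is not LIRE’s or Solr’s API; the `HistogramIndex` class, the 4-levels-per-channel quantisation and the L1 distance are my own illustrative choices, and the “images” are just lists of RGB tuples.

```python
from dataclasses import dataclass, field

RGB = tuple[int, int, int]


def color_histogram(pixels: list[RGB], bins: int = 4) -> list[float]:
    """Normalised joint RGB histogram with `bins` levels per channel (bins**3 buckets)."""
    if not pixels:
        raise ValueError("need at least one pixel")
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantise each 0..255 channel to 0..bins-1, then flatten to one bucket index.
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1
    return [count / len(pixels) for count in hist]


def l1_distance(h1: list[float], h2: list[float]) -> float:
    """Sum of absolute bucket differences: 0.0 for identical histograms, at most 2.0."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


@dataclass
class HistogramIndex:
    """Hypothetical in-memory index mapping image names to histogram features."""
    bins: int = 4
    features: dict[str, list[float]] = field(default_factory=dict)

    def add(self, name: str, pixels: list[RGB]) -> None:
        self.features[name] = color_histogram(pixels, self.bins)

    def query(self, pixels: list[RGB], k: int = 5) -> list[tuple[str, float]]:
        """Return the k stored images closest to the query, nearest first."""
        q = color_histogram(pixels, self.bins)
        ranked = sorted((l1_distance(q, f), name) for name, f in self.features.items())
        return [(name, dist) for dist, name in ranked[:k]]


index = HistogramIndex()
index.add("sunset", [(250, 120, 30)] * 90 + [(20, 20, 60)] * 10)
index.add("forest", [(30, 140, 40)] * 100)
index.add("sea", [(20, 60, 200)] * 100)
print(index.query([(240, 110, 40)] * 80 + [(25, 25, 60)] * 20, k=2))
```

A global colour histogram ignores where colours sit in the frame, so it finds images with a similar palette rather than similar content; that is roughly why richer descriptors and, later, neural-network features took over.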
On the internet
Aric Toler’s terrifyingly comprehensive guide to reverse image search has a forensic bent.