A basic duplicate image detection service using perceptual image hash functions and nearest neighbor search, implemented using faiss, fastapi, and imagehash

Last update: Nov 11, 2022

Overview

Duplicate Image Detection

Getting Started

Install dependencies: pip install -r requirements.txt
Run the service: python main.py

Testing

Test with pytest

How it Works

This system uses a perceptual hashing function, similar to Apple's CSAM Detection. Instead of generating image hashes using NeuralHash, it uses a difference hash (dHash), which is simpler and less computationally intensive as it doesn't require neural networks. Since we don't have the same privacy constraints as Apple, we will be using nearest neighbor searches to identify duplicate images.

Difference Hash

dHash is a perceptual hashing function that produces hash values that are resilient to image scaling, as well as changes in color, brightness, and aspect ratio [1]. There are 4 main steps for creating a difference hash for an image:

Convert to greyscale*
Resize image to (hash_size+1, hash_size)
Calculate horizontal gradient, reducing image size to (hash_size, hash_size)
Assign bits based on horizontal gradient values

*We convert the image to greyscale before resizing for optimal performance
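The four steps above can be sketched as follows. This is a minimal illustration, not the service's actual code. The function names (`load_grid`, `dhash_from_grid`, `bits_to_int`) are hypothetical, and the bit convention (1 when the right pixel is brighter than the left) is an assumption. `load_grid` relies on Pillow for steps 1 and 2 and imports it lazily, so the gradient logic can be tried on a hand-written grid without any dependencies.

```python
def load_grid(path, hash_size=8):
    """Steps 1-2: greyscale, then resize to (hash_size + 1) wide by hash_size tall.

    Assumes Pillow is installed; imported lazily so the rest runs without it.
    """
    from PIL import Image

    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    return [
        [img.getpixel((x, y)) for x in range(hash_size + 1)]
        for y in range(hash_size)
    ]


def dhash_from_grid(grid):
    """Steps 3-4: compare horizontally adjacent pixels and emit one bit per pair."""
    bits = []
    for row in grid:
        for left, right in zip(row, row[1:]):
            # Assumed convention: 1 if brightness increases left to right.
            bits.append(1 if right > left else 0)
    return bits


def bits_to_int(bits):
    """Pack the bit list into a single integer, most significant bit first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# Tiny hand-written grid standing in for a resized image (hash_size = 2).
grid = [
    [10, 20, 15],
    [30, 30, 40],
]
bits = dhash_from_grid(grid)
print(bits, bits_to_int(bits))
```

Each row of `hash_size + 1` pixels yields `hash_size` comparisons, which is why the image is resized one column wider than the hash.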

Nearest Neighbors

Image hashes that we want to check for duplicates against are stored in a binary index for fast and efficient nearest neighbor searches. We use Hamming distance as the metric for similarity between image hashes. For dHash, distances less than 10 (96.09% similarity, which corresponds to a 256-bit hash) likely indicate similar or duplicate images [1].
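To make the distance threshold concrete, here is a small brute-force sketch of a Hamming-distance lookup. It is a stand-in for the binary index, not the service's faiss-based implementation. The names `hamming_distance`, `similarity`, and `find_duplicates` are hypothetical, and hashes are assumed to be packed into Python integers.

```python
def hamming_distance(a, b):
    """Number of bit positions where two integer hashes differ."""
    return bin(a ^ b).count("1")


def similarity(distance, n_bits):
    """Fraction of matching bits for a hash of n_bits bits."""
    return 1 - distance / n_bits


def find_duplicates(query, stored, threshold=10):
    """Return (distance, name) pairs closer than threshold, nearest first.

    A linear scan over every stored hash; a binary index does the same
    comparison far more efficiently on large collections.
    """
    matches = [(hamming_distance(query, h), name) for name, h in stored.items()]
    return sorted((d, name) for d, name in matches if d < threshold)


# Toy 8-bit hashes, with a smaller threshold to match the shorter hash length.
stored = {"cat.png": 0b11110000, "dog.png": 0b00001111}
query = 0b11110001
result = find_duplicates(query, stored, threshold=3)
print(result)

# A distance of 10 on a 256-bit hash gives the 96.09% figure quoted above.
print(round(similarity(10, 256) * 100, 2))
```

The threshold scales with hash length, so a cutoff of 10 chosen for one hash size should be revisited if the hash size changes.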

References

[1] https://www.hackerfactor.com/blog/?/archives/529-Kind-of-Like-That.html


Owner

Matthew Podolak
