TorchPQ

TorchPQ is a Python library for Approximate Nearest Neighbor Search (ANNS) and Maximum Inner Product Search (MIPS) on GPU using the Product Quantization (PQ) algorithm. TorchPQ is implemented mainly with PyTorch, with some extra CUDA kernels to accelerate clustering, indexing and searching.

Install

First, install a version of the CuPy library that matches your CUDA version:

pip install cupy-cuda90
pip install cupy-cuda100
pip install cupy-cuda101
...

Then install TorchPQ

pip install torchpq

For a full list of cupy-cuda versions, see the CuPy Installation Guide.

Quick Start

IVFPQ

Inverted File Product Quantization (IVFPQ) is an ANN search algorithm designed for fast, efficient vector search over million-scale or even billion-scale vector sets. See the original paper for more details.
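To make the inverted-file idea concrete, here is a minimal pure-Python sketch. It is not TorchPQ's implementation: the function names are hypothetical, the coarse centroids are given rather than learned, and candidates are ranked with exact distances instead of PQ codes. It only shows how vectors are grouped by nearest coarse centroid and how searching more cells (n_probe) can recover a neighbor that a single-cell search misses.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_inverted_lists(data, coarse_centroids):
    """Assign every vector id to the cell of its nearest coarse centroid."""
    lists = {c: [] for c in range(len(coarse_centroids))}
    for vec_id, vec in enumerate(data):
        cell = min(range(len(coarse_centroids)),
                   key=lambda c: sq_dist(vec, coarse_centroids[c]))
        lists[cell].append(vec_id)
    return lists


def ivf_search(query, data, coarse_centroids, lists, n_probe, k):
    """Scan only the n_probe cells closest to the query, return top-k ids."""
    cells = sorted(range(len(coarse_centroids)),
                   key=lambda c: sq_dist(query, coarse_centroids[c]))[:n_probe]
    candidates = [vec_id for c in cells for vec_id in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, data[i]))[:k]


# Toy data: 3 cells, 5 vectors (illustrative values only).
coarse_centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
data = [[1.0, 1.0], [9.0, 1.0], [4.8, 0.0], [1.0, 9.0], [6.0, 0.0]]
lists = build_inverted_lists(data, coarse_centroids)

query = [5.1, 0.0]
top1_one_cell = ivf_search(query, data, coarse_centroids, lists, n_probe=1, k=1)
top1_two_cells = ivf_search(query, data, coarse_centroids, lists, n_probe=2, k=1)
print(lists, top1_one_cell, top1_two_cells)
```

In this toy case the true nearest neighbor (id 2) lives in a cell that is only the second closest to the query, so it is found with n_probe=2 but not with n_probe=1. This is the speed/recall trade-off that n_probe controls.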

Training

import torch
from torchpq import IVFPQ

n_data = 1000000 # number of data points
d_vector = 128 # dimensionality / number of features

index = IVFPQ(
  d_vector=d_vector,
  n_subvectors=64,
  n_cq_clusters=1024,
  n_pq_clusters=256,
  blocksize=128,
  distance="euclidean",
)

x = torch.randn(d_vector, n_data, device="cuda:0")
index.train(x)

Some important parameters need explanation:

  • d_vector: dimensionality of the input vectors. There are two constraints on d_vector: (1) it must be divisible by n_subvectors; (2) it must be a multiple of 4.*
  • n_subvectors: number of subquantizers. This is essentially the byte size of each quantized vector: 64 bytes per vector in the example above.**
  • n_cq_clusters: number of coarse quantizer clusters.
  • n_pq_clusters: number of product quantizer clusters. This is assumed to be 256 throughout the entire project and should NOT be changed.
  • blocksize: initial capacity assigned to each Voronoi cell of the coarse quantizer. n_cq_clusters * blocksize is the number of vectors that can be stored initially. When a cell reaches its capacity, it is expanded automatically. If you need to add vectors frequently, a larger blocksize is recommended.

Remember that any tensor that contains data points must have shape [d_vector, n_data].

* The second constraint could be removed in the future.
** The actual byte size is (n_subvectors + 9) bytes: 8 bytes for the ID and 1 byte for is_empty.
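The constraints and sizes described above can be checked with a few lines of arithmetic. The helper below is a hypothetical convenience function, not part of TorchPQ; it only encodes the rules stated in this section (divisibility by n_subvectors, multiple of 4, n_subvectors + 9 bytes per vector, initial capacity of n_cq_clusters * blocksize).

```python
def check_ivfpq_config(d_vector, n_subvectors, n_cq_clusters, blocksize):
    """Validate the documented constraints and estimate initial storage."""
    if d_vector % n_subvectors != 0:
        raise ValueError("d_vector must be divisible by n_subvectors")
    if d_vector % 4 != 0:
        raise ValueError("d_vector must be a multiple of 4")
    # One byte per subquantizer code, plus 8 bytes (ID) and 1 byte (is_empty).
    bytes_per_vector = n_subvectors + 9
    initial_capacity = n_cq_clusters * blocksize
    return {
        "subvector_dim": d_vector // n_subvectors,
        "bytes_per_vector": bytes_per_vector,
        "initial_capacity": initial_capacity,
        "initial_bytes": bytes_per_vector * initial_capacity,
    }


# Values from the training example above.
info = check_ivfpq_config(d_vector=128, n_subvectors=64,
                          n_cq_clusters=1024, blocksize=128)
print(info)
```

With the example settings, the initial capacity (131072 vectors) is smaller than n_data (1000000), so cells would be expanded automatically as vectors are added.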

Adding new vectors

ids = torch.arange(n_data, device="cuda")
index.add(x, input_ids=ids)

Each ID in ids must be a unique int64 (torch.long) value that identifies a vector in x. If input_ids is not provided, it is set to torch.arange(n_data, device="cuda") + previous_max_id.

Removing vectors

index.remove(ids)

index.remove(ids) virtually removes the vectors with the specified IDs from storage. IDs that do not exist are ignored.

Topk search

index.n_probe = 32
n_query = 10000
query = torch.randn(d_vector, n_query, device="cuda:0")
topk_values, topk_ids = index.topk(query, k=100)

  • When distance="inner", topk_values are the inner products of the queries and their top-k closest data points.
  • When distance="euclidean", topk_values are the negative squared L2 distances between the queries and their top-k closest data points.
  • When distance="manhattan", topk_values are the negative L1 distances between the queries and their top-k closest data points.
  • When distance="cosine", topk_values are the cosine similarities between the queries and their top-k closest data points.
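The four conventions share one property: a larger value always means a closer match, which is why distances are negated. The sketch below computes each score exactly for a single query/point pair in plain Python. The function name is illustrative, and the index itself returns approximations computed from quantized codes, so its values will generally differ slightly from these exact ones.

```python
import math


def score(query, point, distance):
    """Exact score under each convention; larger is always better."""
    if distance == "inner":
        return sum(q * p for q, p in zip(query, point))
    if distance == "euclidean":
        return -sum((q - p) ** 2 for q, p in zip(query, point))
    if distance == "manhattan":
        return -sum(abs(q - p) for q, p in zip(query, point))
    if distance == "cosine":
        dot = sum(q * p for q, p in zip(query, point))
        norm_q = math.sqrt(sum(q * q for q in query))
        norm_p = math.sqrt(sum(p * p for p in point))
        return dot / (norm_q * norm_p)
    raise ValueError(f"unknown distance: {distance}")


q = [1.0, 0.0]
p = [3.0, 4.0]
scores = {name: score(q, p, name)
          for name in ("inner", "euclidean", "manhattan", "cosine")}
print(scores)
```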

Encode and Decode

You can use IVFPQ as a vector codec for lossy compression of vectors.

code = index.encode(query)   # compression
reconstruction = index.decode(code) # reconstruction
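To illustrate why the compression is lossy, here is a toy product-quantization codec in plain Python. It is a sketch under simplifying assumptions, not TorchPQ's encode/decode: the codebooks are hand-written with 2 entries each (TorchPQ assumes 256 learned entries per subquantizer), there is no coarse quantizer, and the function names are hypothetical.

```python
def pq_encode(vector, codebooks):
    """Replace each subvector by the index of its nearest codebook entry."""
    sub_dim = len(vector) // len(codebooks)
    code = []
    for s, codebook in enumerate(codebooks):
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, entry))
                 for entry in codebook]
        code.append(dists.index(min(dists)))
    return code


def pq_decode(code, codebooks):
    """Concatenate the codebook entries selected by the code."""
    reconstruction = []
    for idx, codebook in zip(code, codebooks):
        reconstruction.extend(codebook[idx])
    return reconstruction


# Two subquantizers over 4-dimensional vectors (illustrative values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # subquantizer 0
    [[0.0, 1.0], [2.0, 2.0]],  # subquantizer 1
]
vector = [0.9, 1.2, 1.8, 2.1]
code = pq_encode(vector, codebooks)
reconstruction = pq_decode(code, codebooks)
print(code, reconstruction)
```

The reconstruction is the concatenation of the chosen centroids, not the original vector, so the quantization error is the price paid for storing one small integer per subvector.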

Save and Load

Most TorchPQ modules inherit from torch.nn.Module, which means you can save and load them just like a regular PyTorch model.

# Save to PATH
torch.save(index.state_dict(), PATH)
# Load from PATH
index.load_state_dict(torch.load(PATH))

Clustering

K-means

import torch
# On releases where torchpq.kmeans is missing, import from torchpq.clustering instead.
from torchpq.kmeans import KMeans

n_data = 1000000 # number of data points
d_vector = 128 # dimensionality / number of features
x = torch.randn(d_vector, n_data, device="cuda")

kmeans = KMeans(n_clusters=4096, distance="euclidean")
labels = kmeans.fit(x)

Notice that the tensor that contains data points must have shape [d_vector, n_data]. This is consistent throughout TorchPQ.

Multiple concurrent K-means

Sometimes we have multiple independent datasets that need to be clustered. Instead of running multiple KMeans sequentially, we can perform multiple k-means concurrently with MultiKMeans.

from torchpq.kmeans import MultiKMeans
import torch

n_data = 1000000
n_kmeans = 16
d_vector = 64
x = torch.randn(n_kmeans, d_vector, n_data, device="cuda")
kmeans = MultiKMeans(n_clusters=256, distance="euclidean")
labels = kmeans.fit(x)

Prediction with K-means

labels = kmeans.predict(x)
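Conceptually, k-means prediction assigns each point to its nearest centroid. The plain-Python sketch below shows that assignment step for the Euclidean case. It is only an illustration with a hypothetical function name: TorchPQ runs this on GPU, and the points here are stored one per row for readability, unlike TorchPQ's [d_vector, n_data] layout.

```python
def predict_labels(points, centroids):
    """Return, for every point, the index of the nearest centroid."""
    labels = []
    for point in points:
        dists = [sum((a - b) ** 2 for a, b in zip(point, centroid))
                 for centroid in centroids]
        labels.append(dists.index(min(dists)))
    return labels


centroids = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]]
toy_labels = predict_labels(points, centroids)
print(toy_labels)
```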

Benchmark

All experiments were performed with a Tesla T4 GPU.

SIFT1M

IVFPQ

Faiss is one of the best-known ANN search libraries and also has a GPU implementation of IVFPQ, so we ran comparison experiments against it.

How to read the plot:

  • the plot format follows the style of ann-benchmarks
  • X axis is recall, Y axis is queries/second
  • the closer to the top right corner the better
  • indexes with same parameters from different libraries have similar colors.
  • different libraries have different line styles (TorchPQ is solid line with circle marker, faiss is dashed line with triangle marker)
  • each node on a line represents a different n_probe, starting from 1 at the leftmost node and doubling at each following node (n_probe = 1, 2, 4, 8, 16, ...)
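The two quantities behind such a plot can be sketched in a few lines. The helpers below are hypothetical and not taken from the benchmark scripts: one generates the doubling n_probe schedule described above, the other computes a common definition of recall@k (the fraction of queries whose true nearest neighbor appears among the first k returned IDs), which may differ in detail from the metric used for the plot.

```python
def n_probe_schedule(max_n_probe):
    """n_probe values 1, 2, 4, ... up to and including max_n_probe."""
    values = []
    n_probe = 1
    while n_probe <= max_n_probe:
        values.append(n_probe)
        n_probe *= 2
    return values


def recall_at_k(retrieved_ids, true_nn_ids, k):
    """Fraction of queries whose true nearest neighbor is in the top k."""
    hits = sum(1 for got, true_id in zip(retrieved_ids, true_nn_ids)
               if true_id in got[:k])
    return hits / len(true_nn_ids)


schedule = n_probe_schedule(64)

# Three queries, two returned IDs each (illustrative values).
retrieved = [[3, 7], [5, 1], [2, 9]]
ground_truth = [7, 4, 2]
r1 = recall_at_k(retrieved, ground_truth, k=1)
r2 = recall_at_k(retrieved, ground_truth, k=2)
print(schedule, r1, r2)
```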

Summary:

  • For all the IVF16384 variants, TorchPQ outperforms faiss when n_probe > 16.
  • For IVF4096, TorchPQ has lower recall than faiss. This could be caused by not encoding residuals. An option to encode residuals will be added soon.

IVFPQ+R

GIST1M

coming soon...

Comments
  • torchPQ in a deep nets

    Hello,

    Thank you very much for sharing the project. I am interested in using torchPQ inside a deep net (implemented in PyTorch), calling torchPQ in each forward pass. Is this possible?

    Also, I saw https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html. Have you tried any comparison with other methods?

    Thank you!

    opened by Chen-Cai-OSU 9
  • About SM Size

    Hi, thanks very much for sharing this project. I have been looking for a package supporting batch kmeans for a very long period. Very glad to find that TorchPQ supports that (MultiKMeans). Many thanks again.

    But I have a question regarding the argument sm_size of initializing MultiKMeans. I know it is Shared Memory Size of CUDA. I am not familiar with CUDA programming and cannot figure out what the default value 48 * 256 * 4 means (the comment in the code does not mention this argument), even after I search on the internet. Could you briefly explain this here? Also, I guess increasing this value can speed up the computation? Am I right? Thanks for your time.

    opened by SEC4SR 6
  • Question about importing MultiKMeans

    Thanks for the nice work! When I tried to import MultiKMeans using the command shown in README.md, from torchpq.kmeans import MultiKMeans, it failed with ModuleNotFoundError: No module named 'torchpq.kmeans'. When I use from torchpq.clustering import MultiKMeans instead, the import succeeds. Is this the correct way, given that it differs from what README.md says?

    opened by nmynol 6
  • CUDA error distributed training

    Hi,

    TorchPQ runs well on a single gpu, but it fails when I switch to multi-gpus. The error occurs in the synchronize step. Do you have any suggestions for multi-gpu usage?

    Thanks!

    opened by Songweiping 4
  • Inquiry about the centroids of the K-means method

    Hi, firstly thanks for your wonderful work.

    I want to get the centroids of the clusters and visualize them. However, from your introduction, it seems I can only get the labels of all samples. Do you have any suggestions on how I can get the centroids?

    Thanks again for helping me out.

    opened by hellodrx 3
  • Import Error in Minibatch K means

    just tried this today

    Traceback (most recent call last):
      File "/datadrive/phd-projects/PiCIE/eval_minimal.py", line 18, in <module>
        from torchpq.clustering import MinibatchKMeans
      File "/anaconda/envs/py38_pytorch/lib/python3.8/site-packages/torchpq/__init__.py", line 11, in <module>
        from . import experimental
    ImportError: cannot import name 'experimental' from partially initialized module 'torchpq' (most likely due to a circular import) (/anaconda/envs/py38_pytorch/lib/python3.8/site-packages/torchpq/__init__.py)
    
    opened by mhamilton723 3
  • Imports on CPU-only machine fail

    Hello,

    I am trying to run your awesome CUDA-powered k-means. For testing purposes, I would like to make it runnable also on CPU, but I am getting errors during importing because of this: https://github.com/DeMoriarty/TorchPQ/blob/b8bbadf7915b8fead9a1b0f2dafa964b4058f26d/torchpq/kernels/default_device.py#L3

    which results in:

    CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
    

    Would you mind changing it to something like:

    if torch.cuda.is_available():
      __device = cp.cuda.Device().id
    else:
      __device = None
    

    or hiding the imports of get_default_device and set_default_device (they seem to be imported after checking torch.cuda.is_available() anyway, so it should be possible)?

    And also getting rid of or hiding this: https://github.com/DeMoriarty/TorchPQ/blob/b8bbadf7915b8fead9a1b0f2dafa964b4058f26d/torchpq/__init__.py#L22

    opened by Tomiinek 2
  • KMeans and MultiKMeans: CUDA_ERROR_INVALID_VALUE: invalid argument

    This issue seems to come up when the tensor length (n_data) is greater than 8388480.

    n_data = 8388481 # Works when n_data = 8388480
    n_kmeans = 5
    d_vector = 3
    x = torch.randn(n_kmeans, d_vector, n_data, device="cuda")
    kmeans = MultiKMeans(n_clusters=10, distance="euclidean")
    labels = kmeans.fit(x)
    

    Error message:

    ---------------------------------------------------------------------------
    CUDADriverError                           Traceback (most recent call last)
    <ipython-input-27-75b27aaadf4d> in <module>
          6 #x = x.float()
          7 kmeans = MultiKMeans(n_clusters=10, distance="euclidean")
    ----> 8 labels = kmeans.fit(x)
    
    ~/.local/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py in fit(self, data, centroids)
        432       for j in range(self.max_iter):
        433         # 1 iteration of clustering
    --> 434         maxsims, labels = self.get_labels(data, centroids) #top1 search
        435         new_centroids = self.compute_centroids(data, labels)
        436         error = self.calculate_error(centroids, new_centroids)
    
    ~/.local/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py in get_labels(self, data, centroids)
        323         #   dim=2
        324         # )
    --> 325         maxsims, labels = self.max_sim_cuda(
        326           data,
        327           centroids,
    
    ~/.local/lib/python3.8/site-packages/torchpq/kernels/MaxSimCuda.py in __call__(self, A, B, dim, mode)
        317       vals, inds = self._call_tt(A2, B2, dim)
        318     elif mode == "tn":
    --> 319       vals, inds = self._call_tn(A2, B2, dim)
        320     elif mode == "nt":
        321       vals, inds = self._call_nt(A2, B2, dim)
    
    ~/.local/lib/python3.8/site-packages/torchpq/kernels/MaxSimCuda.py in _call_tn(self, A, B, dim)
        213     blocks_per_grid = (l, math.ceil(n/128), math.ceil(m/128))
        214 
    --> 215     self._fn_tn(
        216       grid=blocks_per_grid,
        217       block=threads_per_block,
    
    cupy/_core/raw.pyx in cupy._core.raw.RawKernel.__call__()
    
    cupy/cuda/function.pyx in cupy.cuda.function.Function.__call__()
    
    cupy/cuda/function.pyx in cupy.cuda.function._launch()
    
    cupy_backends/cuda/api/driver.pyx in cupy_backends.cuda.api.driver.launchKernel()
    
    cupy_backends/cuda/api/driver.pyx in cupy_backends.cuda.api.driver.check_status()
    
    CUDADriverError: CUDA_ERROR_INVALID_VALUE: invalid argument
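
A possible explanation for the 8388480 threshold, offered as an assumption rather than a confirmed diagnosis: the traceback shows a launch grid of (l, ceil(n/128), ceil(m/128)), and CUDA limits the y and z grid dimensions to 65535 blocks. Since 65535 * 128 = 8388480, one more data point would require a 65536th block along that dimension. The sketch below only reproduces this arithmetic; the constant and function names are illustrative and not part of TorchPQ.

```python
import math

MAX_GRID_DIM_YZ = 65535  # CUDA limit for gridDim.y and gridDim.z
TILE = 128               # points covered per block, per the traceback


def blocks_needed(n_data, tile=TILE):
    """Number of blocks required along one grid dimension."""
    return math.ceil(n_data / tile)


def fits_in_grid(n_data):
    """True if the block count stays within the y/z grid limit."""
    return blocks_needed(n_data) <= MAX_GRID_DIM_YZ


largest_ok = MAX_GRID_DIM_YZ * TILE
print(largest_ok, fits_in_grid(8388480), fits_in_grid(8388481))
```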
    
    opened by mhudecheck 2
  • How to use MinibatchKMeans on multi GPUs machine?

    I'm a beginner. How can I use multiple GPUs with MinibatchKMeans?

    from torchpq.clustering import MinibatchKMeans
    import torch
    
    n_data = 10000 # number of data points
    d_vector = 128 # dimensionality / number of features
    x = torch.randn(d_vector, n_data, device="cuda")
    
    minibatch_kmeans = MinibatchKMeans(n_clusters = 128)
    minibatch_kmeans = torch.nn.DataParallel(minibatch_kmeans, device_ids=[0,1,2])
    n_iter = 10
    tol = 0.001
    for i in range(n_iter):
        x = torch.randn(d_vector, n_data, device="cuda")
        minibatch_kmeans.fit_minibatch(x)
        if minibatch_kmeans.error < tol:
            break
    

    I get the output below:

    Traceback (most recent call last):
      File "kmean_torch.py", line 14, in <module>
        minibatch_kmeans.fit_minibatch(x)
      File "/data/home/dl/anaconda3/envs/clip/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
        type(self).__name__, name))
    torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'fit_minibatch'
    
    opened by ZhangIceNight 1
  • readme does not run

    Hello, I'm trying to run your README example and I get __init__() got an unexpected keyword argument 'blocksize'. After removing blocksize, I then see __init__() got an unexpected keyword argument 'init_size'.

    opened by lucidrains 1
  • Error while importing torchpq.clustering

    I see the following error when I try to import torchpq.clustering.

    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    /tmp/ipykernel_7302/3376715144.py in <module>
    ----> 1 from torchpq import clustering
    
    ~/.local/lib/python3.8/site-packages/torchpq/__init__.py in <module>
         18 from .CustomModule import CustomModule
         19 
    ---> 20 topk = fn.Topk()
    
    ~/.local/lib/python3.8/site-packages/torchpq/fn/Topk.py in __init__(self)
          4 class Topk:
          5   def __init__(self):
    ----> 6     self._top32_cuda = TopkSelectCuda(
          7       tpb = 32,
          8       queue_capacity = 4,
    
    ~/.local/lib/python3.8/site-packages/torchpq/kernels/TopkSelectCuda.py in __init__(self, tpb, queue_capacity, buffer_size)
         23     self.buffer_size = buffer_size
         24 
    ---> 25     with open(get_absolute_path("kernels", "cuda", "topk_select.cu"),'r') as f: ###
         26       self.kernel = f.read()
         27 
    
    FileNotFoundError: [Errno 2] No such file or directory: '/home/XXXX/.local/lib/python3.8/site-packages/torchpq/kernels/cuda/topk_select.cu'
    

    Installation details:

    • Used pip to install cupy-cuda110,
    • pytorch version: 1.7.1
    • Cuda: 11.0

    However, I am able to run from torchpq.index import IVFPQIndex without any issue. Can you please help me fix this?

    opened by abhinavvs 1
Releases (v0.3.0.1)