A library for efficient similarity search and clustering of dense vectors.

Last update: Jan 08, 2023

Overview

Faiss

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

News

See CHANGELOG.md for detailed information about latest features.

Introduction

Faiss contains several methods for similarity search. It assumes that the instances are represented as vectors and are identified by an integer, and that the vectors can be compared with L2 (Euclidean) distances or dot products. Vectors that are similar to a query vector are those that have the lowest L2 distance or the highest dot product with the query vector. It also supports cosine similarity, since this is a dot product on normalized vectors.
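The relation between cosine similarity and dot products on normalized vectors can be checked with a few lines of numpy. This is a minimal sketch on random data, not Faiss code; the helper name `l2_normalize` and the array sizes are assumptions made for the illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit L2 norm (rows are assumed to be non-zero)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
database = rng.random((5, 4)).astype("float32")   # hypothetical database vectors
query = rng.random((1, 4)).astype("float32")      # hypothetical query vector

# Cosine similarity computed directly from its definition.
cosine = (database @ query.T).ravel() / (
    np.linalg.norm(database, axis=1) * np.linalg.norm(query)
)

# The same quantity as a plain dot product on normalized vectors.
db_n, q_n = l2_normalize(database), l2_normalize(query)
dot_on_normalized = (db_n @ q_n.T).ravel()

# On unit vectors, squared L2 distance equals 2 - 2 * dot product,
# so ranking by lowest L2 distance or by highest dot product agrees.
sq_l2 = ((db_n - q_n) ** 2).sum(axis=1)
```

Normalizing once before adding vectors to an index (and normalizing each query) is therefore enough to get cosine-similarity ranking from a dot-product search.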

Most of the methods, like those based on binary vectors and compact quantization codes, use only a compressed representation of the vectors and do not require keeping the original vectors. This generally comes at the cost of a less precise search, but these methods can scale to billions of vectors in main memory on a single server.
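A back-of-the-envelope calculation shows why compressed codes matter at this scale. The numbers below are assumptions chosen for illustration (128-dimensional float32 vectors and a hypothetical 16-byte code per vector); they are not figures from the Faiss documentation.

```python
def index_size_gib(n_vectors, bytes_per_vector):
    """Memory needed to store n_vectors entries, in GiB (payload only)."""
    return n_vectors * bytes_per_vector / 2**30

n = 1_000_000_000          # one billion vectors
d = 128                    # assumed dimensionality

raw_gib = index_size_gib(n, d * 4)    # float32: 4 bytes per component
coded_gib = index_size_gib(n, 16)     # hypothetical 16-byte compact code

print(f"raw vectors:   {raw_gib:.1f} GiB")
print(f"compact codes: {coded_gib:.1f} GiB")
```

Under these assumptions the raw vectors need roughly 32 times more memory than the codes, which is the difference between a single server's RAM being sufficient or not.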

The GPU implementation can accept input from either CPU or GPU memory. On a server with GPUs, the GPU indexes can be used as a drop-in replacement for the CPU indexes (e.g., replace IndexFlatL2 with GpuIndexFlatL2), and copies to/from GPU memory are handled automatically. Searches are faster, however, if both input and output remain resident on the GPU. Both single-GPU and multi-GPU usage are supported.

Building

The library is mostly implemented in C++, with optional GPU support provided via CUDA, and an optional Python interface. The CPU version requires a BLAS library. It compiles with a Makefile and can be packaged in a docker image. See INSTALL.md for details.

How Faiss works

Faiss is built around an index type that stores a set of vectors, and provides a function to search in them with L2 and/or dot product vector comparison. Some index types are simple baselines, such as exact search. Most of the available indexing structures correspond to various trade-offs with respect to

search time
search quality
memory used per index vector
training time
need for external data for unsupervised training
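The exact-search baseline mentioned above can be sketched in plain numpy to show what an index computes conceptually: store the vectors, then return the k closest ones for each query. This is an illustrative brute-force implementation under assumed sizes, not the Faiss implementation; the function name `exact_search_l2` is hypothetical.

```python
import numpy as np

def exact_search_l2(xb, xq, k):
    """Return (squared L2 distances, ids) of the k nearest rows of xb per query."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, evaluated for all pairs at once.
    sq_dist = (
        (xq ** 2).sum(axis=1, keepdims=True)
        - 2.0 * xq @ xb.T
        + (xb ** 2).sum(axis=1)
    )
    ids = np.argsort(sq_dist, axis=1)[:, :k]
    return np.take_along_axis(sq_dist, ids, axis=1), ids

rng = np.random.default_rng(1234)
xb = rng.random((1000, 16)).astype("float32")   # database vectors
xq = xb[:5] + 0.001                             # queries: perturbed copies of rows 0..4

D, I = exact_search_l2(xb, xq, k=4)
print(I)   # first column: id of the closest database vector for each query
```

This baseline has no training time and perfect search quality, but its search time and memory grow linearly with the database size; the other index types trade some of that quality for speed or memory.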

The optional GPU implementation provides what is likely (as of March 2017) the fastest exact and approximate (compressed-domain) nearest neighbor search implementation for high-dimensional vectors, fastest Lloyd's k-means, and fastest small k-selection algorithm known. The implementation is detailed here.

Full documentation of Faiss

The following are entry points for documentation:

the full documentation, including a tutorial, a FAQ and a troubleshooting section can be found on the wiki page
the doxygen documentation gives per-class information
to reproduce results from our research papers, Polysemous codes and Billion-scale similarity search with GPUs, refer to the benchmarks README. For Link and code: Fast indexing with graphs and compact regression codes, see the link_and_code README

Authors

The main authors of Faiss are:

Hervé Jégou initiated the Faiss project and wrote its first implementation
Matthijs Douze implemented most of the CPU Faiss
Jeff Johnson implemented all of the GPU Faiss
Lucas Hosseini implemented the binary indexes

Reference

Reference to cite when you use Faiss in a research paper:

@article{JDH17,
  title={Billion-scale similarity search with GPUs},
  author={Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:1702.08734},
  year={2017}
}

Join the Faiss community

For public discussion of Faiss or for questions, there is a Facebook group at https://www.facebook.com/groups/faissusers/

We monitor the issues page of the repository. You can report bugs, ask questions, etc.

License

Faiss is MIT-licensed.

Comments

faiss::gpu::runMatrixMult failure

The full log:

Faiss assertion err == CUBLAS_STATUS_SUCCESS failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with T = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at utils/MatrixMult.cu:141
Aborted (core dumped)

I have successfully run demo_ivfpq_indexing_gpu, so I think Faiss was installed correctly.
bug cant-repro

opened by hellolovetiger 36
No module named '_swigfaiss' for conda install
Summary

Platform

OS: macOS 10.13.4

Faiss version:

Faiss compilation options:

Running on :

[ ] CPU

Reproduction instructions

I installed with

conda install faiss-cpu -c pytorch

and got a No module named '_swigfaiss' error. I went into the faiss directory and tried to import again, but got the same error message. The troubleshooting section says that this error is caused by faiss not being compiled. Since I installed with conda, I suppose that is not the case?
bug install
opened by hsiaoma 29
make py: fatal error: Python.h: No such file or directory
I am also facing the same issue. I did the following steps:

Cloned FAISS

Updated makefile.inc with the anaconda python path and installed the necessary dependencies, such as libopenblas-dev, python-numpy and python-dev

make (after this step I cannot find any _swigfaiss.so file anywhere)

make py (gave the following error)

$ make py
g++ -I. -fPIC -m64 -Wall -g -O3 -msse4 -mpopcnt -fopenmp -Wno-sign-compare -std=c++11 -fopenmp -g -fPIC -fopenmp -I~/anaconda2/envs/faissenv/include/python2.7/ -I~/anaconda2/envs/faissenv/lib/python2.7/site-packages/numpy/core/include -shared -o python/_swigfaiss.so python/swigfaiss_wrap.cxx libfaiss.a /usr/lib/libopenblas.so.0
python/swigfaiss_wrap.cxx:154:21: fatal error: Python.h: No such file or directory
compilation terminated.
Makefile:84: recipe for target 'python/_swigfaiss.so' failed
make: *** [python/_swigfaiss.so] Error 1

I am able to run the C++ implementation; only the Python wrapper is failing. Let me know what I am setting wrong. Since _swigfaiss.so is not generated, what went wrong during make?

Originally posted by @Mahanteshambi in https://github.com/facebookresearch/faiss/issues/336#issuecomment-365565492
question cant-repro install
opened by daisy-belle 24

Faiss import error when run in virtualenv by using own built Faiss-python

Summary

I built faiss-core and faiss-python myself. I installed the Python package into my local virtual environment, tried to import faiss, and got an error. I checked the egg file, and it does contain _swigfaiss.so. I also checked the swigfaiss.py from the conda package; it still uses the old swig_import_helper. I am not sure whether the error is caused by the removal of that helper when python/swigfaiss.py is generated with swig, as in this commit:

https://github.com/facebookresearch/faiss/commit/7f5b22b0fff0882ce4afd93ce54cc2833a224909#diff-8cf6167d58ce775a08acafcfe6f40966

$ ls faiss-1.5.2-py3.6/faiss
__init__.py	__pycache__	_swigfaiss.so	swigfaiss.py

Platform

OS: centos 7

Faiss version: 1.5.2

Faiss compilation options:

 ./configure  --prefix=/usr --without-cuda --with-blas=/usr/lib64/libblas.so.3 --with-lapack=/usr/lib64/liblapack.so.3
make
sudo make install
make py
cd ~ && rm -rf env && python3 -m venv env
source env/bin/activate
cd ~/faiss && sudo make -C python install

Running on:

[X] CPU
[ ] GPU

Interface:

[ ] C++
[X] Python

Reproduction instructions

$ python
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)  [GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import faiss
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/midas/env/lib/python3.6/site-packages/faiss-1.5.2-py3.6.egg/faiss/__init__.py", line 18, in <module>
  File "/home/midas/env/lib/python3.6/site-packages/faiss-1.5.2-py3.6.egg/faiss/swigfaiss.py", line 13, in <module>
ImportError: cannot import name '_swigfaiss'

install

opened by billyean 23

PyTorch tensor / Faiss index interoperability
Summary: This diff allows for native usage of PyTorch tensors for Faiss indexes on both CPU and GPU. It is currently only implemented in this diff for things that inherit from faiss.Index, which covers the non-binary indices, and it patches the same functions on faiss.Index that were also covered by __init__.py for numpy interoperability.

There must be uniformity among the inputs: if any array input is a Torch tensor, then all array inputs must be Torch tensors. Similarly, if any array input is a numpy ndarray, then all array inputs must be numpy ndarrays.

If faiss.contrib.torch_utils is imported, it ensures that import faiss has already been performed to patch all of the functions using the base __init__.py numpy wrappers, and then patches the following functions again:

add add_with_ids assign train search remove_ids reconstruct reconstruct_n range_search update_vectors search_and_reconstruct sa_encode sa_decode

to allow usage of PyTorch CPU tensors, and additionally PyTorch GPU tensors if the index being used is on the GPU.

numpy functionality is still available when faiss.contrib.torch_utils is imported; we pass through to the original patched numpy function when we detect numpy inputs.

In addition, to allow for better (asynchronous) GPU usage without requiring the CPU to be involved, all of these functions which construct tensors/arrays for output now take optional arguments for storage (numpy or torch.Tensor) to be provided that will contain the output data. range_search is the only exception to this, as the size of the output data is indeterminate. The eventual GPU implementation will likely require the user to provide a maximum cap on the output size, and allow that to be passed instead. If the optional pre-allocated output values are presented by the user, they are used; otherwise, new return ndarray / Tensors are constructed as before and used for the return. If this feature were not provided on the GPU, then every execution would be completely serial as we would depend upon the CPU to allocate GPU memory before every operation. Instead, now this can function much like NN graph execution on the GPU, assuming that all of the data requirements are pre-allocated, so the execution will run at the full speed of the GPU and not be stalled sequentially launching kernels.

This diff also exposes the GpuResources shared_ptr object owned by a GPU index. This is required for pytorch GPU so that we can perform proper stream ordering in Faiss with respect to the current pytorch stream. So, Faiss indices now perform more or less as any NN operation in Torch does.

Note, however, that a Faiss index has its own setting on current device, and if the pytorch GPU tensor inputs are resident on a different device than what the Faiss index expects, a cross-device copy will be initiated. I may choose to make this an error in the future and require matching device to device.

This diff also found a bug when passing GPU data directly to train() for GpuIndexIVFFlat and GpuIndexIVFScalarQuantizer, as I guess we never tested passing GPU data directly to these functions before. GpuIndexIVFPQ was doing the right thing however.

The assign function is now also implemented on the GPU, and is now marked const to be in line with the search function.

Also added better checking of non-contiguous inputs for both Torch tensors and numpy ndarrays.

Updated the knn_gpu function with a base implementation always present that allows for usage of numpy arrays, which is overridden when torch_utils is imported to allow torch usage. This supports row/column major layout, float32/float16 data and int64/int32 indices for both numpy and torch.

Reviewed By: mdouze

Differential Revision: D24299400
CLA Signed fb-exported
opened by wickedfoo 21
GPU issue when installing from conda
Summary

I installed Faiss from conda (GPU version) and got ImportError: No module named 'swigfaiss'. Could you help me out? Did I forget anything?

Platform

OS: Ubuntu

Faiss version:

Faiss compilation options:

Running on :

[ ] CPU

[x] GPU

Reproduction instructions

GPU install
opened by hminle 20
Speedup exhaustive_L2sqr_blas for AVX2, ARM NEON and AVX512

Summary: Add a fused kernel for exhaustive_L2sqr_blas() call that combines a computation of dot product and the search for the nearest centroid. As a result, no temporary dot product values are written and read in RAM.
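The idea of fusing the dot-product computation with the nearest-centroid search can be illustrated in numpy. The sketch below is only an analogy under stated assumptions: it processes centroids in small blocks and keeps a running minimum, so the full distance matrix is never stored. It is not the SIMD kernel from this diff, and the function name and block size are hypothetical.

```python
import numpy as np

def nearest_centroid_fused(x, centroids, block=64):
    """Nearest centroid per row of x, keeping only a running minimum."""
    n = x.shape[0]
    best_dist = np.full(n, np.inf, dtype=np.float32)
    best_id = np.full(n, -1, dtype=np.int64)
    x_sq = (x ** 2).sum(axis=1)
    for start in range(0, centroids.shape[0], block):
        c = centroids[start:start + block]
        # Squared distances to this block only: ||x||^2 - 2 x.c + ||c||^2.
        dist = x_sq[:, None] - 2.0 * x @ c.T + (c ** 2).sum(axis=1)
        local = dist.argmin(axis=1)
        local_dist = dist[np.arange(n), local]
        # Fold the block result into the running minimum, then discard the block.
        better = local_dist < best_dist
        best_dist[better] = local_dist[better]
        best_id[better] = start + local[better]
    return best_dist, best_id

rng = np.random.default_rng(0)
x = rng.random((200, 8)).astype("float32")            # low-dimensional sub-vectors
centroids = rng.random((256, 8)).astype("float32")    # 256 centroids
best_dist, best_id = nearest_centroid_fused(x, centroids)
```

The result matches what a two-step approach (compute all dot products, then search for the minimum) would return, without writing the intermediate values for all centroids to memory.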

Significantly speeds up the training of PQx[1] indices for low-dimensional PQ vectors (1, 2, 4, 8), and the effect is higher for higher values of [1]. AVX512 provides additional overloads for dimensionality of 12 and 16.

The speedup is also beneficial for higher values of pq.cp.max_points_per_centroid (which is 256 by default).

Speeds up IVFPQ training as well.

The AVX512 kernel is not enabled, but I have seen it make training twice as fast as the AVX2 version. So, please feel free to use it by enabling AVX512 manually.

Differential Revision: D41166766
CLA Signed fb-exported

opened by alexanderguzhva 18
Does Faiss support searching from Disk?

I checked this issue [#552] and also this demo file. But the demo file was not about searching from disk; it was about how to save a trained index and load it into memory for searching. Does Faiss really support searching from disk? If it does, could you let me know where I can find out how to do it?
question

opened by sam3oh5 18
_swigfaiss_avx2.so may not be loaded properly in conda
Summary

When I install faiss via conda, IndexPQFastScan is slower than IndexPQ. It seems that AVX2 is not activated properly because _swigfaiss_avx2.so is not loaded correctly.

Platform

OS: Ubuntu 20.04 on AWS EC2. (ami-0e039c7d64008bd84, c5.large)

Faiss version: faiss-cpu 1.7.0 (pytorch/linux-64::faiss-cpu-1.7.0-py3.8_h2a577fa_0_cpu)

Installed from: conda install -c pytorch faiss-cpu

Faiss compilation options:

Running on:

[x] CPU

[ ] GPU

Interface:

[ ] C++

[x] Python

Reproduction instructions

I found that IndexPQFastScan is slower than IndexPQ for faiss 1.7.0 installed from conda. Here is the benchmark code.

import faiss
import numpy as np
import time

np.random.seed(123)

D = 128
N = 1000
X = np.random.random((N, D)).astype(np.float32)
M = 64
nbits = 4

pq = faiss.IndexPQ(D, M, nbits)
pq.train(X)
pq.add(X)

pq_fast = faiss.IndexPQFastScan(D, M, nbits)
pq_fast.train(X)
pq_fast.add(X)

t0 = time.time()
d1, ids1 = pq.search(x=X[:3], k=5)
t1 = time.time()
print(f"pq: {(t1 - t0) * 1000} msec")

t0 = time.time()
d2, ids2 = pq_fast.search(x=X[:3], k=5)
t1 = time.time()
print(f"pq_fast: {(t1 - t0) * 1000} msec")

assert np.allclose(ids1, ids2)

The result is:

pq: 0.4680156707763672 msec
pq_fast: 1.6791820526123047 msec

After investigating, the cause seems to be that _swigfaiss_avx2.so is not loaded correctly. If I rename _swigfaiss_avx2.so to _swigfaiss.so, the above code works as expected:

cd ~/anaconda/lib/python3.8/site-packages/faiss/
mv _swigfaiss.so _swigfaiss.so.bk
mv _swigfaiss_avx2.so _swigfaiss.so

Then the benchmark results in:

pq: 0.8258819580078125 msec
pq_fast: 0.07104873657226562 msec

Here, IndexPQFastScan becomes much faster.

The root cause seems to be that swigfaiss.py is somehow exactly the same as swigfaiss_avx2.py.

diff swigfaiss.py swigfaiss_avx2.py # same

If I understand correctly, swigfaiss_avx2.py must load _swigfaiss_avx2.so. But currently swigfaiss_avx2.py is the same as swigfaiss.py and loads _swigfaiss.so.
install
opened by matsui528 16
Indexing 1B vectors by creating smaller indexes on batches and merging them

Need guidance...

We'll have an application where we will stream a set of vectors (on the order of a billion). We cannot wait until we collect all the vectors to train an index (you recommend IMI at this scale). We are thinking of building indexes for smaller batches of vectors... once we have a batch ready, we could train the index from a sample, create an index for the batch and in the end merge all the indexes. I understand only IVF supports merging of indexes, wanted your thoughts on this approach.

Thanks
question GPU

opened by mvss80 16

CUDA 9 issue: results of GPU Index are not right?

1. The result of the GPU index is not the same as the CPU result, even though both use the same dataset and the same index type

import numpy as np
d = 64                           # dimension
nb = 100000                      # database size
nq = 10000                       # nb of queries
np.random.seed(1234)             # make reproducible
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000.
xq = np.random.random((nq, d)).astype('float32')
xq[:, 0] += np.arange(nq) / 1000.
#=================================================================
import faiss                   # make faiss available
index = faiss.IndexFlatL2(d)   # build the index
index.add(xb)                  # add vectors to the index
k = 4                          # we want to see 4 nearest neighbors
D, I = index.search(xq, k)     # actual search
print(I[-5:])                  # neighbors of the 5 last queries
print(D[-5:])

del index, D, I
#=================================================================
print("=================")
index = faiss.IndexFlatL2(d)   # build the index
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, index)
index.add(xb)                  # add vectors to the index
k = 4                          # we want to see 4 nearest neighbors
D, I = index.search(xq, k)     # actual search
print(I[-5:])                  # neighbors of the 5 last queries
print(D[-5:])

del index, D, I

exit(1)

The result is

[[ 9900 10500  9309  9831]
 [11055 10895 10812 11321]
 [11353 11103 10164  9787]
 [10571 10664 10632  9638]
 [ 9628  9554 10036  9582]]
[[ 6.53157043  6.97875977  7.00392151  7.01379395]
 [ 4.33526611  5.23693848  5.31942749  5.70327759]
 [ 6.07269287  6.57675171  6.61395264  6.7322998 ]
 [ 6.63751221  6.64874268  6.85787964  7.00964355]
 [ 6.21836853  6.45251465  6.54876709  6.58129883]]
=================
number of GPUs: 1
[[10500 10500  9831  9831]
 [10895 10895 10812 11321]
 [11103 11103  9787  9787]
 [10632 10632  9638  9638]
 [ 9628  9554  9582  9582]]
[[ 6.53156281  6.97874451  7.00393677  7.01376343]
 [ 4.33531189  5.23696899  5.31942749  5.70326233]
 [ 6.07269287  6.57672119  6.61393738  6.73226929]
 [ 6.63748169  6.64871216  6.85783386  7.00959778]
 [ 6.21837616  6.45251465  6.54875183  6.58128357]]

The results of the GPU index and the CPU index are not the same.

2. Duplicate items in the GPU result

As the results above show, the GPU output contains duplicate ids with different distances, e.g. [10500 10500 9831 9831].

Could someone tell me what is the problem and how to fix it, THX!
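A quick way to detect this symptom programmatically is to check each row of the returned id matrix for repeated entries. The helper below is a hypothetical diagnostic written for this report (it is not part of Faiss); it is applied to the id matrices printed above.

```python
import numpy as np

def rows_with_duplicate_ids(I):
    """Return the indices of result rows that contain the same id more than once."""
    sorted_ids = np.sort(I, axis=1)
    # After sorting, a repeated id shows up as a zero difference between neighbors.
    has_dup = (np.diff(sorted_ids, axis=1) == 0).any(axis=1)
    return np.flatnonzero(has_dup)

# Id matrices copied from the CPU and GPU outputs above.
cpu_I = np.array([
    [9900, 10500, 9309, 9831],
    [11055, 10895, 10812, 11321],
    [11353, 11103, 10164, 9787],
    [10571, 10664, 10632, 9638],
    [9628, 9554, 10036, 9582],
])
gpu_I = np.array([
    [10500, 10500, 9831, 9831],
    [10895, 10895, 10812, 11321],
    [11103, 11103, 9787, 9787],
    [10632, 10632, 9638, 9638],
    [9628, 9554, 9582, 9582],
])

print(rows_with_duplicate_ids(cpu_I))   # rows of the CPU result with repeated ids
print(rows_with_duplicate_ids(gpu_I))   # rows of the GPU result with repeated ids
```

Note that rows padded with a placeholder id (for example when fewer than k results are found) would also be reported by this check, so it is only meaningful when every query returns k valid neighbors.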

bug GPU

opened by DrLai12club 16

Tests fail to link: undefined symbol: testing::AssertionSuccess()

Summary

ld: error: undefined symbol: testing::AssertionSuccess()
>>> referenced by test_binary_flat.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_binary_flat.cpp.o:(BinaryFlat_accuracy_Test::TestBody())
>>> referenced by test_dealloc_invlists.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_dealloc_invlists.cpp.o:((anonymous namespace)::test_dealloc_invlists(char const*))
>>> referenced by test_ivfpq_codec.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_ivfpq_codec.cpp.o:(IVFPQ_codec_Test::TestBody())
>>> referenced 533 more times

ld: error: undefined symbol: testing::Message::Message()
>>> referenced by test_binary_flat.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_binary_flat.cpp.o:(BinaryFlat_accuracy_Test::TestBody())
>>> referenced by test_dealloc_invlists.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_dealloc_invlists.cpp.o:((anonymous namespace)::test_dealloc_invlists(char const*))
>>> referenced by test_ivfpq_codec.cpp
>>>               tests/CMakeFiles/faiss_test.dir/test_ivfpq_codec.cpp.o:(IVFPQ_codec_Test::TestBody())
>>> referenced 746 more times

Platform

OS: FreeBSD 13.1

Faiss version: 1.7.3

Installed from: FreeBSD port

opened by yurivict 0

Have max_codes consider only subset entries in IndexIVF search
Summary

Hey! Nice work with V1.7.3! I have a feature request.

Is there a way to have the max_codes stop criterion in IndexIVF searches only consider those entries that actually belong to a subset if one is specified? With the current implementation, to my understanding, when the number of scanned entries reaches max_codes, the search is stopped. However, for subset searches, this might happen before we have actually scanned max_codes entries in the subset, as even entries not in the subset count towards this limit.

Specifically, just as a proof of concept, all that would be necessary is to have scan_one_list not return list_size but instead return the number returned by scanner->scan_codes(list_size, codes, ids, simi, idxi, k); a few lines above. See here.

Obviously, that's just a quick hack only for the IVF index. 🙃 I assume in order to not break the current behavior, this would need to be controlled via an additional search parameter for all indices that have the same behavior currently.

Faiss version: 1.7.3 (19f7696deedc93615c3ee0ff4de22284b53e0243)

Running on:

[x] CPU

[ ] GPU

Interface:

[x] C++

[ ] Python
opened by wro-ableton 0
Scan exactly max_codes elements

Summary: The max_codes search parameter for IVF indexes limits the number of distance computations that are performed. Previously, the number of distance computations could exceed max_codes because inverted lists were scanned completely. This diff changed this to scan the beginning of the last inverted list to reach max_codes exactly.
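The difference between the two behaviours can be sketched with a small pure-Python model. This is an assumption-laden illustration of the description above, not the Faiss code: the function `scan_with_budget`, its `exact` flag and the list sizes are all hypothetical.

```python
def scan_with_budget(list_sizes, max_codes, exact=True):
    """Return how many codes are scanned in each probed inverted list.

    list_sizes: sizes of the probed inverted lists, in visiting order.
    exact=False models the earlier behaviour (each visited list is scanned
    completely, and the budget is checked between lists);
    exact=True truncates the last list so that the total equals max_codes.
    """
    scanned = []
    total = 0
    for size in list_sizes:
        if total >= max_codes:
            break
        n = min(size, max_codes - total) if exact else size
        scanned.append(n)
        total += n
    return scanned

sizes = [40, 70, 50]   # hypothetical inverted list sizes
old = scan_with_budget(sizes, max_codes=100, exact=False)
new = scan_with_budget(sizes, max_codes=100, exact=True)
print(old, sum(old))
print(new, sum(new))
```

In this model the earlier behaviour overshoots the budget whenever the limit falls in the middle of a list, while the new behaviour scans only the beginning of that last list.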

Differential Revision: D42367593
CLA Signed fb-exported

opened by mdouze 2
search slow after time.sleep
Platform

OS: Ubuntu 18.04.6 LTS

Faiss version: 1.5.3

Installed from: pip install faiss

Running on:

unknown

Interface:

[x] Python

build index: index = faiss.index_factory(self.d, "IDMap,Flat", faiss.METRIC_INNER_PRODUCT)
save index: faiss.write_index(index, index_path)
read_index: faiss_index = faiss.read_index(index_path)

Loop 100 times: with time.sleep(0.2) in the loop, some search calls take more than 20 ms; without time.sleep(0.2), the search time is steady.

#1
for i in range(0, 100):
    time.sleep(0.2)
    s_time = time.time()
    D, I = faiss_index.search(feature, 10)
    print(time.time() - s_time)

time(s): 0.033809662 0.001636744 0.001227379 0.000584841 0.000673294 0.001588345 0.000566244 0.025577307 0.000347614 0.000542164 0.00073719 0.000379801 0.000360966 0.000362158 0.000305891 0.000477791 0.000341892 0.000299692 0.027928352 0.000314474 0.000792265 0.000283957 0.000373125 0.000294924 0.000402451 0.000293255 0.000303745 0.000368595 0.000586987 0.0218997 0.000355959 0.000353813 0.000363588 0.000471115 0.000345945 0.00036335 0.000501871 0.000407934 0.000304461 0.025905132 0.000546932 0.000391483 0.000262737 0.000678778 0.000277281 0.000338316 0.000325441 0.000415325 0.000396729 0.000430822 0.025371552 0.000266314 0.000350237 0.000250816 0.000309944 0.000453234 0.000368357 0.000521183 0.000347614 0.000543833 0.000417709 0.051602125 0.000535011 0.00065589 0.00056839 0.000513554 0.000328541 0.000306129 0.00067091 0.00054121 0.00051856 0.00036788 0.02731204 0.000954151 0.00055337 0.000694036 0.000400543 0.000449419 0.00043416 0.000398636 0.000354052 0.000365257 0.033364534 0.000450373 0.000359058 0.004323483 0.000331402 0.000561714 0.000916481 0.000369787 0.000481844 0.000393391 0.000357866 0.025733948 0.000584841 0.000360727 0.000318527 0.000590801 0.000495434 0.000266552

#2
for i in range(0, 100):
    # time.sleep(0.2)
    s_time = time.time()
    D, I = faiss_index.search(feature, 10)
    print(time.time() - s_time)

time(s): 0.046122789 0.000362396 0.00031805 0.000313759 0.000325203 0.00032568 0.000318527 0.000315666 0.000306606 0.000331163 0.000328302 0.000318289 0.000317335 0.000319004 0.00031662 0.00031805 0.000314713 0.000321388 0.000338554 0.000316143 0.000310659 0.000306129 0.000330448 0.000365973 0.000255823 0.000335455 0.00032115 0.000276089 0.000339508 0.000310898 0.000317812 0.00032568 0.000333309 0.00030756 0.000320435 0.000317812 0.00032258 0.000314236 0.000326872 0.000309706 0.000336885 0.000307322 0.000322104 0.00032711 0.00032711 0.000305414 0.000321388 0.000312805 0.000305891 0.00031805 0.000324965 0.00030899 0.000313282 0.000323772 0.000318527 0.000325918 0.000321627 0.000317097 0.000327587 0.000323296 0.000310898 0.000326872 0.000333548 0.000359297 0.000272274 0.000305414 0.000329018 0.000317335 0.000315666 0.000325441 0.00031662 0.000314474 0.00033021 0.000314951 0.000320911 0.00033021 0.000313282 0.000319958 0.000318289 0.000332832 0.000331879 0.000303507 0.000319242 0.000331879 0.000316381 0.000310659 0.000353813 0.000301838 0.000322819 0.00031662 0.000310183 0.000318766 0.000341415 0.000312328 0.00033021 0.000317335 0.000331402 0.000324726 0.000315905 0.000311375
opened by safehumeng 0

GpuIndexFlatL2 doesn't produce distances for the last 8 queries

Platform

OS: Windows 10

Faiss version: 1.7.3

Installed from: Compiled using Visual Studio 17 2022

Faiss compilation options: Using MKL 2202.2.1

Cuda version: 12.0.0

GPU: GTX 1060

Running on:

[X] CPU
[X] GPU

Interface:

[X] C++
[ ] Python

Reproduction instructions

Using the test file linked below, faiss makes a CPU index and a GPU index, then runs a search using the first 1000 vectors of a 100000-vector database as queries. The code is copied directly from 1-Flat for the CPU portion and from 4-GPU for the GPU portion.

Consistently, the last 8 vectors from the distance matrix are all 0's. Whether querying 1000 elements, or 10000 elements, it's only the last 8 elements.

6-GPU-CPU.zip

Output of the program is as follows:

Building data
Make index
is_trained = true
ntotal = 100000
I (5 first results)=
    0   723   254   152   403    92   368  1129   673   571
    1   995   136   183   223   555   880   671     5    68
    2   312   253    29   124   148   112   718   713   260
    3   983   467    88   786   327   326   684   367  1053
    4   403   112   643   430   679   142   733   119   382
I (10 last results)=
  990   962  2284   863  1133  1683  1463  2339  1730  2228
  991  1026   995   540  1396   365  1348  1271  1861   975
  992   257   163   135  1489  1315   878  1017   219   777
  993  1331   210  1362   286   444  1329   608  1191   986
  994   155   134   631   469  1044   388  1042   766  1561
  995   511     1   664   991  1800   689    37   634   631
  996   770  1043   827  1264  1310  1828  1504  1535   876
  997  1288   920   742  1432   840  1174  1337  1041  1113
  998   689  1044   810  1229  2199  1448  2112  1888  1442
  999  1722   901  1161  1044  1251   505  1310   791   308
D (10 last results)=
      0 6.46885 6.56971 6.80382 7.19488 7.25274 7.44602 7.56737 7.75592  7.8215
      0 5.75124 5.96521 6.00626 6.17735  6.6787 6.74106 6.87712 6.89094 6.89425
      0 5.82659 6.08222 6.16805 6.19852 6.25793 6.56962 6.60474 6.71429 6.72893
      0 6.79663 6.83468  6.9018 6.90929 7.06563 7.07221 7.15147 7.18442 7.20781
      0 6.02754 6.53414 6.62136 6.73151 6.83076 6.85785 6.86768 6.87643 6.89012
      0 5.52238 5.78548 5.80803 5.96521 5.97704 6.12522  6.1321 6.18419 6.51028
      0 5.73736 6.25742 6.38132 6.43517 6.63315 6.70425 6.81538 6.84794  6.8531
      0 6.59953 6.84864 7.11777 7.33908 7.38752 7.39641 7.48399 7.52819 7.60603
      0 5.54166 5.68894 5.72082 5.98355 6.49582 6.52649  6.5502 6.66038 6.66049
      0 6.26311 6.37093 6.39842 6.62256 6.73258 6.82148 6.83769 6.84539 6.91491
is_trained = true
ntotal = 100000
I (5 first results)=
    0   723   254   152   403    92   368  1129   673   571
    1   995   136   183   223   555   880   671     5    68
    2   312   253    29   124   148   112   718   713   260
    3   983   467    88   786   327   326   684   367  1053
    4   403   112   643   430   679   142   733   119   382
I (10 last results)=
  990   962  2284   863  1133  1683  1463  2339  1730  2228
  991  1026   995   540  1396   365  1348  1271  1861   975
  992   257   163   135  1489  1315   878  1017   219   777
  993  1331   210  1362   286   444  1329   608  1191   986
  994   155   134   631   469  1044   388  1042   766  1561
  995   511     1   664   991  1800   689    37   634   631
  996   770  1043   827  1264  1310  1828  1504  1535   876
  997  1288   920   742  1432   840  1174  1337  1041  1113
  998   689  1044   810  1229  2199  1448  2112  1888  1442
  999  1722   901  1161  1044  1251   505  1310   791   308
D (10 last results)=
7.62939e-06 6.46885 6.56971 6.80381 7.19488 7.25273 7.44602 7.56738 7.75592  7.8215
      0 5.75124 5.96521 6.00626 6.17735 6.67871 6.74106 6.87711 6.89094 6.89426
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0
      0       0       0       0       0       0       0       0       0       0

cant-repro GPU

opened by JulianThijssen 1

Releases(v1.7.3)

v1.7.3(Nov 30, 2022)
Added

Sparse k-means routines and moved the generic kmeans to contrib

FlatDistanceComputer for all FlatCodes indexes

Support for fast accumulation of 4-bit LSQ and RQ

Product additive quantization support

Support per-query search parameters for many indexes + filtering by ids

write_VectorTransform and read_VectorTransform were added to the public API (by @AbdelrahmanElmeniawy)

Support for IDMap2 in index_factory by adding "IDMap2" to prefix or suffix of the input String (by @AbdelrahmanElmeniawy)

Support for merging all IndexFlatCodes descendants (by @AbdelrahmanElmeniawy)

Remove and merge features for IndexFastScan (by @AbdelrahmanElmeniawy)

Performance improvements: 1) specialized the AVX2 pieces of code speeding up certain hotspots, 2) specialized kernels for vector codecs (this can be found in faiss/cppcontrib)

Fixed

Fixed memory leak in OnDiskInvertedLists::do_mmap when the file is not closed (by @AbdelrahmanElmeniawy)

LSH correctly throws error for metric types other than METRIC_L2 (by @AbdelrahmanElmeniawy)

v1.7.2(Jan 10, 2022)
ADDED

Support LSQ on GPU (by @KinglittleQ)

Support for exact 1D kmeans (by @KinglittleQ)

LUT-based search for additive quantizers

Autogenerated Python docstrings from Doxygen comments

CHANGED

Cleanup of index_factory parsing

v1.7.1(May 28, 2021)

v1.7.0(Feb 13, 2021)

v1.6.5(Dec 21, 2020)

v1.6.4(Oct 22, 2020)
Features

Arbitrary dimensions per sub-quantizer now allowed for GpuIndexIVFPQ.

Brute-force kNN on GPU (bfKnn) now accepts int32 indices.

Faiss CPU now supports Windows. Conda packages are available from the nightly channel.

v1.6.3(Aug 17, 2020)

v1.5.3(Jun 24, 2019)
Bugfixes:

slow scanning of inverted lists (#836).

Features:

add basic support for 6 new metrics in CPU IndexFlat and IndexHNSW (#848);

add support for IndexIDMap/IndexIDMap2 with binary indexes (#780).

Misc:

throw python exception for OOM (#758);

make DistanceComputer available for all random access indexes;

gradually moving from long to int64_t for portability.

v1.5.2(May 30, 2019)
The license was changed from BSD+Patents to MIT.

Changelog:

propagates exceptions raised in sub-indexes of IndexShards and IndexReplicas;

support for searching several inverted lists in parallel (parallel_mode != 0);

better support for PQ codes where nbit != 8 or 16;

IVFSpectralHash implementation: spectral hash codes inside an IVF;

6-bit per component scalar quantizer (4 and 8 bit were already supported);

combinations of inverted lists: HStackInvertedLists and VStackInvertedLists;

configurable number of threads for OnDiskInvertedLists prefetching (including 0=no prefetch);

more test and demo code compatible with Python 3 (print with parentheses);

refactored benchmark code: data loading is now in a single file.

v1.5.1(May 30, 2019)
Changelog:

a MatrixStats object, which reports useful statistics about a dataset;

an option to round coordinates during k-means optimization;

an alternative option for search in HNSW;

moved stats() and imbalance_factor() from IndexIVF to InvertedLists object;

range search is now available for IVFScalarQuantizer;

support for direct uint_8 codec in ScalarQuantizer;

renamed IndexProxy to IndexReplicas (now ;

better support for PQ code assignment with external index;

support for IMI2x16 (4B virtual centroids!);

support for k = 2048 search on GPU (instead of 1024);

most CUDA mem alloc failures now throw exceptions instead of terminating on an assertion;

support for renaming on-disk inverted lists;

interrupt computations with interrupt signal (ctrl-C) in python;

simplified build system (with --with-cuda/--with-cuda-arch options);

updated example Dockerfile;

conda packages now depend on the cudatoolkit packages, which fixes some interference with pytorch. Consequently, faiss-gpu should now be installed with conda install -c pytorch faiss-gpu cudatoolkit=10.0.

v1.5.0(May 30, 2019)
Changelog:

GpuIndexBinaryFlat

IndexBinaryHNSW

v1.4.0(Aug 31, 2018)
Faiss 1.4.0

Features:

automatic tracking of C++ references in Python

non-Intel platforms supported -- some functions optimized for ARM

override nprobe for concurrent searches

support for floating-point quantizers in binary indexes

Bug fixes:

no more segfaults in python (I know it's the same as the first feature but it's important!)

fix GpuIndexIVFFlat issues for float32 with 64 / 128 dims

fix sharding of flat indexes on GPU with index_cpu_to_gpu_multiple

The Python interface of Faiss closely mimics the C++ interface. This means that all C++ functions, objects, fields and methods are visible and accessible in Python. This is done thanks to SWIG, which automatically generates Python classes from the C++ headers. The downside is that this low-level access meant that, before this release, there was no automatic tracking of C++ references in Python. For example:

index = IndexIVFFlat(IndexFlatL2(10), 10, 100)

would crash. Python does not know that the IndexFlatL2 is referenced by the IndexIVFFlat, so the garbage collector deallocates the IndexFlatL2 while the IndexIVFFlat still references it. In Faiss 1.4.0, we added code to all such constructors that adds a Python-level reference to the object and prevents deallocation. With this upgrade, there should be no more crashes in pure Python; if you encounter one, please report it right away as an issue.
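
The mechanism can be illustrated without Faiss itself. The following is a minimal sketch, assuming hypothetical stand-in classes (Quantizer, RawIndex and TrackedIndex are not Faiss classes). A weak reference plays the role of the raw C++ pointer that Python's garbage collector cannot see, and the referenced_objects attribute name is an illustrative choice.

```python
import gc
import weakref


class Quantizer:
    """Hypothetical stand-in for a sub-index such as IndexFlatL2."""

    def __init__(self, d):
        self.d = d


class RawIndex:
    """Hypothetical stand-in for a wrapper without reference tracking.

    The weak reference mimics a raw C++ pointer: it does not keep the
    quantizer alive on the Python side.
    """

    def __init__(self, quantizer):
        self._quantizer_ptr = weakref.ref(quantizer)

    def quantizer_alive(self):
        return self._quantizer_ptr() is not None


class TrackedIndex(RawIndex):
    """Same wrapper, plus a Python-level reference added in the constructor."""

    def __init__(self, quantizer):
        super().__init__(quantizer)
        # Holding the object here prevents it from being deallocated
        # for as long as the index itself is alive.
        self.referenced_objects = [quantizer]


raw = RawIndex(Quantizer(10))          # the temporary quantizer has no owner
tracked = TrackedIndex(Quantizer(10))  # the index holds a reference
gc.collect()

raw_alive = raw.quantizer_alive()
tracked_alive = tracked.quantizer_alive()
print(raw_alive, tracked_alive)
```

In the untracked case the temporary quantizer is gone by the time the index is used, which in a real C++ binding would leave a dangling pointer. In the tracked case it lives as long as the index does.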

Faiss was developed on 64-bit x86 platforms, Linux and Mac OS. There were quite a few locations in the code that shamelessly assumed that they were compiled with SSE support. Faiss 1.4.0 is portable to other hardware: it has pure C++ code for all operations, and SSE/AVX is only enabled if the appropriate macros are set. This was tested on an ARM platform, and a few operations were also optimized with ARM SIMD instructions (in utils_simd.cpp).

To compile on a non-x86 platform, you will need to provide a BLAS library (OpenBLAS works for aarch64) and remove x86-specific flags from the makefile.inc (manually for now). Faiss is not portable to compilers other than g++/clang, though.

The search-time parameters like nprobe for IndexIVF are set in the index object. What if you want to perform concurrent searches from several threads with different search parameters? This was not possible so far. Now there is an IVFSearchParameters object that can override the parameters set at the object level. See tests/test_params_override.cpp.
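
The idea of a per-call override can be sketched with a toy index in plain Python. Everything below is hypothetical and only illustrates the pattern: ToyIVFIndex and SearchParams are not Faiss classes, and the one-dimensional bucketing merely stands in for inverted lists. The point is that the search reads its parameters from an optional argument and never mutates the shared index, so threads with different settings do not interfere.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchParams:
    """Hypothetical per-call parameters (stand-in for IVFSearchParameters)."""
    nprobe: int


class ToyIVFIndex:
    """Hypothetical index: non-negative 1-D values bucketed into unit-wide lists."""

    def __init__(self, nlist, nprobe=1):
        self.nlist = nlist
        self.nprobe = nprobe  # object-level default shared by all threads
        self.lists = [[] for _ in range(nlist)]

    def add(self, xs):
        for x in xs:
            self.lists[min(int(x), self.nlist - 1)].append(x)

    def search(self, q, params: Optional[SearchParams] = None):
        # The override is read per call; the shared index is never modified.
        nprobe = params.nprobe if params is not None else self.nprobe
        # Visit the nprobe lists whose centers are closest to the query.
        order = sorted(range(self.nlist), key=lambda i: abs(i + 0.5 - q))
        candidates = [x for i in order[:nprobe] for x in self.lists[i]]
        return min(candidates, key=lambda x: abs(x - q), default=None)


index = ToyIVFIndex(nlist=4)
index.add([0.2, 1.05, 2.9])

# Two concurrent searches on the same index with different nprobe values.
with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(index.search, 0.95, SearchParams(nprobe=1))
    accurate = pool.submit(index.search, 0.95, SearchParams(nprobe=2))
    results = (fast.result(), accurate.result())

print(results, index.nprobe)
```

With nprobe=1 only the closest list is scanned and the true nearest neighbor in the adjacent list is missed. With nprobe=2 it is found, and the object-level default stays unchanged throughout.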

Faiss' support for binary indexes is recent, and not many index types are supported yet. To work around this, we added IndexBinaryFromFloat, a binary index that wraps around any floating-point index. This makes it possible, for example, to use an IndexHNSW as a quantizer for an IndexBinaryIVF. See tests/test_index_binary_from_float.py.
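
One way to see why a floating-point index can stand in for a binary one: if each bit is expanded to a float coordinate of -1 or +1, the squared L2 distance between two expanded vectors is exactly four times their Hamming distance, so the neighbor ranking is preserved. The sketch below checks that identity in plain Python. The helper names are hypothetical, and the ±1 mapping is an assumption about how such a wrapper can be built, not a description of the IndexBinaryFromFloat source.

```python
def bits_to_floats(code: bytes) -> list:
    """Expand each bit of a binary code into a float coordinate in {-1.0, +1.0}."""
    return [2.0 * ((byte >> bit) & 1) - 1.0 for byte in code for bit in range(8)]


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def squared_l2(u, v) -> float:
    """Squared Euclidean distance between two float vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


# Two 16-bit binary vectors stored as 2 bytes each.
a = bytes([0b10110010, 0b00001111])
b = bytes([0b10010011, 0b11110000])

h = hamming(a, b)
l2 = squared_l2(bits_to_floats(a), bits_to_floats(b))
print(h, l2)
```

Each differing bit contributes (1 - (-1))^2 = 4 to the squared L2 distance and each matching bit contributes 0, which is where the factor of four comes from.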

We also fixed a few bugs that correspond to github issues.
v1.3.0(Jul 12, 2018)
Features:

Support for binary indexes (IndexBinaryFlat, IndexBinaryIVF)

Support fp16 encoding in scalar quantizer

Support for deduplication in IndexIVFFlat

Support for index serialization

Bugs:

Fix MMAP bug for normal indexes

Fix propagation of io_flags in read func

Fix k-selection for CUDA 9

Fix race condition in OnDiskInvertedLists

v1.2.1(Mar 1, 2018)
Features

Support for on-disk storage of IndexIVF data.

C API

extended tutorial to GPU indexes


Owner

Meta Research

GitHub Repository faiss.ai




