Neighbourhood Retrieval with Distance Correlation

Assign pseudo class labels to data points in the latent space.

NNDC is a slim wrapper around FAISS.
NNDC transforms the input space so that the FAISS inner-product index (IndexFlatIP) computes the distance correlation.
NNDC supports KernelPCA (non-linear PCA) for dimensionality reduction.
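
One way to see how an inner product can yield a distance correlation is sketched below. It treats the coordinates of each vector as samples, double-centres the matrix of pairwise coordinate distances, flattens it, and scales it to unit norm; the inner product of two such embeddings then equals the squared sample distance correlation. This is an illustrative assumption, not necessarily the exact transform NNDC applies, and `dc_embed` and `distance_correlation` are hypothetical helper names.

```python
import numpy as np


def _double_centred_distances(x):
    """Pairwise |x_i - x_j| matrix with row, column and grand means removed."""
    x = np.asarray(x, dtype=np.float64)
    a = np.abs(x[:, None] - x[None, :])
    return a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()


def dc_embed(x):
    """Hypothetical embedding: flattened, unit-norm double-centred distance matrix."""
    flat = _double_centred_distances(x).ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat


def distance_correlation(x, y):
    """Sample distance correlation computed directly from its definition."""
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    dcov2 = (a * b).mean()                          # squared distance covariance
    dvar = np.sqrt((a * a).mean() * (b * b).mean())  # product of distance std-devs
    return float(np.sqrt(max(dcov2, 0.0) / dvar)) if dvar > 0 else 0.0


rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = rng.normal(size=16)

# The plain inner product of the embeddings matches the squared distance correlation.
inner = float(dc_embed(x) @ dc_embed(y))
direct = distance_correlation(x, y) ** 2
print(np.isclose(inner, direct))
```

In this sketch the embedding of a `d`-dimensional vector has `d * d` entries, which is one reason a dimensionality-reduction step such as KernelPCA is a natural companion.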

Installation

pip install git+https://github.com/The-Learning-Machines/nndc

Usage

import nndc
import numpy as np

dim = 128  # Dimensionality of the input vectors
n = 20000  # Number of vectors to index

index = nndc.DCIndex(
    in_dim=dim, # Dimensionality of the input vectors
    threshold=0.2, # How far from a vector its neighbourhood extends
    out_dim=32, # Dimensionality of the vectors after PCA (only needed if using PCA)
    use_pca=True, # Use KernelPCA
    verbose=True,
    kernel="rbf" # Use Radial Basis Function as the kernel for KernelPCA
)

# Generate random data
np.random.seed(1234)
xb = np.random.random((n, dim)).astype('float32')
xb[:, 0] += np.arange(n) / 1000.
xq = np.random.random((100, dim)).astype('float32')
xq[:, 0] += np.arange(100) / 1000.

# Fit KernelPCA
index.add_pca_training_data(xb[:1000, :])
index.fit_pca()

# Add vectors to the Index
vector_ids = np.arange(xb.shape[0])
index.add(xb, vector_ids)

# Build a neighbourhood graph
index.build_neighbourhood()

# Query the neighbours of vector with ID=0
neighbour_ids, neighbour_similarity = index[0]
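
The last two steps above build a neighbourhood graph and read back the neighbours of one vector. As a rough, NumPy-only illustration of what a threshold-based neighbourhood can look like, the sketch below keeps every other vector whose inner-product similarity reaches a cut-off. It is not NNDC's implementation: `threshold_neighbourhoods` is a hypothetical helper, and treating the threshold as a minimum similarity is an assumption (the library may define `threshold` differently, for example as a distance).

```python
import numpy as np


def threshold_neighbourhoods(embeddings, threshold):
    """Brute-force neighbourhoods: for each row, the other rows with similarity >= threshold.

    Assumes `embeddings` holds unit-norm rows, so the inner product is a similarity.
    Returns {row_id: (neighbour_ids, neighbour_similarities)}, sorted by similarity.
    """
    sims = embeddings @ embeddings.T
    np.fill_diagonal(sims, -np.inf)  # a vector is not its own neighbour
    graph = {}
    for i, row in enumerate(sims):
        ids = np.flatnonzero(row >= threshold)
        order = np.argsort(-row[ids])  # most similar first
        graph[i] = (ids[order], row[ids][order])
    return graph


# Three unit-norm toy vectors in 2-D.
toy = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
graph = threshold_neighbourhoods(toy, threshold=0.7)
neighbour_ids, neighbour_similarity = graph[0]
print(neighbour_ids, neighbour_similarity)
```

A brute-force similarity matrix like this costs memory quadratic in the number of vectors, so it is only suitable for small collections; an index such as FAISS avoids materialising it.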


Owner

The Learning Machines
