A fast hierarchical dimensionality reduction algorithm.

Last update: Dec 12, 2022

h-NNE: Hierarchical Nearest Neighbor Embedding

h-NNE is a general-purpose dimensionality reduction algorithm, like t-SNE and UMAP. It stands out for its speed, its simplicity, and the fact that it produces a hierarchy of clusterings as part of its projection process. The algorithm is inspired by the FINCH clustering algorithm. For more information on the structure of the algorithm, please see our corresponding paper on arXiv:

M. Saquib Sarfraz*, Marios Koulakis*, Constantin Seibold, Rainer Stiefelhagen. Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction. CVPR 2022.

More details are available in the project documentation.
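
To give a feel for the first-neighbor idea that FINCH is built on, here is a minimal pure-Python sketch of a single partition step: every point is linked to its nearest neighbor, and the connected components of those links become clusters. This is an illustration only, not the h-NNE or FINCH implementation; the function name `first_neighbor_partition` is hypothetical, and the brute-force Euclidean neighbor search is an assumption made for brevity.

```python
import math


def first_neighbor_partition(points):
    """Group points by linking each one to its first (nearest) neighbor.

    Illustrative sketch only; requires at least two points.
    """
    n = len(points)
    # Index of the nearest other point for every point (brute force, O(n^2)).
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]))
        for i in range(n)
    ]

    # Union-find over the links i -- nearest[i];
    # the connected components of these links are the clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)

    # Relabel component roots as consecutive cluster ids.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(first_neighbor_partition(points))  # [0, 0, 1, 1, 1]
```

Repeating such a step on the cluster means gives progressively coarser partitions, which is roughly how a hierarchy of clusterings can arise. For the actual construction used by h-NNE, refer to the paper.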

Installation

The project is available on PyPI. To install, run:

pip install hnne

How to use h-NNE

The HNNE class implements the common methods of the sklearn interface.

Simple projection example

import numpy as np
from hnne import HNNE

data = np.random.random(size=(1000, 256))

hnne = HNNE(dim=2)
projection = hnne.fit_transform(data)

Projecting on new points

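hnne = HNNE()
projection = hnne.fit_transform(data)

# new_data is a placeholder for illustration: any array with the same
# number of features as the data used for fitting (256 here).
new_data = np.random.random(size=(100, 256))
new_data_projection = hnne.transform(new_data)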

Demos

The following demo notebooks are available:

Citation

If you make use of this project in your work, we would appreciate it if you cite the h-NNE paper:

@inproceedings{hnne,
  title={Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction},
  author={M. Saquib Sarfraz and Marios Koulakis and Constantin Seibold and Rainer Stiefelhagen},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

If you make use of the clustering properties of the algorithm, please also cite:

@inproceedings{finch,
  author={M. Saquib Sarfraz and Vivek Sharma and Rainer Stiefelhagen},
  title={Efficient Parameter-free Clustering Using First Neighbor Relations},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={8934--8943},
  year={2019}
}

Owner

Marios Koulakis
