Clustering is a popular approach to detect patterns in unlabeled data

Last update: Nov 11, 2022

Visual Clustering

Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar points. Visual Clustering is a different way of clustering points in 2-dimensional space, inspired by how humans "visually" cluster data. The algorithm is based on trained neural networks that perform instance segmentation on plotted data.
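To make the idea of clustering plotted data more concrete, here is a minimal sketch of how 2-D points could be drawn onto a binary image and how each point could be traced back to its pixel, so that a per-pixel segment label can be carried back to the points. This is an illustration only: the function names and the resolution value are hypothetical and are not taken from the package, and the segmentation network itself is not included.

```python
import numpy as np

def rasterize_points(points, resolution=256):
    """Draw 2-D points onto a square binary image (1 = pixel holds at least one point)."""
    points = np.asarray(points, dtype=float)
    # histogram2d bins the two coordinates into a resolution x resolution grid
    counts, x_edges, y_edges = np.histogram2d(points[:, 0], points[:, 1], bins=resolution)
    image = (counts > 0).astype(np.uint8)
    return image, x_edges, y_edges

def pixel_indices(points, x_edges, y_edges):
    """Return the (row, col) of the pixel that each point was drawn in."""
    points = np.asarray(points, dtype=float)
    n_rows = len(x_edges) - 1
    n_cols = len(y_edges) - 1
    # Same binning rule as histogram2d: half-open bins, with the last edge included
    rows = np.clip(np.searchsorted(x_edges, points[:, 0], side="right") - 1, 0, n_rows - 1)
    cols = np.clip(np.searchsorted(y_edges, points[:, 1], side="right") - 1, 0, n_cols - 1)
    return rows, cols

# Hypothetical toy data, used only to exercise the two helpers
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))
image, x_edges, y_edges = rasterize_points(points, resolution=64)
rows, cols = pixel_indices(points, x_edges, y_edges)
```

Given a label image of the same shape as `image`, indexing it with `rows` and `cols` would yield one label per input point.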

For more details, see the accompanying paper, "Clustering Plotted Data by Image Segmentation" (arXiv preprint arXiv:2110.05187). If you use this work, please cite it as follows:

@article{naous2021clustering,
  title={Clustering Plotted Data by Image Segmentation},
  author={Naous, Tarek and Sarkar, Srinjay and Abid, Abubakar and Zou, James},
  journal={arXiv preprint arXiv:2110.05187},
  year={2021}
}

Installation

pip install visual-clustering

Usage

The algorithm can be used in the same way as the classical clustering algorithms in scikit-learn.
First, import the VisualClustering class and create an instance of it:

from visual_clustering import VisualClustering

model = VisualClustering(median_filter_size=1, max_filter_size=1)

The parameters median_filter_size and max_filter_size are set to 1 by default.
You can experiment with different values to see what works best for your dataset.
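The parameter names suggest window sizes for a median filter and a maximum filter applied to the plotted image. That reading is an assumption based on the names alone; the source does not spell it out. Under that assumption, the sketch below uses scipy.ndimage on a small hand-made binary image to show what such filters generally do: a median filter removes isolated pixels, a maximum filter thickens regions, and a size of 1 leaves the image unchanged.

```python
import numpy as np
from scipy import ndimage

# Small hand-made binary image (hypothetical data, for illustration only)
image = np.zeros((9, 9), dtype=np.uint8)
image[3:6, 3:6] = 1   # a dense 3x3 blob
image[1, 7] = 1       # a single isolated pixel

# A 3x3 median filter drops the isolated pixel and keeps the blob's center
median_3 = ndimage.median_filter(image, size=3)

# A 3x3 maximum filter grows every nonzero region by one pixel in each direction
max_3 = ndimage.maximum_filter(image, size=3)

# With a window of size 1, each pixel is its own neighborhood, so nothing changes
median_1 = ndimage.median_filter(image, size=1)
max_1 = ndimage.maximum_filter(image, size=1)
```

If the package's parameters do behave this way, larger values would trade fine detail for robustness to stray points, which is why trying a few values per dataset is reasonable.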

Let's create a simple synthetic dataset of blobs.

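import matplotlib.pyplot as plt
from sklearn import datasets

# make_blobs returns a (samples, labels) tuple; data[0] holds the 2-D points
data = datasets.make_blobs(n_samples=50000, centers=6, random_state=23, center_box=(-30, 30))
plt.scatter(data[0][:, 0], data[0][:, 1], s=1, c='black')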

To cluster the dataset, use the fit function of the model:

predictions = model.fit(data[0])
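Before plotting, it can help to check how many clusters were found and how large each one is. The sketch below does this with NumPy on a stand-in array, so it runs without the package installed. It assumes that the predictions are a 1-D numeric array with one label per sample and that outliers, if any, are labeled -1; the -1 convention is an assumption and the stand-in values are made up.

```python
import numpy as np

# Stand-in for the array returned by model.fit(...); the values are invented
predictions = np.array([0, 0, 1, 2, 2, 2, -1, 1, 0, -1], dtype=float)

# Count how many samples carry each label
labels, counts = np.unique(predictions.astype(int), return_counts=True)
cluster_sizes = {int(label): int(count) for label, count in zip(labels, counts)}

# Assumption: non-negative labels are clusters and -1 marks outliers
n_clusters = int(np.sum(labels >= 0))
n_outliers = cluster_sizes.get(-1, 0)

print(cluster_sizes)
print("clusters:", n_clusters, "outliers:", n_outliers)
```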

Visualizing the results

You can visualize the results using matplotlib as you would normally do with classical clustering algorithms:

import matplotlib.pyplot as plt
from itertools import cycle, islice
import numpy as np

# One color per cluster label, cycling through the palette if needed
palette = ["#000000", '#377eb8', '#ff7f00', '#4daf4a', '#f781bf', '#a65628', '#984ea3']
colors = np.array(list(islice(cycle(palette), int(max(predictions) + 1))))
# Black color for outliers (if any)
colors = np.append(colors, ["#000000"])
plt.scatter(data[0][:, 0], data[0][:, 1], s=10, color=colors[predictions.astype('int8')])

You can run this code in a Colab notebook:
https://colab.research.google.com/drive/1DcZXhKnUpz1GDoGaJmpS6VVNXVuaRmE5?usp=sharing

Dependencies

Make sure that you have the following libraries installed:

transformers 4.15.0
scipy 1.4.1
tensorflow 2.7.0
keras 2.7.0
numpy 1.19.5
opencv-python (imported as cv2) 4.1.2
scikit-image (imported as skimage) 0.18.3

Contact

Owner

Tarek Naous
