DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN

Last update: Dec 08, 2022

Overview

Amazon DenseClus

DenseClus is a Python module for clustering mixed-type data using UMAP and HDBSCAN. Because it accepts both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.

Installation

python3 -m pip install Amazon-DenseClus

Usage

DenseClus requires a pandas DataFrame as input, with both numerical and categorical columns. All preprocessing and extraction are done under the hood: just call fit and then retrieve the clusters.

from denseclus import DenseClus

# df is a pandas DataFrame with both numerical and categorical columns
clf = DenseClus(
    umap_combine_method="intersection_union_mapper",
)
clf.fit(df)

# Retrieve the clusters
print(clf.score())
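
The snippet above assumes that a suitable df already exists. As a minimal sketch, a small mixed-type DataFrame could be built as shown below. The column names, value ranges, and row count are hypothetical and chosen only for illustration. The split by dtype at the end is a quick way to inspect the input and is not a description of how DenseClus preprocesses data internally.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
rng = np.random.default_rng(seed=42)
n_rows = 200

df = pd.DataFrame(
    {
        # Numerical features
        "age": rng.integers(18, 80, size=n_rows),
        "income": rng.normal(50_000, 15_000, size=n_rows),
        # Categorical features
        "segment": rng.choice(["retail", "business", "student"], size=n_rows),
        "region": rng.choice(["north", "south", "east", "west"], size=n_rows),
    }
)

# Check that the frame really contains both kinds of columns.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("numerical:", numerical_cols)
print("categorical:", categorical_cols)
```

A frame like this can then be passed to clf.fit(df) as in the usage snippet above.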

Examples

A hands-on example with an overview of how to use DenseClus is currently available in the form of a Jupyter Notebook.

References

@article{mcinnes2018umap-software,
  title={UMAP: Uniform Manifold Approximation and Projection},
  author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
  journal={The Journal of Open Source Software},
  volume={3},
  number={29},
  pages={861},
  year={2018}
}

@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}


Owner

Amazon Web Services - Labs
