The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.

Overview

CRAN_Status_Badge DOI CRAN RStudio mirror downloads CRAN RStudio mirror downloads

FCPS

Fundamental Clustering Problems Suite

The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning published in [Thrun and Stier 2021].

Table of contents

  1. Description
  2. Installation
  3. Tutorial Examples
  4. Manual
  5. Use cases
  6. Additional information
  7. References

Description

The Fundamental Clustering Problems Suite (FCPS) summaries over sixty state-of-the-art clustering algorithms available in R language. An important advantage is that the input and output of clustering algorithms is simplified and consistent in order to enable users a swift execution of cluster analysis. By combining mirrored-density plots (MD plots) with statistical testing FCPS provides a tool to investigate the cluster tendency quickly prior to the cluster analysis itself [Thrun 2020]. Common clustering challenges can be generated with arbitrary sample size [Thrun and Ultsch 2020a]. Additionally, FCPS sums 26 indicators with the goal to estimate the number of clusters up and provides an appropriate implementation of the clustering accuracy for more than two clusters [Thrun and Ultsch 2021]. A subset of methods was used in a benchmarking of algorithms published in [Thrun and Ultsch 2020b].

Installation

Installation using CRAN

Install automatically with all dependencies via

install.packages("FCPS",dependencies = T)

# Optionally, for the automatic installation
# of all suggested packages:
Suggested=c("kernlab", "cclust", "dbscan", "kohonen",
            "MCL", "ADPclust", "cluster", "DatabionicSwarm",
            "orclus", "subspace", "flexclust", "ABCanalysis",
            "apcluster", "pracma", "EMCluster", "pdfCluster", "parallelDist",
            "plotly", "ProjectionBasedClustering", "GeneralizedUmatrix",
            "mstknnclust", "densityClust", "parallel", "energy", "R.utils",
            "tclust", "Spectrum", "genie", "protoclust", "fastcluster", 
			"clusterability", "signal", "reshape2", "PPCI", "clustrd", "smacof",
			"rgl", "prclust", "dendextend",
            "moments", "prabclus", "VarSelLCM", "sparcl", "mixtools",
            "HDclassif", "clustvarsel", "knitr", "rmarkdown")

for(i in 1:length(Suggested)) {
  if (!requireNamespace(Suggested[i], quietly = TRUE)) {
    message(paste("Installing the package", Suggested[i]))
    install.packages(Suggested[i], dependencies = T)
  }
}

Installation using Github

Please note, that dependecies have to be installed manually.

remotes::install_github("Mthrun/FCPS")

Installation using R Studio

Please note, that dependecies have to be installed manually.

Tools -> Install Packages -> Repository (CRAN) -> FCPS

Tutorial Examples

The tutorial with several examples can be found on in the vignette on CRAN:

https://cran.r-project.org/web/packages/FCPS/vignettes/FCPS.html

Manual

The full manual for users or developers is available here: https://cran.r-project.org/web/packages/FCPS/FCPS.pdf

Use Cases

Cluster Analysis of High-dimensional Data

The package FCPS provides a clear and consistent access to state-of-the-art clustering algorithms:

library(FCPS)
data("Leukemia")
Data=Leukemia$Distance
Classification=Leukemia$Cls
ClusterNo=6
CA=ADPclustering(Leukemia$DistanceMatrix,ClusterNo)
Cls=ClusterRenameDescendingSize(CA$Cls)
ClusterPlotMDS(Data,Cls,main =Leukemia’,Plotter3D =plotly’)
ClusterAccuracy(Cls,Classification)
[1] 0.9963899

Generating Typical Challenges for Clustering Algorithms

Several clustering challenge can be generated with an arbitrary sample size:

set.seed(600)
library(FCPS)
DataList=ClusterChallenge("Chainlink", SampleSize = 750,
PlotIt=TRUE)
Data=DataList$Chainlink
Cls=DataList$Cls
> ClusterCount(Cls)
$CountPerCluster
$NumberOfClusters
$ClusterPercentages
[1] 377 373
[1] 2
[1] 50.26667 49.73333

Cluster-Tendency

For many applications, it is crucial to decide if a dataset possesses cluster structures:

library(FCPS)
set.seed(600)
DataList=ClusterChallenge("Chainlink",SampleSize = 750)
Data=DataList$Chainlink
Cls=DataList$Cls
library(ggplot2)
ClusterabilityMDplot(Data)+theme_bw()

Estimation of Number of Clusters

The “FCPS” package provides up to 26 indicators to determine the number of clusters:

library(FCPS)
set.seed(135)
DataList=ClusterChallenge("Chainlink",SampleSize = 900)
Data=DataList$Chainlink
Cls=DataList$Cls
Tree=HierarchicalClustering(Data,0,"SingleL")[[3]]
ClusterDendrogram(Tree,4,main="Single Linkage")
MaximumNumber=7
clsm <- matrix(data = 0, nrow = dim(Data)[1], ncol = MaximumNumber)
for (i in 2:(MaximumNumber+1)) {
clsm[,i-1] <- cutree(Tree,i)
}
out=ClusterNoEstimation(Data, ClsMatrix = clsm,
MaxClusterNo = MaximumNumber, PlotIt = TRUE)

Additional information

Authors website http://www.deepbionics.org/
License GPL-3
Dependencies R (>= 3.5.0)
Bug reports https://github.com/Mthrun/FCPS/issues

References

  1. [Thrun/Stier, 2021] Thrun, M. C., & Stier, Q.: Fundamental Clustering Algorithms Suite SoftwareX, Vol. 13(C), pp. 100642. doi 10.1016/j.softx.2020.100642, 2021.
  2. [Thrun, 2020] Thrun, M. C.: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plot, in Archambault, D., Nabney, I. & Peltonen, J. (eds.), Machine Learning Methods in Visualisation for Big Data, DOI 10.2312/mlvis.20201102, The Eurographics Association, Norrköping , Sweden, May, 2020.
  3. [Thrun/Ultsch, 2020a] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief,Vol. 30(C), pp. 105501, DOI 10.1016/j.dib.2020.105501 , 2020.
  4. [Thrun/Ultsch, 2021] Thrun, M. C., and Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, \doi{10.1016/j.artint.2020.103237}, 2021.
  5. [Thrun/Ultsch, 2020b] Thrun, M. C., & Ultsch, A. : Using Projection based Clustering to Find Distance and Density based Clusters in High-Dimensional Data, Journal of Classification, \doi{10.1007/s00357-020-09373-2}, Springer, 2020.
You might also like...
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

On Generating Extended Summaries of Long Documents

ExtendedSumm This repository contains the implementation details and datasets used in On Generating Extended Summaries of Long Documents paper at the

Automatic voice-synthetised summaries of latest research papers on arXiv

PaperWhisperer PaperWhisperer is a Python application that keeps you up-to-date with research papers. How? It retrieves the latest articles from arXiv

Code for ACL 21: Generating Query Focused Summaries from Query-Free Resources

marge This repository releases the code for Generating Query Focused Summaries from Query-Free Resources. Please cite the following paper [bib] if you

We evaluate our method on different datasets (including ShapeNet, CUB-200-2011, and Pascal3D+) and achieve state-of-the-art results, outperforming all the other supervised and unsupervised methods and 3D representations, all in terms of performance, accuracy, and training time. This is the unofficial code of  Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

This repository is related to an Arabic tutorial, within the tutorial we discuss the common data structure and algorithms and their worst and best case for each, then implement the code using Python.

Data Structure and Algorithms with Python This repository is related to the Arabic tutorial here, within the tutorial we discuss the common data struc

tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.
tsai is an open-source deep learning package built on top of Pytorch & fastai focused on state-of-the-art techniques for time series classification, regression and forecasting.

Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai

Comments
  • Mo gclustering2 model based clustering

    Mo gclustering2 model based clustering

    IMPORTANT UPDATE: MoGclustering renamed to ModelBasedClustering MoG Clustering is now defined es Mixture of Gaussians based on EM This is a change contrary to the book [Thrun, 2018]! Additionally density based clustering methods added.

    opened by Mthrun 1
  • Missing function

    Missing function

    `

    install.packages("FCPS") Installing package into ‘/home/roc/R/x86_64-pc-linux-gnu-library/4.0’ (as ‘lib’ is unspecified) trying URL 'https://cloud.r-project.org/src/contrib/FCPS_1.2.7.tar.gz' Content type 'application/x-gzip' length 2859121 bytes (2.7 MB) ================================================== downloaded 2.7 MB

    • installing source package ‘FCPS’ ... ** package ‘FCPS’ successfully unpacked and MD5 sums checked ** using staged installation ** R ** data *** moving datasets to lazyload DB ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path
    • DONE (FCPS)

    The downloaded source packages are in ‘/tmp/Rtmpzb0Mdh/downloaded_packages’

    data('Hepta')

    out=HierarchicalClusterDists(as.matrix(dist(Hepta$Data)),ClusterNo=7) Error in HierarchicalClusterDists(as.matrix(dist(Hepta$Data)), ClusterNo = 7) : could not find function "HierarchicalClusterDists" `

    opened by technocrat 3
Releases(1.2.3)
  • 1.2.3(Jun 23, 2020)

    Many conventional clustering algorithms are provided in this package with consistent input and output, which enables the user to try out algorithms swiftly. Additionally, 26 statistical approaches for the estimation of the number of clusters as well as the the mirrored density plot (MD-plot) of clusterability are implemented. Moreover, the fundamental clustering problems suite (FCPS) offers a variety of clustering challenges any algorithm should handle when facing real world data, see Thrun, M.C., Ultsch A.: "Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems" (2020), Data in Brief, DOI:10.1016/j.dib.2020.105501.

    Source code(tar.gz)
    Source code(zip)
Owner
Dr. rer. nat. Michael C. Thrun, promovierte 2017 an der Philipps-Universität Marburg am Lehrstuhl für Neuroinformatik unter Prof. Dr. rer. nat. Alfred Ultsch.
git《FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding》(CVPR 2021) GitHub: [fig8]

FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding (CVPR 2021) This repo contains the implementation of our state-of-the-art fewshot ob

233 Dec 29, 2022
To SMOTE, or not to SMOTE?

To SMOTE, or not to SMOTE? This package includes the code required to repeat the experiments in the paper and to analyze the results. To SMOTE, or not

Amazon Web Services 1 Jan 03, 2022
Code for Boundary-Aware Segmentation Network for Mobile and Web Applications

BASNet Boundary-Aware Segmentation Network for Mobile and Web Applications This repository contain implementation of BASNet in tensorflow/keras. comme

Hamid Ali 8 Nov 24, 2022
基于Paddle框架的fcanet复现

fcanet-Paddle 基于Paddle框架的fcanet复现 fcanet 本项目基于paddlepaddle框架复现fcanet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待 参考项目: frazerlin-fcanet 数据准备 本项目已挂

QuanHao Guo 7 Mar 07, 2022
PyTorch implementation of MuseMorphose, a Transformer-based model for music style transfer.

MuseMorphose This repository contains the official implementation of the following paper: Shih-Lun Wu, Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-

Yating Music, Taiwan AI Labs 142 Jan 08, 2023
Scalable training for dense retrieval models.

Scalable implementation of dense retrieval. Training on cluster By default it trains locally: PYTHONPATH=.:$PYTHONPATH python dpr_scale/main.py traine

Facebook Research 90 Dec 28, 2022
🐾 Semantic segmentation of paws from cute pet images (PyTorch)

🐾 paw-segmentation 🐾 Semantic segmentation of paws from cute pet images 🐾 Semantic segmentation of paws from cute pet images (PyTorch) 🐾 Paw Segme

Zabir Al Nazi Nabil 3 Feb 01, 2022
Deploying PyTorch Model to Production with FastAPI in CUDA-supported Docker

Deploying PyTorch Model to Production with FastAPI in CUDA-supported Docker A example FastAPI PyTorch Model deploy with nvidia/cuda base docker. Model

Ming 68 Jan 04, 2023
PrimitiveNet: Primitive Instance Segmentation with Local Primitive Embedding under Adversarial Metric (ICCV 2021)

PrimitiveNet Source code for the paper: Jingwei Huang, Yanfeng Zhang, Mingwei Sun. [PrimitiveNet: Primitive Instance Segmentation with Local Primitive

Jingwei Huang 47 Dec 06, 2022
This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

This tool converts a Nondeterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA)

Quinn Herden 1 Feb 04, 2022
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Zhiliang Peng 2.3k Jan 04, 2023
Multi-Anchor Active Domain Adaptation for Semantic Segmentation (ICCV 2021 Oral)

Multi-Anchor Active Domain Adaptation for Semantic Segmentation Munan Ning*, Donghuan Lu*, Dong Wei†, Cheng Bian, Chenglang Yuan, Shuang Yu, Kai Ma, Y

Munan Ning 36 Dec 07, 2022
Automatically creates genre collections for your Plex media

Plex Auto Genres Plex Auto Genres is a simple script that will add genre collection tags to your media making it much easier to search for genre speci

Shane Israel 63 Dec 31, 2022
Based on Stockfish neural network(similar to LcZero)

MarcoEngine Marco Engine - interesnaya neyronnaya shakhmatnaya set', kotoraya ispol'zuyet metod samoobucheniya(dostizheniye khoroshoy igy putem proboy

Marcus Kemaul 4 Mar 12, 2022
Training vision models with full-batch gradient descent and regularization

Stochastic Training is Not Necessary for Generalization -- Training competitive vision models without stochasticity This repository implements trainin

Jonas Geiping 32 Jan 06, 2023
[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

[AAAI 2022] Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding Official Pytorch implementation of Negative Sample Matter

Multimedia Computing Group, Nanjing University 69 Dec 26, 2022
Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021)

Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021) Tensorflow implementation of Bridging the Gap between Label- and Reference-ba

huangqiusheng 8 Jul 13, 2022
CLDF dataset derived from Robbeets et al.'s "Triangulation Supports Agricultural Spread" from 2021

CLDF dataset derived from Robbeets et al.'s "Triangulation Supports Agricultural Spread" from 2021 How to cite If you use these data please cite the o

Digital Linguistics 2 Dec 20, 2021
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020) Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, A

roei_herzig 24 Jul 07, 2022