Fuzzy Overclustering (FOC)

Overview

Fuzzy Overclustering (FOC)

In real-world datasets, we need consistent annotations between annotators to give a certain ground-truth label. However, in many applications these consistent annotations can not be given due to issues like intra- and interobserver variability. We call these inconsistent label fuzzy. Our method Fuzzy Overclustering overclusters the data and can therefore handle these fuzzy labels better than Out-of-the-Box Semi-Supervised Methods.

More details are given in the accpeted full paper at https://doi.org/10.3390/s21196661 or in the preprint https://arxiv.org/abs/2012.01768

The main idea is illustrated below. The graphic and caption are taken from the original work.

main idea of paper

Illustration of fuzzy data and overclustering -- The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (top). For fuzzy data determining a decision boundary is difficult because of intermediate datapoints between the classes (middle). These fuzzy datapoints can often not be easily sorted into one consistent class between annotators. If you overcluster the data, you get smaller but more consistent substructures in the fuzzy data (bottom). The images illustrate possible examples for \certain data (cat & dog) and \fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff or tuft by around half of the annotators each. The left and right plankton image were consistently annotated as their respective class.

Installation

We advise to use docker for the experiments. We recommend a python3 container with tesnorflow 1.14 preinstalled. Additionally the following commands need to be executed:

apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev libgl1-mesa-glx

After this ensure that the requirements from requirements.txt are installed. The most important packages are keras, scipy and opencv.

Usage

The parameters are given in arguments.yaml with their description. Most of the parameters can be left at the default value. Especially the dataset, batch size and epoch related parameters are imported.

As a rule of thumb the following should be applied:

  • overcluster_k = 5-6 * the number of classes
  • batch_size = repetition * overcluster_k * 2-3

You need to define three directories for the execution with docker:

  • DATASET_ROOT, this folder contains a folder with the dataset name. This folder contains a trainand val folder. It needs a folder unlabeled if the parameter unlabeled_data is used. Each folder contains subfolder with the given classes.
  • LOG_ROOT, inside a subdiretory logs all experimental results will be stored with regard to the given IDs and a time stamp
  • SRC_ROOT root of the this project source code

The DOCKER_IMAGE is the above defined image.

You can visualize the results with tensorboard --logdir . from inside the log_dir

Example Usages

bash % test pipeline running docker run -it --rm -v :/data-ssd -v :/data1 -v :/src -w="/src" python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v :/data-ssd -v :/data1 -v :/home -w="/home" python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 ">
% test container
docker run -it --rm -v 
               
                :/data-ssd -v 
                
                 :/data1   -v 
                 
                  :/src -w="/src" 
                  
                    bash


% test pipeline running
docker run -it --rm -v 
                   
                    :/data-ssd -v 
                    
                     :/data1 -v 
                     
                      :/src -w="/src" 
                      
                        python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1 % training FOC-Light docker run -it --rm -v 
                       
                        :/data-ssd -v 
                        
                         :/data1 -v 
                         
                          :/home -w="/home" 
                          
                            python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 % training FOC (no warmup) % needs multiple GPUs or very large ones (change num gpu to 1 in this case) docker run -it --rm -v 
                           
                            :/data-ssd -v 
                            
                             :/data1 -v 
                             
                              :/home -w="/home" 
                              
                                python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3 
                              
                             
                            
                           
                          
                         
                        
                       
                      
                     
                    
                   
                  
                 
                
               
Keras-1D-NN-Classifier

Keras-1D-NN-Classifier This code is based on the reference codes linked below. reference 1, reference 2 This code is for 1-D array data classification

Jae-Hoon Shim 6 May 18, 2021
This codebase proposes modular light python and pytorch implementations of several LiDAR Odometry methods

pyLiDAR-SLAM This codebase proposes modular light python and pytorch implementations of several LiDAR Odometry methods, which can easily be evaluated

Kitware, Inc. 208 Dec 16, 2022
An open-source project for applying deep learning to medical scenarios

Auto Vaidya An open source solution for creating end-end web app for employing the power of deep learning in various clinical scenarios like implant d

Smaranjit Ghose 18 May 29, 2022
[AAAI-2022] Official implementations of MCL: Mutual Contrastive Learning for Visual Representation Learning

Mutual Contrastive Learning for Visual Representation Learning This project provides source code for our Mutual Contrastive Learning for Visual Repres

winycg 48 Jan 02, 2023
Learning multiple gaits of quadruped robot using hierarchical reinforcement learning

Learning multiple gaits of quadruped robot using hierarchical reinforcement learning We propose a method to learn multiple gaits of quadruped robot us

Yunho Kim 17 Dec 11, 2022
Python code to fuse multiple RGB-D images into a TSDF voxel volume.

Volumetric TSDF Fusion of RGB-D Images in Python This is a lightweight python script that fuses multiple registered color and depth images into a proj

Andy Zeng 845 Jan 03, 2023
Code to reproduce the results for Compositional Attention

Compositional-Attention This repository contains the official implementation for the paper Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal 58 Nov 30, 2022
Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.

TargetCLIP- official pytorch implementation of the paper Image-Based CLIP-Guided Essence Transfer This repository finds a global direction in StyleGAN

Hila Chefer 221 Dec 13, 2022
Official Repsoitory for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]

Mish: Self Regularized Non-Monotonic Activation Function BMVC 2020 (Official Paper) Notes: (Click to expand) A considerably faster version based on CU

Xa9aX ツ 1.2k Dec 29, 2022
RCD: Relation Map Driven Cognitive Diagnosis for Intelligent Education Systems

RCD: Relation Map Driven Cognitive Diagnosis for Intelligent Education Systems This is our implementation for the paper: Weibo Gao, Qi Liu*, Zhenya Hu

BigData Lab @USTC 中科大大数据实验室 10 Oct 16, 2022
A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery

A Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery This repository is the official implementati

Aatif Jiwani 42 Dec 08, 2022
GuideDog is an AI/ML-based mobile app designed to assist the lives of the visually impaired, 100% voice-controlled

Guidedog Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma GuideDog is an AI/ML-based mobile app designed to assist the lives of the visua

Kyuhee Jo 5 Nov 24, 2021
JugLab 33 Dec 30, 2022
Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

28 Aug 29, 2022
Delving into Localization Errors for Monocular 3D Object Detection, CVPR'2021

Delving into Localization Errors for Monocular 3D Detection By Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang. Intr

XINZHU.MA 124 Jan 04, 2023
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

9.6k Jan 06, 2023
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

ColossalAI-Examples This repository contains examples of training models with Co

HPC-AI Tech 185 Jan 09, 2023
Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

Ashish Patel 11 Dec 16, 2022
Planar Prior Assisted PatchMatch Multi-View Stereo

ACMP [News] The code for ACMH is released!!! [News] The code for ACMM is released!!! About This repository contains the code for the paper Planar Prio

Qingshan Xu 127 Dec 31, 2022