Interactive dimensionality reduction for large datasets

BlosSOM 🌼

BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimensional datasets, and produce great-looking 2-dimensional visualizations.

WARNING: BlosSOM is still under development; some stuff may not work right, but things will magically improve without notice. Feel free to open an issue if something looks wrong.

BlosSOM was developed at the MFF UK Prague, in cooperation with IOCB Prague.

Overview

BlosSOM creates a landmark-based model of the dataset, and dynamically projects all dataset points to your screen (using EmbedSOM). Several other algorithms and tools are provided to manage the landmarks; a quick overview follows:

  • High-dimensional landmark positioning:
    • Self-organizing maps
    • k-Means
  • 2D landmark positioning:
    • k-NN graph generation (only adds edges, not vertices)
    • force-based graph layouting
    • dynamic t-SNE
  • Dimensionality reduction:
    • EmbedSOM
    • CUDA EmbedSOM (with roughly 500x speedup, enabling smooth display of a few million points)
  • Manual landmark position optimization
  • Visualization settings (colors, transparencies, cluster coloring, ...)
  • Dataset transformations and dimension scaling
  • Import from matrix-like data files
    • FCS3.0 (Flow Cytometry Standard files)
    • TSV (tab-separated values)
  • Export of the data for plotting
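
To illustrate the landmark-based idea, here is a minimal Python sketch of projecting one point through its nearest landmarks. It is not EmbedSOM and not BlosSOM's code: it swaps in a plain inverse-distance weighted average of the 2D positions of the k nearest landmarks, and the names project_point, landmarks_hi and landmarks_2d are hypothetical.

```python
import math


def project_point(point, landmarks_hi, landmarks_2d, k=3):
    """Place one high-dimensional point in 2D using its k nearest landmarks.

    Simplified stand-in for a landmark-based projection (NOT EmbedSOM):
    the result is the inverse-distance weighted mean of the 2D positions
    of the k landmarks closest to the point in the original space.
    """
    # Distance from the point to every high-dimensional landmark.
    dists = [math.dist(point, lm) for lm in landmarks_hi]
    # Indices of the k closest landmarks.
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]

    # A point that coincides with a landmark lands exactly on its 2D position.
    for i in nearest:
        if dists[i] == 0.0:
            return tuple(landmarks_2d[i])

    weights = [1.0 / dists[i] for i in nearest]
    total = sum(weights)
    x = sum(w * landmarks_2d[i][0] for w, i in zip(weights, nearest)) / total
    y = sum(w * landmarks_2d[i][1] for w, i in zip(weights, nearest)) / total
    return (x, y)
```

In this sketch, moving a landmark in 2D moves every point that depends on it, which is the intuition behind dragging landmarks around in the interface.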

Compiling and running BlosSOM

You will need the cmake build system and SDL2.

For CUDA EmbedSOM to work, you need the NVIDIA CUDA toolkit. Append -DBUILD_CUDA=1 to cmake options to enable the CUDA version.

Windows (Visual Studio 2019)

Dependencies

The project requires SDL2 as an external dependency:

  1. install vcpkg tool and remember your vcpkg directory
  2. install SDL: vcpkg install SDL2:x64-windows

Compilation

git submodule init
git submodule update

mkdir build
cd build

# You need to fix the path to vcpkg in the following command:
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX=./inst -DCMAKE_TOOLCHAIN_FILE=your-vcpkg-clone-directory/scripts/buildsystems/vcpkg.cmake

cmake --build . --config Release
cmake --install . --config Release

Running

Open the Visual Studio solution BlosSOM.sln, set blossom as the startup project, set the configuration to Release, and run the project.

Linux (and possibly other unix-like systems)

Dependencies

The project requires SDL2 as an external dependency. Install libsdl2-dev (on Debian-based systems), SDL2-devel (on Red Hat-based systems), or a similar package, depending on your Linux distribution. You should be able to install the cmake package the same way.

Compilation

git submodule init
git submodule update

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=./inst    # or any other directory
make install                              # use -j option to speed up the build

Running

./inst/bin/blossom

Documentation

Quickstart

  1. Click on the "plus" button on the bottom right side of the window
  2. Choose Open file (the first button from the top) and open a file from the demo_data/ directory
  3. You can now add and delete landmarks using ctrl+mouse click, and drag them around.
  4. Use the tools and settings available under the "plus" button to optimize the landmark positions and get a better visualization.

See the HOWTO for more details and hints.

Performance and CUDA

If you pass -DBUILD_CUDA=1 to the cmake commands, you will get an extra executable called blossom_cuda (or blossom_cuda.exe on Windows).

The two versions of the BlosSOM executable differ mainly in the performance of the EmbedSOM projection, which is more than 100× faster on GPUs than on CPUs. If the dataset gets large, only a fixed-size slice of the dataset is processed each frame to keep the framerate in a usable range. The defaults in BlosSOM should work smoothly for many use cases: 1k points per frame on CPU and 50k points per frame on GPU.

If required (e.g., if you have a really fast GPU), you may modify the constants in the corresponding source files, around the call sites of clean_range(), which is the function that manages the round-robin refreshing of the data. Functionality that dynamically chooses the best data-crunching rate is being implemented and should be available soon.
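
The round-robin refreshing described above can be pictured with a short Python sketch. It is only an illustration of the idea under simple assumptions, not the actual clean_range() implementation; the class RoundRobinRefresher and the helper frames_for_full_pass are hypothetical names.

```python
import math


class RoundRobinRefresher:
    """Hand out a fixed-size slice of point indices per frame, wrapping around."""

    def __init__(self, n_points, batch_size):
        self.n_points = n_points
        self.batch_size = batch_size
        self.cursor = 0  # index of the first point to refresh next frame

    def next_ranges(self):
        """Return the half-open index ranges [(start, end), ...] for this frame."""
        if self.n_points == 0:
            return []
        # Never refresh a point twice within a single frame.
        count = min(self.batch_size, self.n_points)
        start = self.cursor
        end = start + count
        self.cursor = end % self.n_points
        if end <= self.n_points:
            return [(start, end)]
        # The slice wraps past the end of the dataset: split it in two.
        return [(start, self.n_points), (0, end - self.n_points)]


def frames_for_full_pass(n_points, batch_size):
    """Number of frames needed until every point has been refreshed once."""
    return math.ceil(n_points / batch_size)
```

With the default rates quoted above, a dataset of one million points would need frames_for_full_pass(1_000_000, 1_000) = 1000 frames for a full pass on CPU and frames_for_full_pass(1_000_000, 50_000) = 20 frames on GPU, which is why a larger per-frame constant only pays off if the hardware can keep up.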

License

BlosSOM is licensed under GPLv3 or later. Several small libraries bundled in the repository are licensed with MIT-style licenses.
