The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

Overview

Joint t-sne

This is the implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

abstract:

We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.

Environment:

How to use:

  1. Put the directory of your data sequence, e.g. "YOUR_DATA" in ./data. There are several requirements on the format and organization of your data:

    • Each data frame is named as f_i.txt, where i is the time step/index of this data frame in the sequence.
    • The j th row of the data frame contains both the feature vector and label of the j th item, which is seperated by \tab. The label is at the last position.
    • All data frames must have the same number of rows, and the the same item is at the same row in different data frames to compute the node similarities one by one.
  2. Create a configuration file, e.g. "YOUR_DATA.json" in ./config, which is organized as a json structure.

{
  "algo": {
    "k_closest_count": 3,
    "perplexity": 70,
    "bfs_level": 1,
    "gamma": 0.1
  },
  "thesne": {
    "data_name": "YOUR_DATA",
    "pts_size": 2000,
    "norm": false,
    "data_ids": [1, 3, 6, 9],
    "data_dims": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
    "data_titles": [
      "t=0",
      "t=1",
      "t=2",
      "t=3",
      "t=4",
      "t=5",
      "t=6",
      "t=7",
      "t=8",
      "t=9"
    ]
  }
}

In this file, algo represents the hyperparamters of our algorithm except for bfs_level, which always equals to 1. thesne contains the information of the input data. Please remember that data_name must be consistent with the directory name in the previous step.

  1. Create a shell script, e.g. "YOUR_DATA.sh" in ./scripts as below:
# !/bin/bash
# 1. specify the path of the configuration file
config_path="config/YOUR_DATA.json"

workdir=$(pwd)

# 2. build knn graph for each data frame
python3 codes/graphBuild/run.py $config_path

# 3. compute edge similarities between each two adjacent data frames
buildDir="codes/graphSim/build"
if [ ! -d $buildDir ]; then
    mkdir $buildDir
    echo "create directory ${buildDir}"
else
    echo "directory ${buildDir} already exists."
fi
cd $buildDir
qmake ../
make

cd $workdir

# bin is dependent on your operating system
bin=$buildDir/graphSim.app/Contents/MacOS/graphSim
$bin $config_path


# 4. run t-sne optimization
python3 codes/thesne/run.py $config_path

There are several places you should pay attention to.

  • Again, config_path must be consitent with the name of configuration file in the previous step

  • bin is dependent on your operating system. If you use linux, you probably should change it to

      bin=$buildDir/graphSim
    
  1. In root directory, type
sh scripts/YOUR_DATA.sh

The final embeddings will be generated in ./results/YOUR_DATA.

  1. Optionally, you can use codes/draw/run.py to plot the embeddings.

Example:

You can find an example in ./scripts/10_cluster_contract.sh.

Owner
IDEAS Lab
Our mission is to enhance people's ability to understand and communicate data through the design of automated visualization and visual analytics systems.
IDEAS Lab
Implementation of Google Brain's WaveGrad high-fidelity vocoder

WaveGrad Implementation (PyTorch) of Google Brain's high-fidelity WaveGrad vocoder (paper). First implementation on GitHub with high-quality generatio

Ivan Vovk 363 Dec 27, 2022
A Python library for Deep Graph Networks

PyDGN Wiki Description This is a Python library to easily experiment with Deep Graph Networks (DGNs). It provides automatic management of data splitti

Federico Errica 194 Dec 22, 2022
Creating multimodal multitask models

Fusion Brain Challenge The English version of the document can be found here. Обновления 01.11 Мы выкладываем пример данных, аналогичных private test

Sber AI 43 Nov 28, 2022
High performance distributed framework for training deep learning recommendation models based on PyTorch.

High performance distributed framework for training deep learning recommendation models based on PyTorch.

340 Dec 30, 2022
Convert openmmlab (not only mmdetection) series model to tensorrt

MMDet to TensorRT This project aims to convert the mmdetection model to TensorRT model end2end. Focus on object detection for now. Mask support is exp

JinTian 4 Dec 17, 2021
A python module for configuration of block devices

Blivet is a python module for system storage configuration. CI status Licence See COPYING Installation From Fedora repositories Blivet is available in

78 Dec 14, 2022
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Big Vision This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and

Google Research 701 Jan 03, 2023
A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks

A Pytorch Implementation of Domain adaptation of object detector using scissor-like networks Please follow Faster R-CNN and DAF to complete the enviro

2 Oct 07, 2022
Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Vansh Wassan 15 Jun 17, 2021
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
RLDS stands for Reinforcement Learning Datasets

RLDS RLDS stands for Reinforcement Learning Datasets and it is an ecosystem of tools to store, retrieve and manipulate episodic data in the context of

Google Research 135 Jan 01, 2023
Code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge.

Open Sesame This repository contains the code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge. Credits We built the project on t

9 Jul 24, 2022
CVPR2021 Workshop - HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization.

HDRUNet [Paper Link] HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization By Xiangyu Chen, Yihao Liu, Zhengwen Zhang, Yu Qiao an

XyChen 105 Dec 20, 2022
A GPU-optional modular synthesizer in pytorch, 16200x faster than realtime, for audio ML researchers.

torchsynth The fastest synth in the universe. Introduction torchsynth is based upon traditional modular synthesis written in pytorch. It is GPU-option

torchsynth 229 Jan 02, 2023
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking (CVPR 2021) Pytorch implementation of the ArTIST motion model. In this repo

Fatemeh 38 Dec 12, 2022
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding (CVPR'22) Paper Link | Project Page Abstract : Manual an

Mohamed Afham 152 Dec 23, 2022
The Submission for SIMMC 2.0 Challenge 2021

The Submission for SIMMC 2.0 Challenge 2021 challenge website Requirements python 3.8.8 pytorch 1.8.1 transformers 4.8.2 apex for multi-gpu nltk Prepr

5 Jul 26, 2022
NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions

NeoDTI NeoDTI: Neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions (Bioinformatics).

62 Nov 26, 2022