This repository contains the DendroMap implementation for scalable and interactive exploration of image datasets in machine learning.

Overview

DendroMap

DendroMap is an interactive tool to explore large-scale image datasets used for machine learning.

A deep understanding of your data can be vital to train or debug your model effectively. However, due to the lack of structure and little-to-no metadata, it can be difficult to gain any insight into large-scale image datasets.

DendroMap adds structure to the data by hierarchically clustering together similar images. Then, the clusters are displayed in a modified treemap visualization that supports zooming.

Check out the live demo of DendroMap and explore for yourself on a few different datasets. If you're interested in

  • the DendroMap motivations
  • how we created the DendroMap visualization
  • DendroMap's effectiveness: user study on DendroMap compared to t-SNE grid for exploration

be sure to also check out our research paper:

Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps.
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng.
arXiv preprint arXiv:2205.06935, 2022.

Use Your Own Data

In the public deployment, we hosted our data in the DendroMap Data repository. You can use your own data by following the instructions and example in the DendroMap Data README.md and you can use our python functions found in the clustering folder in this repo. There, you will find specific examples and instructions for how to generate the clustering files.

After generating those files, you can add another option in the src/dataOptions.js file as an object to specify how to read your data with the correct format. This is also detailed in the DendroMap Data README.md, and is simple as adding an option like this:

{
	dataset: "YOUR DATASET NAME",
	model: "YOUR MODEL NAME",
	cluster_filepath: "CLUSTER_FILEPATH",
	class_cluster_filepath: "CLASS_CLUSTER_FILEPATH**OPTIONAL**",
	image_filepath: "IMAGE_FILEPATH",
}

in the src/dataOptions.js options array. Paths start from the public folder, so put your data in there. For more information, go to the README.md in the clustering folder. Notebooks that computed the data in DendroMap Data are located there.

DendroMap Component

The DendroMap treemap visualization itself (not the whole project) only relies on having d3.js and the accompanying Javascript files in the src/components/dendroMap directory. You can reuse that Svelte component by importing from src/components/dendroMap/DendroMap.svelte.

The Component is used in src/App.svelte for an example on what props it takes. Here is the rundown of a simple example: at the bare minimum you can create the DendroMap component with these props (propName:type).

<DendroMap
	dendrogramData:dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath:string // relative path from public dir
	imageWidth:number
	imageHeight:number
	width:number
	height:number
	numClustersShowing:number // > 1
/>

A more comprehensive list of props is below, but please look in the src/components/dendroMap/DendroMap.svelte file to see more details: there are many defaults arguments.

<DendroMap
	dendrogramData: dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath: string // relative path from public dir
	imageWidth: number
	imageHeight: number
	width: number
	height: number
	numClustersShowing: number // > 1

	// the very long list of optional props that you can use to customize the DendroMap
	// ? is not in the actual name, just indicates optional
	highlightedOpacity?: number // between [0.0, 1.0]
	hiddenOpacity?: number // between [0.0, 1.0]
	transitionSpeed?: number // milliseconds for the animation of zooming
	clusterColorInterpolateCallback?: (normalized: number) => string // by default uses d3.interpolateGreys
	labelColorCallback?: (d: d3.HierarchyNode) => string
	labelSizeCallback?: (d: d3.HierarchyNode) => string
	misclassificationColor?: string
	outlineStrokeWidth?: string
	outerPadding?: number // the outer perimeter space of a rects
	innerPadding?: number // the touching inside space between rects
	topPadding?: number // additional top padding on the top of rects
	labelYSpace?: number // shifts the image grid down to make room for label on top

	currentParentCluster?: d3.HierarchyNode // this argument is used to bind: for svelte, not really a prop
	// breadth is the default and renders nodes left to right breadth first traversal
	// min_merging_distance is the common way to get dendrogram clusters from a dendrogram
	// max_node_count traverses and splits the next largest sized node, resulting in an even rendering
	renderingMethod?: "breadth" | "min_merging_distance" | "max_node_count" | "custom_sort"
	// this is only in effect if the renderingMethod is "custom_sort". Nodes last are popped and rendered first in the sort
	customSort?: (a: dendrogramNode, b: dendrogramNode) => number // see example in code
	imagesToFocus?: number[] // instance index of the ones to highlight
	outlineMisclassified?: boolean
	focusMisclassified?: boolean
	clusterLabelCallback?: (d: d3.HierarchyNode) => string
	imageTitleCallback?: (d: d3.HierarchyNode) => string

	// will fire based on user interaction
	// detail contains <T> {data: T, element: HTMLElement, event}
	on:imageClick?: ({detail}) => void
	on:imageMouseEnter?: ({detail}) => void
	on:imageMouseLeave?: ({detail}) => void
	on:clusterClick?: ({detail}) => void
	on:clusterMouseEnter?: ({detail}) => void
	on:clusterMouseLeave?: ({detail}) => void
/>

Run Locally!

This project uses Svelte. You can run the code on your local machine by using one of the following: development or build.

Development

cd dendromap      # inside the dendromap directory
npm install       # install packages if you haven't
npm run dev       # live-reloading server on port 8080

then navigate to port 8080 for a live-reloading on file change development server.

Build

cd dendromap		# inside the dendromap directory
npm install       	# install packages if you haven't
npm run build       	# build project
npm run start		# run on port 8080

then navigate to port 8080 for the static build server.

Links

Owner
DIV Lab
Data Interaction and Visualization Lab at Oregon State University
DIV Lab
Graph Analysis From Scratch

Graph Analysis From Scratch Goal In this notebook we wanted to implement some functionalities to analyze a weighted graph only by using algorithms imp

Arturo Ghinassi 0 Sep 17, 2022
An open source machine learning library for performing regression tasks using RVM technique.

Introduction neonrvm is an open source machine learning library for performing regression tasks using RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022
This is a clean and robust Pytorch implementation of DQN and Double DQN.

DQN/DDQN-Pytorch This is a clean and robust Pytorch implementation of DQN and Double DQN. Here is the training curve: All the experiments are trained

XinJingHao 15 Dec 27, 2022
Koç University deep learning framework.

Knet Knet (pronounced "kay-net") is the Koç University deep learning framework implemented in Julia by Deniz Yuret and collaborators. It supports GPU

1.4k Dec 31, 2022
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper

Ponder(ing) Transformer Implementation of a Transformer that learns to adapt the number of computational steps it takes depending on the difficulty of

Phil Wang 65 Oct 04, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
VQMIVC - Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech

Disong Wang 262 Dec 31, 2022
Pytorch implementation of SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation

SenFormer: Efficient Self-Ensemble Framework for Semantic Segmentation Efficient Self-Ensemble Framework for Semantic Segmentation by Walid Bousselham

61 Dec 26, 2022
PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition, CVPR 2018

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place

Mikaela Uy 294 Dec 12, 2022
Implementation of Kronecker Attention in Pytorch

Kronecker Attention Pytorch Implementation of Kronecker Attention in Pytorch. Results look less than stellar, but if someone found some context where

Phil Wang 16 May 06, 2022
Code for our CVPR2021 paper coordinate attention

Coordinate Attention for Efficient Mobile Network Design (preprint) This repository is a PyTorch implementation of our coordinate attention (will appe

Qibin (Andrew) Hou 726 Jan 05, 2023
Target Propagation via Regularized Inversion

Target Propagation via Regularized Inversion The present code implements an ideal formulation of target propagation using regularized inverses compute

Vincent Roulet 0 Dec 02, 2021
Benchmarks for the Optimal Power Flow Problem

Power Grid Lib - Optimal Power Flow This benchmark library is curated and maintained by the IEEE PES Task Force on Benchmarks for Validation of Emergi

A Library of IEEE PES Power Grid Benchmarks 207 Dec 08, 2022
LWCC: A LightWeight Crowd Counting library for Python that includes several pretrained state-of-the-art models.

LWCC: A LightWeight Crowd Counting library for Python LWCC is a lightweight crowd counting framework for Python. It wraps four state-of-the-art models

Matija Teršek 39 Dec 28, 2022
Training PSPNet in Tensorflow. Reproduce the performance from the paper.

Training Reproduce of PSPNet. (Updated 2021/04/09. Authors of PSPNet have provided a Pytorch implementation for PSPNet and their new work with support

Li Xuhong 126 Jul 13, 2022
DI-HPC is an acceleration operator component for general algorithm modules in reinforcement learning algorithms

DI-HPC: Decision Intelligence - High Performance Computation DI-HPC is an acceleration operator component for general algorithm modules in reinforceme

OpenDILab 185 Dec 29, 2022
KIND: an Italian Multi-Domain Dataset for Named Entity Recognition

KIND (Kessler Italian Named-entities Dataset) KIND is an Italian dataset for Named-Entity Recognition. It contains more than one million tokens with t

Digital Humanities 5 Jun 21, 2022
A library for Deep Learning Implementations and utils

deeply A Deep Learning library Table of Contents Features Quick Start Usage License Features Python 2.7+ and Python 3.4+ compatible. Quick Start $ pip

Achilles Rasquinha 1 Dec 12, 2022
Codes for Causal Semantic Generative model (CSG), the model proposed in "Learning Causal Semantic Representation for Out-of-Distribution Prediction" (NeurIPS-21)

Learning Causal Semantic Representation for Out-of-Distribution Prediction This repository is the official implementation of "Learning Causal Semantic

Chang Liu 54 Dec 01, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022