Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021)

Last update: Dec 26, 2022

Overview

Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021)

Alexey Nekrasov*, Jonas Schult*, Or Litany, Bastian Leibe, Francis Engelmann

Mix3D is a data augmentation technique for 3D segmentation methods that improves generalization.

[Project Webpage] [arXiv]

News

12. October 2021: Code released.
6. October 2021: Mix3D accepted for oral presentation at 3DV 2021. Paper on [arXiv].
30. July 2021: Mix3D ranks 1st on the ScanNet semantic labeling benchmark.

Running the code

This repository contains the code for the analysis experiments of section 4.2. Motivation and Analysis Experiments from the paper For the ScanNet benchmark and Table 1 (main paper) we use the original SpatioTemporalSegmentation-Scannet code. To add Mix3D to the original MinkowskiNet codebase, we provide the patch file SpatioTemporalSegmentation.patch. Check the supplementary for more details.

Code structure

├── mix3d
│   ├── __init__.py
│   ├── __main__.py     <- the main file
│   ├── conf            <- hydra configuration files
│   ├── datasets
│   │   ├── outdoor_semseg.py       <- outdoor dataset
│   │   ├── preprocessing       <- folder with preprocessing scripts
│   │   ├── semseg.py       <- indoor dataset
│   │   └── utils.py        <- code for mixing point clouds
│   ├── logger
│   ├── models      <- MinkowskiNet models
│   ├── trainer
│   │   ├── __init__.py
│   │   └── trainer.py      <- train loop
│   └── utils
├── data
│   ├── processed       <- folder for preprocessed datasets
│   └── raw     <- folder for raw datasets
├── scripts
│   ├── experiments
│   │   └── 1000_scene_merging.bash
│   ├── init.bash
│   ├── local_run.bash
│   ├── preprocess_matterport.bash
│   ├── preprocess_rio.bash
│   ├── preprocess_scannet.bash
│   └── preprocess_semantic_kitti.bash
├── docs
├── dvc.lock
├── dvc.yaml        <- dvc file to reproduce the data
├── poetry.lock
├── pyproject.toml      <- project dependencies
├── README.md
├── saved       <- folder that stores models and logs
└── SpatioTemporalSegmentation-ScanNet.patch        <- patch file for original repo

Dependencies

The main dependencies of the project are the following:

python: 3.7
cuda: 10.1

For others, the project uses the poetry dependency management package. Everything can be installed with the command:

poetry install

Check scripts/init.bash for more details.

Data preprocessing

After the dependencies are installed, it is important to run the preprocessing scripts. They will bring scannet, matterport, rio, semantic_kitti datasets to a single format. By default, the scripts expect to find datsets in the data/raw/ folder. Check scripts/preprocess_*.bash for more details.

dvc repro scannet # matterport, rio, semantic_kitti

This command will run the preprocessing for scannet and will save the result using the dvc data versioning system.

Training and testing

Train MinkowskiNet on the scannet dataset without Mix3D with a voxel size of 5cm:

poetry run train

Train MinkowskiNet on the scannet dataset with Mix3D with a voxel size of 5cm:

poetry run train data/collation_functions=voxelize_collate_merge

BibTeX

@inproceedings{Nekrasov213DV,
  title     = {{Mix3D: Out-of-Context Data Augmentation for 3D Scenes}},
  author    = {Nekrasov, Alexey and Schult, Jonas and Litany, Or and Leibe, Bastian and Engelmann, Francis},
  booktitle = {{International Conference on 3D Vision (3DV)}},
  year      = {2021}
}

Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021)

Related tags

Overview

Mix3D: Out-of-Context Data Augmentation for 3D Scenes (3DV 2021)

News

Running the code

Code structure

Dependencies

Data preprocessing

Training and testing

BibTeX

Owner

Alexey Nekrasov

This repository contains the code for our paper VDA (public in EMNLP2021 main conference)

Implementation of H-UCRL Algorithm

[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

Repo for EchoVPR: Echo State Networks for Visual Place Recognition

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust.

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

The official PyTorch code for 'DER: Dynamically Expandable Representation for Class Incremental Learning' accepted by CVPR2021

Blind visual quality assessment on 360° Video based on progressive learning

Cobalt Strike teamserver detection.

Learning Visual Words for Weakly-Supervised Semantic Segmentation

Rendering Point Clouds with Compute Shaders

style mixing for animation face

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020, Oral)

Next-gen Rowhammer fuzzer that uses non-uniform, frequency-based patterns.

Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

A simple and lightweight genetic algorithm for optimization of any machine learning model