A set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Overview

Overview

This is a set of simple scripts to process the Imagenet-1K dataset as TFRecords and make index files for NVIDIA DALI.

Make TFRecords

To run the script setup a virtualenv with the following libraries installed.

  • tensorflow: Install with pip install tensorflow

Once you have all the above libraries setup, you should register on the Imagenet website and download the ImageNet .tar files. It should be extracted and provided in the format:

  • Training images: train/n03062245/n03062245_4620.JPEG
  • Validation Images: validation/ILSVRC2012_val_00000001.JPEG

To run the script to preprocess the raw dataset as TFRecords, run the following command:

python3 make_tfrecords.py \
  --raw_data_dir="path/to/imagenet" \
  --local_scratch_dir="path/to/output"

Note that the label is from 1 to 1000.

Make index files

To run the script setup a virtualenv with the following libraries installed.

python3 make_idx.py --tfrecord_root="path/to/tfrecords"

Build subset of Imagenet-1K

This can help you build a subset of Imagenet-1K (TFRecord format):

python3 build_subset.py "path/to/tfrecords" "output_dir" \
  --train_num_shards=128 \
  --valid_num_shards=16 \
  --num_classes=100

Classes are selected randomly.

DALI dataloader

We also provide a DALI dataloader which can read the processed dataset. The dataloader is equipped with Mixup.

Here is an simple example to construct it:

import glob
import os


def build_dali_train(root):
    train_pat = os.path.join(root, 'train/*')
    train_idx_pat = os.path.join(root, 'idx_files/train/*')
    return DaliDataloader(
        sorted(glob.glob(train_pat)),
        sorted(glob.glob(train_idx_pat)),
        batch_size=BATCH_SIZE,
        shard_id=SHARD_ID,
        num_shards=NUM_SHARDS,
        training=True,
        gpu_aug=True,
        cuda=True,
        mixup_alpha=0.0,
        num_threads=16,
    )
Owner
Bobo @v3nividiv1ci
A fuzzing framework for SMT solvers

yinyang A fuzzing framework for SMT solvers. Given a set of seed SMT formulas, yinyang generates mutant formulas to stress-test SMT solvers. yinyang c

Project Yin-Yang for SMT Solver Testing 145 Jan 04, 2023
Calibrate your listeners! Robust communication-based training for pragmatic speakers. Findings of EMNLP 2021.

Calibrate your listeners! Robust communication-based training for pragmatic speakers Rose E. Wang, Julia White, Jesse Mu, Noah D. Goodman Findings of

Rose E. Wang 3 Apr 02, 2022
Voxel-based Network for Shape Completion by Leveraging Edge Generation (ICCV 2021, oral)

Voxel-based Network for Shape Completion by Leveraging Edge Generation This is the PyTorch implementation for the paper "Voxel-based Network for Shape

10 Dec 04, 2022
Gradient Inversion with Generative Image Prior

Gradient Inversion with Generative Image Prior This repository is an implementation of "Gradient Inversion with Generative Image Prior", accepted to N

MLLab @ Postech 25 Jan 09, 2023
Code related to the manuscript "Averting A Crisis In Simulation-Based Inference"

Abstract We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms are inadequate for the falsificat

Montefiore Artificial Intelligence Research 3 Nov 14, 2022
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
MonoRCNN is a monocular 3D object detection method for automonous driving

MonoRCNN MonoRCNN is a monocular 3D object detection method for automonous driving, published at ICCV 2021. This project is an implementation of MonoR

87 Dec 27, 2022
Fine-Tune EleutherAI GPT-Neo to Generate Netflix Movie Descriptions in Only 47 Lines of Code Using Hugginface And DeepSpeed

GPT-Neo-2.7B Fine-Tuning Example Using HuggingFace & DeepSpeed Installation cd venv/bin ./pip install -r ../../requirements.txt ./pip install deepspe

Nikita 180 Jan 05, 2023
Static-test - A playground to play with ideas related to testing the comparability of the code

Static test playground ⚠️ The code is just an experiment. Compiles and runs on U

Igor Bogoslavskyi 4 Feb 18, 2022
CausaLM: Causal Model Explanation Through Counterfactual Language Models

CausaLM: Causal Model Explanation Through Counterfactual Language Models Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart Abstract: Understan

Amir Feder 39 Jul 10, 2022
Official PyTorch implementation of RIO

Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection Figure 1: Our proposed Resampling at image-level and obect-

NVIDIA Research Projects 17 May 20, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
NaijaSenti is an open-source sentiment and emotion corpora for four major Nigerian languages

NaijaSenti is an open-source sentiment and emotion corpora for four major Nigerian languages. This project was supported by lacuna-fund initiatives. Jump straight to one of the sections below, or jus

Hausa Natural Language Processing 14 Dec 20, 2022
ANEA: Distant Supervision for Low-Resource Named Entity Recognition

ANEA: Distant Supervision for Low-Resource Named Entity Recognition ANEA is a tool to automatically annotate named entities in unlabeled text based on

Saarland University Spoken Language Systems Group 15 Mar 30, 2022
The modify PyTorch version of Siam-trackers which are speed-up by TensorRT.

SiamTracker-with-TensorRT The modify PyTorch version of Siam-trackers which are speed-up by TensorRT or ONNX. [Updating...] Examples demonstrating how

9 Dec 13, 2022
Hso-groupie - A pwnable challenge in Real World CTF 4th

Hso-groupie - A pwnable challenge in Real World CTF 4th

Riatre Foo 42 Dec 05, 2022
Working demo of the Multi-class and Anomaly classification model using the CLIP feature space

👁️ Hindsight AI: Crime Classification With Clip About For Educational Purposes Only This is a recursive neural net trained to classify specific crime

Miles Tweed 2 Jun 05, 2022
A Simple LSTM-Based Solution for "Heartbeat Signal Classification and Prediction" in Tianchi

LSTM-Time-Series-Prediction A Simple LSTM-Based Solution for "Heartbeat Signal Classification and Prediction" in Tianchi Contest. The Link of the Cont

KevinCHEN 1 Jun 13, 2022
Code for "Multi-Time Attention Networks for Irregularly Sampled Time Series", ICLR 2021.

Multi-Time Attention Networks (mTANs) This repository contains the PyTorch implementation for the paper Multi-Time Attention Networks for Irregularly

The Laboratory for Robust and Efficient Machine Learning 68 Dec 17, 2022
Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Codes-for-Algorithms Codes for realizing theories learned from Data Mining, Machine Learning, Deep Learning without using the present Python packages.

Tracy (Shengmin) Tao 1 Apr 12, 2022