Single machine, multiple cards training; mix-precision training; DALI data loader.

Overview

Template

Script Category Description

Category script
comparison script train.py, loader.py
for single-machine-multiple-cards training train_DP.py, train_DDP.py
for mixed-precision training train_amp.py
for DALI data loading loader_DALI.py

Note: The comment # new # in script represents newly added code block (compare to comparison script, e.g., train.py)

Environment

  • CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
  • GPU: RTX 2080Ti
  • OS: Ubuntu 18.04.3 LTS
  • DL framework: Pytorch 1.6.0, Torchvision 0.7.0

Single-machine-multiple-cards training (two cards for example)

train_DP.py -- Parallel computing using nn.DataParallel

Usage:

cd Template/src
python train_DP.py

Superiority:
- Easy to use
- Accelerate training (inconspicuous)
Weakness:
- Unbalanced load
Description:
DataParallel is very convenient to use, we just need to use DataParallel to package the model:

model = ...
model = nn.DataParallel(model)

train_DDP.py -- Parallel computing using torch.distributed

Usage:

cd Template/src
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_DDP.py

Superiority:
- balanced load
- Accelerate training (conspicuous)
Weakness:
- Hard to use
Description:
Unlike DataParallel who control multiple GPUs via single-process, distributed creates multiple process. we just need to accomplish one code and torch will automatically assign it to n processes, each running on corresponding GPU.
To config distributed model via torch.distributed, the following steps needed to be performed:

  1. Get current process index:
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', default=-1, type=int, help='node rank for distributed training')
opt = parser.parse_args()
# print(opt.local_rank)
  1. Set the backend and port used for communication between GPUs:
dist.init_process_group(backend='nccl')
  1. Config current device according to the local_rank:
torch.cuda.set_device(opt.local_rank)
  1. Config data sampler:
dataset = ...
sampler = distributed.DistributedSampler(dataset)
dataloader = DataLoader(dataset=dataset, ..., sampler=sampler)
  1. Package the model๏ผš
model = ...
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[opt.local_rank])

Mixed-precision training

train_amp.py -- Mixed-precision training using torch.cuda.amp

Usage:

cd Template/src
python train_amp.py

Superiority:
- Easy to use
- Accelerate training (conspicuous for heavy model)
Weakness:
- Accelerate training (inconspicuous for light model)
Description:
Mixed-precision training is a set of techniques that allows us to use fp16 without causing our model training to diverge.
To config mixed-precision training via torch.cuda.amp, the following steps needed to be performed:

  1. Instantiate GradScaler object:
scaler = torch.cuda.amp.GradScaler()
  1. Modify the traditional optimization process:
# Before:
optimizer.zero_grad()
preds = model(imgs)
loss = loss_func(preds, labels)
loss.backward()
optimizer.step()

# After:
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    preds = model(imgs)
    loss = loss_func(preds, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

DALI data loading

loader_DALI.py -- Data loading using nvidia.dali

Prerequisite:
- NVIDIA Driver supporting CUDA 10.0 or later (i.e., 410.48 or later driver releases)
- PyTorch 0.4 or later
- Data organization format that matches the code, the format that matches the loader_DALI.py is as follows:
โ€ƒ/dataset / train or test / img or gt / sub_dirs / imgs [View]
Usage:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda102
cd Template/src
python loader_DALI.py --data_source /path/to/dataset

Superiority:
- Easy to use
- Accelerate data loading
Weakness:
- Occupy video memory
Description:
NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine that accelerates the data pipeline for computer vision and audio deep learning applications.
To load dataset using DALI, the following steps needed to be performed:

  1. Config external input iterator:
eii = ExternalInputIterator(data_source=opt.data_source, batch_size=opt.batch_size, shuffle=True)
# A demo of external input iterator
class ExternalInputIterator(object):
    def __init__(self, data_source, batch_size, shuffle):
        self.batch_size = batch_size
        
        img_paths = sorted(glob.glob(data_source + '/train' + '/blurry' + '/*/*.*'))
        gt_paths = sorted(glob.glob(data_source + '/train' + '/sharp' + '/*/*.*'))
        self.paths = list(zip(*(img_paths,gt_paths)))
        if shuffle:
            random.shuffle(self.paths)

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        imgs = []
        gts = []

        if self.i >= len(self.paths):
            self.__iter__()
            raise StopIteration

        for _ in range(self.batch_size):
            img_path, gt_path = self.paths[self.i % len(self.paths)]
            imgs.append(np.fromfile(img_path, dtype = np.uint8))
            gts.append(np.fromfile(gt_path, dtype = np.uint8))
            self.i += 1
        return (imgs, gts)

    def __len__(self):
        return len(self.paths)

    next = __next__
  1. Config pipeline:
pipe = externalSourcePipeline(batch_size=opt.batch_size, num_threads=opt.num_workers, device_id=0, seed=opt.seed, external_data = eii, resize=opt.resize, crop=opt.crop)
# A demo of pipeline
@pipeline_def
def externalSourcePipeline(external_data, resize, crop):
    imgs, gts = fn.external_source(source=external_data, num_outputs=2)
    
    crop_pos = (fn.random.uniform(range=(0., 1.)), fn.random.uniform(range=(0., 1.)))
    flip_p = (fn.random.coin_flip(), fn.random.coin_flip())
    
    imgs = transform(imgs, resize, crop, crop_pos, flip_p)
    gts = transform(gts, resize, crop, crop_pos, flip_p)
    return imgs, gts

def transform(imgs, resize, crop, crop_pos, flip_p):
    imgs = fn.decoders.image(imgs, device='mixed')
    imgs = fn.resize(imgs, resize_y=resize)
    imgs = fn.crop(imgs, crop=(crop,crop), crop_pos_x=crop_pos[0], crop_pos_y=crop_pos[1])
    imgs = fn.flip(imgs, horizontal=flip_p[0], vertical=flip_p[1])
    imgs = fn.transpose(imgs, perm=[2, 0, 1])
    imgs = imgs/127.5-1
    
    return imgs
  1. Instantiate DALIGenericIterator object:
dgi = DALIGenericIterator(pipe, output_map=["imgs", "gts"], last_batch_padded=True, last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True)
  1. Read data:
for i, data in enumerate(dgi):
    imgs = data[0]['imgs']
    gts = data[0]['gts']
Learn machine learning the fun way, with Oracle and RedBull Racing

Red Bull Racing Analytics Hands-On Labs Introduction Are you interested in learning machine learning (ML)? How about doing this in the context of the

Oracle DevRel 55 Oct 24, 2022
PyNHD is a part of HyRiver software stack that is designed to aid in watershed analysis through web services.

A part of HyRiver software stack that provides access to NHD+ V2 data through NLDI and WaterData web services

Taher Chegini 23 Dec 14, 2022
Scraping and analysis of leetcode-compensations page.

Leetcode compensations report Scraping and analysis of leetcode-compensations page.

utsav 96 Jan 01, 2023
This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot.

superSFS This is a tool for speculation of ancestral allel, calculation of sfs and drawing its bar plot. It is easy-to-use and runing fast. What you s

3 Dec 16, 2022
Streamz helps you build pipelines to manage continuous streams of data

Streamz helps you build pipelines to manage continuous streams of data. It is simple to use in simple cases, but also supports complex pipelines that involve branching, joining, flow control, feedbac

Python Streamz 1.1k Dec 28, 2022
A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

Columns AI 131 Dec 26, 2022
Geospatial data-science analysis on reasons behind delay in Grab ride-share services

Grab x Pulis Detailed analysis done to investigate possible reasons for delay in Grab services for NUS Data Analytics Competition 2022, to be found in

Keng Hwee 6 Jun 07, 2022
BAyesian Model-Building Interface (Bambi) in Python.

Bambi BAyesian Model-Building Interface in Python Overview Bambi is a high-level Bayesian model-building interface written in Python. It's built on to

861 Dec 29, 2022
International Space Station data with Python research ๐ŸŒŽ

International Space Station data with Python research ๐ŸŒŽ Plotting ISS trajectory, calculating the velocity over the earth and more. Plotting trajector

Facundo Pedaccio 41 Jun 16, 2022
PipeChain is a utility library for creating functional pipelines.

PipeChain Motivation PipeChain is a utility library for creating functional pipelines. Let's start with a motivating example. We have a list of Austra

Michael Milton 2 Aug 07, 2022
Toolchest provides APIs for scientific and bioinformatic data analysis.

Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni

Toolchest 11 Jun 30, 2022
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis. You write a high level configuration file specifying your in

Blue Collar Bioinformatics 917 Jan 03, 2023
Developed for analyzing the covariance for OrcVIO

about This repo is developed for analyzing the covariance for OrcVIO environment setup platform ubuntu 18.04 using conda conda env create --file envir

Sean 1 Dec 08, 2021
Data pipelines built with polars

valves Warning: the project is very much work in progress. Valves is a collection of functions for your data .pipe()-lines. This project aimes to host

14 Jan 03, 2023
Data exploration done quick.

Pandas Tab Implementation of Stata's tabulate command in Pandas for extremely easy to type one-way and two-way tabulations. Support: Python 3.7 and 3.

W.D. 20 Aug 27, 2022
NumPy aware dynamic Python compiler using LLVM

Numba A Just-In-Time Compiler for Numerical Functions in Python Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaco

Numba 8.2k Jan 07, 2023
A meta plugin for processing timelapse data timepoint by timepoint in napari

napari-time-slicer A meta plugin for processing timelapse data timepoint by timepoint. It enables a list of napari plugins to process 2D+t or 3D+t dat

Robert Haase 2 Oct 13, 2022
talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

David Cournapeau 76 Nov 30, 2022
a tool that compiles a csv of all h1 program stats

h1stats - h1 Program Stats Scraper This python3 script will call out to HackerOne's graphql API and scrape all currently active programs for informati

Evan 40 Oct 27, 2022
This is an example of how to automate Ridit Analysis for a dataset with large amount of questions and many item attributes

This is an example of how to automate Ridit Analysis for a dataset with large amount of questions and many item attributes

Ishan Hegde 1 Nov 17, 2021