Single-machine multi-GPU training; mixed-precision training; DALI data loading.

Overview

Template

Script category description:

Category                                   Script
Comparison (baseline)                      train.py, loader.py
Single-machine multi-GPU training          train_DP.py, train_DDP.py
Mixed-precision training                   train_amp.py
DALI data loading                          loader_DALI.py

Note: The comment # new # in a script marks a newly added code block (compared with the corresponding comparison script, e.g., train.py).

Environment

  • CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
  • GPU: RTX 2080Ti
  • OS: Ubuntu 18.04.3 LTS
  • DL framework: PyTorch 1.6.0, torchvision 0.7.0

Single-machine multi-GPU training (two GPUs as an example)

train_DP.py -- Parallel computing using nn.DataParallel

Usage:

cd Template/src
python train_DP.py

Strengths:
- Easy to use
- Speeds up training (modestly)
Weaknesses:
- Unbalanced GPU load
Description:
DataParallel is very convenient to use; we only need to wrap the model with nn.DataParallel:

model = ...
model = nn.DataParallel(model)
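
A minimal usage sketch (hedged: dataloader, loss_func and optimizer are assumed to be defined elsewhere; this is an illustration, not code copied from train_DP.py):

model = ...                              # any nn.Module
model = nn.DataParallel(model).cuda()    # replicate the model on all visible GPUs

for imgs, labels in dataloader:
    imgs, labels = imgs.cuda(), labels.cuda()
    optimizer.zero_grad()
    preds = model(imgs)                  # each batch is split across the GPUs automatically
    loss = loss_func(preds, labels)
    loss.backward()
    optimizer.step()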

train_DDP.py -- Parallel computing using torch.distributed

Usage:

cd Template/src
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_DDP.py

Strengths:
- Balanced load
- Speeds up training (noticeably)
Weaknesses:
- Harder to use
Description:
Unlike DataParallel, which drives multiple GPUs from a single process, torch.distributed launches one process per GPU. We write the training code once, and the launcher spawns n processes, each running on its corresponding GPU.
To configure distributed training via torch.distributed, perform the following steps (a combined sketch follows the list):

  1. Get the current process index (local_rank):
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', default=-1, type=int, help='local process rank for distributed training')
opt = parser.parse_args()
# print(opt.local_rank)
  2. Set the backend and port used for communication between GPUs:
dist.init_process_group(backend='nccl')
  3. Configure the current device according to local_rank:
torch.cuda.set_device(opt.local_rank)
  4. Configure the data sampler:
dataset = ...
sampler = distributed.DistributedSampler(dataset)
dataloader = DataLoader(dataset=dataset, ..., sampler=sampler)
  5. Wrap the model:
model = ...
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[opt.local_rank])
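
A combined, hedged sketch of the steps above (the variable names, num_epochs and the call to sampler.set_epoch are illustrative additions, not code copied from train_DDP.py):

import argparse
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.utils.data import DataLoader, distributed

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', default=-1, type=int, help='local process rank for distributed training')
opt = parser.parse_args()

dist.init_process_group(backend='nccl')        # one process per GPU, NCCL for communication
torch.cuda.set_device(opt.local_rank)

dataset = ...                                   # your Dataset
sampler = distributed.DistributedSampler(dataset)
dataloader = DataLoader(dataset=dataset, batch_size=..., sampler=sampler)

model = ...                                     # your nn.Module
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[opt.local_rank])

for epoch in range(num_epochs):
    sampler.set_epoch(epoch)                    # reshuffle the data differently every epoch
    for imgs, labels in dataloader:
        imgs, labels = imgs.cuda(), labels.cuda()
        ...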

Mixed-precision training

train_amp.py -- Mixed-precision training using torch.cuda.amp

Usage:

cd Template/src
python train_amp.py

Strengths:
- Easy to use
- Speeds up training (noticeably for heavy models)
Weaknesses:
- Little speed-up for light models
Description:
Mixed-precision training is a set of techniques that lets us use fp16 without causing training to diverge.
To configure mixed-precision training via torch.cuda.amp, perform the following steps (a note on gradient clipping follows the list):

  1. Instantiate a GradScaler object:
scaler = torch.cuda.amp.GradScaler()
  2. Modify the traditional optimization process:
# Before:
optimizer.zero_grad()
preds = model(imgs)
loss = loss_func(preds, labels)
loss.backward()
optimizer.step()

# After:
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    preds = model(imgs)
    loss = loss_func(preds, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
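
If gradient clipping (or any other direct use of the gradients) is needed, the gradients must be unscaled first. A hedged sketch (the clip value 1.0 is illustrative):

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    preds = model(imgs)
    loss = loss_func(preds, labels)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                   # bring gradients back to fp32 scale
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)      # clip the unscaled gradients
scaler.step(optimizer)
scaler.update()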

DALI data loading

loader_DALI.py -- Data loading using nvidia.dali

Prerequisites:
- NVIDIA driver supporting CUDA 10.0 or later (i.e., driver release 410.48 or later)
- PyTorch 0.4 or later
- A data organization that matches the code; loader_DALI.py expects the following layout:
  /dataset/{train,test}/{img,gt}/sub_dirs/imgs
Usage:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda102
cd Template/src
python loader_DALI.py --data_source /path/to/dataset

Strengths:
- Easy to use
- Speeds up data loading
Weaknesses:
- Consumes GPU memory
Description:
The NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine that accelerates the data pipeline for computer vision and audio deep learning applications.
To load a dataset using DALI, perform the following steps (a training-loop sketch follows the list):

  1. Configure the external input iterator:
eii = ExternalInputIterator(data_source=opt.data_source, batch_size=opt.batch_size, shuffle=True)
# A demo of external input iterator
class ExternalInputIterator(object):
    def __init__(self, data_source, batch_size, shuffle):
        self.batch_size = batch_size
        
        img_paths = sorted(glob.glob(data_source + '/train' + '/blurry' + '/*/*.*'))
        gt_paths = sorted(glob.glob(data_source + '/train' + '/sharp' + '/*/*.*'))
        self.paths = list(zip(img_paths, gt_paths))   # pair each blurry image with its ground truth
        if shuffle:
            random.shuffle(self.paths)

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        imgs = []
        gts = []

        if self.i >= len(self.paths):
            self.__iter__()          # reset the index for the next epoch
            raise StopIteration

        for _ in range(self.batch_size):
            img_path, gt_path = self.paths[self.i % len(self.paths)]
            imgs.append(np.fromfile(img_path, dtype = np.uint8))
            gts.append(np.fromfile(gt_path, dtype = np.uint8))
            self.i += 1
        return (imgs, gts)

    def __len__(self):
        return len(self.paths)

    next = __next__
  2. Configure the pipeline:
pipe = externalSourcePipeline(batch_size=opt.batch_size, num_threads=opt.num_workers, device_id=0, seed=opt.seed, external_data = eii, resize=opt.resize, crop=opt.crop)
# A demo of pipeline
@pipeline_def
def externalSourcePipeline(external_data, resize, crop):
    imgs, gts = fn.external_source(source=external_data, num_outputs=2)
    
    crop_pos = (fn.random.uniform(range=(0., 1.)), fn.random.uniform(range=(0., 1.)))
    flip_p = (fn.random.coin_flip(), fn.random.coin_flip())
    
    imgs = transform(imgs, resize, crop, crop_pos, flip_p)
    gts = transform(gts, resize, crop, crop_pos, flip_p)
    return imgs, gts

def transform(imgs, resize, crop, crop_pos, flip_p):
    imgs = fn.decoders.image(imgs, device='mixed')
    imgs = fn.resize(imgs, resize_y=resize)
    imgs = fn.crop(imgs, crop=(crop,crop), crop_pos_x=crop_pos[0], crop_pos_y=crop_pos[1])
    imgs = fn.flip(imgs, horizontal=flip_p[0], vertical=flip_p[1])
    imgs = fn.transpose(imgs, perm=[2, 0, 1])
    imgs = imgs/127.5-1
    
    return imgs
  3. Instantiate a DALIGenericIterator object:
dgi = DALIGenericIterator(pipe, output_map=["imgs", "gts"], last_batch_padded=True, last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True)
  4. Read data:
for i, data in enumerate(dgi):
    imgs = data[0]['imgs']
    gts = data[0]['gts']
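
The tensors returned by DALIGenericIterator are already torch tensors on the GPU, so they can feed the model directly. A hedged training-loop sketch (model, loss_func, optimizer and num_epochs are assumed to be defined elsewhere):

for epoch in range(num_epochs):
    for i, data in enumerate(dgi):
        imgs = data[0]['imgs']           # float tensor on the GPU, shape (N, C, H, W)
        gts = data[0]['gts']
        optimizer.zero_grad()
        preds = model(imgs)
        loss = loss_func(preds, gts)
        loss.backward()
        optimizer.step()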