Stochastic gradient descent with model building

Last update: Jan 19, 2022

Overview

Stochastic Model Building (SMB)

This repository provides a new, fast, and robust stochastic optimization algorithm for training deep learning models. The core idea is to build models from local stochastic gradient information. The details of the algorithm are given in our recent paper.

Abstract

The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates second-order information that allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide a convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization in most problems. Moreover, our experiments show that the proposed method is quite robust, as it converges for a wide range of initial stepsizes.

Keywords: model building; second-order information; stochastic gradient descent; convergence analysis

Installation

pip install git+https://github.com/sbirbil/SMB.git

Testing

Here is how you can use SMB:

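import torch
import smb

# model (a torch.nn.Module) and train_loader (a DataLoader) are assumed to be defined already
# set independent_batch=True to use the SMBi optimizer
optimizer = smb.SMB(model.parameters(), independent_batch=False)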

for epoch in range(100):
    
    # training steps
    model.train()
    
    for batch_index, (data, target) in enumerate(train_loader):
            
        # create loss closure for smb algorithm
        def closure():
            optimizer.zero_grad()
            loss = torch.nn.CrossEntropyLoss()(model(data), target)
            return loss
        
        # one optimization step; the optimizer calls the closure to evaluate the loss
        loss = optimizer.step(closure=closure)

For a complete example, see our tutorial, or run the Colab notebook, which requires no local installation. To use the SMBi optimizer, set the hyper-parameter independent_batch to True. Our paper includes more information.
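
The usage snippet relies on two names it does not define, model and train_loader. As a minimal sketch, and not something taken from this repository, the following sets up a hypothetical toy classifier and data loader under those names using standard PyTorch only. The sample count, feature size, number of classes, and layer widths are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 256 random samples, 20 features, 3 classes.
features = torch.randn(256, 20)
labels = torch.randint(0, 3, (256,))
train_loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

# Hypothetical small classifier. It returns raw logits, which is the
# input that torch.nn.CrossEntropyLoss expects.
model = torch.nn.Sequential(
    torch.nn.Linear(20, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 3),
)

# Sanity check on one batch: the same loss expression the closure evaluates.
data, target = next(iter(train_loader))
logits = model(data)
loss = torch.nn.CrossEntropyLoss()(logits, target)
print(tuple(logits.shape), loss.item())
```

With these two objects in scope, and assuming the smb package is installed, the training loop shown earlier should run as written.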

Reproducing The Experiments

To reproduce the results in our paper, see the following script.

Owner

S. Ilker Birbil
