SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

Last update: Dec 16, 2022

Overview

SLIDE

The SLIDE package contains the source code for reproducing the main experiments in the SLIDE paper.

Dataset

The dataset can be downloaded from Amazon-670K. Note that the data is sorted by label, so please shuffle at least the validation/testing data.
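Since the files are sorted by label, a shuffling step is needed before training. The following is a minimal Python sketch of one way to do it, not part of the SLIDE package. It assumes one sample per line and, optionally, a single header line that must stay first; that layout is an assumption, so check your downloaded files before using it. The function name and the file names in the usage note are hypothetical.

```python
import random


def shuffle_data_file(src_path, dst_path, has_header=True, seed=0):
    """Write a line-shuffled copy of src_path to dst_path.

    Assumption: one sample per line, optionally preceded by a single
    header line that is kept as the first line of the output.
    Returns the number of shuffled sample lines.
    """
    with open(src_path, "r", encoding="utf-8") as src:
        lines = src.read().splitlines()

    # Keep the header (if any) out of the shuffle.
    header, samples = (lines[:1], lines[1:]) if has_header else ([], lines)

    # A fixed seed makes the shuffle reproducible across runs.
    random.Random(seed).shuffle(samples)

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(header + samples) + "\n")
    return len(samples)
```

For example, `shuffle_data_file("test.txt", "test_shuffled.txt")` would write a shuffled copy next to a hypothetical `test.txt`. The whole file is read into memory, which keeps the sketch simple but may not suit very large files.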

TensorFlow Baselines

We suggest pulling the TensorFlow Docker image directly to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.

There is also a TensorFlow Docker image built specifically for CPUs with AVX-512 instructions. To get it, run:

docker pull clearlinux/stacks-dlrs_2-mkl

config.py controls the TensorFlow training parameters, such as the learning rate. example_full_softmax.py and example_sampled_softmax.py are example scripts for the Amazon-670K dataset, using full softmax and sampled softmax respectively.

Build/Run on Intel platform

Prerequisites:

CMake >= 3.0
Intel Compiler (ICC) >= 19

Build with ICC compiler

source /opt/intel/compilers_and_libraries/linux/bin/compilervars.sh -arch intel64 -platform linux
cd /path/to/slide-root
mkdir -p bin && cd bin 
# Run only ONE of the following cmake commands, matching your CPU.
# BDW (Broadwell, AVX2)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc
# SKX/CLX (Skylake/Cascade Lake, AVX512)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1
# CPX (Cooper Lake, AVX512 + BF16)
cmake .. -DCMAKE_CXX_COMPILER=icpc -DCMAKE_C_COMPILER=icc -DOPT_AVX512=1 -DOPT_AVX512_BF16=1
make -j

Run on Intel SKX/CLX/CPX

cd bin
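OMP_NUM_THREADS=<total-threads> KMP_HW_SUBSET=<sockets>s,<cores>c,<threads>t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
Replace each <...> placeholder with the value for your machine. For example, on CLX8280 2Sx28c (2 sockets, 28 cores per socket, 2 threads per core):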
OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200 ./runme ../SLIDE/Config_amz.csv
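The numbers in the example follow from the machine topology: 2 sockets x 28 cores x 2 threads per core = 112 threads. The Python sketch below derives the same environment settings from those three values. It assumes, as in the example above, that OMP_NUM_THREADS should equal the total number of hardware threads selected by KMP_HW_SUBSET. The helper names are hypothetical and not part of SLIDE.

```python
def openmp_env(sockets, cores_per_socket, threads_per_core):
    """Build the OpenMP/KMP settings used in the run command.

    Assumption: OMP_NUM_THREADS is the product of the three topology
    values, matching the CLX8280 example (2 * 28 * 2 = 112).
    """
    total_threads = sockets * cores_per_socket * threads_per_core
    return {
        "OMP_NUM_THREADS": str(total_threads),
        "KMP_HW_SUBSET": f"{sockets}s,{cores_per_socket}c,{threads_per_core}t",
        "KMP_AFFINITY": "compact,granularity=fine",
        "KMP_BLOCKTIME": "200",
    }


def as_command_prefix(env):
    """Format the settings as a VAR=value prefix for a shell command."""
    return " ".join(f"{name}={value}" for name, value in env.items())


# Reproduces the prefix of the CLX8280 example command.
print(as_command_prefix(openmp_env(2, 28, 2)))
# prints: OMP_NUM_THREADS=112 KMP_HW_SUBSET=2s,28c,2t KMP_AFFINITY=compact,granularity=fine KMP_BLOCKTIME=200
```

Append `./runme ../SLIDE/Config_amz.csv` to the printed prefix to get the full command.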

For best performance, set Batchsize in SLIDE/Config_amz.csv to a multiple of the number of logical cores.
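As a small illustration of that rule, the Python sketch below rounds a desired batch size up to the nearest multiple of the logical core count. It is only a helper for choosing the number; it does not read or write Config_amz.csv, and the function name is hypothetical.

```python
import os


def round_up_to_multiple(value, multiple):
    """Return the smallest multiple of `multiple` that is >= `value`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


# On the 112-thread CLX8280 example, a target batch size of 256
# rounds up to 336 (3 * 112); 224 is already a multiple and is kept.
print(round_up_to_multiple(256, 112))  # prints: 336
print(round_up_to_multiple(224, 112))  # prints: 224

# On the current machine, os.cpu_count() reports the logical core count
# (it may return None on some platforms, hence the fallback to 1).
logical_cores = os.cpu_count() or 1
print(round_up_to_multiple(256, logical_cores))
```

Note that `os.cpu_count()` counts all logical cores visible to the operating system, which may differ from the subset selected through KMP_HW_SUBSET.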

Results can be checked in the log file under the dataset directory:

tail -f dataset/log.txt

Owner

Intel Labs
