An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

Overview

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in "Rethinking floating point for deep learning" [1].

There are two types of floating point implemented:

  • N-bit (N, l, alpha, beta, gamma) log with ELMA [1]
  • N-bit (N, s) (linear) posit [2]

with partial implementation of IEEE-style (e, s) floating point (likely quite buggy) and non-posit tapered log.

8-bit (8, 1, 5, 5, 7) log is the format described in "Rethinking floating point for deep learning", shown within to be more energy efficient than int8/32 integer multiply-add at 28 nm and an effective drop-in replacement for IEEE 754 binary32 single precision floating point via round to nearest even for CNN inference on ResNet-50 on ImageNet.

[1] Johnson, J. "Rethinking floating point for deep learning." (2018). https://arxiv.org/abs/1811.01721

[2] Gustafson, J. and Yonemoto, I. "Beating floating point at its own game: Posit arithmetic." Supercomputing Frontiers and Innovations 4.2 (2017): 71-86.

Requirements

You will need:

  • a PyTorch CPU installation
  • a C++11-compatible compiler to use to generate a PyTorch C++ extension module
  • the ImageNet ILSVRC12 image validation set
  • an Intel OpenCL for FPGA compatible board
  • a Quartus Prime Pro installation with the Intel OpenCL for FPGA compiler

rtl contains the SystemVerilog modules needed for the design.

bitstream contains the OpenCL that wraps the RTL modules.

cpp contains host CPU-side code for interacting with the FPGA OpenCL design.

py contains the top-level functionality to compile the CPU code and run networks.

Flow

In bitstream, run

./build_lib.sh <design>

followed by

./build_afu.sh <design> (this will take several hours to synthesize the FPGA design)

where <design> is one of loglib or positlib. The aoc/aocl tools, Quartus, Quartus license file, OpenCL BSP etc. must be in your path/environment. loglib is configured to generate a design with 8-bit (8, 1, 5, 5, 7) log arithmetic, and positlib is configured to generate a design with 8-bit (8, 1) posit arithmetic by default.

The aoc build seems to require a Python 2.x interpreter in the path, otherwise it will fail.

Update the aocx_file in py/run_fpga_resnet.py to your choice of design.

Update valdir towards the end of py/validate.py to point to a Torch dataset loader compatible installation of the ImageNet validation set.

Using a python environment with PyTorch available, in py run:

python run_fpga_resnet.py

If successful, this will run the complete validation set against the FPGA design. This requires a Python 3.x interpreter.

RTL comments

The modules used by the OpenCL design reside in rtl/log/operators and rtl/posit/operators. You can see how they are assembled here.

rtl/paper_syn contains the modules used in the paper's 28 nm synthesis results (Paper*Top.sv are the top-level modules). Waves_*.sv are the testbench programs used to generate switching activity for power analysis output.

You will have to provide your own Synopsys Design Compiler scripts/flow/cell libraries/PDK/etc. for synthesis, as we are not allowed to share details on which 28 nm semiconductor process was used or our Design Compiler synthesis scripts.

Other comments

The posit encoding implemented herein implements negative values with a sign bit rather than two's complement encoding. It is a TODO to change it, but the cost either way is largely dwarfed by other concerns in my opinion.

The FPGA design itself is not super flexible yet to support different bit widths than 8. loglib is restricted to N <= 8 bits at the moment, while positlib should be ok for N <= 16 bits, though some of the larger designs may run into FPGA resource issues if synthesized for the FPGA.

Contributions

This repo currently exists as a proof of concept. Contributions may be considered, but the design is mostly that which is needed to reproduce the results from the paper.

License

This code is licensed under CC-BY-NC 4.0.

This code also includes and uses the Single Python Fixed-Point Module for LUT SystemVerilog log-to-linear and linear-to-log mapping module generation in rtl/log/luts, which is licensed by the Python-2.4.2 license.

Owner
Facebook Research
Facebook Research
Learning recognition/segmentation models without end-to-end training. 40%-60% less GPU memory footprint. Same training time. Better performance.

InfoPro-Pytorch The Information Propagation algorithm for training deep networks with local supervision. (ICLR 2021) Revisiting Locally Supervised Lea

78 Dec 27, 2022
Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX.

Python scripts for performing object detection with the 1000 labels of the ImageNet dataset in ONNX. The repository combines a class agnostic object localizer to first detect the objects in the image

Ibai Gorordo 24 Nov 14, 2022
GE2340 project source code without credentials.

GE2340-Project-Public GE2340 project source code without credentials. Run the bot.py to start the bot Telegram: @jasperwong_ge2340_bot If the bot does

0 Feb 10, 2022
This repository lets you interact with Lean through a REPL.

lean-gym This repository lets you interact with Lean through a REPL. See Formal Mathematics Statement Curriculum Learning for a presentation of lean-g

OpenAI 87 Dec 28, 2022
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation

AttentionGAN-v2 for Unpaired Image-to-Image Translation AttentionGAN-v2 Framework The proposed generator learns both foreground and background attenti

Hao Tang 530 Dec 27, 2022
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch

Bootstrap Your Own Latent (BYOL), in Pytorch Practical implementation of an astoundingly simple method for self-supervised learning that achieves a ne

Phil Wang 1.4k Dec 29, 2022
Building Ellee — A GPT-3 and Computer Vision Powered Talking Robotic Teddy Bear With Human Level Conversation Intelligence

Using an object detection and facial recognition system built on MobileNetSSDV2 and Dlib and running on an NVIDIA Jetson Nano, a GPT-3 model, Google Speech Recognition, Amazon Polly and servo motors,

24 Oct 26, 2022
Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetu

3 Dec 05, 2022
Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Re-implementation of the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets' Paper Original paper can be found here Data

Tom Lieberum 38 Aug 09, 2022
A library that can print Python objects in human readable format

objprint A library that can print Python objects in human readable format Install pip install objprint Usage op Use op() (or objprint()) to print obj

319 Dec 25, 2022
Python Fanduel API (2021) - Lineup Automation

Southpaw is a python package that provides access to the Fanduel API. Optimize your DFS experience by programmatically updating your lineups, analyzin

Brandin Canfield 13 Jan 04, 2023
Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 430 Jan 04, 2023
Adaptive Graph Convolution for Point Cloud Analysis

Adaptive Graph Convolution for Point Cloud Analysis This repository contains the implementation of AdaptConv for point cloud analysis. Adaptive Graph

64 Dec 21, 2022
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022
Public implementation of the Convolutional Motif Kernel Network (CMKN) architecture

CMKN Implementation of the convolutional motif kernel network (CMKN) introduced in Ditz et al., "Convolutional Motif Kernel Network", 2021. Testing Yo

1 Nov 17, 2021
[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

Garment4D [PDF] | [OpenReview] | [Project Page] Overview This is the codebase for our NeurIPS 2021 paper Garment4D: Garment Reconstruction from Point

Fangzhou Hong 112 Dec 23, 2022
The code is an implementation of Feedback Convolutional Neural Network for Visual Localization and Segmentation.

Feedback Convolutional Neural Network for Visual Localization and Segmentation The code is an implementation of Feedback Convolutional Neural Network

19 Dec 04, 2022
Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm.

REDQ source code Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm. Paper link: https://arxiv.org/abs/2101.05

109 Dec 16, 2022
Pytorch implementation for the paper: Contrastive Learning for Cold-start Recommendation

Contrastive Learning for Cold-start Recommendation This is our Pytorch implementation for the paper: Yinwei Wei, Xiang Wang, Qi Li, Liqiang Nie, Yan L

45 Dec 13, 2022
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022