Memory-efficient optimum einsum using opt_einsum planning and PyTorch kernels.

Last update: Nov 18, 2022

Overview

opt-einsum-torch

There are many implementations of Einstein summation. NumPy's numpy.einsum is generally the least efficient, as it runs only on the CPU and, by default, in a single thread. PyTorch's torch.einsum works with both CPU and CUDA tensors. However, because there is no virtual memory for CUDA to fall back on, torch.einsum runs out of CUDA memory on large tensors.

This code aims to implement a memory-efficient einsum function using PyTorch as the backend. It also uses the opt_einsum package to optimize the contraction path and so minimize the number of FLOPs.
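To see why the contraction path matters, the sketch below counts multiply-adds for the two possible orders of a three-tensor chain 'ij,jk,kl->il'. The index sizes and the helper pairwise_cost are made up for illustration; opt_einsum performs this kind of search automatically and with a more careful cost model.

```python
def pairwise_cost(dims_a, dims_b, sizes):
    """Rough multiply-add count for contracting two tensors.

    Every distinct index that appears in either operand is looped over
    once, so the cost is the product of their sizes.
    """
    cost = 1
    for idx in set(dims_a) | set(dims_b):
        cost *= sizes[idx]
    return cost


# Hypothetical index sizes, chosen so that the two orders differ a lot.
sizes = {"i": 10, "j": 1000, "k": 10, "l": 1000}

# Order 1: contract 'ij,jk->ik' first, then 'ik,kl->il'.
left_first = pairwise_cost("ij", "jk", sizes) + pairwise_cost("ik", "kl", sizes)

# Order 2: contract 'jk,kl->jl' first, then 'ij,jl->il'.
right_first = pairwise_cost("jk", "kl", sizes) + pairwise_cost("ij", "jl", sizes)

print(left_first, right_first)
```

With these sizes the first order needs 100 times fewer multiply-adds than the second, even though both produce the same result.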

Usage

from opt_einsum_torch import EinsumPlanner
import torch

# Some huge tensors
arr1, arr2 = ..., ...
ee = EinsumPlanner(torch.device('cuda:0'), cuda_mem_limit=0.9)
result = ee.einsum('ijk,jkl->il', arr1, arr2)

The resulting tensor, result, is a PyTorch CPU tensor. You can convert it to a NumPy array by calling result.numpy().
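The text above does not say how the planner keeps a contraction within the memory limit. One common approach, shown here purely as an assumption and in plain Python rather than PyTorch, is to split an index that survives into the output and contract one slice at a time, so that only a slice of the large operand has to be resident on the device. The function names matmul and chunked_matmul are hypothetical and are not part of opt_einsum_torch.

```python
def matmul(a, b):
    """Plain 'ij,jk->ik' contraction on nested lists."""
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def chunked_matmul(a, b, chunk_rows):
    """Evaluate 'ij,jk->ik' one block of rows of `a` at a time.

    Index i is not summed over, so each block of rows yields an
    independent block of the output. In a GPU setting only `block`
    (plus `b`) would need to be on the device at any moment.
    """
    out = []
    for start in range(0, len(a), chunk_rows):
        block = a[start:start + chunk_rows]
        out.extend(matmul(block, b))
    return out


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0, 2], [0, 1, 3]]

full = matmul(a, b)
chunked = chunked_matmul(a, b, chunk_rows=2)
print(full == chunked)
```

A smaller chunk_rows lowers the peak memory per step at the price of more transfers, which is the trade-off a memory limit such as cuda_mem_limit would presumably control.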

Future work

Support multiple GPUs.
Memory efficient einsum kernels.
CUDA data transfer profilers.

Releases(0.1.0)

0.1.0(Dec 30, 2021)

Initial release of the package.

Owner

Haoyan Huo
