Hierarchical Transformer Memory (HTM) - Pytorch

Implementation of Hierarchical Transformer Memory (HTM) for Pytorch. This DeepMind paper proposes a simple method that allows transformers to attend efficiently to memories of the past. The original Jax repository is also available.

Install

$ pip install htm-pytorch

Usage

import torch
from htm_pytorch import HTMAttention

attn = HTMAttention(
    dim = 512,
    heads = 8,               # number of heads for within-memory attention
    dim_head = 64,           # dimension per head for within-memory attention
    topk_mems = 8,           # number of top-k memory chunks to select for attention
    mem_chunk_size = 32,     # number of tokens in each memory chunk
    add_pos_enc = True       # whether to add positional encoding to the memories
)

queries = torch.randn(1, 128, 512)     # queries
memories = torch.randn(1, 20000, 512)  # memories, of any size
mask = torch.ones(1, 20000).bool()     # memory mask

attended = attn(queries, memories, mask = mask) # (1, 128, 512)
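
Under the hood, HTM performs a two-stage attention: the memories are divided into chunks, each chunk is summarized, the queries first attend over the chunk summaries to pick the top-k most relevant chunks, and then attend over the individual tokens inside those chunks. Below is a minimal conceptual sketch of that procedure (single head, mean-pooled summaries, no masking or positional encodings); it is not the code used inside HTMAttention.

import torch
import torch.nn.functional as F

def htm_attention_sketch(queries, memories, topk_mems = 8, mem_chunk_size = 32):
    # queries:  (batch, num_queries, dim), memories: (batch, num_memories, dim)
    b, n, d = memories.shape

    # pad the memories so they divide evenly into chunks
    # (the real module excludes padding via the mask; this sketch ignores that)
    pad = (mem_chunk_size - n % mem_chunk_size) % mem_chunk_size
    memories = F.pad(memories, (0, 0, 0, pad))
    chunks = memories.reshape(b, -1, mem_chunk_size, d)               # (b, chunks, chunk_size, d)

    # stage 1: coarse attention - score each chunk by its (here, mean-pooled) summary
    summaries = chunks.mean(dim = 2)                                   # (b, chunks, d)
    chunk_scores = torch.einsum('bqd,bcd->bqc', queries, summaries) / d ** 0.5
    top_scores, top_idx = chunk_scores.topk(topk_mems, dim = -1)       # keep the top-k chunks per query

    # stage 2: fine attention - attend over the tokens inside the selected chunks only
    idx = top_idx[..., None, None].expand(-1, -1, -1, mem_chunk_size, d)
    selected = chunks[:, None].expand(-1, queries.shape[1], -1, -1, -1).gather(2, idx)  # (b, q, k, chunk_size, d)
    token_scores = torch.einsum('bqd,bqksd->bqks', queries, selected) / d ** 0.5
    attn = token_scores.softmax(dim = -1) * top_scores.softmax(dim = -1)[..., None]     # weight by chunk relevance
    return torch.einsum('bqks,bqksd->bqd', attn, selected)             # (b, q, d)

The real HTMAttention additionally learns projections for the queries, keys and values, uses multiple heads, respects the memory mask, and can add positional encodings to the memories, as shown by the constructor arguments above.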

If you want the entire HTM block (which applies a layer norm to the input queries, runs the HTM attention, and adds a skip connection), just import HTMBlock instead

import torch
from htm_pytorch import HTMBlock

block = HTMBlock(
    dim = 512,
    topk_mems = 8,
    mem_chunk_size = 32
)

queries = torch.randn(1, 128, 512)
memories = torch.randn(1, 20000, 512)
mask = torch.ones(1, 20000).bool()

out = block(queries, memories, mask = mask) # (1, 128, 512)
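
Per the description above, the block is essentially a pre-norm residual wrapper around HTMAttention. A rough sketch of that wiring (an approximation for clarity, not the exact implementation):

import torch
from torch import nn
from htm_pytorch import HTMAttention

class HTMBlockSketch(nn.Module):
    def __init__(self, dim, **attn_kwargs):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                      # layer norm for the input queries
        self.attn = HTMAttention(dim = dim, **attn_kwargs)

    def forward(self, queries, memories, mask = None):
        out = self.attn(self.norm(queries), memories, mask = mask)
        return out + queries                               # skip connection back to the unnormalized input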

Citations

@misc{lampinen2021mental,
    title   = {Towards mental time travel: a hierarchical memory for reinforcement learning agents}, 
    author  = {Andrew Kyle Lampinen and Stephanie C. Y. Chan and Andrea Banino and Felix Hill},
    year    = {2021},
    eprint  = {2105.14039},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
Comments
  • auto-regressive use case

    Hi Phil! I was wondering whether the HTM part can be used in an auto-regressive scenario. The full proposed architecture in the paper has 3 blocks:

    1. Self-attention - this can easily be made causal with masking
    2. The HTM block over memories - can this be used auto-regressively? (see the sketch after this comment)
    3. Feed-forward block

    Please let me know your thoughts.

    opened by inspirit 0
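
For illustration, here is a minimal hypothetical sketch of the three-block layer described in the comment above: causal self-attention, then HTMBlock over memories (assumed to contain only past context, so no future information leaks in), then a feed-forward block. The class name, hyperparameters and feed-forward layout are made up for this example; this is not code from the paper or from this repository.

import torch
from torch import nn
from htm_pytorch import HTMBlock

class CausalHTMLayerSketch(nn.Module):
    def __init__(self, dim = 512, heads = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.htm = HTMBlock(dim = dim, topk_mems = 8, mem_chunk_size = 32)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, memories, mem_mask = None):
        n = x.shape[1]
        causal = torch.ones(n, n, dtype = torch.bool, device = x.device).triu(1)  # True = masked-out future position
        attn_out, _ = self.self_attn(x, x, x, attn_mask = causal)
        x = x + attn_out                                   # 1. causal self-attention
        x = self.htm(x, memories, mask = mem_mask)         # 2. HTM block over past memories (HTMBlock adds its own skip)
        return x + self.ff(x)                              # 3. feed-forward block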