Hierarchical Transformer Memory (HTM) - Pytorch

Implementation of Hierarchical Transformer Memory (HTM) for Pytorch. This DeepMind paper proposes a simple method that allows transformers to attend efficiently to memories of the past. The original Jax repository is also available.

Install

$ pip install htm-pytorch

Usage

import torch
from htm_pytorch import HTMAttention

attn = HTMAttention(
    dim = 512,
    heads = 8,               # number of heads for within-memory attention
    dim_head = 64,           # dimension per head for within-memory attention
    topk_mems = 8,           # number of top-k memory chunks to select for attention
    mem_chunk_size = 32,     # number of tokens in each memory chunk
    add_pos_enc = True       # whether to add positional encoding to the memories
)

queries = torch.randn(1, 128, 512)     # queries
memories = torch.randn(1, 20000, 512)  # memories, of any size
mask = torch.ones(1, 20000).bool()     # memory mask

attended = attn(queries, memories, mask = mask) # (1, 128, 512)
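
Under the hood, HTM performs a two-stage attention: the memories are divided into chunks, each chunk is summarized, the queries first attend over the chunk summaries to pick the top-k most relevant chunks, and then attend over the individual tokens inside those chunks. Below is a minimal conceptual sketch of that procedure (single head, mean-pooled summaries, no masking or positional encodings); it is not the code used inside HTMAttention.

import torch
import torch.nn.functional as F

def htm_attention_sketch(queries, memories, topk_mems = 8, mem_chunk_size = 32):
    # queries:  (batch, num_queries, dim), memories: (batch, num_memories, dim)
    b, n, d = memories.shape

    # pad the memories so they divide evenly into chunks
    # (the real module excludes padding via the mask; this sketch ignores that)
    pad = (mem_chunk_size - n % mem_chunk_size) % mem_chunk_size
    memories = F.pad(memories, (0, 0, 0, pad))
    chunks = memories.reshape(b, -1, mem_chunk_size, d)               # (b, chunks, chunk_size, d)

    # stage 1: coarse attention - score each chunk by its (here, mean-pooled) summary
    summaries = chunks.mean(dim = 2)                                   # (b, chunks, d)
    chunk_scores = torch.einsum('bqd,bcd->bqc', queries, summaries) / d ** 0.5
    top_scores, top_idx = chunk_scores.topk(topk_mems, dim = -1)       # keep the top-k chunks per query

    # stage 2: fine attention - attend over the tokens inside the selected chunks only
    idx = top_idx[..., None, None].expand(-1, -1, -1, mem_chunk_size, d)
    selected = chunks[:, None].expand(-1, queries.shape[1], -1, -1, -1).gather(2, idx)  # (b, q, k, chunk_size, d)
    token_scores = torch.einsum('bqd,bqksd->bqks', queries, selected) / d ** 0.5
    attn = token_scores.softmax(dim = -1) * top_scores.softmax(dim = -1)[..., None]     # weight by chunk relevance
    return torch.einsum('bqks,bqksd->bqd', attn, selected)             # (b, q, d)

The real HTMAttention additionally learns projections for the queries, keys and values, uses multiple heads, respects the memory mask, and can add positional encodings to the memories, as shown by the constructor arguments above.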

If you want the entire HTM block (which applies a layer norm to the input queries, runs the HTM attention, and adds a skip connection), just import HTMBlock instead

import torch
from htm_pytorch import HTMBlock

block = HTMBlock(
    dim = 512,
    topk_mems = 8,
    mem_chunk_size = 32
)

queries = torch.randn(1, 128, 512)
memories = torch.randn(1, 20000, 512)
mask = torch.ones(1, 20000).bool()

out = block(queries, memories, mask = mask) # (1, 128, 512)
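
Per the description above, the block is essentially a pre-norm residual wrapper around HTMAttention. A rough sketch of that wiring (an approximation for clarity, not the exact implementation):

import torch
from torch import nn
from htm_pytorch import HTMAttention

class HTMBlockSketch(nn.Module):
    def __init__(self, dim, **attn_kwargs):
        super().__init__()
        self.norm = nn.LayerNorm(dim)                      # layer norm for the input queries
        self.attn = HTMAttention(dim = dim, **attn_kwargs)

    def forward(self, queries, memories, mask = None):
        out = self.attn(self.norm(queries), memories, mask = mask)
        return out + queries                               # skip connection back to the unnormalized input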

Citations

@misc{lampinen2021mental,
    title   = {Towards mental time travel: a hierarchical memory for reinforcement learning agents}, 
    author  = {Andrew Kyle Lampinen and Stephanie C. Y. Chan and Andrea Banino and Felix Hill},
    year    = {2021},
    eprint  = {2105.14039},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}
Comments
  • auto-regressive use case

    Hi Phil! I was wondering whether the HTM part can be used in an auto-regressive scenario. The full proposed architecture in the paper has 3 blocks:

    1. Self-attention - this can easily be made causal with masking
    2. The HTM block over memories - can this be used auto-regressively? (see the sketch after this comment)
    3. Feed-forward block

    Please let me know your thoughts.

    opened by inspirit 0
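
For illustration, here is a minimal hypothetical sketch of the three-block layer described in the comment above: causal self-attention, then HTMBlock over memories (assumed to contain only past context, so no future information leaks in), then a feed-forward block. The class name, hyperparameters and feed-forward layout are made up for this example; this is not code from the paper or from this repository.

import torch
from torch import nn
from htm_pytorch import HTMBlock

class CausalHTMLayerSketch(nn.Module):
    def __init__(self, dim = 512, heads = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.htm = HTMBlock(dim = dim, topk_mems = 8, mem_chunk_size = 32)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x, memories, mem_mask = None):
        n = x.shape[1]
        causal = torch.ones(n, n, dtype = torch.bool, device = x.device).triu(1)  # True = masked-out future position
        attn_out, _ = self.self_attn(x, x, x, attn_mask = causal)
        x = x + attn_out                                   # 1. causal self-attention
        x = self.htm(x, memories, mask = mem_mask)         # 2. HTM block over past memories (HTMBlock adds its own skip)
        return x + self.ff(x)                              # 3. feed-forward block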