Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation of In-context Learning as Implicit Bayesian Inference"

Last update: Dec 19, 2022

Overview

GINC small-scale in-context learning dataset

GINC (Generative In-Context learning Dataset) is a small-scale synthetic dataset for studying in-context learning. The pretraining data is generated by a mixture of HMMs, and the in-context learning prompt examples are also generated from HMMs (either from the mixture or not). The prompt examples are out-of-distribution with respect to the pretraining data because each example is sampled independently, and the examples are then concatenated and separated by delimiters. We provide code to generate GINC-style datasets with varying vocabulary sizes, numbers of HMMs, and other parameters.
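
The generative story above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repo's generator: each HMM here is a plain HMM with randomly drawn start, transition, and emission distributions (GINC's own generator takes further parameters, visible in the dataset directory name, that this sketch ignores), the mixture is uniform, and the delimiter is a hypothetical reserved token id equal to the vocabulary size. All names (HMM, sample_pretraining_doc, build_prompt, DELIMITER) are illustrative. The sketch shows where the distribution shift comes from: a pretraining document is one long sequence from a single HMM, so every token follows an HMM transition, while in a prompt each example restarts from the start distribution, so the jump across a delimiter is not one the HMM itself would produce.

```python
import random


def random_stochastic_row(n, rng):
    """Draw a random probability vector of length n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


class HMM:
    """Plain HMM with random parameters (a simplification of GINC's HMMs)."""

    def __init__(self, n_states, vocab_size, rng):
        self.start = random_stochastic_row(n_states, rng)
        self.trans = [random_stochastic_row(n_states, rng) for _ in range(n_states)]
        self.emit = [random_stochastic_row(vocab_size, rng) for _ in range(n_states)]

    def sample(self, length, rng):
        """Sample `length` tokens, starting from the start distribution."""
        states = range(len(self.start))
        state = rng.choices(states, weights=self.start)[0]
        tokens = []
        for _ in range(length):
            vocab = range(len(self.emit[state]))
            tokens.append(rng.choices(vocab, weights=self.emit[state])[0])
            state = rng.choices(states, weights=self.trans[state])[0]
        return tokens


def sample_pretraining_doc(hmms, length, rng):
    """One document = one long sequence from a single HMM chosen uniformly."""
    hmm = rng.choice(hmms)
    return hmm.sample(length, rng)


def build_prompt(hmm, n_examples, example_len, delimiter, rng):
    """Concatenate independent examples from one HMM, each followed by a delimiter."""
    prompt = []
    for _ in range(n_examples):
        prompt.extend(hmm.sample(example_len, rng))  # restarts from hmm.start
        prompt.append(delimiter)
    return prompt


rng = random.Random(0)
VOCAB_SIZE = 50
DELIMITER = VOCAB_SIZE  # hypothetical reserved id, outside the HMM vocabulary
hmms = [HMM(n_states=10, vocab_size=VOCAB_SIZE, rng=rng) for _ in range(5)]
doc = sample_pretraining_doc(hmms, length=100, rng=rng)
prompt = build_prompt(hmms[0], n_examples=4, example_len=3, delimiter=DELIMITER, rng=rng)
```

Keeping the delimiter outside the sampled vocabulary makes example boundaries unambiguous in this sketch; this README does not say how GINC itself encodes the delimiter.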

Quickstart

Create a conda environment or virtualenv using the specification in conda-env.yml, then install the bundled transformers package by changing into the transformers/ directory and running "pip install -e .". Modify consts.sh to change the default output locations and to add the commands that activate your environment of choice. Run scripts/runner.sh to submit all the experiments as Slurm jobs via sbatch.

Explore the data

The default dataset has a vocabulary size of 50, and the pretraining data is generated from a mixture of 5 HMMs. The pretraining dataset is in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/train.json, while the in-context prompts are in data/GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10/id_prompts_randomsample_*.json.
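
The dataset directory name encodes the generation parameters as name/value pairs joined by underscores. The Python sketch below, assuming that naming pattern holds for other GINC-style datasets, reads those pairs back into a dict and lists the prompt files with a glob on the pattern above. The helper names (parse_ginc_name, prompt_files) are hypothetical; nsymbols matches the stated vocabulary size of 50, but this README does not document the other abbreviations (trans, start, vic, etc.), so the sketch only extracts them without interpreting them. It also does not assume anything about the JSON schema inside the files.

```python
import re
from pathlib import Path

# Hypothetical pattern: letters for the parameter name, then an int or float value.
_PARAM_RE = re.compile(r"([A-Za-z]+)(\d+(?:\.\d+)?)")


def parse_ginc_name(name: str) -> dict:
    """Split e.g. 'GINC_trans0.1_..._nhmms10' into {'trans': 0.1, ..., 'nhmms': 10}."""
    prefix, *tokens = name.split("_")
    if prefix != "GINC":
        raise ValueError(f"not a GINC dataset name: {name!r}")
    params = {}
    for token in tokens:
        match = _PARAM_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized component: {token!r}")
        key, value = match.groups()
        params[key] = float(value) if "." in value else int(value)
    return params


def prompt_files(dataset_dir):
    """Sorted list of the in-context prompt files in one dataset directory."""
    return sorted(Path(dataset_dir).glob("id_prompts_randomsample_*.json"))


DEFAULT = "GINC_trans0.1_start10.0_nsymbols50_nvalues10_nslots10_vic0.9_nhmms10"
params = parse_ginc_name(DEFAULT)
```

Raising on any unexpected component, rather than skipping it, makes a renamed or hand-edited directory fail loudly instead of silently yielding a partial parameter set.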

This repo contains the experiments for the paper "An Explanation of In-context Learning as Implicit Bayesian Inference". If you find this repo useful, please cite:

@article{xie2021incontext,
  author = {Sang Michael Xie and Aditi Raghunathan and Percy Liang and Tengyu Ma},
  journal = {arXiv preprint arXiv:2111.02080},
  title = {An Explanation of In-context Learning as Implicit Bayesian Inference},
  year = {2021},
}

Owner

P-Lambda
