Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

Last update: Aug 09, 2022

Overview

This repository re-implements the paper 'Grokking: Generalization beyond overfitting on small algorithmic datasets'.

Paper

Original paper can be found here

Datasets

I'm not sure how the paper defines division, so I am using integer division:

$$x\circ y = (x // y) \bmod p$$, for some prime $$p$$ and $$0\leq x,y \leq p$$
$$x\circ y = (x // y) \bmod p$$ if $$y$$ is odd, else $$(x - y) \bmod p$$, for some prime $$p$$ and $$0\leq x,y \leq p$$
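
As a minimal sketch of the two definitions above (not the repository's actual code): the function names int_div, int_div_odd and mod_div are hypothetical, and excluding y = 0 is an assumption, since integer division by zero is undefined. mod_div shows the modular-inverse reading of division for comparison; whether the paper intends that reading is likewise an assumption.

```python
def int_div(x: int, y: int, p: int) -> int:
    """(x // y) mod p, the integer-division reading used here. Assumes y != 0."""
    return (x // y) % p


def int_div_odd(x: int, y: int, p: int) -> int:
    """(x // y) mod p if y is odd, else (x - y) mod p."""
    return (x // y) % p if y % 2 == 1 else (x - y) % p


def mod_div(x: int, y: int, p: int) -> int:
    """Alternative reading: x times the modular inverse of y (p prime, y != 0 mod p)."""
    return (x * pow(y, -1, p)) % p  # pow with exponent -1 needs Python 3.8+


p = 97
print(int_div(10, 3, p))      # 3
print(int_div_odd(10, 4, p))  # 6, because y is even: (10 - 4) mod 97
print(mod_div(10, 3, p))      # 68, because 3 * 68 = 204 = 2 * 97 + 10
```

The two readings generally disagree (3 versus 68 for the same inputs), so results on the div datasets may not be directly comparable to a modular-inverse implementation.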

Hyperparameters

The default hyperparameters are taken from the paper, but each can be overridden on the command line when running train.py.

Running experiments

To run with default settings, simply run python train.py. The first time you train on any dataset you have to specify --force_data.
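
The invocations described above could be scripted roughly as follows. This is a sketch, not part of the repository: train_command is a hypothetical helper, it only uses flags that appear in the argument list of this README, and it assumes train.py sits in the current directory.

```python
import shlex
import sys


def train_command(data_name=None, num_elements=None, first_run=False):
    """Build the argument list for train.py (hypothetical helper)."""
    cmd = [sys.executable, "train.py"]
    if data_name is not None:
        cmd += ["--data_name", data_name]
    if num_elements is not None:
        cmd += ["--num_elements", str(num_elements)]
    if first_run:
        # The dataset has not been generated yet, so force its creation.
        cmd.append("--force_data")
    return cmd


# First run on the "plus" dataset with 97 elements.
cmd = train_command("plus", 97, first_run=True)
print(shlex.join(cmd))
# To actually launch training: subprocess.run(cmd, check=True)
```

On later runs with the same dataset, first_run can stay False so the existing data is reused.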

Arguments:

optimizer args

"--lr", type=float, default=1e-3
"--weight_decay", type=float, default=1
"--beta1", type=float, default=0.9
"--beta2", type=float, default=0.98

model args

"--num_heads", type=int, default=4
"--layers", type=int, default=2
"--width", type=int, default=128

data args

"--data_name", type=str, default="perm", choices=[
- "perm_xy", # permutation composition x * y
- "perm_xyx1", # permutation composition x * y * x^-1
- "perm_xyx", # permutation composition x * y * x
- "plus", # x + y
- "minus", # x - y
- "div", # x / y
- "div_odd", # x / y if y is odd else x - y
- "x2y2", # x^2 + y^2
- "x2xyy2", # x^2 + y^2 + xy
- "x2xyy2x", # x^2 + y^2 + xy + x
- "x3xy", # x^3 + xy
- "x3xy2y" # x^3 + xy^2 + y
]
"--num_elements", type=int, default=5 (choose 5 for permutation data, 97 for arithmetic data)
"--data_dir", type=str, default="./data"
"--force_data", action="store_true", help="Whether to force dataset creation."
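
The comments in the choices list can be read as the following operations. This is a hedged sketch of a subset of them, not the repository's dataset code: ARITHMETIC_OPS, PERM_OPS, compose, inverse and all_pairs are hypothetical names, reducing every arithmetic result mod p follows the Datasets section, and the composition convention (the right-hand permutation is applied first) is an assumption.

```python
from itertools import permutations

# Arithmetic operations on residues mod p (subset of the listed datasets).
ARITHMETIC_OPS = {
    "plus": lambda x, y, p: (x + y) % p,
    "minus": lambda x, y, p: (x - y) % p,
    "x2y2": lambda x, y, p: (x**2 + y**2) % p,
    "x2xyy2": lambda x, y, p: (x**2 + x * y + y**2) % p,
    "x2xyy2x": lambda x, y, p: (x**2 + x * y + y**2 + x) % p,
    "x3xy2y": lambda x, y, p: (x**3 + x * y**2 + y) % p,
}


def compose(a, b):
    """Return a * b as a tuple, applying b first: (a * b)[i] = a[b[i]] (assumed convention)."""
    return tuple(a[i] for i in b)


def inverse(a):
    """Return the inverse permutation of a."""
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)


# Permutation operations; each permutation is a tuple of images of 0..n-1.
PERM_OPS = {
    "perm_xy": lambda x, y: compose(x, y),
    "perm_xyx1": lambda x, y: compose(compose(x, y), inverse(x)),
    "perm_xyx": lambda x, y: compose(compose(x, y), x),
}


def all_pairs(num_elements):
    """All ordered pairs of permutations of range(num_elements)."""
    perms = list(permutations(range(num_elements)))
    return [(x, y) for x in perms for y in perms]


print(len(all_pairs(5)))  # 14400 = 120 * 120 pairs for 5 elements
```

With 97 elements the arithmetic datasets have at most 97 * 97 = 9409 input pairs, so both dataset families are small enough to enumerate in full.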

training args

"--batch_size", type=int, default=512
"--steps", type=int, default=10**5
"--train_ratio", type=float, default=0.5
"--seed", type=int, default=42
"--verbose", action="store_true"
"--log_freq", type=int, default=10
"--num_workers", type=int, default=4
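
Put together, the arguments listed above correspond to an argparse parser along these lines. This is a reconstruction from the list, not the repository's train.py: the function name build_parser and the constant DATA_NAMES are hypothetical, and the defaults are copied as listed, including default="perm", which is not among the listed choices (argparse does not check defaults against choices).

```python
import argparse

DATA_NAMES = [
    "perm_xy", "perm_xyx1", "perm_xyx",
    "plus", "minus", "div", "div_odd",
    "x2y2", "x2xyy2", "x2xyy2x", "x3xy", "x3xy2y",
]


def build_parser():
    parser = argparse.ArgumentParser()
    # optimizer args
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--weight_decay", type=float, default=1)
    parser.add_argument("--beta1", type=float, default=0.9)
    parser.add_argument("--beta2", type=float, default=0.98)
    # model args
    parser.add_argument("--num_heads", type=int, default=4)
    parser.add_argument("--layers", type=int, default=2)
    parser.add_argument("--width", type=int, default=128)
    # data args (default kept exactly as listed in this README)
    parser.add_argument("--data_name", type=str, default="perm", choices=DATA_NAMES)
    parser.add_argument("--num_elements", type=int, default=5)
    parser.add_argument("--data_dir", type=str, default="./data")
    parser.add_argument("--force_data", action="store_true",
                        help="Whether to force dataset creation.")
    # training args
    parser.add_argument("--batch_size", type=int, default=512)
    parser.add_argument("--steps", type=int, default=10**5)
    parser.add_argument("--train_ratio", type=float, default=0.5)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--log_freq", type=int, default=10)
    parser.add_argument("--num_workers", type=int, default=4)
    return parser


# Example: an arithmetic dataset with 97 elements, generated on first use.
args = build_parser().parse_args(
    ["--data_name", "plus", "--num_elements", "97", "--force_data"]
)
print(args.data_name, args.num_elements, args.force_data)  # plus 97 True
```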

Owner

Tom Lieberum
