Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

Last update: Apr 22, 2022

Overview

546 Final Project: Masked Autoencoder

Haoran Tang, Qirui Wu

1. Training

To train the network, run mae_pretraining.py, adjusting the folder paths and arguments as needed. To change the embedding dimension of the encoder, edit it manually in the ViT network defined in models.py.
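For readers new to masked autoencoders, the random patch masking at the heart of MAE pretraining can be sketched as follows. This is a minimal, standard-library illustration and not the code in mae_pretraining.py or models.py; the function name `random_mask`, the 16-patch image, and the 0.75 mask ratio are all assumptions chosen for the example.

```python
import random


def random_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) sets.

    Hypothetical helper: the encoder would see only the visible patches,
    and the decoder would be trained to reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


# Assumed setup: an image split into 16 patches, 75% of them masked.
visible, masked = random_mask(16, mask_ratio=0.75, seed=0)
print("visible patches:", visible)
print("masked patches:", masked)
```

With these assumed values, 4 patches stay visible and 12 are masked; the two index lists together always cover every patch exactly once.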

2. Results

We save the training loss curves in the checkpoint. To visualize the losses, load them from the checkpoint in draw.py (only the list of losses is needed), modifying paths if necessary. We also record the test accuracy at the last epoch, but we report the best of the last 10 epochs.
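The "best of the last 10 epochs" rule can be sketched as below, assuming a plain Python list of per-epoch test accuracies is available. The helper name `best_of_last` and the accuracy values are hypothetical and are not taken from the repository or its results.

```python
def best_of_last(accuracies, k=10):
    """Return (best_accuracy, epoch_index) over the final k entries."""
    if not accuracies:
        raise ValueError("accuracies must be non-empty")
    start = max(0, len(accuracies) - k)  # first epoch inside the window
    window = accuracies[start:]
    best = max(window)
    return best, start + window.index(best)


# Hypothetical per-epoch test accuracies, for illustration only.
example = [0.10, 0.22, 0.31, 0.35, 0.40, 0.42,
           0.41, 0.45, 0.44, 0.47, 0.46, 0.48]
best, epoch = best_of_last(example, k=10)
print(f"best of last 10 epochs: {best:.2f} (epoch index {epoch})")
```

If fewer than k epochs were recorded, the sketch simply takes the best over all of them.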

3. Other files

Models are defined in models.py, datasets in datasets.py, the KNN test in knntest.py, and the initialization of optimizers in utils.py.
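As a rough picture of what a KNN test on learned features involves, the sketch below classifies each test feature by majority vote among its k most similar training features. It is a standard-library illustration under stated assumptions, not the contents of knntest.py: the function names, the use of cosine similarity, k=3, and the toy 2-D features are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train_feats, train_labels, query, k=3):
    """Majority vote among the k training features most similar to query."""
    ranked = sorted(range(len(train_feats)),
                    key=lambda i: cosine(train_feats[i], query),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    """Fraction of test features whose KNN prediction matches the label."""
    correct = sum(knn_predict(train_feats, train_labels, q, k) == y
                  for q, y in zip(test_feats, test_labels))
    return correct / len(test_labels)


# Toy features standing in for encoder outputs (assumed, not real data).
train_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [0, 0, 1, 1]
test_x = [[0.8, 0.2], [0.2, 0.8]]
test_y = [0, 1]
print("KNN accuracy:", knn_accuracy(train_x, train_y, test_x, test_y, k=3))
```

In practice the features would come from the pretrained encoder, and a vectorized implementation would be preferable for datasets of realistic size.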

4. Log files

We saved the last epoch for each experiment (12 checkpoints in total), most of which are large. If you need a checkpoint for testing, please let us know. Thank you!

Owner

Haoran Tang

UPenn Robotics
