Minimalist BERT implementation assignment for CS11-747

Overview

minbert Assignment

by Zhengbao Jiang, Shuyan Zhou, and Ritam Dutt

This is an exercise in developing a minimalist version of BERT, part of Carnegie Mellon University's CS11-747: Neural Networks for NLP.

In this assignment, you will implement some important components of the BERT model to better understanding its architecture. You will then perform sentence classification on sst dataset and cfimdb dataset with the BERT model.

Assignment Details

Important Notes

  • Follow setup.sh to properly setup the environment and install dependencies.
  • There is a detailed description of the code structure in structure.md, including a description of which parts you will need to implement.
  • You are only allowed to use torch, no other external libraries are allowed (e.g., transformers).
  • We will run your code with the following commands, so make sure that whatever your best results are reproducible using these commands (where you replace ANDREWID with your andrew ID):
mkdir -p ANDREWID

python3 classifier.py --option [pretrain/finetune] --epochs NUM_EPOCHS --lr_pretrain LR_FOR_PRETRAINING --lr_finetune LR_FOR_FINETUNING --seed RANDOM_SEED

Reference accuracies:

Mean reference accuracies over 10 random seeds with their standard deviation shown in brackets.

Pretraining for SST: Dev Accuracy : 0.387 (0.008) Test Accuracy : 0.397 (0.013)

Finetuning for SST : Dev Accuracy : 0.520 (0.006) Test Accuracy : 0.525 (0.007)

Submission

The submission file should be a zip file with the following structure (assuming the andrew id is ANDREWID):

ANDREWID/
├── base_bert.py
├── bert.py
├── classifier.py
├── config.py
├── optimizer.py
├── sanity_check.py
├── tokenizer.py
├── utils.py
├── README.md
├── structure.md
├── sanity_check.data
├── sst-dev-output.txt 
├── sst-test-output.txt 
├── cfimdb-dev-output.txt 
├── cfimdb-test-output.txt 
└── setup.py

Grading

  • A+: You additionally implement something else on top of the requirements for A, and achieve significant accuracy improvements:
  • A: You implement all the missing pieces and the original classifier.py with --option finetune code that achieves comparable accuracy to our reference implementation
  • A-: You implement all the missing pieces and the original classifier.py with --option pretrain code that achieves comparable accuracy to our reference implementation
  • B+: All missing pieces are implemented and pass tests in sanity_check.py, but accuracy is not comparable to the reference.
  • B or below: Some parts of the missing pieces are not implemented.

Acknowledgement

Parts of the code are from the transformers library (Apache License 2.0).

Owner
Graham Neubig
Graham Neubig
Metal Gear Rising: Revengeance's DAT archive (un)packer

DOOMP Metal Gear Rising: Revengeance's DAT archive (un)packer

Christopher Holzmann Pérez 5 Sep 02, 2022
Have an idea for a Python package? Register the name on PyPI 💡

Register Package Names on PyPI Have an idea for a Python package? Thought of a great name? Register it on PyPI, before someone else does! A tool that

Alex Ioannides 1 Jul 15, 2022
A toy repo illustrating a minimal installable Python package

MyToy: a minimal Python package This repository contains a minimal, toy Python package with a few files as illustration for students of how to lay out

Fernando Perez 19 Apr 24, 2022
CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, written in Python.

CaskDB - Disk based Log Structured Hash Table Store CaskDB is a disk-based, embedded, persistent, key-value store based on the Riak's bitcask paper, w

886 Dec 27, 2022
Scitizen - Help scientific research for the benefit of mankind and humanity 🔬

Scitizen - Help scientific research for the benefit of mankind and humanity 🔬 Scitizen has been built from the ground up to give everyone the possibi

Pierre CORBEL 21 Mar 08, 2022
A tool for fixing inconsistent timestamp metadata (atime, ctime, and mtime).

Mtime Fixer Mtime Fixer is a tool for fixing inconsistent timestamp metadata (atime, ctime, and mtime). Sometimes timestamp metadata of folders are in

Halit Şimşek 2 Jan 11, 2022
Width-customizer-for-streamlit-apps - Width customizer for Streamlit Apps

🎈 Width customizer for Streamlit Apps As of now, you can only change your Strea

Charly Wargnier 5 Aug 09, 2022
Stori QA Automation Challenge

Stori-QA-Automation-Challenge This is the repository is created for the Stori QA Intern Automation Engineer Challenge! In this you can find the Requir

Daniel Castañeda 0 Feb 20, 2022
This is a modified variation of abhiTronix's vidgear. In this variation, it is possible to write the output file anywhere regardless the permissions.

Info In order to download this package: Windows 10: Press Windows+S, Type PowerShell (cmd in older versions) and hit enter, Type pip install vidgear_n

Ege Akman 3 Jan 30, 2022
Simulation simplifiée du fonctionnement du protocole RIP

ProjetRIPlay v2 Simulation simplifiée du fonctionnement du protocole RIP par Eric Buonocore le 18/01/2022 Sur la base de l'exercice 5 du sujet zéro du

Eric Buonocore 2 Feb 15, 2022
Fortnite StW Claimer for Daily Rewards, Research Points and free Llamas.

Fortnite Save the World Daily Reward, Research Points & free Llama Claimer This program allows you to claim Save the World Daily Reward, Research Poin

PRO100KatYT 27 Dec 22, 2022
Plugin to generate BOM + CPL files for JLCPCB

KiCAD JLCPCB tools Plugin to generate all files necessary for JLCPCB board fabrication and assembly Gerber files Excellon files BOM file CPL file Furt

bouni 566 Dec 29, 2022
Chemical Analysis Calculator, with full solution display.

Chemicology Chemical Analysis Calculator, to solve problems efficiently by displaying whole solution. Go to releases for downloading .exe, .dmg, Linux

Muhammad Moazzam 2 Aug 06, 2022
Contains a Jupyter Notebook for calculating remaining plants required based on field/lathhouse data.

Davis-Sunflowers-Su21 Project goals: Plants influence their reproduction and mating system in many ways. Various factors such as time of flowering, ab

1 Feb 10, 2022
My solutions for Advent of Code 2021 🌟🎄

🌟 Advent of Code 2021 🎄 My solutions for Advent of Code 2021. About · What is Advent of Code? · Contents · Usage · Table of puzzles (TODO: add final

Amanda P. Pinha 2 Dec 05, 2022
Python script which allows for automatic registration in Golfbox

Python script which allows for automatic registration in Golfbox

Guðni Þór Björnsson 8 Dec 04, 2021
Cross-platform MachO/ObjC Static binary analysis tool & library. class-dump + otool + lipo + more

ktool Static Mach-O binary metadata analysis tool / information dumper pip3 install k2l Development is currently taking place on the @python3.10 branc

Kritanta 301 Dec 28, 2022
App to get data from popular polish pages with job offers

Job board parser I written simple app to get me data from popular pages with job offers, because I wanted to knew immidietly if there is some new offe

0 Jan 04, 2022
A desktop app to check the unlocked courses bases on previously done courses.

Course Picker A desktop app to check the unlocked courses bases on previously done courses. Table of contents About the Project Built with What it doe

Ahmed Symum Swapno 3 Feb 07, 2022
It was created to conveniently respond to events such as donation, follow, and hosting using the Alert Box provided by twip to streamers

This library is not an official library of twip. It was created to conveniently respond to events such as donation, follow, and hosting using the Alert Box provided by twip to streamers.

junah201 8 Nov 19, 2022