This is the 2nd-place solution to the Kaggle competition Feedback Prize - Evaluating Student Writing.

Overview

Feedback Prize - Evaluating Student Writing

The competition can be found here: https://www.kaggle.com/competitions/feedback-prize-2021/

Datasets required

Use this command to convert roberta-large to an LSG model:

$ python convert_roberta_checkpoint.py \
                        --initial_model roberta-large \
                        --model_name lsg-roberta-large \
                        --max_sequence_length 1536

Follow these instructions to manually add the fast tokenizer to the transformers library:

# The following is necessary if you want to use the fast tokenizer for deberta v2 or v3
# This must be done before importing transformers
import shutil
from pathlib import Path

# Path to the installed transformers library
transformers_path = Path("/opt/conda/lib/python3.7/site-packages/transformers")

input_dir = Path("../input/deberta-v2-3-fast-tokenizer")

convert_file = input_dir / "convert_slow_tokenizer.py"
conversion_path = transformers_path/convert_file.name

if conversion_path.exists():
    conversion_path.unlink()

shutil.copy(convert_file, transformers_path)
deberta_v2_path = transformers_path / "models" / "deberta_v2"

for filename in ['tokenization_deberta_v2.py', 'tokenization_deberta_v2_fast.py']:
    filepath = deberta_v2_path/filename
    if filepath.exists():
        filepath.unlink()

    shutil.copy(input_dir/filename, filepath)
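
The site-packages path hard-coded above matches the Kaggle Python 3.7 image and will differ on other machines. As a hedged sketch (not part of the original solution), the install directory could instead be looked up without importing the package. The helper name find_package_dir is hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an installed top-level package, or None.

    Hypothetical helper, not from the original repository. For a top-level
    name, importlib.util.find_spec locates the package without importing it,
    which matters here because the files must be patched before
    `import transformers` runs.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package.
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    # Prints None if transformers is not installed in this environment.
    print(find_package_dir("transformers"))
```

The returned path could then stand in for transformers_path in the snippet above, assuming the package is installed as a regular directory.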

After this, the ../input directory should look something like this:

.
├── input
│   ├── feedback-prize-2021
│   │   ├── train/
│   │   ├── test/
│   │   ├── sample_submission.csv
│   │   └── train.csv
│   ├── lsg-roberta-large
│   │   ├── config.json
│   │   ├── merges.txt
│   │   ├── modeling.py
│   │   ├── pytorch_model.bin
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── deberta-v2-3-fast-tokenizer
│   │   ├── convert_slow_tokenizer.py
│   │   ├── deberta__init__.py
│   │   ├── tokenization_auto.py
│   │   ├── tokenization_deberta_v2.py
│   │   ├── tokenization_deberta_v2_fast.py
│   │   └── transformers__init__.py
│   └── feedbackgroupshufflesplit1337
│       └── groupshufflesplit_1337.p

Alternatively, change DATA_BASE_DIR in SETTINGS.json to download the files to a location of your choice.
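
Before training, it can help to confirm that the inputs are where the scripts expect them. The following is a minimal sketch, not code from the repository. It assumes SETTINGS.json is a flat JSON object whose DATA_BASE_DIR value points at the input directory itself; the function names and the choice of which files to check are illustrative.

```python
import json
from pathlib import Path
from typing import List

# A subset of the paths from the directory tree above,
# relative to the input directory.
EXPECTED_PATHS = [
    "feedback-prize-2021/train",
    "feedback-prize-2021/test",
    "feedback-prize-2021/train.csv",
    "lsg-roberta-large/config.json",
    "lsg-roberta-large/pytorch_model.bin",
    "deberta-v2-3-fast-tokenizer/convert_slow_tokenizer.py",
    "deberta-v2-3-fast-tokenizer/tokenization_deberta_v2.py",
    "deberta-v2-3-fast-tokenizer/tokenization_deberta_v2_fast.py",
    "feedbackgroupshufflesplit1337/groupshufflesplit_1337.p",
]


def input_dir_from_settings(settings_file: Path, default: str = "../input") -> Path:
    """Read DATA_BASE_DIR from SETTINGS.json (assumed to be a flat JSON object)."""
    settings = json.loads(settings_file.read_text())
    return Path(settings.get("DATA_BASE_DIR", default))


def missing_inputs(input_dir: Path) -> List[str]:
    """Return the expected relative paths that do not exist under input_dir."""
    return [rel for rel in EXPECTED_PATHS if not (input_dir / rel).exists()]


if __name__ == "__main__":
    settings_path = Path("SETTINGS.json")
    base = input_dir_from_settings(settings_path) if settings_path.exists() else Path("../input")
    for rel in missing_inputs(base):
        print(f"missing: {base / rel}")
```

If DATA_BASE_DIR turns out to name the parent of the input directory instead, the paths in EXPECTED_PATHS would need an "input/" prefix.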

Models and Training

  • DeBERTa large, DeBERTa xlarge, DeBERTa v2 xlarge, DeBERTa v3 large, Funnel Transformer large and BigBird are trained using trainer.py

Example:

$ python trainer.py --fold 0 --pretrained_model google/bigbird-roberta-large

Here, --pretrained_model can be microsoft/deberta-large, microsoft/deberta-xlarge, microsoft/deberta-v2-xlarge, microsoft/deberta-v3-large, funnel-transformer/large or google/bigbird-roberta-large.

  • DeBERTa large with an LSTM head and Jaccard loss is trained using debertabilstm_trainer.py

Example:

$ python debertabilstm_trainer.py --fold 0

  • Longformer large with an LSTM head is trained using longformerwithbilstm_trainer.py

Example:

$ python longformerwithbilstm_trainer.py --fold 0

  • LSG RoBERTa is trained using lsgroberta_trainer.py

Example:

$ python lsgroberta_trainer.py --fold 0

  • YOSO is trained using yoso_trainer.py

Example:

$ python yoso_trainer.py --fold 0
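
The commands above can also be generated programmatically for a given fold. This is a sketch under stated assumptions, not a script from the repository: the script names, flags and model identifiers come from the examples above, while build_commands is a hypothetical helper. The number of folds is not stated here, so the fold index is left to the caller.

```python
from typing import List

# Model identifiers accepted by trainer.py, as listed above.
BACKBONES = [
    "microsoft/deberta-large",
    "microsoft/deberta-xlarge",
    "microsoft/deberta-v2-xlarge",
    "microsoft/deberta-v3-large",
    "funnel-transformer/large",
    "google/bigbird-roberta-large",
]

# Trainer scripts that take only a --fold argument in the examples above.
DEDICATED_TRAINERS = [
    "debertabilstm_trainer.py",
    "longformerwithbilstm_trainer.py",
    "lsgroberta_trainer.py",
    "yoso_trainer.py",
]


def build_commands(fold: int, python: str = "python") -> List[List[str]]:
    """Build one argument list per model for the given fold (hypothetical helper)."""
    commands = [
        [python, "trainer.py", "--fold", str(fold), "--pretrained_model", name]
        for name in BACKBONES
    ]
    commands += [[python, script, "--fold", str(fold)] for script in DEDICATED_TRAINERS]
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed.
    for cmd in build_commands(fold=0):
        print(" ".join(cmd))
```

Each argument list could be passed to subprocess.run to launch training, one model at a time.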

Inference

After training all the models, the outputs were pushed to Kaggle Datasets.

The final inference kernel can be found here: https://www.kaggle.com/code/cdeotte/2nd-place-solution-cv741-public727-private740?scriptVersionId=90301836

Solution writeup: https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389

Owner
Udbhav Bamba