PyTorch reimplementation of REALM and ORQA

Last update: Aug 20, 2022

Overview

This is a PyTorch reimplementation of REALM (paper, codebase) and ORQA (paper, codebase).

Some features have not been implemented yet; currently only the predictor and the finetuning script are available.

The terms retriever and searcher in the code are essentially interchangeable. The only difference is that the retriever is used for REALM pretraining, while the searcher is used for ORQA finetuning.
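For readers new to these models, both the retriever and the searcher select documents by maximum inner product search between a question embedding and precomputed document embeddings. The following is a minimal brute-force sketch of that idea in plain Python. The function name `top_k_by_inner_product` and the toy vectors are illustrative assumptions and are not taken from this codebase, which relies on a dedicated search library for this step.

```python
import heapq


def top_k_by_inner_product(query, doc_embeddings, k=2):
    """Return the k (score, index) pairs with the largest inner product.

    Hypothetical helper for illustration only; not part of this repository.
    """
    scored = (
        (sum(q * d for q, d in zip(query, doc)), idx)
        for idx, doc in enumerate(doc_embeddings)
    )
    # nlargest sorts by score first, so the best-matching document comes first.
    return heapq.nlargest(k, scored)


# Toy 2-dimensional embeddings (made-up values).
docs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query = [0.9, 0.1]

results = top_k_by_inner_product(query, docs, k=2)
for score, idx in results:
    print(f"doc {idx}: score {score:.2f}")
```

A brute-force scan like this is linear in the number of documents; an approximate search index trades a little accuracy for much faster lookups on large corpora.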

Prerequisite

cd transformers && pip install -U -e ".[dev]"
pip install -U scann apache_beam
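After installing, it can be useful to confirm that the dependencies are importable before downloading data or starting a run. The check below is a small sketch, assuming the import names `transformers`, `scann`, and `apache_beam` match the packages installed above; the helper `missing_modules` is hypothetical and not part of this repository.

```python
import importlib.util

# Import names assumed to correspond to the packages in the install commands.
REQUIRED_MODULES = ("transformers", "scann", "apache_beam")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```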

Data

To download pretrained checkpoints and preprocessed data, please follow the instructions below:

cd data
pip install -U -r requirements.txt
sh download.sh

Finetune (Experimental)

The default finetuning dataset is Natural Questions (NQ). To load a custom dataset, change the loading function in data.py.
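As a rough illustration of what a custom loading function might look like, the sketch below parses JSON Lines records into question/answer examples. The function name `load_custom_dataset` and the `question` and `answers` field names are assumptions made for this example; the loading function in data.py may expect a different signature and format, so adapt the sketch to match it.

```python
import json


def load_custom_dataset(lines):
    """Parse JSON Lines records into dicts with 'question' and 'answers' keys.

    Hypothetical loader for illustration; field names are assumed, not
    taken from data.py.
    """
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        answers = record["answers"]
        if isinstance(answers, str):
            answers = [answers]  # accept a single answer string as well
        examples.append({"question": record["question"], "answers": list(answers)})
    return examples


# In practice `lines` could be an open file handle; a list of strings is used here.
sample = [
    '{"question": "Who is the pioneer in modern computer science?", "answers": ["alan mathison turing"]}',
    "",
    '{"question": "What does NQ stand for?", "answers": "natural questions"}',
]
examples = load_custom_dataset(sample)
print(len(examples), "examples loaded")
```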

Training:

python run_finetune.py --is_train \
    --model_dir "./" \
    --num_epochs 2 \
    --device cuda

Evaluation:

python run_finetune.py \
    --retriever_pretrained_name "retriever" \
    --checkpoint_pretrained_name "reader" \
    --model_dir "./" \
    --device cuda

Predict

The default checkpoints for the retriever and reader are orqa_nq_model_from_realm. To use different checkpoints, specify --retriever_path and --checkpoint_path.

python predictor.py --question "Who is the pioneer in modern computer science?"

Output: alan mathison turing

License

Apache License 2.0

Owner

Li-Huai (Allan) Lin
