NaturalProofs: Mathematical Theorem Proving in Natural Language

Overview

NaturalProofs: Mathematical Theorem Proving in Natural Language

NaturalProofs: Mathematical Theorem Proving in Natural Language
Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho

This repo contains:

  • The NaturalProofs Dataset and the mathematical reference retrieval task data.
  • Preprocessing NaturalProofs and the retrieval task data.
  • Training and evaluation for mathematical reference retrieval.
  • Pretrained models for mathematical reference retrieval.

Please cite our work if you found the resources in this repository useful:

@article{welleck2021naturalproofs,
  title={NaturalProofs: Mathematical Theorem Proving in Natural Language},
  author={Welleck, Sean and Liu, Jiacheng and Le Bras, Ronan and Hajishirzi, Hannaneh and Choi, Yejin and Cho, Kyunghyun},
  year={2021}
}
Section Subsection
NaturalProofs Dataset Dataset
Preprocessing
Mathematical Reference Retrieval Dataset
Setup
Preprocessing
Pretrained models
Training
Evaluation

NaturalProofs Dataset

We provide the preprocessed NaturalProofs Dataset (JSON):

NaturalProofs Dataset
dataset.json [zenodo]

Preprocessing

To see the steps used to create the NaturalProofs dataset.json from raw ProofWiki data:

  1. Download the ProofWiki XML.
  2. Preprocess the data using notebooks/parse_proofwiki.ipynb.
  3. Form the data splits using notebooks/dataset_splits.ipynb.

Mathematical Reference Retrieval

Dataset

The Mathematical Reference Retrieval dataset contains (x, r, y) examples with theorem statements x, positive and negative references r, and 0/1 labels y, derived from NaturalProofs.

We provide the version used in the paper (bert-based-cased tokenizer, 200 randomly sampled negatives):

Reference Retrieval Dataset
bert-base-cased 200 negatives

Pretrained Models

Pretrained models
bert-base-cased
lstm

These models were trained with the "bert-base-cased 200 negatives" dataset provided above.

Setup

python setup.py develop

You can see the DockerFile for additional version info, etc.

Generating and tokenizing

To create your own version of the retrieval dataset, use python utils.py.

This step is not needed if you are using the reference retrieval dataset provided above.

Example:

python utils.py --filepath /path/to/dataset.json --output-path /path/to/out/ --model-type bert-base-cased
# => Writing dataset to /path/to/out/dataset_tokenized__bert-base-cased_200.pkl

Evaluation

Using the retrieval dataset and a model provided above, we compute the test evaluation metrics in the paper:

  1. Predict the rankings:
python naturalproofs/predict.py \
--method bert-base-cased \      # | lstm
--model-type bert-base-cased \  # | lstm
--datapath /path/to/dataset_tokenized__bert-base-cased_200.pkl \
--datapath-base /path/to/dataset.json \
--checkpoint-path /path/to/best.ckpt \
--output-dir /path/to/out/ \
--split test  # use valid during model development
  1. Compute metrics over the rankings:
python naturalproofs/analyze.py \
--method bert-base-cased \      # | lstm
--eval-path /path/to/out/eval.pkl \
--analysis-path /path/to/out/analysis.pkl

Training

python naturalproofs/model.py \
--datapath /path/to/dataset_tokenized__bert-base-cased_200.pkl \
--default-root-dir /path/to/out/

Classical Retrieval Baselines

TF-IDF example:

python naturalproofs/baselines.py \
--method tfidf \
--datapath /path/to/dataset_tokenized__bert-base-cased_200.pkl \
--datapath-base /path/to/dataset.json \
--savedir /path/to/out/

Then use analyze.py as shown above to compute metrics.

Owner
Sean Welleck
Sean Welleck
A PyTorch Image-Classification With AlexNet And ResNet50.

PyTorch 图像分类 依赖库的下载与安装 在终端中执行 pip install -r -requirements.txt 完成项目依赖库的安装 使用方式 数据集的准备 STL10 数据集 下载:STL-10 Dataset 存储位置:将下载后的数据集中 train_X.bin,train_y.b

FYH 4 Feb 22, 2022
Official repository for the paper F, B, Alpha Matting

FBA Matting Official repository for the paper F, B, Alpha Matting. This paper and project is under heavy revision for peer reviewed publication, and s

Marco Forte 404 Jan 05, 2023
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 34 Dec 23, 2022
The Environment I built to study Reinforcement Learning + Pokemon Showdown

pokemon-showdown-rl-environment The Environment I built to study Reinforcement Learning + Pokemon Showdown Been a while since I ran this. Think it is

3 Jan 16, 2022
Deep Reinforced Attention Regression for Partial Sketch Based Image Retrieval.

DARP-SBIR Intro This repository contains the source code implementation for ICDM submission paper Deep Reinforced Attention Regression for Partial Ske

2 Jan 09, 2022
A pytorch implementation of Paper "Improved Training of Wasserstein GANs"

WGAN-GP An pytorch implementation of Paper "Improved Training of Wasserstein GANs". Prerequisites Python, NumPy, SciPy, Matplotlib A recent NVIDIA GPU

Marvin Cao 1.4k Dec 14, 2022
You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors In this paper, we propose a novel local descriptor-based fra

Haiping Wang 80 Dec 15, 2022
QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.

QuakeLabeler Quake Labeler was born from the need for seismologists and developers who are not AI specialists to easily, quickly, and independently bu

Hao Mai 15 Nov 04, 2022
The VeriNet toolkit for verification of neural networks

VeriNet The VeriNet toolkit is a state-of-the-art sound and complete symbolic interval propagation based toolkit for verification of neural networks.

9 Dec 21, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

Step by Step on how to create an vision recognition model using LOBE.ai, export the model and run the model in an Azure Function

El Bruno 3 Mar 30, 2022
Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set

Explaining Deep Neural Networks - A comparison of different CAM methods based on an insect data set This is the repository for the Deep Learning proje

Robert Krug 3 Feb 06, 2022
Code repository for "Stable View Synthesis".

Stable View Synthesis Code repository for "Stable View Synthesis". Setup Install the following Python packages in your Python environment - numpy (1.1

Intelligent Systems Lab Org 195 Dec 24, 2022
Implementation of the final project of the course DDA6309 Probabilistic Graphical Model

Task-aware Joint CWS and POS (TCwsPos) This is the implementation of the final project of the course DDA6309 Probabilistic Graphical Models, The Chine

Peng 1 Dec 26, 2021
Code for "R-GCN: The R Could Stand for Random"

RR-GCN: Random Relational Graph Convolutional Networks PyTorch Geometric code for the paper "R-GCN: The R Could Stand for Random" RR-GCN is an extensi

PreDiCT.IDLab 31 Sep 07, 2022
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.

Gengshan Yang 59 Nov 25, 2022
Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration

This repo is for the paper: Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration The DAC environment is based on the Dynam

Carola Doerr 1 Aug 19, 2022
Examples of using f2py to get high-speed Fortran integrated with Python easily

f2py Examples Simple examples of using f2py to get high-speed Fortran integrated with Python easily. These examples are also useful to troubleshoot pr

Michael 35 Aug 21, 2022
Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation

Tiny-NewsRec The source codes for our paper "Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation". Requirements PyTorch == 1.6.0 Tensor

Yang Yu 3 Dec 07, 2022
Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow

xRBM Library Implementation of Restricted Boltzmann Machine (RBM) and its variants in Tensorflow Installation Using pip: pip install xrbm Examples Tut

Omid Alemi 55 Dec 29, 2022