Paraphrastic Representations at Scale

Code to train models from "Paraphrastic Representations at Scale".

The code is written in Python 3.7 and requires the h5py, jieba, numpy, scipy, sentencepiece, sacremoses, and PyTorch >= 1.0 libraries. These can be installed with the following command:

pip install -r requirements.txt
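
After installation, it can help to confirm that every dependency is importable before running the scripts. The snippet below is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and it assumes PyTorch is imported under its usual module name `torch`.

```python
import importlib.util

# Import names of the libraries listed above (PyTorch is imported as "torch").
REQUIRED = ["h5py", "jieba", "numpy", "scipy", "sentencepiece", "sacremoses", "torch"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the modules can be located; it does not verify version constraints such as PyTorch >= 1.0.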

To get started, download the data files used for training from http://www.cs.cmu.edu/~jwieting and download the STS evaluation data:

wget http://phontron.com/data/paraphrase-at-scale.zip
unzip paraphrase-at-scale.zip
rm paraphrase-at-scale.zip
wget http://www.cs.cmu.edu/~jwieting/STS.zip
unzip STS.zip
rm STS.zip

If you use our code, models, or data for your work, please cite:

@article{wieting2021paraphrastic,
    title={Paraphrastic Representations at Scale},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    journal={arXiv preprint arXiv:2104.15114},
    year={2021}
}

@inproceedings{wieting19simple,
    title={Simple and Effective Paraphrastic Similarity from Parallel Translations},
    author={Wieting, John and Gimpel, Kevin and Neubig, Graham and Berg-Kirkpatrick, Taylor},
    booktitle={Proceedings of the Association for Computational Linguistics},
    url={https://arxiv.org/abs/1909.13872},
    year={2019}
}

To embed a list of sentences:

python -u embed_sentences.py --sentence-file paraphrase-at-scale/example-sentences.txt --load-file paraphrase-at-scale/model.para.lc.100.pt --sp-model paraphrase-at-scale/paranmt.model --output-file sentence_embeds.np --gpu 0
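
Once sentences are embedded, a common way to compare them is cosine similarity. The sketch below is illustrative only: it uses a small made-up matrix in place of real output and assumes embeddings are arranged as one row per sentence. The on-disk format of the `--output-file` is not documented here, so loading it (for example with `numpy.load`) is an assumption to check against `embed_sentences.py`. The function name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeds, eps=1e-8):
    """Return the matrix of pairwise cosine similarities between rows."""
    embeds = np.asarray(embeds, dtype=np.float64)
    norms = np.linalg.norm(embeds, axis=1, keepdims=True)
    unit = embeds / np.maximum(norms, eps)  # guard against zero vectors
    return unit @ unit.T


# Toy stand-in for real sentence embeddings (3 sentences, 4 dimensions).
toy = np.array([[1.0, 0.0, 0.0, 0.0],
                [2.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]])
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 3))
```

Entry `sims[i, j]` is the similarity between sentence `i` and sentence `j`; rows pointing in the same direction score close to 1 regardless of their length.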

To score a list of sentence pairs:

python -u score_sentence_pairs.py --sentence-pair-file paraphrase-at-scale/example-sentences-pairs.txt --load-file paraphrase-at-scale/model.para.lc.100.pt --sp-model paraphrase-at-scale/paranmt.model --gpu 0

To train a model (for example, on ParaNMT):

python -u main.py --outfile model.para.out --lower-case 1 --tokenize 0 --data-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.h5 \
       --model avg --dim 1024 --epochs 25 --dropout 0.0 --sp-model paraphrase-at-scale/paranmt.model --megabatch-size 100 --save-every-epoch 1 --gpu 0 --vocab-file paraphrase-at-scale/paranmt.sim-low=0.4-sim-high=1.0-ovl=0.7.final.vocab
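
The `--megabatch-size` flag refers to megabatching, which in the authors' related work means pooling several mini-batches and choosing each example's negative from the larger pool. The following NumPy sketch illustrates that idea under stated assumptions. It is not the code in `main.py`: the function names are hypothetical, the margin value 0.4 is an arbitrary placeholder, and the inputs are random stand-ins for sentence embeddings.

```python
import numpy as np


def unit_rows(x, eps=1e-8):
    """Scale each row to unit length (zero rows are left near zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def hardest_negatives(src, tgt):
    """For each source row i, return the index j != i of the most similar target row."""
    sims = unit_rows(src) @ unit_rows(tgt).T
    np.fill_diagonal(sims, -np.inf)  # exclude the true paraphrase at (i, i)
    return sims.argmax(axis=1)


def margin_loss(src, tgt, margin=0.4):
    """Mean hinge loss: max(0, margin - cos(s, t) + cos(s, t_neg))."""
    neg_idx = hardest_negatives(src, tgt)
    s, t = unit_rows(src), unit_rows(tgt)
    pos = np.sum(s * t, axis=1)           # similarity to the paired sentence
    neg = np.sum(s * t[neg_idx], axis=1)  # similarity to the chosen negative
    return np.maximum(0.0, margin - pos + neg).mean()


# Random stand-ins: row i of `src` is paired with row i of `tgt`.
rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))
tgt = rng.normal(size=(5, 8))
print(hardest_negatives(src, tgt), margin_loss(src, tgt))
```

A larger pool gives each example more candidates to choose a negative from, which is the motivation for making the pool size configurable.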

To download and preprocess raw data for training models (both bilingual and ParaNMT), see preprocess/bilingual and preprocess/paranmt.
