Overview

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories

This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

1. Install the Python environment.

Follow the instructions in "env_install.txt" to create a Python virtual environment and install the necessary packages. The environment has been tested with Python >= 3.6 and PyTorch >= 1.8.
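
"env_install.txt" remains the authoritative reference for the exact packages; as a minimal sketch of a standard venv workflow (the environment name semeq_env is arbitrary):

python3 -m venv semeq_env
source semeq_env/bin/activate
pip install "torch>=1.8"    # plus the remaining packages listed in env_install.txt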

2. Gloss alignment algorithm.

Convert your dictionary data into the format of "wordnet_def.txt" in "data/", then run the following commands to obtain the gloss alignment results.

cd run_align_definitions_main/
python ../model/align_definitions_main.py

3. Download the pretrained model and data.

Visit https://drive.google.com/drive/folders/1I5-iOfWr1E32ahYDCbHKCssMdm74_JXG?usp=sharing. Download the pretrained model (SemEq-General-Large, which is based on RoBERTa-Large) and put a copy under both run_robertaLarge_model_span_WSD_twoStageTune/ and run_robertaLarge_model_span_FEWS_twoStageTune/. Make sure the downloaded model file is named "pretrained_model_CrossEntropy.pt". The script will load this general model and fine-tune it on the specific WSD datasets to obtain the expert model.
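
For example, assuming the model file was downloaded into the repository root, placing it only requires copying it into both fine-tuning directories (keeping the exact file name):

cp pretrained_model_CrossEntropy.pt run_robertaLarge_model_span_WSD_twoStageTune/
cp pretrained_model_CrossEntropy.pt run_robertaLarge_model_span_FEWS_twoStageTune/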

4. Fine-tune the general model to get an expert model (SemEq-Expert-Large).

All-words WSD:

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset WSD --exp_mode twoStageTune --optimizer AdamW --learning_rate 2e-6 --bert_model roberta_large --batch_size 16

Few-shot WSD (FEWS):

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset FEWS --exp_mode twoStageTune --optimizer AdamW --learning_rate 5e-6 --bert_model roberta_large --batch_size 16

5. Evaluate results.

All-words WSD: (you can try different epochs)

cd run_robertaLarge_model_span_WSD_twoStageTune/
python ../evaluate/evaluate_WSD.py --loss CrossEntropy --epoch 1
python ../evaluate/evaluate_WSD_POS.py

Few-shot WSD (FEWS): (you can try different epochs)

cd run_robertaLarge_model_span_FEWS_twoStageTune/
python ../evaluate/evaluate_FEWS.py --loss CrossEntropy --epoch 1

Note that the best test-set results for the few-shot and zero-shot settings are each selected based on the dev set across epochs.
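
As a rough illustration of sweeping epochs (the range 1-5 below is only an assumption; use whichever checkpoints your training run produced), you could run:

cd run_robertaLarge_model_span_WSD_twoStageTune/
for e in 1 2 3 4 5; do
    python ../evaluate/evaluate_WSD.py --loss CrossEntropy --epoch $e
done
# same pattern for FEWS, using ../evaluate/evaluate_FEWS.py inside run_robertaLarge_model_span_FEWS_twoStageTune/

and then pick the epoch with the best dev-set score.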

Extra. Apply the trained model to perform WSD on any given sentences.

After training, you can apply the trained model (trained_model_CrossEntropy.pt) to any sentences. Examples, based on WordNet 3.0 glosses, are included in data_custom/.

cd run_BERT_model_span_CustomData/
python ../BERT_model_span/BERT_model_main.py --gpu_id 0 --prepare_data True --eval_dataset custom_data --exp_mode eval --bert_model roberta_large --batch_size 16

If you find this repo useful, please cite our work. Thanks!

@inproceedings{yao-etal-2021-connect,
    title = "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories",
    author = "Yao, Wenlin  and
      Pan, Xiaoman  and
      Jin, Lifeng  and
      Chen, Jianshu  and
      Yu, Dian  and
      Yu, Dong",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.610",
    pages = "7741--7751",
}

Disclaimer: This repo is for research purposes only. It is not an officially supported Tencent product.
