Automatic Idiomatic Expression Detection

Last update: Jun 09, 2022


Overview

IDentifier of Idiomatic Expressions via Semantic Compatibility (DISC)

An idiomatic expression identifier that detects the presence and span of an idiomatic expression in a given sentence.

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Checkpoint
Usage
- Configuration
- Demo
- Data Processing
- Training and Testing
License
Contact
Acknowledgements

About The Project

This project implements a supervised idiomatic expression identification method. Given a sentence that contains a potentially idiomatic expression (PIE), the model identifies the span of the PIE if it is used in an idiomatic sense; otherwise, the model does not identify the PIE. The identification is done by checking semantic compatibility. More details (a detailed description, figures, etc.) will be added here.
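To make the span-identification output concrete, here is a minimal sketch, not the DISC implementation. It assumes the model emits one binary label per token, with 1 marking tokens that belong to an idiomatic expression; the function name extract_spans is hypothetical. An all-zero label sequence corresponds to a PIE used literally, for which no span is returned.

```python
from typing import List, Optional, Tuple


def extract_spans(labels: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, of consecutive 1-labels.

    Assumption (hypothetical output format): one binary label per token,
    1 = token is part of an idiomatic expression, 0 = anything else.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a new span begins at token i
        elif label != 1 and start is not None:
            spans.append((start, i))  # the current span ends before token i
            start = None
    if start is not None:
        spans.append((start, len(labels)))  # span runs to the end of the sentence
    return spans


tokens = ["He", "finally", "kicked", "the", "bucket", "."]
labels = [0, 0, 1, 1, 1, 0]
for s, e in extract_spans(labels):
    print(" ".join(tokens[s:e]))  # prints "kicked the bucket"
```

Returning token offsets rather than strings keeps the spans usable both for display and for comparison against gold annotations.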

The paper will appear in TACL.

Built With

This model relies heavily on the following resources/libraries:

Getting Started

The implementation includes processed data created for the MAGPIE random-split dataset. A model checkpoint trained on the MAGPIE random split is also provided.

Prerequisites

All the dependencies for this project are listed in requirements.txt. You can install them with the standard command:

pip install -r requirements.txt

It is highly recommended to start a conda environment with PyTorch properly installed for your hardware before installing the other requirements.
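Before installing the remaining requirements, a quick sanity check like the one below confirms that PyTorch is importable in the active environment and whether it can see a CUDA GPU. This is a generic sketch, not a script shipped with this repository; check_pytorch is a hypothetical name.

```python
import importlib.util


def check_pytorch() -> str:
    """Report whether PyTorch is importable and whether a CUDA GPU is visible."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch not found: install it for your hardware first"
    import torch  # imported lazily so this check also runs without PyTorch

    device = "CUDA available" if torch.cuda.is_available() else "CPU only"
    return f"PyTorch {torch.__version__} ({device})"


print(check_pytorch())
```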

Checkpoint

To run the model with a pre-trained checkpoint, first create a ./checkpoints folder at the project root. Then download the checkpoint from Google Drive via this link and put it in the ./checkpoints folder.
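The folder setup can also be scripted. The sketch below (hypothetical helper prepare_checkpoint_dir, not part of the repository) creates ./checkpoints if it is missing and lists what it contains, so you can confirm the downloaded checkpoint landed in the right place. The checkpoint's actual filename is not stated here, so the script only lists files rather than looking for a specific name.

```python
from pathlib import Path


def prepare_checkpoint_dir(root: str = ".") -> Path:
    """Create <root>/checkpoints if needed and return its path."""
    ckpt_dir = Path(root) / "checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return ckpt_dir


ckpt_dir = prepare_checkpoint_dir()
found = sorted(p.name for p in ckpt_dir.iterdir() if p.is_file())
if found:
    print("Checkpoint files:", found)
else:
    print(f"{ckpt_dir} is empty: download the checkpoint from Google Drive first")
```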

Usage

Configuration

Before running the demo or experiments (training or testing), please see config.py, which sets the configuration of the model. Some parameters, such as MODE, need to be set appropriately for the model to run correctly. Please see the comments in config.py for more details.
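Because a wrong MODE value can make a run misbehave in confusing ways, a fail-fast check at the top of your own scripts can help. This is a hypothetical sketch: the allowed values below ("train", "test", "inference") are assumptions loosely based on the script names in this README, not values taken from config.py, so replace them with the ones documented in its comments.

```python
# Hypothetical sketch: config.py may define MODE and its allowed values
# differently; check its comments before relying on this.
ALLOWED_MODES = ("train", "test", "inference")  # assumed values, not from the repo

MODE = "inference"


def validate_mode(mode: str) -> str:
    """Raise early if mode is not one of the assumed allowed values."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"MODE must be one of {ALLOWED_MODES}, got {mode!r}")
    return mode


validate_mode(MODE)
```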

Demo

To start, please go through the examples provided in demo.ipynb. There, a given input sentence is processed into model input data, and model inference is then run to extract the idiomatic expression (if present) from the sentence and visualize it.
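For a quick text-only view of a prediction outside the notebook, a helper like the one below brackets the predicted idiom in the sentence. It is a sketch that assumes the model output has already been converted to one binary label per word (1 = part of the idiom); highlight_idiom is a hypothetical name, and the visualization in demo.ipynb may look different.

```python
from typing import List


def highlight_idiom(tokens: List[str], labels: List[int]) -> str:
    """Wrap each run of idiom-labelled tokens (label 1) in square brackets."""
    out: List[str] = []
    inside = False
    for token, label in zip(tokens, labels):
        if label == 1 and not inside:
            out.append("[" + token)  # opening bracket on the first idiom token
            inside = True
        elif label == 1:
            out.append(token)
        else:
            if inside:
                out[-1] += "]"  # close the run on the previous token
                inside = False
            out.append(token)
    if inside:
        out[-1] += "]"  # the idiom runs to the end of the sentence
    return " ".join(out)


print(highlight_idiom(["She", "spilled", "the", "beans", "today"], [0, 1, 1, 1, 0]))
# She [spilled the beans] today
```

A sentence whose labels are all zero (a literally used PIE) is returned unchanged.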

Data Processing

To process a dataset (such as MAGPIE) for model training and testing, please refer to ./data_processing/MAGPIE/read_comp_data_processing.ipynb. It takes a dataset of sentences and their PIE locations as input and generates all the files needed for model training and inference.
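The core of such preprocessing is turning a sentence plus a PIE location into per-token training labels. The sketch below assumes PIE locations are (start, end) character offsets with an exclusive end, plus a flag saying whether the usage is idiomatic; both the input format and the name label_tokens are hypothetical, and the notebook's actual format and tokenizer may differ. Consistent with the project description, literal usages get all-zero labels because the model should not identify those PIEs.

```python
import re
from typing import List, Tuple


def label_tokens(
    sentence: str, pie_spans: List[Tuple[int, int]], idiomatic: bool
) -> Tuple[List[str], List[int]]:
    """Tokenize a sentence and label tokens that overlap an idiomatic PIE span.

    Assumptions (hypothetical format, not the notebook's actual input):
    - pie_spans holds (start, end) character offsets, end exclusive;
    - idiomatic is False for literal usages, which get all-zero labels.
    """
    tokens: List[str] = []
    labels: List[int] = []
    # Simple regex tokenizer: runs of word characters, or single punctuation marks.
    for match in re.finditer(r"\w+|[^\w\s]", sentence):
        tok_start, tok_end = match.span()
        overlaps = any(tok_start < end and tok_end > start for start, end in pie_spans)
        tokens.append(match.group())
        labels.append(1 if idiomatic and overlaps else 0)
    return tokens, labels


sentence = "He let the cat out of the bag."
span = (3, 29)  # character offsets of "let the cat out of the bag"
print(label_tokens(sentence, [span], idiomatic=True)[1])
# [0, 1, 1, 1, 1, 1, 1, 1, 0]
```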

Training and Testing

For training and testing, please refer to train.py and test.py. Note that test.py produces the evaluation scores reported in the paper, while inference.py produces predictions for new sentences.
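The exact metrics computed by test.py are not described here. As an illustration only, the sketch below computes two metrics commonly used for this kind of span identification: sequence accuracy (the whole label sequence must match) and token-level precision, recall, and F1 on the idiom class. It assumes gold and predicted outputs are per-token binary label sequences; evaluate is a hypothetical name, and its numbers are not guaranteed to match the paper's.

```python
from typing import Dict, Sequence


def evaluate(gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Sequence accuracy plus token-level precision/recall/F1 for label 1."""
    exact = tp = fp = fn = 0
    for g_seq, p_seq in zip(gold, pred):
        if list(g_seq) == list(p_seq):
            exact += 1  # every token label in the sentence is correct
        for g, p in zip(g_seq, p_seq):
            tp += int(g == 1 and p == 1)
            fp += int(g == 0 and p == 1)
            fn += int(g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    seq_acc = exact / len(gold) if gold else 0.0
    return {"seq_acc": seq_acc, "precision": precision, "recall": recall, "f1": f1}


scores = evaluate(gold=[[0, 1, 1, 0], [0, 0, 0]], pred=[[0, 1, 0, 0], [0, 0, 0]])
print(scores)
```

In this example the second sentence (a literal usage) is matched exactly, while the first misses one idiom token, which lowers recall but not precision.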

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Ziheng Zeng - [email protected]

Project Link: https://github.com/your_username/repo_name

Acknowledgements

[TODO]:

Add the following to the README:

Detailed method description
Method figure
Demo walkthrough
Data processing tips and instructions
Add requirements.txt
