Improving Machine Translation Systems via Isotopic Replacement

Related tags

Deep LearningCAT
Overview

CAT (Improving Machine Translation Systems via Isotopic Replacement)

Machine translation plays an essential role in people’s daily international communication. However, machine translation systems are far from perfect. To tackle this problem, researchers have proposed several approaches to testing machine translation. A promising trend among these approaches is to use word replacement, where only one word in the original sentence is replaced with another word to form a sentence pair. However, precise control of the impact of word replacement remains an outstanding issue in these approaches.

To address this issue, we propose CAT, a novel word-replacement-based approach, whose basic idea is to identify word replacement with controlled impact (referred to as isotopic replacement). To achieve this purpose, we use a neural-based language model to encode the sentence context, and design a neural-network-based algorithm to evaluate context-aware semantic similarity between two words. Furthermore, similar to TransRepair, a state-of-the-art word-replacement-based approach, CAT also provides automatic fixing of revealed bugs without model retraining.

Our evaluation on Google Translate and Transformer indicates that CAT achieves significant improvements over TransRepair. In particular, 1) CAT detects seven more types of bugs than TransRepair; 2) CAT detects 129% more translation bugs than TransRepair; 3) CAT repairs twice more bugs than TransRepair, many of which may bring serious consequences if left unfixed; and 4) CAT has better efficiency than TransRepair in input generation (0.01s v.s. 0.41s) and comparable efficiency with TransRepair in bug repair (1.92s v.s. 1.34s).

The main file tree of CAT

.
├── Labeled data
│   ├── RQ1 Test Input Generation
│   ├── RQ2 Bug Detection
│   ├── RQ3 Bug Repair
│   └── Extended Analysis
├── TS
├── MutantGen-Test.py
├── MutantGen-Repair.py
├── Repair.py
├── Testing.py
├── NewThres
│   ├── TestGenerator-NMT
│   └── TestGenerator-NMTRep
└── NMT_zh_en0-8Mu
    ├── padTrans
    └── repair-new

The manual assessment results are in the Labeled data folder.

For Testing:

python3 Testing.py

After it, the results are in the NMT_zh_en0-8Mu/padTrans folder.

For Repair:

python3 Repair.py

After it, the results are in the TS/quickstart0/repair-NEW folder.

Data

The LookUpTable.txt used in NMT_zh_en_0-8Mu/padTrans and NMT_zh_en_0-8Mu/repair-new is available at https://drive.google.com/file/d/1fjGpryzGohla0ZA4u7KDgRJeAHegy0A1/view?usp=sharing

Dependenices

NLTK 3.2.1
Pytorch 1.6.1
Python 3.7
Ubuntu 16.04
Transformers 3.3.0
Owner
Zeyu Sun
A Ph.D. student.
Zeyu Sun
AoT is a system for automatically generating off-target test harness by using build information.

AoT: Auto off-Target Automatically generating off-target test harness by using build information. Brought to you by the Mobile Security Team at Samsun

Samsung 10 Oct 19, 2022
A Fast and Accurate One-Stage Approach to Visual Grounding, ICCV 2019 (Oral)

One-Stage Visual Grounding ***** New: Our recent work on One-stage VG is available at ReSC.***** A Fast and Accurate One-Stage Approach to Visual Grou

Zhengyuan Yang 118 Dec 05, 2022
Complete-IoU (CIoU) Loss and Cluster-NMS for Object Detection and Instance Segmentation (YOLACT)

Complete-IoU Loss and Cluster-NMS for Improving Object Detection and Instance Segmentation. Our paper is accepted by IEEE Transactions on Cybernetics

290 Dec 25, 2022
Multiwavelets-based operator model

Multiwavelet model for Operator maps Gaurav Gupta, Xiongye Xiao, and Paul Bogdan Multiwavelet-based Operator Learning for Differential Equations In Ne

Gaurav 33 Dec 04, 2022
Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Description The source code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chin

Zhengxiang Wang 3 Jun 28, 2022
A Python reference implementation of the CF data model

cfdm A Python reference implementation of the CF data model. References Compliance with FAIR principles Documentation https://ncas-cms.github.io/cfdm

NCAS CMS 25 Dec 13, 2022
Multi-Glimpse Network With Python

Multi-Glimpse Network Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention arXiv Require

9 May 10, 2022
Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab.

CLIP-Guided-Diffusion Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. Original colab notebooks by Ka

Nerdy Rodent 336 Dec 09, 2022
Customizable RecSys Simulator for OpenAI Gym

gym-recsys: Customizable RecSys Simulator for OpenAI Gym Installation | How to use | Examples | Citation This package describes an OpenAI Gym interfac

Xingdong Zuo 14 Dec 08, 2022
A repo with study material, exercises, examples, etc for Devnet SPAUTO

MPLS in the SDN Era -- DevNet SPAUTO Get right to the study material: Checkout the Wiki! A lab topology based on MPLS in the SDN era book used for 30

Hugo Tinoco 67 Nov 16, 2022
Code for ICCV 2021 paper Graph-to-3D: End-to-End Generation and Manipulation of 3D Scenes using Scene Graphs

Graph-to-3D This is the official implementation of the paper Graph-to-3d: End-to-End Generation and Manipulation of 3D Scenes Using Scene Graphs | arx

Helisa Dhamo 33 Jan 06, 2023
🏃‍♀️ A curated list about human motion capture, analysis and synthesis.

Awesome Human Motion 🏃‍♀️ A curated list about human motion capture, analysis and synthesis. Contents Introduction Human Models Datasets Data Process

Dennis Wittchen 274 Dec 14, 2022
This is the official pytorch implementation of Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation(TESKD)

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation (TESKD) By Zheng Li[1,4], Xiang Li[2], Lingfeng Yang[2,4], Jian Yang[2], Zh

Zheng Li 9 Sep 26, 2022
WSDM‘2022: Knowledge Enhanced Sports Game Summarization

Knowledge Enhanced Sports Game Summarization Cooming Soon! :) Data will be released after approval process. Code will be published once the author of

Jiaan Wang 14 Jul 13, 2022
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms This repo contains the source code to reproduce the results in the paper A Close

Costa Huang 73 Dec 24, 2022
An open-source project for applying deep learning to medical scenarios

Auto Vaidya An open source solution for creating end-end web app for employing the power of deep learning in various clinical scenarios like implant d

Smaranjit Ghose 18 May 29, 2022
pytorch bert intent classification and slot filling

pytorch_bert_intent_classification_and_slot_filling 基于pytorch的中文意图识别和槽位填充 说明 基本思路就是:分类+序列标注(命名实体识别)同时训练。 使用的预训练模型:hugging face上的chinese-bert-wwm-ext 依

西西嘛呦 33 Dec 15, 2022
Python Classes: Medical Insurance Project using Object Oriented Programming Concepts

Medical-Insurance-Project-OOP Python Classes: Medical Insurance Project using Object Oriented Programming Concepts Classes are an incredibly useful pr

Hugo B. 0 Feb 04, 2022
POT : Python Optimal Transport

POT: Python Optimal Transport This open source Python library provide several solvers for optimization problems related to Optimal Transport for signa

Python Optimal Transport 1.7k Dec 31, 2022
Annotate with anyone, anywhere.

h h is the web app that serves most of the https://hypothes.is/ website, including the web annotations API at https://hypothes.is/api/. The Hypothesis

Hypothesis 2.6k Jan 08, 2023