EMNLP 2020 - Summarizing Text on Any Aspects

Overview

Summarizing Text on Any Aspects

This repo contains preliminary code of the following paper:

Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
Bowen Tan, Lianhui Qin, Eric P. Xing, Zhiting Hu
EMNLP 2020
[ArXiv] [Slides]

Getting Started

  • Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect.
  • In this work, we study summarizing on arbitrary aspects relevant to the document.
  • Due to the lack of supervision data, we develop a new weak supervision construction method integrating rich external knowledge sources such as ConceptNet and Wikipedia.

Requirements

Our python version is 3.8, required packages can be installed by

pip install -r requrements.txt

Our code can run on a single GTX 1080Ti GPU.

Datasets & Knowledge Sources

Weakly Supervised Dataset

Our constructed weakly supervised dataset can be downloaded by

bash data_utils/download_weaksup.sh

Downloaded data will be saved into data/weaksup/.

We also provide the code to construct it. For more details, see

MA-News Dataset

MA-News Dataset is a aspect summarization dataset constructed by (Frermann et al.) . Its aspects are restricted to only 6 coarsegrained topics. We use MA-News dataset for our automatic evaluation. Scripts to make MA-News is here.

A JSON version processed by us can be download by

bash data_utils/download_manews.sh

Downloaded data will be saved into data/manews/.

Knowledge Graph - ConceptNet

ConceptNet is a huge multilingual commonsense knowledge graph. We extract an English subset that can be downloaded by

bash data_utils/download_concept_net.sh

Knowledge Base - Wikipedia

Wikipedia is an encyclopaedic knowledge base. We use its python API to access it online, so make sure your web connection is good when running our code.

Weakly Supervised Model

Train

Run this command to finetune a weakly supervised model from pretrained BART model (Lewis et al.).

python finetune.py --dataset_name weaksup --train_docs 100000 --n_epochs 1

Training logs and checkpoints will be saved into logs/weaksup/docs100000/

The training takes ~48h on a single GTX 1080Ti GPU. You may want to directly download the training log and the trained model here.

Generation

Run this command to generate on MA-News test set with the weakly supervised model.

python generate.py --log_path logs/weaksup/docs100000/

Source texts, target texts, generated texts will be saved as test.source, test.gold, and test.hypo respectively, into the log dir: logs/weaksup/docs100000/.

Evaluation

To run evaluation, make sure you have installed java and files2rouge on your device.

First, download stanford nlp by

python data_utils/download_stanford_core_nlp.py

and run

bash evaluate.sh logs/weaksup/docs100000/

to get rouge scores. Results will be saved in logs/weaksup/docs100000/rouge_scores.txt.

Finetune with MA-News Training Data

Baseline

Run this command to finetune a BART model with 1K MA-News training data examples.

python finetune.py --dataset_name manews --train_docs 1000 --wiki_sup False
python generate.py --log_path logs/manews/docs1000/ --wiki_sup False
bash evaluate.sh logs/manews/docs1000/

Results will be saved in logs/manews/docs1000/.

+ Weak Supervision

Run this command to finetune with 1K MA-News training data examples starting with our weakly supervised model.

python finetune.py --dataset_name manews --train_docs 1000 --pretrained_ckpt logs/weaksup/docs100000/best_model.ckpt
python generate.py --log_path logs/manews_plus/docs1000/
bash evaluate.sh logs/manews_plus/docs1000/

Results will be saved in logs/manews_plus/docs1000/.

Results

Results on MA-News dataset are as below (same setting as paper Table 2).

All the detailed logs, including training log, generated texts, and rouge scores, are available here.

(Note: The result numbers may be slightly different from those in the paper due to slightly different implementation details and random seeds, while the improvements over comparison methods are consistent.)

Model ROUGE-1 ROUGE-2 ROUGE-L
Weak-Sup Only 28.41 10.18 25.34
MA-News-Sup 1K 24.34 8.62 22.40
MA-News-Sup 1K + Weak-Sup 34.10 14.64 31.45
MA-News-Sup 3K 26.38 10.09 24.37
MA-News-Sup 3K + Weak-Sup 37.40 16.87 34.51
MA-News-Sup 10K 38.71 18.02 35.78
MA-News-Sup 10K + Weak-Sup 39.92 18.87 36.98

Demo

We provide a demo on a real news on Feb. 2021. (see demo_input.json).

To run the demo, download our trained model here, and run the command below

python demo.py --ckpt_path logs/weaksup/docs100000/best_model.ckpt
Owner
Bowen Tan
Bowen Tan
Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective

Unofficial pytorch implementation of the paper "Unsupervised Real-World Super-Resolution: A Domain Adaptation Perspective"

16 Nov 21, 2022
Explore the Expression: Facial Expression Generation using Auxiliary Classifier Generative Adversarial Network

Explore the Expression: Facial Expression Generation using Auxiliary Classifier Generative Adversarial Network This is the official implementation of

azad 2 Jul 09, 2022
Source Code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Description The source code and data for my paper titled Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chin

Zhengxiang Wang 3 Jun 28, 2022
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

23 Nov 11, 2022
Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration

Speckle-free Holography with Partially Coherent Light Sources and Camera-in-the-loop Calibration Project Page | Paper Yifan Peng*, Suyeon Choi*, Jongh

Stanford Computational Imaging Lab 19 Dec 11, 2022
Python Fanduel API (2021) - Lineup Automation

Southpaw is a python package that provides access to the Fanduel API. Optimize your DFS experience by programmatically updating your lineups, analyzin

Brandin Canfield 13 Jan 04, 2023
Official code release for: EditGAN: High-Precision Semantic Image Editing

Official code release for: EditGAN: High-Precision Semantic Image Editing

565 Jan 05, 2023
Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

vqvae_dwt_distiller.pytorch Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline. It allows to generate 512x512 ima

Sergei Belousov 25 Jul 19, 2022
Code for "LoRA: Low-Rank Adaptation of Large Language Models"

LoRA: Low-Rank Adaptation of Large Language Models This repo contains the implementation of LoRA in GPT-2 and steps to replicate the results in our re

Microsoft 394 Jan 08, 2023
TimeSHAP explains Recurrent Neural Network predictions.

TimeSHAP TimeSHAP is a model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes even

Feedzai 90 Dec 18, 2022
A benchmark for the task of translation suggestion

WeTS: A Benchmark for Translation Suggestion Translation Suggestion (TS), which provides alternatives for specific words or phrases given the entire d

zhyang 55 Dec 24, 2022
Godot RL Agents is a fully Open Source packages that allows video game creators

Godot RL Agents The Godot RL Agents is a fully Open Source packages that allows video game creators, AI researchers and hobbiest the opportunity to le

Edward Beeching 326 Dec 30, 2022
Data and codes for ACL 2021 paper: Towards Emotional Support Dialog Systems

Emotional-Support-Conversation Copyright © 2021 CoAI Group, Tsinghua University. All rights reserved. Data and codes are for academic research use onl

126 Dec 21, 2022
Official PyTorch Implementation of Mask-aware IoU and maYOLACT Detector [BMVC2021]

The official implementation of Mask-aware IoU and maYOLACT detector. Our implementation is based on mmdetection. Mask-aware IoU for Anchor Assignment

Kemal Oksuz 46 Sep 29, 2022
Current state of supervised and unsupervised depth completion methods

Awesome Depth Completion Table of Contents About Sparse-to-Dense Depth Completion Current State of Depth Completion Unsupervised VOID Benchmark Superv

224 Dec 28, 2022
An implementation of quantum convolutional neural network with MindQuantum. Huawei, classifying MNIST dataset

关于实现的一点说明 山东大学 2020级 苏博南 www.subonan.com 文件说明 tools.py 这里面主要有两个函数: resize(a, lenb) 这其实是我找同学写的一个小算法hhh。给出一个$28\times 28$的方阵a,返回一个$lenb\times lenb$的方阵。因

ぼっけなす 2 Aug 29, 2022
[ICLR 2021] Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.

[ICLR 2021] RAPID: A Simple Approach for Exploration in Reinforcement Learning This is the Tensorflow implementation of ICLR 2021 paper Rank the Episo

Daochen Zha 48 Nov 21, 2022
Final term project for Bayesian Machine Learning Lecture (XAI-623)

Mixquality_AL Final Term Project For Bayesian Machine Learning Lecture (XAI-623) Youtube Link The presentation is given in YoutubeLink Problem Formula

JeongEun Park 3 Jan 18, 2022
Evolving neural network parameters in JAX.

Evolving Neural Networks in JAX This repository holds code displaying techniques for applying evolutionary network training strategies in JAX. Each sc

Trevor Thackston 6 Feb 12, 2022
A gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor.

OpenHands OpenHands is a gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor. Currently the system can iden

Paul Treanor 12 Jan 10, 2022