Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Last update: Jan 03, 2022

Overview

Text Summarization

WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

In this project, I fine-tune a T5 model on the Extreme Summarization (XSum) dataset, achieving a ROUGE-2 F-score of 9.5% on the test data. I then discuss the drawbacks of n-gram-based metrics as well as contextual word metrics.

Finally, I propose the Weighted Contextual N-gram (WCN) method, an alternative metric that can be more effective for evaluating text generation tasks.
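For readers unfamiliar with the ROUGE-2 F-score mentioned above, here is a minimal sketch of how it can be computed from bigram overlap. It is not the evaluation code used in this project: the whitespace tokenizer, the function names, and the example sentences are all assumptions made for illustration, and standard ROUGE packages usually add stemming and more careful tokenization. The second example hints at the n-gram drawback noted above: a paraphrase that shares no bigrams with the reference scores zero.

```python
from collections import Counter


def bigrams(text):
    """Count bigrams after naive lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f(candidate, reference):
    """Harmonic mean of bigram precision and recall (ROUGE-2 F-score)."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Made-up sentences, not taken from XSum.
reference = "the cat sat on the mat"
near_copy = "the cat lay on the mat"
paraphrase = "a feline rested upon the rug"

score_near_copy = rouge2_f(near_copy, reference)    # 3 of 5 bigrams match
score_paraphrase = rouge2_f(paraphrase, reference)  # no bigram matches
print(score_near_copy, score_paraphrase)
```

The paraphrase has a similar meaning to the reference but receives a score of 0.0, which is the kind of mismatch that motivates looking beyond pure n-gram overlap.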

The complete documentation of the project can be found here

Dataset

I use the Extreme Summarization (XSum) Dataset. The dataset can be downloaded from here

The dataset consists of BBC articles and accompanying single-sentence summaries. Specifically, each article is prefaced with an introductory sentence (the summary), which is professionally written, typically by the author of the article.

There are two features in this dataset:
(1) document: Input news article.
(2) summary: One-sentence summary of the article.

The idea is to generate a short, one-sentence news summary answering the question "What is the article about?". In total there are about 226k samples: 204,045 for training, 11,332 for validation, and 11,334 for testing. The average document is 431.07 words long (19.77 sentences), and the average summary is 23.26 words long.
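The figures above can be sanity-checked with a short sketch. The split sizes and average lengths are the ones quoted in this section; the variable names and the toy record are hypothetical and only illustrate the two-field layout, they are not taken from the dataset.

```python
# Split sizes as reported in this section.
split_sizes = {"train": 204_045, "validation": 11_332, "test": 11_334}

total = sum(split_sizes.values())
shares = {name: size / total for name, size in split_sizes.items()}

# Average lengths in words, as reported in this section.
avg_document_words = 431.07
avg_summary_words = 23.26
compression = avg_document_words / avg_summary_words

# Hypothetical toy record showing the two features (text is made up).
record = {
    "document": "A long news article would go here, usually several paragraphs.",
    "summary": "A one-sentence summary of the article.",
}

print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"average document-to-summary length ratio: {compression:.1f}")
print(f"fields: {sorted(record)}")
```

The split is roughly 90/5/5, and an average summary is well under a tenth of the length of its article, which is what makes the task "extreme" summarization.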

Code

The source code for this project can be found at text_summarization.ipynb.

Owner

Aditya Shah
