(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Overview

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title="Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
    year={2022}
}

Goal

We proposed a grounded summarization system, which provide each summary sentence a linked chunk of the original transcripts and their audio/video recordings. It allows a human evaluator to quickly verify the summary content against source clips. example

News

  • 03/04/2022 Trained model and processed testing data released.
  • 03/03/2022 Code Released. Paper link, trained model and processed testing data will be released soon.
  • 02/23/2022 Paper accepted at ACL 2022.

Experiments

You can follow the below 4 steps to generate grounded podcast summaries or directly download the generated summary from this link

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the Trained Models to GrndPodcastSum Directory and unzip

unzip model.zip

Download the Processed Test Set (1027) to GrndPodcastSum Directory and unzip

unzip data.zip

Step 2: Setup Environment

Create the environment using .yml file.

conda env create -f env.yml
conda activate GrndPodcastSum

Step 3. Offline Computing for Chunk Embeddings

Calculating the chunk embedding offline.

sh offline.sh

Step 4. Generating Grounded Summary

Use Grnd-token-nonoveralp model to generate summary.

sh test.sh

License

Copyright 2022 Tencent

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repo is only for research purpose. It is not an officially supported Tencent product.

Owner
Research repositories.
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

Zhiling Zhang 5 Oct 21, 2022
This project consists of data analysis and data visualization (done using python)of all IPL seasons from 2008 to 2019 and answering the most asked questions about the IPL.

IPL-data-analysis This project consists of data analysis and data visualization of all IPL seasons from 2008 to 2019 and answering the most asked ques

Sivateja A T 2 Feb 08, 2022
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 159 Apr 04, 2022
Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022
NLP tool to extract emotional phrase from tweets ๐Ÿคฉ

Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the

Shahul ES 38 Oct 17, 2022
Code for Emergent Translation in Multi-Agent Communication

Emergent Translation in Multi-Agent Communication PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Comm

Facebook Research 75 Jul 15, 2022
Implementation of TTS with combination of Tacotron2 and HiFi-GAN

Tacotron2-HiFiGAN-master Implementation of TTS with combination of Tacotron2 and HiFi-GAN for Mandarin TTS. Inference In order to inference, we need t

SunLu Z 7 Nov 11, 2022
PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

StyleSpeech - PyTorch Implementation PyTorch Implementation of Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. Status (2021.06.09

Keon Lee 142 Jan 06, 2023
Python package for Turkish Language.

PyTurkce Python package for Turkish Language. Documentation: https://pyturkce.readthedocs.io. Installation pip install pyturkce Usage from pyturkce im

Mert Cobanov 14 Oct 09, 2022
:hot_pepper: RยฒSQL: "Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing." (AAAI 2021)

RยฒSQL The PyTorch implementation of paper Dynamic Hybrid Relation Network for Cross-Domain Context-Dependent Semantic Parsing. (AAAI 2021) Requirement

huybery 60 Dec 31, 2022
189 Jan 02, 2023
Spert NLP Relation Extraction API deployed with torchserve for inference

URLMask Python program for Linux users to change a URL to ANY domain. A program than can take any url and mask it to any domain name you like. E.g. ne

Zichu Chen 1 Nov 24, 2021
NLPShala , the best IDE for all Natural language processing tasks.

The revolutionary IDE for all NLP (Natural language processing) stuffs on the internet.

Abhi 3 Aug 08, 2021
BERN2: an advanced neural biomedical namedentity recognition and normalization tool

BERN2 We present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by

DMIS Laboratory - Korea University 99 Jan 06, 2023
Use the power of GPT3 to execute any function inside your programs just by giving some doctests

gptrun Don't feel like coding today? Use the power of GPT3 to execute any function inside your programs just by giving some doctests. How is this diff

Roberto Abdelkader Martรญnez Pรฉrez 11 Nov 11, 2022
Just Another Telegram Ai Chat Bot Written In Python With Pyrogram.

OkaeriChatBot Just another Telegram AI chat bot written in Python using Pyrogram. Requirements Python 3.7 or higher.

Wahyusaputra 2 Dec 23, 2021
Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive" This repository contains demo programs for the Talking Head Anime

Pramook Khungurn 901 Jan 06, 2023
Code for paper: An Effective, Robust and Fairness-awareHate Speech Detection Framework

BiQQLSTM_HS Code and data for paper: Title: An Effective, Robust and Fairness-awareHate Speech Detection Framework. Authors: Guanyi Mou and Kyumin Lee

Guanyi Mou 2 Dec 27, 2022
kochat

Kochat ์ฑ—๋ด‡ ๋นŒ๋”๋Š” ์„ฑ์— ์•ˆ์ฐจ๊ณ , ์ž์‹ ๋งŒ์˜ ๋”ฅ๋Ÿฌ๋‹ ์ฑ—๋ด‡ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๋งŒ๋“œ์‹œ๊ณ  ์‹ถ์œผ์‹ ๊ฐ€์š”? Kochat์„ ์ด์šฉํ•˜๋ฉด ์†์‰ฝ๊ฒŒ ์ž์‹ ๋งŒ์˜ ๋”ฅ๋Ÿฌ๋‹ ์ฑ—๋ด‡ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ๋นŒ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. # 1. ๋ฐ์ดํ„ฐ์…‹ ๊ฐ์ฒด ์ƒ์„ฑ dataset = Dataset(ood=True) #

1 Oct 25, 2021