(ACL 2022) The source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts"

Last update: Jul 01, 2022

Towards Abstractive Grounded Summarization of Podcast Transcripts

We provide the source code for the paper "Towards Abstractive Grounded Summarization of Podcast Transcripts" accepted at ACL'22. If you find the code useful, please cite the following paper.

@inproceedings{song-etal-2022-grounded,
    title="Towards Abstractive Grounded Summarization of Podcast Transcripts",
    author = "Song, Kaiqiang and
              Li, Chen and
              Wang, Xiaoyang and
              Yu, Dong and
              Liu, Fei",
    booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
    year={2022}
}

Goal

We propose a grounded summarization system that links each summary sentence to a chunk of the original transcript and to the corresponding audio/video recording. This allows a human evaluator to quickly verify the summary content against the source clips.
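
To make the idea concrete, a grounded summary can be pictured as a list of sentences, each carrying a pointer to the transcript chunk that supports it. The sketch below is only an illustration and is not the data format used in this repository; the names `Chunk`, `GroundedSentence`, `verification_view`, and the time-offset fields are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A span of the transcript and its position in the recording (hypothetical layout)."""
    chunk_id: int
    text: str
    start_sec: float
    end_sec: float


@dataclass
class GroundedSentence:
    """One summary sentence plus the id of the chunk it is grounded in."""
    text: str
    chunk_id: int


def verification_view(summary: List[GroundedSentence], chunks: List[Chunk]) -> List[str]:
    """Pair each summary sentence with its source clip for manual checking."""
    by_id = {c.chunk_id: c for c in chunks}
    lines = []
    for sent in summary:
        c = by_id[sent.chunk_id]
        lines.append(f"{sent.text} <- [{c.start_sec:.1f}s-{c.end_sec:.1f}s] {c.text}")
    return lines
```

With a structure like this, an evaluator can jump from a summary sentence straight to the clip that is supposed to support it, rather than searching the whole transcript.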

News

03/04/2022 Trained model and processed testing data released.
03/03/2022 Code Released. Paper link, trained model and processed testing data will be released soon.
02/23/2022 Paper accepted at ACL 2022.

Experiments

Follow the four steps below to generate grounded podcast summaries, or directly download the generated summaries from this link.

Step 1: Download Code, Model & Data

Download the code

git clone https://github.com/tencent-ailab/GrndPodcastSum.git
cd GrndPodcastSum

Download the trained models to the GrndPodcastSum directory and unzip them.

unzip model.zip

Download the processed test set (1027) to the GrndPodcastSum directory and unzip it.

unzip data.zip

Step 2: Set Up Environment

Create the conda environment from the env.yml file.

conda env create -f env.yml
conda activate GrndPodcastSum

Step 3: Offline Computing for Chunk Embeddings

Compute the chunk embeddings offline.

sh offline.sh
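
The internals of offline.sh are not described here, so the following is only a rough sketch of what precomputing chunk embeddings could look like. It assumes the transcript is split into fixed-size, non-overlapping token windows (an assumption loosely based on the "token" and non-overlapping wording of the model name in Step 4) and that `embed_fn` stands in for whatever encoder the real pipeline uses. The function names and the `chunk_size` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def split_into_chunks(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split tokens into consecutive windows of chunk_size that do not overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def precompute_embeddings(
    tokens: Sequence[str],
    chunk_size: int,
    embed_fn: Callable[[List[str]], List[float]],
) -> Dict[int, List[float]]:
    """Embed every chunk once and return a chunk-index -> embedding cache."""
    chunks = split_into_chunks(tokens, chunk_size)
    # embed_fn is a placeholder for the real encoder (assumption, not the repo's API).
    return {idx: embed_fn(chunk) for idx, chunk in enumerate(chunks)}
```

Computing the embeddings once and storing them means the generation step can look them up instead of re-encoding the transcript for every summary.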

Step 4: Generating Grounded Summary

Use the Grnd-token-nonoverlap model to generate the summaries.

sh test.sh

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Disclaimer

This repo is for research purposes only. It is not an officially supported Tencent product.
