Multiple implementations for abstractive text summarization, using Google Colab

Last update: Dec 26, 2022

Overview

Text Summarization models

If you are able to endorse me on arXiv, I would be more than glad: https://arxiv.org/auth/endorse?x=FRBB89 Thanks.

This repo collects multiple implementations of abstractive approaches to text summarization, for different languages (Hindi, Amharic, English, and soon, God willing, Arabic).

If you find this project helpful, please consider citing our work; it would truly mean a lot to me.

@INPROCEEDINGS{9068171,
  author={A. M. {Zaki} and M. I. {Khalil} and H. M. {Abbas}},
  booktitle={2019 14th International Conference on Computer Engineering and Systems (ICCES)}, 
  title={Deep Architectures for Abstractive Text Summarization in Multiple Languages}, 
  year={2019},
  volume={},
  number={},
  pages={22-27},}

@misc{zaki2020amharic,
    title={Amharic Abstractive Text Summarization},
    author={Amr M. Zaki and Mahmoud I. Khalil and Hazem M. Abbas},
    year={2020},
    eprint={2003.13721},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

It is built to run on Google Colab in a single notebook, so you only need an internet connection to run these examples, not a powerful machine. All the code examples are in Jupyter notebook format, and you don't have to download data to your device, because these notebooks connect to Google Drive.

Arabic Summarization Model: uses the cornerstone implementation (seq2seq with a bidirectional LSTM encoder and attention in the decoder) for summarizing Arabic news
Implementation A: a cornerstone seq2seq with attention (using a bidirectional LSTM), with three different models for this implementation
Implementation B: seq2seq with a pointer-generator model
Implementation C: seq2seq with reinforcement learning

Blogs

This repo is explained in a series of blog posts.

To understand how to work with the Google Colab ecosystem and how to integrate it with your Google Drive, this blog post can prove useful: DeepLearning Free Ecosystem
Tutorial 1 Overview on the different approaches used for abstractive text summarization
Tutorial 2 How to represent text for our text summarization task
Tutorial 3 What is seq2seq and why do we use it in text summarization
Tutorial 4 Multilayer Bidirectional LSTM/GRU for text summarization
Tutorial 5 Beam Search & Attention for text summarization
Tutorial 6 Build an Abstractive Text Summarizer in 94 Lines of Tensorflow
Tutorial 7 Pointer generator for combination of Abstractive & Extractive methods for Text Summarization
Tutorial 8 Teach seq2seq models to learn from their mistakes using deep curriculum learning
Tutorial 9 Deep Reinforcement Learning (DeepRL) for Abstractive Text Summarization made easy
Tutorial 10 Hindi Text Summarization

Try out this text summarization through this website (eazymind), which lets you summarize your text through:

curl call

curl -X POST \
  http://eazymind.herokuapp.com/arabic_sum/eazysum \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d "eazykey={eazymind api key}&sentence={your sentence to be summarized}"

python package (pip install eazymind)

pip install eazymind

from eazymind.nlp.eazysum import Summarizer

#---key from eazymind website---
key = "xxxxxxxxxxxxxxxxxxxxx"

#---sentence to be summarized---
sentence = """(CNN)The White House has instructed former
    White House Counsel Don McGahn not to comply with a subpoena
    for documents from House Judiciary Chairman Jerry Nadler, 
    teeing up the latest in a series of escalating oversight 
    showdowns between the Trump administration and congressional Democrats."""
    
summarizer = Summarizer(key)
print(summarizer.run(sentence))
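For readers who prefer not to install the package, the curl call above can be mirrored with the Python standard library. This is only a sketch: the URL, headers, and form fields are taken from the curl example, while the helper name `build_eazysum_request` is made up for illustration, and the response format (and whether the service is still online) is not documented here, so the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint copied from the curl example above.
EAZYSUM_URL = "http://eazymind.herokuapp.com/arabic_sum/eazysum"


def build_eazysum_request(key, sentence):
    """Build (but do not send) the same POST request as the curl example.

    Hypothetical helper, not part of the eazymind package.
    """
    # urlencode escapes spaces and special characters in the sentence.
    body = urlencode({"eazykey": key, "sentence": sentence}).encode("utf-8")
    headers = {
        "cache-control": "no-cache",
        "content-type": "application/x-www-form-urlencoded",
    }
    return Request(EAZYSUM_URL, data=body, headers=headers, method="POST")


request = build_eazysum_request("xxxxxxxxxxxxxxxxxxxxx", "text to summarize")
# To actually send it: urllib.request.urlopen(request).read()
```

Building the body with `urlencode` avoids the escaping problems that can occur when a long sentence is pasted directly into the `-d` string.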

Implementation A (seq2seq with attention and feature rich representation)

Contains three different models that implement a seq2seq network with attention, and add concepts such as a feature-rich word representation. This work is a continuation of these amazing repos:

Model 1

A modification of David Currie's seq2seq: https://github.com/Currie32/Text-Summarization-with-Amazon-Reviews

Model 2

1- Model_2/Model_2.ipynb

a modification to https://github.com/dongjun-Lee/text-summarization-tensorflow

2- Model_2/Model 2 features(tf-idf , pos tags).ipynb

a modification to Model_2.ipynb using concepts from http://www.aclweb.org/anthology/K16-1028
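The notebook name mentions tf-idf as one of the extra word features. As a rough illustration of that one feature only, the sketch below computes plain tf-idf scores for tokenized documents. It assumes the textbook definition (term frequency times log inverse document frequency); the notebook may compute or bin these values differently, and the function name `tf_idf` and the sample documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {token: tf-idf score} dict per tokenized document."""
    n_docs = len(documents)
    # Number of documents in which each token appears at least once.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return scores


docs = [["stocks", "fell", "today"], ["stocks", "rose", "today"]]
features = tf_idf(docs)
```

Tokens that appear in every document (here "stocks" and "today") score zero, while tokens specific to one document get a positive score, which is the signal such a feature adds on top of the word embedding.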

Results

A folder containing the results of both models on validation text samples, in zaksum format, which combines all of the following scores for each sentence, together with the average of all of them:

bleu
rouge_1
rouge_2
rouge_L
rouge_be
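To give a feel for what the rouge_1 and rouge_2 scores measure, here is a minimal sketch of n-gram overlap between a reference and a generated summary. It is not the code used by zaksum_eval, which is not shown here; the function name `rouge_n` and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def rouge_n(reference, candidate, n=1):
    """Recall, precision and F1 of n-gram overlap between two strings."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_n("the cat sat on the mat", "the cat on the mat", n=1)
```

Calling the same function with `n=2` gives the bigram variant; averaging the per-sentence scores over a validation set would give the kind of summary number described above.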

Model 3

a modification to https://github.com/thomasschmied/Text_Summarization_with_Tensorflow/blob/master/summarizer_amazon_reviews.ipynb
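All the models in Implementation A rely on attention in the decoder. The sketch below shows the core idea in plain Python: score every encoder state against the current decoder state, turn the scores into weights with a softmax, and build a context vector as the weighted sum. It uses simple dot-product scoring as a simplifying assumption; the notebooks use learned attention layers from TensorFlow, and the names `softmax` and `attend` are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Dot-product score between the decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context


weights, context = attend([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy call the decoder state points along the first encoder state, so almost all of the attention weight lands on it and the context vector is close to that state.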

Implementation B (Pointer Generator seq2seq network)

It is a continuation of the amazing work of https://github.com/abisee/pointer-generator (https://arxiv.org/abs/1704.04368). This implementation uses a pointer-generator network to reduce some of the problems that appear with the normal seq2seq network.
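The central step of a pointer-generator network, as described in the paper linked above, is mixing the decoder's vocabulary distribution with the attention distribution over the source words, so that source words outside the vocabulary can still be copied. The following is a minimal sketch of that mixing step using plain dictionaries; the function name `final_distribution` and the toy numbers are assumptions for illustration, and the notebook itself works on TensorFlow tensors.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy probabilities for one decoder step.

    p_gen:         probability of generating from the vocabulary (0..1)
    vocab_dist:    {word: probability} from the decoder softmax
    attention:     attention weight for each source position
    source_tokens: source word at each position (may be out of vocabulary)
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Copy probability is added to the word found at that source position.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


dist = final_distribution(
    p_gen=0.8,
    vocab_dist={"the": 0.5, "cat": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["cat", "zaki"],
)
```

Here "zaki" is not in the vocabulary, yet it still receives probability mass through the copy term, which is the behavior that lets the model reproduce rare names from the article.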

Model_4_generator_.ipynb

Uses a pointer generator on top of seq2seq with attention. It is built using Python 2.7.

zaksum_eval.ipynb

Built with Python 3, for evaluation.

Results/Pointer Generator

Output from the generator (article / reference / summary), used as input to zaksum_eval.ipynb
Result from zaksum_eval

I will still work on their implementation of the coverage mechanism; much work is yet to come, God willing.

Implementation C (Reinforcement Learning For Sequence to Sequence)

This implementation is a continuation of the amazing work done by https://github.com/yaserkl/RLSeq2Seq (https://arxiv.org/abs/1805.09461).

@article{keneshloo2018deep,
 title={Deep Reinforcement Learning For Sequence to Sequence Models},
 author={Keneshloo, Yaser and Shi, Tian and Ramakrishnan, Naren and Reddy, Chandan K.},
 journal={arXiv preprint arXiv:1805.09461},
 year={2018}
}

Model 5 RL

This is a library for building multiple approaches using reinforcement learning with seq2seq. I have gathered their code to run in a Jupyter notebook and to access Google Drive. It is built for Python 2.7.
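One common way to combine reinforcement learning with seq2seq is self-critical policy gradient training, where a sampled summary is rewarded relative to the greedy summary. The sketch below shows only that loss for a single example. It is an assumption that this variant is the relevant one; the notebook may be configured to use a different approach from the library, and the function name and numbers are made up for illustration. The rewards would typically be a metric such as ROUGE.

```python
import math


def self_critical_loss(sample_token_probs, sample_reward, greedy_reward):
    """Self-critical policy gradient loss for one sampled summary.

    sample_token_probs: model probability of each token in the sampled summary
    sample_reward:      reward of the sampled summary
    greedy_reward:      reward of the greedy summary, used as the baseline
    """
    log_prob = sum(math.log(p) for p in sample_token_probs)
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the log-probability of samples that beat the baseline.
    return -advantage * log_prob


loss = self_critical_loss([0.5, 0.5], sample_reward=0.6, greedy_reward=0.4)
```

When the sampled summary scores the same as the greedy one, the loss is zero and the example contributes no gradient; when it scores worse, the sign flips and the sample is made less likely.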

zaksum_eval.ipynb

Built with Python 3, for evaluation.

Results/Reinforcement Learning

Output from Model 5 RL, used as input to zaksum_eval.ipynb
