Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Overview

This repo uses Beautiful Soup to read Wikipedia pages, Gensim to summarize them, and NLTK to process the text, and it extracts keywords based on entropy, all in one script. I looked for similar code on GitHub, but most of it was very difficult to understand and use. I'm building this repo to provide a simple yet effective solution for extractive summarization and keyword identification.
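
To make "keywords based on entropy" more concrete, here is a minimal sketch of one common reading of the idea: split the text into chunks, measure how evenly each word is spread across them with Shannon entropy, and prefer frequent words that cluster in a few chunks. This is not the code in summarize.py. The function name `entropy_keywords`, the chunk count, the scoring formula, and the tiny stopword list are all assumptions made for illustration, and the sketch uses only the standard library instead of NLTK.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); NLTK ships a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was",
             "it", "for", "on", "as", "by", "with", "that"}


def entropy_keywords(text, top_n=5, n_chunks=4, min_count=2):
    """Rank words by frequency and by how unevenly they are spread over the text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    if not words:
        return []

    # Split the word stream into roughly equal chunks and count words per chunk.
    chunk_size = math.ceil(len(words) / n_chunks)
    chunk_counts = [Counter(words[i:i + chunk_size])
                    for i in range(0, len(words), chunk_size)]
    totals = Counter(words)

    # Entropy of a perfectly even spread, used to normalise scores to [0, 1].
    max_entropy = math.log2(len(chunk_counts)) if len(chunk_counts) > 1 else 1.0

    scores = {}
    for word, total in totals.items():
        if total < min_count:
            continue  # too rare to say anything about its distribution
        probs = [c[word] / total for c in chunk_counts if c[word]]
        entropy = -sum(p * math.log2(p) for p in probs)
        # Frequent words concentrated in few chunks get the highest score.
        scores[word] = total * (1.0 - entropy / max_entropy)

    # Sort by score (highest first), breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Calling `entropy_keywords(article_text, top_n=5)` on the plain text of a page returns a list of lowercase words. Different chunk counts or scoring formulas will change the ranking, so treat the defaults as starting points only.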

The program works best for summaries of 300 or more words.
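
The word count acts as a budget for the summary. The sketch below shows how such a budget can drive extractive summarization. It deliberately swaps in plain word-frequency sentence scoring with the standard library, so it is not Gensim's summarizer and not this repo's implementation. The name `summarize_by_frequency` and its `word_count` parameter are hypothetical.

```python
import re
from collections import Counter


def summarize_by_frequency(text, word_count=300):
    """Pick the highest-scoring sentences that fit within word_count words."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Sentence indices ordered from best to worst score.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))

    chosen, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length > word_count:
            continue  # this sentence would exceed the budget, try a shorter one
        chosen.append(i)
        used += length

    # Restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Because whole sentences are kept or dropped, the result usually lands somewhat under the requested word count rather than exactly on it.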

License

Please follow the license guidelines when using this code: GNU General Public License v3.0.

Requirements

  • Gensim
  • NLTK
  • and others

A requirements.txt file is provided. Simply run the command below in the terminal.

    pip install -r requirements.txt

How to Use

    python summarize.py

output:

Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a business partnership. The company's first product is the Apple I, a computer designed and hand-built entirely by Wozniak. To finance its creation, Jobs sold his only motorized means of transportation, a VW Microbus, for a few hundred dollars, and Wozniak sold his HP-65 calculator for US$500 . Wozniak debuted the first prototype at the Homebrew Computer Club in July 1976. The Apple I was sold as a motherboard with CPU, RAM, and basic textual-video chips—a base kit concept which would not yet be marketed as a complete personal computer. It went on sale soon after debut for US$666.66 .:180 Wozniak later said he was unaware of the coincidental mark of the beast in the number 666, and that he came up with the price because he liked "repeating digits". During his keynote speech at the Macworld Expo on January 9, 2007, Jobs announced that Apple Computer, Inc. would thereafter be known as "Apple Inc.", because the company had shifted its emphasis from computers to consumer electronics. This event also saw the announcement of the iPhone and the Apple TV. The company sold 270,000 iPhone units during the first 30 hours of sales, and the device was called "a game changer for the industry". Apple would achieve widespread success with its iPhone, iPod Touch, and iPad products, which introduced innovations in mobile phones, portable music players, and personal computers respectively. Furthermore, by early 2007, 800,000 Final Cut Pro users were registered.

keywords:

'iphone', 'ipad', 'jobs', 'macintosh', 'stores'

Examples

Python (programming language) (300 words)

    python summarize.py https://en.wikipedia.org/wiki/Python_\(programming_language\) 300

output-summary:

Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica in the Netherlands as a successor to the ABC language , capable of exception handling and interfacing with the Amoeba operating system. Its implementation began in December 1989. Van Rossum shouldered sole responsibility for the project, as the lead developer, until 12 July 2018, when he announced his "permanent vacation" from his responsibilities as Python's Benevolent Dictator For Life, a title the Python community bestowed upon him to reflect his long-term commitment as the project's chief decision-maker. He now shares his leadership as a member of a five-person steering council. In January 2019, active Python core developers elected Brett Cannon, Nick Coghlan, Barry Warsaw, Carol Willing and Van Rossum to a five-member "Steering Council" to lead the project. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It also features dynamic name resolution , which binds method and variable names during program execution. Python's developers strive to avoid premature optimization, and reject patches to non-critical parts of the CPython reference implementation that would offer marginal increases in speed at the cost of clarity. When speed is important, a Python programmer can move time-critical functions to extension modules written in languages such as C, or use PyPy, a just-in-time compiler. The long-term plan is to support gradual typing and from Python 3.5, the syntax of the language allows specifying static types but they are not checked in the default implementation, CPython. Examples of the use of this prefix in names of Python applications or libraries include Pygame, a binding of SDL to Python ; PyQt and PyGTK, which bind Qt and GTK to Python respectively; and PyPy, a Python implementation originally written in Python.

output-keywords:

'python', 'class', 'classes', 'division', 'round', 'type'
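
The backslashes in the Python example's URL are shell escapes for the parentheses, not part of the URL. As a small sketch, the snippet below builds the same command as an argument list, which needs no escaping, and prints a shell-safe form of it. The helper name `build_command` is hypothetical, and the argument order (URL, then word count) is taken from the examples in this README.

```python
import shlex
import sys


def build_command(url, word_count):
    """Return the argument list for one summarize.py run."""
    return [sys.executable, "summarize.py", url, str(word_count)]


cmd = build_command(
    "https://en.wikipedia.org/wiki/Python_(programming_language)", 300
)

# Shell-safe rendering of the arguments after the interpreter path.
print(shlex.join(cmd[1:]))

# To actually run it from the repo directory (left commented out here):
# import subprocess
# subprocess.run(cmd, check=True)
```

Wrapping the URL in single quotes on the command line has the same effect as the backslashes in most POSIX shells.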

Steve Jobs (350 words)

    python summarize.py https://en.wikipedia.org/wiki/Steve_Jobs 350

output-summary:

He worked closely with designer Jony Ive to develop a line of products that had larger cultural ramifications, beginning in 1997 with the "Think different" advertising campaign and leading to the iMac, iTunes, iTunes Store, Apple Store, iPod, iPhone, App Store, and the iPad. In 2001, the original Mac OS was replaced with a completely new Mac OS X , based on NeXT's NeXTSTEP platform, giving the OS a modern Unix-based foundation for the first time. 1931), grew up in Homs, Syria, and was born into an Arab Muslim household. While an undergraduate at the American University of Beirut, Lebanon, he was a student activist and spent time in prison for his political activities. He pursued a PhD at the University of Wisconsin, where he met Joanne Carole Schieble, a Catholic of Swiss and German descent. As a doctoral candidate, Jandali was a teaching assistant for a course Schieble was taking, although both were the same age. Mona Simpson, Jobs's biological sister, notes that her maternal grandparents were not happy that their daughter was dating a Muslim. Walter Isaacson, author of the Steve Jobs biography, additionally states that Schieble's father "threatened to cut Joanne off completely" if she continued the relationship. The location of the Los Altos home meant that Jobs would be able to attend nearby Homestead High School, which had strong ties to Silicon Valley. He began his first year there in late 1968 along with Bill Fernandez. Neither Jobs nor Fernandez came from engineering households and thus decided to enroll in John McCollum's "Electronics 1." McCollum and the rebellious Jobs would eventually clash and Jobs began to lose interest in the class.

output-keywords:

'brennan', 'apple', 'macintosh', 'disney', 'next', 'ipod', 'jandali', 'wozniak'

University of Pennsylvania (300 words)

    python summarize.py https://en.wikipedia.org/wiki/University_of_Pennsylvania 300

output-summary:

In 2019, the university had an endowment of $14.65 billion, the sixth-largest endowment of all colleges in the United States, as well as a research budget of $1.02 billion. The university's athletics program, the Quakers, fields varsity teams in 33 sports as a member of the NCAA Division I Ivy League conference. As of 2018, distinguished alumni include three U.S. Supreme Court justices, 32 U.S. senators, 46 U.S. governors, 163 members of the U.S. House of Representatives, eight signers of the Declaration of Independence, 12 signers of the U.S. Constitution, 24 members of the Continental Congress, 14 foreign heads of state, and two presidents of the United States, including the incumbent, Donald Trump. As of October 2019, 36 Nobel laureates, 80 members of the American Academy of Arts and Sciences, 64 billionaires, 29 Rhodes Scholars, 15 Marshall Scholars, and 16 Pulitzer Prize winners have been affiliated with the university. Penn has three claims to being the first university in the United States, according to university archives director Mark Frazier Lloyd: the 1765 founding of the first medical school in America made Penn the first institution to offer both "undergraduate" and professional education; the 1779 charter made it the first American institution of higher learning to take the name of "University"; and existing colleges were established as seminaries. Penn's educational innovations include the nation's first medical school in 1765; the first university teaching hospital in 1874; the Wharton School, the world's first collegiate business school, in 1881; the first American student union building, Houston Hall, in 1896; the country's second school of veterinary medicine; and the home of ENIAC, the world's first electronic, large-scale, general-purpose digital computer in 1946.

output-keywords:

'rugby', 'team', 'football', 'research', 'programs', 'founder', 'school', 'cricket', 'located', 'former'

Owner
Kevin Lai