Python wrapper for Wikipedia

Overview

Wikipedia API

Wikipedia-API is easy to use Python wrapper for Wikipedias' API. It supports extracting texts, sections, links, categories, translations, etc from Wikipedia. Documentation provides code snippets for the most common use cases.

build status Documentation Status Test Coverage Version Py Versions GitHub stars

Installation

This package requires at least Python 3.4 to install because it's using IntEnum.

pip3 install wikipedia-api

Usage

Goal of Wikipedia-API is to provide simple and easy to use API for retrieving informations from Wikipedia. Bellow are examples of common use cases.

Importing

import wikipediaapi

How To Get Single Page

Getting single page is straightforward. You have to initialize Wikipedia object and ask for page by its name. It's parameter language has be one of supported languages.

import wikipediaapi
    wiki_wiki = wikipediaapi.Wikipedia('en')

    page_py = wiki_wiki.page('Python_(programming_language)')

How To Check If Wiki Page Exists

For checking, whether page exists, you can use function exists.

page_py = wiki_wiki.page('Python_(programming_language)')
print("Page - Exists: %s" % page_py.exists())
# Page - Exists: True

page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
print("Page - Exists: %s" %     page_missing.exists())
# Page - Exists: False

How To Get Page Summary

Class WikipediaPage has property summary, which returns description of Wiki page.

import wikipediaapi
    wiki_wiki = wikipediaapi.Wikipedia('en')

    print("Page - Title: %s" % page_py.title)
    # Page - Title: Python (programming language)

    print("Page - Summary: %s" % page_py.summary[0:60])
    # Page - Summary: Python is a widely used high-level programming language for

How To Get Page URL

WikipediaPage has two properties with URL of the page. It is fullurl and canonicalurl.

print(page_py.fullurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)

print(page_py.canonicalurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)

How To Get Full Text

To get full text of Wikipedia page you should use property text which constructs text of the page as concatanation of summary and sections with their titles and texts.

wiki_wiki = wikipediaapi.Wikipedia(
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI
)

p_wiki = wiki_wiki.page("Test 1")
print(p_wiki.text)
# Summary
# Section 1
# Text of section 1
# Section 1.1
# Text of section 1.1
# ...


wiki_html = wikipediaapi.Wikipedia(
        language='en',
        extract_format=wikipediaapi.ExtractFormat.HTML
)
p_html = wiki_html.page("Test 1")
print(p_html.text)
# <p>Summary</p>
# <h2>Section 1</h2>
# <p>Text of section 1</p>
# <h3>Section 1.1</h3>
# <p>Text of section 1.1</p>
# ...

How To Get Page Sections

To get all top level sections of page, you have to use property sections. It returns list of WikipediaPageSection, so you have to use recursion to get all subsections.

def print_sections(sections, level=0):
        for s in sections:
                print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
                print_sections(s.sections, level + 1)


print_sections(page_py.sections)
# *: History - Python was conceived in the late 1980s,
# *: Features and philosophy - Python is a multi-paradigm programming l
# *: Syntax and semantics - Python is meant to be an easily readable
# **: Indentation - Python uses whitespace indentation, rath
# **: Statements and control flow - Python's statements include (among other
# **: Expressions - Some Python expressions are similar to l

How To Get Page In Other Languages

If you want to get other translations of given page, you should use property langlinks. It is map, where key is language code and value is WikipediaPage.

def print_langlinks(page):
        langlinks = page.langlinks
        for k in sorted(langlinks.keys()):
            v = langlinks[k]
            print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

print_langlinks(page_py)
# af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
# als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
# an: an - Python: https://an.wikipedia.org/wiki/Python
# ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
# as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8

page_py_cs = page_py.langlinks['cs']
print("Page - Summary: %s" % page_py_cs.summary[0:60])
# Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk

How To Get Links To Other Pages

If you want to get all links to other wiki pages from given page, you need to use property links. It's map, where key is page title and value is WikipediaPage.

def print_links(page):
        links = page.links
        for title in sorted(links.keys()):
            print("%s: %s" % (title, links[title]))

print_links(page_py)
# 3ds Max: 3ds Max (id: ??, ns: 0)
# ?:: ?: (id: ??, ns: 0)
# ABC (programming language): ABC (programming language) (id: ??, ns: 0)
# ALGOL 68: ALGOL 68 (id: ??, ns: 0)
# Abaqus: Abaqus (id: ??, ns: 0)
# ...

How To Get Page Categories

If you want to get all categories under which page belongs, you should use property categories. It's map, where key is category title and value is WikipediaPage.

def print_categories(page):
        categories = page.categories
        for title in sorted(categories.keys()):
            print("%s: %s" % (title, categories[title]))


print("Categories")
print_categories(page_py)
# Category:All articles containing potentially dated statements: ...
# Category:All articles with unsourced statements: ...
# Category:Articles containing potentially dated statements from August 2016: ...
# Category:Articles containing potentially dated statements from March 2017: ...
# Category:Articles containing potentially dated statements from September 2017: ...

How To Get All Pages From Category

To get all pages from given category, you should use property categorymembers. It returns all members of given category. You have to implement recursion and deduplication by yourself.

def print_categorymembers(categorymembers, level=0, max_level=1):
        for c in categorymembers.values():
            print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
            if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
                print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)


cat = wiki_wiki.page("Category:Physics")
print("Category members: Category:Physics")
print_categorymembers(cat.categorymembers)

# Category members: Category:Physics
# * Statistical mechanics (ns: 0)
# * Category:Physical quantities (ns: 14)
# ** Refractive index (ns: 0)
# ** Vapor quality (ns: 0)
# ** Electric susceptibility (ns: 0)
# ** Specific weight (ns: 0)
# ** Category:Viscosity (ns: 14)
# *** Brookfield Engineering (ns: 0)

How To See Underlying API Call

If you have problems with retrieving data you can get URL of undrerlying API call. This will help you determine if the problem is in the library or somewhere else.

import wikipediaapi
import sys
wikipediaapi.log.setLevel(level=wikipediaapi.logging.DEBUG)

# Set handler if you use Python in interactive mode
out_hdlr = wikipediaapi.logging.StreamHandler(sys.stderr)
out_hdlr.setFormatter(wikipediaapi.logging.Formatter('%(asctime)s %(message)s'))
out_hdlr.setLevel(wikipediaapi.logging.DEBUG)
wikipediaapi.log.addHandler(out_hdlr)

wiki = wikipediaapi.Wikipedia(language='en')

page_ostrava = wiki.page('Ostrava')
print(page_ostrava.summary)
# logger prints out: Request URL: http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Ostrava&explaintext=1&exsectionformat=wiki

External Links

Other Badges

Code Climate Issue Count Coveralls Version Py Versions implementations Downloads Tags github-release Github commits (since latest release) GitHub forks GitHub stars GitHub watchers GitHub commit activity the past week, 4 weeks, year Last commit GitHub code size in bytes GitHub repo size in bytes PyPi License PyPi Wheel PyPi Format PyPi PyVersions PyPi Implementations PyPi Status PyPi Downloads - Day PyPi Downloads - Week PyPi Downloads - Month Libraries.io - SourceRank Libraries.io - Dependent Repos

Other Pages

.. toctree::
        :maxdepth: 2

        API
        CHANGES
        DEVELOPMENT
        wikipediaapi/api

Owner
Martin Majlis
Martin Majlis
Bitcoin tracker hecho con python.

Bitcoin Tracker Precio del Bitcoin en tiempo real. Script simple hecho con python. Rollercoin RollerCoin es un juego en el que puedes ganar bitcoin (y

biyivi 3 Jan 04, 2022
A Python library for the Docker Engine API

Docker SDK for Python A Python library for the Docker Engine API. It lets you do anything the docker command does, but from within Python apps – run c

Docker 6.1k Jan 03, 2023
Modern, privacy-friendly, and detailed web analytics that works without cookies or JS.

Modern, privacy-friendly, and cookie-free web analytics. Getting started » Screenshots • Features • Office Hours Motivation There are a lot of web ana

R. Miles McCain 2.1k Jan 03, 2023
Yes, it's true :purple_heart: This repository has 353 stars.

Yes, it's true! Inspired by a similar repository from @RealPeha, but implemented using a webhook on AWS Lambda and API Gateway, so it's serverless! If

510 Dec 28, 2022
Most Powerful Chatbot On Telegram Bot

About Hello, I am Lycia [リュキア], An Intelligent ChatBot. If You Are Feeling Lonely, You can Always Come to me and Chat With Me! How To Host The easiest

RedAura 8 May 26, 2021
Web3 Ethereum DeFi toolkit for smart contracts, Uniswap and PancakeSwap trades, Ethereum JSON-RPC utilities, wallets and automated test suites.

Web3 Ethereum Defi This project contains common Ethereum smart contracts and utilities, for trading, wallets,automated test suites and backend integra

Trading Strategy 222 Jan 04, 2023
A bot to view Garfield comics directly from Discord and get updates of the comics automatically

Garfield-Bot A bot to view Garfield comics directly from Discord and get updates of the comics automatically. Instructions to use the bot: Invite the

Raghav Sharma 3 Feb 13, 2022
Yok bentar lagi update Premium :( DI FOLLOW YA GUYS

SIMBF + PREMIUM PRINTAH PENGINSTALAN ON TERMUX $ pkg update && upgrade $ termux-setup-storage $ pkg install python $ pkg install git $ pip install bs4

Jeeck 21 Jan 14, 2022
A simple Telegram bot that can add caption to any media on your channel

Channel Auto Caption This bot can add a caption for any media/document sent to a channel. Just deploy bot and add bot as admin to a channel. Deploy to

22 Nov 14, 2022
Discord.py Bot Series With Python

Discord.py Bot Series YouTube Playlist: https://www.youtube.com/playlist?list=PL9nZZVP3OGOAx2S75YdBkrIbVpiSL5oc5 Installation pip install -r requireme

Step 1 Dec 17, 2021
an API to check if a url or IP address is safe or phishing

an API to check if a url or IP address is safe or phishing. Using a ML model. The API created using FastAPI.

Adel Dahani 1 Feb 16, 2022
Telegram bot using python

Telegram bot using python

Masha Kubyshina 1 Oct 11, 2021
8300-account-nuker - A simple accoutn nuker or can use it full closing dm and leaving server

8300 ACCOUNT NUKER VERISON: its just simple accoutn nuker or can use it full clo

†† 5 Jan 26, 2022
Discord bot to administer IITD Study Servers (unofficial)

IITD-Bot Discord bot to administer IITD'20 Acad Server Commands hello to check if bot is online ?help to display this message ?set kerberos to set y

Aditya Singh 47 Dec 19, 2022
(unofficial) Googletrans: Free and Unlimited Google translate API for Python. Translates totally free of charge.

Googletrans Googletrans is a free and unlimited python library that implemented Google Translate API. This uses the Google Translate Ajax API to make

Suhun Han 3.2k Jan 04, 2023
A Matrix-Instagram DM puppeting bridge

mautrix-instagram A Matrix-Instagram DM puppeting bridge. Documentation All setup and usage instructions are located on docs.mau.fi. Some quick links:

89 Dec 14, 2022
Demonstrate how GitHub OIDC token getting should be included in boto3

boto3 should add direct support for AssumeRoleWithWebIdentity for GitHub Actions There is a aws-actions/configure-aws-credentials action that will get

Ben Kehoe 11 Aug 29, 2022
TORNADO CASH Proxy Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX)

TORNADO CASH Pancakeswap Sniper BOT 2022-V1 (MAC WINDOWS ANDROID LINUX) ⭐️ A ful

Crypto Trader 1 Jan 06, 2022
twtxt is a decentralised, minimalist microblogging service for hackers.

twtxt twtxt is a decentralised, minimalist microblogging service for hackers. So you want to get some thoughts out on the internet in a convenient and

buckket 1.8k Jan 09, 2023
Elkeid HUB - A rule/event processing engine maintained by the Elkeid Team that supports streaming/offline data processing

Elkeid HUB - A rule/event processing engine maintained by the Elkeid Team that supports streaming/offline data processing

Bytedance Inc. 61 Dec 29, 2022