
Overview


Python based Wikidata framework for easy dataframe extraction

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information. The goal is to create an intuitive interface so that Wikidata can function as a common read-write repository for public statistics.


Installation

wikirepo can be downloaded from PyPI via pip or sourced directly from this repository:

pip install wikirepo
git clone https://github.com/andrewtavis/wikirepo.git
cd wikirepo
python setup.py install
import wikirepo

Data

wikirepo's data structure is built around Wikidata.org. Human-readable access to Wikidata statistics is achieved by converting requests into Wikidata's item IDs (QIDs) and property IDs (PIDs), with the Python package wikidata serving as the basis for data loading and indexing. See the documentation for a structured overview of the currently available properties.
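
For orientation, the following is a minimal sketch of the kind of lookup this involves, using the wikidata package directly rather than wikirepo's own wrappers (QID Q183 is Germany; PID P1082 is population; the claim fields follow Wikidata's standard JSON layout):

from wikidata.client import Client

client = Client()  # reads from wikidata.org

# "Germany" corresponds to QID Q183
entity = client.get("Q183", load=True)
print(entity.label)  # Germany

# Statements for PID P1082 (population); each carries its value plus
# qualifiers such as P585 (point in time)
population_claims = entity.data["claims"]["P1082"]
amount = population_claims[0]["mainsnak"]["datavalue"]["value"]["amount"]
print(amount)  # a signed string, e.g. '+83149300'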

Query Data

wikirepo's main access function, wikirepo.data.query, returns a pandas.DataFrame of locations and property data across time.

Each query needs the following inputs:

  • locations: the locations that data should be queried for
    • Strings are accepted for Earth, continents, and countries
    • Get all country names with wikirepo.data.incl_lctn_lbls(lctn_lvls='country')
    • The user can also pass Wikidata QIDs directly
  • depth: the geographic level of the given locations to query
    • A depth of 0 is the locations themselves
    • Greater depths correspond to lower geographic levels (states of countries, etc.)
    • A dictionary of locations is generated for lower depths (see second example below)
  • timespan: start and end datetime.date objects defining when data should come from
    • If not provided, then the most recent data will be retrieved and annotated with the date it comes from
  • interval: yearly, monthly, weekly, or daily as strings
  • Further arguments: the names of modules in wikirepo/data directories
    • These are passed to arguments corresponding to their directories
    • Data will be queried for these properties for the given locations, depth, timespan and interval, with results being merged as dataframe columns

Queries can also access information on Wikidata sub-pages for locations. For example, if the inflation rate is not found on a location's main page, wikirepo then checks the location's economic topic page, as inflation_rate.py is found in wikirepo/data/economic (see Germany and economy of Germany).

wikirepo further provides a unique dictionary class, EntitiesDict, that stores all loaded Wikidata entities during a query. This speeds up data retrieval, as entities are loaded once and then accessed in the EntitiesDict object for any other needed properties.
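
As a sketch of this pattern (assuming, as in the examples below, that any unneeded *_props arguments can be left as None or omitted), the same EntitiesDict can serve consecutive queries:

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
timespan = (date(2009, 1, 1), date(2010, 1, 1))

# The first query loads the Germany entity into ents_dict
df_pop = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=["Germany"],
    depth=0,
    timespan=timespan,
    interval="yearly",
    demographic_props="population",
)

# The second query reads the cached entity rather than re-requesting it
df_area = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=["Germany"],
    depth=0,
    timespan=timespan,
    interval="yearly",
    geographic_props="area",
)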

Examples of wikirepo.data.query follow:

Querying Information for Given Countries

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
# Strings must match their Wikidata English page names
countries = ["Germany", "United States of America", "People's Republic of China"]
# countries = ["Q183", "Q30", "Q148"] # we could also pass QIDs
# data.incl_lctn_lbls(lctn_lvls='country') # or all countries
depth = 0
timespan = (date(2009, 1, 1), date(2010, 1, 1))
interval = "yearly"

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=countries,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props=["population", "life_expectancy"],
    economic_props="median_income",
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props=None,
    institutional_props="human_dev_idx",
    political_props="executive",
    misc_props=None,
    verbose=True,
)

col_order = [
    "location",
    "qid",
    "year",
    "executive",
    "population",
    "life_exp",
    "human_dev_idx",
    "median_income",
]
df = df[col_order]

df.head(6)
| location | qid | year | executive | population | life_exp | human_dev_idx | median_income |
|:--|:--|:--|:--|:--|:--|:--|:--|
| Germany | Q183 | 2010 | Angela Merkel | 8.1752e+07 | 79.9878 | 0.921 | 33333 |
| Germany | Q183 | 2009 | Angela Merkel | nan | 79.8366 | 0.917 | nan |
| United States of America | Q30 | 2010 | Barack Obama | 3.08746e+08 | 78.5415 | 0.914 | 43585 |
| United States of America | Q30 | 2009 | George W. Bush | nan | 78.3902 | 0.91 | nan |
| People's Republic of China | Q148 | 2010 | Wen Jiabao | 1.35976e+09 | 75.236 | 0.706 | nan |
| People's Republic of China | Q148 | 2009 | Wen Jiabao | nan | 75.032 | 0.694 | nan |

Querying Information for all US Counties

# Note: >3000 regions, expect a 45 minute runtime
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
country = "United States of America"
# country = "Q30" # we could also pass its QID
depth = 2  # 2 for counties, 1 for states and territories
sub_lctns = True  # for all
# Only valid sub-locations given the timespan will be queried
timespan = (date(2016, 1, 1), date(2018, 1, 1))
interval = "yearly"

us_counties_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
    verbose=True,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=us_counties_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props="population",
    economic_props=None,
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props="area",
    institutional_props="capital",
    political_props=None,
    misc_props=None,
    verbose=True,
)

df[df["population"].notnull()].head(6)
| location | sub_lctn | sub_sub_lctn | qid | year | population | area_km2 | capital |
|:--|:--|:--|:--|:--|:--|:--|:--|
| United States of America | California | Alameda County | Q107146 | 2018 | 1.6602e+06 | 2127 | Oakland |
| United States of America | California | Contra Costa County | Q108058 | 2018 | 1.14936e+06 | 2078 | Martinez |
| United States of America | California | Marin County | Q108117 | 2018 | 263886 | 2145 | San Rafael |
| United States of America | California | Napa County | Q108137 | 2018 | 141294 | 2042 | Napa |
| United States of America | California | San Mateo County | Q108101 | 2018 | 774155 | 1919 | Redwood City |
| United States of America | California | Santa Clara County | Q110739 | 2018 | 1.9566e+06 | 3377 | San Jose |

Upload Data (WIP)

wikirepo.data.upload will be the core of the eventual wikirepo upload feature. The goal is to record edits that a user makes to a previously queried or baseline dataframe such that these changes can then be pushed back to Wikidata. With the addition of Wikidata login credentials as a wikirepo feature (WIP), the unique information in the edited dataframe could then be uploaded to Wikidata for all to use.

The same process used to query information from Wikidata could be reversed for uploads. Dataframe columns could be linked to their corresponding Wikidata properties; whether a value's time qualifier is a point in time or a span using start time and end time could be derived from the variables defined in the module header; and other qualifiers needed for proper data indexing could also be included. Source information could further be added in columns corresponding to the given property edits.
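
As a hypothetical sketch of that mapping (row_to_claim and the output layout are illustrative assumptions, not wikirepo code; P1082, P585, P580, and P582 are the actual Wikidata PIDs for population, point in time, start time, and end time):

def row_to_claim(qid, pid, value, date_str):
    """Build a Wikidata-style statement for a point-in-time quantity."""
    return {
        "entity": qid,  # e.g. "Q183" for Germany
        "claims": {
            pid: {  # e.g. "P1082" for population
                "mainsnak": {
                    "snaktype": "value",
                    "property": pid,
                    "datavalue": {
                        "value": {"amount": f"+{value}", "unit": "1"},
                        "type": "quantity",
                    },
                },
                # P585 (point in time) for single dates; spans would
                # instead use P580/P582 (start time/end time)
                "qualifiers": {
                    "P585": [{"datavalue": {"value": {"time": date_str}}}]
                },
            }
        },
    }

claim = row_to_claim("Q183", "P1082", 81752000, "+2010-01-01T00:00:00Z")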

Pseudocode for how this process could function follows:

In the first example, changes are made to a df.copy() of a queried dataframe. pandas is then used to compare the edited and original dataframes once the user has added the information they have access to.

import pandas as pd

import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login credentials

ents_dict = wd_utils.EntitiesDict()
country = "Country Name"
depth = 2
sub_lctns = True
timespan = (date(2000, 1, 1), date(2018, 1, 1))
interval = "yearly"

lctns_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=lctns_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
)
df_copy = df.copy()

# The user checks for NaNs and adds data to df_copy

# Rows that differ between the original and the edited dataframe
df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

wikirepo.data.upload(df_edits, credentials)

In the next example, data.data_utils.gen_base_df is used to create a dataframe whose dimensions match a time series that the user has access to. The data is then added to the column that corresponds to the property it belongs to. Source information could further be added via a structured dictionary generated for the user.

import wikirepo
from wikirepo.data import data_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login credentials

locations = "Country Name"
depth = 0
# The user defines the time parameters based on their data
timespan = (date(1995, 1, 2), date(2010, 1, 2))  # (first Monday, last Sunday)
interval = "weekly"

base_df = data_utils.gen_base_df(
    locations=locations, depth=depth, timespan=timespan, interval=interval
)
# data_for_matching_time_series is the user's time series, with one
# value per row of base_df
base_df["data"] = data_for_matching_time_series

source_data = wd_utils.gen_source_dict("Source Information")
base_df["data_source"] = [source_data] * len(base_df)

wikirepo.data.upload(base_df, credentials)

Put simply: a full-featured wikirepo.data.upload function would realize the potential of a single read-write repository for all public information.

Maps (WIP)

wikirepo/maps is a further goal of the project, combining wikirepo's focus on easily accessed open-source data with quick, high-level analytics.

Query Maps

As in wikirepo.data.query, passing the locations, depth, timespan and interval arguments could access GeoJSON files stored on Wikidata, providing mapping files in parallel to the user's data. These files could then be used with existing Python plotting libraries for detailed presentations of geographic analysis.
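
A sketch of how such files could be used once available (wikirepo.maps.query and the us_counties.geojson file are assumptions; geopandas stands in for the plotting libraries meant above):

import geopandas as gpd
import pandas as pd

# Hypothetical GeoJSON of US counties keyed by QID, as maps.query could return
gdf = gpd.read_file("us_counties.geojson")

# Statistics as returned by wikirepo.data.query
df = pd.DataFrame({"qid": ["Q107146"], "population": [1.6602e6]})

# Join geometries to statistics and plot a choropleth
merged = gdf.merge(df, on="qid")
merged.plot(column="population", legend=True)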

Upload Maps

Similar to the potential of adding statistics through wikirepo.data.upload, GeoJSON map files could also be uploaded to Wikidata using appropriate arguments. Given locations, depth, timespan, and interval information, a myriad of maps could be derived, allowing all wikirepo users to get the exact mapping file they need for a given task.

Examples

wikirepo can be used as a foundation for countless projects, with its usefulness and practicality only improving as more properties are added and more data is uploaded to Wikidata.

Current usage examples include:

  • Sample notebooks for the Python package poli-sci-kit show how to use wikirepo as a basis for political election and parliamentary appointment analysis; these notebooks can be found in the poli-sci-kit examples or on Google Colab
  • Pull requests with other examples will gladly be accepted

To-Do

Please see the contribution guidelines if you are interested in contributing to this project. Work that is in progress or could be implemented includes:

Expanding wikirepo

  • Creating an outline of the package's structure for the readme (see issue)

  • Integrating current Python tools with wikirepo structures for uploads to Wikidata

  • Adding a query of property descriptions to data.data_utils.incl_dir_idxs (see issue)

  • Adding multiprocessing support to the wikirepo.data.query process and data.lctn_utils.gen_lctns_dict

  • Potentially converting wikirepo.data.query and data.lctn_utils.gen_lctns_dict over to generated Wikidata SPARQL queries

  • Optimizing wikirepo.data.query:

    • Potentially converting EntitiesDict and LocationsDict to slotted object classes for memory savings
    • Deriving and optimizing other slow parts of the query process
  • Adding access to GeoJSON files for mapping via wikirepo.maps.query

  • Designing and adding GeoJSON files indexed by time properties to Wikidata

  • Creating, improving and sharing examples

  • Improving tests for greater code coverage

  • Improving code quality by refactoring large functions and checking conventions

Expanding Wikidata

The growth of wikirepo's database relies on that of Wikidata. Through data.wd_utils.dir_to_topic_page, wikirepo can access properties on location sub-pages, allowing statistics on any topic to be linked to a location (see the sketch after this list). Beyond including entries for already existing properties (see this issue), the following are examples of property types that could be added:

  • Climate statistics could be added to data/climate

    • This would allow for easy modeling of global warming and its effects
    • Planning would be needed to determine whether lower intervals are necessary or whether daily averages would suffice
  • Properties for electoral polling and electoral results for locations

    • This would allow direct access to all needed election information in a single function call
  • A property that links political parties and their regions in data/political

    • For easy professional presentation of electoral results (e.g. loading in party hex colors, abbreviations, and alignments)
  • data/demographic properties such as:

    • age, education, religious, and linguistic diversities across time
  • data/economic properties such as:

    • female workforce participation, workforce industry diversity, wealth diversity, and total working age population across time
  • Distinct properties for Freedom House and Press Freedom indexes, as well as other descriptive metrics
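
Regarding the sub-page access mentioned before the list, the following is an illustrative guess at the directory-to-topic mapping idea behind data.wd_utils.dir_to_topic_page, not its actual implementation:

# Illustrative only: mapping a wikirepo/data directory to the label of a
# location's Wikidata topic page (compare Germany and economy of Germany)
DIR_TO_TOPIC = {
    "economic": "economy",
    "demographic": "demographics",
}

def topic_page_label(data_dir, location_label):
    """'economic' + 'Germany' -> 'economy of Germany'"""
    return f"{DIR_TO_TOPIC[data_dir]} of {location_label}"

print(topic_page_label("economic", "Germany"))  # economy of Germany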

Similar Projects

Python

JavaScript

Java

Powered By


Wikimedia · Wikibase · Wikidata
Comments
  • Create concise requirement and env files

    This issue is for creating concise versions of requirements.txt and environment.yml for wikirepo. It would be great if these files were created by hand with specific version numbers, or generated in a way such that sub-dependencies don't always need to be updated.

    As of now both files are being created with the following commands in the package's conda virtual environment:

    pip list --format=freeze > requirements.txt  
    conda env export --no-builds | grep -v "^prefix: " > environment.yml
    

    wikirepo and other obviously unneeded packages are then removed from these files before being uploaded.

    Any insights or help would be much appreciated!

    help wanted good first issue question 
    opened by andrewtavis 7
  • Remove unused packages in requirements

    Hello, this is a follow-up to issue https://github.com/andrewtavis/wikirepo/issues/17.

    Please review~

    Also, about setup.py: is there a purpose for using graphing packages such as matplotlib and seaborn?

    opened by kination 2
  • Bump aiohttp from 3.7.3 to 3.7.4

    Bumps aiohttp from 3.7.3 to 3.7.4.

    Changelog

    Sourced from aiohttp's changelog.

    3.7.4 (2021-02-25)

    Bugfixes

    • (SECURITY BUG) Started preventing open redirects in the aiohttp.web.normalize_path_middleware middleware. For more details, see https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg.

      Thanks to Beast Glatisant (https://github.com/g147) for finding the first instance of this issue and Jelmer Vernooij (https://jelmer.uk/) for reporting and tracking it down in aiohttp. (#5497)

    • Fixed a difference in how the pure-Python and the Cython-based HTTP parsers construct a yarl.URL object for the HTTP request-target.

      Before this fix, the Python parser would turn the URI's absolute-path for //some-path into /, while the Cython code preserved it as //some-path. Now both do the latter. (#5498)


    Commits
    • 0a26acc Bump aiohttp to v3.7.4 for a security release
    • 021c416 Merge branch 'ghsa-v6wp-4m6f-gcjg' into master
    • 4ed7c25 Bump chardet from 3.0.4 to 4.0.0 (#5333)
    • b61f0fd Fix how pure-Python HTTP parser interprets //
    • 5c1efbc Bump pre-commit from 2.9.2 to 2.9.3 (#5322)
    • 0075075 Bump pygments from 2.7.2 to 2.7.3 (#5318)
    • 5085173 Bump multidict from 5.0.2 to 5.1.0 (#5308)
    • 5d1a75e Bump pre-commit from 2.9.0 to 2.9.2 (#5290)
    • 6724d0e Bump pre-commit from 2.8.2 to 2.9.0 (#5273)
    • c688451 Removed duplicate timeout parameter in ClientSession reference docs. (#5262) ...
    • See full diff in compare view


    dependencies 
    opened by dependabot[bot] 1
  • Bump lxml from 4.6.2 to 4.6.3

    Bumps lxml from 4.6.2 to 4.6.3.

    Changelog

    Sourced from lxml's changelog.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    dependencies 
    opened by dependabot[bot] 1
  • [ImgBot] Optimize images

    Beep boop. Your images are optimized!

    Your image file size has been reduced by 45% 🎉

    Details

    | File | Before | After | Percent reduction |
    |:--|:--|:--|:--|
    | /resources/wikirepo_logo_transparent.png | 171.28kb | 76.11kb | 55.56% |
    | /resources/gh_images/wikidata_logo.png | 26.59kb | 16.87kb | 36.56% |
    | /resources/wikirepo_logo.png | 150.90kb | 96.30kb | 36.18% |
    | /resources/gh_images/wikibase_logo.png | 20.41kb | 14.64kb | 28.30% |
    | Total: | 369.18kb | 203.92kb | 44.76% |



    opened by imgbot[bot] 1
  • Create package structure outline

    wikirepo as a project has many modules that interconnect and are funneled into two functions: wikirepo.data.query and lctn_utils.gen_lctns_dict. It would be helpful for users and potential contributors to have a visual representation of the package that details its overarching structure and the purpose of its various components. This outline could then be added to the readme in the To-Do section, potentially in a dropdown.

    An initial test of this could be as simple as a directory outline that has a bit more detail about the given components - say by using *, **, †, ‡ and other symbols to indicate where a description could be found.

    A discussion of how to best present the package structure is more than welcome, and contributions would further be very appreciated!

    documentation good first issue question 
    opened by andrewtavis 0
  • Suggest properties for wikirepo

    Please use this issue to suggest Wikidata properties that could be added to wikirepo. With the suggestion it would be great to get the following:

    • The link to the property page on Wikidata
    • A suggestion of which category (demographic, economic, etc.) the property should go into
    • [Optional] how the query script should be written (see examples/add_property to make suggestions for how the module header should be structured)

    Accepted property suggestions would then be converted to good first issues for wikirepo. Pull requests with new properties following the process in examples/add_property would also gladly be accepted! Documentation for such issues or PRs could also be included, or handled as a separate issue.

    Thanks for your interest in supporting this project :)

    good first issue question 
    opened by andrewtavis 2
  • Add descriptions to data.data_utils.incl_dir_idxs

    The function data.data_utils.incl_dir_idxs is how a user finds which indexes are available for a given type of data (demographic, economic, etc.). It would be great if data.data_utils.incl_dir_idxs had an option to also provide a description for each index. These descriptions could be queried directly from Wikidata.
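
    One way the descriptions could be pulled is via the public wbgetentities API (a sketch, not current wikirepo code; P1082 is the population property):

    import requests

    def prop_description(pid, lang="en"):
        """Fetch a Wikidata property's description, e.g. for P1082."""
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbgetentities",
                "ids": pid,
                "props": "descriptions",
                "languages": lang,
                "format": "json",
            },
        )
        return resp.json()["entities"][pid]["descriptions"][lang]["value"]

    print(prop_description("P1082"))  # the description of "population"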

    enhancement good first issue 
    opened by andrewtavis 0
Releases (v1.0.0)
  • v1.0.0 (Dec 28, 2021)

  • v0.1.1.5 (Mar 28, 2021)

    Changes include:

    • An src structure has been adopted for easier testing and to fix wheel distribution issues
    • Code quality is now checked with Codacy
    • Extensive code formatting to improve quality and style
    • Fixes to vulnerabilities through exception use
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0 (Feb 23, 2021)

    First stable release of wikirepo

    Changes include:

    • Full documentation of the package

    • Virtual environment files

    • Bug fixes

    • Extensive testing of all modules with GH Actions and Codecov

    • Code of conduct and contribution guidelines

    Source code(tar.gz)
    Source code(zip)
  • v0.0.2 (Dec 8, 2020)

    The minimum viable product of wikirepo:

    • Users are able to query data from Wikidata given locations, depth, time_lvl, and timespan arguments

    • String arguments are accepted for Earth, continents, countries and disputed territories

    • Data for greater depths can be retrieved by creating a dictionary given initial starting locations and going to greater depths using the contains administrative territorial entity property

    • Data is formatted and loaded into a pandas dataframe for further manipulation

    • All available social science properties on Wikidata have had modules created for them

    • Estimated load times and progress are given

    • The project's scope and general roadmap have been defined and detailed in the README

    Source code(tar.gz)
    Source code(zip)