
Overview


Python based Wikidata framework for easy dataframe extraction

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information. The goal is to create an intuitive interface so that Wikidata can function as a common read-write repository for public statistics.


Installation

wikirepo can be downloaded from PyPI via pip or sourced directly from this repository:

pip install wikirepo
git clone https://github.com/andrewtavis/wikirepo.git
cd wikirepo
python setup.py install
import wikirepo

Data

wikirepo's data structure is built around Wikidata.org. Human-readable access to Wikidata statistics is achieved by converting requests into Wikidata's item IDs (QIDs) and property IDs (PIDs), with the Python package wikidata serving as the basis for data loading and indexing. See the documentation for a structured overview of the currently available properties.
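
For orientation, the following is a minimal sketch of the kind of lookup this involves, using the wikidata package directly rather than wikirepo's own wrappers (QID Q183 is Germany; PID P1082 is population; the claim fields follow Wikidata's standard JSON layout):

from wikidata.client import Client

client = Client()  # reads from wikidata.org

# "Germany" corresponds to QID Q183
entity = client.get("Q183", load=True)
print(entity.label)  # Germany

# Statements for PID P1082 (population); each carries its value plus
# qualifiers such as P585 (point in time)
population_claims = entity.data["claims"]["P1082"]
amount = population_claims[0]["mainsnak"]["datavalue"]["value"]["amount"]
print(amount)  # a signed string, e.g. '+83149300'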

Query Data

wikirepo's main access function, wikirepo.data.query, returns a pandas.DataFrame of locations and property data across time.

Each query needs the following inputs:

  • locations: the locations that data should be queried for
    • Strings are accepted for Earth, continents, and countries
    • Get all country names with wikirepo.data.incl_lctn_lbls(lctn_lvls='country')
    • The user can also pass Wikidata QIDs directly
  • depth: the geographic level of the given locations to query
    • A depth of 0 is the locations themselves
    • Greater depths correspond to lower geographic levels (states of countries, etc.)
    • A dictionary of locations is generated for lower depths (see second example below)
  • timespan: start and end datetime.date objects defining when data should come from
    • If not provided, then the most recent data will be retrieved and annotated with the date it comes from
  • interval: yearly, monthly, weekly, or daily as strings
  • Further arguments: the names of modules in wikirepo/data directories
    • These are passed to arguments corresponding to their directories
    • Data will be queried for these properties for the given locations, depth, timespan and interval, with results being merged as dataframe columns

Queries can also access information on Wikidata sub-pages for locations. For example, if the inflation rate is not found on a location's main page, wikirepo then checks the location's economic topic page, as inflation_rate.py is found in wikirepo/data/economic (see Germany and economy of Germany).

wikirepo further provides a unique dictionary class, EntitiesDict, that stores all loaded Wikidata entities during a query. This speeds up data retrieval, as entities are loaded once and then accessed in the EntitiesDict object for any other needed properties.
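
As a sketch of this pattern (assuming, as in the examples below, that any unneeded *_props arguments can be left as None or omitted), the same EntitiesDict can serve consecutive queries:

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
timespan = (date(2009, 1, 1), date(2010, 1, 1))

# The first query loads the Germany entity into ents_dict
df_pop = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=["Germany"],
    depth=0,
    timespan=timespan,
    interval="yearly",
    demographic_props="population",
)

# The second query reads the cached entity rather than re-requesting it
df_area = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=["Germany"],
    depth=0,
    timespan=timespan,
    interval="yearly",
    geographic_props="area",
)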

Examples of wikirepo.data.query follow:

Querying Information for Given Countries

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
# Strings must match their Wikidata English page names
countries = ["Germany", "United States of America", "People's Republic of China"]
# countries = ["Q183", "Q30", "Q148"] # we could also pass QIDs
# data.incl_lctn_lbls(lctn_lvls='country') # or all countries
depth = 0
timespan = (date(2009, 1, 1), date(2010, 1, 1))
interval = "yearly"

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=countries,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props=["population", "life_expectancy"],
    economic_props="median_income",
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props=None,
    institutional_props="human_dev_idx",
    political_props="executive",
    misc_props=None,
    verbose=True,
)

col_order = [
    "location",
    "qid",
    "year",
    "executive",
    "population",
    "life_exp",
    "human_dev_idx",
    "median_income",
]
df = df[col_order]

df.head(6)
| location | qid | year | executive | population | life_exp | human_dev_idx | median_income |
|:--|:--|:--|:--|:--|:--|:--|:--|
| Germany | Q183 | 2010 | Angela Merkel | 8.1752e+07 | 79.9878 | 0.921 | 33333 |
| Germany | Q183 | 2009 | Angela Merkel | nan | 79.8366 | 0.917 | nan |
| United States of America | Q30 | 2010 | Barack Obama | 3.08746e+08 | 78.5415 | 0.914 | 43585 |
| United States of America | Q30 | 2009 | George W. Bush | nan | 78.3902 | 0.91 | nan |
| People's Republic of China | Q148 | 2010 | Wen Jiabao | 1.35976e+09 | 75.236 | 0.706 | nan |
| People's Republic of China | Q148 | 2009 | Wen Jiabao | nan | 75.032 | 0.694 | nan |

Querying Information for all US Counties

# Note: >3000 regions, expect a 45 minute runtime
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
country = "United States of America"
# country = "Q30" # we could also pass its QID
depth = 2  # 2 for counties, 1 for states and territories
sub_lctns = True  # for all
# Only valid sub-locations given the timespan will be queried
timespan = (date(2016, 1, 1), date(2018, 1, 1))
interval = "yearly"

us_counties_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
    verbose=True,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=us_counties_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props="population",
    economic_props=None,
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props="area",
    institutional_props="capital",
    political_props=None,
    misc_props=None,
    verbose=True,
)

df[df["population"].notnull()].head(6)
| location | sub_lctn | sub_sub_lctn | qid | year | population | area_km2 | capital |
|:--|:--|:--|:--|:--|:--|:--|:--|
| United States of America | California | Alameda County | Q107146 | 2018 | 1.6602e+06 | 2127 | Oakland |
| United States of America | California | Contra Costa County | Q108058 | 2018 | 1.14936e+06 | 2078 | Martinez |
| United States of America | California | Marin County | Q108117 | 2018 | 263886 | 2145 | San Rafael |
| United States of America | California | Napa County | Q108137 | 2018 | 141294 | 2042 | Napa |
| United States of America | California | San Mateo County | Q108101 | 2018 | 774155 | 1919 | Redwood City |
| United States of America | California | Santa Clara County | Q110739 | 2018 | 1.9566e+06 | 3377 | San Jose |

Upload Data (WIP)

wikirepo.data.upload will be the core of the eventual wikirepo upload feature. The goal is to record edits that a user makes to a previously queried or baseline dataframe such that these changes can then be pushed back to Wikidata. With the addition of Wikidata login credentials as a wikirepo feature (WIP), the unique information in the edited dataframe could then be uploaded to Wikidata for all to use.

The same process used to query information from Wikidata could be reversed for uploads. Dataframe columns could be linked to their corresponding Wikidata properties; whether a value's time qualifier is a point in time or a span using start time and end time could be derived from the variables defined in the module header; and other qualifiers needed for proper data indexing could also be included. Source information could further be added in columns corresponding to the given property edits.
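
As a hypothetical sketch of that mapping (row_to_claim and the output layout are illustrative assumptions, not wikirepo code; P1082, P585, P580, and P582 are the actual Wikidata PIDs for population, point in time, start time, and end time):

def row_to_claim(qid, pid, value, date_str):
    """Build a Wikidata-style statement for a point-in-time quantity."""
    return {
        "entity": qid,  # e.g. "Q183" for Germany
        "claims": {
            pid: {  # e.g. "P1082" for population
                "mainsnak": {
                    "snaktype": "value",
                    "property": pid,
                    "datavalue": {
                        "value": {"amount": f"+{value}", "unit": "1"},
                        "type": "quantity",
                    },
                },
                # P585 (point in time) for single dates; spans would
                # instead use P580/P582 (start time/end time)
                "qualifiers": {
                    "P585": [{"datavalue": {"value": {"time": date_str}}}]
                },
            }
        },
    }

claim = row_to_claim("Q183", "P1082", 81752000, "+2010-01-01T00:00:00Z")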

Pseudocode for how this process could function follows:

In the first example, changes are made to a df.copy() of a queried dataframe. pandas is then used to compare the edited and original dataframes once the user has added the information they have access to.

import pandas as pd

import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login credentials

ents_dict = wd_utils.EntitiesDict()
country = "Country Name"
depth = 2
sub_lctns = True
timespan = (date(2000, 1, 1), date(2018, 1, 1))
interval = "yearly"

lctns_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=lctns_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
)
df_copy = df.copy()

# The user checks for NaNs and adds data to df_copy

# Rows that differ between the original and the edited dataframe
df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

wikirepo.data.upload(df_edits, credentials)

In the next example, data.data_utils.gen_base_df is used to create a dataframe whose dimensions match a time series that the user has access to. The data is then added to the column that corresponds to the property it belongs to. Source information could further be added via a structured dictionary generated for the user.

import wikirepo
from wikirepo.data import data_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login credentials

locations = "Country Name"
depth = 0
# The user defines the time parameters based on their data
timespan = (date(1995, 1, 2), date(2010, 1, 2))  # (first Monday, last Sunday)
interval = "weekly"

base_df = data_utils.gen_base_df(
    locations=locations, depth=depth, timespan=timespan, interval=interval
)
# data_for_matching_time_series is the user's time series, with one
# value per row of base_df
base_df["data"] = data_for_matching_time_series

source_data = wd_utils.gen_source_dict("Source Information")
base_df["data_source"] = [source_data] * len(base_df)

wikirepo.data.upload(base_df, credentials)

Put simply: a full-featured wikirepo.data.upload function would realize the potential of a single read-write repository for all public information.

Maps (WIP)

wikirepo/maps is a further goal of the project, combining wikirepo's focus on easily accessed open-source data with quick, high-level analytics.

Query Maps

As in wikirepo.data.query, passing the locations, depth, timespan and interval arguments could access GeoJSON files stored on Wikidata, providing mapping files in parallel to the user's data. These files could then be used with existing Python plotting libraries for detailed presentations of geographic analysis.
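
A sketch of how such files could be used once available (wikirepo.maps.query and the us_counties.geojson file are assumptions; geopandas stands in for the plotting libraries meant above):

import geopandas as gpd
import pandas as pd

# Hypothetical GeoJSON of US counties keyed by QID, as maps.query could return
gdf = gpd.read_file("us_counties.geojson")

# Statistics as returned by wikirepo.data.query
df = pd.DataFrame({"qid": ["Q107146"], "population": [1.6602e6]})

# Join geometries to statistics and plot a choropleth
merged = gdf.merge(df, on="qid")
merged.plot(column="population", legend=True)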

Upload Maps

Similar to the potential of adding statistics through wikirepo.data.upload, GeoJSON map files could also be uploaded to Wikidata using appropriate arguments. Given locations, depth, timespan, and interval information, a myriad of maps could be derived, allowing all wikirepo users to get the exact mapping file they need for a given task.

Examples

wikirepo can be used as a foundation for countless projects, with its usefulness and practicality only improving as more properties are added and more data is uploaded to Wikidata.

Current usage examples include:

  • Sample notebooks for the Python package poli-sci-kit show how to use wikirepo as a basis for political election and parliamentary appointment analysis; these notebooks can be found in the poli-sci-kit examples or on Google Colab
  • Pull requests with other examples will gladly be accepted

To-Do

Please see the contribution guidelines if you are interested in contributing to this project. Work that is in progress or could be implemented includes:

Expanding wikirepo

  • Creating an outline of the package's structure for the readme (see issue)

  • Integrating current Python tools with wikirepo structures for uploads to Wikidata

  • Adding a query of property descriptions to data.data_utils.incl_dir_idxs (see issue)

  • Adding multiprocessing support to the wikirepo.data.query process and data.lctn_utils.gen_lctns_dict

  • Potentially converting wikirepo.data.query and data.lctn_utils.gen_lctns_dict over to generated Wikidata SPARQL queries

  • Optimizing wikirepo.data.query:

    • Potentially converting EntitiesDict and LocationsDict to slotted object classes for memory savings
    • Deriving and optimizing other slow parts of the query process
  • Adding access to GeoJSON files for mapping via wikirepo.maps.query

  • Designing and adding GeoJSON files indexed by time properties to Wikidata

  • Creating, improving and sharing examples

  • Improving tests for greater code coverage

  • Improving code quality by refactoring large functions and checking conventions

Expanding Wikidata

The growth of wikirepo's database relies on that of Wikidata. Through data.wd_utils.dir_to_topic_page, wikirepo can access properties on location sub-pages, allowing statistics on any topic to be linked to a location (see the sketch after this list). Beyond including entries for already existing properties (see this issue), the following are examples of property types that could be added:

  • Climate statistics could be added to data/climate

    • This would allow for easy modeling of global warming and its effects
    • Planning would be needed to determine whether lower intervals are necessary or whether daily averages would suffice
  • Properties for electoral polling and electoral results for locations

    • This would allow direct access to all needed election information in a single function call
  • A property that links political parties and their regions in data/political

    • For easy professional presentation of electoral results (e.g. loading in party hex colors, abbreviations, and alignments)
  • data/demographic properties such as:

    • age, education, religious, and linguistic diversities across time
  • data/economic properties such as:

    • female workforce participation, workforce industry diversity, wealth diversity, and total working age population across time
  • Distinct properties for Freedom House and Press Freedom indexes, as well as other descriptive metrics
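
Regarding the sub-page access mentioned before the list, the following is an illustrative guess at the directory-to-topic mapping idea behind data.wd_utils.dir_to_topic_page, not its actual implementation:

# Illustrative only: mapping a wikirepo/data directory to the label of a
# location's Wikidata topic page (compare Germany and economy of Germany)
DIR_TO_TOPIC = {
    "economic": "economy",
    "demographic": "demographics",
}

def topic_page_label(data_dir, location_label):
    """'economic' + 'Germany' -> 'economy of Germany'"""
    return f"{DIR_TO_TOPIC[data_dir]} of {location_label}"

print(topic_page_label("economic", "Germany"))  # economy of Germany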

Similar Projects

Python

JavaScript

Java

Powered By


Wikimedia · Wikibase · Wikidata
Comments
  • Create concise requirement and env files

    This issue is for creating concise versions of requirements.txt and environment.yml for wikirepo. It would be great if these files were created by hand with specific version numbers, or generated in a way such that sub-dependencies don't always need to be updated.

    As of now both files are being created with the following commands in the package's conda virtual environment:

    pip list --format=freeze > requirements.txt  
    conda env export --no-builds | grep -v "^prefix: " > environment.yml
    

    wikirepo and other obviously unneeded packages are then removed from these files before being uploaded.

    Any insights or help would be much appreciated!

    help wanted good first issue question 
    opened by andrewtavis 7
  • Remove unused packages in requirements

    Hello, this is a follow-up to issue https://github.com/andrewtavis/wikirepo/issues/17.

    Please review~

    Also, about setup.py: is there a purpose for using graphing packages such as matplotlib and seaborn?

    opened by kination 2
  • Bump aiohttp from 3.7.3 to 3.7.4

    Bumps aiohttp from 3.7.3 to 3.7.4.

    Changelog

    Sourced from aiohttp's changelog.

    3.7.4 (2021-02-25)

    Bugfixes

    • (SECURITY BUG) Started preventing open redirects in the aiohttp.web.normalize_path_middleware middleware. For more details, see https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg.

      Thanks to Beast Glatisant (https://github.com/g147) for finding the first instance of this issue and Jelmer Vernooij (https://jelmer.uk/) for reporting and tracking it down in aiohttp. (#5497)

    • Fixed a difference in how the pure-Python and the Cython-based HTTP parsers construct a yarl.URL object for the HTTP request-target.

      Before this fix, the Python parser would turn the URI's absolute-path for //some-path into /, while the Cython code preserved it as //some-path. Now both do the latter. (#5498)


    Commits
    • 0a26acc Bump aiohttp to v3.7.4 for a security release
    • 021c416 Merge branch 'ghsa-v6wp-4m6f-gcjg' into master
    • 4ed7c25 Bump chardet from 3.0.4 to 4.0.0 (#5333)
    • b61f0fd Fix how pure-Python HTTP parser interprets //
    • 5c1efbc Bump pre-commit from 2.9.2 to 2.9.3 (#5322)
    • 0075075 Bump pygments from 2.7.2 to 2.7.3 (#5318)
    • 5085173 Bump multidict from 5.0.2 to 5.1.0 (#5308)
    • 5d1a75e Bump pre-commit from 2.9.0 to 2.9.2 (#5290)
    • 6724d0e Bump pre-commit from 2.8.2 to 2.9.0 (#5273)
    • c688451 Removed duplicate timeout parameter in ClientSession reference docs. (#5262) ...
    • See full diff in compare view


    dependencies 
    opened by dependabot[bot] 1
  • Bump lxml from 4.6.2 to 4.6.3

    Bumps lxml from 4.6.2 to 4.6.3.

    Changelog

    Sourced from lxml's changelog.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    dependencies 
    opened by dependabot[bot] 1
  • [ImgBot] Optimize images

    Beep boop. Your images are optimized!

    Your image file size has been reduced by 45% 🎉

    Details

    | File | Before | After | Percent reduction |
    |:--|:--|:--|:--|
    | /resources/wikirepo_logo_transparent.png | 171.28kb | 76.11kb | 55.56% |
    | /resources/gh_images/wikidata_logo.png | 26.59kb | 16.87kb | 36.56% |
    | /resources/wikirepo_logo.png | 150.90kb | 96.30kb | 36.18% |
    | /resources/gh_images/wikibase_logo.png | 20.41kb | 14.64kb | 28.30% |
    | Total: | 369.18kb | 203.92kb | 44.76% |



    opened by imgbot[bot] 1
  • Create package structure outline

    wikirepo as a project has many modules that interconnect and are funneled into two functions: wikirepo.data.query and lctn_utils.gen_lctns_dict. It would be helpful for users and potential contributors to have a visual representation of the package that details its overarching structure and the purpose of its various components. This outline could then be added to the readme in the To-Do section, potentially in a dropdown.

    An initial test of this could be as simple as a directory outline that has a bit more detail about the given components - say by using *, **, †, ‡ and other symbols to indicate where a description could be found.

    A discussion of how to best present the package structure is more than welcome, and contributions would further be very appreciated!

    documentation good first issue question 
    opened by andrewtavis 0
  • Suggest properties for wikirepo

    Please use this issue to suggest Wikidata properties that could be added to wikirepo. With the suggestion it would be great to get the following:

    • The link to the property page on Wikidata
    • A suggestion of which category (demographic, economic, etc.) the property should go into
    • [Optional] how the query script should be written (see examples/add_property to make suggestions for how the module header should be structured)

    Accepted property suggestions would then be converted to good first issues for wikirepo. Pull requests with new properties following the process in examples/add_property would also gladly be accepted! Documentation for such issues or PRs could also be included, or handled as a separate issue.

    Thanks for your interest in supporting this project :)

    good first issue question 
    opened by andrewtavis 2
  • Add descriptions to data.data_utils.incl_dir_idxs

    The function data.data_utils.incl_dir_idxs is how a user finds which indexes are available for a given type of data (demographic, economic, etc.). It would be great if data.data_utils.incl_dir_idxs had an option to also provide a description for each index. These descriptions could be queried directly from Wikidata.
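
    One way the descriptions could be pulled is via the public wbgetentities API (a sketch, not current wikirepo code; P1082 is the population property):

    import requests

    def prop_description(pid, lang="en"):
        """Fetch a Wikidata property's description, e.g. for P1082."""
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbgetentities",
                "ids": pid,
                "props": "descriptions",
                "languages": lang,
                "format": "json",
            },
        )
        return resp.json()["entities"][pid]["descriptions"][lang]["value"]

    print(prop_description("P1082"))  # the description of "population"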

    enhancement good first issue 
    opened by andrewtavis 0
Releases (v1.0.0)
  • v1.0.0 (Dec 28, 2021)

  • v0.1.1.5 (Mar 28, 2021)

    Changes include:

    • An src structure has been adopted for easier testing and to fix wheel distribution issues
    • Code quality is now checked with Codacy
    • Extensive code formatting to improve quality and style
    • Fixes to vulnerabilities through exception use
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0 (Feb 23, 2021)

    First stable release of wikirepo

    Changes include:

    • Full documentation of the package

    • Virtual environment files

    • Bug fixes

    • Extensive testing of all modules with GH Actions and Codecov

    • Code of conduct and contribution guidelines

    Source code(tar.gz)
    Source code(zip)
  • v0.0.2 (Dec 8, 2020)

    The minimum viable product of wikirepo:

    • Users are able to query data from Wikidata given locations, depth, time_lvl, and timespan arguments

    • String arguments are accepted for Earth, continents, countries and disputed territories

    • Data for greater depths can be retrieved by creating a dictionary given initial starting locations and going to greater depths using the contains administrative territorial entity property

    • Data is formatted and loaded into a pandas dataframe for further manipulation

    • All available social science properties on Wikidata have had modules created for them

    • Estimated load times and progress are given

    • The project's scope and general roadmap have been defined and detailed in the README

    Source code(tar.gz)
    Source code(zip)