Wikidata scholarly profiles

Last update: Jan 03, 2023

Overview

Scholia is a Python package and webapp for interacting with scholarly information in Wikidata.

Webapp

As a webapp, it currently runs on Wikimedia Toolforge, a facility provided by the Wikimedia Foundation. It is accessible from

https://scholia.toolforge.org/

The webapp displays scholarly profiles for individual researchers, research topics, organizations, journals, works, events, awards and so on. For instance, the scholarly profile for psychologist Uta Frith is accessible from

https://scholia.toolforge.org/author/Q8219

The page displays only information that is available in Wikidata.

Script

It is possible to use methods of the scholia package as a script:

$ python -m scholia.query twitter-to-q fnielsen
Q20980928
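The lookup above maps a Twitter (now X) username to a Wikidata item. A minimal sketch of the same idea, assuming it can be done with a SPARQL query on property P2002 (X username) against the public Wikidata Query Service, is shown below. It is not necessarily how scholia.query implements twitter-to-q; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (the same URL used by askQuery in Scholia's JS).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def twitter_to_q_query(username: str) -> str:
    """Build a SPARQL query for items whose P2002 (X/Twitter username) equals `username`."""
    # Escape backslashes and double quotes so the username is a safe SPARQL string literal.
    literal = username.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:P2002 "{literal}" . }}'


def q_ids_from_results(results: dict) -> list[str]:
    """Extract Q-ids from SPARQL 1.1 JSON results that bind ?item to an entity URI."""
    return [
        binding["item"]["value"].rsplit("/", 1)[-1]  # .../entity/Q20980928 -> Q20980928
        for binding in results["results"]["bindings"]
    ]


def twitter_to_q(username: str) -> list[str]:
    """Run the query against WDQS (needs network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": twitter_to_q_query(username)})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to send a descriptive User-Agent; this one is made up.
            "User-Agent": "scholia-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return q_ids_from_results(json.load(response))
```

The string comparison in SPARQL is case-sensitive, whereas usernames are not, so a production version might need to normalize case.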

Contributing

A simple way to get up and running is to launch Scholia via Gitpod, which installs the dependencies listed in requirements.txt automatically and launches the web app via runserver.py.

See file CONTRIBUTING.rst for technical details on how to improve Scholia.

References

Scholia's page about itself: https://scholia.toolforge.org/topic/Q45340488
Wikidata overview page about Scholia: https://www.wikidata.org/wiki/Wikidata:Scholia
Lane Rasberry, Egon Willighagen, Finn Nielsen, Daniel Mietchen, "Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata", Research Ideas and Outcomes, 5: e35820, 2019. https://doi.org/10.3897/rio.5.e35820
Finn Årup Nielsen, Daniel Mietchen, Egon Willighagen, "Scholia and scientometrics with Wikidata", Joint Proceedings of the 1st International Workshop on Scientometrics and 1st International Workshop on Enabling Decentralised Scholarly Communication, 2017. http://ceur-ws.org/Vol-1878/article-03.pdf
Finn Årup Nielsen, Daniel Mietchen, Egon Willighagen, "Scholia, Scientometrics and Wikidata", The Semantic Web: ESWC 2017 Satellite Events, 2017. https://doi.org/10.1007/978-3-319-70407-4_36 (PDF: https://link.springer.com/content/pdf/10.1007%2F978-3-319-70407-4_36.pdf)

Comments

Creating a configuration file in Scholia

This PR creates a configuration file in Scholia that makes the data sources (SPARQL endpoints) configurable. This is important because it allows caching servers to be used easily and transparently. An added benefit is that new data sources (SPARQL endpoints) can now be supported easily.

I would appreciate comments, especially on the variable names in the config file and on the way they were set up (3 servers).
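One way such a configuration file could be read is sketched below with Python's configparser. The section name `query-server` and the three key names (endpoint, edit URL, embed URL) are assumptions chosen to match the "3 servers" mentioned above, not necessarily the names the PR uses; the defaults point at the public Wikidata Query Service.

```python
import configparser

# Assumed defaults: the public Wikidata Query Service for querying, editing and embedding.
DEFAULTS = {
    "sparql_endpoint": "https://query.wikidata.org/sparql",
    "sparql_editurl": "https://query.wikidata.org/#",
    "sparql_embedurl": "https://query.wikidata.org/embed.html#",
}


def load_query_servers(ini_text: str = "") -> dict[str, str]:
    """Return the three server URLs, letting values in `ini_text` override DEFAULTS."""
    # interpolation=None so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict({"query-server": DEFAULTS})
    parser.read_string(ini_text)  # a later source overrides keys set by an earlier one
    return dict(parser["query-server"])
```

With this layout, pointing Scholia at a caching server would only require a `[query-server]` section containing a `sparql_endpoint` line; the edit and embed URLs keep their defaults.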

opened by nunogit 27
Simplify running Scholia locally
This PR does the following to make it easier to develop Scholia locally:

Adds documentation in the README on how to install Scholia locally

Adds an entrypoint in the setup.py file so you can run scholia from the shell instead of python -m scholia

Adds a run command to the CLI that mimics runserver.py
opened by cthoyt 19
Improve or fix issue template: bug report and feature request are not shown

#1603 introduced an issue template. The directory https://github.com/WDscholia/scholia/tree/master/.github/ISSUE_TEMPLATE contains three files, but the GitHub issue page shows only the question option; neither the bug report nor the feature request is shown.

I would have expected bug report and feature request options to be available on https://github.com/WDscholia/scholia/issues/new/choose
bug documentation

opened by fnielsen 16
On Scholia landing page, provide some overview stats about Wikidata and scholarly publications in it
e.g. number of triples in Wikidata

SELECT (count(*) as ?counts) WHERE { ?s ?p ?o . }

and some WikiCite-focused ones, e.g. as per this list

or some version of http://wikicite.org/statistics.html .
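The per-property statistics suggested by the labels on this issue could be generated from one query template. The sketch below is an illustration, not existing Scholia code: it builds a count query for each property in the label list. Note that `wdt:` counts only truthy (best-rank) statements, and counting a very large property such as P2860 on the live service may exceed its time limit, so such numbers might need to be precomputed, as on the WikiCite statistics page.

```python
import re

# Properties named in the issue's labels.
LANDING_PAGE_PROPERTIES = ("P50", "P2860", "P496", "P2093", "P225", "P356",
                           "P921", "P932", "P625", "P108", "P1416", "P166")


def property_count_query(pid: str) -> str:
    """Build a query counting truthy (wdt:) statements that use property `pid`."""
    if not re.fullmatch(r"P\d+", pid):
        raise ValueError(f"not a Wikidata property id: {pid!r}")
    return f"SELECT (COUNT(*) AS ?count) WHERE {{ ?s wdt:{pid} ?o . }}"


def landing_page_queries(pids=LANDING_PAGE_PROPERTIES) -> dict[str, str]:
    """Map each property id to its count query."""
    return {pid: property_count_query(pid) for pid in pids}
```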
LandingPage stats P50-author P2860-cites P496-ORCID P2093-author-name-string P225-taxon-name P356-DOI P921-main-subject P932-PMCID P625-geolocation P108-employer P1416-affiliation P166-award-received
opened by Daniel-Mietchen 14
check and externalize explicit sparql queries #1284
related #1283, #1282, #785

Externalize explicit sparql queries

This first converts the explicit queries (tables) to the externalized query format, for example in files that contain queries like the following:

somePanelDescriptionSparql = ` SELECT ... ... `

This PR partially solves issue #1284.

The following files still have explicit queries:

[x] 404_chemical.html

[x] author.html

[x] author-index-curation.html

[x] authors.html

[x] award_curation.html

[x] chemical-index-curation.html

[x] chemical-index.html

[x] lexeme_empty.html

[x] pathway_empty.html

[x] pathway.html

[x] property.html

[x] software_empty.html

[x] topic_curation.html

[x] use_empty.html

[x] use.html

[x] venue_curation.html

[x] venues.html

[x] work_cito.html

[x] work_empty.html

[x] works.html

Some annotations

Files whose names contain "empty" use the aspect as an index, but the HTML file names stay the same for now. For example, use_empty.html has an aspect named use-index. These HTML files will be renamed in another PR.

Externalization for author.html, pathway.html and 404_chemical.html required the creation of new macros and custom JS functions for extracting queries to external .sparql files
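The effect of externalization is that each panel's query lives in its own .sparql file. A minimal Python sketch of collecting those files per aspect is shown below, assuming a naming pattern `<aspect>_<panel>.sparql` (suggested by the `'ask_' + aspect + '_' + panel + '.sparql'` include used elsewhere in this listing, but not confirmed for every template). The function name is hypothetical; Scholia itself includes the files through Jinja templates.

```python
from pathlib import Path


def load_panel_queries(directory: Path, aspect: str) -> dict[str, str]:
    """Map panel name -> SPARQL text for files named '<aspect>_<panel>.sparql'."""
    prefix = aspect + "_"
    queries = {}
    for path in sorted(directory.glob(f"{prefix}*.sparql")):
        panel = path.stem[len(prefix):]  # 'author_list-of-publications' -> 'list-of-publications'
        queries[panel] = path.read_text(encoding="utf-8")
    return queries
```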
opened by curibe 13
Chemical classes are special and the regular chemical aspect does not work well

The new page looks like the screenshot (except that "Related compound" in the screenshot is "Example compounds", capped at 500).

@fnielsen, please take note of the change regarding figuring out which aspect to show. For this patch I had to change the logic: if no suitable aspect can be derived from the item's types (P31), it needs to determine the classes the item is a subclass of (P279). This is more expensive, so it is run only when no suitable aspect was found.
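The fallback described above can be sketched in Python as follows. This is a hypothetical illustration, not Scholia's code: the function name and the two mapping dicts are assumptions. The point it shows is that the P279 (subclass of) lookup is passed in as a callable, so the expensive query only runs when none of the item's P31 classes maps to an aspect.

```python
from typing import Callable, Iterable, Mapping, Optional


def choose_aspect(
    instance_of: Iterable[str],
    superclasses: Callable[[], Iterable[str]],
    aspect_for_type: Mapping[str, str],
    aspect_for_superclass: Mapping[str, str],
) -> Optional[str]:
    """Pick an aspect from P31 values, falling back to P279 only when needed."""
    # Cheap path: the item's own types (P31, instance of).
    for qid in instance_of:
        if qid in aspect_for_type:
            return aspect_for_type[qid]
    # Expensive path: only now call the lookup of classes the item is a subclass of (P279).
    for qid in superclasses():
        if qid in aspect_for_superclass:
            return aspect_for_superclass[qid]
    return None
```

For example, with Q11173 (chemical compound) mapped to a hypothetical "chemical-class" aspect in `aspect_for_superclass`, an item that has no matching P31 but is a subclass of Q11173 would get that aspect.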
enhancement

opened by egonw 13
Add CiTO panels to work/venue aspect (using ask query)
Close #1610, also start using ask queries (#617) and add a way to hide panels (close #741)

Description

Please include a summary of the change, relevant motivation and context. If possible and applicable, include before and after screenshots and a URL where the changes can be seen.

On a venue (Q4775205) without CiTO

On a venue (Q6294930) with CiTO

On a work with CiTO the highlight panel is added

Created a macro and a JS function to handle "ask" queries

As we need to pass the panels that should be loaded in the success case, I made use of a call block, which calls the macro with other macros as its body:

{% call ask_query_callback('cito') %}
  {{ sparql_to_iframe('articles-by-intention') }}
  {{ sparql_to_iframe('incoming-bubble') }}
  {{ sparql_to_table('incoming', options={
    "linkPrefixes": { "intention": "../../cito/" }
  }) }}
  {{ sparql_to_iframe('outgoing-bubble') }}
  {{ sparql_to_table('outgoing', options={
    "linkPrefixes": { "intention": "../../cito/" }
  }) }}
  {{ sparql_to_table('most-reused-articles', options={
    "linkPrefixes": { "citedArticle": "../../work/" },
    "linkSuffixes": { "citedArticle": "/cito" },
  }) }}
{% endcall %}

The macro takes the panel parameter 'cito' and passes it to askQuery as shown below. The body (all of the {{ sparql_to..}} statements) is passed to the callback function via {{ caller() }}.

{% macro ask_query_callback(panel) -%}
  // {{ panel }} ask query
  askQuery("{{ panel }}", `# tool: scholia
{% include 'ask_' + aspect + '_' + panel + '.sparql' %}`, () => {
    {{ caller() }};
  });
{%- endmacro %}

The JS function is generic and takes the ask query (which Jinja includes from a file), a panel name (used to show or hide the panels) and the callback function (the output of the sparql_to_iframe and sparql_to_table macros).

function askQuery(panel, query, callback) {
  var endpointUrl = 'https://query.wikidata.org/sparql';
  var settings = {
    headers: { Accept: 'application/sparql-results+json' },
    data: { query: query },
  };
  $.ajax(endpointUrl, settings).then((data) => {
    if (data.boolean) {
      // unhide panels
      document.getElementById(panel).classList.remove("d-none");
      callback();
    } else {
      // hide from table of contents
      var headings = document.querySelectorAll("#" + panel + " h2, #" + panel + " h3");
      for (var elem of headings) {
        document.querySelector("li a[href='#" + elem.id + "']").parentElement.classList.add("d-none");
      }
    }
  });
}
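The control flow of askQuery relies on the SPARQL 1.1 JSON result format for ASK queries, which is an object with a single "boolean" member. A Python analogue of that flow is sketched below for illustration only; the function names are hypothetical and nothing here is part of Scholia. The callback stands in for loading the panels.

```python
import json
from typing import Callable


def ask_result(body: str) -> bool:
    """Return the boolean of a SPARQL 1.1 JSON ASK response such as {"head": {}, "boolean": true}."""
    value = json.loads(body).get("boolean")
    if not isinstance(value, bool):
        raise ValueError("not a SPARQL ASK result")
    return value


def run_if_ask_true(body: str, callback: Callable[[], None]) -> bool:
    """Mirror askQuery: run `callback` (load the panels) only when the ASK answer is true."""
    matched = ask_result(body)
    if matched:
        callback()
    return matched
```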

Move the venue/work CiTO panels to the venue/work page

Remove the /cito route and page

Caveats

Please list anything which has been left out of this PR or which should be considered before this PR is accepted. Check any of the following which apply:

[x] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Removes the /cito route

[x] This change requires a documentation update

[ ] I have made corresponding changes to the documentation

I've documented this above, but I'm not sure whether there is a better place to note this behaviour.

[ ] This change requires new dependencies (please list)

if you make changes to the python code

[ ] my code passes tox check, you can receive warnings about tests, documentation or both

Testing

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checked that the panels showed with the correct information on a venue with CiTO (Q6294930)

Checked that the requests aren't performed and the panels are hidden on a venue without CiTO (Q4775205)

Checked a work with CiTO (Q21090124) and without (Q21090025)

Checklist

[ ] I have commented my code, particularly in hard-to-understand areas

[x] My changes generate no new warnings

[x] I have not used code from external sources without attribution

[x] I have considered accessibility in my implementation

[x] There are no remaining debug statements (print, console.log, ...)

ready for merge
opened by carlinmack 11
Externalize SPARQL queries into templates
References #791 (original issue) and #906 (concrete solution proposal)

I haven't marked this PR as closing either issue because there are still many other HTML templates that would need to be modified, so consider this PR as a pilot that can be easily followed by one or more PRs to address the remaining HTML templates.

Changes

Move SPARQL strings in HTML templates into dedicated SPARQL templates

Update README with explanation for new contributors on how to use templating
opened by cthoyt 11
Adds a chemistry/missing page for curation, with a column on the /chemical/ front page

Adds a few more example chemical structures

I expect more patches later for more physchem properties from @lahire.
aspects ready for merge
opened by egonw 11
Display ORCID iDs

In the "Prolific authors" section, please display the subject's ORCID iD (P496), if available.

If not available, please include a "search ORCID" link, formatted like https://orcid.org/orcid-search/quick-search?searchQuery=Andy+Mabbett, so that volunteers can more easily find the iDs and add them to Wikidata
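The requested behaviour can be sketched as a small Python function that returns either the ORCID iD URL or a search link built in the format given in the issue. This is an illustrative sketch with a hypothetical function name; in Scholia the panel itself would be built from SPARQL results and JavaScript, and whether the quick-search URL format still works on orcid.org is not verified here.

```python
from typing import Optional
from urllib.parse import urlencode

# Search URL format taken from the issue text.
ORCID_SEARCH = "https://orcid.org/orcid-search/quick-search"


def orcid_link(name: str, orcid: Optional[str]) -> str:
    """Return the ORCID iD URL if P496 is known, else a quick-search URL for the name."""
    if orcid:
        return f"https://orcid.org/{orcid}"
    # urlencode uses quote_plus, so spaces become '+', as in the example URL.
    return f"{ORCID_SEARCH}?{urlencode({'searchQuery': name})}"
```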
aspects JavaScript SPARQL examples P496-ORCID

opened by pigsonthewing 11
Add Bioschemas by proxying Wikidata content (making Google bots happy)
@fnielsen, this is now a finished patch, but if you have additional ideas, please let me know (if not, please merge it).

The new design solves the problem of robots.txt blocking calls and limiting the SEO indexing of Scholia pages:

a Scholia proxy is defined with the URL pattern /$qid/bioschemas which returns JSON

the existing base.html uses this new call instead of a call to wikidata.org (with the robots.txt problem)

The extra call is only made when the aspect template has an id=bioschemas holder.

Possible future optimizations:

[x] property_for_q() calls are replaced by a single properties_for_q(q, {"P235": "key1", "Pxx": "key2", ...})

[x] 2-3 helper functions get added to simplify the code

[x] other bits of the page get included in a similar way (like descriptions), making it also available for SEO

[x] use the Wikidata description as Bioschemas content
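The proposed properties_for_q(q, {"P235": "key1", ...}) replaces several per-property calls with one. A sketch of the lookup part is shown below, assuming the standard Wikidata entity JSON layout (claims → property → statements → mainsnak → datavalue). Unlike the proposed signature, this hypothetical version takes the already-fetched entity JSON rather than a Q-id, so one HTTP request can serve all properties; it also ignores statement ranks.

```python
def properties_for_q(entity: dict, keys_by_property: dict[str, str]) -> dict:
    """Map output keys to the first value of each requested property in Wikidata entity JSON."""
    claims = entity.get("claims", {})
    result = {}
    for pid, key in keys_by_property.items():
        for statement in claims.get(pid, []):
            snak = statement.get("mainsnak", {})
            # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
            if snak.get("snaktype") == "value":
                result[key] = snak["datavalue"]["value"]
                break  # keep only the first usable value
    return result
```

Properties that are missing from the entity are simply absent from the result, so the caller can decide which Bioschemas fields to emit.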

ready for review
opened by egonw 10
Get user data fails for Google Scholar

Describe the bug
Get user data fails for Google Scholar.

To Reproduce
Steps to reproduce the behavior:

python -m py.test --doctest-modules scholia/googlescholar.py

or

python -m scholia.googlescholar get-user-data 9cagBQYAAAAJ

Expected behavior
No error. Data should be returned.

Additional context
This also fails with tox.
bug

opened by fnielsen 2
Vejhistorie OJS journal is not scraped correctly
Describe the bug
Vejhistorie OJS journal is not scraped correctly.

To Reproduce
Steps to reproduce the behavior:

$ python -m scholia.scrape.ojs issue-url-to-quickstatements https://tidsskrift.dk/vejhistorie/issue/view/9914
CREATE
LAST P31 Q13442814
LAST P856 "https://tidsskrift.dk/vejhistorie/article/view/135395"

Expected behavior
Output of more metadata.

Additional context
There do not seem to be any meta tags in the HTML for this issue.
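OJS article pages commonly expose Highwire-style citation_* meta tags, and their absence here would explain why only P31 and P856 are produced. The sketch below shows, with the standard library only, how such tags could be turned into QuickStatements lines. It is not the scholia.scrape.ojs implementation: the class and function names are hypothetical, only citation_title is handled, the default title language "da" is an assumption for this Danish journal, and quotes inside titles are not escaped.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags (Highwire Press style)."""

    def __init__(self):
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content and name.startswith("citation_"):
            self.meta.setdefault(name, content)  # keep the first occurrence


def article_to_quickstatements(html: str, url: str, language: str = "da") -> str:
    """Build tab-separated QuickStatements for one article page."""
    parser = CitationMetaParser()
    parser.feed(html)
    lines = ["CREATE", "LAST\tP31\tQ13442814"]  # instance of: scholarly article
    title = parser.meta.get("citation_title")
    if title:
        lines.append(f'LAST\tP1476\t{language}:"{title}"')  # title as monolingual text
    lines.append(f'LAST\tP856\t"{url}"')
    return "\n".join(lines)
```

Without any citation_* tags, the output reduces to the three statements shown in the bug report.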
bug OJS
opened by fnielsen 0
New property for crystal structures, new statistics
Description

Small patch: when calculating the number of crystal structures, compounds with a CSD Refcode can be counted too. This property was accepted this week.

Caveats

Potentially, a more complex query may not run fast enough, but this does not seem to be the case (the difference is not noticeable).

Testing

Visit https://scholia.toolforge.org/chemical/ (before/after)

Checklist

[ ] I have commented my code, particularly in hard-to-understand areas

[x] My changes generate no new warnings

[x] I have not used code from external sources without attribution

[x] I have considered accessibility in my implementation

[x] There are no remaining debug statements (print, console.log, ...)
opened by egonw 0
Panel for the author curation page to list articles that are not used as reference for any statement
Fixes #2213

Description

This patch adds a panel to an author curation page, listing works for that author that do not support any statements.
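A query for such a panel could look roughly like the one generated below. This is a sketch under an assumption: "supports a statement" is approximated as "appears as the stated in (P248) value of some reference", which misses references made through other properties such as reference URL. The function name is hypothetical and the PR's actual query may differ.

```python
import re


def unused_works_query(author_qid: str) -> str:
    """Works with `author_qid` as author (P50) that no reference cites via 'stated in' (P248)."""
    if not re.fullmatch(r"Q\d+", author_qid):
        raise ValueError(f"not a Wikidata item id: {author_qid!r}")
    return f"""SELECT ?work ?workLabel WHERE {{
  ?work wdt:P50 wd:{author_qid} .
  # pr:P248 is the 'stated in' value inside a statement's reference.
  FILTER NOT EXISTS {{ ?reference pr:P248 ?work . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""
```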

Caveats

No caveats I can foresee.

Testing

Test it with any author who has multiple articles. The output should look something like this:

Checklist

[ ] I have commented my code, particularly in hard-to-understand areas

[x] My changes generate no new warnings

[x] I have not used code from external sources without attribution

[ ] I have considered accessibility in my implementation

[x] There are no remaining debug statements (print, console.log, ...)
opened by egonw 8
only 0.67% of the articles in Wikidata are used to support statements

On the Telegram channel for Wikidata, they looked into how many articles are actually used as a reference to support a claim. That turned out to be 263,247 articles, or 0.67%.

So another curation task people can do around an author (e.g., the author themselves) is to use their articles as references ("citations") for statements.
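The two figures in this issue together imply the size of the whole article set, which the issue does not state. The arithmetic is sketched below, assuming 0.67% was rounded to two decimals: 263,247 / 0.0067 is about 39.3 million articles, with a plausible range of roughly 39.0 to 39.6 million.

```python
def implied_total(count: int, percent: float) -> float:
    """Size of the whole set if `count` items make up `percent` percent of it."""
    return count / (percent / 100)


central = implied_total(263_247, 0.67)
# 0.67% is presumably rounded, so the true share lies roughly between 0.665% and 0.675%.
lowest = implied_total(263_247, 0.675)
highest = implied_total(263_247, 0.665)
print(f"{lowest:,.0f} to {highest:,.0f} articles (central estimate {central:,.0f})")
```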
data-quality

opened by egonw 0
Fixing an issue and implements a feature request around linking versions of papers
Fixes #1597 and fixes #1886

Description

The first (oldest) patch fixes the problem reported in both bug reports. The second (newer) patch implements the suggestion in the https://github.com/WDscholia/scholia/issues/1597#issuecomment-898766620 comment (and does the same for retractions).

Caveats

There are no code changes.

Testing

I suggest testing the following pages, which cover various situations; their before/after output sometimes differs, demonstrating what is fixed:

https://scholia.toolforge.org/work/Q24613508

https://scholia.toolforge.org/work/Q24564615

https://scholia.toolforge.org/work/Q114679534

https://scholia.toolforge.org/work/Q102319086

https://scholia.toolforge.org/work/Q102092244

Checklist

[ ] I have commented my code, particularly in hard-to-understand areas

[x] My changes generate no new warnings

[x] I have not used code from external sources without attribution

[ ] I have considered accessibility in my implementation

[x] There are no remaining debug statements (print, console.log, ...)
opened by egonw 0

Releases (v0.3)

v0.3 (Nov 30, 2021)

November 2021 version in relation to the end of the Robustifying Scholia project.
Source code (tar.gz)
Source code (zip)
v0.2 (Nov 28, 2019)

November 2019 version in relation to the midpoint of the Robustifying Scholia project.
Source code (tar.gz)
Source code (zip)
v0.1 (Apr 19, 2019)

April 2019 version in relation to the start of the Robustifying Scholia project.

New aspects include, e.g., sponsor, printer, event, event-series, location, country, clinical trial, project, ...
Source code (tar.gz)
Source code (zip)
nielsen2017scholia (Mar 13, 2017)

Release for a publication.
Source code (tar.gz)
Source code (zip)

Owner

Finn Årup Nielsen

Data science. Data and text mining, neuroinformatics, social media, wiki.

GitHub Repository https://scholia.toolforge.org


Small and highly customizable twin-panel file manager for Linux with support for plugins.

Note: Preferred repository hosting is GitLab. If you don't have an account there and don't wish to make one, interacting with one on GitHub is fine. Sun

407 Dec 29, 2022

The official source code repository for the calibre ebook manager

calibre calibre is an e-book manager. It can view, convert, edit and catalog e-books in all of the major e-book formats. It can also talk to e-book re

14.1k Dec 27, 2022

A simple shared budget manager web application

I hate money I hate money is a web application made to ease shared budget management. It keeps track of who bought what, when, and for whom; and helps

829 Dec 31, 2022

SENAITE Meta Package

SENAITE LIMS Meta Installation Package What does SENAITE mean? SENAITE is a beautiful trigonal, oil-green to greenish black crystal, with almost the h

135 Dec 14, 2022

A Python library to manage ACBF ebooks.

libacbf A Python library to read and edit ACBF formatted comic book files and archives. XML Specifications here: https://acbf.fandom.com/wiki/Advanced

0 Nov 09, 2021

cherrytree

CherryTree A hierarchical note taking application, featuring rich text and syntax highlighting, storing data in a single XML or SQLite file. The proje

2.7k Jan 08, 2023

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

14.8k Jan 05, 2023

A collection of self-contained and well-documented issues for newcomers to start contributing with

fedora-easyfix A collection of self-contained and well-documented issues for newcomers to start contributing with How to setup the local development e

8 Oct 16, 2021

Find duplicate files

dupeGuru dupeGuru is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in a system. It is written mostly in Python 3 and has th

3.3k Jan 04, 2023

WikidPad is a single user desktop wiki

What is WikidPad? WikidPad is a Wiki-like notebook for storing your thoughts, ideas, todo lists, contacts, or anything else you can think of to write

176 Dec 14, 2022

Automatic Movie Downloading via NZBs & Torrents

CouchPotato CouchPotato (CP) is an automatic NZB and torrent downloader. You can keep a "movies I want"-list and it will search for NZBs/torrents of t

3.9k Jan 04, 2023

A time tracking application

GTimeLog GTimeLog is a simple app for keeping track of time. Contents Installing Documentation Resources Credits Installing GTimeLog is packaged for D

224 Nov 28, 2022

Open source platform for the machine learning lifecycle

MLflow: A Machine Learning Lifecycle Platform MLflow is a platform to streamline machine learning development, including tracking experiments, packagi

13.3k Jan 04, 2023

One webpage for every book ever published!

Open Library Open Library is an open, editable library catalog, building towards a web page for every book ever published. Are you looking to get star

4k Jan 08, 2023

Scan, index, and archive all of your paper documents

[ en | de | el ] Important news about the future of this project It's been more than 5 years since I started this project on a whim as an effort to tr

7.8k Jan 06, 2023

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

Collaborate This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to s

86 Oct 18, 2022

Main repository of the zim desktop wiki project

Zim - A Desktop Wiki Editor Zim is a graphical text editor used to maintain a collection of wiki pages. Each page can contain links to other pages, si

1.6k Dec 30, 2022

The open-source core of Pinry, a tiling image board system for people who want to save, tag, and share images, videos and webpages in an easy to skim through format.

The open-source core of Pinry, a tiling image board system for people who want to save, tag, and share images, videos and webpages in an easy to skim

2.7k Jan 08, 2023

Free and open-source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.

Archivematica By Artefactual Archivematica is a web- and standards-based, open-source application which allows your institution to preserve long-term

338 Dec 16, 2022
