I³ Tracker for Essential Open Innovation Datasets

Overview

This repository is set up to track, version, and contribute updates to the I³ Essential Open Innovation Dataset Index, which consists of lists of datasets and tools relevant to innovation data. The index can be edited collaboratively, either by editing the markdown files contained in this repository or by editing the metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5 minutes (and updates the site if there are any), and also rebuilds the site automatically when somebody makes an edit via git. The site is generated from the markdown files in this repository using the static site generator Jekyll.

Add/edit a Dataset using Git

Each record in the index has a corresponding markdown file (auto-generated) in the folder datasets/. These files contain the basic metadata associated with the record in the frontmatter, and also allow longer-form content, such as details of queries, images, and other written information, to be added. Both of these are editable.

When a markdown file is added to the datasets/ folder, a GitHub Action publishes the metadata in the frontmatter to the Google Sheet and to the archive CSV, to keep the records up to date. This script calls various metadata scrapers to automatically pull information like permalinks, citation information, and versioning. Once the file has been successfully committed, a second action runs to refresh the site so it reflects the edits.

Contribution Steps

  1. Fork the repository and create a markdown file in the datasets/ folder, based on the template file.
  2. Add as much metadata as you like, and create a pull request in this repository.
  3. All being well, this should merge automatically. If not, check the GitHub Actions log or open an issue (but first make sure the file is in the correct folder and has a .md extension).

If the dataset is hosted on a platform with parseable citation metadata (Dataverse, Zenodo, ICPSR, and major university repositories are examples), then the tool will automatically pull most of the data associated with the dataset -- fields that will auto-fill are indicated by a comment. If the dataset is hosted on e.g. a personal site, then you might want to include some more information -- but ultimately, only a title and URL are really necessary. However you fill out your dataset, a UUID and timestamp will be generated for it automatically; these aren't fields you need to include (hence their absence from the template).
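For instance, a bare-bones entry for a dataset hosted on one of the well-supported platforms could consist of nothing more than the required fields (the dataset title and URL below are invented for illustration, not a real index entry):

```markdown
---
title: Example Patent Citations Dataset   # hypothetical title
url: https://dataverse.example.edu/dataset/12345   # hypothetical URL
---
```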

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate the more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets built on top of the one in question that don't use an easy-to-parse citation. So definitely prioritise these fields!

If you're unsure of how to make a pull request, GitHub has some good guides to doing this. You can also just make an edit to the Google Sheet, which will have an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)
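As a rough illustration, a completed dataset file might end up looking something like the sketch below once the curated fields are filled in; every value here is invented for the example and is not a real index entry:

```markdown
---
title: Example Patent Citations Dataset
url: https://dataverse.example.edu/dataset/12345
timeframe: 1976-2020
documentation: https://dataverse.example.edu/dataset/12345/documentation
code: https://github.com/example-org/example-patent-scripts
tags: [patents, citations]
---

Notes on how the dataset was built, example `queries`, images, and links
to related resources go here in the body.
```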

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

As with datasets, collection files can be added or edited via pull request: fork the repository, make your additions or edits to the collections, and open a pull request. Collections are not currently tracked via the Google Sheet, so they can only be edited via git.

To create a new collection, copy the collection template to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown:

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)
```

Dataset shortnames can be found either by looking at the URLs directly, or through the 'shortnames' column of the Google Sheet.
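Putting this together, a small collection file might look like the following sketch (the title, author, and dataset shortnames are all hypothetical):

```markdown
---
title: Patent Data Starter Pack
author: A. Contributor
tags: [patents]
---

A short introduction to the theme of the collection goes here.

- [Example Patent Citations Dataset](/datasets/example_patent_citations)
- [Example Patent Text Dataset](/datasets/example_patent_text)
```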

Index

A versioned .csv file containing the index can be found in the folder index_archive. If you'd like to browse and query either sheet, you can do so using GitHub's Flat Data tool here. The GitHub Action that pulls the sheet is based on Dolthub's Gsheets-to-csv action.
