Experiments in converting wikidata to ftm

Overview

FollowTheMoney / Wikidata mappings

This repo will contain tools for converting Wikidata entities into FtM schema.

Prefixes: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_list_of_prefixes

https://github.com/tmtmtmtm/every-politician-scraper/blob/main/lib/every_politician_scraper/wikidata_query.rb#L87

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Current_U.S._members_of_the_Senate_with_district,_party_and_date_they_assumed_office

https://stackoverflow.com/questions/46383784/wikidata-get-all-properties-with-labels-and-values-of-an-item/46385132

What do I want?

  • People as entities
  • Organizations as entities
  • Membership of people in certain bodies
    • Query all legislatures
    • Query all cabinets
    • Query SOEs

Query experimentation

Get statement properties on all properties for the given entity:

select distinct ?statement ?wd ?wdLabel ?ps_ ?ps_Label ?wdpq ?wdpqLabel ?pq_ ?pq_Label where {
    wd:Q180589 ?p ?statement .
    ?statement ?ps ?ps_ .

    ?wd wikibase:claim ?p.
    ?wd wikibase:statementProperty ?ps.

    OPTIONAL {
      ?statement ?pq ?pq_ .
      ?wdpq wikibase:qualifier ?pq .
    }

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

What statement properties (ie. the metadata properties) are we interested in?

  • pq:P580 - start time

  • pq:P582 - end time

  • pq:P585 - point in time

  • wd:P4100 - parliamentary group

  • wd:P768 - district

  • wd:P1534 - end cause

select distinct ?statement ?wd ?wdLabel ?value ?valueLabel ?starttime ?endtime ?time where {
    wd:Q180589 ?p ?statement .
    ?statement ?ps ?value .

    ?wd wikibase:claim ?p.
    ?wd wikibase:statementProperty ?ps.

    OPTIONAL {
      ?statement pq:P580 ?starttime.
    }
    OPTIONAL {
      ?statement pq:P582 ?endtime.
    }
    OPTIONAL {
      ?statement pq:P585 ?time.
    }

    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

How to get source/reference URLs: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks

Owner
Friedrich Lindenberg
Data and software engineer, investigative support.
Friedrich Lindenberg
The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

Subformer This repository contains the code for the Subformer. To help overcome this we propose the Subformer, allowing us to retain performance while

Machel Reid 10 Dec 27, 2022
Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

Nikita Elenberger 1 Mar 11, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 03, 2021
A Facebook Messenger Chatbot using NLP

A Facebook Messenger Chatbot using NLP This project is about creating a messenger chatbot using basic NLP techniques and models like Logistic Regressi

6 Nov 20, 2022
Code for "Finetuning Pretrained Transformers into Variational Autoencoders"

transformers-into-vaes Code for Finetuning Pretrained Transformers into Variational Autoencoders (our submission to NLP Insights Workshop 2021). Gathe

Seongmin Park 22 Nov 26, 2022
An official repository for tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a University of Edinburgh master's course.

PMR computer tutorials on HMMs (2021-2022) This is a repository for computer tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a Univer

Vaidotas Šimkus 10 Dec 06, 2022
Mednlp - Medical natural language parsing and utility library

Medical natural language parsing and utility library A natural language medical

Paul Landes 3 Aug 24, 2022
Korean Simple Contrastive Learning of Sentence Embeddings using SKT KoBERT and kakaobrain KorNLU dataset

KoSimCSE Korean Simple Contrastive Learning of Sentence Embeddings implementation using pytorch SimCSE Installation git clone https://github.com/BM-K/

34 Nov 24, 2022
Ukrainian TTS (text-to-speech) using Coqui TTS

title emoji colorFrom colorTo sdk app_file pinned Ukrainian TTS 🐸 green green gradio app.py false Ukrainian TTS 📢 🤖 Ukrainian TTS (text-to-speech)

Yurii Paniv 85 Dec 26, 2022
Train and use generative text models in a few lines of code.

blather Train and use generative text models in a few lines of code. To see blather in action check out the colab notebook! Installation Use the packa

Dan Carroll 16 Nov 07, 2022
A natural language modeling framework based on PyTorch

Overview PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapi

Meta Research 6.4k Jan 08, 2023
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (国語研長単位) Tokenizer for Transformers based on 青空文庫 Basic Usage from transformers import RemBertToken

Koichi Yasuoka 3 Dec 22, 2021
SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

SNCSE SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples This is the repository for SNCSE. SNCSE aims to allev

Sense-GVT 59 Jan 02, 2023
使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

SimCSE复现 项目描述 SimCSE是一种简单但是很巧妙的NLP对比学习方法,创新性地引入Dropout的方式,对样本添加噪声,从而达到对正样本增强的目的。 该框架的训练目的为:对于batch中的每个样本,拉近其与正样本之间的距离,拉远其与负样本之间的距离,使得模型能够在大规模无监督语料(也可以

58 Dec 20, 2022
Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data

Traditional Chinese Text Recognition Dataset: Synthetic Dataset and Labeled Data Authors: Yi-Chang Chen, Yu-Chuan Chang, Yen-Cheng Chang and Yi-Ren Ye

Yi-Chang Chen 5 Dec 15, 2022
Sentello is python script that simulates the anti-evasion and anti-analysis techniques used by malware.

sentello Sentello is a python script that simulates the anti-evasion and anti-analysis techniques used by malware. For techniques that are difficult t

Malwation 62 Oct 02, 2022
(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.

BERT Convolutions Code for the paper Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. Contains expe

mlpc-ucsd 21 Jul 18, 2022
Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

Knowledge Graph,Question Answering System,基于知识图谱和向量检索的医疗诊断问答系统

wangle 823 Dec 28, 2022
Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

CIRPLANT This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) For d

Zheyuan (David) Liu 29 Nov 17, 2022