Sequence lineage information extracted from RKI sequence data repo

Overview

Pango lineage information for German SARS-CoV-2 sequences

This repository contains a join of the metadata and pango lineage tables of all German SARS-CoV-2 sequences published by the Robert-Koch-Institut on Github.

The data here is updated every hour, automatically through a Github action, so whenever new data appears in the RKI repo, you will see it here within at most an hour.

The resulting dataset can be downloaded here, beware it's currently around 50MB in size: https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv

Omicron share plot

Omicron Logit Plot

Omicron Logit Plot

Description of data

Column description:

  • IMS_ID: Unique identifier of the sequence
  • DATE_DRAW: Date the sample was taken from the patient
  • SEQ_REASON: Reason for sequencing, one of:
    • X: Unknown
    • N: Random sampling
    • Y: Targeted sequencing (exact reason unknown)
    • A[<reason>]: Targeted sequencing because variant PCR indicated VOC
  • PROCESSING_DATE: Date the sample was processed by the RKI and added to Github repo
  • SENDING_LAB_PC: Postcode (PLZ) of lab that did the initial PCR
  • SEQUENCING_LAB_PC: Postcode (PLZ) of lab that did the sequencing
  • lineage: Pango lineage as reported by pangolin
  • scorpio_call: Alternative, rough, variant as determined by scorpio (part of pangolin), this is less precise but a bit more robust than pangolin.

Excerpt

Here are the first 10 lines of the dataset.

IMS_ID,DATE_DRAW,SEQ_REASON,PROCESSING_DATE,SENDING_LAB_PC,SEQUENCING_LAB_PC,lineage,scorpio_call
IMS-10294-CVDP-00001,2021-01-14,X,2021-01-25,40225,40225,B.1.1.297,
IMS-10025-CVDP-00001,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00002,2021-01-17,N,2021-01-26,10409,10409,B.1.258,
IMS-10025-CVDP-00003,2021-01-17,N,2021-01-26,10409,10409,B.1.177.86,
IMS-10025-CVDP-00004,2021-01-17,N,2021-01-26,10409,10409,B.1.389,
IMS-10025-CVDP-00005,2021-01-18,N,2021-01-26,10409,10409,B.1.160,
IMS-10025-CVDP-00006,2021-01-17,N,2021-01-26,10409,10409,B.1.1.297,
IMS-10025-CVDP-00007,2021-01-18,N,2021-01-26,10409,10409,B.1.177.81,
IMS-10025-CVDP-00008,2021-01-18,N,2021-01-26,10409,10409,B.1.177,
IMS-10025-CVDP-00009,2021-01-18,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00010,2021-01-17,N,2021-01-26,10409,10409,B.1.1.7,Alpha (B.1.1.7-like)
IMS-10025-CVDP-00011,2021-01-17,N,2021-01-26,10409,10409,B.1.389,

Suggested import into pandas

You can import the data into pandas as follows:

#%%
import pandas as pd

#%%
df = pd.read_csv(
    'https://raw.githubusercontent.com/corneliusroemer/desh-data/main/data/meta_lineages.csv',
    index_col=0,
    parse_dates=[1,3],
    infer_datetime_format=True,
    cache_dates=True,
    dtype = {'SEQ_REASON': 'category',
             'SENDING_LAB_PC': 'category',
             'SEQUENCING_LAB_PC': 'category',
             'lineage': 'category',
             'scorpio_call': 'category'
             }
)
#%%
df.rename(columns={
    'DATE_DRAW': 'date',
    'PROCESSING_DATE': 'processing_date',
    'SEQ_REASON': 'reason',
    'SENDING_LAB_PC': 'sending_pc',
    'SEQUENCING_LAB_PC': 'sequencing_pc',
    'lineage': 'lineage',
    'scorpio_call': 'scorpio'
    },
    inplace=True
)
df

License

The underlying files that I use as input are licensed by RKI under CC-BY 4.0, see more details here: https://github.com/robert-koch-institut/SARS-CoV-2-Sequenzdaten_aus_Deutschland#lizenz.

The software here is licensed under the "Unlicense". You can do with it whatever you want.

For the data, just cite the original source, no need to cite this repo since it's just a trivial join.

Owner
Cornelius Roemer
Cornelius Roemer
EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation (CVPR'21)

EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation (CVPR'21) Citation If y

addisonwang 18 Nov 11, 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero

Jiacheng Ye 31 Dec 30, 2022
This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Orientation independent Möbius CNNs This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of

Maurice Weiler 59 Dec 09, 2022
Speech-Emotion-Analyzer - The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

Speech Emotion Analyzer The idea behind creating this project was to build a machine learning model that could detect emotions from the speech we have

Mitesh Puthran 965 Dec 24, 2022
Oriented Object Detection: Oriented RepPoints + Swin Transformer/ReResNet

Oriented RepPoints for Aerial Object Detection The code for the implementation of “Oriented RepPoints + Swin Transformer/ReResNet”. Introduction Based

96 Dec 13, 2022
I-BERT: Integer-only BERT Quantization

I-BERT: Integer-only BERT Quantization HuggingFace Implementation I-BERT is also available in the master branch of HuggingFace! Visit the following li

Sehoon Kim 139 Dec 27, 2022
Code repo for "Towards Interpretable Deep Networks for Monocular Depth Estimation" paper.

InterpretableMDE A PyTorch implementation for "Towards Interpretable Deep Networks for Monocular Depth Estimation" paper. arXiv link: https://arxiv.or

Zunzhi You 16 Aug 12, 2022
Reproduced Code for Image Forgery Detection papers.

Image Forgery Detection With over 4.5 billion active internet users, the amount of multimedia content being shared every day has surpassed everyone’s

Umar Masud 15 Dec 06, 2022
Official implementation of Deep Burst Super-Resolution

Deep-Burst-SR Official implementation of Deep Burst Super-Resolution Publication: Deep Burst Super-Resolution. Goutam Bhat, Martin Danelljan, Luc Van

Goutam Bhat 113 Dec 19, 2022
This is a repo of basic Machine Learning!

Basic Machine Learning This repository contains a topic-wise curated list of Machine Learning and Deep Learning tutorials, articles and other resource

Ekram Asif 53 Dec 31, 2022
Converts given image (png, jpg, etc) to amogus gif.

Image to Amogus Converter Converts given image (.png, .jpg, etc) to an amogus gif! Usage Place image in the /target/ folder (or anywhere realistically

Hank Magan 1 Nov 24, 2021
Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.

Learning Associative Inference Using Fast Weight Memory This repository contains the offical code for the paper Learning Associative Inference Using F

Imanol Schlag 18 Oct 12, 2022
Supervised Sliding Window Smoothing Loss Function Based on MS-TCN for Video Segmentation

SSWS-loss_function_based_on_MS-TCN Supervised Sliding Window Smoothing Loss Function Based on MS-TCN for Video Segmentation Supervised Sliding Window

3 Aug 03, 2022
Keras-1D-NN-Classifier

Keras-1D-NN-Classifier This code is based on the reference codes linked below. reference 1, reference 2 This code is for 1-D array data classification

Jae-Hoon Shim 6 May 18, 2021
Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project

Semantic Code Search Semantic code search implementation using Tensorflow framework and the source code data from the CodeSearchNet project. The model

Chen Wu 24 Nov 29, 2022
A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution

DRSAN A Dynamic Residual Self-Attention Network for Lightweight Single Image Super-Resolution Karam Park, Jae Woong Soh, and Nam Ik Cho Environments U

4 May 10, 2022
Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training

ColossalAI An integrated large-scale model training system with efficient parallelization techniques. arXiv: Colossal-AI: A Unified Deep Learning Syst

HPC-AI Tech 7.9k Jan 08, 2023
ULMFiT for Genomic Sequence Data

Genomic ULMFiT This is an implementation of ULMFiT for genomics classification using Pytorch and Fastai. The model architecture used is based on the A

Karl 276 Dec 12, 2022
GBIM(Gesture-Based Interaction map)

手势交互地图 GBIM(Gesture-Based Interaction map),基于视觉深度神经网络的交互地图,通过电脑摄像头观察使用者的手势变化,进而控制地图进行简单的交互。网络使用PaddleX提供的轻量级模型PPYOLO Tiny以及MobileNet V3 small,使得整个模型大小约10MB左右,即使在CPU下也能快速定位和识别手势。

8 Feb 10, 2022
Tensor-based approaches for fMRI classification

tensor-fmri Using tensor-based approaches to classify fMRI data from StarPLUS. Citation If you use any code in this repository, please cite the follow

4 Sep 07, 2022