RGN2-Replica (WIP)

To eventually become an unofficial working PyTorch implementation of RGN2, a state-of-the-art model for MSA-less protein folding, intended particularly for cases where no evolutionary homologs are available (e.g. protein design).

Install

$ pip install rgn2-replica

To load sample dataset

from datasets import load_from_disk
ds = load_from_disk("data/ur90_small")
print(ds['train'][0])
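If the sample data directory is missing, the load call above fails with an error raised from inside the library. A minimal sketch of a guarded loader follows. The helper name `load_sample_dataset` is hypothetical and not part of this repo; it only assumes that the dataset lives at `data/ur90_small` relative to the working directory, as in the snippet above.

```python
from pathlib import Path


def load_sample_dataset(path="data/ur90_small"):
    """Load the sample dataset, failing early with a readable message.

    Hypothetical helper for illustration only; not part of rgn2-replica.
    """
    root = Path(path)
    if not root.is_dir():
        raise FileNotFoundError(
            f"Sample dataset not found at '{root}'. "
            "Run this from the repository root or pass the correct path."
        )
    # Imported lazily so the path check works even without `datasets` installed.
    from datasets import load_from_disk

    return load_from_disk(str(root))


# Usage (from the repository root):
# ds = load_sample_dataset()
# print(ds["train"][0])
```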

To convert to pandas for exploration

df = ds['train'].to_pandas()
df.sample(5)
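For a first look at the data beyond random rows, sequence lengths and residue frequencies are usually informative. The sketch below works on a plain list of sequence strings and uses only the standard library. The function name `sequence_stats` and the toy sequences are made up for illustration; the column that holds sequences in the sample dataset is not stated here, so `"sequence"` in the usage comment is an assumption.

```python
from collections import Counter


def sequence_stats(sequences):
    """Summarize lengths and per-residue frequencies for a list of sequences."""
    lengths = [len(s) for s in sequences]
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {
        "n": len(sequences),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "mean_len": sum(lengths) / len(lengths) if lengths else 0.0,
        # Fraction of all residues that are each amino acid letter.
        "freqs": {aa: n / total for aa, n in counts.items()} if total else {},
    }


# Toy input, not real data from the repo.
toy = ["MKTAYIAK", "GAVLI", "MKK"]
stats = sequence_stats(toy)
print(stats["n"], stats["min_len"], stats["max_len"])

# With the DataFrame above (column name is an assumption):
# stats = sequence_stats(df["sequence"].tolist())
```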

To train ProteinLM

Run the following command with default parameters

python -m scripts.lmtrainer

This starts a training run on CPU using the sample dataset in the repository directory.

TO-DO LIST: ordered by priority

Contribute:

Hey there! New ideas are welcome: open/close issues, fork the repo and share your code with a Pull Request.

Currently, the main discussion about model development is happening in this Discord server, in the /self-supervised-learning channel.

Clone this project to your computer:

git clone https://github.com/EricAlcaide/pysimplechain

Please follow this guideline on open source contribution.

Citations:

@article {Chowdhury2021.08.02.454840,
    author = {Chowdhury, Ratul and Bouatta, Nazim and Biswas, Surojit and Rochereau, Charlotte and Church, George M. and Sorger, Peter K. and AlQuraishi, Mohammed},
    title = {Single-sequence protein structure prediction using language models from deep learning},
    elocation-id = {2021.08.02.454840},
    year = {2021},
    doi = {10.1101/2021.08.02.454840},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840},
    eprint = {https://www.biorxiv.org/content/early/2021/08/04/2021.08.02.454840.full.pdf},
    journal = {bioRxiv}
}

@article{alquraishi_2019,
	author={AlQuraishi, Mohammed},
	title={End-to-End Differentiable Learning of Protein Structure},
	volume={8},
	DOI={10.1016/j.cels.2019.03.006},
	URL={https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6},
	number={4},
	journal={Cell Systems},
	year={2019},
	pages={292-301.e3}
}

Replication attempt for the Protein Folding Model

Owner

Eric Alcaide