SurvTRACE: Transformers for Survival Analysis with Competing Events

Overview

This repo provides the implementation of SurvTRACE for survival analysis. It is easy to use with only the following code:

from survtrace.dataset import load_data
from survtrace.model import SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use METABRIC dataset
STConfig['data'] = 'metabric'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize model
model = SurvTraceSingle(STConfig)

# execute training
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluating
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))

print("done!")

🔥 See the demo

Please refer to experiment_metabric.ipynb and experiment_support.ipynb!

🔥 How to configure the environment

Use our pre-saved conda environment!

conda env create --name survtrace --file=survtrace.yml
conda activate survtrace

or install from requirements.txt:

pip3 install -r requirements.txt

🔥 How to get SEER data

  1. Go to https://seer.cancer.gov/data/ and submit a data request to SEER following the guide there.

  2. After completing step one, you should receive access to the SEER*Stat software. Open it and sign in with the username and password sent by SEER.

  3. Use SEER*Stat to open the ./data/seer.sl file, then click the 'execute' icon to query the SEER database. This yields a csv file.

  4. Move the csv file to ./data/seer_raw.csv, then run the python script process_seer.py, as

    python process_seer.py

    This produces the processed SEER data named seer_processed.csv. A quick sanity check of the processed file is sketched below.
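As a quick sanity check (a minimal sketch, assuming process_seer.py writes its output to ./data/seer_processed.csv as described above), you can load the processed file with pandas and inspect it:

import pandas as pd

# load the processed SEER file produced by process_seer.py
df_seer = pd.read_csv('./data/seer_processed.csv')

# inspect the number of patients and the processed columns
print(df_seer.shape)
print(df_seer.columns.tolist())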

📝 Functions

  • Single-event survival analysis
  • Competing-events survival analysis (see the sketch below)
  • Multi-task learning
  • Automatic hyperparameter grid search
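
For competing events, the workflow mirrors the single-event quick-start above. The sketch below is illustrative only: it assumes a SurvTraceMulti class as the multi-event counterpart of SurvTraceSingle, and that 'seer' is a dataset key accepted by load_data for the processed SEER data; check survtrace/model.py and survtrace/dataset.py for the exact names.

from survtrace.dataset import load_data
from survtrace.model import SurvTraceMulti   # assumed multi-event counterpart of SurvTraceSingle
from survtrace import Evaluator
from survtrace import Trainer
from survtrace import STConfig

# use the processed SEER data (dataset key assumed; see survtrace/dataset.py)
STConfig['data'] = 'seer'
df, df_train, df_y_train, df_test, df_y_test, df_val, df_y_val = load_data(STConfig)

# initialize the competing-events model
model = SurvTraceMulti(STConfig)

# train exactly as in the single-event example
trainer = Trainer(model)
trainer.fit((df_train, df_y_train), (df_val, df_y_val))

# evaluate on the held-out test split
evaluator = Evaluator(df, df_train.index)
evaluator.eval(model, (df_test, df_y_test))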

😄 If you find this work interesting, please consider citing this paper:

@article{wang2021survtrace,
      title={Surv{TRACE}: Transformers for Survival Analysis with Competing Events}, 
      author={Zifeng Wang and Jimeng Sun},
      year={2021},
      eprint={2110.00855},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}