Tutorial in Python targeted at Epidemiologists. Will discuss the basics of analysis in Python 3

Overview

Python-for-Epidemiologists

Join the chat at https://gitter.im/zEpid/community DOI

This repository is an introduction to epidemiology analyses in Python. Additionally, the tutorials for my library zEpid are hosted here. For more information on zEpid, see GitHub or ReadTheDocs.

The directory of this guide is

  1. Python Basics
  2. Basics of pandas (data management library)
  3. Epidemiology analyses in Python
    1. Basics
    2. Missing data
    3. Causal inference
      1. Time-fixed treatments
      2. Time-varying treatments
    4. Sensitivity analyses

Required packages for tutorial

To complete the tutorial, user must have the following packages installed: numpy, pandas, zepid, matplotlib, statsmodels, lifelines, and sklearn

IDE (Integrated Development Environment)

No IDE is required to complete the tutorial. All files are available in ipynb also known as jupyter notebooks. Code can be either downloaded or copied from the notebooks.

Here are some IDEs I have used in the past (and what I believe to be their advantages and disadvantages

Rodeo

This is the IDE I used for a long time. It is set up like RStudio

Advantages:

Basically RStudio but for Python, decent interface, easy to run line-by-line, easy to visualize plots (although it encourage bad habits)

Disadvantages:

Does not have all the features of RStudio (will delete changes if closed without saving), sucks up a lot of memory, sometimes the auto-complete would stop working if I hit more than 300+ lines of code, the environment tab is not great (don't expect it to open anything like RStudio)

Aside: their website has great tutorials how to run some basic stuff in Python if you are new to analysis in Python https://rodeo.yhat.com/

jupyter notebooks

Designed to be like a lab notebook, or like R markdown. Supports a pseudo-line-by-line concept Good for writing, since it allows for MarkDown. While I know a lot of people like jupyter, I only really use it for examples of code, not my personal programming. I never liked how it had to open via a Web Browser. I would rather have it be separate program. However, all guides were made using this IDE

PyCharm

This is the IDE I currently use

Advantages:

Easily set up virtual environments, interacts natively with Git, supports different file formats with plug-ins (e.g. .md), enforces certain coding conventions, better debug code features, organization of files under the project tab are convenient

Disadvantages:

Not great for running line-by-line code (it can do it, just not as elegantly), little more hardcore (I wouldn't really consider it a beginner's IDE. It requires some knowledge of set-up of Python)

IDLE

Ships with the basic Python 3.x installation. It is very basic and does not support line-by-line. Wouldn't recommend unless you are just starting with Python and don't want to commit to an IDE yet

Spyder

Ships with conda. Not bad but I didn't use it that much (I couldn't get the hang of it). Similarly it is an RStudio copy. Can't say too much since I haven't used it extensively

Basic Introduction to Python

If you have never used Python before, I have created some introductory materials to Python and the data management library I use, pandas. These are basic guides, but they also point to other resources. Please READ ALL OF THE BELOW BEFORE PROCEEDING.

Installing Python

To install, Python 3.x, we can download it directly from: https://www.python.org/downloads/

The installer provides an option to add Python3 to PATH, it is highly recommended you do this, since it allows you to avoid having to do it manually

Open Command Prompt / Terminal. When opened, type python and this should open Python in the same window. From here, you can quit by typing 'quit()' or closing the window. If this does NOT work, make sure your environmental variable was created properly

Installing Python Packages

Packages are what stores Python functions that we will use. These packages are contributed by various members of the community (including me)) and there is a wide array. To be able to download packages, we need to make sure we have an environmental variable created for python. We will discuss how to install packages

Python 3.x conveniently comes with a package manager. Basically it stores all the packages and we can use it to download new ones or update already downloaded ones.

To download a new package: Open Command Prompt/Terminal and use the following code (we will be installing pandas)

pip install pandas

To update a Python package, type the following command into Command Prompt. For example, we will update our pandas package

pip install pandas --upgrade

That concludes the basics. Please review parts 1 and 2 of the tutorials next

Comments
  • Cochran-Mantel-Haenszel

    Cochran-Mantel-Haenszel

    Thank you @pzivich for this amazing resource. Having the Hernan/Robbins causal model code in python is super helpful... g-estimation!

    I have a request... do you have a Cochran-Mantel-Haenszel script? If you get the chance, please, it would be useful to us to have in this repo. Thank you in advance!

    opened by opioiddatalab 2
  • Slight changes in Incidence Rate Ratio

    Slight changes in Incidence Rate Ratio

    Incidence Ratio Rate Paragraph

    • Fixed repetition
    • T1 & T0 are defined the same way. I believe that T0 is the person-time contributed by people NOT treated with ART
    opened by jaimiles23 0
  • Updates for v0.8.0

    Updates for v0.8.0

    Checklist for various notebooks to update with v0.8.0 release (hasn't released yet)

    • [x] IPTW update. Lots of major changes, so notebook needs to be completely overhauled

    • [x] Demonstrate new diagnostic functions for IPTW, g-formula, AIPW, TMLE

    • [x] Demonstrate g-bound argument

    • [x] Remove TMLE machine learning custom models. This is being removed in favor of cross-fitting. Can leave how to apply for now, but add the warning and mention will be cut in v0.9.0

    opened by pzivich 0
  • Notebooks not rendering in GitHub

    Notebooks not rendering in GitHub

    Sometimes GitHub has trouble rendering the notebooks. AFAIK the rendering system is behind the scenes at GitHub. Others have this same problem across repos and it sometimes occurs to me as well.

    If the notebook won't render in GitHub, you can copy the URL to the notebook you want to view and use the following site to view the notebook: https://nbviewer.jupyter.org/

    opened by pzivich 0
  • Replicate

    Replicate "Causal Inference"

    Issue to track progress on implementation of Hernan and Robins "Causal Inference" chapters

    • [x] Chapter 12: Inverse probability weights

    • [x] Chapter 13: Parametric g-formula

    • [x] Chapter 14: G-estimation of structural nested models

    • [x] ~Chapter 16: G-estimation for IV analysis~

    • [ ] Chapter 17: Causal survival analysis

    • [ ] Part III: Time-varying treatments

    ~G-estimation is not currently implemented. I will need to implement these before chapter 14 can be done.~

    Currently there are no plans to replicate Chapter 15 (propensity scores and regression) or Chapter 16 (instrumental variables) since the first method does not require zEpid and I am unfamiliar with the second. Maybe instrumental variables will be added in the future?

    For Chapter 16, I am considering demonstrating the usage of g-estimation instead of two-stage least-squares. Specifically, using the same data as done in Chapter 16 but following Technical Point 16.3

    enhancement 
    opened by pzivich 0
  • Tutorials

    Tutorials

    On the website, create quick tutorials demonstrating each of the implemented estimators, descriptions of how they work, and why you might want to use them. Might be more digestible than the current docs (also better justify why to choose one over the other)

    Reference to base on https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html https://github.com/CamDavidsonPilon/lifelines/blob/master/docs/jupyter_notebooks/Proportional%20hazard%20assumption.ipynb

    TODO

    • [x] Basic measures

    • [x] splines

    • [x] IPTW: time-fixed treatment

    • [ ] IPTW: stochastic treatment

    • [ ] IPTW: time-varying treatment

    • [x] IPCW

    • [x] IPMW: single variable

    • [ ] IPMW: monotone

    • [ ] IPMW: nonmonotone (to add after implemented)

    • [x] G-formula: time-fixed binary treatment, binary outcome

    • [x] G-formula: time-fixed categorical treatment, binary outcome

    • [ ] G-formula: time-fixed continuous treatment, binary outcome (to add after implemented)

    • [x] G-formula: time-fixed binary treatment, continuous outcome

    • [x] G-formula: Monte Carlo

    • [x] G-formula: Iterative Conditional

    • [x] G-estimation of SNM

    • [x] AIPTW

    • [ ] AIPMW

    • [x] TMLE

    • [x] TMLE: stochastic treatment

    • [ ] LTMLE (to add after implemented)

    • [x] Quantitative bias analysis

    • [x] Functional form assessment

    • [x] Generalizability

    • [ ] Transportability (IPSW, g-transport, AIPSW)

    • [x] Monte Carlo g-formula by-hand (helps to explain underlying process)

    opened by pzivich 1
Releases(v0.8.0)
Owner
Paul Zivich
Epidemiology post-doc working in epidemiologic methods and infectious diseases.
Paul Zivich
(Preprint) Official PyTorch implementation of "How Do Vision Transformers Work?"

(Preprint) Official PyTorch implementation of "How Do Vision Transformers Work?"

xxxnell 656 Dec 30, 2022
ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

ESGD-M - A stochastic non-convex second order optimizer, suitable for training deep learning models, for PyTorch

Katherine Crowson 53 Dec 29, 2022
Torch-ngp - A pytorch implementation of the hash encoder proposed in instant-ngp

HashGrid Encoder (WIP) A pytorch implementation of the HashGrid Encoder from ins

hawkey 1k Jan 01, 2023
Official Pytorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Scene Representation Networks This is the official implementation of the NeurIPS submission "Scene Representation Networks: Continuous 3D-Structure-Aw

Vincent Sitzmann 365 Jan 06, 2023
patchmatch和patchmatchstereo算法的python实现

patchmatch patchmatch以及patchmatchstereo算法的python版实现 patchmatch参考 github patchmatchstereo参考李迎松博士的c++版代码 由于patchmatchstereo没有做任何优化,并且是python的代码,主要是方便解析算

Sanders Bao 11 Dec 02, 2022
[WACV 2022] Contextual Gradient Scaling for Few-Shot Learning

CxGrad - Official PyTorch Implementation Contextual Gradient Scaling for Few-Shot Learning Sanghyuk Lee, Seunghyun Lee, and Byung Cheol Song In WACV 2

Sanghyuk Lee 4 Dec 05, 2022
Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

TR-BERT Source code and dataset for "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference". The code is based on huggaface's transformers.

THUNLP 37 Oct 30, 2022
Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition"

Tensorflow Implementation for "Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition" Pre-trained Deep Convo

Ankush Malaker 5 Nov 11, 2022
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)

PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering Jang Hyun Cho1, Utkarsh Mall2, Kavita Bala2, Bharath Harihar

Jang Hyun Cho 164 Dec 30, 2022
Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Chen Guo 58 Dec 24, 2022
This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation This repository contains PyTorch evaluation code, trainin

Meta Research 45 Dec 20, 2022
FS2KToolbox FS2K Dataset Towards the translation between Face

FS2KToolbox FS2K Dataset Towards the translation between Face -- Sketch. Download (photo+sketch+annotation): Google-drive, Baidu-disk, pw: FS2K. For

Deng-Ping Fan 5 Jan 03, 2023
Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

yzf 1 Jun 12, 2022
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting Created by Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Z

Yongming Rao 322 Dec 31, 2022
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
CondenseNet: Light weighted CNN for mobile devices

CondenseNets This repository contains the code (in PyTorch) for "CondenseNet: An Efficient DenseNet using Learned Group Convolutions" paper by Gao Hua

Shichen Liu 690 Nov 30, 2022
A PyTorch implementation of SIN: Superpixel Interpolation Network

SIN: Superpixel Interpolation Network This is is a PyTorch implementation of the superpixel segmentation network introduced in our PRICAI-2021 paper:

6 Sep 28, 2022
Meta Representation Transformation for Low-resource Cross-lingual Learning

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning This repo hosts the code for MetaXL, published at NAACL 2021. [Meta

Microsoft 36 Aug 17, 2022
pix2pix in tensorflow.js

pix2pix in tensorflow.js This repo is moved to https://github.com/yining1023/pix2pix_tensorflowjs_lite See a live demo here: https://yining1023.github

Yining Shi 47 Oct 04, 2022
Code for Efficient Visual Pretraining with Contrastive Detection

Code for DetCon This repository contains code for the ICCV 2021 paper "Efficient Visual Pretraining with Contrastive Detection" by Olivier J. Hénaff,

DeepMind 56 Nov 13, 2022