Overview

CausalNLP

CausalNLP is a practical toolkit for causal inference with text as treatment, outcome, or "controlled-for" variable.

Install

  1. pip install -U pip
  2. pip install causalnlp
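
To confirm that the install succeeded, one option is to query the installed version from Python. This is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration and is not part of CausalNLP.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("causalnlp")
print(found if found else "causalnlp is not installed in this environment")
```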

Usage

Example: What is the causal impact of a positive review on a product click?

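import pandas as pd
# Skip malformed rows. on_bad_lines='skip' (pandas >= 1.3) replaces
# error_bad_lines=False, which was removed in pandas 2.0.
df = pd.read_csv('sample_data/music_seed50.tsv', sep='\t', on_bad_lines='skip')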

The file music_seed50.tsv is a semi-simulated dataset from here. Columns of relevance include:

  • Y_sim: outcome, where 1 means product was clicked and 0 means not.
  • text: raw text of review
  • rating: rating associated with review (1 through 5)
  • T_true: 0 means rating less than 3, 1 means rating of 5, where T_true affects the outcome Y_sim.
  • T_ac: an approximation of true review sentiment (T_true) created with Autocoder from raw review text
  • C_true: confounding categorical variable (1=audio CD, 0=other)
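
Before fitting anything, it can help to check that a file actually contains the columns listed above. The following is a small sketch using only the standard library; the helper name missing_columns and the in-memory sample header are made up for illustration and are not taken from music_seed50.tsv.

```python
import csv
import io

# Column names taken from the list above.
REQUIRED_COLUMNS = ["Y_sim", "text", "rating", "T_true", "T_ac", "C_true"]


def missing_columns(tsv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header."""
    header = next(csv.reader(tsv_file, delimiter="\t"))
    return [name for name in required if name not in header]


# Made-up header that deliberately lacks the T_ac column.
sample = io.StringIO("Y_sim\ttext\trating\tT_true\tC_true\n")
print(missing_columns(sample))  # ['T_ac']
```

An empty list means every expected column is present.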

We'll pretend the true sentiment (i.e., review rating and T_true) is hidden and only use T_ac as the treatment variable.
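
Because this dataset keeps the true treatment alongside the proxy, the quality of the proxy can be checked directly. The sketch below computes a simple agreement rate between two label lists; the function name agreement_rate and the labels are made up for illustration, and real values would come from the T_true and T_ac columns.

```python
def agreement_rate(true_labels, proxy_labels):
    """Fraction of rows where the proxy treatment equals the true treatment."""
    if len(true_labels) != len(proxy_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, proxy_labels))
    return matches / len(true_labels)


# Made-up labels for illustration only (not taken from music_seed50.tsv).
t_true = [1, 1, 0, 0, 1, 0, 1, 0]
t_ac = [1, 0, 0, 0, 1, 1, 1, 0]
print(agreement_rate(t_true, t_ac))  # 6 of 8 rows agree -> 0.75
```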

Using the text_col parameter, we include the raw review text as another "controlled-for" variable.

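from causalnlp.causalinference import CausalInferenceModel
from lightgbm import LGBMClassifier
# T-learner with LightGBM as the base learner. The raw review text and
# C_true are both included as controlled-for covariates.
# The metalearner is selected with `method` (named metalearner_type before v0.4.0).
cm = CausalInferenceModel(df,
                         method='t-learner', learner=LGBMClassifier(num_leaves=500),
                         treatment_col='T_ac', outcome_col='Y_sim', text_col='text',
                         include_cols=['C_true'])
cm.fit()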
outcome column (categorical): Y_sim
treatment column: T_ac
numerical/categorical covariates: ['C_true']
text covariate: text
preprocess time:  1.1179866790771484  sec
start fitting causal inference model
time to fit causal inference model:  10.361494302749634  sec
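
For readers unfamiliar with the t-learner setting used above: a T-learner fits one outcome model on the treated rows and another on the control rows, then estimates an individual effect as the difference between the two predictions. The sketch below illustrates that idea in plain Python with a toy base learner (a per-group mean). It is not CausalNLP's implementation; GroupMeanModel, fit_t_learner, predict_effect, and the data are all made up for illustration.

```python
from collections import defaultdict


class GroupMeanModel:
    """Toy base learner: predicts the mean outcome seen for each covariate value."""

    def fit(self, xs, ys):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in zip(xs, ys):
            sums[x] += y
            counts[x] += 1
        self.means = {x: sums[x] / counts[x] for x in sums}
        self.overall = sum(ys) / len(ys)  # fallback for unseen covariate values
        return self

    def predict(self, x):
        return self.means.get(x, self.overall)


def fit_t_learner(xs, treatments, ys):
    """Fit one outcome model on treated rows and one on control rows."""
    treated = [(x, y) for x, t, y in zip(xs, treatments, ys) if t == 1]
    control = [(x, y) for x, t, y in zip(xs, treatments, ys) if t == 0]
    if not treated or not control:
        raise ValueError("both treatment arms need at least one row")
    model_1 = GroupMeanModel().fit(*zip(*treated))
    model_0 = GroupMeanModel().fit(*zip(*control))
    return model_0, model_1


def predict_effect(model_0, model_1, x):
    """Estimated effect: predicted outcome if treated minus predicted outcome if not."""
    return model_1.predict(x) - model_0.predict(x)


# Made-up data: x is a binary covariate, t the treatment, y the binary outcome.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ts = [1, 1, 0, 0, 1, 1, 0, 0]
ys = [1, 0, 0, 0, 1, 1, 0, 0]

m0, m1 = fit_t_learner(xs, ts, ys)
effects = [predict_effect(m0, m1, x) for x in xs]
print(effects)                      # [0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]
print(sum(effects) / len(effects))  # average over all rows: 0.75
```

In this toy data the estimated effect differs between the two covariate groups, which is what "heterogeneous" treatment effects means in practice.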

Estimating Treatment Effects

CausalNLP supports estimation of heterogeneous treatment effects (i.e., how causal impacts vary across observations, which could be documents, emails, posts, individuals, or organizations).

We will first calculate the overall average treatment effect (or ATE), which shows that a positive review increases the probability of a click by 13 percentage points in this dataset.

Average Treatment Effect (or ATE):

print( cm.estimate_ate() )
{'ate': 0.1309311542209525}

Conditional Average Treatment Effect (or CATE): reviews that mention the word "toddler":

print( cm.estimate_ate(df['text'].str.contains('toddler')) )
{'ate': 0.15559234254638685}
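
Conceptually, a conditional average is the same averaging step restricted to the rows selected by a boolean mask. The sketch below shows that relationship in plain Python. Whether CausalNLP computes it in exactly this way internally is an assumption; average_effect, the effect values, and the review texts are made up for illustration.

```python
def average_effect(effects, mask=None):
    """Mean of per-row effects, optionally restricted to rows where mask is True."""
    if mask is None:
        selected = list(effects)
    else:
        selected = [e for e, keep in zip(effects, mask) if keep]
    if not selected:
        raise ValueError("no rows selected")
    return sum(selected) / len(selected)


# Made-up per-review effect estimates and texts, for illustration only.
effects = [0.25, 0.5, 0.0, 0.75]
texts = [
    "my toddler loves this album",
    "great songs",
    "not for me",
    "bought it for a toddler birthday",
]
mask = ["toddler" in text for text in texts]

print(average_effect(effects))        # average over all rows: 0.375
print(average_effect(effects, mask))  # average over "toddler" rows: 0.5
```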

Individualized Treatment Effects (or ITE):

test_df = pd.DataFrame({'T_ac' : [1], 'C_true' : [1], 
                        'text' : ['I never bought this album, but I love his music and will soon!']})
effect = cm.predict(test_df)
print(effect)
[[0.80538201]]

Model Interpretability:

print( cm.interpret(plot=False)[1][:10] )
v_music    0.079042
v_cd       0.066838
v_album    0.055168
v_like     0.040784
v_love     0.040635
C_true     0.039949
v_just     0.035671
v_song     0.035362
v_great    0.029918
v_heard    0.028373
dtype: float64

Features with the v_ prefix are word features. C_true is the categorical variable indicating whether or not the product is a CD.
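
The naming convention makes it easy to separate word features from the other covariates. The sketch below does this for the ten scores printed above, copied into a plain dictionary; split_features is a hypothetical helper, not a CausalNLP function.

```python
# Importance scores copied from the output above.
importances = {
    "v_music": 0.079042, "v_cd": 0.066838, "v_album": 0.055168,
    "v_like": 0.040784, "v_love": 0.040635, "C_true": 0.039949,
    "v_just": 0.035671, "v_song": 0.035362, "v_great": 0.029918,
    "v_heard": 0.028373,
}


def split_features(scores, word_prefix="v_"):
    """Split scores into word features (prefix stripped) and all other features."""
    words = {name[len(word_prefix):]: score
             for name, score in scores.items() if name.startswith(word_prefix)}
    others = {name: score
              for name, score in scores.items() if not name.startswith(word_prefix)}
    return words, others


words, others = split_features(importances)
print(sorted(words, key=words.get, reverse=True)[:3])  # ['music', 'cd', 'album']
print(others)                                          # {'C_true': 0.039949}
```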

Text is Optional in CausalNLP

Despite the "NLP" in CausalNLP, the library can be used for causal inference on data without text (e.g., only numerical and categorical variables). See the examples for more info.

Documentation

API documentation and additional usage examples are available at: https://amaiya.github.io/causalnlp/

How to Cite

Please cite the following paper when using CausalNLP in your work:

@article{maiya2021causalnlp,
    title={CausalNLP: A Practical Toolkit for Causal Inference with Text},
    author={Arun S. Maiya},
    year={2021},
    eprint={2106.08043},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    journal={arXiv preprint arXiv:2106.08043},
}

Comments
  • Does your model support other languages than English?

    Hi Amaiya, Thanks for your great package. Would you kindly let me know if your package supports languages other than English when using CausalBert?

    I'm also interested in knowing whether I can exploit other Transformers models from the Huggingface hub?

    opened by behroozazarkhalili
  • Error while fitting the model

    Hi,

    I ran into this bug while fitting the model. I checked the data and everything looks good. I don't get the root cause of this error.

    File /opt/conda/lib/python3.8/site-packages/causalnlp/meta/slearner.py:80, in BaseSLearner.fit(self, X, treatment, y, p)
         78 mask = (treatment == group) | (treatment == self.control_name)
         79 treatment_filt = treatment[mask]
    ---> 80 X_filt = X[mask]
         81 y_filt = y[mask]
         83 w = (treatment_filt == group).astype(int)
    
    IndexError: boolean index did not match indexed array along dimension 0
    
    opened by hfarhidzadeh
Releases(v0.7.0)
  • v0.7.0(Aug 2, 2022)

  • v0.6.0(Oct 20, 2021)

    0.6.0 (2021-10-20)

    New:

    • Added model_name parameter to CausalBertModel to support other DistilBert models (e.g., multilingual)

    Changed:

    • N/A

    Fixed:

    • N/A
  • v0.5.0(Sep 3, 2021)

    0.5.0 (2021-09-03)

    New:

    • Added support for CausalBert

    Changed:

    • Added p parameter to CausalInferenceModel.fit and CausalInferenceModel.predict for user-supplied propensity scores in X-Learner and R-Learner.
    • Removed CV from propensity score computations in X-Learner and R-Learner and increased the default max_iter to 10000

    Fixed:

    • Resolved problem with CausalInferenceModel.tune_and_use_default_learner when outcome is continuous
    • Changed to max_iter=10000 for default LogisticRegression base learner
  • v0.4.0(Sep 3, 2021)

    0.4.0 (2021-07-20)

    New:

    • N/A

    Changed:

    • Use LinearRegression and LogisticRegression as default base learners for s-learner.
    • Changed parameter name of metalearner_type to method in CausalInferenceModel.

    Fixed:

    • Resolved mis-references in _balance method (renamed from _minimize_bias).
    • Fixed convergence issues and factored out propensity score computations to CausalInferenceModel.compute_propensity_scores.
  • v0.3.1(Jul 19, 2021)

  • v0.3.0(Jul 15, 2021)

    0.3.0 (2021-07-15)

    New:

    • Added CausalInferenceModel.evaluate_robustness method to assess robustness of causal estimates using sensitivity analysis

    Changed:

    • Reduced dependencies with local metalearner implementations

    Fixed:

    • N/A
  • v0.2.0(Jun 21, 2021)

  • v0.1.3(Jun 17, 2021)

  • v0.1.2(Jun 17, 2021)

    0.1.2 (2021-06-17)

    New:

    • N/A

    Changed:

    • Better interpretability and explainability of treatment effects

    Fixed:

    • Fixes to some bugs in preprocessing
  • v0.1.1(Jun 17, 2021)

  • v0.1.0(Jun 16, 2021)

Owner
Arun S. Maiya
computer scientist