weightedcalcs
weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more.
Features
- Plays well with pandas.
- Support for weighted means, medians, quantiles, standard deviations, and distributions.
- Support for grouped calculations, using DataFrameGroupBy objects.
- Raises an error when your data contains null-values.
- Full test coverage.
Installation
pip install weightedcalcs
Usage
Getting started
Every weighted calculation in weightedcalcs begins with an instance of the weightedcalcs.Calculator class. Calculator takes one argument: the name of your weighting variable. So if you're analyzing a survey where the weighting variable is called "resp_weight", you'd do this:
import weightedcalcs as wc
calc = wc.Calculator("resp_weight")
Types of calculations
Currently, weightedcalcs.Calculator supports the following calculations:
- calc.mean(my_data, value_var): The weighted arithmetic average of value_var.
- calc.quantile(my_data, value_var, q): The weighted quantile of value_var, where q is between 0 and 1.
- calc.median(my_data, value_var): The weighted median of value_var, equivalent to .quantile(...) where q=0.5.
- calc.std(my_data, value_var): The weighted standard deviation of value_var.
- calc.distribution(my_data, value_var): The weighted proportions of value_var, interpreting value_var as categories.
- calc.count(my_data): The weighted count of all observations, i.e., the total weight.
- calc.sum(my_data, value_var): The weighted sum of value_var.
The my_data parameter above should be one of the following:
- A pandas DataFrame object
- A pandas DataFrame.groupby object
- A plain Python dictionary where the keys are column names and the values are equal-length lists.
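To make the arithmetic behind count, sum, and mean concrete, here is a minimal plain-Python sketch that works on the dictionary input shape described above. It is not how weightedcalcs is implemented; the helper names (weighted_count, weighted_sum, weighted_mean), the "income" column, and the numbers are all made up for illustration.

```python
# Toy data in the dictionary shape described above: keys are column names,
# values are equal-length lists. The numbers are invented for illustration.
data = {
    "income": [10.0, 20.0, 30.0],
    "resp_weight": [1.0, 1.0, 2.0],
}


def weighted_count(data, weight_var):
    """Total weight across all observations."""
    return sum(data[weight_var])


def weighted_sum(data, value_var, weight_var):
    """Sum of each value multiplied by its weight."""
    return sum(v * w for v, w in zip(data[value_var], data[weight_var]))


def weighted_mean(data, value_var, weight_var):
    """Weighted sum divided by the total weight."""
    return weighted_sum(data, value_var, weight_var) / weighted_count(data, weight_var)


count = weighted_count(data, "resp_weight")         # 1 + 1 + 2
total = weighted_sum(data, "income", "resp_weight")  # 10*1 + 20*1 + 30*2
mean = weighted_mean(data, "income", "resp_weight")  # 90 / 4
print(count, total, mean)  # prints: 4.0 90.0 22.5
```

The third respondent carries twice the weight of the others, so the weighted mean (22.5) sits above the unweighted mean (20.0).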
Basic example
Below is a basic example of using weightedcalcs to find what percentage of Wyoming residents are married, divorced, et cetera:
import pandas as pd
import weightedcalcs as wc
# Load the 2015 American Community Survey person-level responses for Wyoming
responses = pd.read_csv("examples/data/acs-2015-pums-wy-simple.csv")
# `PWGTP` is the weighting variable used in the ACS's person-level data
calc = wc.Calculator("PWGTP")
# Get the distribution of marriage-status responses
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)
# -- Output --
# marriage_status
# Married 0.425
# Never married or under 15 years old 0.421
# Divorced 0.097
# Widowed 0.046
# Separated 0.012
# Name: PWGTP, dtype: float64
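For intuition about what a weighted distribution represents, here is a small plain-Python sketch: each category's share is its total weight divided by the total weight of all responses. This is an illustration under assumed toy data, not the library's code; the function name weighted_distribution and the values below are hypothetical and are not taken from the ACS file.

```python
from collections import defaultdict

# Made-up responses and weights, purely for illustration (not ACS data).
statuses = ["Married", "Divorced", "Married", "Widowed"]
weights = [3.0, 1.0, 2.0, 2.0]


def weighted_distribution(values, weights):
    """Return each category's share of the total weight."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight  # accumulate weight per category
    grand_total = sum(totals.values())
    return {value: total / grand_total for value, total in totals.items()}


shares = weighted_distribution(statuses, weights)
print(shares)  # prints: {'Married': 0.625, 'Divorced': 0.125, 'Widowed': 0.25}
```

The shares always sum to 1, which is why the proportions in the output above add up to roughly 1 after rounding.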
More examples
See this notebook for examples of other calculations, including grouped calculations.
Max Ghenis has created a version of the example notebook that can be run directly in your browser, via Google Colab.
Weightedcalcs in the wild
- "Procesando los microdatos de la Encuesta Permanente de Hogares," by Manuel Aristarán
- BuzzFeedNews/2017-01-media-platform-and-news-trust-survey
- BuzzFeedNews/2016-12-transgender-rights-survey
