Data imputations library to preprocess datasets with missing data

Last update: Dec 05, 2022

Overview

Impyute

Impyute is a library of missing data imputation algorithms. It is designed to be lightweight; here's a sneak peek at what impyute can do.

>>> import numpy as np
>>> import impyute as impy
>>> n = 5
>>> arr = np.random.uniform(high=6, size=(n, n))
>>> # Blank out up to three randomly chosen cells (positions may repeat)
>>> for _ in range(3):
...     arr[np.random.randint(n), np.random.randint(n)] = np.nan
...
>>> arr
array([[0.25288643, 1.8149261 , 4.79943748, 0.54464834,        nan],
       [4.44798362, 0.93518716, 3.24430922, 2.50915032, 5.75956805],
       [0.79802036,        nan, 0.51729349, 5.06533123, 3.70669172],
       [1.30848217, 2.08386584, 2.29894541,        nan, 3.38661392],
       [2.70989501, 3.13116687, 0.25851597, 4.24064355, 1.99607231]])
>>> # Each missing value is replaced by the mean of its column
>>> impy.mean(arr)
array([[0.25288643, 1.8149261 , 4.79943748, 0.54464834, 3.7122365 ],
       [4.44798362, 0.93518716, 3.24430922, 2.50915032, 5.75956805],
       [0.79802036, 1.99128649, 0.51729349, 5.06533123, 3.70669172],
       [1.30848217, 2.08386584, 2.29894541, 3.08994336, 3.38661392],
       [2.70989501, 3.13116687, 0.25851597, 4.24064355, 1.99607231]])
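For readers who want to see what the example above is doing, here is a minimal NumPy-only sketch of column-mean imputation. It is not impyute's implementation; the function name `column_mean_impute` and the `demo` array are hypothetical and exist only for illustration.

```python
import numpy as np


def column_mean_impute(arr):
    """Return a copy of `arr` with each NaN replaced by the mean of its column.

    Hypothetical helper for illustration; not part of impyute.
    """
    filled = np.array(arr, dtype=float)          # work on a copy
    col_means = np.nanmean(filled, axis=0)       # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))      # coordinates of missing cells
    filled[rows, cols] = col_means[cols]         # fill each gap with its column mean
    return filled


demo = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 8.0]])
print(column_mean_impute(demo))
```

In this sketch a column that contains only NaN values stays NaN, because there is nothing to average.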

Feature Support

- Imputation of Cross Sectional Data
  - K-Nearest Neighbours
  - Multivariate Imputation by Chained Equations
  - Expectation Maximization
  - Mean Imputation
  - Mode Imputation
  - Median Imputation
  - Random Imputation
- Imputation of Time Series Data
  - Last Observation Carried Forward
  - Moving Window
  - Autoregressive Integrated Moving Average (WIP)
- Diagnostic Tools
  - Loggers
  - Distribution of Null Values
  - Comparison of imputations
  - Little's MCAR Test (WIP)
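To give a feel for the time series methods listed above, the following is a rough NumPy sketch of the idea behind Last Observation Carried Forward on a 1-D series. The function `locf` here is a hypothetical stand-in written for this example, not impyute's own function, and its handling of leading gaps (left as NaN) is an assumption that may differ from the library.

```python
import numpy as np


def locf(series):
    """Fill each NaN with the most recent non-NaN value before it.

    Hypothetical illustration; leading NaNs have no earlier
    observation and are left as NaN in this sketch.
    """
    values = np.array(series, dtype=float)
    valid = ~np.isnan(values)
    # For every position, the index of the latest valid observation so far.
    last_valid = np.where(valid, np.arange(values.size), 0)
    last_valid = np.maximum.accumulate(last_valid)
    return values[last_valid]


readings = [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan]
print(locf(readings))
```

Carrying the last value forward assumes the series changes slowly between observations; for noisier data a windowed method such as Moving Window may be a better fit.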

Versions

Currently tested on Python 2.7, 3.4, 3.5, 3.6 and 3.7.

Installation

To install impyute, run the following:

$ pip install impyute

Or to get the most current version:

$ git clone https://github.com/eltonlaw/impyute
$ cd impyute
$ python setup.py install
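After either install route, one way to confirm the package is visible to your interpreter is to ask the standard library for its installed version. This is only a sketch: the helper `installed_version` is hypothetical, and `importlib.metadata` requires Python 3.8 or newer, which is later than the versions listed above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent.

    Hypothetical helper for illustration.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if impyute is installed, otherwise None.
print(installed_version("impyute"))
```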

Documentation

Documentation is available here: http://impyute.readthedocs.io/

How to Contribute

Check out CONTRIBUTING

Owner

Elton Law
