Flenser

Have you ever been handed a dataset you've never seen before?

Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column in a dataset and outputs an HTML file noting which tests trigger for each column, alongside the relevant outputs.

Flenser is intended to be run at the earliest stages of data exploration, when you have no familiarity with the dataset. It will do its best to tell you what is actually going on in the dataset, regardless of what is supposed to be going on in the dataset.

Flenser is designed to be helpful, not 'helpful': it will not attempt to modify or make assumptions about your dataset. Instead, it applies each simple test to every column and shows you outputs that let your human brain decide what is actually going on.

Additional tests can be added by modifying the Test dataclass.
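
To give a feel for what a per-column test might look like, here is a minimal sketch. It is not Flenser's actual code: the `ColumnTest` dataclass, its fields, and the two example checks are hypothetical stand-ins for the real Test dataclass, which may be shaped quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ColumnTest:
    """Hypothetical stand-in for Flenser's Test dataclass."""
    name: str                            # label shown when the test triggers
    check: Callable[[List[str]], bool]   # returns True when the test triggers


def has_duplicates(values: List[str]) -> bool:
    # Triggers when at least one value appears more than once.
    return len(set(values)) < len(values)


def has_blanks(values: List[str]) -> bool:
    # Triggers when any value is empty or whitespace only.
    return any(v.strip() == "" for v in values)


# Adding a test means appending one more entry to this list.
TESTS = [
    ColumnTest("contains duplicates", has_duplicates),
    ColumnTest("contains blank strings", has_blanks),
]


def run_tests(columns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Apply every test to every column; report the names of those that trigger.
    return {
        col: [t.name for t in TESTS if t.check(values)]
        for col, values in columns.items()
    }


if __name__ == "__main__":
    print(run_tests({"id": ["1", "2", "3"], "city": ["Oslo", "", "Oslo"]}))
```

Nothing in the sketch modifies the data: each check only reads a column and reports whether it triggered, which matches the "show, don't fix" approach described above.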

How to run

python3 flenser.py 'filename.csv'

When it runs, Flenser prints its default list of NaN values. You may specify one or more additional NaN values by passing them after the filename, as follows:

python3 flenser.py 'filename.csv' 'nan1' 'nan2' 'nan3' ...
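
As a rough illustration of how extra NaN values from the command line could be combined with a default list, consider the sketch below. The contents of `DEFAULT_NANS`, and the helper names `build_nan_set` and `count_missing`, are assumptions made for this example; they are not taken from Flenser's source.

```python
import csv
import io

# Assumed defaults for illustration only; Flenser prints its real list when run.
DEFAULT_NANS = {"", "nan", "NaN", "NULL", "N/A"}


def build_nan_set(argv):
    # argv[0] is the script name, argv[1] the CSV filename;
    # anything after that is treated as an additional NaN value.
    return DEFAULT_NANS | set(argv[2:])


def count_missing(csv_text, nan_values):
    # Count, per column, how many cells exactly match one of the NaN values.
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value in nan_values:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    nans = build_nan_set(["flenser.py", "filename.csv", "-999", "unknown"])
    sample = "age,city\n34,Oslo\n-999,unknown\n,NULL\n"
    print(count_missing(sample, nans))
```

Here `-999` and `unknown` are counted as missing only because they were supplied as extra arguments; without them, only the empty cell and `NULL` would match.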

With thanks to

Recurse
Kelly F
Rebecca H
Azhad S
Shivam S
Christina M
Adam K
Edith V
Justin R


Owner

John McCambridge
