A framework for feature exploration in Data Science

Related tags

ScienceBeehive
Overview

Beehive

A framework for feature exploration in Data Science

Background

What do we do when we finish one episode of feature exploration in a jupyter notebook?
We overwrite the notebook or clone the notebook for the next episode of feature exploration.

I say this is inefficient, thus this project.

Purpose

To help documentation for grouping all episode of feature exploration in a single notebook.

Project Elaboration

terms

episode: one set of feature trained on a model.
e.g.
episode_1 = (sepal_length, sepal_width) | (target_class)
episode_2 = (sepal_length, sepal_width, petal_length, petal_width) | (target_class)
episode_3 = (sepal_length_normalized, sepal_width_normalized, petal_length_normalized, petal_width_normalized) | (target_class)

get to know the classes

Combee

One Combee should represent one model and its features includes: (metrics, confusion matrix, feature importances)
(sklearn-based)

CombeKeras

One Combee should represent one model and its features includes: (metrics, confusion matrix)
(keras-based)
needed because Keras has different features and functions compared to sklearn

Vespiqueen

One Vespiqueen should represent one episode.
Vespiqueen memorizes the tranformation of data.
Vespiqueen governs all the Combees.
Can be used to pick which model is best in a Vespiqueen.

Beehive

Beehive governs the whole Vespiqueen.
Beehive is used to print all the metrics across Vespiqueen (or episodes)
Can also be used to pick new features from the last or best vespiqueen for the next episode.

How to Use

Owner
Steven IJ
Steven IJ
Discontinuous Galerkin finite element method (DGFEM) for Maxwell Equations

DGFEM Maxwell Equations Discontinuous Galerkin finite element method (DGFEM) for Maxwell Equations. Work in progress. Currently, the 1D Maxwell equati

Rafael de la Fuente 9 Aug 16, 2022
collection of interesting Computer Science resources

collection of interesting Computer Science resources

Kirill Bobyrev 137 Dec 22, 2022
An interactive explorer for single-cell transcriptomics data

an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra

Chan Zuckerberg Initiative 424 Dec 15, 2022
Doing bayesian data analysis - Python/PyMC3 versions of the programs described in Doing bayesian data analysis by John K. Kruschke

Doing_bayesian_data_analysis This repository contains the Python version of the R programs described in the great book Doing bayesian data analysis (f

Osvaldo Martin 851 Dec 27, 2022
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Spack Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, a

Spack 3.1k Dec 31, 2022
Python Data Science Handbook: full text in Jupyter Notebooks

Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th

Jake Vanderplas 36.9k Dec 28, 2022
An open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure

CellProfiler 734 Jan 08, 2023
Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code

A Python framework for creating reproducible, maintainable and modular data science code.

QuantumBlack Labs 7.9k Jan 01, 2023
CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation. The code should run on any Linux system, from massively parallel computer

Jeppe Dakin 62 Dec 08, 2022
Algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos

Bioinformatics This is a repository of all the algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos Algorithm

16 Jun 30, 2022
A mathematica expression evaluator with PokemonTypes

A simple mathematical expression evaluator that uses Pokemon types to replace symbols.

Arnav Jindal 2 Nov 14, 2021
Open Delmic Microscope Software

Odemis Odemis (Open Delmic Microscope Software) is the open-source microscopy software of Delmic B.V. Odemis is used for controlling microscopes of De

Delmic 32 Dec 14, 2022
Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica

Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica. It is free both as in "free beer" and as in "freedom".

Mathics 535 Jan 04, 2023
SeqLike - flexible biological sequence objects in Python

SeqLike - flexible biological sequence objects in Python Introduction A single object API that makes working with biological sequences in Python more

186 Dec 23, 2022
A modular single-molecule analysis interface

MOSAIC: A modular single-molecule analysis interface MOSAIC is a single molecule analysis toolbox that automatically decodes multi-state nanopore data

National Institute of Standards and Technology 35 Dec 13, 2022
Efficient Python Tricks and Tools for Data Scientists

Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

Khuyen Tran 944 Dec 28, 2022
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm

267 Jan 01, 2023
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage

Jon C Cline 0 Sep 05, 2021
Program that estimates antiderivatives utilising Maclaurin series.

AntiderivativeEstimator Program that estimates antiderivatives utilising Maclaurin series. Setup: Needs Python 3 and Git installed and added to PATH.

James Watson 3 Aug 04, 2021
CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work

ckan 3.6k Dec 27, 2022