Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Overview

featureclass

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

This library helps define a featureclass.
featureclass is inspired by dataclass, and is meant to provide alternative way to define features engineering classes.

I have noticed that the below code is pretty common when doing feature engineering:

from statistics import variance
from math import sqrt
class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint),
        out['stdev'] = self.calc_std(out['var'])
        return out
        
    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)

Some things were missing for me from this type of implementation:

  1. Implicit dependencies between features
  2. No simple schema
  3. No documentation for features
  4. Duplicate declaration of the same feature - once as a function and one as a dict key

This is why I created this library.
I turned the above code into this:

float: """Calc stdev""" return sqrt(self.var) print(feature_names(MyFeatures)) # ('var', 'stdev') print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float} print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898} print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5) ">
from featureclass import feature, featureclass, feature_names, feature_annotations, asDict, asDataclass
from statistics import variance
from math import sqrt

@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint
    
    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)

print(feature_names(MyFeatures)) # ('var', 'stdev')
print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float}
print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898}
print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5)

The feature decorator is using cached_property to cache the feature calculation,
making sure that each feature is calculated once per datapoint

You might also like...
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

Bazel rules to install Python dependencies with Poetry

rules_python_poetry Bazel rules to install Python dependencies from a Poetry project. Works with native Python rules for Bazel. Getting started Add th

An assistant to guess your pip dependencies from your code, without using a requirements file.

Pip Sala Bim is an assistant to guess your pip dependencies from your code, without using a requirements file. Pip Sala Bim will tell you which packag

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.
Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it. This repo will help you make a webserver with a bit of console controls.

A Puzzle A Day Keep the Work Away

A Puzzle A Day Keep the Work Away No moyu again!

Keep your company's passwords behind the firewall

TeamVault TeamVault is an open-source web-based shared password manager for behind-the-firewall installation. It requires Python 3.3+ and Postgres (wi

A 100% python file organizer. Keep your computer always organized!

PythonOrganizer A 100% python file organizer. Keep your computer always organized! To run the project, just clone the folder and run the installation

School helper, helps you at your pyllabus's.
School helper, helps you at your pyllabus's.

pyllabus, helps you at your syllabus's... WARNING: It won't run without config.py! You should add config.py yourself, it will include your APIKEY. e.g

Ssma is a tool that helps you collect your badges in a satr platform
Ssma is a tool that helps you collect your badges in a satr platform

satr-statistics-maker ssma is a tool that helps you collect your badges in a satr platform 🎖️ Requirements python = 3.7 Installation first clone the

Releases(0.3.0)
  • 0.3.0(Jan 19, 2022)

    What's Changed

      • rename asDict to as_dict and asDataclass to as_dataclass by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/4

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.1...0.3.0

    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Jan 11, 2022)

    What's Changed

    • docs, formatting and tests by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/1
    • fine-tune mypy type ignore by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/3

    New Contributors

    • @Itayazolay made their first contribution in https://github.com/Itayazolay/featureclass/pull/1

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.0...0.2.1

    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Jan 9, 2022)

Movie recommend community

README 0. 초록 1) 목적 사용자의 Needs를 기반으로 영화를 추천해주는 커뮤니티 서비스 구현 2) p!ck 서비스란? "pick your taste!" 취향대로 영화 플레이리스트(이하 서비스 내에서의 명칭인 '바스켓'이라 함)를 만들고, 비슷한 취향을 가진

2 Dec 08, 2021
🦠 A simple and fast (< 200ms) API for tracking the global coronavirus (COVID-19, SARS-CoV-2) outbreak.

🦠 A simple and fast ( 200ms) API for tracking the global coronavirus (COVID-19, SARS-CoV-2) outbreak. It's written in python using the 🔥 FastAPI framework. Supports multiple sources!

Marius 1.6k Jan 04, 2023
carrier.py is a Python package/module that's used to save time when programming

carrier.py is a Python package/module that's used to save time when programming, it helps with functions such as 24 and 12 hour time, Discord webhooks, etc

Zacky2613 2 Mar 20, 2022
Make discord server By Coding!

Discord Server Maker Make discord server by Coding! FAQ How can i get role permissons? Open discord with chrome developer tool, go to network and clic

1 Jul 17, 2022
High-level bindings to the Valhalla framework.

Valhalla for Python This spin-off project simply offers improved Python bindings to the fantastic Valhalla project. Installation pip install valhalla

GIS • OPS 20 Dec 13, 2022
Data Poisoning based on Adversarial Attacks using Non-Robust Features

Data Poisoning based on Adversarial Attacks using Non-Robust Features Usage python main.py [-h] [--gpu | -g GPU] [--eps |-e EPSILON] [--pert | -p PER

Jonathan E. 1 Nov 02, 2021
Show my read on kindle this year

Show my kindle status on GitHub

yihong 26 Jun 20, 2022
Tool for running a high throughput data ingestion/transformation workload with MongoDB

Mongo Mangler The mongo-mangler tool is a lightweight Python utility, which you can run from a low-powered machine to execute a high throughput data i

Paul Done 9 Jan 02, 2023
A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

A python script based on OpenCV-Python, you can automatically hang up the Destiny 2 Throne to get the Dawning Essence.

1 Dec 19, 2021
Python scripts to interact with Upper Deck ePack online trading card platform

This script should connect to the Upper Deck ePack API using your browser cookies and download a list of your current collection and save it as a CSV.

Adrian Kent 1 Nov 22, 2021
An extension for Arma 3 that lets you write extensions in Python 3

An Arma 3 extension that lets you to write python extensions for Arma 3. And it's really simple and straightforward to use!

Lukasz Taczuk 48 Dec 18, 2022
Estimate the Market Size for Electic and Plug-In Hybrid Vehicles In Africa

Estimate the Market Size for Electic and Plug-In Hybrid Vehicles In Africa The goal of this repository is to use open data repositories to answer the

Leonce Nshuti 0 Feb 21, 2022
0CD - BinaryNinja plugin to introduce some quality of life utilities for obsessive compulsive CTF enthusiasts

0CD Author: b0bb Quality of life utilities for obsessive compulsive CTF enthusia

12 Sep 14, 2022
A python script that fetches the grades of a student from a WAEC result in pdf format.

About waec-result-analyzer A python script that fetches the grades of a student from a WAEC result in pdf format. Built for federal government college

Oshodi Kolapo 2 Dec 04, 2021
Contain the customization I made for my Linux rice.

dotfiles Contain the customization I made for my Linux rice. Credit and Respect Polybar Autohide Fulltime Rofi by adi1090x (only include my personal r

sora 3 Apr 04, 2022
Ramadhan countdown - Simple daily reminder about upcoming Ramadhan

Ramadhan Countdown Bot Simple bot for displaying daily reminder about Islamic pr

Abdurrahman Shofy Adianto 1 Feb 06, 2022
Uproot - A script to bring deeply nested files or directories to the surface

UPROOT Bring deeply nested files or folders to the surface Uproot helps convert

Ted 2 Jan 15, 2022
Python based scripts for obtaining system information from Linux.

sysinfo Python based scripts for obtaining system information from Linux. Python2 and Python3 compatible Output in JSON format Simple scripts and exte

Petr Vavrin 70 Dec 20, 2022
this is a basic python project that I made using python

this is a basic python project that I made using python. This project is only for practice because my python skills are still newbie.

Elvira Firmansyah 2 Dec 14, 2022
This tool for beginner and help those people they gather information about Email Header Analysis, Instagram Information, Instagram Username Check, Ip Information, Phone Number Information, Port Scan

This tool for beginner and help those people they gather information about Email Header Analysis, Instagram Information, Instagram Username Check, Ip Information, Phone Number Information, Port Scan.

cb-kali 5 Feb 18, 2022