Natural Intelligence is still a pretty good idea.

Overview

Downloads Version Code style: black DOI

Human Learn

Machine Learning models should play by the rules, literally.

Project Goal

Back in the old days, it was common to write rule-based systems. Systems that do;

Nowadays, it's much more fashionable to use machine learning instead. Something like;

We started wondering if we might have lost something in this transition. Sure, machine learning covers a lot of ground but it is also capable of making bad decisions. We need to remain careful about hype. We also shouldn't forget that many classification problems can be handled by natural intelligence too. If nothing else, it'd sure be a sensible benchmark.

This package contains scikit-learn compatible tools that should make it easier to construct and benchmark rule based systems that are designed by humans. You can also use it in combination with ML models.

Installation

You can install this tool via pip.

python -m pip install human-learn

The project builds on top of a modern installation of scikit-learn and pandas. It also uses bokeh for interactive jupyter elements, shapely for the point-in-poly algorithms and clumper to deal with json datastructures.

Documentation

Detailed documentation of this tool can be found here.

A free video course can be found on calmcode.io.

Features

This library hosts a couple of models that you can play with.

Interactive Drawings

This tool allows you to draw over your datasets. These drawings can later be converted to models or to preprocessing tools.

Classification Models

FunctionClassifier

This allows you to define a function that can make classification predictions. It's constructed in such a way that you can use the arguments of the function as a parameter that you can benchmark in a grid-search.

InteractiveClassifier

This allows you to draw decision boundaries in interactive charts to create a model. You can create charts interactively in the notebook and export it as a scikit-learn compatible model.

Regression Models

FunctionRegressor

This allows you to define a function that can make regression predictions. It's constructed in such a way that you can use the arguments of the function as a parameter that you can benchmark in a grid-search.

Outlier Detection Models

FunctionOutlierDetector

This allows you to define a function that can declare outliers. It's constructed in such a way that you can use the arguments of the function as a parameter that you can benchmark in a grid-search.

InteractiveOutlierDetector

This allows you to draw decision boundaries in interactive charts to create a model. If a point falls outside of these boundaries we might be able to declare it an outlier. There's a threshold parameter for how strict you might want to be.

Preprocessing Models

PipeTransformer

This allows you to define a function that can handle preprocessing. It's constructed in such a way that you can use the arguments of the function as a parameter that you can benchmark in a grid-search. This is especially powerful in combination with the pandas .pipe method. If you're unfamiliar with this amazing feature, you may appreciate this tutorial.

InteractivePreprocessor

This allows you to draw features that you'd like to add to your dataset or your machine learning pipeline. You can use it via tfm.fit(df).transform(df) and df.pipe(tfm).

Datasets

Titanic

This library hosts the popular titanic survivor dataset for demo purposes. The goal of this dataset is to predict who might have survived the titanic disaster.

Fish

The fish market dataset is also hosted in this library. The goal of this dataset is to predict the weight of fish. However, it can also be turned into a classification problem by predicting the species.

Contribution

We're open to ideas for the repository but please discuss any feature you'd like to add before working on a PR. This way folks will know somebody is working on a feature and the implementation can be discussed with the maintainer upfront.

If you want to quickly get started locally you can run the following command to set the local development environment up.

make develop

If you want to run all the tests/checks locally you can run.

make check

This will run flake8, black, pytest and test the documentation pages.

Comments
  • Idea for a simple rule based classifier

    Idea for a simple rule based classifier

    Ideas for a rule based classifier after discussion with

    @koaning: The hope with that idea is that you can define case_when like statements that can be used as a rule based system.

    This has a few benefits.

    1. It's simple to create for a domain person.
    2. It's possible to create a ui/webapp for it.
    3. You might even be able to generate SQL so that the ML system can also "be deployed" in a database.

    This classifier would not have the full power of Python, but is rather a collection of rules entered by domain experts who are not necessarily technical people.

    Rules

    Rules have no structure and are always interpreted as disjunctions (or) and can be composed of conjunctions (and). To resolve conflict they can have a simple priority field.

    Format of the rules could be

    term:
       feature_name op value
    
    op: '=', '<>', '<', '>', '<=', '>='
    
    expr: term 
           | term 'and' term
    
    rule : term '=>' prediction (prio)?
    

    Examples

    • age < 60 => low
    • sex = 'f' and fare <> => high 10

    Rules need not be expressed as plain text, but also a structured format of nested lists/arrays. A parser for a text format like this would be possible with a very simple recursive descent parser.

    API

    class ClassifierBase:
        def predict(self, X):
            return np.array([ self.predict_single(x) for x in X])
        def predict_proba(self, X):
            return np.array([probas[xi] for xi in self.predict(X)])
        def score(self, X, y):
            n = len(y)
            correct = 0
            predictions = self.predict(X)
            for prediction, ground_truth in zip(predictions, y):
                if prediction == ground_truth:
                    correct = correct + 1
            return correct / n
    
    class CaseWhenClassifier(ClassifierBase):
        def predict_single(self, x):
           ...
    
        def .from_sklearn_tree(self, tree):
           ...
    
        def .to_sklearn_tree(self):
           ...
    
        def to_python_code(self, code_style):
          ...
    
        def parse(self, rules_as_text):
          ...
    
    rules = ...
    rule_clf = CaseWhenClassifier(features, categories, rules)
    
    

    Debugging support for plotting pairwise decision boundaries would be helpful.

    opened by DJCordhose 12
  • Can not draw model on jupyter

    Can not draw model on jupyter

    Hi, I'm trying to draw model on jupyter by referring to this link but it doesn't aprear anything.

    image

    jupyter was run on ubuntu machine and accessed from another remote computer in the same subnet.

    bokeh==2.4.3
    human-learn==0.3.1
    ipywidgets==7.7.1
    jupyter==1.0.0
    jupyter-client==7.3.4
    jupyter-console==6.4.4
    jupyter-core==4.11.1
    jupyter-server==1.18.1
    jupyterlab==3.4.4
    jupyterlab-pygments==0.2.2
    jupyterlab-server==2.15.0
    jupyterlab-widgets==1.1.1
    
    opened by didw 9
  • Adding a tooltip would help make decision on where to draw the line when no labels are available

    Adding a tooltip would help make decision on where to draw the line when no labels are available

    Hey there! Human learn has been super helpful so far. One thing I am a bit missing is the ability to see some of the underlying data about each data point. It would be very helpful to have a tooltip and having the option to pick a list of columns from the data frame to see in the tooltip.

    Right now, I am using Plotly separately to do that which allows me to more easily explore clusters. Then I try to find this cluster and draw on it.

    Screenshot 2021-01-14 19:22:32

    What do you think? Cheers, Nicolas

    opened by nbeuchat 7
  • InteractiveCharts with more than 5 unique labels throws an error when adding a new chart

    InteractiveCharts with more than 5 unique labels throws an error when adding a new chart

    Hi there! I noticed that when the column used for the labels or the color in an InteractiveCharts contains more than 5 unique values, adding a chart throws an error because the number of available colors in _colors is too low.

    # group_kind contains 7 unique values
    clf = InteractiveCharts(dfs, labels=["spam", "not_spam"], color="group_kind")
    clf.add_chart(x="umap_1", y="umap_2")
    

    It throws the error:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-108-2daa1de2581a> in <module>
    ----> 1 clf.add_chart(x="umap_1", y="umap_2")
    
    ~/anaconda3/envs/nlp_fb_posts_topics/lib/python3.8/site-packages/hulearn/experimental/interactive.py in add_chart(self, x, y, size, alpha, width, height, legend)
         84         ```
         85         """
    ---> 86         chart = SingleInteractiveChart(
         87             dataf=self.dataf.copy(),
         88             labels=self.labels,
    
    ~/anaconda3/envs/nlp_fb_posts_topics/lib/python3.8/site-packages/hulearn/experimental/interactive.py in __init__(self, dataf, labels, x, y, size, alpha, width, height, color, legend)
        160                 color_labels = list(dataf[self.color_column].unique())
        161                 d = {k: col for k, col in zip(color_labels, self._colors)}
    --> 162                 dataf = dataf.assign(color=[d[lab] for lab in dataf[self.color_column]])
        163             self.source = ColumnDataSource(data=dataf)
        164             self.labels = labels
    
    ~/anaconda3/envs/nlp_fb_posts_topics/lib/python3.8/site-packages/hulearn/experimental/interactive.py in <listcomp>(.0)
        160                 color_labels = list(dataf[self.color_column].unique())
        161                 d = {k: col for k, col in zip(color_labels, self._colors)}
    --> 162                 dataf = dataf.assign(color=[d[lab] for lab in dataf[self.color_column]])
        163             self.source = ColumnDataSource(data=dataf)
        164             self.labels = labels
    
    KeyError: 'bulletin_board'
    

    Maybe using a colormap instead of a fixed set of colors would fix the issue?

    opened by nbeuchat 5
  • Can't draw with InteractiveCharts

    Can't draw with InteractiveCharts

    Hi, I'm trying the library just like I've seen on https://calmcode.io/human-learn/draw.html, but with my own data. This is what I got:

    from hulearn.experimental.interactive import InteractiveCharts
    clf = InteractiveCharts(df_labeled, labels="cluster")
    

    BokehJS 2.2.1 successfully loaded

    clf.add_chart(x='dst_ip',y='avg_duration')
    

    The graph appears, data is colored as expected and I can interact with it (zoom and so), but I can't draw the areas.

    I'm using Python 3.7.3, IPython 7.14.0 and Jupyter 5.7.8

    opened by jartigag 5
  • charts not showing up in Visual Studio Code notebook

    charts not showing up in Visual Studio Code notebook

    I am trying basically to reproduce the PyData Berlin environment using human-learn with sentence embeddings and UMAP so that I can draw boundaries, explore, and quickly label text data.

    The problem I am having is that the human-learn charts are not rendering in the VSC notebook. VSC is using Jupyter for the notebook and I am on Windows. I can render Pyplot, Seaborn, even Bokeh into the notebooks but the human-learn charts do not display:

    image

    Is anyone else having this issue? Is there some Jupyter extension I need or some Jupyter command I need to run? Bokeh is 2.3.2, human-learn is 0.3.1

    opened by mschmill 4
  • Running into a traceback error when importing the interactive charts module

    Running into a traceback error when importing the interactive charts module

    I am trying to run the interactive classifier notebook downloaded from the link at the bottom of this page - https://koaning.github.io/human-learn/guide/drawing-classifier/drawing.html.

    This is being run on a Windows x86-64 laptop, with the latest minconda3, python3.8 and jupyter-lab. I run into a traceback error on cell 3 from hulearn.experimental.interactive import InteractiveCharts, InteractiveChart

    ImportError                               Traceback (most recent call last)
    <ipython-input-3-9933ce75800d> in <module>()
    ----> 1 from hulearn.experimental.interactive import InteractiveCharts, InteractiveChart
    
    ImportError: cannot import name 'InteractiveChart' from 'hulearn.experimental.interactive' (C:\<mypath>\miniconda3\envs\myenv\lib\site-packages\hulearn\experimental\interactive.py)
    

    Not able to figure out what's up; issue reproduces on a unix environment (on Mac) as well.

    opened by aishnaga 4
  • Bokeh Port Error

    Bokeh Port Error

    Sometimes I hit this error:

    ERROR:bokeh.server.views.ws:Refusing websocket connection from Origin 'http://localhost:8889';                       use --allow-websocket-origin=localhost:8889 or set BOKEH_ALLOW_WS_ORIGIN=localhost:8889 to permit this; currently we allow origins {'localhost:8888'}
    WARNING:tornado.access:403 GET /ws (::1) 1.65ms
    

    Would be nice to get an automated fix for this.

    opened by koaning 3
  • geos_c.dll missing

    geos_c.dll missing

    from hulearn.preprocessing import InteractivePreprocessor
    tfm = InteractivePreprocessor(json_desc=charts.data())
    
    df.pipe(tfm.pandas_pipe).loc[lambda d: d['group'] != 0].sample(10)
    
    

    gives error :

    
    ---------------------------------------------------------------------------
    FileNotFoundError                         Traceback (most recent call last)
    ~\AppData\Local\Temp/ipykernel_28956/1501149949.py in <module>
    ----> 1 from hulearn.preprocessing import InteractivePreprocessor
          2 tfm = InteractivePreprocessor(json_desc=charts.data())
          3 
          4 df.pipe(tfm.pandas_pipe).loc[lambda d: d['group'] != 0].sample(10)
    
    ~\AppData\Roaming\Python\Python39\site-packages\hulearn\preprocessing\__init__.py in <module>
          1 from hulearn.preprocessing.pipetransformer import PipeTransformer
    ----> 2 from hulearn.preprocessing.interactivepreprocessor import InteractivePreprocessor
          3 
          4 __all__ = ["PipeTransformer", "InteractivePreprocessor"]
    
    ~\AppData\Roaming\Python\Python39\site-packages\hulearn\preprocessing\interactivepreprocessor.py in <module>
          4 import numpy as np
          5 import pandas as pd
    ----> 6 from shapely.geometry import Point
          7 from shapely.geometry.polygon import Polygon
          8 
    
    ~\AppData\Roaming\Python\Python39\site-packages\shapely\geometry\__init__.py in <module>
          2 """
          3 
    ----> 4 from .base import CAP_STYLE, JOIN_STYLE
          5 from .geo import box, shape, asShape, mapping
          6 from .point import Point, asPoint
    
    ~\AppData\Roaming\Python\Python39\site-packages\shapely\geometry\base.py in <module>
         17 
         18 from shapely.affinity import affine_transform
    ---> 19 from shapely.coords import CoordinateSequence
         20 from shapely.errors import WKBReadingError, WKTReadingError
         21 from shapely.geos import WKBWriter, WKTWriter
    
    ~\AppData\Roaming\Python\Python39\site-packages\shapely\coords.py in <module>
          6 from ctypes import byref, c_double, c_uint
          7 
    ----> 8 from shapely.geos import lgeos
          9 from shapely.topology import Validating
         10 
    
    ~\AppData\Roaming\Python\Python39\site-packages\shapely\geos.py in <module>
        147     if os.getenv('CONDA_PREFIX', ''):
        148         # conda package.
    --> 149         _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
        150     else:
        151         try:
    
    ~\Anaconda3\envs\human-learn\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
        380 
        381         if handle is None:
    --> 382             self._handle = _dlopen(self._name, mode)
        383         else:
        384             self._handle = handle
    
    FileNotFoundError: Could not find module 'C:\Users\BORG7803\Anaconda3\envs\human-learn\Library\bin\geos_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.
    
    opened by Borg93 2
  • AttributeError: module 'tornado.ioloop' has no attribute '_Selectable'

    AttributeError: module 'tornado.ioloop' has no attribute '_Selectable'

    Hi Vincent,

    I was particularly impressed by how we could classify the data by just drawing. Kudos to you.

    However, I have been trying to implement the same in a different dataset but it's repeatedly throwing the below error .

    I am also linking my notebook just in case : https://www.kaggle.com/nishantrock/notebook8935105440

    Do suggest why this error is happening. I've tried it multiple times but it throws the same error.


    AttributeError Traceback (most recent call last) in ----> 1 clf.add_chart(x = 'Health Indicator', y = 'Reco_Policy_Premium')

    /opt/conda/lib/python3.7/site-packages/hulearn/experimental/interactive.py in add_chart(self, x, y, size, alpha, width, height, legend) 97 ) 98 self.charts.append(chart) ---> 99 chart.show() 100 101 def data(self):

    /opt/conda/lib/python3.7/site-packages/hulearn/experimental/interactive.py in show(self) 199 200 def show(self): --> 201 show(self.app) 202 203 def _replace_xy(self, data):

    /opt/conda/lib/python3.7/site-packages/bokeh/io/showing.py in show(obj, browser, new, notebook_handle, notebook_url, **kw) 135 # in Tornado) just in order to show a non-server object 136 if is_application or callable(obj): --> 137 return run_notebook_hook(state.notebook_type, 'app', obj, state, notebook_url, **kw) 138 139 return _show_with_state(obj, state, browser, new, notebook_handle=notebook_handle)

    /opt/conda/lib/python3.7/site-packages/bokeh/io/notebook.py in run_notebook_hook(notebook_type, action, *args, **kw) 296 if _HOOKS[notebook_type][action] is None: 297 raise RuntimeError("notebook hook for %r did not install %r action" % notebook_type, action) --> 298 return _HOOKS[notebook_type][action](*args, **kw) 299 300 #-----------------------------------------------------------------------------

    /opt/conda/lib/python3.7/site-packages/bokeh/io/notebook.py in show_app(app, state, notebook_url, port, **kw) 463 464 from tornado.ioloop import IOLoop --> 465 from ..server.server import Server 466 467 loop = IOLoop.current()

    /opt/conda/lib/python3.7/site-packages/bokeh/server/server.py in 39 # External imports 40 from tornado import version as tornado_version ---> 41 from tornado.httpserver import HTTPServer 42 from tornado.ioloop import IOLoop 43

    /opt/conda/lib/python3.7/site-packages/tornado/httpserver.py in 30 31 from tornado.escape import native_str ---> 32 from tornado.http1connection import HTTP1ServerConnection, HTTP1ConnectionParameters 33 from tornado import httputil 34 from tornado import iostream

    /opt/conda/lib/python3.7/site-packages/tornado/http1connection.py in 32 from tornado import gen 33 from tornado import httputil ---> 34 from tornado import iostream 35 from tornado.log import gen_log, app_log 36 from tornado.util import GzipDecompressor

    /opt/conda/lib/python3.7/site-packages/tornado/iostream.py in 208 209 --> 210 class BaseIOStream(object): 211 """A utility class to write to and read from a non-blocking file or socket. 212

    /opt/conda/lib/python3.7/site-packages/tornado/iostream.py in BaseIOStream() 284 self._closed = False 285 --> 286 def fileno(self) -> Union[int, ioloop._Selectable]: 287 """Returns the file descriptor for this stream.""" 288 raise NotImplementedError()

    AttributeError: module 'tornado.ioloop' has no attribute '_Selectable'

    opened by 123nishant 2
  • Adding common accessor for changing Chart Title, Legend Names, x label, y label etc

    Adding common accessor for changing Chart Title, Legend Names, x label, y label etc

    Currently, the library does not support adding custom title rather the x and y labels passed to the Interactive chart becomes the title

    self.plot = figure(width=width, height=height, title=f"{x} vs. {y}")

    as shown above we can add common accessors to deal with this?

    opened by tvash 2
  • Please cover a regression example

    Please cover a regression example

    Hi Vincent. I'm super into this framework. As a domain expert, I see some helpful ise cases with this tool involving regression. However, I'm not confident to apply regression as no example are provided.

    opened by FrancyJGLisboa 1
  • Raise `ValueErrors` on incorrect plot input.

    Raise `ValueErrors` on incorrect plot input.

    I noticed on reviewing this PR that SingleInteractiveChart does not check if the inputs make sense with regards to the dataframe that is passed in. We don't want to create an extra SingleInteractiveChart under the InteractiveCharts object because this causes side effects (unneeded json data).

    Let's add some ValueErrors there.

    opened by koaning 0
Releases(0.2.5)
Owner
vincent d warmerdam
Solving problems involving data. Mostly NLP these days. AskMeAnything[tm].
vincent d warmerdam
The offcial repository for 'CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos', SIGIR2022

CharacterBERT-DR The offcial repository for CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos, Sh

ielab 11 Nov 15, 2022
利用yolov5和TensorRT从0到1实现目标检测的模型训练到模型部署全过程

写在前面 利用TensorRT加速推理速度是以时间换取精度的做法,意味着在推理速度上升的同时将会有精度的下降,不过不用太担心,精度下降微乎其微。此外,要有NVIDIA显卡,经测试,CUDA10.2可以支持20系列显卡及以下,30系列显卡需要CUDA11.x的支持,并且目前有bug。 默认你已经完成了

Helium 6 Jul 28, 2022
This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order Pooling.

Locus This repository is an open-source implementation of the ICRA 2021 paper: Locus: LiDAR-based Place Recognition using Spatiotemporal Higher-Order

Robotics and Autonomous Systems Group 96 Dec 15, 2022
Meandering In Networks of Entities to Reach Verisimilar Answers

MINERVA Meandering In Networks of Entities to Reach Verisimilar Answers Code and models for the paper Go for a Walk and Arrive at the Answer - Reasoni

Shehzaad Dhuliawala 271 Dec 13, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
PyTorch code to run synthetic experiments.

Code repository for Invariant Risk Minimization Source code for the paper: @article{InvariantRiskMinimization, title={Invariant Risk Minimization}

Facebook Research 345 Dec 12, 2022
PyTorch implementation of paper: HPNet: Deep Primitive Segmentation Using Hybrid Representations.

HPNet This repository contains the PyTorch implementation of paper: HPNet: Deep Primitive Segmentation Using Hybrid Representations. Installation The

Siming Yan 42 Dec 07, 2022
[PNAS2021] The neural architecture of language: Integrative modeling converges on predictive processing

The neural architecture of language: Integrative modeling converges on predictive processing Code accompanying the paper The neural architecture of la

Martin Schrimpf 36 Dec 01, 2022
Object Detection using YOLO from PyImageSearch

Object Detection using YOLO from PyImageSearch By applying object detection, you’ll not only be able to determine what is in an image, but also where

Mohamed NIANG 1 Feb 09, 2022
Code for the Paper: Conditional Variational Capsule Network for Open Set Recognition

Conditional Variational Capsule Network for Open Set Recognition This repository hosts the official code related to "Conditional Variational Capsule N

Guglielmo Camporese 35 Nov 21, 2022
CountDown to New Year and shoot fireworks

CountDown and Shoot Fireworks About App This is an small application make you re

5 Dec 31, 2022
TensorLight - A high-level framework for TensorFlow

TensorLight is a high-level framework for TensorFlow-based machine intelligence applications. It reduces boilerplate code and enables advanced feature

Benjamin Kan 10 Jul 31, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

Julius Kunze 26 Oct 05, 2022
Voice control for Garry's Mod

WIP: Talonvoice GMod integrations Very work in progress voice control demo for Garry's Mod. HOWTO Install https://talonvoice.com/ Press https://i.imgu

Meta Construct 5 Nov 15, 2022
Adjust Decision Boundary for Class Imbalanced Learning

Adjusting Decision Boundary for Class Imbalanced Learning This repository is the official PyTorch implementation of WVN-RS, introduced in Adjusting De

Peyton Byungju Kim 16 Jan 04, 2023
Fast sparse deep learning on CPUs

SPARSEDNN **If you want to use this repo, please send me an email: [email pro

Ziheng Wang 44 Nov 30, 2022
PlenOctree Extraction algorithm

PlenOctrees_NeRF-SH This is an implementation of the Paper PlenOctrees for Real-time Rendering of Neural Radiance Fields. Not only the code provides t

49 Nov 05, 2022
Send text to girlfriend in the morning

Girlfriend Text Send text to girlfriend (or really anyone with a phone number) in the morning 1. Configure your settings in utils.py. phone_number = "

Paras Adhikary 199 Oct 25, 2022
《Fst Lerning of Temporl Action Proposl vi Dense Boundry Genertor》(AAAI 2020)

Update 2020.03.13: Release tensorflow-version and pytorch-version DBG complete code. 2019.11.12: Release tensorflow-version DBG inference code. 2019.1

Tencent 338 Dec 16, 2022
Lite-HRNet: A Lightweight High-Resolution Network

LiteHRNet Benchmark 🔥 🔥 Based on MMsegmentation 🔥 🔥 Cityscapes FCN resize concat config mIoU last mAcc last eval last mIoU best mAcc best eval bes

16 Dec 12, 2022