cleanlab

cleanlab is the data-centric ML ops package for machine learning with noisy labels. cleanlab cleans labels and supports finding, quantifying, and learning with label errors in datasets. See datasets cleaned with cleanlab at labelerrors.com.

Check out the cleanlab code documentation.

cleanlab is powered by confident learning, published in this paper | blog.

Get started with tutorials


News! (2021) -- cleanlab finds pervasive label errors in the most common ML test sets.
News! (2020) -- cleanlab adds support for all OS, achieves state-of-the-art, supports co-teaching, and more.

Past release notes and planned future features are available here.


So fresh, so cleanlab

cleanlab finds and cleans label errors in any dataset using state-of-the-art algorithms to find label errors, characterize noise, and learn in spite of it. cleanlab is fast: it's built on optimized algorithms and parallelized across CPU threads automatically. cleanlab is powered by provable guarantees of exact noise estimation and label error finding in realistic cases when model output probabilities are erroneous. cleanlab supports multi-label, multiclass, sparse matrices, etc. By default, cleanlab requires no hyper-parameters.

cleanlab implements the family of theory and algorithms called confident learning with provable guarantees of exact noise estimation and label error finding (even when model output probabilities are noisy/imperfect).

cleanlab supports most weak supervision tasks: multi-label, multiclass, sparse matrices, etc.

cleanlab is:

  1. backed-by-theory - Provable perfect label error finding in realistic conditions.
  2. fast - Non-iterative, parallelized algorithms (e.g. < 1 second to find label errors in ImageNet)
  3. general - Works with any ML or deep learning framework: PyTorch, Tensorflow, MxNet, Caffe2, scikit-learn, etc.
  4. unique - The only package for weak supervision with any dataset / classifier.

Find label errors with PyTorch, Tensorflow, MXNet, etc. in 1 line of code.

# Compute psx (n x m matrix of predicted probabilities) on your own, with any classifier.
# Here is an example that shows in detail how to compute psx on CIFAR-10:
#    https://github.com/cleanlab/cleanlab/tree/master/examples/cifar10
# Be sure you compute probs in a holdout/out-of-sample manner (e.g. cross-validation)
# Now getting label errors is trivial with cleanlab... it's one line of code.
# Label errors are ordered by likelihood of being an error. First index is most likely error.
from cleanlab.pruning import get_noise_indices

ordered_label_errors = get_noise_indices(
    s=numpy_array_of_noisy_labels,
    psx=numpy_array_of_predicted_probabilities,
    sorted_index_method='normalized_margin', # Orders label errors
)
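To make the ordering concrete, here is a minimal NumPy sketch of a margin-based ranking. It assumes that 'normalized_margin' means the predicted probability of the given label minus the largest predicted probability in that row; this is an assumption about the method, not a copy of the library's code, and unlike get_noise_indices it ranks every example rather than only the flagged ones. The toy psx and s arrays are made up for illustration.

```python
import numpy as np

# Toy inputs: 4 examples, 3 classes. Rows of psx are predicted probabilities.
psx = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.35, 0.25],
])
s = np.array([0, 0, 1, 2])  # given (possibly noisy) labels

# Self-confidence: predicted probability of each example's given label.
self_confidence = psx[np.arange(len(s)), s]
# Margin: self-confidence minus the largest predicted probability (always <= 0).
margin = self_confidence - psx.max(axis=1)
# Most negative margin first, i.e. the most suspicious label first.
ranking = np.argsort(margin)

print(ranking)
```

Example 1 ranks first here because the model puts 0.7 on class 1 while the given label is class 0.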

Pre-computed out-of-sample predicted probabilities for CIFAR-10 train set are available here: [[LINK]].
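If you need to compute psx yourself, one common way to get out-of-sample probabilities is scikit-learn's cross_val_predict. The sketch below is only an illustration: it uses the Iris dataset and LogisticRegression as stand-ins for your own data and classifier, and the choice of 5 folds is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy data standing in for your own features and (possibly noisy) labels.
X, s = load_iris(return_X_y=True)

# Each row of psx is predicted by a model that never saw that row during training.
psx = cross_val_predict(
    LogisticRegression(max_iter=1000),
    X,
    s,
    cv=5,
    method="predict_proba",
)

print(psx.shape)  # (n_examples, n_classes)
```

The resulting array has one row per example and one column per class, which is the n x m shape described for psx above.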

Learning with noisy labels in 3 lines of code!

from cleanlab.classification import LearningWithNoisyLabels
from sklearn.linear_model import LogisticRegression

# Wrap around any classifier. Yup, you can use sklearn/PyTorch/Tensorflow/FastText/etc.
lnl = LearningWithNoisyLabels(clf=LogisticRegression())
lnl.fit(X=X_train_data, s=train_noisy_labels)
# Estimate the predictions you would have gotten by training with *no* label errors.
predicted_test_labels = lnl.predict(X_test)

Check out these examples and tests (includes how to use PyTorch, FastText, etc.).

Installation

Python 2.7, 3.4, 3.5, 3.6, and 3.7 are supported. Linux, macOS, and Windows are supported.

Stable release (pip):

$ pip install cleanlab  # Using pip

Stable release (conda):

$ conda install -c conda-forge cleanlab  # Using conda

Developer release:

$ pip install git+https://github.com/cleanlab/cleanlab.git

To install with the codebase (enabling you to make modifications):

$ conda update pip # if you use conda
$ git clone https://github.com/cleanlab/cleanlab.git
$ cd cleanlab
$ pip install -e .

Citations and Related Publications

If you use this package, please cite the confident learning paper (published April 2021, in the Journal of AI Research):

@article{northcutt2021confidentlearning,
   title={Confident Learning: Estimating Uncertainty in Dataset Labels},
   author={Curtis G. Northcutt and Lu Jiang and Isaac L. Chuang},
   journal={Journal of Artificial Intelligence Research (JAIR)},
   volume={70},
   pages={1373--1411},
   year={2021}
 }

If you use this package for binary classification, please also cite the rankpruning paper (published August 2017, in Uncertainty in AI):

@inproceedings{northcutt2017rankpruning,
 author={Northcutt, Curtis G. and Wu, Tailin and Chuang, Isaac L.},
 title={Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels},
 booktitle = {Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence},
 series = {UAI'17},
 year = {2017},
 location = {Sydney, Australia},
 numpages = {10},
 url = {http://auai.org/uai2017/proceedings/papers/35.pdf},
 publisher = {AUAI Press},
}

Reproducing Results in confident learning paper

See cleanlab/examples/cifar10 and cleanlab/examples/imagenet. You'll need to git clone confidentlearning-reproduce which contains the data and files needed to reproduce the CIFAR-10 results.

cleanlab: State of the Art Learning with Noisy Labels in CIFAR

A [step-by-step guide] to reproduce these results is available [here]. This guide is also helpful as a tutorial to use cleanlab on any large-scale dataset.

Image depicting CIFAR10 benchmarks

Comparison of confident learning (CL) and cleanlab versus seven recent methods for learning with noisy labels in CIFAR-10. Highlighted cells show CL robustness to sparsity. The five CL methods estimate label errors, remove them, then train on the cleaned data using Co-Teaching.

Observe how cleanlab (the CL methods) is robust to large sparsity in label noise, whereas prior art tends to lose performance as sparsity increases, as shown by the red highlighted regions. This is important because real-world label noise is often sparse, e.g. a tiger is likely to be mislabeled as a lion, but not as most other classes like airplane, bathtub, and microwave.

cleanlab: Find Label Errors in ImageNet

Use cleanlab to identify ~100,000 label errors in the 2012 ImageNet training dataset.

Image depicting label errors in ImageNet train set

Top label issues in the 2012 ILSVRC ImageNet train set identified using cleanlab. Label Errors are boxed in red. Ontological issues in green. Multi-label images in blue.

cleanlab: Find Label Errors in MNIST

Use cleanlab to identify ~50 label errors in the MNIST dataset.

Image depicting label errors in MNIST train set

Label errors of the original MNIST train dataset identified algorithmically using cleanlab. Depicts the 24 least confident labels, ordered left-right, top-down by increasing self-confidence (probability of belonging to the given label), denoted conf in teal. The label with the largest predicted probability is in green. Overt errors are in red.

cleanlab Generality: View performance across 4 distributions and 9 classifiers.

Use cleanlab to learn with noisy labels regardless of dataset distribution or classifier.

Image depicting generality of cleanlab across datasets and classifiers

Each sub-figure in the figure above depicts the decision boundary learned using cleanlab.classification.LearningWithNoisyLabels in the presence of extreme (~35%) label errors. Label errors are circled in green. Label noise is class-conditional (not simply uniformly random). Columns are organized by the classifier used, except the left-most column which depicts the ground-truth dataset distribution. Rows are organized by dataset used.

The code to reproduce this figure is available here.

Each figure depicts accuracy scores on a test set as decimal values:

  1. LEFT (in black): The classifier test accuracy trained with perfect labels (no label errors).
  2. MIDDLE (in blue): The classifier test accuracy trained with noisy labels using cleanlab.
  3. RIGHT (in white): The baseline classifier test accuracy trained with noisy labels.

As an example, this is the noise matrix (noisy channel) P(s | y) characterizing the label noise for the first dataset row in the figure. s represents the observed noisy labels and y represents the latent, true labels. The trace of this matrix is 2.6. A trace of 4 implies no label noise. A cell in this matrix is read like, "A random 38% of '3' labels were flipped to '2' labels."

p(s|y)  y=0   y=1   y=2   y=3
s=0     0.55  0.01  0.07  0.06
s=1     0.22  0.87  0.24  0.02
s=2     0.12  0.04  0.64  0.38
s=3     0.11  0.08  0.05  0.54
Get started with easy, quick examples.

New to cleanlab? Start with:

  1. Visualizing confident learning
  2. A simple example of learning with noisy labels on the multiclass Iris dataset.

These examples show how easy it is to characterize label noise in datasets, learn with noisy labels, identify label errors, estimate latent priors and noisy channels, and more.

Use cleanlab with any model (Tensorflow, caffe2, PyTorch, etc.)

All of the features of the cleanlab package work with any model. Yes, any model. Feel free to use PyTorch, Tensorflow, caffe2, scikit-learn, mxnet, etc. If you use a scikit-learn classifier, all cleanlab methods will work out-of-the-box. It’s also easy to use your favorite model from a non-scikit-learn package: just wrap your model in a Python class that inherits from sklearn.base.BaseEstimator:

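from sklearn.base import BaseEstimator

class YourFavoriteModel(BaseEstimator):  # Inherits from the scikit-learn base estimator
    def __init__(self):
        pass
    def fit(self, X, y, sample_weight=None):
        pass  # Train your model on (X, y).
    def predict(self, X):
        pass  # Return predicted class labels for X.
    def predict_proba(self, X):
        pass  # Return an n x m array of predicted probabilities for X.
    def score(self, X, y, sample_weight=None):
        pass  # Return a scalar score (e.g. accuracy) on (X, y).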

# Now you can use your model with `cleanlab`. Here's one example:
from cleanlab.classification import LearningWithNoisyLabels
lnl = LearningWithNoisyLabels(clf=YourFavoriteModel())
lnl.fit(train_data, train_labels_with_errors)

Want to see a working example? Here’s a compliant PyTorch MNIST CNN class.

As you can see here, technically you don’t actually need to inherit from sklearn.base.BaseEstimator, as you can just create a class that defines .fit(), .predict(), and .predict_proba(), but inheriting makes downstream scikit-learn applications like hyper-parameter optimization work seamlessly. For example, the LearningWithNoisyLabels() model is fully compliant.
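As a dependency-free illustration of that interface, here is a toy classifier that defines .fit(), .predict(), .predict_proba(), and .score() without inheriting from anything. NearestCentroidModel is a hypothetical name invented for this sketch; it is not part of cleanlab or scikit-learn, and a real model would replace the centroid logic with its own training and inference code.

```python
import numpy as np

class NearestCentroidModel:
    """Hypothetical toy classifier exposing the methods listed above."""

    def fit(self, X, y, sample_weight=None):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        sample_weight = np.asarray(sample_weight, dtype=float)
        self.classes_ = np.unique(y)
        # Weighted mean of the examples in each class.
        self.centroids_ = np.array([
            np.average(X[y == k], axis=0, weights=sample_weight[y == k])
            for k in self.classes_
        ])
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Squared distance from every example to every centroid.
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        # Softmax over negative distances: closer centroid, higher probability.
        logits = -d2
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

    def score(self, X, y, sample_weight=None):
        return np.average(self.predict(X) == np.asarray(y), weights=sample_weight)


X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]])
y = np.array([0, 0, 1, 1])
model = NearestCentroidModel().fit(X, y)
proba = model.predict_proba(X)
accuracy = model.score(X, y)
```

The important parts are the method names and signatures, including the optional sample_weight argument on fit() and score(), and predict_proba() returning one row of class probabilities per example.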

Note that some libraries exist to do this for you. For PyTorch, check out the skorch Python library, which will wrap your PyTorch model into a scikit-learn compliant model.

Documentation by Example

cleanlab Core Package Components

  1. cleanlab/classification.py - The LearningWithNoisyLabels() class for learning with noisy labels.
  2. cleanlab/latent_algebra.py - Equalities when noise information is known.
  3. cleanlab/latent_estimation.py - Estimates and fully characterizes all variants of label noise.
  4. cleanlab/noise_generation.py - Generate mathematically valid synthetic noise matrices.
  5. cleanlab/polyplex.py - Characterizes joint distribution of label noise EXACTLY from noisy channel.
  6. cleanlab/pruning.py - Finds the indices of the examples with label errors in a dataset.

Many of these methods have default parameters that won’t be covered here. Check out the method docstrings for full documentation.

Methods to Standardize Research with Noisy Labels

cleanlab supports a number of functions to generate noise for benchmarking and standardization in research. This next example shows how to generate valid, class-conditional, uniformly random noisy channel matrices:

# Generate a valid (necessary conditions for learnability are met) noise matrix for any trace > 1
from cleanlab.noise_generation import generate_noise_matrix_from_trace
noise_matrix=generate_noise_matrix_from_trace(
    K=number_of_classes,
    trace=float_value_greater_than_1_and_leq_K,
    py=prior_of_y_actual_labels_which_is_just_an_array_of_length_K,
    frac_zero_noise_rates=float_from_0_to_1_controlling_sparsity,
)

# Check if a noise matrix is valid (necessary conditions for learnability are met)
from cleanlab.noise_generation import noise_matrix_is_valid
is_valid=noise_matrix_is_valid(noise_matrix, prior_of_y_which_is_just_an_array_of_length_K)
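For intuition about what "valid" can mean, here is a hedged NumPy sketch of one such check. It assumes the learnability condition is that, for every class k, p(s=k, y=k) > p(s=k) * p(y=k), i.e. a label agrees with the true class more often than chance would predict. That condition is an assumption made for this sketch, and is_learnable_sketch is a hypothetical helper, not the library's noise_matrix_is_valid.

```python
import numpy as np

def is_learnable_sketch(noise_matrix, py):
    """Assumed condition: p(s=k, y=k) > p(s=k) * p(y=k) for every class k."""
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    py = np.asarray(py, dtype=float)
    joint = noise_matrix * py  # P(s, y) = P(s | y) * p(y), scaled column by column
    ps = joint.sum(axis=1)     # p(s): marginal over the noisy labels
    return bool(np.all(np.diag(joint) > ps * py))

py = np.array([0.5, 0.5])
mostly_correct = np.array([[0.8, 0.3], [0.2, 0.7]])  # columns sum to 1
mostly_flipped = np.array([[0.3, 0.8], [0.7, 0.2]])  # labels are wrong more often than right

print(is_learnable_sketch(mostly_correct, py))
print(is_learnable_sketch(mostly_flipped, py))
```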

For a given noise matrix, this example shows how to generate noisy labels. Methods can be seeded for reproducibility.

# Generate noisy labels using the noise_matrix. Guarantees exact amount of noise in labels.
from cleanlab.noise_generation import generate_noisy_labels
s_noisy_labels = generate_noisy_labels(y_hidden_actual_labels, noise_matrix)

# This package is full of other useful methods for learning with noisy labels.
# The tutorial stops here, but you don't have to. Inspect method docstrings for full docs.
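To see what applying a noise matrix means, here is a simplified, seeded NumPy sketch that draws each noisy label from the column of P(s | y) that corresponds to its true label. Note that this is a plain random sampler: it matches the noise rates only in expectation and does not reproduce the exact-amount guarantee described for generate_noisy_labels. sample_noisy_labels is a hypothetical helper written for this illustration.

```python
import numpy as np

def sample_noisy_labels(y, noise_matrix, seed=0):
    """Draw each noisy label s from column y of P(s | y)."""
    rng = np.random.default_rng(seed)
    noise_matrix = np.asarray(noise_matrix, dtype=float)
    m = noise_matrix.shape[0]
    return np.array([rng.choice(m, p=noise_matrix[:, k]) for k in y])

noise_matrix = np.array([[0.8, 0.3], [0.2, 0.7]])  # rows are s, columns are y
y = np.array([0] * 500 + [1] * 500)                # hidden true labels
s = sample_noisy_labels(y, noise_matrix, seed=0)
observed_flip_rate = np.mean(s != y)

print(observed_flip_rate)  # close to 0.5 * 0.2 + 0.5 * 0.3 = 0.25 in expectation
```

Passing the same seed reproduces the same noisy labels.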

Estimate the confident joint, the latent noisy channel matrix P(s | y) and its inverse P(y | s), the latent prior of the unobserved, actual true labels p(y), and the predicted probabilities.

s denotes a random variable that represents the observed, noisy label and y denotes a random variable representing the hidden, actual labels. Both s and y take any of the m classes as values. The cleanlab package supports different levels of granularity for computation depending on the needs of the user. Because of this, we support multiple alternatives, all no more than a few lines, to estimate these latent distribution arrays, enabling the user to reduce computation time by only computing what they need to compute, as seen in the examples below.

Throughout these examples, you’ll see a variable called confident_joint. The confident joint is an m x m matrix (m is the number of classes) that counts, for every observed, noisy class, the number of examples that confidently belong to every latent, hidden class. It counts the number of examples that we are confident are labeled correctly or incorrectly for every pair of observed and unobserved classes. The confident joint is an unnormalized estimate of the complete-information latent joint distribution, P(s, y). Most of the methods in the cleanlab package start by first estimating the confident_joint. You can learn more about this in the confident learning paper.
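The counting idea can be sketched in plain NumPy. The version below is a simplification under stated assumptions: the per-class threshold is the mean predicted probability of class k among examples labeled k, an example is counted for the most probable class whose threshold it meets, and no calibration step is applied. confident_joint_sketch is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def confident_joint_sketch(s, psx):
    """Count examples that confidently belong to each (noisy s, latent y) pair."""
    s = np.asarray(s)
    psx = np.asarray(psx, dtype=float)
    m = psx.shape[1]
    # Per-class threshold: mean predicted probability of class k
    # among the examples labeled k.
    thresholds = np.array([psx[s == k, k].mean() for k in range(m)])
    cj = np.zeros((m, m), dtype=int)
    for label, probs in zip(s, psx):
        confident = np.flatnonzero(probs >= thresholds)
        if confident.size == 0:
            continue  # not confidently in any class, so not counted
        # If several classes pass, keep the most probable one.
        y_hat = confident[np.argmax(probs[confident])]
        cj[label, y_hat] += 1
    return cj

s = np.array([0, 0, 0, 1, 1, 1])
psx = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],   # labeled 0, but the model is confident it is class 1
    [0.15, 0.85],
    [0.1, 0.9],
    [0.4, 0.6],   # below both thresholds, so left out of the counts
])
cj = confident_joint_sketch(s, psx)
print(cj)
```

Rows index the observed label s and columns index the estimated true label y, so the off-diagonal entry cj[0, 1] counts the example labeled 0 that confidently looks like class 1.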

Option 1: Compute the confident joint and predicted probs first. Stop if that’s all you need.

from sklearn.linear_model import LogisticRegression
from cleanlab.latent_estimation import estimate_latent
from cleanlab.latent_estimation import estimate_confident_joint_and_cv_pred_proba

# Compute the confident joint and the n x m predicted probabilities matrix (psx),
# for n examples, m classes. Stop here if all you need is the confident joint.
confident_joint, psx = estimate_confident_joint_and_cv_pred_proba(
    X=X_train,
    s=train_labels_with_errors,
    clf=LogisticRegression(), # logistic regression is the default; you can use any classifier
)

# Estimate latent distributions: p(y) as est_py, P(s|y) as est_nm, and P(y|s) as est_inv
est_py, est_nm, est_inv = estimate_latent(confident_joint, s=train_labels_with_errors)

Option 2: Estimate the latent distribution matrices in a single line of code.

from cleanlab.latent_estimation import estimate_py_noise_matrices_and_cv_pred_proba
est_py, est_nm, est_inv, confident_joint, psx = estimate_py_noise_matrices_and_cv_pred_proba(
    X=X_train,
    s=train_labels_with_errors,
)

Option 3: Skip computing the predicted probabilities if you already have them.

# Already have psx? (n x m matrix of predicted probabilities)
# For example, you might get them from a pre-trained model (like resnet on ImageNet)
# With the cleanlab package, you estimate directly with psx.
from cleanlab.latent_estimation import estimate_py_and_noise_matrices_from_probabilities
est_py, est_nm, est_inv, confident_joint = estimate_py_and_noise_matrices_from_probabilities(
    s=train_labels_with_errors,
    psx=psx,
)

Completely characterize label noise in a dataset:

The joint probability distribution of noisy and true labels, P(s,y), completely characterizes label noise with a class-conditional m x m matrix.

from cleanlab.latent_estimation import estimate_joint
joint = estimate_joint(
    s=noisy_labels,
    psx=probabilities,
    confident_joint=None,  # Provide if you have it already
)
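Once you have a joint, the other latent quantities follow from basic probability. The sketch below uses a made-up 2 x 2 joint (rows are s, columns are y) to derive p(y), P(s | y), and P(y | s) by marginalizing and normalizing; it illustrates the relationships, and does not claim to reproduce how cleanlab computes its estimates.

```python
import numpy as np

# Hypothetical joint P(s, y) for 2 classes: rows are s, columns are y.
joint = np.array([
    [0.40, 0.10],
    [0.05, 0.45],
])

ps = joint.sum(axis=1)           # p(s): marginal over noisy labels
py = joint.sum(axis=0)           # p(y): latent prior over true labels
noise_matrix = joint / py        # P(s | y), indexed [s, y]; each column sums to 1
inv_noise_matrix = joint.T / ps  # P(y | s), indexed [y, s]; each column sums to 1
fraction_mislabeled = 1.0 - np.trace(joint)  # off-diagonal mass of the joint

print(py, fraction_mislabeled)
```

With these numbers, 15% of the labels are wrong, and inv_noise_matrix[1, 0] = 0.2 says that 20% of the examples labeled 0 actually belong to class 1.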

PU learning with cleanlab:

PU learning is a special case where one of your classes has no error. P stands for the positive class and is assumed to have zero label errors and U stands for unlabeled data, but in practice, we just assume the U class is a noisy negative class that contains some positive examples. Thus, the goal of PU learning is to (1) estimate the proportion of positives in the negative class (see fraction_noise_in_unlabeled_class in the last example), (2) find the errors (see last example), and (3) train on clean data (see first example below). cleanlab does all three, taking into account that there are no label errors in whichever class you specify.

There are two ways to use cleanlab for PU learning. We'll look at each here.

Method 1. If you are using the cleanlab classifier LearningWithNoisyLabels(), and your dataset has exactly two classes (positive = 1, and negative = 0), PU learning is supported directly in cleanlab. You can perform PU learning like this:

from cleanlab.classification import LearningWithNoisyLabels
from sklearn.linear_model import LogisticRegression
# Wrap around any classifier. Yup, you can use sklearn/PyTorch/Tensorflow/FastText/etc.
pu_class = 0 # Should be 0 or 1. Label of class with NO ERRORS. (e.g., P class in PU)
lnl = LearningWithNoisyLabels(clf=LogisticRegression(), pulearning=pu_class)
lnl.fit(X=X_train_data, s=train_noisy_labels)
# Estimate the predictions you would have gotten by training with *no* label errors.
predicted_test_labels = lnl.predict(X_test)

Method 2. However, you might be using a more complicated classifier that doesn't work well with LearningWithNoisyLabels (see this example for CIFAR-10). Or you might have 3 or more classes. Here's how to use cleanlab for PU learning in this situation. To let cleanlab know which class has no error (in standard PU learning, this is the P class), you need to set the threshold for that class to 1 (1 means the probability that the labels of that class are correct is 1, i.e. that class has no error). Here's the code:

import numpy as np
# K is the number of classes in your dataset
# psx are the cross-validated predicted probabilities.
# s is the array/list/iterable of noisy labels
# pu_class is a 0-based integer for the class that has no label errors.
thresholds = np.asarray([np.mean(psx[:, k][s == k]) for k in range(K)])
thresholds[pu_class] = 1.0

Now you can use cleanlab however you were before. Just be sure to pass in this thresholds parameter wherever it applies. For example:

from cleanlab.latent_estimation import compute_confident_joint, estimate_latent
from cleanlab.pruning import get_noise_indices

# Uncertainty quantification (characterize the label noise
# by estimating the joint distribution of noisy and true labels)
cj = compute_confident_joint(s, psx, thresholds=thresholds)
# Now the noise (cj) has been estimated taking into account that some class(es) have no error.
# We can use cj to find label errors like this:
indices_of_label_errors = get_noise_indices(s, psx, confident_joint=cj)

# In addition to label errors, we can find the fraction of noise in the unlabeled class.
# First we need the inv_noise_matrix which contains P(y|s) (proportion of mislabeling).
_, _, inv_noise_matrix = estimate_latent(confident_joint=cj, s=s)
# Because inv_noise_matrix contains P(y|s), p(y = anything else | s = pu_class) should be 0
# because the prob(true label is something else | example is in pu_class) is 0.
# What's more interesting is p(y = anything | s is not pu_class), or in the binary case
# this translates to p(y = pu_class | s = 1 - pu_class) because pu_class is 0 or 1.
# So, to find the fraction_noise_in_unlabeled_class, for binary, you just compute:
fraction_noise_in_unlabeled_class = inv_noise_matrix[pu_class][1 - pu_class]

Now that you have indices_of_label_errors, you can remove those label errors and train on clean data (or remove only some of the label errors and iteratively use confident learning / cleanlab to improve results).
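Removing the flagged examples is a small indexing step. The sketch below assumes indices_of_label_errors is an array of integer positions (if you have a boolean mask instead, you can invert it directly with ~mask); the toy X_train, s, and flagged indices are made up for illustration.

```python
import numpy as np

X_train = np.arange(12, dtype=float).reshape(6, 2)  # toy features, 6 examples
s = np.array([0, 0, 1, 1, 0, 1])                    # noisy labels
indices_of_label_errors = np.array([2, 4])          # hypothetical flagged positions

# Build a boolean mask that keeps everything except the flagged examples.
keep = np.ones(len(s), dtype=bool)
keep[indices_of_label_errors] = False
X_clean, s_clean = X_train[keep], s[keep]

print(X_clean.shape, s_clean)
```

You can then fit any classifier on X_clean and s_clean as usual.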

The Polyplex

The key to learning in the presence of label errors is estimating the joint distribution between the actual, hidden labels ‘y’ and the observed, noisy labels ‘s’. Using cleanlab and the theory of confident learning, we can completely characterize the trace of the latent joint distribution, trace(P(s,y)), given p(y), for any fraction of label errors, i.e. for any trace of the noisy channel, trace(P(s|y)).
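As a small numeric illustration of the two traces, the sketch below reuses the 4-class noise matrix shown earlier in this README and assumes a uniform prior p(y) = 0.25 for every class (the prior is an assumption made only for this example). The joint follows from P(s, y) = P(s | y) p(y), and its trace is the fraction of labels that are correct.

```python
import numpy as np

# P(s | y) from the example earlier in this README: rows are s, columns are y.
noise_matrix = np.array([
    [0.55, 0.01, 0.07, 0.06],
    [0.22, 0.87, 0.24, 0.02],
    [0.12, 0.04, 0.64, 0.38],
    [0.11, 0.08, 0.05, 0.54],
])
py = np.full(4, 0.25)  # assumed uniform prior over the true labels

joint = noise_matrix * py                     # P(s, y) = P(s | y) * p(y)
trace_noisy_channel = np.trace(noise_matrix)  # trace(P(s | y))
trace_joint = np.trace(joint)                 # trace(P(s, y)): fraction of correct labels

print(trace_noisy_channel, trace_joint)
```

With a uniform prior the joint trace is simply the channel trace divided by the number of classes; a non-uniform p(y) weights each diagonal entry differently, which is why p(y) has to be given.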

You can check out how to do this yourself here: 1. Drawing Polyplices 2. Computing Polyplices

Join our community

Have ideas for the future of cleanlab? How are you using cleanlab? Join the discussion.

Have code improvements for cleanlab? Submit a code pull request.

Do you have an issue with cleanlab? Submit an issue.

License

Copyright (c) 2017-2050 Curtis G. Northcutt

cleanlab is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

cleanlab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See GNU General Public LICENSE for details.

THIS LICENSE APPLIES TO THIS VERSION AND ALL PREVIOUS VERSIONS OF cleanlab.
