Upgini : data search library for your machine learning pipelines

Last update: Jan 08, 2023

Find & deliver relevant external data & features to boost ML accuracy 🔥

❔ Overview

Upgini is a Python library for automated data search to boost supervised ML tasks. It enriches your dataset with intelligently crafted features from a broad range of curated data sources, including open and commercial datasets. The search is conducted for any combination of public IDs contained in your tabular dataset: IP, date, etc. Only features that could improve the predictive power of your ML model are returned.
Motivation: for most ML tasks, external data & features boost accuracy significantly more than any hyperparameter tuning. But the lack of automated and time-efficient search tools for external data blocks massive adoption of external data in ML pipelines.
We want to radically simplify data search and delivery for ML pipelines to make external data & features a standard approach, like hyperparameter tuning is for machine learning nowadays.

🚀 Awesome features

⭐️ Automatically find only datasets and features that improve the accuracy of an ML algorithm according to metrics: ROC AUC, RMSE, Accuracy. Not just data or features correlated with the target variable, which in 9 out of 10 cases give zero accuracy improvement for production ML cases
⭐️ Calculate accuracy metrics and uplifts from enriching your existing ML model with the found external data & features, right in the search results
⭐️ Check the stability of accuracy gain from external data on out-of-time intervals and verification datasets. Mitigate risks of unstable external data dependencies in ML pipelines
⭐️ Scikit-learn compatible interfaces for easy data delivery into your existing ML pipelines
⭐️ Curated and updated data sources, including open and commercial data
⭐️ Support for several search key types (including SHA256 hashed email, IPv4, phone, date/datetime), more to come...
⭐️ Supported ML tasks:

🏁 Quick start with kaggle example

🏎 Pre-built dev environment for quick start

Pre-built dev environments with a kaggle example notebooks/kaggle_example.ipynb right inside your browser:

🐍 Jupyter via PyPI

Just install the library from PyPI and read this doc 🤓

!pip install upgini
import upgini

🐳 Docker-way

Clone the repo with $ git clone https://github.com/upgini/upgini, or download the upgini git repo locally, and follow the steps below to build a Docker container 👇
Build the Docker image

... from cloned git repo:

docker build -t upgini .

...or directly from GitHub:

DOCKER_BUILDKIT=0 docker build -t upgini git@github.com:upgini/upgini.git#main

Run docker image:

docker run -p 8888:8888 upgini

Open http://localhost:8888?token= in your browser

Kaggle notebook

Jupyter notebook with a kaggle example: notebooks/kaggle_example.ipynb. The problem being solved is a Kaggle competition Store Item Demand Forecasting Challenge. The goal is to predict future sales of different goods in different stores based on a 5-year history of sales. The evaluation metric is SMAPE.

The competition dataset was split into train (years 2013-2016) and test (year 2017) parts. FeaturesEnricher was fitted on the train part, and both datasets were enriched with external features. Finally, an ML algorithm was fitted on both the initial and the enriched datasets to compare accuracy. The enriched ML model achieved a solid improvement of the evaluation metric.
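For readers unfamiliar with the metric, here is a minimal sketch of SMAPE in plain Python. Several variants of SMAPE exist; this one uses the common form that ranges from 0 to 200 percent and treats a term as 0 when both the actual and the forecast value are 0. Both choices are assumptions, since this document does not spell out the exact formula used by the competition, and the function name `smape` is illustrative only.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2.0
        # Assumed convention: a term contributes 0 when both values are 0.
        if denominator != 0:
            total += abs(f - a) / denominator
    return 100.0 * total / len(actual)
```

Lower is better: a perfect forecast scores 0, so comparing the score of the initial model with the score of the enriched model shows the gain from external features.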

💻 How it works?

1. 🔑 Get access - API key

You'll need an API key from the User profile page https://profile.upgini.com
Pass the API key via the api_key parameter of the FeaturesEnricher class constructor, or export it as an environment variable:
... in python

import os
os.environ["UPGINI_API_KEY"] = "your_long_string_api_key_goes_here"

... in bash/zsh

export UPGINI_API_KEY="your_long_string_api_key_goes_here"
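If you want to fail early with a clear message when no key is configured, a small helper along these lines can resolve the key before you construct the enricher. This is only a sketch: `resolve_api_key` is a hypothetical helper, not part of the library; only the `UPGINI_API_KEY` variable name and the `api_key` constructor parameter come from this document.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the explicit key if given, else the UPGINI_API_KEY variable."""
    # Allow a custom mapping so the helper is easy to test.
    env = os.environ if env is None else env
    key = (explicit_key or env.get("UPGINI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set UPGINI_API_KEY"
        )
    return key


# Hypothetical usage, relying on the api_key parameter described above:
# enricher = FeaturesEnricher(search_keys=..., api_key=resolve_api_key())
```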

2. 💡 Reuse existing labeled training datasets for search

To simplify things, you can reuse your existing labeled training datasets "as is" to initiate the search. Under the hood, we'll search for relevant data using:

search keys from the training dataset, to match records from potential external data sources and features
labels from the training dataset, to estimate the relevancy of a feature or dataset for your ML task and calculate metrics
your features from the training dataset columns, to find only datasets and features that give an accuracy gain on top of your existing data in the ML model, and to estimate accuracy uplift (optional)

Just load the training dataset into a pandas dataframe and separate the feature columns from the label column:

import pandas as pd
# labeled training dataset - customer_churn_prediction_train.csv
train_df = pd.read_csv("customer_churn_prediction_train.csv")
train_features = train_df.drop(columns="label")
train_label = train_df["label"]

3. 🔦 Choose at least one column as a search key

Search keys columns will be used to match records from all potential external data sources 👓 . Define at least one search key with FeaturesEnricher class initialization.

from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(
    search_keys={"subscription_activation_date": SearchKey.DATE},
    keep_input=True,
)

✨ Search key types we support (more is coming!)

Our team works hard to introduce new search key types. Currently we support:

Search key type	Description	Example
SearchKey.EMAIL	e-mail	`[email protected]`
SearchKey.HEM	`sha256(lowercase(email))`	`0e2dfefcddc929933dcec9a5c7db7b172482814e63c80b8460b36a791384e955`
SearchKey.IP	IP address (version 4)	`192.168.0.1`
SearchKey.PHONE	phone number, E.164 standard	`443451925138`
SearchKey.DATE	date	`2020-02-12` (ISO-8601 standard), `12.02.2020` (non-standard notation)
SearchKey.DATETIME	datetime	`2020-02-12 12:46:18`, `12:46:18 12.02.2020`, `unixtimestamp`
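If your dataset holds plain e-mail addresses and you prefer to send hashes, the HEM value from the table, `sha256(lowercase(email))`, can be computed with the standard library. This is a minimal sketch: the function name `email_to_hem` is illustrative, and UTF-8 encoding before hashing is an assumption, since the table does not state the encoding.

```python
import hashlib


def email_to_hem(email: str) -> str:
    """Return sha256(lowercase(email)) as a 64-character hex string."""
    normalized = email.lower()
    # Assumption: the lowercased address is encoded as UTF-8 before hashing.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Applied to a pandas column, this would look like `df["hem"] = df["email"].map(email_to_hem)`, with the new column then declared as `SearchKey.HEM`.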

⚠️ Requirements for search initialization dataset

We do dataset verification and cleaning under the hood, but there are still some requirements to follow:

Pandas dataframe representation
Correct label column types: integers or strings for binary and multiclass labels, floats for regression
At least one column defined as a search key
Min size after deduplication by search key column and NAs removal: 1000 records
Max size after deduplication by search key column and NAs removal: 1 000 000 records
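The size and search key requirements above can be checked locally before starting a search. The sketch below is an illustration with pandas, not the library's own validation: `check_search_dataset` is a hypothetical helper, and it does not cover the label type requirement, because that depends on your ML task.

```python
from typing import List, Sequence

import pandas as pd

MIN_ROWS = 1_000
MAX_ROWS = 1_000_000


def check_search_dataset(df: object, search_key_columns: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    if not isinstance(df, pd.DataFrame):
        return ["dataset must be a pandas DataFrame"]
    if not search_key_columns:
        return ["at least one column must be defined as a search key"]

    keys = list(search_key_columns)
    missing = [column for column in keys if column not in df.columns]
    if missing:
        return [f"search key columns not found in dataset: {missing}"]

    # Drop rows with missing search keys, then deduplicate by search keys.
    usable = df.dropna(subset=keys).drop_duplicates(subset=keys)

    problems = []
    if len(usable) < MIN_ROWS:
        problems.append(f"only {len(usable)} usable records, need at least {MIN_ROWS}")
    if len(usable) > MAX_ROWS:
        problems.append(f"{len(usable)} usable records, allowed at most {MAX_ROWS}")
    return problems
```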

4. 🔍 Start your first data search!

The main abstraction you interact with is FeaturesEnricher. FeaturesEnricher is a scikit-learn compatible estimator, so you can easily add it to your existing ML pipelines. First, create an instance of the FeaturesEnricher class. Once it is created, call

fit to search for relevant datasets & features
then transform to enrich your dataset with features from the search result

Let's try it out!

import pandas as pd
from upgini import FeaturesEnricher, SearchKey

# load labeled training dataset to initiate search
train_df = pd.read_csv("customer_churn_prediction_train.csv")
train_features = train_df.drop(columns="label")
train_target = train_df["label"]

# now we're going to create an instance of the `FeaturesEnricher` class
# if you haven't defined the UPGINI_API_KEY env variable yet, that's not a problem: pass the key via `api_key`
enricher = FeaturesEnricher(
    search_keys={"subscription_activation_date": SearchKey.DATE},
    keep_input=True,
    api_key="your_long_string_api_key_goes_here"
)

# everything is ready to fit! For 200k records fitting should take around 10 minutes,
# but don't worry - we'll send an email notification. Accuracy metrics of the trained model
# and uplifts will be shown automatically
enricher.fit(train_features, train_target)

That's all. We have fitted FeaturesEnricher, and any pandas dataframe with exactly the same data schema can now be enriched with features from the search results. Use the transform method, and let the magic do the rest 🪄

# load dataset for enrichment
test_df = pd.read_csv("test.csv")
test_features = test_df.drop(columns="target")
# enrich it!
enriched_test_features = enricher.transform(test_features)
enriched_test_features.head()

You can get more details about FeaturesEnricher in runtime using docstrings, for example, via help(FeaturesEnricher) or help(FeaturesEnricher.fit).

✅ Optional: find only datasets and features that give accuracy gain to your existing data in the ML model

If you already have a trained ML model based on internal features or other external data sources, you can specifically search for new datasets & features that give an accuracy gain "on top" of them.
Just leave all these existing features in the labeled training dataset, and the Upgini library will automatically use them as a baseline ML model to calculate the accuracy metric uplift. It won't return any features that might not give an accuracy gain over the existing ML model feature set.

✅ Optional: check stability of ML accuracy gain from search result datasets & features

You can validate the data quality of your search result on an out-of-time dataset using the eval_set parameter. Let's do that:

import pandas as pd
from upgini import FeaturesEnricher, SearchKey

# load train dataset
train_df = pd.read_csv("train.csv")
train_features = train_df.drop(columns="target")
train_target = train_df["target"]

# load out-of-time validation dataset
eval_df = pd.read_csv("validation.csv")
eval_features = eval_df.drop(columns="eval_target")
eval_target = eval_df["eval_target"]
# create FeaturesEnricher
enricher = FeaturesEnricher(
    search_keys={"registration_date": SearchKey.DATE},
    keep_input=True
)

# now we fit WITH eval_set parameter to calculate accuracy metrics on OOT dataset.
# the output will contain quality metrics for both the training data set and
# the eval set (validation OOT data set)
enricher.fit(
    train_features,
    train_target,
    eval_set=[(eval_features, eval_target)]
)

⚠️ Requirements for out-of-time dataset

Same data schema as for search initialization dataset
Pandas dataframe representation
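If you do not have a separate validation file, one way to obtain an out-of-time dataset with the same schema is to split a single labeled dataframe by date. The sketch below assumes a date column and a cutoff of your choice; `out_of_time_split` is a hypothetical helper, not part of the library.

```python
from typing import Tuple

import pandas as pd


def out_of_time_split(
    df: pd.DataFrame, date_column: str, cutoff: str
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows into (before cutoff, on or after cutoff) by a date column."""
    dates = pd.to_datetime(df[date_column])
    cutoff_ts = pd.Timestamp(cutoff)
    train = df[dates < cutoff_ts]
    out_of_time = df[dates >= cutoff_ts]
    # Both parts keep the original columns, so the data schema is identical.
    return train, out_of_time
```

Each part can then be separated into features and target as shown earlier, with the later part passed through `eval_set`.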

🧹 Search dataset validation

We validate and clean the search initialization dataset under the hood:
✂️ Check the format of your search key columns
✂️ Check the dataset for full row duplicates. If we find any, we remove the duplicated rows and make a note of the share of row duplicates
✂️ Check for inconsistent labels: rows with the same record keys (not search keys!) but different labels. We remove them and make a note of the share of row duplicates
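To see what the duplicate and label consistency checks mean in practice, here is a rough pandas sketch of the same idea. It is not the library's implementation: `clean_search_dataset` is a hypothetical helper, and it assumes that the "record key" is the combination of all non-label columns.

```python
from typing import Tuple

import pandas as pd


def clean_search_dataset(
    df: pd.DataFrame, label_column: str
) -> Tuple[pd.DataFrame, float, float]:
    """Return (cleaned frame, share of full duplicates, share of inconsistent rows)."""
    key_columns = [column for column in df.columns if column != label_column]
    if not key_columns:
        raise ValueError("dataset needs at least one non-label column")

    # Step 1: drop rows that are exact copies of an earlier row.
    deduplicated = df.drop_duplicates()
    duplicate_share = 1 - len(deduplicated) / len(df) if len(df) else 0.0

    # Step 2: after step 1, rows sharing all non-label columns must differ
    # in their label, so they are exactly the inconsistent rows.
    inconsistent = deduplicated.duplicated(subset=key_columns, keep=False)
    cleaned = deduplicated[~inconsistent]
    inconsistent_share = float(inconsistent.mean()) if len(deduplicated) else 0.0
    return cleaned, duplicate_share, inconsistent_share
```

The inconsistent share here is measured against the deduplicated frame, which is a design choice of this sketch.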

🆙 Accuracy and uplift metrics calculations

We calculate all the accuracy metrics and uplifts for non-linear machine learning algorithms, like gradient boosting or neural networks. If the consumer of your external data is a linear ML algorithm (like logistic regression), you might see different accuracy metrics after data enrichment.

💸 Why is it a paid service? Can I use it for free?

The short answer is Yes! We do have two options for that 🤓
Let us explain. This is a part-time project for our small team, but as you might know, search is a very infrastructure-intensive service. We pay infrastructure costs, both storage and compute, for every search request generated on the platform, as we mostly use serverless components under the hood.
To cover these running costs we introduced paid plans with a certain number of search requests, which we hope will be affordable for most of the data scientists & developers in the community.

First option. Participate in beta testing

The service is still in beta, so registered beta testers will get 80 USD in credits for 6 months. Feel free to start with the registration form 👉 here. Please note that the number of slots for beta testing is limited and we won't be able to handle all the requests.

Second option. Share license-free data with community

If you have ANY data that you consider royalty-free and license-free (Open Data) and potentially valuable for supervised ML applications, we'll be happy to give you free individual access in exchange for sharing this data with the community.
Just upload your data sample right from Jupyter. We will check your data sharing proposal and get back to you ASAP:

import pandas as pd
from upgini import SearchKey
from upgini.ads import upload_user_ads
import os
os.environ["UPGINI_API_KEY"] = "your_long_string_api_key_goes_here"
# you can define a custom search key that might not be supported yet, just use the SearchKey.CUSTOM_KEY type
sample_df = pd.read_csv("path_to_data_sample_file")
upload_user_ads("test", sample_df, {
    "city": SearchKey.CUSTOM_KEY, "stats_date": SearchKey.DATE
})

🛠 Getting Help & Community

Requests and support channels, in preferred order

Please try to create bug reports that are:

Reproducible. Include steps to reproduce the problem.
Specific. Include as much detail as possible: which Python version, what environment, etc.
Unique. Do not duplicate existing opened issues.
Scoped to a Single Bug. One bug per report.

🧩 Contributing

We are a very small team and this is a part-time project for us, so most probably we won't be able to:

implement ALL the data delivery and integration interfaces for the most common ML stacks and frameworks
implement ALL data verification and normalization capabilities for different types of search keys (we have just started with the current 4)

So we might need some help from the community.
We'll be happy about every pull request you open and every issue you find to make this library more awesome. Please note that it might sometimes take us a while to get back to you.
For major changes, please open an issue first to discuss what you would like to change.

Developing

Some convenient ways to start contributing are:
⚙️ Visual Studio Code: you can remotely open this repo in VS Code without cloning, or automatically clone it and open it inside a Docker container.

⚙️ Gitpod: you can use Gitpod to launch a fully functional development environment right in your browser.

🔗 Useful links

😔 Found a typo or a bug in a code snippet? Our bad! Please report it here.


Comments

Bug fixes
Rename client features to keep uniqueness

Fix multiclass target conversion

Fix incorrect message for autodetected search keys that have zero intersection with datasets

Fix incorrect message for empty intersection

Fix email autodetection
opened by c3p0-upgini 0

Releases (1.1.19)

1.1.19 (Aug 26, 2022): Metrics tests fix; Correct random_state; Add shuffle if date/datetime keys are not presented
1.1.18 (Aug 26, 2022): Hotfix search keys check
1.1.17 (Aug 26, 2022): Correct validation
1.1.16 (Aug 26, 2022): Additional validations; Pass File MD5 during upload
1.1.15 (Aug 16, 2022): Hotfix catboost categorical metrics
1.1.14 (Aug 16, 2022): Refactoring and other improvements
1.1.14a1 (Aug 16, 2022): Check uploading file size
1.1.13 (Aug 16, 2022): Refactoring and other improvements
1.1.12a1 (Aug 16, 2022): Hotfix index search key
1.1.10a1 (Aug 16, 2022): Calculate metrics only if features present; Adding email list for demo notebook
1.1.12 (Aug 16, 2022): Bump version
1.1.11 (Aug 16, 2022): Hotfix old dates
1.1.10 (Aug 16, 2022): Refactoring & improvements; Update target_utils.py
1.1.8 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.7 (Aug 4, 2022): Fix, second fit run with index as a search key
1.1.6 (Aug 4, 2022): Fix, index as a search key
1.1.6a2 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.6a1 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.5 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.4 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.3 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.2 (Aug 4, 2022): Code refactoring, performance optimizations
1.1.1 (Aug 4, 2022): Code refactoring, performance optimizations; Optimized access token code
1.1.0 (Aug 4, 2022): Removed dependency - fork RandomUnderSampler
0.10.0a95 (Aug 4, 2022): Full support for the datetime as a search key; Rename ISO to COUNTRY
0.10.0a93 (Aug 4, 2022): Support timeseries CV
0.10.0a92 (Aug 4, 2022): Override default RMSLE metric calculation; Correct geo search keys normalization
0.10.0a91 (Aug 4, 2022): Geo search keys code refactoring
0.10.0a90 (Aug 4, 2022): Geo search keys normalization
0.10.0a89 (Aug 4, 2022): Sort dataset on fit

Owner

Upgini

We build a novel approach to boost the accuracy of any AutoML pipeline

GitHub Repository https://upgini.com

