36 Repositories
Latest Python Libraries
This simple script generates a backup of a given Python and R environment
Python Environment Backup It’s always good to maintain your Python and R Anaconda environment packages properly listed and well-kept in case you have
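As a rough illustration of the idea behind an environment backup, the sketch below lists the packages installed in the current Python environment and writes them to a text file. It is not the repository's script: it uses only the standard library's `importlib.metadata` rather than Anaconda tooling, it does not cover R, and the function names `list_installed_packages` and `write_backup` are hypothetical.

```python
from importlib.metadata import distributions


def list_installed_packages():
    """Return sorted 'name==version' strings for the current environment."""
    packages = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages.add(f"{name}=={dist.version}")
    return sorted(packages, key=str.lower)


def write_backup(path):
    """Write one 'name==version' line per package to `path` (hypothetical helper)."""
    packages = list_installed_packages()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(packages) + "\n")
    return len(packages)
```

Calling `write_backup("python_env_backup.txt")` would produce a requirements-style file that `pip install -r` can usually read back.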
Transform ML models into native code with zero dependencies
m2cgen (Model 2 Code Generator) is a lightweight library that provides an easy way to transpile trained statistical models into native code
 
R interface to fast.ai
R interface to fastai The fastai package provides R wrappers to fastai. The fastai library simplifies training fast and accurate neural nets using mod
 
List of Data Science Cheatsheets to rule the world
Data Science Cheatsheets List of Data Science Cheatsheets to rule the world. Table of Contents Business Science Business Science Problem Framework Dat
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
AI Fairness 360 (AIF360) The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to h
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a
Best Practices on Recommendation Systems
Recommenders What's New (February 4, 2021) We have a new release, Recommenders 2021.2! It comes with lots of bug fixes, optimizations and 3 new algorith
 
Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)
taganomaly Anomaly detection labeling tool, specifically for multiple time series (one time series per category). Taganomaly is a tool for creating la
 
Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects
Metaflow Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow
SimpleITK is an image analysis toolkit with a large number of components supporting general filtering operations, image segmentation and registration
SimpleITK is an image analysis toolkit with a large number of components supporting general filtering operations, image segmentation and registration
Instant search for and access to many datasets in PySpark.
SparkDataset Provides instant access to many datasets right from PySpark (in Spark DataFrame structure). Drop a star if you like the project. 😃 Motiv
 
abess: Fast Best-Subset Selection in Python and R
abess: Fast Best-Subset Selection in Python and R Overview abess (Adaptive BEst Subset Selection) library aims to solve general best subset selection,
Apache Spark - A unified analytics engine for large-scale data processing
Apache Spark Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an op
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
CatBoost is a machine learning method based on gradient boosting over decision tree
 
A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.
New to Streaming Scraper An in-progress web scraping project built with Python, R, and SQL. The scraped data are movie and TV show information. The go
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage
 
Portfolio Optimization and Quantitative Strategic Asset Allocation in Python
Riskfolio-Lib Quantitative Strategic Asset Allocation, Easy for Everyone. Description Riskfolio-Lib is a library for making quantitative strategic ass
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Prophet: Automatic Forecasting Procedure Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends ar
r - a small subset of Python Requests
r a small subset of Python Requests a few years ago, when I was first learning Python and looking for http functionality, i found the batteries-includ
GlokyPortScannar is a really fast tool to scan TCP ports implemented in Python.
GlokyPortScannar is a really fast tool to scan TCP ports implemented in Python. Installation: This program requires Python 3.9. Linux
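For readers unfamiliar with how a TCP port scan works, the sketch below tries to open a connection to each port and reports the ones that accept it. It is not GlokyPortScannar's code: the function names, the timeout, and the use of a thread pool for speed are all assumptions made for illustration. Only scan hosts you are authorized to test.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def is_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


def scan_ports(host, ports, workers=64):
    """Check the given ports concurrently and return the open ones in order."""
    ports = list(ports)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: is_port_open(host, p), ports)
        return [port for port, is_open in zip(ports, results) if is_open]
```

For example, `scan_ports("127.0.0.1", range(1, 1025))` would check the well-known port range on the local machine.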
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
H2O H2O is an in-memory platform for distributed, scalable machine learning. H2O uses familiar interfaces like R, Python, Scala, Java, JSON and the Fl
 
Realtime Web Apps and Dashboards for Python and R
H2O Wave Realtime Web Apps and Dashboards for Python and R New! R Language API Build and control Wave dashboards using R! New! Easily integrate AI/ML
 
Raster processing benchmarks for Python and R packages
Raster processing benchmarks This repository contains a collection of raster processing benchmarks for Python and R packages. The tests cover the most
 
coldcuts is an R package to automatically generate and plot segmentation drawings in R
coldcuts coldcuts is an R package that allows you to automatically draw and plot segmentations from 3D voxel arrays. The name is inspired by one of It
 
Tools for calculating and visualizing Elo-like ratings of MLB teams using Retrosheet data
Overview This project uses historical baseball games data to calculate an Elo-like rating for MLB teams based on regular season match ups. The Elo rat
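To make the rating idea concrete, here is a minimal sketch of a standard Elo update after a single game. The project describes its ratings only as "Elo-like", so the logistic expected-score formula, the 400-point scale, the K-factor of 20, and the function names are assumptions taken from the classic Elo system, not from the repository.

```python
def expected_score(rating_a, rating_b):
    """Probability that team A beats team B under the classic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_home, rating_away, home_won, k=20.0):
    """Return the new (home, away) ratings after one game.

    k is the assumed K-factor: larger values react faster to each result.
    """
    expected_home = expected_score(rating_home, rating_away)
    actual_home = 1.0 if home_won else 0.0
    delta = k * (actual_home - expected_home)
    # Elo is zero-sum: what one team gains, the other loses
    return rating_home + delta, rating_away - delta
```

With two evenly rated teams at 1500, a home win moves the ratings by half the K-factor in each direction.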
 
🛠 All-in-one web-based IDE specialized for machine learning and data science.
All-in-one web-based development environment for machine learning
Spatiotemporal resampling methods for mlr3
mlr3spatiotempcv Spatiotemporal resampling methods for mlr3. This package extends the mlr3 package framework with spati
 
📚 Papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks.
papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. Papermill lets you: parameterize notebooks execute notebooks This
 
Team Curie is a group of people working together to achieve a common aim
Team Curie is a group of people working together to achieve a common aim. We are enthusiasts! We are setting the pace! We offer encouragement and motivation. And we believe TeamWork makes t
 
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Dash Dash is the most downloaded, trusted Python framework for building ML & data science web apps. Built on top of Plotly.js, React and Flask, Dash t
 
A ninja python package that unifies the Google Earth Engine ecosystem.
A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex
CoCalc: Collaborative Calculation in the Cloud
CoCalc Collaborative Calculation and Data Science CoCalc is a virtual online workspace for calculations, research, collaboration and authoring do
A Python and R autograding solution
Otter-Grader Otter Grader is a light-weight, modular open-source autograder developed by the Data Science Education Program at UC Berkeley. It is desi