Microsoft-contributed libraries, tools, recipes, sample code, and workshop content for machine learning and deep learning.

Overview

Machine Learning Collection

Boosting

  • LightGBM - A fast, distributed, high performance gradient boosting framework
  • Explainable Boosting Machines - an interpretable model developed at Microsoft Research that uses bagging, gradient boosting, and automatic interaction detection to estimate generalized additive models.

AutoML

  • Neural Network Intelligence - An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
  • Archai - Reproducible Rapid Research for Neural Architecture Search (NAS).
  • FLAML - A fast and lightweight AutoML library.
  • Azure Automated Machine Learning - Automated Machine Learning for Tabular data (regression, classification and forecasting) by Azure Machine Learning

Neural Network

  • bayesianize - A Bayesian neural network wrapper in PyTorch.
  • O-CNN - Octree-based convolutional neural networks for 3D shape analysis.
  • ResNet - deep residual network.
  • CNTK - Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit.
  • InfiniBatch - Efficient, check-pointed data loading for deep learning with massive data sets.

Graph & Network

  • graspologic - utilities and algorithms designed for the processing and analysis of graphs with specialized graph statistical algorithms.
  • TF Graph Neural Network Samples - TensorFlow implementations of graph neural networks.
  • ptgnn - PyTorch Graph Neural Network Library
  • StemGNN - spectral temporal graph neural network (StemGNN) for multivariate time-series forecasting.
  • SPTAG - a distributed approximate nearest neighbor (ANN) search library.

Vision

  • Microsoft Vision Model ResNet50 - a large pretrained ResNet-50 vision model trained on a search engine's web-scale image data.
  • Oscar - Object-Semantics Aligned Pre-training for Vision-Language Tasks.

Time Series

  • luminol - anomaly detection and correlation library.
  • Greykite - flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite.

NLP

  • T-ULRv2 - Turing multilingual language model.
  • Turing-NLG - Turing Natural Language Generation, 17 billion-parameter language model.
  • DeBERTa - Decoding-enhanced BERT with Disentangled Attention
  • UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond
  • Unicoder - Unicoder model for understanding and generation.
  • NeuronBlocks - build your NLP DNN models like playing with Lego.
  • Multilingual Model Transfer - new deep learning models for bootstrapping language understanding models for languages with no labeled data using labeled data from other languages.
  • MT-DNN - multi-task deep neural networks for natural language understanding.
  • inmt - interactive neural machine translation-lite
  • OpenKP - automatic extraction of keyphrases that are salient to a document's meaning, an essential step in semantic document understanding.
  • DeText - a deep neural text understanding framework for ranking and classification tasks.

Online Machine Learning

  • Vowpal Wabbit - fast, efficient, and flexible online machine learning techniques for reinforcement learning, supervised learning, and more.

Recommendation

  • Recommenders - examples and best practices for building recommendation systems (A2SVD, DKN, xDeepFM, LightGBM, LSTUR, NAML, NPA, NRMS, RLRMC, SAR, and Vowpal Wabbit were invented or contributed by Microsoft).
  • GDMIX - A deep ranking personalization framework

Distributed

  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
  • MMLSpark - machine learning library on Spark.
  • photon-ml - a scalable machine learning library on Apache Spark.
  • TonY - framework to natively run deep learning frameworks on Apache Hadoop.

Causal Inference

  • EconML - Python package for estimating heterogeneous treatment effects from observational data via machine learning.
  • DoWhy - Python library for causal inference that supports explicit modeling and testing of causal assumptions.

Responsible AI

  • InterpretML - a toolkit to help understand models and enable responsible machine learning.
    • Interpret Community - extends interpret repo with additional interpretability techniques and utility functions.
    • DiCE - diverse counterfactual explanations.
    • Interpret-Text - state-of-the-art explainers for text-based ML models, with a dashboard for visualizing results.
  • fairlearn - Python package to assess and improve fairness of machine learning models.
  • LiFT - LinkedIn Fairness Toolkit.
  • RobustDG - Toolkit for building machine learning models that generalize to unseen domains and are robust to privacy and other attacks.
  • SHAP - a game theoretic approach to explain the output of any machine learning model (Scott Lundberg, Microsoft Research).
  • LIME - explaining the predictions of any machine learning classifier (Marco Tulio Ribeiro, Microsoft Research).
  • BackwardCompatibilityML - Project for open sourcing research efforts on Backward Compatibility in Machine Learning
  • confidential-ml-utils - Python utilities for training and deploying ML models against data you can't see.
  • presidio - context aware, pluggable and customizable data protection and anonymization service for text and images.
  • Confidential ONNX Inference Server - An Open Enclave port of the ONNX inference server with data encryption and attestation capabilities to enable confidential inference on Azure Confidential Computing.
  • Responsible-AI-Widgets - responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
  • Error Analysis - A toolkit to help analyze and improve model accuracy.
  • Secure Data Sandbox - A toolkit for conducting machine learning trials against confidential data.

Optimization

  • ONNXRuntime - cross-platform, high performance ML inference and training accelerator.
  • Hummingbird - compiles trained ML models into tensor computations for faster inference.
  • EdgeML -
  • DirectML - high-performance, hardware-accelerated DirectX 12 library for machine learning.
  • MMdnn - a set of tools to help users interoperate among different deep learning frameworks, e.g. model conversion and visualization.
  • InfiniBatch - Efficient, check-pointed data loading for deep learning with massive data sets.
  • InferenceSchema - Schema decoration for inference code
  • nnfusion - flexible and efficient deep neural network compiler.

Reinforcement Learning

  • AirSim - open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft Research.
  • TextWorld - TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.
  • Moab - Project Moab, a new open-source balancing robot to help engineers and developers learn how to build real-world autonomous control systems with Project Bonsai.
  • MARO - multi-agent resource optimization (MARO) platform.
  • Training Data-Driven or Surrogate Simulators - build simulators from data for use in RL and in the Bonsai platform for machine teaching.
  • Bonsai - low code industrial machine teaching platform.
    • Bonsai Python SDK - A Python library for integrating data sources with Bonsai BRAIN.

Security

  • counterfit - a CLI that provides a generic automation layer for assessing the security of ML models.

Windows

Datasets

Debug & Benchmark

  • tensorwatch - debugging, monitoring and visualization for Python machine learning and data science.
  • Pyright - static type checker for Python.
  • Bench ML - Python library to benchmark popular pre-built cloud AI APIs.
  • debugpy - An implementation of the Debug Adapter Protocol for Python
  • kineto - A CPU+GPU profiling library that provides access to timeline traces and hardware performance counters, contributed by the Azure AI Platform team.
  • SuperBenchmark - a benchmarking and diagnosis tool for AI infrastructure (software & hardware).

Pipeline

  • GitHub Actions - Automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub.
  • Azure Pipelines - Automate your builds and deployments with Pipelines so you spend less time with the nuts and bolts and more time being creative.
  • Dagli - framework for defining machine learning models, including feature generation and transformations, as a directed acyclic graph (DAG).

Platform

  • AI for Earth API Platform - distributed infrastructure that provides secure, scalable, and customizable API hosting, designed to handle long-running/asynchronous machine learning model inference.
  • HivedDScheduler - Kubernetes Scheduler for Deep Learning.
  • Open Platform for AI (OpenPAI) - resource scheduling and cluster management for AI.
  • OpenPAI Runtime - Runtime for deep learning workload.
  • MLOS - Data Science powered infrastructure and methodology to democratize and automate Performance Engineering.
  • Platform for Situated Intelligence - an open-source framework for multimodal, integrative AI.
  • Qlib - an AI-oriented quantitative investment platform.

Tagging

  • TagAnomaly - Anomaly detection analysis and labeling tool, specifically for multiple time series (one time series per category)
  • VoTT - Visual object tagging tool

Developer tool

  • Visual Studio Code - Code editor redefined and optimized for building and debugging modern web and cloud applications.
  • Gather - adds gather functionality in the Python language to the Jupyter Extension.
  • Pylance - an extension that works alongside Python in Visual Studio Code to provide performant language support.
  • Azure ML Snippets - VSCode snippets for Azure Machine Learning

Sample Code

Workshop

🏃 coming soon

Competition

Book

Learning

Blog, News & Webinar



Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Owner
Microsoft