Tutorial on scikit-learn and IPython for parallel machine learning

Overview

Parallel Machine Learning with scikit-learn and IPython

Video Tutorial

Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearranged in part and extended. Look at the title of the of the notebooks to be able to follow along the presentation.

Browse the static notebooks on nbviewer.ipython.org.

Scope of this tutorial:

  • Learn common machine learning concepts and how they match the scikit-learn Estimator API.

  • Learn about scalable feature extraction for text classification and clustering

  • Learn how to perform parallel cross validation and hyper parameters grid search in parallel with IPython.

  • Learn to analyze the kinds of common errors predictive models are subject to and how to refine your modeling to take this analysis into account.

  • Learn to optimize memory allocation on your computing nodes with numpy memory mapping features.

  • Learn how to run a cheap IPython cluster for interactive predictive modeling on the Amazon EC2 spot instances using StarCluster.

Target audience

This tutorial targets developers with some experience with scikit-learn and machine learning concepts in general.

It is recommended to first go through one of the tutorials hosted at scikit-learn.org if you are new to scikit-learn.

You might might also want to have a look at SciPy Lecture Notes first if you are new to the NumPy / SciPy / matplotlib ecosystem.

Setup

Install NumPy, SciPy, matplotlib, IPython, psutil, and scikit-learn in their latest stable version (e.g. IPython 2.2.0 and scikit-learn 0.15.2 at the time of writing).

You can find up to date installation instructions on scikit-learn.org and ipython.org .

To check your installation, launch the ipython interactive shell in a console and type the following import statements to check each library:

>>> import numpy
>>> import scipy
>>> import matplotlib
>>> import psutil
>>> import sklearn

If you don't get any message, everything is fine. If you get an error message, please ask for help on the mailing list of the matching project and don't forget to mention the version of the library you are trying to install along with the type of platform and version (e.g. Windows 8.1, Ubuntu 14.04, OSX 10.9...).

You can exit the ipython shell by typing exit.

Fetching the data

It is recommended to fetch the datasets ahead of time before diving into the tutorial material itself. To do so run the fetch_data.py script in this folder:

python fetch_data.py

Using the IPython notebook to follow the tutorial

The tutorial material and exercises are hosted in a set of IPython executable notebook files.

To run them interactively do:

$ cd notebooks
$ ipython notebook

This should automatically open a new browser window listing all the notebooks of the folder.

You can then execute the cell in order by hitting the "Shift-Enter" keys and watch the output display directly under the cell and the cursor move on to the next cell. Go to the "Help" menu for links to the notebook tutorial.

Credits

Some of this material is adapted from the scipy 2013 tutorial:

http://github.com/jakevdp/sklearn_scipy2013

Original authors:

Owner
Olivier Grisel
Machine Learning Engineer a Inria Saclay (Parietal team).
Olivier Grisel
使用深度学习框架提取视频硬字幕;docker容器免安装深度学习库,使用本地api接口使得界面和后端识别分离;

extract-video-subtittle 使用深度学习框架提取视频硬字幕; 本地识别无需联网; CPU识别速度可观; 容器提供API接口; 运行环境 本项目运行环境非常好搭建,我做好了docker容器免安装各种深度学习包; 提供windows界面操作; 容器为CPU版本; 视频演示 https

歌者 16 Aug 06, 2022
《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters This repository is the implementation of the paper "K-Adapter: Infusing Knowledge

Microsoft 118 Dec 13, 2022
Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Demysitifing Local Vision Transformer, arxiv This is the official PyTorch implementation of our paper. We simply replace local self attention by (dyna

138 Dec 28, 2022
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

PortaSpeech - PyTorch Implementation PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech. Model Size Module Nor

Keon Lee 279 Jan 04, 2023
This repository will be a summary and outlook on all our open, medical, AI advancements.

medical by LAION This repository will be a summary and outlook on all our open, medical, AI advancements. See the medical-general channel in the medic

LAION AI 18 Dec 30, 2022
chainladder - Property and Casualty Loss Reserving in Python

chainladder (python) chainladder - Property and Casualty Loss Reserving in Python This package gets inspiration from the popular R ChainLadder package

Casualty Actuarial Society 130 Dec 07, 2022
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
A PyTorch implementation of the architecture of Mask RCNN

EDIT (AS OF 4th NOVEMBER 2019): This implementation has multiple errors and as of the date 4th, November 2019 is insufficient to be utilized as a reso

Sai Himal Allu 975 Dec 30, 2022
A list of all named GANs!

The GAN Zoo Every week, new GAN papers are coming out and it's hard to keep track of them all, not to mention the incredibly creative ways in which re

Avinash Hindupur 12.9k Jan 08, 2023
Educational 2D SLAM implementation based on ICP and Pose Graph

slam-playground Educational 2D SLAM implementation based on ICP and Pose Graph How to use: Use keyboard arrow keys to navigate robot. Press 'r' to vie

Kirill 19 Dec 17, 2022
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
TICC is a python solver for efficiently segmenting and clustering a multivariate time series

TICC TICC is a python solver for efficiently segmenting and clustering a multivariate time series. It takes as input a T-by-n data matrix, a regulariz

406 Dec 12, 2022
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

VITON-HD — Official PyTorch Implementation VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization Seunghwan Choi*1, Sunghyun Pa

Seunghwan Choi 250 Jan 06, 2023
DualGAN-tensorflow: tensorflow implementation of DualGAN

ICCV paper of DualGAN DualGAN: unsupervised dual learning for image-to-image translation please cite the paper, if the codes has been used for your re

Jack Yi 252 Nov 10, 2022
This code is an implementation for Singing TTS.

MLP Singer This code is an implementation for Singing TTS. The algorithm is based on the following papers: Tae, J., Kim, H., & Lee, Y. (2021). MLP Sin

Heejo You 22 Dec 23, 2022
Faster RCNN pytorch windows

Faster-RCNN-pytorch-windows Faster RCNN implementation with pytorch for windows Open cmd, compile this comands: cd lib python setup.py build develop T

Hwa-Rang Kim 1 Nov 11, 2022
A collection of random and hastily hacked together scripts for investigating EU-DCC

A collection of random and hastily hacked together scripts for investigating EU-DCC

Ryan Barrett 8 Mar 01, 2022
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

UC2 UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu,

Mingyang Zhou 28 Dec 30, 2022
Mask-invariant Face Recognition through Template-level Knowledge Distillation

Mask-invariant Face Recognition through Template-level Knowledge Distillation This is the official repository of "Mask-invariant Face Recognition thro

Fadi Boutros 35 Dec 06, 2022
Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn?

Domain Adaptation with Invariant RepresentationLearning: What Transformations to Learn? Repository Structure: DSAN |└───amazon |    └── dataset (Amazo

DMIRLAB 17 Jan 04, 2023