This is an example of a reproducible modelling project

Overview

An example of a reproducible modelling project

What are we doing?

This example was created for the 2021 fall lecture series of Stanford's Center for Open and REproducible Science (CORES).

A video of the talk can be found at: https://youtu.be/JAQot6b1Cng

The goal of this exemplary analysis is to explore the effect of varying different hyper-parameters of the training of a simple classification model on its performance in scikit-learn's handwritten digit dataset.

Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent steps, and random shuffling of the data on the 3-fold cross-validation performance of scikit-learn's linear support vector machine classifier.

Importantly, each hyper-parameter is varied separately while all other hyper-parameters are set to default values (for details, see scripts/evaluate_hyper_params_effect.py).

Project organization

├── LICENSE            <- MIT License
├── Makefile           <- Makefile with targets to 'load', 'evaluate', and 'plot' ('make all' runs all three analysis steps)
├── poetry.lock        <- Details of used package versions
├── pyproject.toml     <- Lists all dependencies
├── README.md          <- This README file.
├── docs/              
|    └──               <- Slides of the practical tutorial
├── data/
|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
|
├── results/
|    ├── estimates/
|    │    └──          <- Generated estimates of classifier performance
|    └── figures/
|         └──          <- Generated figures
|
├── scrips/
|    ├── load_data.py                       <- Downloads the dataset to specified 'data-path'
|    ├── evaluate_hyper_params_effect.py    <- Runs cross-validated hyper-parameter evaluation
|    ├── plot_hyper_params_effect.py        <- Summarizes results of evaluation in a figure
|    └── run_analysis.sh                    <- Runs all analysis steps
|
└── src/
    ├── hyper/
    │    ├──  __init__.py                   <- Makes 'hyper' a Python module
    │    ├── grid.py                        <- Functionality to sample hyper-parameter grid
    │    ├── evaluation.py                  <- Functionality to evaluate classifier performance, given hyper-parameters
    │    └── plotting.py                    <- Functionality to visualize results
    └── setup.py                            <- Makes 'hyper' pip-installable (pip install -e .)  

Data description

We use the handwritten digits dataset provided by scikit-learn. For details on this dataset, see scikit-learn's documentation:

https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset

Installation

This project is written for Python 3.9.5 (we recommend pyenv for Python version management).

All software dependencies of this project are managed with Python Poetry. All details about the used package versions are provided in pyproject.toml.

To clone this repository to your local machine, run:

git clone https://github.com/athms/reproducible-modelling

To install all dependencies with poetry, run:

cd reproducible-modelling/
poetry install

To reproduce our analyses, you additionally need to install our custom Python module (src/hyper) in your poetry environment:

cd src/
poetry run pip install -e .

Reproducing our analysis

Our analysis can be reproduced either by running scripts/run_analysis.sh:

cd scripts
poetry run bash run_analysis.sh

..or by the use of make:

poetry run make <ANALYSIS TARGET>

We provide the following targets for make:

Analysis target Description
all Runs the entire analysis pipeline
load Downloads scikit-learn's handwritten digit dataset
evaluate Runs our cross-validated hyper-parameter evaluation
plot Creates our results figure

This README file is strongly inspired by the Cookiecutter Data Science Structure

Owner
Armin Thomas
Ram and Vijay Shriram Data Science Fellow at Stanford Data Science
Armin Thomas
Scene-Text-Detection-and-Recognition (Pytorch)

Scene-Text-Detection-and-Recognition (Pytorch) Competition URL: https://tbrain.t

Gi-Luen Huang 9 Jan 02, 2023
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision

A Light in the Dark: Deep Learning Practices for Industrial Computer Vision This is the repository for our Paper/Contribution to the WI2022 in Nürnber

Maximilian Harl 6 Jan 17, 2022
Deeper insights into graph convolutional networks for semi-supervised learning

deeper_insights_into_GCNs Deeper insights into graph convolutional networks for semi-supervised learning References data and utils.py come from Implem

Davidham3 17 Dec 16, 2022
functorch is a prototype of JAX-like composable function transforms for PyTorch.

functorch is a prototype of JAX-like composable function transforms for PyTorch.

Facebook Research 1.2k Jan 09, 2023
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

Facebook Research 338 Dec 29, 2022
A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Imagenette 🎶 Imagenette, gentille imagenette, Imagenette, je te plumerai. 🎶 (Imagenette theme song thanks to Samuel Finlayson) NB: Versions of Image

fast.ai 718 Jan 01, 2023
Research code for CVPR 2021 paper "End-to-End Human Pose and Mesh Reconstruction with Transformers"

MeshTransformer ✨ This is our research code of End-to-End Human Pose and Mesh Reconstruction with Transformers. MEsh TRansfOrmer is a simple yet effec

Microsoft 473 Dec 31, 2022
这是一个yolox-pytorch的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Pytorch当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤

Bubbliiiing 613 Jan 05, 2023
Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19)

Spatial Attentive Single-Image Deraining with a High Quality Real Rain Dataset (CVPR'19) Tianyu Wang*, Xin Yang*, Ke Xu, Shaozhe Chen, Qiang Zhang, Ry

Steve Wong 177 Dec 01, 2022
U-Net for GBM

My Final Year Project(FYP) In National University of Singapore(NUS) You need Pytorch(stable 1.9.1) Both cuda version and cpu version are OK File Str

PinkR1ver 1 Oct 27, 2021
Fast SHAP value computation for interpreting tree-based models

FastTreeSHAP FastTreeSHAP package is built based on the paper Fast TreeSHAP: Accelerating SHAP Value Computation for Trees published in NeurIPS 2021 X

LinkedIn 369 Jan 04, 2023
Official Pytorch implementation of paper "Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Generated Images"

Reverse_Engineering_GMs Official Pytorch implementation of paper "Reverse Engineering of Generative Models: Inferring Model Hyperparameters from Gener

100 Dec 18, 2022
Object Tracking and Detection Using OpenCV

Object tracking is one such application of computer vision where an object is detected in a video, otherwise interpreted as a set of frames, and the object’s trajectory is estimated. For instance, yo

Happy N. Monday 4 Aug 21, 2022
A torch.Tensor-like DataFrame library supporting multiple execution runtimes and Arrow as a common memory format

TorchArrow (Warning: Unstable Prototype) This is a prototype library currently under heavy development. It does not currently have stable releases, an

Facebook Research 536 Jan 06, 2023
NAACL2021 - COIL Contextualized Lexical Retriever

COIL Repo for our NAACL paper, COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. The code covers learning

Luyu Gao 108 Dec 31, 2022
Reproduce partial features of DeePMD-kit using PyTorch.

DeePMD-kit on PyTorch For better understand DeePMD-kit, we implement its partial features using PyTorch and expose interface consuing descriptors. Tec

Shaochen Shi 8 Dec 17, 2022
Lua-parser-lark - An out-of-box Lua parser written in Lark

An out-of-box Lua parser written in Lark Such parser handles a relaxed version o

Taine Zhao 2 Jul 19, 2022
Methods to get the probability of a changepoint in a time series.

Bayesian Changepoint Detection Methods to get the probability of a changepoint in a time series. Both online and offline methods are available. Read t

Johannes Kulick 554 Dec 30, 2022
Unifying Global-Local Representations in Salient Object Detection with Transformer

GLSTR (Global-Local Saliency Transformer) This is the official implementation of paper "Unifying Global-Local Representations in Salient Object Detect

11 Aug 24, 2022
TrTr: Visual Tracking with Transformer

TrTr: Visual Tracking with Transformer We propose a novel tracker network based on a powerful attention mechanism called Transformer encoder-decoder a

趙 漠居(Zhao, Moju) 66 Dec 27, 2022