This is an example of a reproducible modelling project

Last update: Oct 26, 2021

What are we doing?

This example was created for the 2021 fall lecture series of Stanford's Center for Open and REproducible Science (CORES).

A video of the talk can be found at: https://youtu.be/JAQot6b1Cng

The goal of this example analysis is to explore how varying individual hyper-parameters of the training of a simple classification model affects its performance on scikit-learn's handwritten digit dataset.

Specifically, we will study the effect of varying the learning rate, regularisation strength, number of gradient descent steps, and random shuffling of the data on the 3-fold cross-validation performance of scikit-learn's linear support vector machine classifier.
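A single evaluation of this kind can be sketched as follows. This is a minimal sketch, not the project's implementation (which lives in src/hyper/evaluation.py and may differ). It assumes the linear support vector machine is fit with stochastic gradient descent through scikit-learn's `SGDClassifier` with hinge loss, since that estimator exposes a learning rate and an iteration budget. The function name `cross_validate_svm` and all default values are hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold, cross_val_score


def cross_validate_svm(learning_rate=0.01, alpha=1e-4, max_iter=100, seed=0):
    """Return the 3-fold cross-validation accuracies of a linear SVM on the digits data.

    All argument names and defaults are illustrative assumptions.
    """
    X, y = load_digits(return_X_y=True)
    X = X / 16.0  # pixel intensities range from 0 to 16

    clf = SGDClassifier(
        loss="hinge",              # hinge loss gives a linear SVM
        learning_rate="constant",  # use a fixed step size ...
        eta0=learning_rate,        # ... given by the learning rate
        alpha=alpha,               # regularisation strength
        max_iter=max_iter,         # cap on passes over the training data
        random_state=seed,
    )
    # The seed also controls how the data are shuffled into folds.
    folds = KFold(n_splits=3, shuffle=True, random_state=seed)
    return cross_val_score(clf, X, y, cv=folds)


scores = cross_validate_svm()
print(scores.mean())
```

Note that `max_iter` counts passes over the training data rather than individual gradient updates, so it is only a stand-in for the number of gradient descent steps.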

Importantly, each hyper-parameter is varied separately while all other hyper-parameters are set to default values (for details, see scripts/evaluate_hyper_params_effect.py).
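The one-at-a-time design described above can be sketched in plain Python. The names `DEFAULTS`, `CANDIDATES`, and `one_at_a_time_grid`, as well as every value below, are hypothetical and chosen only for illustration; the actual grid is defined in the project's scripts and src/hyper/grid.py.

```python
# Hypothetical defaults and candidate values, for illustration only.
DEFAULTS = {"learning_rate": 0.01, "alpha": 1e-4, "max_iter": 100, "seed": 0}
CANDIDATES = {
    "learning_rate": [0.001, 0.01, 0.1],
    "alpha": [1e-5, 1e-4, 1e-3],
    "max_iter": [10, 100, 1000],
    "seed": [0, 1, 2],
}


def one_at_a_time_grid(defaults, candidates):
    """Yield (varied_name, settings) pairs in which at most one
    hyper-parameter differs from its default value."""
    for name, values in candidates.items():
        for value in values:
            settings = dict(defaults)  # copy so the defaults stay untouched
            settings[name] = value
            yield name, settings


grid = list(one_at_a_time_grid(DEFAULTS, CANDIDATES))
for name, settings in grid:
    print(name, settings)
```

With four hyper-parameters and three candidate values each, this produces 12 settings, compared with 81 for a full Cartesian grid. The trade-off is that interactions between hyper-parameters are not explored.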

Project organization

├── LICENSE            <- MIT License
├── Makefile           <- Makefile with targets to 'load', 'evaluate', and 'plot' ('make all' runs all three analysis steps)
├── poetry.lock        <- Details of used package versions
├── pyproject.toml     <- Lists all dependencies
├── README.md          <- This README file.
├── docs/              
|    └──               <- Slides of the practical tutorial
├── data/
|    └──               <- A copy of the handwritten digit dataset provided by scikit-learn
|
├── results/
|    ├── estimates/
|    │    └──          <- Generated estimates of classifier performance
|    └── figures/
|         └──          <- Generated figures
|
├── scripts/
|    ├── load_data.py                       <- Downloads the dataset to specified 'data-path'
|    ├── evaluate_hyper_params_effect.py    <- Runs cross-validated hyper-parameter evaluation
|    ├── plot_hyper_params_effect.py        <- Summarizes results of evaluation in a figure
|    └── run_analysis.sh                    <- Runs all analysis steps
|
└── src/
    ├── hyper/
    │    ├── __init__.py                    <- Makes 'hyper' a Python package
    │    ├── grid.py                        <- Functionality to sample hyper-parameter grid
    │    ├── evaluation.py                  <- Functionality to evaluate classifier performance, given hyper-parameters
    │    └── plotting.py                    <- Functionality to visualize results
    └── setup.py                            <- Makes 'hyper' pip-installable (pip install -e .)

Data description

We use the handwritten digits dataset provided by scikit-learn. For details on this dataset, see scikit-learn's documentation:

https://scikit-learn.org/stable/datasets/toy_dataset.html#digits-dataset
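For orientation, the dataset can also be inspected directly from scikit-learn. This is a small sketch that loads the copy bundled with scikit-learn rather than the copy that scripts/load_data.py writes to data/, whose file format is not assumed here.

```python
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

print(X.shape)              # one row per image, one column per pixel
print(digits.images.shape)  # the same images as 2-D pixel arrays
print(sorted(set(y.tolist())))  # the digit classes
```

Each sample is an 8x8 grayscale image flattened to 64 features, and the targets are the ten digit classes 0 to 9.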

Installation

This project is written for Python 3.9.5 (we recommend pyenv for Python version management).

All software dependencies of this project are managed with Python Poetry. The dependencies are listed in pyproject.toml, and the exact package versions used are recorded in poetry.lock.

To clone this repository to your local machine, run:

git clone https://github.com/athms/reproducible-modelling

To install all dependencies with poetry, run:

cd reproducible-modelling/
poetry install

To reproduce our analyses, you additionally need to install our custom Python module (src/hyper) in your poetry environment:

cd src/
poetry run pip install -e .

Reproducing our analysis

Our analysis can be reproduced either by running scripts/run_analysis.sh:

cd scripts
poetry run bash run_analysis.sh

...or by using make:

poetry run make <ANALYSIS TARGET>

We provide the following targets for make:

| Analysis target | Description |
| --- | --- |
| all | Runs the entire analysis pipeline |
| load | Downloads scikit-learn's handwritten digit dataset |
| evaluate | Runs our cross-validated hyper-parameter evaluation |
| plot | Creates our results figure |

This README file is strongly inspired by the Cookiecutter Data Science structure.


Owner

Armin Thomas
