A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Overview

Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Project homepage

Requirements to use the cookiecutter template:


  • Python 2.7 or 3.5+
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip by or conda depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

asciicast

New version of Cookiecutter Data Science


Cookiecutter data science is moving to v2 soon, which will entail using the command ccds ... rather than cookiecutter .... The cookiecutter command will continue to work, and this version of the template will still be available. To use the legacy template, you will need to explicitly use -c v1 to select it. Please update any scripts/automation you have to append the -c v1 option (as above), which is available now.

The resulting directory structure


The directory structure of your new project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Contributing

We welcome contributions! See the docs for guidelines.

Installing development requirements


pip install -r requirements.txt

Running the tests


py.test tests
artisan: visual scope for coffee roasters

Artisan Visual scope for coffee roasters WARNING: pre-release builds may not work. Use at your own risk. Summary Artisan is a software that helps coff

Artisan – Visual Scope for Coffee Roasters 705 Jan 05, 2023
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm

267 Jan 01, 2023
Efficient Python Tricks and Tools for Data Scientists

Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.

Khuyen Tran 944 Dec 28, 2022
An interactive explorer for single-cell transcriptomics data

an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra

Chan Zuckerberg Initiative 424 Dec 15, 2022
OPEM (Open Source PEM Fuel Cell Simulation Tool)

Table of contents What is PEM? Overview Installation Usage Executable Library Telegram Bot Try OPEM in Your Browser! MATLAB Issues & Bug Reports Contr

ECSIM 133 Jan 04, 2023
3D visualization of scientific data in Python

Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav

Enthought, Inc. 1.1k Jan 06, 2023
Incubator for useful bioinformatics code, primarily in Python and R

Collection of useful code related to biological analysis. Much of this is discussed with examples at Blue collar bioinformatics. All code, images and

Brad Chapman 560 Dec 24, 2022
SeqLike - flexible biological sequence objects in Python

SeqLike - flexible biological sequence objects in Python Introduction A single object API that makes working with biological sequences in Python more

186 Dec 23, 2022
3D medical imaging reconstruction software

InVesalius InVesalius generates 3D medical imaging reconstructions based on a sequence of 2D DICOM files acquired with CT or MRI equipments. InVesaliu

443 Jan 01, 2023
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Dec 31, 2022
PennyLane is a cross-platform Python library for differentiable programming of quantum computers.

PennyLane is a cross-platform Python library for differentiable programming of quantum computers. Train a quantum computer the same way as a neural network.

PennyLaneAI 1.6k Jan 04, 2023
Animation engine for explanatory math videos

Manim is an engine for precise programatic animations, designed for creating explanatory math videos. Note, there are two versions of manim. This repo

Grant Sanderson 48.9k Jan 03, 2023
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.

Spack Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, a

Spack 3.1k Dec 31, 2022
collection of interesting Computer Science resources

collection of interesting Computer Science resources

Kirill Bobyrev 137 Dec 22, 2022
CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation

CONCEPT (COsmological N-body CodE in PyThon) is a free and open-source simulation code for cosmological structure formation. The code should run on any Linux system, from massively parallel computer

Jeppe Dakin 62 Dec 08, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
An open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure

CellProfiler 734 Jan 08, 2023
Python Data Science Handbook: full text in Jupyter Notebooks

Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th

Jake Vanderplas 36.9k Dec 28, 2022
A computer algebra system written in pure Python

SymPy See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's part

SymPy 9.9k Jan 08, 2023
A simple computer program made with Python on the brachistochrone curve.

Brachistochrone-curve This is a simple computer program made with Python on the brachistochrone curve. I decided to write it after a physics lesson on

Diego Romeo 1 Dec 16, 2021