A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Overview

Cookiecutter Data Science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Project homepage

Requirements to use the cookiecutter template:


  • Python 2.7 or 3.5+
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip by or conda depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter

To start a new project, run:


cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science

asciicast

New version of Cookiecutter Data Science


Cookiecutter data science is moving to v2 soon, which will entail using the command ccds ... rather than cookiecutter .... The cookiecutter command will continue to work, and this version of the template will still be available. To use the legacy template, you will need to explicitly use -c v1 to select it. Please update any scripts/automation you have to append the -c v1 option (as above), which is available now.

The resulting directory structure


The directory structure of your new project looks like this:

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Contributing

We welcome contributions! See the docs for guidelines.

Installing development requirements


pip install -r requirements.txt

Running the tests


py.test tests
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage

Jon C Cline 0 Sep 05, 2021
A framework for feature exploration in Data Science

Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no

Steven IJ 1 Jan 03, 2022
artisan: visual scope for coffee roasters

Artisan Visual scope for coffee roasters WARNING: pre-release builds may not work. Use at your own risk. Summary Artisan is a software that helps coff

Artisan – Visual Scope for Coffee Roasters 705 Jan 05, 2023
A mathematica expression evaluator with PokemonTypes

A simple mathematical expression evaluator that uses Pokemon types to replace symbols.

Arnav Jindal 2 Nov 14, 2021
Read-only mirror of https://gitlab.gnome.org/GNOME/pybliographer

Pybliographer Pybliographer provides a framework for working with bibliographic databases. This software is licensed under the GPLv2. For more informa

GNOME Github Mirror 15 May 07, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Aesara

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica

Mathics is a general-purpose computer algebra system (CAS). It is an open-source alternative to Mathematica. It is free both as in "free beer" and as in "freedom".

Mathics 535 Jan 04, 2023
3D medical imaging reconstruction software

InVesalius InVesalius generates 3D medical imaging reconstructions based on a sequence of 2D DICOM files acquired with CT or MRI equipments. InVesaliu

443 Jan 01, 2023
OPEM (Open Source PEM Fuel Cell Simulation Tool)

Table of contents What is PEM? Overview Installation Usage Executable Library Telegram Bot Try OPEM in Your Browser! MATLAB Issues & Bug Reports Contr

ECSIM 133 Jan 04, 2023
ckan 3.6k Dec 27, 2022
collection of interesting Computer Science resources

collection of interesting Computer Science resources

Kirill Bobyrev 137 Dec 22, 2022
Algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos

Bioinformatics This is a repository of all the algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos Algorithm

16 Jun 30, 2022
Python Data Science Handbook: full text in Jupyter Notebooks

Python Data Science Handbook This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks. How to Use th

Jake Vanderplas 36.9k Dec 28, 2022
PsychoPy is an open-source package for creating experiments in behavioral science.

PsychoPy is an open-source package for creating experiments in behavioral science. It aims to provide a single package that is: precise enoug

PsychoPy 1.3k Dec 31, 2022
A simple computer program made with Python on the brachistochrone curve.

Brachistochrone-curve This is a simple computer program made with Python on the brachistochrone curve. I decided to write it after a physics lesson on

Diego Romeo 1 Dec 16, 2021
Wikidata scholarly profiles

Scholia is a python package and webapp for interaction with scholarly information in Wikidata. Webapp As a webapp, it currently runs from Wikimedia To

Finn Årup Nielsen 180 Dec 28, 2022
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage

6.4k Jan 02, 2023
Datamol is a python library to work with molecules

Datamol is a python library to work with molecules. It's a layer built on top of RDKit and aims to be as light as possible.

datamol 276 Dec 19, 2022
🍊 :bar_chart: :bulb: Orange: Interactive data analysis

Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program

Bioinformatics Laboratory 3.9k Jan 05, 2023
An open-source application for biological image analysis

CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure

CellProfiler 734 Jan 08, 2023