G-Research-Crypto-Competition
A project prepared for an ML exam. The dataset is taken from the Kaggle competition at https://www.kaggle.com/c/g-research-crypto-forecasting
This repository provides an example of using Snakemake to solve ML tasks.
Snakemake is a workflow automation system for creating reproducible and scalable pipelines. Pipelines are described in a human-readable, Python-based language. They can be scaled to server, cluster, network and cloud environments without changing the workflow definition. Snakemake workflows can also include a description of the required software, which is deployed automatically in any runtime environment.
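To make the idea of a rule-based pipeline concrete, here is a minimal plain-Python sketch of the dependency logic that a workflow tool automates: a step runs only when its output is missing or older than one of its inputs. This is not Snakemake code and not part of this repository; `needs_update`, `run_rule`, `uppercase` and the file names are hypothetical and chosen only for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def needs_update(inputs: Sequence[Path], output: Path) -> bool:
    """Return True if the output is missing or older than any input."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(p.stat().st_mtime > out_mtime for p in inputs)


def run_rule(
    inputs: Sequence[Path],
    output: Path,
    action: Callable[[Sequence[Path], Path], None],
) -> bool:
    """Run the action only when the output is stale; report whether it ran."""
    if not needs_update(inputs, output):
        return False
    action(inputs, output)
    return True


def uppercase(inputs: Sequence[Path], output: Path) -> None:
    """Toy processing step: concatenate the inputs in upper case."""
    output.write_text("".join(p.read_text().upper() for p in inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"          # hypothetical input file
        processed = Path(tmp) / "processed.txt"  # hypothetical output file
        raw.write_text("btc\n")

        print(run_rule([raw], processed, uppercase))  # output missing, step runs
        print(run_rule([raw], processed, uppercase))  # output up to date, step is skipped

        # Simulate an edited input by moving its timestamp past the output's.
        newer = processed.stat().st_mtime + 10
        os.utime(raw, (newer, newer))
        print(run_rule([raw], processed, uppercase))  # input changed, step runs again
```

In a real workflow the same relationship is declared per rule (inputs, outputs, and the command or script that produces the outputs), and the tool works out which steps to run and in what order.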
Getting Started:
- Create the virtual environment for development:
$ conda env create -f devenv.yaml
- Activate the virtual environment:
$ conda activate G-Research-Crypto-Competition
- Run the Snakemake pipeline with 8 cores:
$ snakemake --cores 8
Project Organization
├── README.rst <- The top-level readme for developers.
│
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- Technical documentation.
│
├── models <- Trained and serialized models, model predictions,
│ or model summaries.
│
├── notebooks <- Jupyter notebooks. Naming convention is a number
│ (for ordering), the creator's initials, and a
│ short `-` delimited description, e.g.
│ `001-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other
│ explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in
│ reporting.
│
├── devenv.yaml <- The environment file for reproducing the analysis
│ environment, e.g. generated with
│ `conda env export --from-history > devenv.yaml`
│
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python package.
│ │
│ ├── data <- Scripts to download or generate data.
│ │
│ ├── features <- Scripts to turn raw data into features for
│ │ modeling.
│ │
│ ├── models <- Scripts to train models and then use trained
│ │ models to make predictions.
│ │
│ └── reports <- Scripts to create exploratory reports and
│ results-oriented visualizations.
│
├── workflow <- Snakemake workflow files.
│ ├── envs <- Conda environments for snakemake rules.
│ │ └── default.yaml
│ │
│ ├── rules <- Rules as modules.
│ │ └── clean.smk
│ │
│ ├── scripts <- Helper functions used by the Snakemake workflow.
│ │
│ ├── config.yaml <- Workflow parameters in YAML format.
│ │
│ └── Snakefile <- Entry point of the workflow (discovered
│ automatically when snakemake is run from the
│ root of the structure above).
│
└── .env.example <- Example file for environment variables, such as
MinIO access or PostgreSQL credentials. Create
an `.env` file based on it.
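
Once an `.env` file has been created from `.env.example`, its values have to reach the process environment. The following is a minimal sketch using only the standard library, assuming the file contains simple `KEY=VALUE` lines. The helper names `parse_env` and `load_env`, and the variable names in the usage example, are hypothetical; the actual keys in `.env.example` are not shown here.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = ".env") -> Dict[str, str]:
    """Copy variables from the file into os.environ, keeping existing values."""
    values = parse_env(Path(path).read_text())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    sample = "# storage\nMINIO_ACCESS_KEY=abc\nPOSTGRES_PASSWORD='s3cret'\n"
    print(parse_env(sample))
```

Using `os.environ.setdefault` means variables already set in the shell take precedence over the file, which keeps local overrides easy. This sketch does not handle multi-line values or variable expansion.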