Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It spares you the cost of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY")
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key stored in the TOOLCHEST_KEY environment variable. The key can be set from Python:

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY") # or a file path containing the key
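
The comment above says set_key accepts either the key itself or a path to a file containing it. A minimal sketch of how that kind of resolution could work is shown below. The helper names resolve_key and store_key are hypothetical and are not part of the Toolchest client; the sketch only assumes that the key ends up in the TOOLCHEST_KEY environment variable.

```python
import os


def resolve_key(key_or_path: str) -> str:
    """Return the key itself, or the file's contents if a file path is given.

    Hypothetical helper for illustration; not the client's implementation.
    """
    if os.path.isfile(key_or_path):
        with open(key_or_path) as key_file:
            return key_file.read().strip()
    return key_or_path


def store_key(key_or_path: str) -> None:
    """Store the resolved key in the TOOLCHEST_KEY environment variable."""
    os.environ["TOOLCHEST_KEY"] = resolve_key(key_or_path)
```

Keeping the key in a file and passing its path keeps the secret out of scripts that are checked into version control.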

Contact Toolchest if:

  • you need a key
  • you’ve forgotten your key
  • the key is producing authentication errors.

Documentation and a user guide are available at Read the Docs.

Comments
  • Enable paired reads for `kraken2`

    Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

    Adds/removes --paired to tool_args as necessary.

    opened by bcai2 3
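
The flag handling described in this pull request can be sketched roughly as follows. The function name sync_paired_flag is hypothetical, and the rule used here (exactly two input paths means paired reads) is an assumption drawn from the description, not the client's actual code.

```python
import shlex


def sync_paired_flag(tool_args: str, inputs) -> str:
    """Return tool_args with --paired present only when two inputs are given.

    Hypothetical helper; assumes two input paths mean paired reads.
    """
    paths = [inputs] if isinstance(inputs, str) else list(inputs)
    # Drop any existing --paired so the flag is never duplicated.
    args = [arg for arg in shlex.split(tool_args) if arg != "--paired"]
    if len(paths) == 2:
        args.append("--paired")
    return shlex.join(args)
```

With this sketch, a single input strips a stray --paired from the arguments, and a two-element list adds it when missing.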
  • v0.4.0

    • Add Poetry, remove Twine

    • Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

    Note: CircleCI will be failing because v0.4.0 already exists on test PyPI. That is to be expected, because I already bumped it to v0.4.0 when testing.

    opened by lebovic 3
  • S3 chaining

    Adds:

    • Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables
    • S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool
    • the ability to skip download of any tool's output, by setting output_path=None (set to None by default)
    opened by lebovic 2
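
The chaining pattern described above can be illustrated with a small stand-in. The Output fields mirror the names listed in the pull request, but run_tool, call_log, the bucket name, and the URLs are all hypothetical placeholders; nothing here calls the real Toolchest client or S3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Output:
    """Stand-in for the Output object, with the three fields named above."""
    s3_uri: str
    presigned_s3_url: str
    output_path: Optional[str] = None


# Records (tool name, inputs) for each call so the chaining is visible.
call_log: List[Tuple[str, str]] = []


def run_tool(name: str, inputs: str, output_path: Optional[str] = None) -> Output:
    """Hypothetical stand-in for a toolchest.tool() call; runs nothing."""
    call_log.append((name, inputs))
    return Output(
        s3_uri=f"s3://example-bucket/{name}/output",
        presigned_s3_url=f"https://example.invalid/{name}/output",
        output_path=output_path,  # None means the download is skipped
    )


# First tool: no output_path, so nothing is downloaded locally.
first = run_tool("tool_a", inputs="path/to/input.fastq")
# Second tool: consumes the first tool's S3 URI and downloads its own result.
second = run_tool("tool_b", inputs=first.s3_uri, output_path="path/to/final_output")
```

The intermediate result never touches local disk in this flow; only the final output is given a local path.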
  • Polish tool_arg handling, add more STAR args

    Adds:

    • More STAR args
    • Multiple levels of tool_arg handling (whitelist, dangerlist, blacklist)
    • An error on unknown or blacklisted args
    • Reduced complexity (validation and parallelization for now) if a dangerous argument is passed

    Requires:

    • https://github.com/trytoolchest/toolchest-worker-node/pull/24
    • https://github.com/trytoolchest/toolchest-api/pull/22

    This does not fix:

    • Bigger disk/memory/etc requirements for larger files where args trigger reduced complexity / no parallelization
    opened by lebovic 2
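
A rough sketch of the three-level argument check described above is shown below. The function name check_tool_args and the flags --example-danger and --example-blocked are hypothetical placeholders; only --quantMode is taken from this page. The real lists and behavior live in the client and may differ.

```python
import shlex

# Placeholder lists for illustration; not the client's real lists.
WHITELIST = {"--quantMode"}
DANGERLIST = {"--example-danger"}
BLACKLIST = {"--example-blocked"}


def check_tool_args(tool_args: str):
    """Return (tokens, reduced_complexity); raise on unknown or blacklisted flags.

    Tokens that do not start with "-" are treated as values of the previous
    flag. Negative numeric values would need extra handling in real code.
    """
    tokens = shlex.split(tool_args)
    reduced_complexity = False
    for token in tokens:
        if not token.startswith("-"):
            continue  # a value, not a flag
        if token in BLACKLIST:
            raise ValueError(f"blacklisted argument: {token}")
        if token in DANGERLIST:
            # Dangerous flags are allowed but switch off validation/parallelization.
            reduced_complexity = True
        elif token not in WHITELIST:
            raise ValueError(f"unknown argument: {token}")
    return tokens, reduced_complexity
```

Rejecting unknown flags by default means a new flag must be reviewed and placed in one of the lists before users can pass it.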
  • STAR whitelist options

    • Adds basic whitelist options for STAR.

    • Adds support for tags with a variable number of arguments. Adds the --quantMode tag for STAR.

    (This should be merged in after the kraken2 paired read commit.)

    opened by bcai2 2
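
Handling tags (flags) that take a variable number of arguments could look something like the sketch below. The function group_tool_args is hypothetical and only shows one way to group values under the flag that precedes them; it is not the client's parser.

```python
import shlex


def group_tool_args(tool_args: str) -> dict:
    """Group each flag with however many values follow it.

    Hypothetical helper: a token starting with "--" opens a new group, and
    every later token up to the next flag is collected as one of its values.
    """
    grouped = {}
    current = None
    for token in shlex.split(tool_args):
        if token.startswith("--"):
            current = token
            grouped.setdefault(current, [])
        elif current is None:
            raise ValueError(f"value {token!r} appears before any flag")
        else:
            grouped[current].append(token)
    return grouped
```

For example, "--quantMode TranscriptomeSAM GeneCounts --runThreadN 4" groups two values under --quantMode and one under --runThreadN.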
  • feat: centrifuge base

    • Adds the centrifuge tool.
    • Adds docs.
    • Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
    opened by bcai2 1
  • fix: upload/download tracker bugfixes

    • Refactors the tracking printed statements into a pythonic print call with string formatting.
    • Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)
    • Adds integration test dirs to .gitignore.
    opened by bcai2 1
  • fix: remove pysam due to multiple issues

    Pysam has caused multiple issues as a package, and STAR parallelization is not currently used, so this PR fully removes pysam as a dependency. Either a different library or custom SAM file merging code is planned for later, so the parallelization framework remains in the code for now.

    opened by jherr-dev 1
  • feat: add preliminary alphafold support

    Adds basic support for running AlphaFold via Toolchest. Code needs to be cleaned up and better documented. Currently limited to 1 input fasta.

    use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing is done. Integration will come with reduced dbs since full dbs take 45 minutes to an hour to run even on simple input.

    opened by jherr-dev 1
  • feat: support async execution

    Adds:

    • Support for async execution

    See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example of how to run Toolchest with async execution.

    opened by lebovic 1
  • fix: set default version number

    Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).

    Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?

    opened by bcai2 1
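
One common way to get this fallback behavior is sketched below using the standard library's importlib.metadata. The pull request does not say how the client reads its version, so this is an assumption, and get_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package_name: str, default: str = "0.0.0") -> str:
    """Return the installed package version, or a default when not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # Running from source: no installed distribution metadata to read.
        return default
```

Passing a different default, such as "dev", would address the open question above without changing the lookup logic.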
Releases (v0.11.3)

Owner

Toolchest