Datamol is a python library to work with molecules

Overview

Molecular Manipulation Made Easy



Datamol is a Python library for working with molecules. It is a thin layer built on top of RDKit and aims to be as lightweight as possible.

  • 🐍 Simple pythonic API
  • ⚗️ RDKit first: all you manipulate are rdkit.Chem.Mol objects.
  • Manipulating molecules often relies on many options; Datamol provides good defaults by design.
  • 🧠 Performance matters: built-in efficient parallelization when possible with optional progress bar.
  • 🕹️ Modern IO: out-of-the-box support for remote paths using fsspec to read and write multiple formats (sdf, xlsx, csv, etc).

Try Online

Visit Binder and try Datamol online.

Documentation

Visit https://doc.datamol.io.

Installation

Use mamba (or conda) with the conda-forge channel:

mamba install -c conda-forge datamol

Quick API Tour

import datamol as dm

# Common functions
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O", sanitize=True)
fp = dm.to_fp(mol)
selfies = dm.to_selfies(mol)
inchi = dm.to_inchi(mol)

# Standardize and sanitize
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O")
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)

# Dataframe manipulation
df = dm.data.freesolv()
mols = dm.from_df(df)

# 2D viz
legends = [dm.to_smiles(mol) for mol in mols[:10]]
dm.viz.to_image(mols[:10], legends=legends)

# Generate conformers
smiles = "O=C(C)Oc1ccccc1C(=O)O"
mol = dm.to_mol(smiles)
mol_with_conformers = dm.conformers.generate(mol)

# 3D viz (using nglview)
dm.viz.conformers(mol_with_conformers, n_confs=10)

# Compute SASA from conformers
sasa = dm.conformers.sasa(mol_with_conformers)

# Easy IO
mols = dm.read_sdf("s3://my-awesome-data-lake/smiles.sdf", as_df=False)
dm.to_sdf(mols, "gs://data-bucket/smiles.sdf")

Compatibilities

Version compatibility is an essential topic for production software stacks. We take care to document compatibility between datamol, Python and RDKit.

datamol   python         rdkit
0.3       >=3.7,<=3.9    >=2020.09,<=2021.03


Changelogs

See the latest changelogs at CHANGELOG.rst.

License

Under the Apache-2.0 license. See LICENSE.

Authors

See AUTHORS.rst.

Comments
  • change force field to MMFF94s without electrostatic component

    Proposed change of the conformer force field from UFF to MMFF94s_noEstat, inspired by OpenEye's Omega force field. In addition to the force field change, this adds two filters (ewindow, eratio) so that high-energy conformers can be filtered out.
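
The two filters described above can be sketched in plain Python. This is only an illustration under stated assumptions, not datamol's implementation: the helper name filter_conformers_by_energy is hypothetical, ewindow is assumed to be a maximum energy above the lowest-energy conformer, and eratio is assumed to be a maximum ratio of that energy difference to the number of rotatable bonds.

```python
from typing import List


def filter_conformers_by_energy(
    energies: List[float],
    n_rotatable_bonds: int,
    ewindow: float = float("inf"),
    eratio: float = float("inf"),
) -> List[int]:
    """Return the indices of the conformers to keep, sorted by increasing energy.

    Hypothetical helper: the semantics of `ewindow` and `eratio` are assumptions
    based on the description above, not on datamol's source code.
    """
    if not energies:
        return []

    e_min = min(energies)
    kept = []
    for idx in sorted(range(len(energies)), key=lambda i: energies[i]):
        delta = energies[idx] - e_min
        # Absolute filter: reject conformers too far above the minimum.
        if delta > ewindow:
            continue
        # Relative filter: skipped when the molecule has no rotatable bond.
        if n_rotatable_bonds > 0 and delta / n_rotatable_bonds > eratio:
            continue
        kept.append(idx)
    return kept


energies = [12.0, 10.0, 25.0, 14.5]
kept_window = filter_conformers_by_energy(energies, n_rotatable_bonds=3, ewindow=5.0)
kept_ratio = filter_conformers_by_energy(energies, n_rotatable_bonds=3, eratio=1.0)
print(kept_window)  # [1, 0, 3]
print(kept_ratio)   # [1, 0]
```

With both defaults left at infinity, every conformer is kept, so a filter of this shape can be added without changing existing behaviour.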

    opened by matudor 16
  • [ERROR] Runtime.ImportModuleError: Unable to import module 'app': No module named 'sascorer' Traceback (most recent call last):


    Posting so it's logged somewhere.

    Is this a new dependency for datamol? Disclaimer: I had not updated datamol in my project (lambdomics) for a couple of months (I was at 0.5 before upgrading). sascorer seems to have appeared in commit 8576d4b with the descriptors module, and this module is imported by default when using import datamol as dm.

    sascorer seems to be a module coming from RDKit but I'm not sure. Is it just a matter of making sure to import RDKit before datamol if you use both in your project?

    bug question 
    opened by MichelML 14
  • Added graph matching and template re-ordering

    Is this relevant to datamol? If so, I'll improve the PR and make some tests.

    Functions to re-order a molecule from a template ordering, and to match molecules with different orderings.

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
    • [x] Update the API documentation if a new function is added or an existing one is deleted.
    • [x] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    opened by DomInvivo 8
  • better sanitize handling in read_sdf

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
    • [x] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    Calling dm.sanitize_mol deletes the conformers, which should not happen by default since you often want the 3D structure when you read an SDF file.

    (bug spotted while running unit tests on berth with a recent version of datamol)

    opened by hadim 7
  • Mc edits

    Made small changes to tutorials and documentation.

    Checklist:

    • [x] Removed repeated lines in docs/tutorials/The_Basics.ipynb
    • [x] Removed comments around a couple lines in docs/tutorials/Preprocessing_Molecules.ipynb

    Other questions

    • [x] Fragment and scaffold tutorial does not seem to load correctly. Is it still a wip?
    • [x] In the API section, datamol.actions contains this .. warning:: This is computationally expensive. Should the formatting around 'warning' be changed?
    • [x] Same idea as above in datamol.fragments, but with ..note ::

    opened by craigmichaelm 7
  • `dm.read_smi()` copies a file to an already existing location

    When I try to load a remote .smi file using Datamol, I run into the following exception:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    (...)
    
    File ~/mambaforge/envs/mood/lib/python3.10/site-packages/datamol/io.py:320, in read_smi(urlpath)
        318 if not fsspec.utils.can_be_local(str(urlpath)):
        319     active_path = pathlib.Path(tempfile.mkstemp()[1])
    --> 320     dm.utils.fs.copy_file(urlpath, active_path)
        322 # Read the molecules
        323 supplier = rdmolfiles.SmilesMolSupplier(str(active_path), titleLine=0)
    
    File ~/mambaforge/envs/mood/lib/python3.10/site-packages/datamol/utils/fs.py:250, in copy_file(source, destination, chunk_size, force, progress, leave_progress)
        247     raise ValueError(f"The file being copied does not exist or is not a file: {source}")
        249 if not force and is_file(destination_file):  # type: ignore
    --> 250     raise ValueError(f"The destination file to copy already exists: {destination}")
        252 with source_file as source_stream:
        253     with destination_file as destination_stream:
    
    ValueError: The destination file to copy already exists: /tmp/tmpkzrm7u58
    

    It seems this function is not up to date with recent changes in the dm.utils.fs module. I think active_path should instead point to a file that does not exist yet.
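
The behaviour behind the traceback can be reproduced with the standard library alone. The sketch below does not call datamol. It only shows that tempfile.mkstemp() creates the file on disk, so a copy that refuses to overwrite will always fail, and that a path inside a fresh temporary directory does not exist until something is written to it. The file name remote_copy.smi is a placeholder chosen for illustration.

```python
import os
import pathlib
import tempfile

# tempfile.mkstemp() creates the file on disk and returns (file descriptor, path).
fd, existing_path = tempfile.mkstemp()
os.close(fd)  # close the descriptor so the file can be removed cleanly
mkstemp_created_file = os.path.isfile(existing_path)
os.remove(existing_path)

# One possible alternative: create a temporary directory and build a path
# inside it. The path does not exist until the copy writes to it.
with tempfile.TemporaryDirectory() as tmp_dir:
    active_path = pathlib.Path(tmp_dir) / "remote_copy.smi"  # placeholder name
    path_exists_before_copy = active_path.exists()

    # Stand-in for the copy step: a "do not overwrite" check would pass here.
    active_path.write_text("CCO ethanol\n")
    path_exists_after_copy = active_path.exists()

print(mkstemp_created_file, path_exists_before_copy, path_exists_after_copy)
```

Another option suggested by the copy_file signature in the traceback is to pass force=True for this internal copy, since the temporary file is known to be empty.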

    bug 
    opened by cwognum 6
  • New align + new descriptors + some cleaning

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
    • [ ] Update the API documentation if a new function is added or an existing one is deleted.
    • [x] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    I still need to add the new align code.

    opened by hadim 6
  • expose clean_it arg

    Exposing this arg can prevent errors for certain problematic molecules when enumerating stereochemistry.

    E.g.

    sm = "N=1C(NC2CC2)=C3C(=NC1)N(/C=C/C=4C=C(C=CC4C)C(=O)NC=5C=C(C=C(C5)N6CCN(CC6)C)C(F)(F)F)C=N3"
    mol = datamol.to_mol(sm)
    
    datamol.enumerate_stereoisomers(mol) # Fails
    datamol.enumerate_stereoisomers(mol, clean_it=False) # Succeeds
    
    opened by jdhorwood 6
  • Use native batch_size from joblib

    Fix #58 by replacing Sequence by Iterable.
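
The difference between the two abstract types can be shown with the standard library. This is a guess at the kind of input that triggers such a bug, not a description of issue #58 itself: a generator is an Iterable but not a Sequence, so a check against Sequence rejects it. The function square_all is a hypothetical example.

```python
from collections.abc import Iterable, Sequence


def square_all(items: Iterable) -> list:
    """Hypothetical worker that only needs to iterate once over its input."""
    return [x * x for x in items]


gen = (i for i in range(4))

# A generator supports iteration but neither len() nor indexing.
is_sequence = isinstance(gen, Sequence)
is_iterable = isinstance(gen, Iterable)

result = square_all(gen)
print(is_sequence, is_iterable, result)  # False True [0, 1, 4, 9]
```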

    I had the same bug a few hours ago. @DomInvivo

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
    • [ ] Update the API documentation if a new function is added or an existing one is deleted.
    • [x] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    opened by maclandrol 6
  • add sanitize arg in read_sdf

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (not needed).
    • [ ] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    opened by Ishan-Kumar2 6
  • More options for matching with ambiguity

    Added more options for allow_ambiguous_match, passed as strings. Notably, the option "best" finds the match with the fewest errors in terms of atomic number, bond type, and bond stereo.

    Checklist:

    • [x] Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
    • [x] Update the API documentation if a new function is added or an existing one is deleted.
    • [x] Added a news entry.
      • copy news/TEMPLATE.rst to news/my-feature-or-branch.rst and edit it.

    opened by DomInvivo 5
  • Move the fs module to its dedicated section in the docs

    This is a very useful module; keeping it under the utils section undermines its visibility, imho.

    fsspec ftw

    ref: https://doc.datamol.io/stable/api/datamol.utils.html#datamol.utils.fs

    documentation 
    opened by MichelML 1
  • Add dm.read_records in io module

    This would simplify things when querying a REST API whose JSON response contains a list of molecules. Quick REST API examples:

    • Molport
    • Chemspace
    • Mcule
    • CDD Vault
    • Dotmatics
    • PubChem
    • Etc...
    enhancement low-priority 
    opened by MichelML 7
  • rdkit.Chem.GetSubstructMatch called in dm.scaffold.fuzzy_scaffolding takes rdkit mol as input, not str

    According to the datamol documentation and code, the enforce_subs parameter of dm.scaffold.fuzzy_scaffolding should be a list of str (presumably SMILES or SMARTS). However, passing in a list of str gives the following rdkit argument error:

    [image: screenshot of the RDKit argument error]

    This is because the dm.scaffold.fuzzy_scaffolding function passes the enforce_subs parameter to rdkit.Chem.GetSubstructMatch, which should take as input an rdkit molecule object, not a string.

    ~~Additionally, upon loading the pattern SMILES as molecule object and passing that into the enforce_subs parameter, the function runs without error but dm.scaffold.fuzzy_scaffolding will miss the loaded pattern:~~

    Edit: Image removed. This was a bad example I falsely thought wasn't capturing an enforce_subs r-group. Will find a better example.

    bug documentation 
    opened by colinrsmall 3
  • `dm.align.template_align` modifies the global RDKit behaviour

    See https://github.com/datamol-org/datamol/blob/013d93012abb7fb309a382d8b3eedaca4c2f4425/datamol/align.py#L107

    We should not modify the global behaviour. The best option is probably to restore the original value if it has been modified, although this is not ideal when multithreading.

    bug low-priority 
    opened by hadim 2
  • Refactor `dm.scaffold.fuzzy_scaffolding`

    dm.scaffold.fuzzy_scaffolding is quite a powerful function, but its output is often hard to understand and to process in downstream tasks.

    We could keep backward compatibility by keeping dm.scaffold.fuzzy_scaffolding and propose an alternative function that does the same kind of processing under the hood but returns a data structure that is more intuitive and easier to use (a dataframe or a list of dataframes?).

    enhancement low-priority 
    opened by hadim 0
Releases(0.8.8)
  • 0.8.8(Dec 15, 2022)

  • 0.8.7(Nov 29, 2022)

    Added:

    • Add multiple utilities to work with mapped SMILES with hydrogens.
    • Add dm.clear_atom_props() to remove atom's properties.
    • Add dm.clear_atom_map_number() to remove the atom map number property.
    • Add dm.get_atom_positions() to retrieve the atomic positions of a conformer of a molecule.
    • Add dm.set_atom_positions() to add a new conformer to a molecule given a list of atomic positions.

    Changed:

    • Add new arguments to dm.to_mol: allow_cxsmiles, parse_name, remove_hs and strict_cxsmiles. Refer to the docstring for the details.
    • Set copy to True by default in dm.atom_indices_to_mol().
    • Allow specifying the property keys to clear in dm.clear_mol_props(). If not set, the original default behaviour is to clear everything.

    Authors:

    • Hadrien Mary
  • 0.8.6(Nov 28, 2022)

  • 0.8.5(Nov 28, 2022)

    Added:

    • Support for max_num_mols in dm.read_sdf(). Useful when files are large and debugging code.
    • Support for returning the invalid molecules in dm.read_sdf. Useful when we need to know which one failed.
    • Support for more compression formats when reading SDF files using fsspec.open(..., compression="infer").
    • Add CODEOWNERS file.
    • Add dm.descriptors.n_spiro_atoms and dm.descriptors.n_stereo_centers_unspecified.

    Changed:

    • Overload output types for dm.read_sdf and dm.data.*.
    • Reduce tests duration (especially in CI).

    Authors:

    • DomInvivo
    • Hadrien Mary
  • 0.8.4(Nov 11, 2022)

  • 0.8.3(Nov 5, 2022)

  • 0.8.2(Oct 31, 2022)

  • 0.8.1(Oct 28, 2022)

  • 0.8.0(Oct 28, 2022)

    Added:

    • dm.Atom and dm.Bond types.
    • Add RDKit as a pypi dep.
    • Add datamol.hash_mol() based on rdkit.Chem.RegistrationHash.

    Changed:

    • RDKit 2022.09: use Draw.shouldKekulize instead of Draw._okToKekulizeMol.
    • RDKit 2022.09: don't use dm.convert._ChangeMoleculeRendering for RDKit >=2022.09.

    Authors:

    • Hadrien Mary
  • 0.7.18(Oct 18, 2022)

  • 0.7.17(Oct 14, 2022)

  • 0.7.16(Oct 12, 2022)

    Changed:

    • Bump upstream GH actions versions.
    • dm.fs.copy_dir now uses the internal fsspec copy when the two source and destination fs are the same. It makes the copy much faster.

    Fixed:

    • Use os.PathLike to recognize a broader range of string-based path inputs in the dm.fs module. It prevents file objects such as py._path.local.LocalPath not being recognized as path.
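
The idea behind this fix can be illustrated with the standard library: any object implementing __fspath__ is recognized by os.PathLike, and os.fspath() converts it to a plain string. The CustomPath class and the two helper functions are hypothetical and only sketch the check; they are not datamol's code.

```python
import os
import pathlib


class CustomPath:
    """Hypothetical third-party path object implementing the __fspath__ protocol."""

    def __init__(self, path: str):
        self._path = path

    def __fspath__(self) -> str:
        return self._path


def is_path_input(value) -> bool:
    """Accept plain strings and anything implementing os.PathLike."""
    return isinstance(value, (str, os.PathLike))


def to_path_string(value) -> str:
    """Normalize a supported path input to a plain string."""
    return os.fspath(value)


print(is_path_input("mols.sdf"))                # True
print(is_path_input(pathlib.Path("mols.sdf")))  # True
print(is_path_input(CustomPath("mols.sdf")))    # True
print(is_path_input(42))                        # False
```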

    Authors:

    • Hadrien Mary
  • 0.7.15(Oct 11, 2022)

  • 0.7.14(Oct 3, 2022)

    Added:

    • Add with_atom_indices to dm.to_smiles. If enabled, atom indices will be added to the SMILES.

    Changed:

    • Changed the default for dm.fs.is_file() from True to False.
    • Refactor the API doc to break down all the submodules into individual docs. Thanks to @MichelML for the suggestion.
    • Re-enable pypi activity in rever.

    Fixed:

    • Minor typo in the documentation of dm.conformers.generate()

    Authors:

    • Cas
    • Hadrien Mary
    • Valence-JonnyHsu
  • 0.7.13(Sep 9, 2022)

  • 0.7.12(Sep 6, 2022)

  • 0.7.11(Sep 4, 2022)

    Added:

    • Add configurations for dev containers based on the micromamba Docker image. More information about dev containers at https://docs.github.com/en/codespaces/setting-up-your-project-for-codespaces/introduction-to-dev-containers.
    • Support for two additional force fields: MMFF94s with and without the electrostatic component.
    • Energies output along with the delta energy to the lowest-energy conformer.

    Changed:

    • API of dm.conformers.generate() to support a choice of force field. In addition, ewindow and eratio flags were added to reject high-energy conformers, either on an absolute scale or as a ratio to the number of rotatable bonds.
    • Revamped all the datamol tutorials and added new tutorials. Huge thanks to @Valence-jonnyhsu for leading the refactoring of the datamol tutorials.
    • Improve documentation for dm.standardize_mol()
    • Multiple docstring and typing improvements.
    • Embed the cdk2.sdf and solubility_*.sdf files within the datamol package to prevent issue with the RDKit config dir.
    • Enable strict mode on the documentation to prevent any issues and inconsistency with the types and docstrings of datamol.
    • Refactor micromamba CI to use latest and simplify it.

    Removed:

    • Remove unused and unmaintained dm.actions and dm.reactions module.
    • Remove copy args from add_hs and remove_hs (RDKit already returns copies).

    Fixed:

    • Error in ECFP fingerprints that computed FCFP instead of ECFP.

    Authors:

    • Emmanuel Noutahi
    • Hadrien Mary
    • Matt
  • 0.7.10(Jul 18, 2022)

    Added:

    • New possibilities for ambiguous matching of molecules in the function reorder_mol_from_template

    Changed:

    • Replaced allow_ambiguous_hs_only by the option "hs_only" for the ambiguous_match_mode parameter
    • ambiguous_match_mode is now a String, no longer a bool.

    Deprecated:

    • allow_ambiguous_hs_only is no longer supported; no deprecation warning is emitted since the feature is brand new.
    • Same for ambiguous_match_mode being a bool.

    Authors:

    • DomInvivo
    • Hadrien Mary
  • 0.7.9(Jun 17, 2022)

    Added:

    • datamol.graph.match_molecular_graphs, with unit-tests
    • datamol.graph.reorder_mol_from_template, with unit-tests

    Changed:

    • Typing in datamol.graph.py, changed rdkit.Chem.rdchem.Mol to dm.Mol

    Deprecated:

    • NOTHING

    Removed:

    • NOTHING

    Fixed:

    • NOTHING

    Security:

    • NOTHING

    Authors:

    • DomInvivo
    • Emmanuel Noutahi
  • 0.7.8(Jun 3, 2022)

  • 0.7.7(May 30, 2022)

  • 0.7.6(May 20, 2022)

  • 0.7.5(May 19, 2022)

  • 0.7.4(May 13, 2022)

  • 0.7.3(Apr 12, 2022)

  • 0.7.2(Mar 22, 2022)

  • 0.7.1(Mar 18, 2022)

    Added:

    • A new dm.align module with various functions to align a list of molecules. Use dm.align.template_align to align a molecule to a template and dm.align.auto_align_many to automatically partition and align a list of molecules.
    • New descriptors: formal_charge
    • New descriptors: refractivity
    • New descriptors: n_rigid_bonds
    • New descriptors: n_stereo_centers
    • New descriptors: n_charged_atoms
    • Add dm.clear_props to clear all the properties of a mol.
    • Add a new dataset in addition to freesolv based on RDKit CDK2 at dm.cdk2().
    • Add dm.strip_mol_to_core to remove all R groups from a molecule.
    • Add dm.UNSPECIFIED_BOND
    • dm.compute_ring_system to extract the ring systems from a molecule.

    Changed:

    • Improve typing.
    • Improve relative imports coverage.
    • Adapt dm.to_image to use the align module.

    Removed:

    • Remove a lot of # type: ignore as those can be error prone (hopefully the tests are here!)

    Authors:

    • Hadrien Mary
  • 0.7.0(Mar 11, 2022)

    Added:

    • Add dm.conformers.keep_conformers to keep only one or multiple conformers from a molecule.

    Changed:

    • Change the conformer generation arguments to use useRandomCoords=True by default.
    • Start using explicit Optional instead of implicit Optional for typing.
    • Start using relative imports instead of absolute ones.
    • When conformers are not minimized, sort them by energy (can be turned to False).

    Removed:

    • Remove fallback_to_random_coords argument from generate_conformers.

    Authors:

    • Hadrien Mary
  • 0.6.9(Mar 2, 2022)

    Added:

    • Support for selfies<2.0.0 in tests

    Changed:

    • Behaviour of all inchi functions to return None with a warning instead of silently returning an empty string
    • Order of str evaluation in the conversion functions. isinstance(str) is now evaluated before is None

    Fixed:

    • Bug in unique_id that made the evaluation fall back to 'd41d8cd98f00b204e9800998ecf8427e' on unsupported inputs. None is now returned instead.
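
The fallback value quoted above is the MD5 digest of an empty byte string, which suggests that an empty identifier was being hashed. The sketch below checks that fact and shows the "return None instead" behaviour. The function unique_id_sketch is hypothetical, and hashing an InChIKey with MD5 is an assumption about how such an identifier could be built, not a statement about datamol's code.

```python
import hashlib
from typing import Optional

# MD5 of an empty byte string: the constant mentioned in the changelog entry.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def unique_id_sketch(inchikey: Optional[str]) -> Optional[str]:
    """Hypothetical identifier: hash a key, or return None when there is none."""
    if not inchikey:
        # Without this guard, an empty string would hash to EMPTY_MD5.
        return None
    return hashlib.md5(inchikey.encode("utf-8")).hexdigest()


print(EMPTY_MD5)             # d41d8cd98f00b204e9800998ecf8427e
print(unique_id_sketch(""))  # None
```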

    Authors:

    • Emmanuel Noutahi
  • 0.6.8(Feb 27, 2022)
