Write reproducible code for getting and processing ChEMBL

Overview

chembl_downloader

Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download it and use it automatically.

Installation

$ pip install chembl-downloader

Usage

Download A Specific Version

import chembl_downloader

path = chembl_downloader.download(version='28')

After the archive has been downloaded and extracted once, it is not downloaded again. It is stored automatically using pystow in the ~/.data/chembl directory.

We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is a paid feature.
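The download-once behaviour described above can be pictured with a small standard-library sketch. This is not the code chembl_downloader or pystow actually runs; the ensure() helper, the fake_fetch() stand-in, and the file layout are all hypothetical and only illustrate the idea of skipping the download when the file is already on disk.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure(path: Path, fetch: Callable[[Path], None]) -> Path:
    """Call ``fetch`` only when ``path`` does not exist yet (hypothetical helper)."""
    if not path.is_file():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path


calls: List[Path] = []


def fake_fetch(path: Path) -> None:
    """Stand-in for a real download; it only writes a placeholder file."""
    calls.append(path)
    path.write_text("placeholder")


with tempfile.TemporaryDirectory() as directory:
    # Made-up layout used only for this illustration.
    target = Path(directory) / "chembl" / "28" / "chembl_28.db"
    first = ensure(target, fake_fetch)
    second = ensure(target, fake_fetch)  # the file exists, so nothing is fetched
    fetch_count = len(calls)
```

The second call returns the same path without invoking the fetcher, which is the property that makes repeated runs of a script cheap.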

Download the Latest Version

First, you'll have to install bioversions with pip install bioversions, whose job it is to look up the latest version of many databases. Then, you can modify the previous code slightly by omitting the version keyword argument:

import chembl_downloader

path = chembl_downloader.download()

The version keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()), but will be omitted below for brevity.

Automate Connection

Inside the archive is a single SQLite database file. Normally, people manually untar this archive and then do something with the resulting file. Don't do this: it's not reproducible! Instead, the file can be downloaded and a connection can be opened automatically with:

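from contextlib import closing

import chembl_downloader

with chembl_downloader.connect() as conn:
    # sqlite3 cursors are not context managers, so wrap them in closing()
    with closing(conn.cursor()) as cursor:
        cursor.execute(...)  # run your query string
        rows = cursor.fetchall()  # get your results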

The cursor() function provides a convenient wrapper around this operation:

import chembl_downloader

with chembl_downloader.cursor() as cursor:
    cursor.execute(...)  # run your query string
    rows = cursor.fetchall()  # get your results
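To make the idea behind such a wrapper concrete, here is a minimal sketch built only on the standard library. It is an assumption about the general pattern, not the package's implementation: open_cursor() is a hypothetical name, and an in-memory database with a made-up row stands in for ChEMBL. Note that sqlite3 cursor objects are not context managers themselves, which is why contextlib.closing is used.

```python
import sqlite3
from contextlib import closing, contextmanager


@contextmanager
def open_cursor(path: str):
    """Yield a cursor, then close the cursor and its connection.

    Hypothetical stand-in for the idea behind a cursor() convenience wrapper.
    """
    with closing(sqlite3.connect(path)) as conn:
        with closing(conn.cursor()) as cursor:
            yield cursor


with open_cursor(":memory:") as cursor:
    # Toy table and row, invented for this example only.
    cursor.execute("CREATE TABLE example (chembl_id TEXT)")
    cursor.execute("INSERT INTO example VALUES ('CHEMBL_TOY_1')")
    cursor.execute("SELECT chembl_id FROM example")
    rows = cursor.fetchall()
```

Nesting the two closing() calls inside one generator means callers only need a single with statement, and both resources are released even if the query raises.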

Run a query and get a pandas DataFrame

The most powerful function is query(), which builds on the previous connect() function in combination with pandas.read_sql to run a query and load the results into a pandas DataFrame for any downstream use.

import chembl_downloader

sql = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno = COMPOUND_STRUCTURES.molregno
WHERE MOLECULE_DICTIONARY.pref_name IS NOT NULL
LIMIT 5
"""

df = chembl_downloader.query(sql)
df.to_csv(..., sep='\t', index=False)

Suggestion 1: use pystow to make a reproducible file path that's portable to other people's machines (e.g., it doesn't have your username in the path).

Suggestion 2: RDKit is now pip-installable with pip install rdkit-pypi, which means most users don't have to muck around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is the rdkit.Chem.PandasTools module.
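The join in the query above can be tried without downloading ChEMBL by running it against toy tables in an in-memory SQLite database. This is only a sketch: the two tables contain just the columns the query touches, every row is made up, and plain sqlite3 is used instead of pandas so that the example has no third-party dependencies.

```python
import sqlite3
from contextlib import closing

sql = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno = COMPOUND_STRUCTURES.molregno
WHERE MOLECULE_DICTIONARY.pref_name IS NOT NULL
LIMIT 5
"""

with closing(sqlite3.connect(":memory:")) as conn:
    # Toy stand-ins for the real tables; the rows below are invented.
    conn.execute(
        "CREATE TABLE MOLECULE_DICTIONARY (molregno INTEGER, chembl_id TEXT, pref_name TEXT)"
    )
    conn.execute(
        "CREATE TABLE COMPOUND_STRUCTURES (molregno INTEGER, canonical_smiles TEXT)"
    )
    conn.executemany(
        "INSERT INTO MOLECULE_DICTIONARY VALUES (?, ?, ?)",
        [
            (1, "CHEMBL_A", "EXAMPLE A"),  # has a name and a structure
            (2, "CHEMBL_B", None),  # no preferred name, filtered by WHERE
            (3, "CHEMBL_C", "EXAMPLE C"),  # no structure, dropped by JOIN
        ],
    )
    conn.executemany(
        "INSERT INTO COMPOUND_STRUCTURES VALUES (?, ?)",
        [(1, "C"), (2, "CC")],
    )
    rows = conn.execute(sql).fetchall()
```

Only the first toy molecule survives, because the inner join requires a matching structure row and the WHERE clause requires a preferred name.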

Store in a Different Place

If you want to store the data elsewhere using pystow (e.g., in pyobo I also keep a copy of this file), you can use the prefix argument.

import chembl_downloader

# It gets downloaded/extracted to 
# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
path = chembl_downloader.download(prefix=['pyobo', 'raw', 'chembl'])

See the pystow documentation on configuring the storage location further.

The prefix keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()).
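Judging from the example path in the comment above, the prefix parts and the version become nested directories under ~/.data. The sketch below reproduces only that directory arithmetic with pathlib; storage_dir() is a hypothetical helper, not a pystow or chembl_downloader function, and it does not model pystow's other configuration options.

```python
from pathlib import Path
from typing import Sequence


def storage_dir(prefix: Sequence[str], version: str, base: Path) -> Path:
    """Join the base directory, the prefix parts, and the version (hypothetical)."""
    return base.joinpath(*prefix, version)


base = Path.home() / ".data"
directory = storage_dir(["pyobo", "raw", "chembl"], "29", base)

# Express the result relative to the home directory so it is the same
# on every machine, regardless of the username.
relative = directory.relative_to(Path.home()).as_posix()
```

Because the path is derived from the home directory rather than typed out by hand, the same script resolves to an equivalent location on another person's machine.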

Download via CLI

After installing, run the following CLI command to ensure the database is downloaded and print its path to stdout:

$ chembl_downloader

Use --test to show two example queries:

$ chembl_downloader --test

Contributing

If you'd like to contribute, there's a submodule called chembl_downloader.queries where you can add an SQL query along with a description of what it does for easy importing.

Comments
  • Repo status

    Dear @cthoyt,

    I know that you have multiple responsibilities, but I was wondering whether the current repo is in working condition or whether it is a legacy repo that worked with a specific version of ChEMBL. It would be great if you could add a badge to the repo indicating this.

    Thank You.

    opened by YojanaGadiya 4
  • Add SQL for getting activities by target

    This PR adds some functionality for generating target-based datasets, motivated by https://github.com/PatWalters/yamc/issues/14.

    See the notebook here (note that this is pinned with a permalink to the state after merging this PR).

    opened by cthoyt 1
  • Improve ChEBI mapping notebook

    This filters out about 10% of the possible ChEMBL - ChEBI curations since ChEBI externally already took care of that

    -> move this into biomappings repo

    opened by cthoyt 0
  • Call for additional functionality

    • What other operations do people commonly want to do with the entire ChEMBL database/SDF file that would be good to wrap (including loading other files released by ChEMBL)?
    • What other operations like the RDKit supplier exist in other libraries that might be worth wrapping?

    @iwatobipen do you have any suggestions?

    opened by cthoyt 0
  • Add functionality for bacting

    @egonw are there any bulk SMILES, InChI, or SDF loading operations in bacting that are exposed by pybacting that would be nice to wrap inside this library for full loading of ChEMBL? On the readme, you can see I made a specific function for RDKit's "supplier" that reads an SDF file

    opened by cthoyt 3
Releases(v0.4.1)
  • v0.4.1(Nov 19, 2022)

    What's Changed

    • Add SQL for getting activities by target by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/8
    • Improve ChEBI mapping notebook by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/10
    • Add UniProt target mapping functions by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/11

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.0...v0.4.1

  • v0.4.0(Oct 28, 2022)

    This PR does several things:

    1. Removes dependency on bioversions and just implements the code locally
    2. Adds a CLI for generating a statistics table for all versions of ChEMBL
    3. Adds a proper project skeleton (documentation, unit tests, code quality assurance, CI)
    4. Improves SQLite loading in case you delete the compressed data

    Notebooks

    1. Adds notebook about drug indications
    2. Adds notebook about mapping to ChEBI
  • v0.3.0(Mar 19, 2022)

    This release adds two new functions:

    1. chembl_downloader.download_monomer_library which gets this file https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_30_monomer_library.xml for whatever version you specify
    2. chembl_downloader.get_monomer_library_root which does the same as the downloader but also parses the XML for you

    Thanks to @iwatobipen and his recent blog post for inspiring this.

  • v0.2.0(Jan 14, 2022)

    New Functions

    • chembl_downloader.download_fps downloads the pre-computed Morgan fingerprint file
    • chembl_downloader.download_chemreps downloads the chembl-smiles-inchi-inchikey map
    • chembl_downloader.get_chemreps_df builds on chembl_downloader.download_chemreps and loads them in a pandas dataframe

    Misc

    • Add isort to code quality checking
    • Enable many functions with return_version to make a tuple with the version, which is useful if you're having it infer the latest version.
  • v0.1.3(Dec 20, 2021)

    This release adds the get_substructure_library() function for automating the generation of an RDKit substructure library as described in Greg Landrum's RDKit blog post, Some new features in the SubstructLibrary. The following example shows how it can be used to accomplish some of the first tasks presented in the post:

    from rdkit import Chem
    
    import chembl_downloader
    
    library = chembl_downloader.get_substructure_library()
    query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
    matches = library.GetMatches(query)
    

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.1.2...v0.1.3

  • v0.1.2(Dec 20, 2021)

  • v0.1.1(Aug 5, 2021)

  • v0.1.0(Aug 4, 2021)

    • renamed download() to download_extract_sqlite() to make room for other download functions
    • added supplier() function for loading the SDF dump through RDKit
  • v0.0.4(Jul 28, 2021)

  • v0.0.3(Jul 27, 2021)

  • v0.0.2(Jul 27, 2021)

  • v0.0.1(Jul 27, 2021)

Owner
Charles Tapley Hoyt
Bio/cheminformatician, open scientist, maintainer of @pybel and @pykeen, part of @indralab (he/him)