Write reproducible code for getting and processing ChEMBL

Overview

chembl_downloader

PyPI PyPI - Python Version PyPI - License DOI Code style: black

Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download it and use it automatically.

Installation

$ pip install chembl-downloader

Usage

Download A Specific Version

import chembl_downloader

path = chembl_downloader.download(version='28')

After it's been downloaded and extracted once, it's smart and does not need to download again. It gets stored using pystow automatically in the ~/.data/chembl directory.

We'd like to implement something such that it could load directly into SQLite from the archive, but it appears this is a paid feature.

Download the Latest Version

First, you'll have to install bioversions with pip install bioversions, whose job it is to look up the latest version of many databases. Then, you can modify the previous code slightly by omitting the version keyword argument:

import chembl_downloader

path = chembl_downloader.download()

The version keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()), but will be omitted below for brevity.

Automate Connection

Inside the archive is a single SQLite database file. Normally, people manually untar this folder then do something with the resulting file. Don't do this, it's not reproducible! Instead, the file can be downloaded and a connection can be opened automatically with:

import chembl_downloader

with chembl_downloader.connect() as conn:
    with conn.cursor() as cursor:
        cursor.execute(...)  # run your query string
        rows = cursor.fetchall()  # get your results

The cursor() function provides a convenient wrapper around this operation:

import chembl_downloader

with chembl_downloader.cursor() as cursor:
    cursor.execute(...)  # run your query string
    rows = cursor.fetchall()  # get your results

Run a query and get a pandas DataFrame

The most powerful function is query() which builds on the previous connect() function in combination with pandas.read_sql to make a query and load the results into a pandas DataFrame for any downstream use.

import chembl_downloader

sql = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno == COMPOUND_STRUCTURES.molregno
WHERE molecule_dictionary.pref_name IS NOT NULL
LIMIT 5
"""

df = chembl_downloader.query(sql)
df.to_csv(..., sep='\t', index=False)

Suggestion 1: use pystow to make a reproducible file path that's portable to other people's machines (e.g., it doesn't have your username in the path).

Suggestion 2: RDKit is now pip-installable with pip install rdkit-pypi, which means most users don't have to muck around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is the rdkit.Chem.PandasTools module.

Store in a Different Place

If you want to store the data elsewhere using pystow (e.g., in pyobo I also keep a copy of this file), you can use the prefix argument.

import chembl_downloader

# It gets downloaded/extracted to 
# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
path = chembl_downloader.download(prefix=['pyobo', 'raw', 'chembl'])

See the pystow documentation on configuring the storage location further.

The prefix keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()).

Download via CLI

After installing, run the following CLI command to ensure it and send the path to stdout

$ chembl_downloader

Use --test to show two example queries

$ chembl_downloader --test

Contributing

If you'd like to contribute, there's a submodule called chembl_downloader.queries where you can add an SQL query along with a description of what it does for easy importing.

Comments
  • Repo status

    Repo status

    Dear @cthoyt,

    I know that you have multiple responsibilities, but I was wondering if the current repo is in working condition or if is it a legacy repo which worked with a specific version of ChEMBL? It would be great if you could add a batch on the repo for the same.

    Thank You.

    opened by YojanaGadiya 4
  • Add SQL for getting activities by target

    Add SQL for getting activities by target

    This PR adds some functionality for generating target-based datasets, motivated by https://github.com/PatWalters/yamc/issues/14.

    See the notebook here (note that this is pinned with a permalink to the state after merging this PR).

    opened by cthoyt 1
  • Improve ChEBI mapping notebook

    Improve ChEBI mapping notebook

    This filters out about 10% of the possible ChEMBL - ChEBI curations since ChEBI externally already took care of that

    -> move this into biomappings repo

    opened by cthoyt 0
  • Call for additional functionality

    Call for additional functionality

    • What other operations do people commonly want to do with the entire ChEMBL database/SDF file that would be good to wrap (including loading other files released by ChEMBL)?
    • What other operations like the RDKit supplier exist in other libraries that might be worth wrapping?

    @iwatobipen do you have any suggestions?

    opened by cthoyt 0
  • Add functionality for bacting

    Add functionality for bacting

    @egonw are there any bulk SMILES, InChI, or SDF loading operations in bacting that are exposed by pybacting that would be nice to wrap inside this library for full loading of ChEMBL? On the readme, you can see I made a specific function for RDKit's "supplier" that reads an SDF file

    opened by cthoyt 3
Releases(v0.4.1)
  • v0.4.1(Nov 19, 2022)

    What's Changed

    • Add SQL for getting activities by target by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/8
    • Improve ChEBI mapping notebook by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/10
    • Add UniProt target mapping functions by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/11

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.0...v0.4.1

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Oct 28, 2022)

    This PR does several things:

    1. Removes dependency on bioversions and just implements the code locally
    2. Adds a CLI for generating a statistics table for all versions of ChEMBL
    3. Add proper project skeleton (documentation, unit tests, code quality assurance, CI)
    4. Improve SQLite loading in case you delete the compressed data

    Notebooks

    1. Adds notebook about drug indications
    2. Adds notebook about mapping to ChEBI
    Source code(tar.gz)
    Source code(zip)
  • v0.3.0(Mar 19, 2022)

    This release adds two new functions:

    1. chembl_downloader.download_monomer_library which gets this file https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_30_monomer_library.xml for whatever version you specify
    2. chembl_downloader.get_monomer_library_root which does the same as the downloader but also parses the XML for you

    Thanks to @iwatobipen and his recent blog post for inspiring this.

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jan 14, 2022)

    New Functions

    • chembl_downloader.download_fps downloads the pre-computed Morgan fingerprint file
    • chembl_downloader.download_chemreps downloads the chembl-smiles-inchi-inchikey map
    • chembl_downloader.get_chemreps_df builds on chembl_downloader.download_chemreps and loads them in a pandas dataframe

    Misc

    • Add isort to code quality checking
    • Enable many functions with return_version to make a tuple with the version, which is useful if you're having it infer the latest version.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.3(Dec 20, 2021)

    This release adds the get_substructure_library() for automating the generation of an RDKit substructure library as described in Greg Landrum's RDKit blog post, Some new features in the SubstructLibrary. The following example shows how it can be used to accomplish some of the first tasks presented in the post:

    from rdkit import Chem
    
    import chembl_downloader
    
    library = chembl_downloader.get_substructure_library()
    query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
    matches = library.GetMatches(query)
    

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.1.2...v0.1.3

    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Dec 20, 2021)

  • v0.1.1(Aug 5, 2021)

  • v0.1.0(Aug 4, 2021)

    • rename download() to download_extract_sqlite() to make room for other download functions
    • added supplier() function for loading the SDF dump through RDKit
    Source code(tar.gz)
    Source code(zip)
  • v0.0.4(Jul 28, 2021)

  • v0.0.3(Jul 27, 2021)

  • v0.0.2(Jul 27, 2021)

  • v0.0.1(Jul 27, 2021)

Owner
Charles Tapley Hoyt
Bio/cheminformatician, open scientist, maintainer of @pybel and @pykeen, part of @indralab (he/him)
Charles Tapley Hoyt
bing image downloader app used to download bulk images for a specific search term created using streamlit and bing_image_downloader python packages

bing image downloader app bing image downloader app is used to download bulk images for a specific search term. bing image downloader app gets the sea

Siva Prakash 8 Apr 05, 2022
A python module to download ISO Standards

ISO Standards Downloader A python module to download ISO Standards from https://standards.iso.org/iso-iec/ Report Bug · Request Feature Table of conte

Daniel 1 Dec 29, 2021
Automatically download multiple papers by keywords in CVPR

Automatically download multiple papers by keywords in CVPR

46 Jun 08, 2022
DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key

Ditch YouTubeAPI (BETA) DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key Detai

Sougata Jana 23 Dec 22, 2022
A Spotify downloader needing only a python interpreter and ffmpeg.

ZSpotify A Spotify downloader needing only a python interpreter and ffmpeg. Discord Server - Matrix Server - Gitea Mirror - Main Site Requirements: B

2.4k Dec 14, 2021
A toolkit to automatically crawl the paper list and download paper pdfs of ACL Ahthology.

ACL-Anthology-Crawler A toolkit to automatically crawl the paper list and download paper pdfs of ACL Anthology

Ray GG 9 Oct 09, 2022
YouTube Downloader Bot With Python

TG YᴏᴜTᴜʙᴇ Uᴘʟᴏᴀᴅᴇʀ * Commands YouTube for Audio & Video and sends it to telegram after receiving valid URL [Do not forwarded any just copy and paste

Pʀᴇᴅᴀᴛᴏʀ 5 Oct 21, 2022
利用python3,爬取并下载91porn网站上面的视频

91porn_python 利用python3,爬取并下载91porn网站上面的视频 增加爬取t66y论坛图片的脚本 该脚本支持一下功能: 支持多线程 下载视频有进度条显示 支持从特定页的特定视频开始下载 将m3u8和mp4格式的视频下载到不同文件夹,加以分类 自动过滤已经下载过的视频

253 Feb 23, 2021
Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery.

ksy-dl Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery. This tool will: Fetch any of the official

3 Jun 20, 2022
S3 file download with Python and access with VBA

S3 file download with Python and access with VBA This simple project is using the following stacks: Python AWS S3 VBA/Excel A Bitcoin API With this st

Julio Cesar Scheidt 0 Dec 07, 2021
YT-Spammer-Purge - Allows you easily scan for and delete scam comments using several methods

YouTube Spammer Purge What Is This? - Allows you to filter and search for spamme

4.3k Dec 31, 2022
Download h3t4y for later read

h3nt4y_dl Download h3nt4y for later read Tải h3nt4y về đọc thôi nào các bạn ơiiiiiiii! (Tải từ h**taivn nhé) Usage: python get_that_ht4i.py New versio

1 Dec 03, 2021
YT-Downloader is a Tool to download youtube video.

YT-Downloader YT-Downloader is a Tool to download youtube video.If you are looking for a simple video downloader tool Than This YT-Downloader may be u

Pradip Thapa 7 May 11, 2022
Ripurei is a free-to-use osu! replay downloader, that can be configured to download from any osu! server.

Ripurei Ripurei is a fully functional osu! replay downloader, fully capable of downloading from almost any osu! server. Functionality Timeline ✔️ Able

Thomas 0 Feb 11, 2022
A downloader for the ISIS service of TU Berlin

isis_dl A downloading utility for the ISIS tool of TU-Berlin. Version 0.4 Features Downloads all Material from all courses of your ISIS page. Efficien

1 Nov 06, 2021
An Inline Telegram bot that can download YouTube videos with permanent thumbnail support

Tube (YouTube Downloader) An Inline Telegram bot that can download YouTube videos with permanent thumbnail support About Bot need to be in Inline Mode

Renjith Mangal 30 Dec 14, 2022
Python script to automate youtube-dl downloads

Automated Download Tool !! Project status I am writing a new version of this program, which will solve several errors. The new version only supports G

Devil64-Dev 21 Sep 22, 2022
Youtube Downloader Telegram Bot 😉

Youtube Dl bot 😉 Prerequisite ffmpeg install dependencies pip3 install -r requirements.txt Setup Bot - Change configuration config.py File - insta

Aryan Vikash 285 Dec 06, 2022
A Quick demo of how to use the youtube_dl module in python.

youtube_dl python module demo A Quick demo of how to use the youtube_dl module in python. Whole documentation for the youtube_dl Installation git

7 Aug 27, 2021
File Downloader

File Downloader Watches a file containing download links and runs a command to download them. The link file is in form of: # comment DOWNLOAD_LINK

Pouriya 1 Jan 08, 2022