PubMed Mapper: A Python library that maps PubMed XML to Python objects

Last update: Dec 08, 2022

Overview

pubmed-mapper: A Python library that maps PubMed XML to Python objects

Chinese documentation

1. Philosophy

view UML

Programmatically accessing PubMed articles is a common task for me. Luckily, with the help of eutils, we can get full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.
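
For context, here is a minimal sketch of where that XML comes from. It only builds the NCBI E-utilities efetch URL for one PubMed ID and does not download anything. The helper name build_efetch_url is hypothetical and not part of pubmed-mapper, and this README does not say how pubmed-mapper itself calls eutils.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for fetching full records
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Return the efetch URL that serves one PubMed record as XML.

    Hypothetical helper for illustration; not part of pubmed-mapper.
    """
    query = urlencode({"db": "pubmed", "id": str(pmid), "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


print(build_efetch_url("32329900"))
```

Opening the printed URL in a browser shows the raw PubmedArticleSet XML that the rest of this README turns into objects.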

2. Installation

pip install pubmed-mapper

3. Usage

3.1 use as a library

3.1.1 parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01
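
Once an article is parsed, a common next step is to flatten a few fields into a plain dict, for example to dump it as JSON. The sketch below uses only attribute names shown above (pmid, title, journal.title, authors, pubdate). So that it runs without network access, it works on a stand-in object built with SimpleNamespace; the helper summarize and the stand-in are hypothetical, and the type of pmid on a real Article is an assumption.

```python
from datetime import date
from types import SimpleNamespace


def summarize(article):
    """Collect a few article attributes into a plain dict.

    Hypothetical helper; it relies only on attribute names shown in this README.
    """
    return {
        "pmid": article.pmid,
        "title": article.title,
        "journal": article.journal.title,
        "authors": [f"{a.last_name} {a.initials}" for a in article.authors],
        "pubdate": article.pubdate.isoformat(),
    }


# Stand-in that mimics the attributes of a parsed Article (not a real one)
fake_article = SimpleNamespace(
    pmid="32329900",
    title="Associations of Coffee...",
    journal=SimpleNamespace(title="Journal of the American Geriatrics Society"),
    authors=[SimpleNamespace(last_name="Shadyab", initials="AH")],
    pubdate=date(2020, 9, 1),
)

print(summarize(fake_article))
```

With the real library, the same function would presumably be called on the result of Article.parse_pmid.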

3.1.2 parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


infile = 'xxx.xml'
# Open in binary mode so lxml reads the encoding declared in the XML itself
with open(infile, 'rb') as fp:
    root = etree.parse(fp)


# Each PubmedArticle element maps to one Article object
articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)
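
The loop above loads the whole file before iterating. For large files, a streaming pass may be easier on memory. The following is a sketch using only the standard library's xml.etree.ElementTree.iterparse on a tiny in-memory sample; iter_pmids and SAMPLE are made up for illustration. lxml offers a similar etree.iterparse, and each element could presumably be passed to Article.parse_element before being cleared, but that combination is an assumption and is not shown here.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample with the same nesting as a PubMed export
SAMPLE = b"""<?xml version="1.0"?>
<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>
"""


def iter_pmids(source):
    """Yield the PMID text of each PubmedArticle element in a stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "PubmedArticle":
            yield elem.findtext("MedlineCitation/PMID")
            # Empty the finished element so its children can be freed
            elem.clear()


pmids = list(iter_pmids(io.BytesIO(SAMPLE)))
print(pmids)  # ['1', '2']
```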

3.2 use as a command-line tool

3.2.1 parse a PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 parse a single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 parse a directory that contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
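
The .jl extension suggests JSON Lines output, that is, one JSON object per line. This is an assumption, as the output format is not described here. Under that assumption, a file could be read back as sketched below. The helper read_json_lines and the "pmid" key in the sample are hypothetical; the real field names depend on what pubmed-mapper writes.

```python
import io
import json


def read_json_lines(fp):
    """Yield one decoded object per non-empty line of a JSON Lines stream."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory sample standing in for an output file (keys are made up)
sample = io.StringIO('{"pmid": "1"}\n\n{"pmid": "2"}\n')
records = list(read_json_lines(sample))
print(len(records))  # 2
```

For a real file, pass an open file object instead, for example the result of open('output/pubmed21n0001.jl', encoding='utf-8').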

4. FAQs

4.1 There are many types of PubMed article publication dates. How do you convert them to datetime.date objects?

Parsing publication dates is hard work, and pubmed-mapper cannot yet parse every format. The formats it can parse, and the resulting values, are:

publication date string	parsed value
2021-03-13	2021-03-13
2021-03	2021-03-01
2021 Spring	2021-04-01
2021	2021-01-01
2021 Jan-Feb	2021-01-01
2021 Mar 13-15	2021-03-13
2021 Mar-2022 Jan	2021-03-01
2021-2022	2021-01-01
2021 Mar 13-Dec 15	2021-03-13
1976-1977 Winter	1976-01-01
1977-1978 Fall-Winter	1977-10-01
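
The rows above follow one rule: take the first year, then the first month or season, then the first day, and default anything missing to the start of the period. Here is an independent sketch of that rule. It is not pubmed-mapper's actual implementation, and parse_pubdate is a hypothetical name. The season mapping is read off the table (Winter to January, Spring to April, Fall to October); Summer to July is an assumption, since the table has no Summer row.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}
# Winter, Spring and Fall follow the table; Summer is an assumption
SEASONS = {"Winter": 1, "Spring": 4, "Summer": 7, "Fall": 10}


def parse_pubdate(text):
    """Map a publication date string to the first day it can refer to."""
    text = text.strip()
    # Numeric forms: 2021-03-13 and 2021-03
    iso = re.fullmatch(r"(\d{4})-(\d{2})(?:-(\d{2}))?", text)
    if iso:
        return date(int(iso.group(1)), int(iso.group(2)), int(iso.group(3) or 1))
    # First year, optionally followed by a second year as in 2021-2022
    head = re.match(r"(\d{4})(?:-\d{4})?\s*(.*)", text)
    if not head:
        raise ValueError(f"unsupported publication date: {text!r}")
    year, rest = int(head.group(1)), head.group(2)
    # First month or season name, optionally followed by a day number
    word = re.match(r"([A-Za-z]+)(?:\s+(\d{1,2})(?!\d))?", rest)
    if not word:
        return date(year, 1, 1)
    month = MONTHS.get(word.group(1)) or SEASONS.get(word.group(1))
    if month is None:
        raise ValueError(f"unsupported publication date: {text!r}")
    return date(year, month, int(word.group(2) or 1))


print(parse_pubdate("2021 Mar 13-Dec 15"))     # 2021-03-13
print(parse_pubdate("1977-1978 Fall-Winter"))  # 1977-10-01
```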

4.2 What is pubmed-mapper.log generated by pubmed-mapper?

pubmed-mapper.log is the default log file generated by pubmed-mapper. You can write to a different file with the --log-file option:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

Check this log file for more details about parsing.

4.3 How do I log more detailed messages to my log file?

Use the --log-level option to log more detailed messages:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

Owner

灵魂工具人
