pubmex.py - a script to get a fancy paper title based on given DOI or PMID

Overview

Pu(b)mex

tag PyPI version

pubmex.py is a script to get a fancy paper title based on given DOI or PMID (can be also combined with macOS Finder)

Format of the title:

a first author . a last author - (title("dotted") or your customed title) . PMID . journal . year . pdf
e.g.
  Kelley.Scott.The.evolution.biology.shift.towards.engineering.prediction-generating.tools.away.traditional.research.practice.EMBORep.2008.pdf

Nowadays, it’s not a big issue, with all Mendeley and other tools, however...

I don’t want to put any PDF file collected on the way into my library, because then it gets super big (and then it’s hard to sync it for example with Dropbox). So now I can keep these PDF files into pdf-icebox and re-name them niecely automatically:

$ ls
Hnisz.Sharp.Phase.Separation.Model.Transcriptional.Control.Cell.2017.pdf
Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf

Usage:

./Balas.Johnson.Establishing.RNA-RNA.interactions.remodels.lncRNA.structure.promotes.PRC2.activity.SciAdv.2021.pdf ">
$ pubmex.py sharp2017.pdf
Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf
mv sharp2017.pdf --> ./Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf

$ pubmex.py Query.Konarska.pdf
mv Query.Konarska.pdf --> ./Smith.Konarska."Nought.may.endure.but.mutability".spliceosome.dynamics.regulation.splicing.MolCell.2008.pdf
    
$ pubmex.py eabc9191.full.pdf
mv  eabc9191.full.pdf --> ./Balas.Johnson.Establishing.RNA-RNA.interactions.remodels.lncRNA.structure.promotes.PRC2.activity.SciAdv.2021.pdf

DEPENDENCIES

INSTALLATION

pip install pubmex
# Ubuntu (Debian-based system)
apt-get install xclip python-biopython pdftotext
# macOS
brew install poppler biopython # or "sudo port install poppler biopython"

HISTORY

  • 1.4 Add osx-automator
  • 1.3 Fixed #4 #5
  • 1.2 Fixed #2
  • 1.1 Simplify input, pubmex.py *.pdf
  • 1.0 With recent bugfixes 2021
  • 0.3 OSX installation
  • 0.2 Small changes
  • 0.1 Init version in 2010! :-)
Comments
  • Automator not working

    Automator not working

    It seems that when using the automator installations that come with the pubmex the pubmex.py can not be found.

        for f in "$@"
        do
            pubmex.py $f
        done
    

    The following error is displayed:

    The action “Run Shell Script” encountered an error: “zsh:3: command not found: pubmex.py”

    When specifying the direct location of just the pubmex.py file another error occures.

        for f in "$@"
        do
            /users/suntim/miniforge3/bin/pubmex.py $f
        done
    

    The following error is displayed:

    The action “Run Shell Script” encountered an error: “”

    When specifying the direct location of python and the pubmex.py file another error occures.

        for f in "$@"
        do
            /usr/local/bin/python3 /users/suntim/miniforge3/bin/pubmex.py $f
        done
    

    The following error is displayed:

    The action “Run Shell Script” encountered an error: “Traceback (most recent call last): File "/users/suntim/miniforge3/bin/pubmex.py", line 27, in <module> from Bio import Entrez ModuleNotFoundError: No module named 'Bio'”

    I have all dependencies installed pip3 install pubmex, pip3 install biopython, brew install poppler. As it says in the readme.md that biopython should be isntalled via brew I assume that was a mistake. I instead installed it via pip3.

    The same error messages occure regardless of using the zsh or bash version.

    opened by LinusKaiser 2
  • Not found in PubMed, although DOI (.ORG/10.1016/J.BBAGRM.2015.08.009) was detected

    Not found in PubMed, although DOI (.ORG/10.1016/J.BBAGRM.2015.08.009) was detected

    [email protected]:~/Desktop/pdfs$ pubmex.py -a -r -f 1-s2.0-S1874939915001868-main.pdf ERROR: Not found in PubMed, although DOI (.ORG/10.1016/J.BBAGRM.2015.08.009) was detected in the pdf! Traceback (most recent call last): File "/home/magnus/bin/pubmex.py", line 472, in main() File "/home/magnus/bin/pubmex.py", line 451, in main title = get_title_auto_from_text(text, OPTIONS.debug, False, OPTIONS.keywords) File "/home/magnus/bin/pubmex.py", line 239, in get_title_auto_from_text return get_title_via_doi(doi, debug, reference, customed_title) File "/home/magnus/bin/pubmex.py", line 359, in get_title_via_doi pmid = get_pmid_via_doi_net(doi) File "/home/magnus/bin/pubmex.py", line 333, in get_pmid_via_doi_net return get_value('citation_pmid', content) TypeError: get_value() takes exactly 3 arguments (2 given)

    opened by mmagnus 2
  • Invalid git clone (edit: on windows machines)

    Invalid git clone (edit: on windows machines)

    The colon in 'demo/10.1261:rna.418407.pdf' causes problems in cloning from windows machines.

    Cloning into 'pubmex'... remote: Enumerating objects: 426, done. remote: Counting objects: 100% (9/9), done. remote: Total 426 (delta 8), reused 8 (delta 8), pack-reused 417 eceiving obj Receiving objects: 100% (426/426), 3.79 MiB | 2.86 MiB/s, done. Resolving deltas: 100% (252/252), done. error: invalid path 'demo/10.1261:rna.418407.pdf' fatal: unable to checkout working tree warning: Clone succeeded, but checkout failed. You can inspect what was checked out with 'git status' and retry with 'git restore --source=HEAD :/'

    opened by gcasale 1
  • ct200162x.pdf

    ct200162x.pdf

    (py37) [mx] rna$ pubmex.py ct200162x.pdf --debug
    filename: .......... ct200162x.pdf
    filename: .......... ct200162x.pdf
    doi: ............... ct200162x
    IdList.............. []
    pmid: .............. False
    ERROR: 		Not found in PubMed, although DOI (ct200162x) was detected in the pdf!
    generate ./temp.....[OK]
    out:
    err:
    temp is going to be opened
    doi_line: .......... DX.DOI.ORG/10.1021/CT200162X | J. CHEM. THEORY COMPUT. 2011, 7, 28862902
    doi is found: ...... 10.1021/CT200162X
    doi: ............... 10.1021/CT200162X
    IdList.............. ['21921995']
    pmid: .............. 21921995
    summary_dict........ {'Item': [], 'Id': '21921995', 'PubDate': '2011 Sep 13', 'EPubDate': '2011 Aug 2', 'Source': 'J Chem Theory Comput', 'AuthorList': ['Zgarbová M', 'Otyepka M', 'Sponer J', 'Mládek A', 'Banáš P', 'Cheatham TE 3rd', 'Jurečka P'], 'LastAuthor': 'Jurečka P', 'Title': 'Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles.', 'Volume': '7', 'Issue': '9', 'Pages': '2886-2902', 'LangList': ['English'], 'NlmUniqueID': '101232704', 'ISSN': '1549-9618', 'ESSN': '1549-9626', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed', 'PubStatus': 'ppublish+epublish', 'ArticleIds': {'pubmed': ['21921995'], 'medline': [], 'doi': '10.1021/ct200162x', 'pmc': 'PMC3171997', 'rid': '21921995', 'eid': '21921995', 'pmcid': 'pmc-id: PMC3171997;'}, 'DOI': '10.1021/ct200162x', 'History': {'pubmed': ['2011/09/17 06:00'], 'medline': ['2011/09/17 06:01'], 'received': '2011/03/08 00:00', 'entrez': '2011/09/17 06:00'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(242, attributes={}), 'FullJournalName': 'Journal of chemical theory and computation', 'ELocationID': '', 'SO': '2011 Sep 13;7(9):2886-2902'}
    ERROR: 		Problem! The pubmex could not find automatically a title for the pdf file! Sorry!
    
    opened by mmagnus 0
  • gkz1184.pdf

    gkz1184.pdf

    (py37) [mx] rna$ pubmex.py gkz1184.pdf --debug
    filename: .......... gkz1184.pdf
    filename: .......... gkz1184.pdf
    doi: ............... gkz1184
    IdList.............. []
    pmid: .............. False
    ERROR: 		Not found in PubMed, although DOI (gkz1184) was detected in the pdf!
    generate ./temp.....[OK]
    out:
    err:
    temp is going to be opened
    doi_line: .......... 11641174 NUCLEIC ACIDS RESEARCH, 2020, VOL. 48, NO. 3 DOI: 10.1093/NAR/GKZ1184
    doi is found: ...... 10.1093/NAR/GKZ1184
    doi: ............... 10.1093/NAR/GKZ1184
    IdList.............. ['31889193']
    pmid: .............. 31889193
    summary_dict........ {'Item': [], 'Id': '31889193', 'PubDate': '2020 Feb 20', 'EPubDate': '', 'Source': 'Nucleic Acids Res', 'AuthorList': ['Reißer S', 'Zucchelli S', 'Gustincich S', 'Bussi G'], 'LastAuthor': 'Bussi G', 'Title': 'Conformational ensembles of an RNA hairpin using molecular dynamics and sparse NMR data.', 'Volume': '48', 'Issue': '3', 'Pages': '1164-1174', 'LangList': ['English'], 'NlmUniqueID': '0411011', 'ISSN': '0305-1048', 'ESSN': '1362-4962', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed - indexed for MEDLINE', 'PubStatus': 'ppublish', 'ArticleIds': {'pubmed': ['31889193'], 'medline': [], 'pii': '5691221', 'doi': '10.1093/nar/gkz1184', 'pmc': 'PMC7026608', 'rid': '31889193', 'eid': '31889193', 'pmcid': 'pmc-id: PMC7026608;'}, 'DOI': '10.1093/nar/gkz1184', 'History': {'pubmed': ['2020/01/01 06:00'], 'medline': ['2020/03/20 06:00'], 'accepted': '2019/12/09 00:00', 'revised': '2019/12/05 00:00', 'received': '2019/10/14 00:00', 'entrez': '2020/01/01 06:00'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(3, attributes={}), 'FullJournalName': 'Nucleic acids research', 'ELocationID': 'doi: 10.1093/nar/gkz1184', 'SO': '2020 Feb 20;48(3):1164-1174'}
    ERROR: 		Problem! The pubmex could not find automatically a title for the pdf file! Sorry!
    
    opened by mmagnus 0
  • some problem when I removed some prints to make the script quite

    some problem when I removed some prints to make the script quite

    (py37) [mx] d$ pubmex -p 10.1016/j.molcel.2020.11.004
    (py37) [mx] d$ pubmex -p 10.1016/j.molcel.2020.11.004 -d
    doi: ............... 10.1016/j.molcel.2020.11.004
    IdList.............. ['33259809']
    pmid: .............. 33259809
    summary_dict........ {'Item': [], 'Id': '33259809', 'PubDate': '2020 Dec 17', 'EPubDate': '2020 Nov 5', 'Source': 'Mol Cell', 'AuthorList': ['Ziv O', 'Price J', 'Shalamova L', 'Kamenova T', 'Goodfellow I', 'Weber F', 'Miska EA'], 'LastAuthor': 'Miska EA', 'Title': 'The Short- and Long-Range RNA-RNA Interactome of SARS-CoV-2.', 'Volume': '80', 'Issue': '6', 'Pages': '1067-1077.e5', 'LangList': ['English'], 'NlmUniqueID': '9802571', 'ISSN': '1097-2765', 'ESSN': '1097-4164', 'PubTypeList': ['Journal Article'], 'RecordStatus': 'PubMed - indexed for MEDLINE', 'PubStatus': 'ppublish+epublish', 'ArticleIds': {'pubmed': ['33259809'], 'medline': [], 'pii': 'S1097-2765(20)30782-6', 'doi': '10.1016/j.molcel.2020.11.004', 'pmc': 'PMC7643667', 'rid': '33259809', 'eid': '33259809', 'pmcid': 'pmc-id: PMC7643667;'}, 'DOI': '10.1016/j.molcel.2020.11.004', 'History': {'pubmed': ['2020/12/02 06:00'], 'medline': ['2021/01/12 06:00'], 'received': '2020/07/20 00:00', 'revised': '2020/10/05 00:00', 'accepted': '2020/10/29 00:00', 'entrez': '2020/12/01 20:08'}, 'References': [], 'HasAbstract': IntegerElement(1, attributes={}), 'PmcRefCount': IntegerElement(10, attributes={}), 'FullJournalName': 'Molecular cell', 'ELocationID': 'doi: 10.1016/j.molcel.2020.11.004', 'SO': '2020 Dec 17;80(6):1067-1077.e5'}
    Ziv.Miska.The.Short-Long-Range.RNA-RNA.Interactome.SARS-CoV-2.MolCell.2020.pdf
    
    bug 
    opened by mmagnus 0
Releases(1.4.2)
  • 1.4.2(Mar 15, 2022)

    Now you can see in Finder QuickAction pubmex to quick run it on a number of PDFs files.

    Install pubmex_zsh.workflow from pubmex/osx-automator/ for if you default shell is zsh, or pubmex_bash.workflow for bash.

    158028806-039d4ec6-caf5-446e-bcb0-835face858ee

    Source code(tar.gz)
    Source code(zip)
  • 1.4.1(Mar 12, 2022)

  • 1.4(Sep 27, 2021)

    Now you can see in Finder QuickAction pubmex to quick run it on a number of PDFs files.

    Install pubmex_zsh.workflow from pubmex/osx-automator/ for if you default shell is zsh, or pubmex_bash.workflow for bash.

    pubmex-osx-automator

    Source code(tar.gz)
    Source code(zip)
  • 1.3(Sep 26, 2021)

  • 1.2(Sep 14, 2021)

  • 1.1(Aug 18, 2021)

    Simplify input to pubmex.py *.pdf. Fixed #2

    Now, usage:

    $ pubmex.py sharp2017.pdf
    mv  sharp2017.pdf --> ./Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf
    
    $ pubmex.py  Query.Konarska.pdf
    mv  Query.Konarska.pdf --> Smith.Konarska."Nought.may.endure.but.mutability".spliceosome.dynamics.regulation.splicing.MolCell.2008.pdf
    
    $ pubmex.py eabc9191.full.pdf
    mv  eabc9191.full.pdf --> ./Balas.Johnson.Establishing.RNA-RNA.interactions.remodels.lncRNA.structure.promotes.PRC2.activity.SciAdv.2021.pdf
    
    Source code(tar.gz)
    Source code(zip)
  • 1.0(Jun 23, 2021)

    I don’t want to put any PDF file collected on the way into my library, because then it gets super big (and then it’s hard to sync it for example with Dropbox). So now I can keep these PDF files into pdf-icebox and re-name them niecely automatically:

    Usage:

    $ pubmex.py -a -f sharp2017.pdf -r
    mv  sharp2017.pdf --> ./Sharp.Hockfield.Convergence.The.future.health.Science.2017.pdf
    
    $ pubmex.py -a -f Query.Konarska.pdf -r
    mv  Query.Konarska.pdf --> Smith.Konarska."Nought.may.endure.but.mutability".spliceosome.dynamics.regulation.splicing.MolCell.2008.pdf
    
    $ pubmex.py -a -f eabc9191.full.pdf -r
    mv  eabc9191.full.pdf --> ./Balas.Johnson.Establishing.RNA-RNA.interactions.remodels.lncRNA.structure.promotes.PRC2.activity.SciAdv.2021.pdf
    

    .. and we get a file:

    Smith.Konarska."Nought.may.endure.but.mutability".spliceosome.dynamics.regulation.splicing.MolCell.2008.pdf

    Source code(tar.gz)
    Source code(zip)
Owner
Marcin Magnus
Ph.D., molecular biologist & bioinformatician, uses Pen & Paper and Emacs for notes, coding & RNA!
Marcin Magnus
Download Photo and Video from Wall of specific user or community

vkontakte-downloader Download Photo and Video from Wall of specific User or Community on https://vk.com Setup Clone the project git clone https://gith

4 Jul 20, 2022
File Downloader

File Downloader Watches a file containing download links and runs a command to download them. The link file is in form of: # comment DOWNLOAD_LINK

Pouriya 1 Jan 08, 2022
A simple contents download module using url for python

A simple contents download module using url for python

Fayas Noushad 16 Oct 20, 2022
Spy Ad Network - Spy Ad Network Detection With Python

Spy Ad Network Spy Ad Network Detection Jumps from link to link to access a site

Baris Dincer 2 Jan 13, 2022
Download all posts and comments in a subreddit

subreddit downloader This subreddit downloader downloads all posts and comments in a subreddit For a tutorial to use this program please follow this m

Guneet 6 Dec 16, 2022
Yahoo! Finance next gen python 3 / pandas market data downloader

Yahoo! Finance-ng python3 / pandas market data downloader Ever since Yahoo! finance decommissioned their historical data API, many programs that relie

Pedro Larroy 7 Dec 09, 2022
A collection of modules I have created to programmatically search for/download imagery from live cam feeds across the state of California.

A collection of modules that I have created to programmatically search for/download imagery from all publicly available live cam feeds across the state of California. In no way am I affiliated with a

Chad Groom 5 Nov 21, 2022
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Rohn Chatterjee 35 Jul 20, 2022
Neon: an add-on for making it easier to handle component interactions

Neon Neon is an add-on for Lightbulb making it easier to handle component interactions. Installation pip install git+https://github.com/neonjonn/light

Neon Jonn 9 Apr 29, 2022
Aline file downloader automator!

AlineDorker Aline is used for donwloading files with google dorking , dowloading specified links such as dorks. Dependences: python3 installed pip ins

27 Nov 16, 2022
Simple tool downloads public PoC (refer from nomi-sec)

PoC Collection This is the little script to collect the proof-of-concept which is refered from nomi-sec. The repository now is only develop for linux-

2 Aug 17, 2022
Simple python script to download .mp3 formatted files from YouTube video URLs

Introduction: Simple python script to download .mp3 formatted files from YouTube video URLs Requirements: Requires: youtube_dl module Requires: ffmpeg

Pat 2 Aug 18, 2022
Downloads state flags from wikipedia for states/regions from all countries

world-state-flags Downloads state flags from wikipedia for states/regions from all countries This data is NOT curated Uses https://github.com/dr5hn/co

João Ribeiro Bezerra 2 Dec 15, 2022
Download h3t4y for later read

h3nt4y_dl Download h3nt4y for later read Tải h3nt4y về đọc thôi nào các bạn ơiiiiiiii! (Tải từ h**taivn nhé) Usage: python get_that_ht4i.py New versio

1 Dec 03, 2021
Newsemble is an API that provides easy access to the current news for programmatic analysis

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.

Rishabh 43 Dec 16, 2022
Download clips from youtube videos with a few clicks and a GUI!

YouClip v2.0.0 Table Of Contents: What Is YouClip Installation Usage Stuff To Fix Changelog What Is YouClip? ! IMPORTANT: The source files are a total

ador 2 Oct 05, 2021
Automatically download multiple papers by keywords in CVPR

Automatically download multiple papers by keywords in CVPR

46 Jun 08, 2022
Youtube Video Downloader Using Python Gui Appliction with progress Bar

Youtube-Video-Downloader Youtube Video Downloader Using Python Gui Appliction with progress Bar Module Used Pytube Tkinter Pil Urllib Bytes Io LICENSE

Community Programmer 6 Dec 19, 2022
mescrappy - Python + Selenium Youtube scraper

mescrappy - Python + Selenium Youtube scraper Youtube Sraping With Python (Selenium) Table of Contents About The Project Built With Getting Started In

Merdan Chariyarov 12 Nov 28, 2021
A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

David 97 Jan 02, 2023