A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Discord Selfbot, 90+ commands

Setting the bot up. STEP 1: copy the directory yook.club selfbot was downloaded and extracted into, open cmd and type "cd " then paste. STEP 2: python

yook 1 Dec 12, 2021
Python powered spreadsheets

Marmir is powerful and fun Marmir takes Python data structures and turns them into spreadsheets. It is xlwt and google spreadsheets on steroids. It al

Brian Ray 170 Dec 14, 2022
An Async Bot/API wrapper for Twitch made in Python.

TwitchIO is an asynchronous Python wrapper around the Twitch API and IRC, with a powerful command extension for creating Twitch Chat Bots. TwitchIO co

TwitchIO 590 Jan 03, 2023
Based on nonebot, a common bot framework for maimai.

mai bot 使用指南 此 README 提供了最低程度的 mai bot 教程与支持。 Step 1. 安装 Python 请自行前往 https://www.python.org/ 下载 Python 3 版本( 3.7)并将其添加到环境变量(在安装过程中勾选 Add to system P

Diving-Fish 150 Jan 01, 2023
Python SDK for the Buycoins API.

This library provides easy access to the Buycoins API using the Python programming language. It provides all the feature of the API so that you don't need to interact with the API directly. This libr

Musa Rasheed 48 May 04, 2022
OMDB-and-TasteDive-Mashup - Mashing up data from two different APIs to make movie recommendations.

OMDB-and-TasteDive-Mashup This hadns-on project is in the Python 3 Programming Specialization offered by University of Michigan via Coursera. Mashing

Eszter Pai 1 Jan 05, 2022
Crosschat - A bot for cross-server communication

CrossChat A bot for cross-server communication. Running the bot To run the bot y

8 May 15, 2022
A simple Discord Mass-Ban that's still working with Member Scraper.

Mass-Ban [!] This was made for education / you can use for revenge. Please don't skid it. [!] If you want to use it, please use member scraper before

WoahThatsHot 1 Nov 20, 2021
Discord-Token-Formatter - A simple script to convert discord tokens from email token to token only format

Discord-Token-Formatter A simple script to convert discord tokens from email:pas

2 Oct 23, 2022
Powerful Telegram userbot to turn your PROFILE PICTURE & LAST NAME into a real time clock & to change your BIO automatically.

DATE_TIME_USERBOT-TeLeTiPs Powerful Telegram userbot to turn your PROFILE PICTURE & LAST NAME into a real time clock & to change your BIO automaticall

53 Jan 05, 2023
Utilizing the freqtrade high-frequency cryptocurrency trading framework to build and optimize trading strategies. The bot runs nonstop on a Rasberry Pi.

Freqtrade Strategy Repository Please test all scripts and dry run them before using them in live mode Contact me on discord if you have any questions!

Michael Fourie 90 Jan 01, 2023
Die wichtigsten APIs Deutschlands in einem Python Paket.

Deutschland A python package that gives you easy access to the most valuable datasets of Germany. Installation pip install deutschland Geographic data

Bundesstelle für Open Data 921 Jan 08, 2023
A slack bot that notifies you when a restaurant is available for orders

Slack Wolt Notifier A Slack bot that notifies you when a Wolt restaurant or venue is available for orders. How does it work? Slack supports bots that

Gil Matok 8 Oct 24, 2022
Telegram bot to check availability of vaccination slots in India.

cowincheckbot Telegram bot to check availability of vaccination slots in India. Setup Install requirements using pip3 install -r requirements.txt Crea

Muhammed Shameem 10 Jun 11, 2022
a simple python script that monitors the binance hotwallet and refunds the withdrawal fee to encourage people to withdraw their Nano and help decentralisation

Nano_Binance_Refund_Bot a simple python script that monitors the binance hotwallet and refunds the withdrawal fee to encourage people to withdraw thei

James Coxon 5 Apr 07, 2022
A small module to communicate with Triller's API

A small, UNOFFICIAL module to communicate with Triller's API. I plan to add more features/methods in the future.

A3R0 1 Nov 01, 2022
Python script to backup/convert your Spotify playlists into the XSPF format.

Python script to backup/convert your Spotify playlists into the XSPF format.

Chris Ovenden 4 Jun 09, 2022
a simple floating window for watch cryptocurrency price

floating-monitor with cryptocurrency 浮動視窗虛擬貨幣價格監控 a floating monitor window to show price of cryptocurrency. use binance api to get price 半透明的浮動視窗讓你方便

Lin_Yi_Shen 1 Oct 22, 2021
🤟The VC Music Source code of @DaisyXBot ❤️ v3 Out now

DAISYXMUSIC V3 🎵 A bot that can play music on telegram group's voice call Available on telegram as @DaisyXbot Whats new 🔥 Thumbnail Support Playlist

TeamDaisyX 207 Dec 05, 2022
A Python wrapper around the Pushbullet API to send different types of push notifications to your phone or/and computer.

pushbullet-python A Python wrapper around the Pushbullet API to send different types of push notifications to your phone or/and computer. Installation

Janu Lingeswaran 1 Jan 14, 2022