A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

A pypi package that helps in generating discord bots.

A pypi package that helps in generating discord bots.

KlevrHQ 3 Nov 17, 2021
RChecker - Checker for minecraft servers

๐Ÿ”Ž RChecker v1.0 Checker for Minecraft Servers ๐Ÿ’ป Supported operating systems: โœ…

Pedro Vega 1 Aug 30, 2022
Python client for the Datadog API

datadog-api-client-python This repository contains a Python API client for the Datadog API. The code is generated using openapi-generator and apigento

Datadog, Inc. 58 Dec 16, 2022
A webhook API for Discord.

Webhook API A webhook API for Discord. Requirements requests Usage

1 Feb 08, 2022
A bot written in python that send prefilled Google Forms. It supports multithreading for faster execution time.

GoogleFormsBot https://flassy.xyz https://github.com/Shawey/GoogleFormsBot Requirements: os (Default) ast (Default) threading (Default) configparser (

Shawey 1 Jul 10, 2022
An open source, multipurpose, configurable discord bot that does it all

Spacebot - Discord Bot Music, Moderation, Fun, Utilities, Games and Fully Configurable. Overview โ€ข Contributing โ€ข Self hosting โ€ข Documentation (not re

Dhravya Shah 41 Dec 10, 2022
Automatic Video Library Manager for TV Shows

Automatic Video Library Manager for TV Shows. It watches for new episodes of your favorite shows, and when they are posted it does its magic. Dependen

1.5k Dec 22, 2022
Cryptocurrency Trading Bot - A trading bot to automate cryptocurrency trading strategies using Python, equipped with a basic GUI

Cryptocurrency Trading Bot - A trading bot to automate cryptocurrency trading strategies using Python, equipped with a basic GUI. Used REST and WebSocket API to connect to two of the most popular cry

Francis 8 Sep 15, 2022
This repository contains free labs for setting up an entire workflow and DevOps environment from a real-world perspective in AWS

DevOps-The-Hard-Way-AWS This tutorial contains a full, real-world solution for setting up an environment that is using DevOps technologies and practic

Mike Levan 1.6k Jan 05, 2023
Spodcast is a caching Spotify podcast to RSS proxy

Spodcast Spodcast is a caching Spotify podcast to RSS proxy. Using Spodcast you can follow Spotify-hosted netcasts/podcasts using any player which sup

Frank de Lange 260 Jan 01, 2023
๐€ ๐ฆ๐จ๐๐ฎ๐ฅ๐š๐ซ ๐“๐ž๐ฅ๐ž๐ ๐ซ๐š๐ฆ ๐†๐ซ๐จ๐ฎ๐ฉ ๐ฆ๐š๐ง๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ ๐›๐จ๐ญ ๐ฐ๐ข๐ญ๐ก ๐ฎ๐ฅ๐ญ๐ข๐ฆ๐š๐ญ๐ž ๐Ÿ๐ž๐š๐ญ๐ฎ๐ซ๐ž๐ฌ !!

๐‡๐จ๐ฐ ๐“๐จ ๐ƒ๐ž๐ฉ๐ฅ๐จ๐ฒ For easiest way to deploy this Bot click on the below button ๐Œ๐š๐๐ž ๐๐ฒ ๐’๐ฎ๐ฉ๐ฉ๐จ๐ซ๐ญ ๐†๐ซ๐จ๐ฎ๐ฉ ๐’๐จ๐ฎ๐ซ๐œ๐ž๐ฌ ๐†๐ž๐ง๐ž?

Mukesh Solanki 1 Dec 10, 2021
Clipboard-watcher - Keep an eye on the apps that are using your clipboard

clipboard-watcher This repository contains the code of an experiment, in order t

Gonรงalo Valรฉrio 48 Oct 13, 2022
A full-fledged discord bot with moderation and a lot more.

HOT-BOT-POL-POT โญ Star me on GitHub m'lady.... hot-bot-pol-pot is a moderation discord bot written using enhanced-dpy library with many functionalitie

Pure Cheekbones 4 Oct 08, 2022
pyDuinoCoin is a simple python integration for the DuinoCoin REST API, that allows developers to communicate with DuinoCoin Master Server

PyDuinoCoin PyDuinoCoin is a simple python integration for the DuinoCoin REST API, that allows developers to communicate with DuinoCoin Main Server. I

BackrndSource 6 Jul 14, 2022
Easily report Instagram pages and close the page

Program Features - ๐Ÿ“Œ Delete target post on Instagram. - ๐Ÿ“Œ Delete Media Target post on Instagram - ๐Ÿ“Œ Complete deletion of the target account on Inst

hack4lx 11 Nov 25, 2022
Tools for Twitter

Tools for Twitter Data This is a start of a collection of tools to use for collecting data via the Twitter API. If you do not have a Twitter Developer

DiscoverText 36 Oct 13, 2022
Stop writing scripts to interact with your APIs. Call them as CLIs instead.

Zum Stop writing scripts to interact with your APIs. Call them as CLIs instead. Zum (German word roughly meaning "to the" or "to" depending on the con

Daniel Leal 84 Nov 17, 2022
Code for "Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals," CoRL 2021.

Multimodal Trajectory Prediction Conditioned on Lane-Graph Traversals This repository contains code for "Multimodal trajectory prediction conditioned

Nachiket Deo 113 Dec 28, 2022
DongTai API SDK For Python

DongTai-SDK-Python Quick start You need a config file config.json { "DongTai":{ "token":"your token", "url":"http://127.0.0.1:90"

huoxian 50 Nov 24, 2022
Ethone-Selfbot - Open Source Discord Self-Bot, written in discord.py

Ethone SB Table of contents Newest open-source Discord SelfBot with useful commands and easy documentation on how to add your own and change the exist

Ethone 3 Jan 08, 2022