A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Discord Account Generator that will create Account with hCaptcha bypass. Using socks4 proxies

Account-Generator [!] This was made for education. Please use socks4 proxies for nice experiences. [!] Please install these modules - "pip3 install ht

RyanzSantos 10 Feb 23, 2022
A discord.py bot template with Cogs implemented.

discord-cogs-template A discord.py bot template with Cogs implemented. Instructions Before you start ⚠ Basic knowledge of python is required. Steps If

censor 2 Sep 02, 2022
This is a simple collection of instructions and scripts to accompany the computerphile video about mininet and openflow.

How to get going. This project should work on Linux or MacOS. I used Ubuntu 20.04 and provide some notes here. Note, this is certainly not intended as

Richard G. Clegg 70 Jan 02, 2023
Fully undetected auto skillcheck hack for dead by daylight that works decently well

Auto-skillcheck was made by Love ❌ code ✅ ❔ ・How to use Start off by installing python ofc Open cmd in the same directory and type pip install -r requ

Rdimo 10 Aug 13, 2022
Join & Leave spam for aminoapps using aminoboi

JLspam.py Join & Leave spam for https://aminoapps.com using aminoboi Instalação apt-get update -y apt-get upgrade -y apt-get install git pkg install

Moleey 1 Dec 21, 2021
Бот - Гуль для твоего телеграм аккаунта

Я - Гуль (бот), теперь работает в чатах Отблагодарить автора за проделанную работу можно здесь Помощь с установкой тут Установка на Андроид После уста

57 Nov 06, 2022
Simple Python Auto Follow Bot

Instagram-Auto-Follow-Bot Description Một IG BOT đơn giản. Tự động follow những người mà bạn muốn cướp follow. Tự động unfollow. Tự động đăng nhập vào

CodingLinhTinh 3 Aug 27, 2022
Instagram bot that upload images for you which scrape posts from 9gag meme website or other Instagram users , which is 24/7 Automated Runnable.

Autonicgram Automates your Instagram posts by taking images from sites like 9gag or other Instagram accounts and posting it onto your page. Features A

Mastermind 20 Sep 17, 2022
Python gets the friend's articles from hexo's friend-links

你是否经常烦恼于友链过多但没有时间浏览?那么友链朋友圈将解决这一痛点。你可以随时获取友链网站的更新内容,并了解友链的活跃情况。

129 Dec 28, 2022
Um script simples para consultar dados, com API's simples.

Info sobre o Script Esta é uma das mais simples ferramentas para consultar dados. Daqui um tempo eu farei um UPGRADE no painel, irei adicionar um banc

Crowley 6 Apr 11, 2022
A modular Telegram Python bot running on python3 with a sqlalchemy database.

Nao Tomori Robot Found Me On Telegram As Nao Tomori 🌼 A modular Telegram Python bot running on python3 with a sqlalchemy database. How to setup/deplo

Stinkyproject 1 Nov 24, 2021
Simple python program to execute terminal commands on telegram chats directly.

Small python code which can be handy when using telegram and you don't want to use VPS again and again. By configuring the code in your VPS, You can execute commands and get your output within telegr

Veshraj Ghimire 34 Dec 05, 2022
Authenticate your League of legends account on riot client in a few lines of code.

lol-authenticator v1.0.0 Content index Project Setup Dependencies Project Setup Dependencies Python v3.9.6 If you don't have Python installed on your

Cássio Fontoura 5 Aug 28, 2022
The Github repository for the Amari API wrapper.

Amari.py Amari.py is an async, easy to use API wrapper for the AmariBot. Installation Enter any of these commands to install the library: pip install

TheF1ng3r 5 Dec 19, 2022
A Discord Rich Presence App to set your own custom rich presence.

discord-rich-presence A Discord Rich Presence App to set your own custom rich presence. #BUILDS Ready to use package are available inside "finalpackag

1 Nov 22, 2021
Python client library for Postmark API

Postmarker Python client library for Postmark API. Gitter: https://gitter.im/Stranger6667/postmarker Installation Postmarker can be obtained with pip:

Dmitry Dygalo 109 Dec 13, 2022
A bot which provides online/offline and player status for Thicc SMP, using Replit.

AlynaaStatus A bot which provides online/offline and player status for Thicc SMP. Currently being hosted on Replit. How to use? Create a repl on Repli

QuanTrieuPCYT 8 Dec 15, 2022
A simple Python wrapper for the Amazon.com Product Advertising API ⛺

Amazon Simple Product API A simple Python wrapper for the Amazon.com Product Advertising API. Features An object oriented interface to Amazon products

Yoav Aviram 789 Dec 26, 2022
Automate and Manage Telegram Channels

Channel Automation Bot @ChannelAutomateBot A star ⭐ from you means a lot to us! Telegram bot to automate and manage channels. Usage Deploy to Heroku T

Stark Bots 61 Dec 29, 2022
Simple VK API wrapper for Python

VK Admier: documentation VK Admier is simple VK API wrapper for community bot development. Authorization You should create bot object from Client clas

Egor Light 2 Nov 10, 2022