A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Instagram Bot posting earthquakes with magnitude greater than or equal to 3.5.

Instagram Bot posting earthquakes with magnitude greater than or equal to 3.5

Alican Yüksel 4 Aug 22, 2022
Hack WhatsApp Account Easily(Android)

X-Whatsapp Hack WhatsApp Account Easily(Android) HOW TO RUN 👇 (Termux) pkg update && pkg upgrade pkg install python pkg install git git clone https:/

KiLL3R_xRO 72 Dec 21, 2022
Python client and module for BGP Ranking

Python client and module for BGP Ranking THis project will make querying BGP Ranking easier. Installation pip install pybgpranking Usage Command line

D4 project 3 Dec 16, 2021
Code done for/during the course

Serverless Course Autumn 2021 - Code This repository contains a set of examples developed during, but not limited to the live coding sessions. Lesson

Alexandru Burlacu 4 Dec 21, 2021
Telegram Client and Bot that use Artificial Intelligence to auto-reply to scammers and waste their time

scamminator Blocking a scammer is not enough. It is time to fight back. Wouldn't be great if there was a tool that uses Artificial Intelligence to rep

Federico Galatolo 6 Nov 12, 2022
A Phyton script working for stream twits from twitter by tweepy to mongoDB

twitter-to-mongo A python script that uses the Tweepy library to pull Tweets with specific keywords from Twitter's Streaming API, and then stores the

3 Feb 03, 2022
🛒 Bot de lista de compras compartilhada para o Telegram

Lista de Compras Lista de compras de Cuducos e Flávia. Comandos do bot Comando Descrição /add item Adiciona item à lista de compras /remove item

Eduardo Cuducos 4 Jan 15, 2022
E0 AI Bot is based on the message, it prints the answer with the highest probability using probability from the database.

E0 AI Chat Bot Based on the message, it prints the answer with the highest probability using probability from the database. Install on linux (Arch,Deb

Error 27 Dec 03, 2022
The bot I used to win a 3d printing filament giveaway.

Instagram-CommentBot-For-Giveaways This is the bot I used to win a 3d printer filament giveaway on Instagram. Usually giveaways require you to tag oth

Esad Yusuf Atik 1 Aug 01, 2022
Example of a discord bot in Python

discordbot.py Example of a discord bot in Python Requirements Python 3.8 or higher Discord Bot Setting Up Clone this repo or download the files Rename

Debert Jamie 1 Oct 23, 2021
My personal template for a discord bot, including an asynchronous database and colored logging :)

My personal template for a discord bot, including an asynchronous database and colored logging :)

Timothy Pidashev 9 Dec 24, 2022
Github-Checker - Simple Tool To Check If Github User Available Or Not

Github Checker Simple Tool To Check If Github User Available Or Not Socials: Lan

ميخائيل 7 Jan 30, 2022
Sample code helps get you started with a simple Python web service using AWS Lambda and Amazon API Gateway

Welcome to the AWS CodeStar sample web service This sample code helps get you started with a simple Python web service using AWS Lambda and Amazon API

0 Jan 20, 2022
OpenSea-Python-Bot - OpenSea Python Bot can be used in 2 modes

OpenSea-Python-Bot OpenSea Python Bot can be used in 2 modes. When --nft paramet

49 Feb 10, 2022
🔍 📊 Look up information about anime, manga and much more directly in Discord!

AniSearch The source code of the AniSearch Discord Bot. Contribute You have an idea or found a bug? Open a new issue with detailed explanation. You wa

私はレオンです 19 Dec 07, 2022
Instagram-follower-bot - An Instagram follower bot written in Python

Instagram Follower Bot An Instagram follower bot written in Python. The bot follows the follower of which account you want. e.g. (You want to follow @

Aytaç Kaşoğlu 1 Dec 31, 2021
Ark API Wrapper in Python

Pythark Ark API Wrapper in Python. Built with Python Requests Installation Pythark uses Arky to create a new transaction, if you want to use this feat

Jolan 14 Mar 11, 2021
Discord Bot Personnal Server - Ha-Neul

Haneul Bot, it's a discord for help me on my personnal discord, she do a lot of boring and repetitive stain. You can use on your own server if you want, you just need to find a host for the programm

Maxvyr 1 Feb 03, 2022
A Discord Token Spammer, multi webhooks compatibility, made in python +3.7. By Ezermoz

DiscordWebhookSpammer A Discord Token Spammer, multi webhooks compatibility, made in python +3.7. By Ezermoz Put you webhook in webhooks.txt if you wa

3 Nov 24, 2021
Telegram bot to scrape images from the reddit universe

Telegram bot to scrape images from the reddit universe

XD22 3 Sep 30, 2022