This repository provides a set of functions to extract paragraphs from AWS Textract responses.

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.
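As a rough illustration of where those responses come from, the sketch below starts an asynchronous text-detection job for a PDF stored in S3, waits for it to finish, and collects the LINE blocks from the paginated result. The function name fetch_line_blocks and its parameters are hypothetical and are not taken from this repository. The two client methods it calls, start_document_text_detection and get_document_text_detection, are the standard boto3 Textract operations.

```python
import time


def fetch_line_blocks(textract, bucket, key, poll_seconds=5.0):
    """Run an asynchronous text-detection job and return its LINE blocks.

    `textract` is expected to behave like boto3.client("textract").
    This helper is a hypothetical sketch, not part of the repository.
    """
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        page = textract.get_document_text_detection(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended as {page['JobStatus']}")

    # Results are paginated; follow NextToken until it is no longer returned.
    lines = []
    while True:
        lines += [b for b in page.get("Blocks", []) if b["BlockType"] == "LINE"]
        token = page.get("NextToken")
        if not token:
            return lines
        page = textract.get_document_text_detection(JobId=job_id, NextToken=token)
```

With boto3 installed and credentials configured, a call could look like fetch_line_blocks(boto3.client("textract"), "my-bucket", "docs/report.pdf"), where the bucket and key are placeholders. Simple polling keeps the sketch short; a production setup would more likely rely on the SNS notification that Textract can publish when a job completes.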

PLEASE NOTE THAT:

  1. It is assumed that your client has the necessary IAM permissions to access the different AWS resources required.
  2. Since AWS Textract analyzes PDF files by running asynchronous operations, the current version assumes that you've already created an S3 bucket and that the PDF files are already stored there. If not, please see the boto3 docs to learn how to create a bucket and upload files.
  3. The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.
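
To make the line-spacing idea in note 3 concrete, here is a minimal sketch of one possible heuristic: start a new paragraph whenever the vertical gap between two consecutive LINE blocks exceeds a fraction of the previous line's height. This is not the repository's paragraph_constructor. The name group_lines_into_paragraphs and the gap_ratio default are assumptions chosen for illustration, and the sketch assumes the lines arrive in reading order from a single-column layout. It relies only on the Geometry.BoundingBox fields (Top, Height) and the Page field that Textract attaches to blocks.

```python
def group_lines_into_paragraphs(line_blocks, gap_ratio=0.6):
    """Group Textract LINE blocks into paragraphs by vertical spacing.

    Hypothetical helper: a new paragraph starts on a page change or when
    the gap to the previous line is larger than gap_ratio times the
    previous line's height. Tune gap_ratio for your own documents.
    """
    paragraphs = []
    current = []
    prev_box = None
    prev_page = None

    for block in line_blocks:
        box = block["Geometry"]["BoundingBox"]
        page = block.get("Page", 1)

        if prev_box is not None:
            # Distance from the bottom of the previous line to the top of this one.
            gap = box["Top"] - (prev_box["Top"] + prev_box["Height"])
            if page != prev_page or gap > gap_ratio * prev_box["Height"]:
                paragraphs.append(" ".join(current))
                current = []

        current.append(block["Text"])
        prev_box, prev_page = box, page

    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Bounding-box values are ratios of the page size, so the threshold is expressed relative to line height and does not depend on page resolution. Multi-column pages produce negative gaps when the reading order jumps back to the top of the next column, which this sketch does not treat as a paragraph break.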

UPCOMING FEATURES:

  • Address abstract cases with the paragraph_constructor function.
  • Export data in different formats.
  • AWS CloudFormation template for a serverless architecture that executes the functions when a new object is uploaded to your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

Owner
Juan Anzola