ocr-table
This project extracts tables from scanned (image-only) PDFs using Optical Character Recognition (OCR).
Install Requirements
-  Tesseract OCR: sudo apt-get install tesseract-ocr
-  ImageMagick: sudo apt-get install imagemagick
-  PDF utilities (Poppler): sudo apt-get install poppler-utils
-  Python packages: sudo pip install -r requirements.txt
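Before running the OCR, it can help to confirm that the command-line tools are actually on your PATH. The snippet below is a minimal sketch and not part of this repository. The executable names are assumptions based on what the packages above usually install (tesseract, convert, pdftoppm); newer ImageMagick releases may ship magick instead of convert.

```python
import shutil

# Assumed executable names for the packages listed above:
# tesseract-ocr -> tesseract, imagemagick -> convert, poppler-utils -> pdftoppm.
REQUIRED_TOOLS = ("tesseract", "convert", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All command-line tools found.")
```

If anything is reported as missing, re-run the matching install command before continuing.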
Usage
-  Clear the pdf/ folder and copy the PDF files you want to scan into it.
-  Run the OCR: python3 shellocr.py 
-  The extracted text files will be available in the txt/ folder once the process completes.
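For orientation, the steps above can be sketched as a small Python pipeline: rasterize each PDF in pdf/ with pdftoppm, then run tesseract on every page image and write the text to txt/. This is an illustration under stated assumptions, not the contents of shellocr.py. The pages/ working folder, the 300 DPI setting, and the function names are all hypothetical.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, work_dir, dpi=300):
    """Build a pdftoppm command that writes one PNG per page of the PDF."""
    prefix = Path(work_dir) / Path(pdf_path).stem
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def ocr_command(image_path, txt_dir):
    """Build a tesseract command; tesseract appends .txt to the output base."""
    output_base = Path(txt_dir) / Path(image_path).stem
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf(pdf_path, work_dir="pages", txt_dir="txt", dpi=300):
    """Rasterize one PDF and OCR each resulting page image."""
    pdf_path = Path(pdf_path)
    work, out = Path(work_dir), Path(txt_dir)
    work.mkdir(parents=True, exist_ok=True)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf_path, work, dpi), check=True)
    # pdftoppm names its output <prefix>-<page number>.png
    for page in sorted(work.glob(f"{pdf_path.stem}-*.png")):
        subprocess.run(ocr_command(page, out), check=True)


if __name__ == "__main__":
    # Process every PDF found in the pdf/ folder (assumed layout).
    for pdf in sorted(Path("pdf").glob("*.pdf")):
        ocr_pdf(pdf)
```

Each page ends up as its own text file (for example txt/report-1.txt for the first page of a hypothetical pdf/report.pdf), so a multi-page PDF produces several output files.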
Alternate Method
-  If the above doesn't work for you, try the alternate method. 
-  Save your file as input.pdf in the root directory. 
-  Run the alternate script: python3 pdf_miner.py
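The script name suggests that the alternate method reads the PDF's embedded text layer rather than running OCR. A minimal sketch of that idea follows, assuming the pdfminer.six package is installed. It is not the contents of pdf_miner.py, and the output.txt file name is hypothetical. Note that a purely scanned PDF with no text layer will yield little or no text this way.

```python
from pathlib import Path


def extract_text_layer(pdf_path="input.pdf", out_path="output.txt"):
    """Extract embedded text from a PDF and save it to a text file."""
    # Imported here so the module loads even if pdfminer.six is absent.
    from pdfminer.high_level import extract_text  # assumes pdfminer.six

    text = extract_text(pdf_path)
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

Calling extract_text_layer() with no arguments reads input.pdf from the current directory, matching the step above.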