A command line tool written in python that reads a pdf/zip file and outputs a text file using tesseract OCR engine. Given an appropriate alias you can run Input and output OCR samples are available at ...
The goal is to be able to quickly extract all the available information in the document to a python dictionay. The dictionay can then be stored in a database or a csv file (for a later Machine ...