From heavy formatting to clean files
Every document review and classification project starts with a document, and we all know what real-world documents look like: badly scanned, rotated, tilted or shaded, heavily formatted, with a rich layout. Thanks to our unique Vision AI capabilities, our technology can ingest and process files in “less than ideal conditions” and in several formats, including PDFs and Word documents. We support both digital and scanned documents.
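The Vision AI pipeline itself is not described here. As a rough illustration of one classical cleanup step for noisy scans, the sketch below binarizes a grayscale page with Otsu's method, which separates ink from paper by picking the threshold that maximizes the between-class variance. It assumes 8-bit pixel values (0 to 255) stored as nested lists; `otsu_threshold` and `binarize` are hypothetical names, and this is not the product's actual method.

```python
from typing import List, Optional, Sequence

def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the 8-bit threshold maximizing between-class variance (Otsu's method).

    Assumes every pixel is an int in 0..255 (hypothetical helper, not a product API).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0.0    # intensity sum of the "dark" class (values <= t)
    weight_bg = 0   # pixel count of the "dark" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue            # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break               # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:  # strict ">" keeps the first maximum
            best_var, best_t = between, t
    return best_t

def binarize(image: List[List[int]], threshold: Optional[int] = None) -> List[List[int]]:
    """Map pixels <= threshold to 0 (ink) and the rest to 255 (paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat) if threshold is None else threshold
    return [[0 if p <= t else 255 for p in row] for row in image]
```

For a tiny "page" such as `[[30, 220, 210], [40, 25, 235]]`, the dark and light pixels split cleanly and the result is `[[0, 255, 255], [0, 0, 255]]`. A production pipeline would typically add deskewing, denoising and layout analysis on top of a global threshold like this one.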
From images to text
Once the file is “cleaned”, our state-of-the-art OCR technology converts images of typed, handwritten or printed text into machine-readable text. Our OCR system is based on Tesseract 4, one of the most accurate open-source OCR engines available.
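How the product wires up Tesseract is not shown. A minimal sketch, assuming the stock `tesseract` command-line tool is installed and on `PATH`: run it with the `tsv` output config, then group the word-level rows (level 5) into text lines in reading order and drop words below a confidence cutoff. The function names and the 60 cutoff are illustrative assumptions, not part of the system described above.

```python
import csv
import io
import subprocess
from typing import Dict, List, Tuple

def run_tesseract_tsv(image_path: str, lang: str = "eng") -> str:
    """Run the tesseract CLI and return its TSV output (requires tesseract on PATH)."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang, "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

def tsv_to_lines(tsv: str, min_conf: float = 60.0) -> List[str]:
    """Group word rows of Tesseract TSV output into lines, skipping low-confidence words.

    min_conf=60.0 is an arbitrary illustrative cutoff.
    """
    reader = csv.DictReader(io.StringIO(tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
    lines: Dict[Tuple[int, int, int, int], List[str]] = {}
    for row in reader:
        if row["level"] != "5":  # level 5 rows are individual words
            continue
        text = (row.get("text") or "").strip()
        if not text or float(row["conf"]) < min_conf:
            continue
        # page/block/paragraph/line numbers identify the line a word belongs to
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        lines.setdefault(key, []).append(text)
    # Sorting the keys yields reading order: page, block, paragraph, line
    return [" ".join(words) for _, words in sorted(lines.items())]
```

For example, `tsv_to_lines(run_tesseract_tsv("page.png"))` would return the recognized lines of a hypothetical `page.png`. Keeping the per-word confidence scores, rather than asking Tesseract for plain text, lets later stages flag or re-process doubtful regions.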
The next step is converting text into meaning, that is, understanding any legal document and extracting its key points and obligations.
A job for our NLP engine.