Page-Based Data Extraction
Humans read and understand documents based on the data and structure of words on a page. A computer “reads” using Optical Character Recognition (OCR) to produce a sequence of characters. OCR is a mechanical process that does not recognize the meaning of the data. Creating a synthetic understanding of data is a difficult task, and we’ve done it.


Use Features and Data to Create Information and Understanding
Your documents and data come from many places: file stores, email, SharePoint, Google, information systems, repositories, and more. Organize and extract your data with Intellese.
Identify pages
What is a page?
- A physical document page
- Scanned images
- An electronic document or file
- Structured / Unstructured
- Changing / Inconsistent
Identify Features
What is a Feature?
- Context
- Language
- Format
- Specific Data Element
- Content tagging attribute
- Named entities
- N-grams
- Structure elements
- Visual elements
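As an illustration of one feature type from the list above, the following minimal sketch counts word n-grams in page text. It is not the Intellese implementation; the function name `word_ngrams`, the whitespace tokenization, and the sample text are assumptions made for the example.

```python
from collections import Counter

def word_ngrams(text, n=2):
    """Count word n-grams in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    # Slide a window of n tokens across the page text.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical OCR output for one page.
page_text = "Invoice Number 1042 Invoice Date 2021-03-01"
features = word_ngrams(page_text, n=2)
```

N-grams such as ("invoice", "number") can then serve as page features alongside format, language, and layout cues.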
Curate Data
How?
Use features to make data models that produce synthetic understanding of data.
Stream valuable data into workflows and downstream software applications.

Intelligent Document Processing
The Need: Curate consistent information and knowledge from any document source.
The Problem: Variations in source data and layout make data integration from documents challenging.
The Solution: Augment human understanding of unstructured data using data science tools in each stage of document data integration.
Our Framework: Grooper data model engineering is based on page, paragraph, section, phrase, and text classification. Features are formed into data models for Grooper processing engines.
Data Science Workbench
The working environment where decisions get made. Relationships between internal and external data elements are formed to engineer understanding and context.
Natural Language Processing
Allows data collection from free-form documents in which data can exist anywhere on a page.
Table Extraction
Enables collection of full rows of data by using fuzzy matching on individual columns.
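One way to picture fuzzy matching on columns is mapping misread column headers to the names a table is expected to have, then collecting each row under those names. The sketch below uses Python's standard `difflib`; the column names, the 0.7 threshold, and the helper functions are assumptions for illustration, not the product's actual method.

```python
from difflib import SequenceMatcher

# Hypothetical column names expected in an invoice table.
EXPECTED_COLUMNS = ["description", "quantity", "unit price", "amount"]

def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_columns(ocr_headers, expected=EXPECTED_COLUMNS, threshold=0.7):
    """Map each OCR'd header position to its closest expected column name."""
    mapping = {}
    for idx, header in enumerate(ocr_headers):
        best = max(expected, key=lambda name: similarity(header, name))
        if similarity(header, best) >= threshold:
            mapping[idx] = best
    return mapping

def extract_rows(ocr_headers, ocr_rows):
    """Return each row as a dict keyed by the matched column names."""
    mapping = map_columns(ocr_headers)
    return [{name: row[idx] for idx, name in mapping.items()} for row in ocr_rows]

rows = extract_rows(
    ["Descripton", "Quantlty", "Unit Prlce", "Amounl"],  # headers with OCR misreads
    [["Widget", "2", "3.50", "7.00"]],
)
```

Headers that fall below the threshold are left unmapped rather than guessed, which keeps the matching decision easy to inspect.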
Classification
Lexical, rules-based, and visual classification options for transparent, trainable document classification.
Signature Extraction
Determines the presence or absence of a signature with great precision by dropping out lines and other elements near the signature.
Layered OCR
Many documents contain varying fonts, unaligned text, and handwriting. Collect more data with higher accuracy without the limitations of traditional OCR.
Fuzzy Regular Expression
Matches data correctly despite OCR misreads using transparent weighting algorithms.
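The idea of weighting can be sketched with a weighted edit distance in which character swaps that OCR commonly makes (such as 0 for O) cost less than arbitrary substitutions. This is a simplified stand-in that compares against a literal expected value rather than a full regular expression; the confusion table, the 0.25 cost, and the function names are assumptions, not the product's algorithm.

```python
# Hypothetical table of character pairs that OCR engines often confuse.
CONFUSABLE = {frozenset("0O"), frozenset("1l"), frozenset("1I"),
              frozenset("5S"), frozenset("8B")}

def sub_cost(a, b):
    """Cost of reading character b where a was expected."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return 0.25  # likely OCR misread, penalized lightly
    return 1.0

def weighted_distance(expected, observed):
    """Edit distance with cheap substitutions for OCR-confusable characters."""
    prev = [float(j) for j in range(len(observed) + 1)]
    for i, a in enumerate(expected, start=1):
        curr = [float(i)]
        for j, b in enumerate(observed, start=1):
            curr.append(min(prev[j] + 1.0,                 # deletion
                            curr[j - 1] + 1.0,             # insertion
                            prev[j - 1] + sub_cost(a, b)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(expected, observed, max_cost=1.0):
    """True when observed is within max_cost of the expected value."""
    return weighted_distance(expected, observed) <= max_cost
```

Because every cost is an explicit entry in a table, a reviewer can see exactly why "INV-l0O5" is accepted as "INV-1005" while "INV-2995" is not.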
Industry Specific Lexicons
Smart lookups match fields containing known values.
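A lexicon lookup can be sketched as snapping an extracted value to the closest entry in a list of known values. The example below uses `difflib.get_close_matches` from the Python standard library; the lexicon contents, the 0.8 cutoff, and the `lookup` helper are assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical lexicon of known field values.
STATE_LEXICON = ["Oklahoma", "Texas", "Kansas", "Arkansas"]

def lookup(value, lexicon, cutoff=0.8):
    """Return the closest lexicon entry to value, or None if nothing is close enough."""
    by_lower = {entry.lower(): entry for entry in lexicon}
    hits = get_close_matches(value.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None

corrected = lookup("0klahoma", STATE_LEXICON)  # OCR read a zero for the letter O
```

Values with no sufficiently close entry return None, so they can be routed to review instead of being silently replaced.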
Image Processing
Use over 70 built-in image processing commands to create two document images: one for high-accuracy OCR and the other for pristine archival.
Multi-Industry Support
Identify, Extract, & Group any Document Type
Use machine learning and rules-based logic to organize the chaos of structured and unstructured documents.
And you don’t have to be a data scientist to use the Intellese Analyzer. Discover how to use Artificial Intelligence Data Logic with transparent features that you train and control.
