Extract Data

Page Based Data Extraction

Humans read and understand documents based on the data and structure of words on a page. A computer “reads” using Optical Character Recognition (OCR) to produce a sequence of characters. OCR is a mechanical process that doesn’t recognize the meaning of data. Creating a synthetic understanding of data is a difficult task, and we’ve done it.

Use Features and Data to create Information and Understanding

Your documents and data come from many places: file stores, email, SharePoint, Google, information systems, repositories, and more. Organize and extract your data with Intellese.


Identify pages

What is a page?

  • A physical Document page
  • Scanned images
  • An electronic document or file
  • Structured / Unstructured
  • Changing / Inconsistent

Identify Features

What is a Feature?

  • Context
  • Language
  • Format
  • Specific Data Element
  • Content tagging attribute
  • Named entities
  • N-grams
  • Structure elements
  • Visual elements
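
To make one of the features above concrete, word n-grams can be counted from OCR text in a few lines of Python. This is a minimal sketch and not Intellese or Grooper code; the helper name `word_ngrams` and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter


def word_ngrams(text, n=2):
    """Count word n-grams in text (hypothetical helper, not a product API)."""
    # Lowercase and keep only alphanumeric tokens so stray OCR punctuation is ignored.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of n tokens across the token sequence.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Made-up sample text standing in for one OCR'd page.
sample = "Invoice Number 1001. Invoice Date 2020-01-15. Invoice Number 1002."
bigrams = word_ngrams(sample, n=2)
```

Phrases that repeat across a page, such as "invoice number" here, get higher counts, which is one simple way an n-gram can act as a feature for recognizing a page type.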

Curate Data


Use features to make data models that produce synthetic understanding of data.

Stream valuable data into workflows and downstream software applications.

Intelligent Document Processing

The Need: Curate consistent information and knowledge from any document source.

The Problem: Variations in source data and layout make data integration from documents challenging.

The Solution: Augment human understanding of unstructured data using data science tools in each stage of document data integration.

Our Framework: Grooper data model engineering is based on page, paragraph, section, phrase, and text classification. Features are formed into data models for Grooper processing engines.

Data Science Workbench

The working environment where decisions get made. Relationships between internal and external data elements are formed to engineer understanding and context.


Natural Language Processing

Allows data collection from free-form documents in which data can exist anywhere on a page.

Table Extraction

Enables collection of full rows of data by utilizing fuzzy matching on individual columns.
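
The idea of fuzzy matching on individual columns can be sketched with Python's standard `difflib` module. This illustration assumes the table has already been split into cells; the function names, the 0.7 threshold, and the sample table are hypothetical and do not describe how Grooper implements table extraction.

```python
from difflib import SequenceMatcher


def best_column(header_cell, expected, threshold=0.7):
    """Map an OCR'd header cell to the closest expected column name, or None."""
    scored = [
        (SequenceMatcher(None, header_cell.lower(), name.lower()).ratio(), name)
        for name in expected
    ]
    score, name = max(scored)
    return name if score >= threshold else None


def extract_rows(table, expected):
    """Return each data row as a dict keyed by the matched column names."""
    header, *body = table
    columns = [best_column(cell, expected) for cell in header]
    rows = []
    for cells in body:
        # Skip cells whose header could not be matched to an expected column.
        rows.append({col: cell for col, cell in zip(columns, cells) if col is not None})
    return rows


# Made-up table whose header row contains typical OCR misreads.
ocr_table = [
    ["Descripton", "Quantlty", "Unit Prlce"],
    ["Widget", "4", "2.50"],
    ["Gasket", "10", "0.75"],
]
rows = extract_rows(ocr_table, ["Description", "Quantity", "Unit Price"])
```

Because each header is matched by similarity rather than exact text, the misread headers still resolve to the intended columns and every row is collected in full.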



Lexical, rules-based, and visual classification options for transparent, trainable document classification.


Signature Extractions

Determines the presence or absence of a signature with great precision by dropping out lines and other elements near the signature.
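
A toy version of this idea, shown on a tiny binary pixel grid (1 = ink, 0 = blank), might look like the sketch below. It only drops long horizontal lines before measuring the remaining ink; the function names and thresholds are assumptions for illustration, and a real signature detector works on full images with far more robust cleanup.

```python
def drop_horizontal_lines(grid, min_run=0.8):
    """Blank any row whose ink covers at least min_run of its width."""
    cleaned = []
    for row in grid:
        if sum(row) >= min_run * len(row):
            # Treat the row as a printed signature line and remove it.
            cleaned.append([0] * len(row))
        else:
            cleaned.append(list(row))
    return cleaned


def has_signature(grid, min_ink=0.05):
    """Report whether enough ink remains after printed lines are dropped."""
    cleaned = drop_horizontal_lines(grid)
    total = sum(len(row) for row in cleaned)
    ink = sum(sum(row) for row in cleaned)
    return total > 0 and ink / total >= min_ink


line = [1] * 10
blank = [0] * 10
# Made-up signature boxes: one empty, one with pen strokes above the line.
unsigned_box = [blank, blank, blank, line]
signed_box = [
    blank,
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 1, 0, 0, 0],
    line,
]
```

Without the line-dropping step, the printed signature line alone would count as ink and an empty box could be mistaken for a signed one.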


Layered OCR

Many documents contain varying fonts, unaligned text, and handwriting. Collect more data with higher accuracy without the limitations of traditional OCR.

Fuzzy Regular Expression

Matches data correctly despite OCR misreads using transparent weighting algorithms.
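
One way to picture weighted matching is an edit distance in which characters that OCR commonly confuses cost less to substitute than unrelated characters. The sketch below is a plain weighted Levenshtein distance, not a regular expression engine and not Grooper's algorithm; the confusion table and its weights are assumptions chosen for illustration.

```python
# Hypothetical weights: pairs that OCR engines often confuse cost less than 1.0.
CONFUSION_COST = {
    frozenset("0O"): 0.2,
    frozenset("1l"): 0.2,
    frozenset("1I"): 0.2,
    frozenset("5S"): 0.3,
    frozenset("8B"): 0.3,
}


def sub_cost(a, b):
    """Cost of reading character a as character b."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_distance(expected, observed):
    """Weighted Levenshtein distance between the expected and observed text."""
    prev = [float(j) for j in range(len(observed) + 1)]
    for i, ca in enumerate(expected, start=1):
        curr = [float(i)]
        for j, cb in enumerate(observed, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # character missing from observed
                curr[j - 1] + 1.0,               # extra character in observed
                prev[j - 1] + sub_cost(ca, cb),  # match or weighted substitution
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(expected, observed, max_cost=1.0):
    """Accept the observed text if its weighted distance is within max_cost."""
    return weighted_distance(expected, observed) <= max_cost
```

With these weights, `fuzzy_match("INV-1005", "INV-lOO5")` accepts the value because all three differences are common OCR confusions, while `fuzzy_match("INV-1005", "INV-7745")` rejects it. Keeping the weights in a visible table is what makes the scoring transparent: anyone can see why a match was accepted.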

Industry Specific Lexicons

Smart lookups match fields containing known values.
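
A lookup against a list of known values can be sketched with `difflib.get_close_matches` from the Python standard library. The lexicon contents, the `lookup` helper, and the 0.75 cutoff are assumptions for illustration only, not part of any Intellese or Grooper lexicon.

```python
from difflib import get_close_matches

# Hypothetical lexicon of known values for a "state" field.
STATE_LEXICON = ["Oklahoma", "Ohio", "Oregon", "Texas", "Kansas"]


def lookup(value, lexicon, cutoff=0.75):
    """Return the closest lexicon entry for an OCR'd value, or None."""
    matches = get_close_matches(value, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, `lookup("Oklahorna", STATE_LEXICON)` resolves the misread value to "Oklahoma", while a value with no close entry, such as "Florida", returns `None` so it can be routed for review instead of being guessed.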

Image Processing

Use over 70 built-in image processing commands to create two document images: one for high-accuracy OCR and the other for pristine archival.

Multi-Industry Support

Identify, Extract, & Group any Document Type

Use machine learning and rules-based logic to organize the chaos of structured and unstructured documents.

And you don’t have to be a data scientist to use the Intellese Analyzer. Discover how to use Artificial Intelligence Data Logic with transparent features that you train and control.