Schema for representing OCR results exported from FineReader 10.0 SDK. Copyright 2001-2011 ABBYY, Inc.
Global document data
Paragraph formatting styles collection
Paragraph formatting style
Document sections collection
Section
Recognized page
Recognized block
Page Section
Running titles and artefacts
If true, all coordinates are relative to the original image as it was before opening; otherwise they are relative to the opened (deskewed) image
A page section is a sequence of page streams
A page stream is a sequence of page elements
Text
Table
Barcode
Picture
Table captions
Table cells
Picture captions
A text stream is a sequence of paragraphs and/or blocks
Id of page element
Block region: a set of rectangles
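Because a block region is a set of rectangles rather than a single box, consumers often need one enclosing rectangle. The following is a minimal sketch of that reduction. The Rect type and its field names (left, top, right, bottom) are assumptions made for illustration and are not taken from the schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    # Hypothetical field names; the schema's own attribute names may differ.
    left: int
    top: int
    right: int
    bottom: int


def bounding_box(region: Iterable[Rect]) -> Rect:
    """Return the smallest rectangle that encloses every rectangle in the region."""
    rects = list(region)
    if not rects:
        raise ValueError("region must contain at least one rectangle")
    return Rect(
        left=min(r.left for r in rects),
        top=min(r.top for r in rects),
        right=max(r.right for r in rects),
        bottom=max(r.bottom for r in rects),
    )


# An L-shaped region made of two rectangles.
region = [Rect(10, 20, 50, 40), Rect(30, 40, 90, 80)]
box = bounding_box(region)
```

The enclosing rectangle generally covers more area than the region itself, so it is suitable for cropping or display but not for exact hit testing.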
Recognized block text; present if the blockType attribute is Text
The set of table rows; present if the blockType attribute is Table
Separators box block; present if the blockType attribute is SeparatorsBox
Separator block; present if the blockType attribute is Separator
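The content of a block depends on its blockType attribute, so a reader can select the expected child by that value. The sketch below illustrates the idea with Python's standard xml.etree.ElementTree. Only the blockType attribute and the values Text, Table, SeparatorsBox and Separator come from the descriptions above. The element names (page, block, text, row, separatorsBox, separator), the sample document, and the omission of an XML namespace are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical child element names, keyed by the blockType values named above.
EXPECTED_CHILD = {
    "Text": "text",
    "Table": "row",
    "SeparatorsBox": "separatorsBox",
    "Separator": "separator",
}


def block_content(block):
    """Return the child elements that carry the block's content.

    Blocks whose blockType is not in EXPECTED_CHILD yield an empty list.
    """
    child_name = EXPECTED_CHILD.get(block.get("blockType"))
    if child_name is None:
        return []
    return block.findall(child_name)


# Invented sample; "Other" stands in for any block type not handled here.
SAMPLE = """
<page>
  <block blockType="Text"><text/></block>
  <block blockType="Table"><row/><row/></block>
  <block blockType="Other"/>
</page>
"""

page = ET.fromstring(SAMPLE)
counts = [(b.get("blockType"), len(block_content(b))) for b in page.findall("block")]
```

A real export would normally declare an XML namespace, in which case each name passed to findall would need the namespace prefix.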
Text paragraph
Table cell
Cell text
Text paragraph line
Group of characters with uniform formatting
Character attributes alternate with word recognition variants. The recognition variants of a word are written before the word itself
Attributes of single character
Variants of recognition of the next word
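Since the recognition variants of a word are written before the word's characters, a reader has to hold the variants until the following word is complete. The sketch below shows one way to pair them. The element names (formatting, charParams, wordRecVariants, variantText), the sample fragment, and the rule that whitespace ends a word are all assumptions for illustration, not definitions taken from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names.
CHAR_TAG = "charParams"
VARIANTS_TAG = "wordRecVariants"
VARIANT_TEXT_TAG = "variantText"


def words_with_variants(run):
    """Pair each word in a formatting run with the variants written before it."""
    pairs = []
    word = []
    pending = []  # variants waiting for the word that follows them

    def flush():
        nonlocal word, pending
        if word:
            pairs.append(("".join(word), pending))
            word, pending = [], []

    for child in run:
        if child.tag == VARIANTS_TAG:
            flush()
            pending = [v.text or "" for v in child.iter(VARIANT_TEXT_TAG)]
        elif child.tag == CHAR_TAG:
            ch = child.text or ""
            if ch.strip() == "":
                flush()  # assumed rule: whitespace ends the current word
            else:
                word.append(ch)
    flush()
    return pairs


# Invented sample: variants for "Hi" precede its characters; "you" has none.
SAMPLE = (
    "<formatting>"
    "<wordRecVariants>"
    "<wordRecVariant><variantText>Hi</variantText></wordRecVariant>"
    "<wordRecVariant><variantText>Hl</variantText></wordRecVariant>"
    "</wordRecVariants>"
    "<charParams>H</charParams><charParams>i</charParams>"
    "<charParams> </charParams>"
    "<charParams>y</charParams><charParams>o</charParams><charParams>u</charParams>"
    "</formatting>"
)

pairs = words_with_variants(ET.fromstring(SAMPLE))
```

Words that are not preceded by a variants element are paired with an empty list, which keeps the result uniform for callers.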
Starting point of the separator
Ending point of the separator
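A separator is described by its starting and ending points, which is enough to derive its length and orientation. The sketch below assumes hypothetical element names (separator, start, end) and attribute names (x, y); the schema's actual names may differ.

```python
import math
import xml.etree.ElementTree as ET


def separator_geometry(separator):
    """Return (length, orientation) computed from the start and end points."""
    start = separator.find("start")  # hypothetical element name
    end = separator.find("end")      # hypothetical element name
    dx = int(end.get("x")) - int(start.get("x"))
    dy = int(end.get("y")) - int(start.get("y"))
    length = math.hypot(dx, dy)
    if dy == 0:
        orientation = "horizontal"
    elif dx == 0:
        orientation = "vertical"
    else:
        orientation = "oblique"
    return length, orientation


# Invented sample: a horizontal separator 100 units long.
SAMPLE = '<separator><start x="10" y="20"/><end x="110" y="20"/></separator>'
length, orientation = separator_geometry(ET.fromstring(SAMPLE))
```

On a page that has not been deskewed, nominally straight separators may come out as oblique, so a small angular tolerance may be preferable to the exact comparisons used here.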