Page Analysis and Ground Truth Elements

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

Page Analysis and Ground Truth Elements (PAGE) is an XML standard for encoding digitised documents. Comparable to ALTO (XML), it allows the organisation and structure of a page and its contents to be described.

PAGE XML can be used to describe:

  • page content (regions, lines of text, words, glyphs, reading order, text content, ...)
  • the evaluation of the layout analysis (evaluation profiles, evaluation results, ...)
  • the cutting of the document image (cutting grids)
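
As a rough illustration of the page content item in the list above, the following Python sketch reads a small PAGE XML fragment using only the standard library. The sample document, its file name and identifiers, and the helper names parse_points and read_lines are hypothetical. The namespace is assumed to be the 2019-07-15 version of the page content schema; real files may declare a different version and usually contain many more elements (metadata, words, glyphs, and so on).

```python
import xml.etree.ElementTree as ET

# Assumed namespace: one published version of the PAGE page content schema.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical, heavily trimmed sample document (not taken from a real export).
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_0001.png" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r1"/>
        <RegionRefIndexed index="1" regionRef="r2"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r2">
      <Coords points="100,700 900,700 900,800 100,800"/>
      <TextLine id="r2l1">
        <Coords points="100,700 900,700 900,800 100,800"/>
        <TextEquiv><Unicode>Second region</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,200 100,200"/>
      <TextLine id="r1l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>First region</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a Coords 'points' string ("x,y x,y ...") into (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def read_lines(xml_text):
    """Return (region id, line id, text, polygon) tuples in reading order."""
    page = ET.fromstring(xml_text).find("pc:Page", NS)
    regions = {r.get("id"): r for r in page.findall("pc:TextRegion", NS)}

    # Regions may appear in any order in the file; the ReadingOrder
    # element lists them by index, so sort the references by that index.
    refs = page.findall("pc:ReadingOrder/pc:OrderedGroup/pc:RegionRefIndexed", NS)
    refs.sort(key=lambda ref: int(ref.get("index")))

    result = []
    for ref in refs:
        region = regions[ref.get("regionRef")]
        for line in region.findall("pc:TextLine", NS):
            text = line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
            polygon = parse_points(line.find("pc:Coords", NS).get("points"))
            result.append((region.get("id"), line.get("id"), text, polygon))
    return result


for region_id, line_id, text, polygon in read_lines(SAMPLE):
    print(region_id, line_id, text, polygon[0])
```

In this sketch the second region is deliberately stored before the first, to show that the reading order is recorded separately from the order of the elements in the file. Each region and line outline is a polygon given as a list of pixel coordinates.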

The format is developed by the Pattern Recognition & Image Analysis Lab (PRImA) at the University of Salford in Greater Manchester.

It was designed for use with automatic segmentation and transcription techniques (OCR and HTR): PAGE aims to support each step of the document image analysis pipeline, from image enhancement through layout analysis to OCR.

The PAGE XML schema is used as an import and export format by automatic transcription software such as eScriptorium and Transkribus. It is also an export format of Kraken, a turnkey OCR system optimised for historical documents and non-Latin scripts.

