Text region extraction in a document image based on the Delaunay tessellation

Yi Xiao, Hong Yan

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

In this paper, Delaunay triangulation is applied to the extraction of text areas in a document image. By representing the location of each connected component in a document image by its centroid, the page structure is described as a set of points in two-dimensional space. When Delaunay triangulation is imposed on these points, the triangles in text regions show features that distinguish them from those in image and drawing regions. For analysis, the Delaunay triangles are divided into four classes. The study reveals that specific triangles in text areas can be clustered together and identified as text body. Using this method, text regions in a document image containing fragments can also be recognized accurately. Experiments show the method is also very efficient. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
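The first two steps described in the abstract (reducing connected components to centroids, then triangulating those points) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the triangulation is a brute-force empty-circumcircle test that only suits small point sets in general position (no four points on one circle), and the longest-side measure is just one plausible triangle feature. The abstract does not define the paper's four triangle classes, so they are not reproduced here.

```python
from collections import deque
from itertools import combinations
from math import dist


def component_centroids(grid):
    """Return (x, y) centroids of 4-connected foreground components in a binary grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects the pixels of one component.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((x, y))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n))
    return centroids


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of counter-clockwise triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = [(p[0] - d[0], p[1] - d[1]) for p in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 1e-9


def delaunay(points):
    """Brute-force Delaunay triangulation: keep triples whose circumcircle is empty."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        o = orient(a, b, c)
        if abs(o) < 1e-9:
            continue  # collinear points do not form a triangle
        if o < 0:
            b, c = c, b  # enforce counter-clockwise order for the determinant test
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((i, j, k))
    return triangles


def longest_side(points, triangle):
    """Length of the longest edge of a triangle given as three point indices."""
    return max(dist(points[i], points[j]) for i, j in combinations(triangle, 2))


# Usage: centroids from a tiny binary image, then a triangulation of sample points.
page = [[1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1]]
print(component_centroids(page))

sample = [(0, 0), (4, 0), (4, 3), (0, 2)]
for tri in delaunay(sample):
    print(tri, round(longest_side(sample, tri), 3))
```

The brute-force test runs in roughly O(n^4) time, so a practical system would substitute an O(n log n) triangulation routine. The sketch only serves to show how closely spaced centroids, as expected in text lines, produce triangles with short sides that could then be grouped.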
Original language: English
Pages (from-to): 799-809
Journal: Pattern Recognition
Volume: 36
Issue number: 3
Publication status: Published - Mar 2003

Research Keywords

  • Delaunay triangulation
  • Document image analysis
  • Page segmentation
