Text region extraction in a document image based on the Delaunay tessellation

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

46 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 799-809
Journal / Publication: Pattern Recognition
Volume: 36
Issue number: 3
Publication status: Published - Mar 2003

Abstract

In this paper, Delaunay triangulation is applied for the extraction of text areas in a document image. By representing the location of connected components in a document image with their centroids, the page structure is described as a set of points in two-dimensional space. When imposing Delaunay triangulation on these points, the text regions in the Delaunay triangulation will have distinguishing triangular features from image and drawing regions. For analysis, the Delaunay triangles are divided into four classes. The study reveals that specific triangles in text areas can be clustered together and identified as text body. Using this method, text regions in a document image containing fragments can also be recognized accurately. Experiments show the method is also very efficient. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
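
The first two steps named in the abstract (reducing connected components to centroids, then triangulating those points) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `component_centroids`, `delaunay`, and `longest_side` are hypothetical, the page is assumed to be a binary grid with 8-connectivity, the triangulation uses a plain Bowyer-Watson insertion with a finite super-triangle, and the longest-side measure is just one plausible triangle feature. The abstract does not define the paper's four triangle classes, so they are not reproduced here.

```python
from collections import deque
from itertools import combinations
from math import dist


def component_centroids(image):
    """Centroid (x, y) of every 8-connected foreground component."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((x, y))
                for ny in (y - 1, y, y + 1):
                    for nx in (x - 1, x, x + 1):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            count = len(pixels)
            centroids.append((sum(x for x, _ in pixels) / count,
                              sum(y for _, y in pixels) / count))
    return centroids


def in_circumcircle(pts, tri, p):
    """True if p lies strictly inside the circumcircle of triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = (pts[i] for i in tri)
    orient = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    adx, ady = ax - p[0], ay - p[1]
    bdx, bdy = bx - p[0], by - p[1]
    cdx, cdy = cx - p[0], cy - p[1]
    ad2 = adx * adx + ady * ady
    bd2 = bdx * bdx + bdy * bdy
    cd2 = cdx * cdx + cdy * cdy
    det = (adx * (bdy * cd2 - cdy * bd2)
           - ady * (bdx * cd2 - cdx * bd2)
           + ad2 * (bdx * cdy - cdx * bdy))
    # Multiplying by the orientation makes the test independent of vertex order.
    return det * orient > 0


def delaunay(points):
    """Bowyer-Watson triangulation; returns sorted index triples."""
    pts = list(points)
    n = len(pts)
    if n < 3:
        return []
    xs, ys = zip(*pts)
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1.0)
    mx, my = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    # Assumption: a super-triangle 20 spans wide is "far enough" for small inputs.
    pts += [(mx - 20 * span, my - span), (mx, my + 20 * span),
            (mx + 20 * span, my - span)]
    triangles = {(n, n + 1, n + 2)}
    for i in range(n):
        bad = {t for t in triangles if in_circumcircle(pts, t, pts[i])}
        edge_count = {}
        for t in bad:
            for edge in combinations(t, 2):
                edge_count[edge] = edge_count.get(edge, 0) + 1
        triangles -= bad
        # Edges used by exactly one bad triangle bound the cavity around point i.
        for (a, b), count in edge_count.items():
            if count == 1:
                triangles.add(tuple(sorted((a, b, i))))
    # Drop every triangle that still touches a super-triangle vertex.
    return sorted(t for t in triangles if max(t) < n)


def longest_side(points, tri):
    """Length of the longest edge of a triangle given as an index triple."""
    return max(dist(points[a], points[b]) for a, b in combinations(tri, 2))
```

Calling `component_centroids` on a binarized page and passing the result to `delaunay` yields the point set and triangles that the abstract's analysis starts from. The finite super-triangle and floating-point circumcircle test are simplifications; for real page images a library routine such as `scipy.spatial.Delaunay` would likely be more robust.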

Research Area(s)

  • Delaunay triangulation, Document image analysis, Page segmentation