基于分层密度特征的文档图像检索

Document image retrieval based on multi-density features

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > Not applicable > peer-review

1 Scopus Citations

Author(s)

胡芝兰; 林行刚; 严洪

Detail(s)

Original language: Chinese (Simplified)
Pages (from-to): 1231-1234
Journal / Publication: Qinghua Daxue Xuebao/Journal of Tsinghua University
Volume: 46
Issue number: 7
Publication status: Published - Jul 2006

Abstract

为克服基于版面重建的文档图像检索方法对图像质量要求高,且局限于部分文种,以及基于版面分割的文档图像检索方法受限于版面分割技术等问题,提出了一种基于二值文档图像分层密度特征的检索方法。该方法通过倾斜校正、去除黑边等预处理得到有效文本区域,提取有效文本区域的长宽比和分层密度特征,通过特征比对进行检索。实验表明:该方法对不同分辨率以及不同的输入设备具有自适应能力,对复杂版面和批注等噪声鲁棒性好,漏检率为2%,是一种简单有效的文档图像检索方法。
The growth of document image databases poses new challenges for document image retrieval techniques. Traditional layout reconstruction-based methods rely on high-quality document images and can handle only a few widely used languages, while layout segmentation-based approaches are limited by the difficulty of analyzing complex document layouts. This paper describes a multi-density feature-based retrieval algorithm for binary document images that does not depend on optical character recognition (OCR) or layout analysis. The effective text area is obtained by preprocessing that includes skew correction and removal of black margins. The aspect ratio and multi-density features of the text area are then extracted and compared against those of the images in the database to select the best candidates. Experimental results show that the approach is simple and effective, with a miss rate of no more than 2%, and that it adapts to images of different resolutions and from different input devices. The method is also robust to complex layouts and to noise such as handwritten annotations.
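The abstract does not say how the multi-density features are computed, so the following is only an illustrative sketch under stated assumptions rather than the authors' algorithm. It assumes that "multi-density" means black-pixel densities measured on successively finer grids (1x1, 2x2, 4x4) over the bounding box of the text area, and that feature comparison is a plain L1 distance. The function names (`text_bounding_box`, `multi_density_features`, `retrieve`) and the `levels` parameter are hypothetical, and skew correction and margin removal are assumed to have been applied already.

```python
from typing import Dict, List, Sequence, Tuple

# Binary image as rows of pixels: 1 = black (ink), 0 = white.
Image = Sequence[Sequence[int]]


def text_bounding_box(img: Image) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the ink pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def multi_density_features(img: Image, levels: int = 3) -> List[float]:
    """Aspect ratio followed by ink densities on 1x1, 2x2, 4x4, ... grids (assumed scheme)."""
    top, bottom, left, right = text_bounding_box(img)
    height, width = bottom - top, right - left
    features = [width / height]  # aspect ratio of the text area
    for level in range(levels):
        n = 2 ** level  # the grid at this level has n x n cells
        for i in range(n):
            r0, r1 = top + i * height // n, top + (i + 1) * height // n
            for j in range(n):
                c0, c1 = left + j * width // n, left + (j + 1) * width // n
                area = (r1 - r0) * (c1 - c0)
                ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
                # Density is a ratio, so it does not depend on scan resolution.
                features.append(ink / area if area else 0.0)
    return features


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two feature vectors (an assumed matching rule)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query: Image, database: Dict[str, Image], top_k: int = 1) -> List[str]:
    """Return the names of the top_k database images closest to the query."""
    q = multi_density_features(query)
    ranked = sorted(
        database,
        key=lambda name: distance(q, multi_density_features(database[name])),
    )
    return ranked[:top_k]
```

Because every feature in this sketch is a ratio, a page scanned at twice the resolution yields the same feature vector in the idealized case where each pixel is simply duplicated. That is one plausible reading of the resolution adaptivity reported in the abstract, not a confirmed explanation of it.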

Research Area(s)

  • Document image, Image retrieval, Multi-density features, Skew correction

Citation Format(s)

基于分层密度特征的文档图像检索. / 胡芝兰; 林行刚; 严洪.

In: Qinghua Daxue Xuebao/Journal of Tsinghua University, Vol. 46, No. 7, 07.2006, p. 1231-1234.
