
基于分层密度特征的文档图像检索

Translated title of the contribution: Document image retrieval based on multi-density features

胡芝兰, 林行刚, 严洪

    Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

    Abstract

    The development of document image databases is challenging document image retrieval techniques. Traditional layout reconstruction-based methods rely on high-quality document images and can handle only a few widely used languages, while the complexity of document layouts greatly hinders layout analysis-based approaches. This paper describes a multi-density feature-based algorithm for binary document images that does not depend on optical character recognition (OCR) or layout analysis. The text area is extracted after preprocessing, which includes skew correction and marginal noise removal. The aspect ratio and multi-density features are then extracted from the text area and used to select the best candidates from the document image database. Experimental results show that the approach is simple, has loss rates below 2%, and can efficiently handle images with different resolutions and from different input systems. The system is also robust to noise, such as notes, and to complex layouts.
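The abstract does not define the multi-density features, so the following is only a minimal sketch of one plausible reading: the text area is taken to be the bounding box of the ink pixels, and the feature vector is the aspect ratio followed by ink densities on progressively finer grids. The function names, the grid levels (1, 2, 4), and the L1 distance used for ranking are all assumptions made for illustration, not the authors' method. Skew correction and marginal noise removal are assumed to have been done already.

```python
from typing import Dict, List, Sequence, Tuple

# Binary image as rows of 0/1 values: 1 = ink, 0 = background (assumed convention).
Image = Sequence[Sequence[int]]


def text_bounding_box(img: Image) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the ink pixels, end-exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def density_features(img: Image, levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Aspect ratio followed by ink densities on n x n grids (hypothetical feature)."""
    top, bottom, left, right = text_bounding_box(img)
    h, w = bottom - top, right - left
    feats = [w / h]  # aspect ratio of the text area
    for n in levels:
        for i in range(n):
            r0, r1 = top + i * h // n, top + (i + 1) * h // n
            for j in range(n):
                c0, c1 = left + j * w // n, left + (j + 1) * w // n
                area = (r1 - r0) * (c1 - c0)
                ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
                # Density is a fraction, so it does not depend on resolution.
                feats.append(ink / area if area else 0.0)
    return feats


def rank_candidates(query: Image, database: Dict[str, Image], top_k: int = 3) -> List[str]:
    """Return the names of the top_k database images closest to the query (L1 distance)."""
    q = density_features(query)
    scored = sorted(
        (sum(abs(a - b) for a, b in zip(q, density_features(img))), name)
        for name, img in database.items()
    )
    return [name for _, name in scored[:top_k]]
```

Because every feature is a ratio, a page scanned at twice the resolution yields the same vector in this sketch, which is one way the resolution independence mentioned in the abstract could arise.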
    Original language: Chinese (Simplified)
    Pages (from-to): 1231-1234
    Journal: 清华大学学报 (自然科学版)/Journal of Tsinghua University (Science and Technology)
    Volume: 46
    Issue number: 7
    Publication status: Published - Jul 2006

    Research Keywords

    • Document image
    • Image retrieval
    • Multi-density features
    • Skew correction
