Piers : An efficient model for similarity search in DNA sequence databases

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) > 21_Publication in refereed journal > peer-review

14 Scopus Citations

Author(s)

Cao, Xia; Li, Shuai Cheng; Ooi, Beng Chin; Tung, Anthony K. H.

Detail(s)

Original language: English
Pages (from-to): 39-44
Journal / Publication: SIGMOD Record
Volume: 33
Issue number: 2
Publication status: Published - Jun 2004
Externally published: Yes

Abstract

Growing interest in genomic research has resulted in the creation of huge biological sequence databases. In this paper, we present a hash-based pier model for efficient homology search in large DNA sequence databases. In our model, only certain segments of the database, called 'piers', need to be accessed during a search, as opposed to other approaches, which require a full scan of the biological sequence database. To further improve search efficiency, the piers are stored in a specially designed hash table, which helps avoid expensive alignment operations. The hash table is small enough to reside in main memory, avoiding I/O during the search steps. We show theoretically and empirically that the proposed approach can efficiently detect biological sequences that are similar to a query sequence with very high sensitivity.
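
The abstract does not say how piers are selected or how the hash table is laid out, so the following is only a loose illustration of the general idea: index a small set of database segments in an in-memory hash table, then probe it with windows of the query instead of scanning every sequence. It assumes, purely for illustration, that piers are fixed-length substrings sampled at regular intervals and that matching is exact. The names `build_pier_index`, `candidate_hits`, `pier_len`, and `step` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_pier_index(sequences, pier_len=4, step=4):
    """Map each sampled segment (an assumed 'pier') to where it occurs.

    Assumption: piers are fixed-length substrings taken every `step` bases.
    The paper's actual selection rule is not given in the abstract.
    """
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(0, len(seq) - pier_len + 1, step):
            index[seq[pos:pos + pier_len]].append((seq_id, pos))
    return index


def candidate_hits(index, query, pier_len=4):
    """Count, per database sequence, the piers matched by any query window."""
    hits = defaultdict(int)
    seen = set()  # count each stored pier at most once
    for q in range(len(query) - pier_len + 1):
        window = query[q:q + pier_len]
        # dict.get avoids creating empty entries for unmatched windows
        for seq_id, pos in index.get(window, ()):
            if (seq_id, pos) not in seen:
                seen.add((seq_id, pos))
                hits[seq_id] += 1
    return dict(hits)


# Toy database: only the sampled piers are stored, not the full sequences.
database = {"s1": "ACGTACGTTTGA", "s2": "GGGGCCCCAAAA"}
index = build_pier_index(database)
result = candidate_hits(index, "TTACGTAC")
```

In this toy run the query shares two stored piers with `s1` and none with `s2`, so only `s1` would be passed on as a candidate. A real homology search would also need to tolerate mismatches and gaps, which this exact-match sketch does not attempt.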

Citation Format(s)

Piers : An efficient model for similarity search in DNA sequence databases. / Cao, Xia; Li, Shuai Cheng; Ooi, Beng Chin; Tung, Anthony K. H.

In: SIGMOD Record, Vol. 33, No. 2, 06.2004, p. 39-44.
