Subspace-based Methods for Periodic Signal Detection in DNA Sequences

Project: Research



Genomic sequence data contain several types of periodic signals that may have significant biological or structural functions. For example, an important characteristic of protein-coding exons in a gene is that the nucleotides produce a signal component with a frequency of 1/3, that is, a period of three bases. A large portion of the human genome consists of various repeats, which are periodic patterns. Some repeats are already known to cause neurodegenerative diseases, although the functions of many other repeats remain unclear. The literature also reports several types of long-range cyclic patterns in DNA sequences, some with periods of hundreds or even thousands of base pairs.

In general, periodic signals in a DNA sequence are far from perfect. They are often mixed with other signals and can be corrupted by nucleotide insertion, deletion and mutation. Some of them, such as an exon, can be very short, on the order of a few hundred or even tens of base pairs in length. Protein-coding exons represent less than 5% of the human genome, and identifying them is like "looking for a needle in a haystack". Thus, detection of periodic signals in a DNA sequence can be a highly challenging task.

In this project, the researchers aim to study subspace-based signal processing methods systematically for periodic signal detection in DNA sequences. They will investigate a decomposition technique for separating the signal representing a DNA sequence into different components and for identifying the periodic components. They will also develop a projection onto convex sets (POCS) based algorithm for signal restoration and reconstruction that incorporates natural constraints of the input data, and they will study multiple sequence features and multi-resolution methods to deal with weak periodic signals and the non-stationarity of these signals. The subspace-based methods will be tested with many large genome datasets.

This project can lead to a deeper understanding of the characteristics of periodic signals in genomic data and their roles in the genome architecture, and it can provide novel techniques for detecting these signals. In addition, the project will produce useful computer tools for practical applications, including short gene recognition, repeat identification, sequence annotation and, more generally, deciphering the information in genomic sequences.
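
To make the frequency-1/3 property of coding regions concrete, the following minimal sketch measures how much of a sequence's signal power sits at a given period. It is an illustration only: it uses a plain discrete Fourier transform over binary indicator sequences (one per nucleotide), which is a common baseline and not the subspace-based or POCS methods this project proposes. The function name `period_power` and the normalization by the squared sequence length are assumptions made for this example.

```python
import cmath


def period_power(seq, period=3):
    """Spectral power of a DNA string at frequency 1/period.

    Each base (A, C, G, T) is mapped to a 0/1 indicator sequence, the
    DFT coefficient at frequency 1/period is computed for each, and the
    squared magnitudes are summed. The result is divided by len(seq)**2
    (an illustrative normalization chosen for this sketch).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of this base's indicator sequence at 1/period.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / period)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n**2


# A perfectly period-3 toy sequence versus a period-4 toy sequence.
strong = period_power("ATG" * 50)
weak = period_power("ACGT" * 30)
print(strong > weak)
```

In this toy comparison the perfectly repeating triplet concentrates its power at frequency 1/3, while the period-4 sequence contributes essentially none there. Real exons are short and noisy, so the peak is far weaker in practice, which is the difficulty the project description points to.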


Project number: 9041506
Grant type: GRF
Effective start/end date: 1/01/10 to 4/09/14