Abstract
This paper presents a new segmentation method based on spectral analysis that locates the borders between short protein-coding regions and non-coding regions. We formulate an innovative double-curve representation of a DNA sequence and apply a local three-codon measurement to the discrete Fourier spectral features at frequency 1/3 to identify short protein-coding regions. The proposed double-curve spectral segmentation method requires no prior knowledge of the DNA data. Our simulation results show that the proposed method identifies short coding regions in DNA sequences considerably more accurately than other methods that analyse DNA sequences directly. Copyright © 2008 Inderscience Enterprises Ltd.
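The spectral feature at frequency 1/3 mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the double-curve representation, so this sketch swaps in the standard binary indicator (Voss) mapping of the four bases instead; it is not the authors' method. The function names `period3_power` and `sliding_period3`, the nine-nucleotide (three-codon) window and the step of three are all assumptions made for illustration.

```python
import numpy as np

BASES = "ACGT"


def period3_power(window: str) -> float:
    """Spectral power at frequency 1/3, summed over four binary indicator sequences.

    Assumption: the bases are encoded with the standard binary indicator
    mapping, NOT the double-curve representation proposed in the paper.
    """
    positions = np.arange(len(window))
    # Complex exponential for a period of three nucleotides (frequency 1/3).
    kernel = np.exp(-2j * np.pi * positions / 3.0)
    power = 0.0
    for base in BASES:
        # 1.0 where the base occurs, 0.0 elsewhere.
        indicator = np.array([1.0 if ch == base else 0.0 for ch in window])
        power += abs(np.dot(indicator, kernel)) ** 2
    return float(power)


def sliding_period3(seq: str, window_len: int = 9, step: int = 3) -> list:
    """Evaluate the period-3 power in local windows along the sequence.

    window_len=9 (three codons) and step=3 are illustrative choices only.
    """
    seq = seq.upper()
    last_start = len(seq) - window_len
    return [
        period3_power(seq[i:i + window_len])
        for i in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    periodic = "ATG" * 10      # strong three-base periodicity
    flat = "A" * 30            # no three-base periodicity
    print(sliding_period3(periodic))
    print(sliding_period3(flat))
```

In this sketch a window with a strict three-base repeat gives a large value, while a homopolymer window gives a value near zero. A segmentation procedure would then look for positions where the local measure changes sharply.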
| Original language | English |
|---|---|
| Pages (from-to) | 15-35 |
| Journal | International Journal of Data Mining and Bioinformatics |
| Volume | 2 |
| Issue number | 1 |
| Publication status | Published - Jan 2008 |
Research Keywords
- Bioinformatics
- Data mining
- DNA sequence analysis
- Double curves
- Fourier spectrum
- Gene identification
- Short human exons
- Spectral analysis
- Triplets