Abstract
This study quantitatively evaluates coding features by their information content for classifying coding and non-coding regions in three species: yeast, C. elegans and human. The results indicate that coding features that are effective for yeast or C. elegans are generally not very effective for human, which has a short average exon length. Using a correlation analysis, we identified a subset of human coding features that have high discriminative power yet are complementary in their information content. For this subset, a simple kNN classifier achieved a classification accuracy of up to 90%. © 2005 Inderscience Enterprises Ltd.
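The selection idea described in the abstract (keep features that discriminate well but are not redundant with one another, then classify with kNN) can be illustrated with a minimal sketch. This is not the authors' procedure: the feature values are invented toy numbers, the absolute Pearson correlation with the class label is used as a simple stand-in for the paper's information-content measure, and the names `select_features`, `knn_predict`, `min_score` and `max_corr` are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(X, y, min_score=0.1, max_corr=0.8):
    """Greedy selection: rank columns by |corr with label| (assumed proxy for
    discriminative power), skip weak columns, and keep a column only if it is
    not strongly correlated with any column already kept."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    score = [abs(pearson(c, y)) for c in cols]
    ranked = sorted(range(len(cols)), key=lambda j: -score[j])
    kept = []
    for j in ranked:
        if score[j] < min_score:
            continue  # too weak to be useful
        if all(abs(pearson(cols[j], cols[k])) <= max_corr for k in kept):
            kept.append(j)  # adds information not already covered
    return kept


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Toy data (invented): rows are sequence windows, columns are four features.
# Label 1 = coding, 0 = non-coding.
X = [
    [0.9, 0.8, 0.5, 0.5],
    [0.8, 0.9, 0.9, 0.5],
    [0.7, 0.6, 0.7, 0.5],
    [0.2, 0.3, 0.5, 0.5],
    [0.1, 0.1, 0.6, 0.5],
    [0.3, 0.3, 0.1, 0.5],
]
y = [1, 1, 1, 0, 0, 0]

kept = select_features(X, y)
reduced = [[row[j] for j in kept] for row in X]

queries = [[0.75, 0.8, 0.8, 0.5], [0.15, 0.2, 0.3, 0.5]]
predictions = [knn_predict(reduced, y, [q[j] for j in kept]) for q in queries]
print(kept, predictions)
```

In this toy set the second column is nearly a copy of the first and the fourth is constant, so neither adds information once the first column is kept; the thresholds are arbitrary and would need tuning on real data.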
| Original language | English |
|---|---|
| Pages (from-to) | 181-201 |
| Journal | International Journal of Bioinformatics Research and Applications |
| Volume | 1 |
| Issue number | 2 |
| Publication status | Published - 2005 |
Research Keywords
- coding statistics
- correlation analysis
- DNA sequence
- exon-intron classification
- feature selection
- information content
Fingerprint
Dive into the research topics of 'Effective statistical features for coding and non-coding DNA sequence classification for yeast, C. elegans and human'. Together they form a unique fingerprint.