
Effective statistical features for coding and non-coding DNA sequence classification for yeast, C. elegans and human

Alan Wee-Chung Liew, Yonghui Wu, Hong Yan, Mengsu Yang

    Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

    Abstract

    This study quantitatively evaluates different coding features in terms of their information content for classifying coding and non-coding regions in three species: yeast, C. elegans and human. The results indicate that coding features that are effective for yeast or C. elegans are generally not very effective for human, which has a short average exon length. Through correlation analysis, we identified a subset of human coding features that have high discriminative power but are complementary in their information content. For this subset, a classification accuracy of up to 90% was obtained using a simple kNN classifier. © 2005 Inderscience Enterprises Ltd.
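The pipeline the abstract outlines (rank features, discard those that are redundant with one another, then classify with kNN) can be illustrated with a minimal sketch. This is not the authors' implementation: the four synthetic features, the thresholds `min_rel` and `max_corr`, the greedy selection rule, and every function name below are assumptions made purely for illustration, and Pearson correlation stands in for whatever measure the paper actually uses.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(X, y, min_rel=0.25, max_corr=0.8):
    """Hypothetical greedy rule: visit features from most to least label-correlated,
    keep one only if it is relevant (|r| with label >= min_rel) and not redundant
    (|r| with every already-kept feature < max_corr)."""
    cols = list(zip(*X))
    relevance = [abs(pearson(c, y)) for c in cols]
    ranked = sorted(range(len(cols)), key=lambda j: -relevance[j])
    kept = []
    for j in ranked:
        if relevance[j] < min_rel:
            continue
        if all(abs(pearson(cols[j], cols[k])) < max_corr for k in kept):
            kept.append(j)
    return kept

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)

# Synthetic stand-in data (assumption): label 1 = coding, 0 = non-coding.
rng = random.Random(0)
y = [i % 2 for i in range(400)]
X = []
for label in y:
    f0 = label + rng.gauss(0, 1)      # informative feature
    f1 = f0 + rng.gauss(0, 0.05)      # near-duplicate of f0 (redundant)
    f2 = label + rng.gauss(0, 1)      # informative, complementary to f0
    f3 = rng.gauss(0, 1)              # pure noise
    X.append([f0, f1, f2, f3])

kept = select_features(X, y)
reduced = [[row[j] for j in kept] for row in X]

# Simple hold-out evaluation: first 300 windows for training, last 100 for testing.
train_X, train_y = reduced[:300], y[:300]
test_X, test_y = reduced[300:], y[300:]
correct = sum(knn_predict(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
accuracy = correct / len(test_y)
print("kept feature indices:", kept)
print("hold-out accuracy:", accuracy)
```

In this toy setup the redundant pair contributes only one feature to the selected subset, which mirrors the abstract's point that the chosen features should be individually discriminative yet complementary. The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the paper.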
    Original language: English
    Pages (from-to): 181-201
    Journal: International Journal of Bioinformatics Research and Applications
    Volume: 1
    Issue number: 2
    Publication status: Published - 2005

    Research Keywords

    • coding statistics
    • correlation analysis
    • DNA sequence
    • exon-intron classification
    • feature selection
    • information content
