Dr. LI Shuaicheng (李帥成)

BSc MSc NUS, PhD U of Waterloo

Visiting address
YEUNG-Y6634
Phone: +852 34429412

Prizes/Honours

  • NSERC Postdoctoral Fellowship (2009 - 2011)
    Natural Sciences and Engineering Research Council of Canada
    (Held at the International Computer Science Institute, UC Berkeley)
  • Outstanding Achievement in Graduate Studies Award (2010)
    University of Waterloo, Canada
  • Cheriton Scholarship, University of Waterloo (2006 - 2009)
    University of Waterloo, Canada
  • International Student Scholarship, University of Waterloo (2004 - 2007)
    University of Waterloo, Canada
  • Best Paper Award (2007)
    The 18th International Conference on Genome Informatics (GIW 2007)
  • Entrance Scholarship, University of Waterloo (2004 - 2005)
    University of Waterloo, Canada
  • Best Paper Award (2005)
    10th International Conference on Database Systems for Advanced Applications (DASFAA 2005)
  • Research Scholarship, National University of Singapore (2001 - 2002)
    National University of Singapore, Singapore
  • First Runner-up, Algorithmic Programming Contest (2002)
    NUS-MIT Alliance Course on Algorithms (a course jointly conducted by NUS and MIT)
  • Dean's list, School of Computing, NUS (1997 - 1998)
    National University of Singapore, Singapore

Research Interests/Areas

  • Bioinformatics
  • Machine Learning
  • Algorithms

Selected Research Projects

  • FALCON is a software system for protein structure prediction. It ranked among the top 3 protein structure prediction systems on hard targets in CASP8 (the 8th Critical Assessment of Techniques for Protein Structure Prediction; http://robetta.bakerlab.org/CASP8_eval_domains/CASP8.FR_H.First-GDT_MM.html). The system uses a simple position-specific hidden Markov model to predict protein structures. The framework iterates until it converges on a final structure, combining fragment assembly, clustering, target selection, refinement, and consensus in a single process. Our initial implementation converged to within 6 Angstrom of the native structure for 100% of decoys on all 6 standard benchmark proteins used to evaluate the state-of-the-art system ROSETTA, which achieved only 14% to 94% on the same data. The quality of both the best decoys and the final decoys our system converged to was also notably better. Recently, we completed an automatic system for determining protein structures from NMR spectra. Inferring a structure from NMR data manually usually takes well-trained experts several months of experimentation. Our system, AMR, completely automates this process and reduces the time needed to infer high-resolution structures from several months to one day. The system works as a three-part pipeline: peak picking, chemical shift assignment, and structure generation. My work was on the structure generation part, an extension of FALCON that works with partial NMR constraints: it accepts chemical shift information, tolerates errors, and refines structures. Initial results show that our system builds high-resolution structures comparable to those produced by human experts.
  • FRAZOR uses a linear programming model to find structural alphabet candidates for a target sequence. The 3D structure of a protein sequence can be assembled from substructures that correspond to small segments of the sequence. For each small sequence segment, there are only a few likely substructures; these are called the structural alphabet for the segment. Classical approaches such as ROSETTA use sequence profile and secondary structure information to predict the structural alphabet. In contrast, we used additional structural information, such as solvent accessibility and contact capacity. We used an integer linear programming technique to derive the best combination of this sequence and structural information. With this additional information, we were able to generate significantly more accurate and succinct structural alphabets, improving by more than 50% on the accuracies previously obtained by others. With these novel structural alphabets, we are able to construct more accurate protein structures than state-of-the-art ab initio protein structure prediction programs such as ROSETTA. We are also able to reduce the size of Kolodny's library by a factor of 8 at the same accuracy.