Categorical Matrix Completion with Active Learning for High-throughput Screening

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal

Original language: English
Number of pages: 11
Journal / Publication: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication status: Online published - 20 Mar 2020


Recent advances in wet-lab automation enable high-throughput experiments to be conducted seamlessly. High-throughput screening, in particular, always involves exhaustive enumeration of all possible conditions. However, few would consider such a screening strategy optimal or cost-effective. By incorporating artificial intelligence, we design an open-source model based on categorical matrix completion and active machine learning to guide high-throughput screening experiments. Specifically, we narrow our scope to high-throughput screening of the effects of chemical compounds on diverse protein sub-cellular locations. In the proposed model, we argue that exploration matters more than exploitation over the long run of a high-throughput screening experiment. We therefore design several innovations to circumvent existing limitations. In particular, categorical matrix completion is designed to accurately impute the outcomes of missing experiments, while margin sampling is implemented for uncertainty estimation. The model is systematically tested on both simulated and real data. The simulation results indicate that the model is robust to diverse scenarios, while the real-data results demonstrate its wet-lab applicability to high-throughput screening experiments. Lastly, by comparing the related matrix ranks and the coverage of distinct experiments, we attribute the model's success to its exploration ability.
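The abstract does not spell out the completion algorithm or the acquisition details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a compound × protein matrix whose entries are sub-cellular location classes (-1 marks an experiment not yet run). As a swapped-in stand-in for categorical matrix completion, it applies iterative truncated-SVD imputation to one indicator matrix per class and normalises the scores into class probabilities; margin sampling then selects the unrun experiments whose top two class probabilities are closest. The function names (`svd_impute`, `complete_categorical`, `margin_sampling`), the simulated data, the rank, and the batch size are all hypothetical.

```python
import numpy as np


def svd_impute(M, mask, rank, n_iter=100):
    """Iterative truncated-SVD imputation (hypothetical stand-in): fill entries
    where mask is False with a rank-`rank` approximation, keeping observed ones fixed."""
    X = np.where(mask, M, M[mask].mean())          # initial fill: observed mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, M, low_rank)            # re-impose observed values
    return low_rank


def complete_categorical(Y, n_classes, rank):
    """Y holds class labels 0..n_classes-1, or -1 where the experiment is unrun.
    Completes one indicator matrix per class and normalises the scores into
    per-entry class probabilities of shape (rows, cols, n_classes)."""
    mask = Y >= 0
    scores = np.stack(
        [svd_impute((Y == k).astype(float), mask, rank) for k in range(n_classes)],
        axis=-1,
    )
    scores = np.clip(scores, 1e-6, None)           # SVD scores may be negative
    return scores / scores.sum(axis=-1, keepdims=True)


def margin_sampling(probs, mask, batch_size):
    """Return the `batch_size` unobserved (row, col) entries whose top-two
    class probabilities are closest, i.e. the most uncertain predictions."""
    top2 = np.sort(probs, axis=-1)[..., -2:]
    margin = top2[..., 1] - top2[..., 0]
    margin = np.where(mask, np.inf, margin)        # never re-query run experiments
    flat = np.argsort(margin, axis=None)[:batch_size]
    return [tuple(int(v) for v in np.unravel_index(i, margin.shape)) for i in flat]


# Hypothetical simulation: rows = compounds, columns = proteins, entry = the
# observed sub-cellular location class. Latent groups give low-rank structure.
rng = np.random.default_rng(0)
n_compounds, n_proteins, n_classes = 30, 20, 3
compound_group = rng.integers(0, 3, n_compounds)
protein_group = rng.integers(0, 3, n_proteins)
group_outcome = rng.integers(0, n_classes, (3, 3))
truth = group_outcome[compound_group][:, protein_group]

Y = np.full(truth.shape, -1)
init = rng.random(truth.shape) < 0.2               # ~20% of experiments run up front
Y[init] = truth[init]

n_rounds, batch_size = 5, 20
for _ in range(n_rounds):
    probs = complete_categorical(Y, n_classes, rank=3)
    for i, j in margin_sampling(probs, Y >= 0, batch_size):
        Y[i, j] = truth[i, j]                       # "run" the chosen experiment

probs = complete_categorical(Y, n_classes, rank=3)
unobserved = Y < 0
acc = (probs.argmax(axis=-1) == truth)[unobserved].mean()
print(f"observed {int((~unobserved).sum())}/{Y.size}; accuracy on the rest: {acc:.2f}")
```

The rank of 3 matches the number of latent groups only because the simulation was constructed that way; with real screening data the rank would have to be chosen or tuned. Exploration in this sketch comes solely from querying the least certain entries, and the paper's additional exploration-oriented innovations are not reproduced.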

Research Area(s)

  • Active Learning, High Throughput Screening, Matrix Completion, Data Driven Experiment