Incorporating Latent Proteomics Space from AlphaFold into Cell-cell Interactions

Project: Research

Description

In this proposal, we take advantage of recent breakthroughs in protein structure prediction to devise new methods, and improve existing procedures, for studying spatial transcriptomics data. Spatial transcriptomics technology allows transcript (or sequence) abundances to be read from a tissue sample at cell-level resolution, giving the abundances at individual points of a grid over the sample. The ability to examine gene expression (derived from the transcript abundances) across neighboring grid points (or spots) allows us to study cell-cell interactions (CCIs), which regulate tissue formation and cellular functions. However, current spatial transcriptomics technologies suffer from at least two drawbacks for studying CCIs. First, they cannot extract transcriptomes from precisely within a single cell, which results in ambiguities in the inferred interactions. Second, they have high dropout rates. In addition, current analysis methods for inferring the content of interactions between two neighboring spots are based on observing the co-occurrence of known ligand-receptor bindings, which severely restricts their inference power.
To remedy this, we take advantage of the near-complete collection of predicted protein structures from AlphaFold: a large machine-learning model can learn the latent structural space and infer possible interactions between ligands and receptors. Our approach also incorporates single-cell RNA sequencing (scRNA-seq), which extracts expression specifically from a single cell with a relatively lower dropout rate, allowing us to infer the expression of single cells in the spatial transcriptomics data.
For this project, we use a deep learning model to perform dimensionality reduction on the structures, with additional information from curated biological data to supervise the learning. We then assess the usability of the resulting vectors on the protein structure similarity search problem. Next, the resulting vectors, together with the spatial transcriptomics data, are used to infer ligand-receptor affinities and novel ligand-receptor pairs. The inferred affinities are then used in a novel method that combines spatial transcriptomics data, scRNA-seq data, and the learned embeddings to infer CCIs. Furthermore, with the results obtained, we intend to find spatially variable ligand-receptor pairs from the CCIs, reconstruct cell hierarchies for the CCIs, and integrate multi-omics data with the CCIs.
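
For illustration only, the co-occurrence style of analysis that the description attributes to current methods can be sketched as follows. This is a minimal sketch, not the project's method: the grid layout, the 4-connected neighborhood, the product-based score, and the gene names (LIG_A, REC_A, LIG_B, REC_B) are all assumptions made for the example.

```python
def neighbors(spot, spots):
    """Return the 4-connected grid neighbors of `spot` that are present in `spots`."""
    row, col = spot
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [s for s in candidates if s in spots]


def cooccurrence_scores(expression, lr_pairs):
    """Score known ligand-receptor pairs by co-expression in neighboring spots.

    expression: dict mapping (row, col) -> {gene: abundance}
    lr_pairs:   iterable of (ligand_gene, receptor_gene) known bindings
    Returns a dict mapping each pair to the sum, over ordered neighboring
    spot pairs (sender, receiver), of ligand abundance in the sender times
    receptor abundance in the receiver.
    """
    scores = {pair: 0.0 for pair in lr_pairs}
    for sender, sender_genes in expression.items():
        for receiver in neighbors(sender, expression):
            receiver_genes = expression[receiver]
            for ligand, receptor in scores:
                # A missing gene (e.g. a dropout) contributes zero.
                scores[(ligand, receptor)] += (
                    sender_genes.get(ligand, 0.0) * receiver_genes.get(receptor, 0.0)
                )
    return scores


# Hypothetical 2x2 grid of spots with toy abundances.
expression = {
    (0, 0): {"LIG_A": 2.0},
    (0, 1): {"REC_A": 3.0},
    (1, 0): {"LIG_B": 1.0},
    (1, 1): {},  # REC_B was not detected anywhere (e.g. dropout)
}
known_pairs = [("LIG_A", "REC_A"), ("LIG_B", "REC_B")]
scores = cooccurrence_scores(expression, known_pairs)
```

In this toy setting, a pair that is absent from the known-pairs list can never be scored, and a receptor lost to dropout yields a zero score, which mirrors the two limitations the description raises.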

Detail(s)

Project number: 9043559
Grant type: GRF
Status: Active
Effective start/end date: 1/01/24 → …