Just as it’s hard to understand a conversation without knowing its context, it can be difficult for biologists to grasp the significance of gene expression without knowing a cell’s environment. To solve that problem, researchers at Princeton Engineering have developed a method to elucidate a cell’s surroundings so that biologists can make more meaning of gene expression information.
The researchers, led by Professor of Computer Science Ben Raphael, hope the new system will open the door to identifying rare cell types and choosing cancer treatment options with new precision. Raphael is the senior author of a paper describing the method published May 16 in Nature Methods.
The basic technique of linking gene expression with a cell’s environment, called spatial transcriptomics (ST), has been around for several years. Scientists break down tissue samples onto a microscale grid and link each spot on the grid with information about gene expression. The problem is that current computational tools can only analyze spatial patterns of gene expression in two dimensions. Experiments that use multiple slices from a single tissue sample — such as a region of a brain, heart or tumor — are difficult to synthesize into a complete picture of the cell types in the tissue.
The Princeton researchers’ method, called PASTE (for Probabilistic Alignment of ST Experiments), integrates information from multiple slices taken from the same tissue sample, providing a three-dimensional view of gene expression within a tumor or a developing organ. When sequence coverage in an experiment is limited due to technical or cost issues, PASTE can also merge information from multiple tissue slices into a single two-dimensional consensus slice with richer gene expression information.
“Our method was motivated by the observation that oftentimes biologists will perform multiple experiments from the same tissue,” said Raphael. “Now, these replicate experiments are not exactly the same cells, but they’re from the same tissue and therefore should be highly similar.”
The team’s technique can align multiple slices from a single tissue sample, categorizing cells based on their gene expression profiles while preserving the physical location of the cells within the tissue.
The project began in the summer of 2020 after Max Land, a mathematics concentrator from Princeton’s Class of 2021, took Raphael’s course “Algorithms in Computational Biology.” Excited by the rapidly evolving field and the opportunity to improve understanding of human health and disease, Land approached Raphael about getting involved in research, and began working on code to develop what became the PASTE method. He was advised by Raphael and by lead study author Ron Zeira, a former postdoctoral researcher at Princeton who is now a research scientist at the precision health company Verily.
The work was the focus of Land’s senior thesis, and he cowrote the paper along with Zeira, Raphael and Alexander Strzalkowski, a computer science Ph.D. student. Now a computational biologist at Memorial Sloan Kettering Cancer Center in New York City, Land said that Zeira’s and Raphael’s mentorship has been instrumental in his pursuit of a research career.
The team developed their method using simulated gene expression data from a spatial transcriptomics study of a breast tumor, where the correspondence between tissue slices was previously established. They then evaluated the method on data collected from samples of the brain’s prefrontal cortex, which has a known structure consisting of layers of different cell types with unique gene expression signatures.
The researchers also applied PASTE to data collected from four different patients’ skin cancer biopsies. A previous analysis of this data had suggested a complex patchwork of cell types, with a high degree of intermingled cancerous and healthy cells. The PASTE method, however, revealed that the apparent low spatial coherence in three of the patients’ samples was likely due to low sequence coverage in the experiments. The new analysis showed that the cells were grouped into more contiguous clusters, a more biologically plausible scenario.
“After we integrate several of these slices and effectively increase the coverage of the data, we get more spatially coherent groupings of cells, which is more reasonable than every cell type being randomly positioned in the tissue,” said Zeira.
So far, the largest data set the team has analyzed was a sample of heart tissue with nine slices, but they have their sights set on experiments from mouse embryos that include more than 30 slices. Aside from computational considerations, spatial transcriptomics experiments on this scale remain expensive for many laboratories, said Raphael.
Still, he added, “we hope that having a tool like PASTE will encourage more researchers to perform replicate experiments, because now they can actually use the information from additional slices in a way that they couldn’t readily do before.”
The research article, “Alignment and Integration of Spatial Transcriptomics Data,” was supported by funding from the U.S. National Cancer Institute.