Deconvoluting spatial data
With the deconvolution_spatial workflow, one or multiple spatial slides can be deconvoluted in one run. For that, a SpatialData object for each slide is expected. The spatial slides are deconvoluted using the same reference. For the reference, one MuData with the gene expression data saved in mdata.mod["rna"] is expected as input.
The workflow supports deconvolution with Cell2Location and Tangram.
Steps
Cell2Location
For the reference and each spatial slide, the following steps are run. Note that if multiple slides are deconvoluted in one run, the same parameter setting is used for each slide.
Gene selection. There are two possibilities for the gene selection:
Genes of a user-provided feature set (csv-file) are used for deconvolution. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
Feature selection performed according to Cell2Location, i.e. via the function: cell2location.utils.filtering.filter_genes
Note: if no csv-file is provided by the user, the workflow will run the feature selection via the function: cell2location.utils.filtering.filter_genes. Thus, gene selection is not optional.
Regression/reference model is fitted and a plot of the training history as well as QC plots are saved in the ./figures/Cell2Location directory. Additionally, a csv-file Cell2Loc_inf_anver.csv with the estimated expression of every gene in every cell type is saved in ./cell2location.output.
(Optional) Reference model is saved in ./cell2location.output
Spatial mapping model is fitted. Training history and QC plots are saved in the ./figures/Cell2Location directory. Plot of the spatial embedding coloured by q05_cell_abundance_w_sf is also saved in ./figures/Cell2Location.
(Optional) A gene by spot matrix for each cell type is saved to a layer in the table of the SpatialData object
(Optional) Spatial mapping model is saved in ./cell2location.output
The SpatialData object of the spatial slide and the MuData object of the reference are saved in ./cell2location.output. The SpatialData object of the spatial slide contains the estimated cell type abundances.
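The gene selection step described above can be sketched in plain Python. This is only an illustration of the decision logic, not the workflow's actual code: the function names, the header-less one-column csv layout, and the choice to raise an error on missing genes are all assumptions, and the fallback argument is a hypothetical stand-in for cell2location.utils.filtering.filter_genes.

```python
import csv
import io


def read_gene_list(handle):
    """Return the gene names found in the first column of a csv file.

    Assumption: one gene per row, no header line.
    """
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def check_gene_list(genes, reference_genes, slides):
    """Return {dataset name: sorted missing genes} for every dataset lacking a gene."""
    datasets = {"reference": reference_genes, **slides}
    missing = {}
    for name, available in datasets.items():
        absent = sorted(set(genes) - set(available))
        if absent:
            missing[name] = absent
    return missing


def select_genes(reference_genes, slides, gene_csv=None, fallback=None):
    """Use the user-provided gene list if given, otherwise run the fallback selection."""
    if gene_csv is not None:
        genes = read_gene_list(gene_csv)
        missing = check_gene_list(genes, reference_genes, slides)
        if missing:
            # Every listed gene must be present in the reference and in each slide.
            raise ValueError(f"genes missing from input data: {missing}")
        return genes
    if fallback is None:
        raise ValueError("no gene list given and no fallback selection provided")
    # Hypothetical stand-in for the automatic feature selection.
    return fallback(reference_genes)


# Toy gene names for one reference and two slides.
reference = ["CD3E", "CD19", "MS4A1", "LYZ"]
slides = {
    "slide_1": ["CD3E", "CD19", "LYZ"],
    "slide_2": ["CD3E", "CD19", "MS4A1", "LYZ"],
}

selected = select_genes(reference, slides, gene_csv=io.StringIO("CD3E\nCD19\n"))
print(selected)  # ['CD3E', 'CD19']
```

A gene list containing MS4A1 would be rejected in this sketch, because slide_1 does not contain that gene.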
Tangram
For the reference and each spatial slide, the following steps are run. Note that if multiple slides are deconvoluted in one run, the same parameter setting is used for each slide.
Gene selection with two possibilities:
Genes of a user-provided feature set (csv-file) are used for deconvolution. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
sc.tl.rank_genes_groups is used to select genes. The top n genes of each group make up the reduced gene set.
Note: if no csv-file is provided by the user, the workflow will run the feature selection via sc.tl.rank_genes_groups. Thus, gene selection is not optional.
Data is preprocessed with tangram.pp_adatas
Tangram model is fitted with tangram.mapping_utils.map_cells_to_space and annotations are transferred from single-cell data onto space with tangram.project_cell_annotations
Plot of the spatial embedding coloured by tangram_ct_pred is saved in ./figures/Tangram
The SpatialData object of the spatial slide and the MuData object of the reference are saved in ./tangram.output. The SpatialData object of the spatial slide contains the deconvolution predictions.
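To illustrate how the top n genes of each group can make up a reduced gene set, here is a minimal sketch in plain Python. It assumes the per-group rankings (as produced, for example, by sc.tl.rank_genes_groups) have already been extracted into a dictionary of ranked gene lists. The function name top_n_gene_set, the toy gene names, and the removal of duplicates across groups are assumptions for illustration, not the workflow's confirmed behaviour.

```python
def top_n_gene_set(ranked_genes, n):
    """Combine the first n genes of every group into one list without duplicates.

    ranked_genes maps a group name to its genes, best-ranked first.
    Genes are returned in the order they are first encountered.
    """
    selected = []
    seen = set()
    for genes in ranked_genes.values():
        for gene in genes[:n]:
            if gene not in seen:
                seen.add(gene)
                selected.append(gene)
    return selected


# Toy rankings for three groups; CD2 appears in two of them.
ranked = {
    "T cell": ["CD3E", "CD2", "IL7R"],
    "B cell": ["MS4A1", "CD79A", "CD2"],
    "Monocyte": ["LYZ", "CD14", "S100A8"],
}

reduced = top_n_gene_set(ranked, n=2)
print(reduced)  # ['CD3E', 'CD2', 'MS4A1', 'CD79A', 'LYZ', 'CD14']
```

Because a gene can rank highly in more than one group, the reduced set may contain fewer than n times the number of groups genes.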
Steps to run
Activate conda environment
conda activate pipeline_env
Generate yaml and log file
panpipes deconvolution_spatial config
Specify the parameter setting in the pipeline.yml file
Run complete deconvolution workflow with
panpipes deconvolution_spatial make full --local
The Deconvoluting spatial data tutorial guides you through the deconvolution workflow of Panpipes step by step.