Spatial Deconvolution YAML

This documentation explains the parameters of the deconvolution_spatial configuration YAML file. This file is generated by running panpipes deconvolution_spatial config.
The individual steps run by the pipeline are described in the spatial deconvolution workflow.

When running the deconvolution workflow, panpipes provides a basic pipeline.yml file. To run the workflow on your own data, you need to set the parameters described below in the pipeline.yml file so that they match your data. We also provide pre-filled versions of the pipeline.yml file for individual tutorials. You can download the different deconvolution pipeline.yml files here:

Basic pipeline.yml file (not prefilled) that is generated when calling panpipes deconvolution_spatial config: Download here
pipeline.yml file for Deconvoluting spatial data Tutorial: Download here

For more information on functionalities implemented in panpipes to read the configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.
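As a brief illustration of the YAML anchor syntax mentioned above, the sketch below defines a value once with an anchor (&) and reuses it through a reference to that anchor (*). The anchor name obs_batch and the value sample_id are hypothetical, and nesting the keys under Cell2Location is an assumption based on the sections of this page, not a copy of the generated file.

```yaml
Cell2Location:
  reference:
    # "&obs_batch" attaches an anchor to the value "sample_id" (hypothetical .obs column)
    batch_key: &obs_batch sample_id
  spatial:
    # "*obs_batch" resolves to the anchored value above, so both models share one setting
    batch_key: *obs_batch
```

Reusing a value this way keeps the reference and spatial settings in sync when the column name changes.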

0. Compute Resource Options

resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires. Specified by the following three parameters:

threads_high Integer, Default: 1
Number of threads used for high intensity computing tasks.
threads_medium Integer, Default: 1
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your SpatialData and do computationally light tasks.
threads_low Integer, Default: 1
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.

condaenv String
Path to the conda environment that should be used to run panpipes. Leave blank if running natively or if your cluster automatically inherits the login node environment.
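A minimal sketch of this block with the documented defaults is shown below. It assumes that the three thread parameters are nested under resources and that condaenv sits at the top level, as the layout of this section suggests.

```yaml
resources:
  threads_high: 1     # high intensity tasks
  threads_medium: 1   # needs enough memory per thread to load your SpatialData
  threads_low: 1      # text files and plotting
# left empty here: run natively or inherit the login node environment
condaenv:
```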

1. Input Options

With the deconvolution_spatial workflow, one or multiple spatial slides can be deconvoluted in one run. For that, a SpatialData object is expected for each slide. All spatial slides are deconvoluted using the same reference. For the reference, one MuData object with the gene expression data saved in mdata.mod["rna"] is expected as input. Please note that the same parameter settings are used for each slide.
The workflow therefore reads in all .zarr objects of the spatial input directory (see below).

input

spatial String, Mandatory parameter
Path to the folder containing one or multiple SpatialData objects. The pipeline reads in all SpatialData files in that folder.
singlecell String, Mandatory parameter
Path to the MuData file (not folder) of the reference single-cell data.
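The input block could look like the following sketch. Both paths are hypothetical placeholders; the only requirements taken from the text above are that spatial points to a folder and singlecell points to a single MuData file.

```yaml
input:
  # folder (hypothetical path); every SpatialData .zarr object inside is deconvoluted
  spatial: ./data/spatial_slides
  # single MuData file (hypothetical path) with the reference in mdata.mod["rna"]
  singlecell: ./data/reference.h5mu
```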

2. Cell2Location Options

For each deconvolution method you can specify whether to run it or not:

run Boolean, Default: None
Whether to run Cell2Location.

2.1 Feature Selection

You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv file that is then used for deconvolution. The second option is to perform gene selection according to Cell2Location.
Please note that gene selection is not optional. If no csv file is provided, feature selection according to Cell2Location is performed.

feature_selection

gene_list String, Default: None
Path to a csv file containing a reduced feature set. A header is expected in the first row of the csv file. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
remove_mt Boolean, Default: True
Whether to remove mitochondrial genes from the dataset. This step is performed before running gene selection.
cell_count_cutoff Integer, Default: 15
All genes detected in fewer than cell_count_cutoff cells will be excluded. Parameter of Cell2Location's gene selection function.
cell_percentage_cutoff2 Float, Default: 0.05
All genes detected in at least this fraction of cells will be included (0.05 corresponds to 5% of cells). Parameter of Cell2Location's gene selection function.
nonz_mean_cutoff Float, Default: 1.12
Genes detected in a number of cells between the two cutoffs above are selected only when their average expression in non-zero cells is above this cutoff. Parameter of Cell2Location's gene selection function.
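To make the two options concrete, here is a sketch of the feature selection block with the documented defaults. It assumes the block is nested under a top-level Cell2Location key next to run. Leaving gene_list empty corresponds to the second option, gene selection according to Cell2Location.

```yaml
Cell2Location:
  run: True
  feature_selection:
    gene_list:                      # empty: no csv file, so Cell2Location's gene selection runs
    remove_mt: True                 # applied before gene selection
    cell_count_cutoff: 15           # genes seen in fewer cells are excluded
    cell_percentage_cutoff2: 0.05   # genes seen in at least this share of cells are included
    nonz_mean_cutoff: 1.12          # threshold for genes between the two cutoffs
```

To use the first option instead, set gene_list to the path of your csv file.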

2.2 Reference Model

reference

labels_key String, Default: None
Key in .obs for label (cell type) information.
batch_key String, Default: None
Key in .obs for batch information.
layer String, Default: None
Layer in .layers to use for the reference model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
max_epochs Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
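Number of training epochs. The default caps training at 400 epochs and lowers it when n_cells exceeds 20,000; for example, 40,000 cells give round((20000 / 40000) * 400) = 200 epochs.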
use_gpu Boolean, Default: True
Whether to use GPU for training.
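A sketch of the reference block follows. The .obs column names and the layer name are hypothetical, max_epochs is set explicitly only for illustration, and the nesting under Cell2Location is assumed from the section layout.

```yaml
Cell2Location:
  reference:
    labels_key: cell_type                             # hypothetical .obs column with cell types
    batch_key: sample_id                              # hypothetical .obs column with batches
    layer: raw_counts                                 # hypothetical layer holding raw counts
    continuous_covariate_keys: pct_mt,total_counts    # hypothetical keys, comma-separated without spaces
    max_epochs: 400
    use_gpu: True
```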

2.3 Spatial Model

spatial

batch_key String, Default: None
Key in .obs for batch information.
layer String, Default: None
Layer in .layers to use for the spatial model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
N_cells_per_location Integer, Mandatory parameter
Expected cell abundance per voxel. Please refer to the Cell2Location documentation for more information.
detection_alpha Float, Mandatory parameter
Regularization of within-experiment variation in RNA detection sensitivity. Please refer to the Cell2Location documentation for more information.
max_epochs Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
Number of training epochs. The default caps training at 400 epochs and lowers it when n_cells exceeds 20,000.
use_gpu Boolean, Default: True
Whether to use GPU for training.

You can specify whether both models (spatial and reference) should be saved with the following parameter:

save_models Boolean, Default: False
Whether to save the reference & spatial mapping models.

export_gene_by_spot Boolean, Default: False
Whether to save a gene by spot matrix for each cell type in a layer.
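The spatial model settings and the two output switches might be combined as in the sketch below. The values for N_cells_per_location and detection_alpha are placeholders only, not recommendations; choose them for your tissue following the Cell2Location documentation. Placing save_models and export_gene_by_spot directly under Cell2Location is an assumption.

```yaml
Cell2Location:
  spatial:
    batch_key:                  # empty: no batch information
    layer:                      # empty: .X is used and must contain raw counts
    N_cells_per_location: 30    # placeholder value, mandatory parameter
    detection_alpha: 20         # placeholder value, mandatory parameter
    max_epochs: 400
    use_gpu: True
  save_models: False            # set to True to keep the reference and spatial mapping models
  export_gene_by_spot: False    # set to True to save a gene by spot matrix per cell type
```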

3. Tangram Options

For each deconvolution method you can specify whether to run it or not:

run Boolean, Default: None
Whether to run Tangram.

3.1 Feature Selection

You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv file that is then used for deconvolution. The second option is to perform gene selection via scanpy.tl.rank_genes_groups on the reference scRNA-Seq data, as suggested by Tangram. The top n_genes of each group make up the reduced gene set.
Please note that gene selection is not optional. If no csv file is provided, feature selection via scanpy.tl.rank_genes_groups is performed.

feature_selection

gene_list String, Default: None
Path to a csv file containing a reduced feature set. A header is expected in the first row of the csv file. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.

Parameters for scanpy.tl.rank_genes_groups gene selection

rank_genes
- labels_key String, Default: None
  Which column in .obs of the reference to use for the groupby parameter of scanpy.tl.rank_genes_groups.
- layer String, Default: None
  Which layer of the reference to use for scanpy.tl.rank_genes_groups. If None, .X is used.
- n_genes Integer, Default: 100
  Number of top genes to select from each group.
- test_method ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], Default: ‘t-test_overestim_var’
  Which test method to use.
- correction_method ['benjamini-hochberg', 'bonferroni'], Default: ‘benjamini-hochberg’
  Which p-value correction method to use. Used only for ‘t-test’, ‘t-test_overestim_var’, and ‘wilcoxon’.
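The Tangram feature selection settings could be written as in the following sketch, which uses the documented defaults. The labels_key value is a hypothetical .obs column, and nesting rank_genes inside feature_selection under a top-level Tangram key is assumed from the section layout.

```yaml
Tangram:
  run: True
  feature_selection:
    gene_list:                # empty: genes are selected with scanpy.tl.rank_genes_groups
    rank_genes:
      labels_key: cell_type   # hypothetical .obs column used for groupby
      layer:                  # empty: .X is used
      n_genes: 100            # top genes kept per group
      test_method: t-test_overestim_var
      correction_method: benjamini-hochberg
```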

3.2 Model

model

labels_key String, Default: None
Key in .obs for label (cell type) information.
num_epochs Integer, Default: 1000
Number of epochs.
device String, Default: ‘cpu’
Which device to use.
kwargs
In kwargs, the user has the possibility to specify parameters for tangram.mapping_utils.map_cells_to_space. You can add or remove any parameters of the function.
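A sketch of the model block with two extra keyword arguments is given below. learning_rate and lambda_d are parameters of tangram.mapping_utils.map_cells_to_space; the values shown are illustrative, and the labels_key value is a hypothetical .obs column. Nesting kwargs inside model is assumed from the parameter listing above.

```yaml
Tangram:
  model:
    labels_key: cell_type   # hypothetical .obs column with cell types
    num_epochs: 1000
    device: cpu
    kwargs:
      # passed on to tangram.mapping_utils.map_cells_to_space
      learning_rate: 0.1
      lambda_d: 0
```

Entries under kwargs can be added or removed as needed, as long as each name matches a parameter of the Tangram function.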