Spatial Deconvolution YAML
This documentation explains the parameters of the deconvolution_spatial configuration YAML file.
This file is generated by running panpipes deconvolution_spatial config.
The individual steps run by the pipeline are described in the spatial deconvolution workflow.
When running the deconvolution workflow, panpipes provides a basic pipeline.yml file.
To run the workflow on your own data, you need to specify the parameters described below in the pipeline.yml file to meet the requirements of your data.
We also provide pre-filled versions of the pipeline.yml file for individual tutorials.
You can download the different deconvolution pipeline.yml files here:
Basic pipeline.yml file (not prefilled) that is generated when calling panpipes deconvolution_spatial config: Download here
pipeline.yml file for the Deconvoluting spatial data tutorial: Download here
For more information on the functionality implemented in panpipes to read the configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.
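As a brief illustration of the standard YAML mechanism mentioned above, the sketch below defines a value once with an anchor (&) and reuses it with an alias (*). The anchor name use_gpu_default is made up for this example, and the placement of the two blocks is inferred from the parameter descriptions in section 2, not copied from the generated file.

```yaml
# Sketch only: the anchor name is hypothetical.
# Both blocks belong to the Cell2Location options described in section 2.
reference:
  use_gpu: &use_gpu_default True   # the anchor stores the value under a name
spatial:
  use_gpu: *use_gpu_default        # the alias reuses the anchored value
```

Changing the anchored value then updates every place where the alias is used.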
0. Compute Resource Options
resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires.
Specified by the following three parameters:
threads_high
Integer, Default: 1
Number of threads used for high intensity computing tasks.
threads_medium
Integer, Default: 1
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your SpatialData and do computationally light tasks.
threads_low
Integer, Default: 1
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
condaenv String
Path to conda environment that should be used to run panpipes.
Leave blank if running natively or if your cluster automatically inherits the login node environment.
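Put together, the compute resource options could look like the following fragment. This is a minimal sketch using the documented defaults; the exact layout of the generated pipeline.yml may differ.

```yaml
# Sketch of the compute resource block, using the defaults listed above.
resources:
  threads_high: 1     # high intensity tasks
  threads_medium: 1   # needs enough memory per thread to load the SpatialData
  threads_low: 1      # text files and plotting only
# Left empty here: running natively, or the cluster inherits the login node environment.
condaenv:
```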
1. Input Options
With the deconvolution_spatial workflow, one or multiple spatial slides can be deconvoluted in one run. For that, a SpatialData object is expected for each slide. All spatial slides are deconvoluted using the same reference. For the reference, one MuData object with the gene expression data saved in mdata.mod["rna"] is expected as input. Please note that the same parameter settings are used for each slide.
For the spatial input, the workflow therefore reads in all .zarr objects in a directory (see below).
input
spatial
String, Mandatory parameter
Path to a folder containing one or multiple SpatialData objects of spatial data. The pipeline reads in all SpatialData files in that folder.
singlecell
String, Mandatory parameter
Path to the MuData file (not folder) of the reference single-cell data.
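A minimal sketch of the input block is shown below. Both paths are hypothetical placeholders and need to be replaced with the locations of your own data.

```yaml
# Sketch only: both paths are hypothetical.
input:
  spatial: ./data/spatial_data        # folder with one .zarr SpatialData object per slide
  singlecell: ./data/reference.h5mu   # single MuData file; expression data in mdata.mod["rna"]
```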
2. Cell2Location Options
For each deconvolution method you can specify whether to run it or not:
run Boolean, Default: None
Whether to run Cell2Location.
2.1 Feature Selection
You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a CSV file that is then used for deconvolution. The second option is to perform gene selection according to Cell2Location.
Please note that gene selection is not optional. If no CSV file is provided, feature selection according to Cell2Location is performed.
feature_selection
gene_list
String, Default: None
Path to a CSV file containing a reduced feature set. A header is expected in the first row of the CSV file. All genes in the gene list need to be present in both the spatial slides and the scRNA-Seq reference.
remove_mt
Boolean, Default: True
Whether to remove mitochondrial genes from the dataset. This step is performed before running gene selection.
cell_count_cutoff
Integer, Default: 15
All genes detected in fewer than cell_count_cutoff cells will be excluded. Parameter of Cell2Location's gene selection function.
cell_percentage_cutoff2
Float, Default: 0.05
All genes detected in at least this percentage of cells will be included. The value is given as a fraction, so 0.05 corresponds to 5% of cells. Parameter of Cell2Location's gene selection function.
nonz_mean_cutoff
Float, Default: 1.12
Genes detected in a number of cells between the above-mentioned cutoffs are selected only when their average expression in non-zero cells is above this cutoff. Parameter of Cell2Location's gene selection function.
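The fragment below sketches this block with the documented defaults. Its nesting inside the Cell2Location section is inferred from the structure of this page. As a worked example under these defaults, for a reference with 10,000 cells the percentage cutoff corresponds to roughly 0.05 × 10,000 = 500 cells: genes seen in fewer than 15 cells are dropped, genes seen in about 500 cells or more are kept, and genes in between are kept only if their mean expression in non-zero cells exceeds 1.12.

```yaml
# Sketch only: nested inside the Cell2Location options of pipeline.yml.
feature_selection:
  gene_list:                      # empty: fall back to Cell2Location's gene selection
  remove_mt: True                 # mitochondrial genes are removed before gene selection
  cell_count_cutoff: 15           # lower bound on the number of cells expressing a gene
  cell_percentage_cutoff2: 0.05   # fraction of cells above which a gene is always kept
  nonz_mean_cutoff: 1.12          # applies to genes between the two cutoffs
```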
2.2 Reference Model
reference
labels_key
String, Default: None
Key in .obs for label (cell type) information.
batch_key
String, Default: None
Key in .obs for batch information.
layer
String, Default: None
Layer in .layers to use for the reference model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key
String, Default: None
Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys
String, Default: None
Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
max_epochs
Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
Number of epochs.
use_gpu
Boolean, Default: True
Whether to use GPU for training.
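A sketch of a filled-in reference block follows. All .obs column names and the layer name are hypothetical and must match your own reference data; the nesting inside the Cell2Location section is inferred from this page. For orientation, the default max_epochs expression never exceeds 400: a reference with 40,000 cells gives round((20000 / 40000) * 400) = 200 epochs, while any reference with 20,000 cells or fewer gives 400.

```yaml
# Sketch only: column and layer names are hypothetical.
reference:
  labels_key: cell_type                          # .obs column with cell type labels
  batch_key: sample_id                           # .obs column with batch information
  layer: raw_counts                              # leave empty to use .X (raw counts expected)
  categorical_covariate_key: donor,technology    # comma-separated, no spaces
  continuous_covariate_keys:                     # empty means None
  max_epochs: 400
  use_gpu: True
```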
2.3 Spatial Model
spatial
batch_key
String, Default: None
Key in .obs for batch information.
layer
String, Default: None
Layer in .layers to use for the spatial model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key
String, Default: None
Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys
String, Default: None
Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
N_cells_per_location
Integer, Mandatory parameter
Expected cell abundance per voxel. Please refer to the Cell2Location documentation for more information.
detection_alpha
Float, Mandatory parameter
Regularization of within-experiment variation in RNA detection sensitivity. Please refer to the Cell2Location documentation for more information.
max_epochs
Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
Number of epochs.
use_gpu
Boolean, Default: True
Whether to use GPU for training.
You can specify whether both models (spatial and reference) should be saved, and whether a gene-by-spot matrix should be exported, with the following parameters:
save_models, Default: False
Whether to save the reference and spatial mapping models.
export_gene_by_spot, Default: False
Whether to save a gene-by-spot matrix for each cell type in a layer.
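The spatial model options and the two saving options could be written as in the sketch below. The values for N_cells_per_location and detection_alpha are illustrative only and should be chosen for your tissue and technology as described in the Cell2Location documentation. The placement of save_models and export_gene_by_spot next to the spatial block is an assumption based on the order of this page.

```yaml
# Sketch only: nested inside the Cell2Location options of pipeline.yml.
spatial:
  batch_key:                    # empty means None
  layer:                        # empty: use .X (raw counts expected)
  categorical_covariate_key:
  continuous_covariate_keys:
  N_cells_per_location: 30      # mandatory; illustrative value
  detection_alpha: 20           # mandatory; illustrative value
  max_epochs: 400               # illustrative value
  use_gpu: True
save_models: False              # set to True to keep the reference and spatial models
export_gene_by_spot: False      # set to True to store a gene-by-spot matrix per cell type
```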
3. Tangram Options
For each deconvolution method you can specify whether to run it or not:
run Boolean, Default: None
Whether to run Tangram.
3.1 Feature Selection
You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a CSV file that is then used for deconvolution. The second option is to perform gene selection via scanpy.tl.rank_genes_groups on the reference scRNA-Seq data, as suggested by Tangram. The top n_genes of each group make up the reduced gene set.
Please note that gene selection is not optional. If no CSV file is provided, feature selection via scanpy.tl.rank_genes_groups is performed.
feature_selection
gene_list
String, Default: None
Path to a CSV file containing a reduced feature set. A header is expected in the first row of the CSV file. All genes in the gene list need to be present in both the spatial slides and the scRNA-Seq reference.
Parameters for scanpy.tl.rank_genes_groups gene selection
rank_genes
labels_key
String, Default: None
Which column in .obs of the reference to use for the groupby parameter of scanpy.tl.rank_genes_groups.
layer
String, Default: None
Which layer of the reference to use for scanpy.tl.rank_genes_groups. If None, .X is used.
n_genes
Integer, Default: 100
How many top genes to select from each groupby group.
test_method
['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], Default: 't-test_overestim_var'
Which test method to use.
correction_method
['benjamini-hochberg', 'bonferroni'], Default: 'benjamini-hochberg'
Which p-value correction method to use. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
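The Tangram feature selection options could be combined as in the sketch below. The nesting of rank_genes inside feature_selection is inferred from the order of this page, and the labels_key value is a hypothetical .obs column name.

```yaml
# Sketch only: nested inside the Tangram options of pipeline.yml.
feature_selection:
  gene_list:                              # empty: select genes with scanpy.tl.rank_genes_groups
  rank_genes:
    labels_key: cell_type                 # hypothetical .obs column used as groupby
    layer:                                # empty: use .X
    n_genes: 100                          # top genes kept per group
    test_method: t-test_overestim_var
    correction_method: benjamini-hochberg
```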
3.2 Model
model
labels_key
String, Default: None
Key in .obs for label (cell type) information.
num_epochs
Integer, Default: 1000
Number of epochs.
device
String, Default: 'cpu'
Which device to use.
kwargs
In kwargs, you can specify additional parameters for tangram.mapping_utils.map_cells_to_space. You can add or remove any parameters of the function.
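A sketch of the model block with two extra keyword arguments is given below. learning_rate and lambda_d are parameters of Tangram's map_cells_to_space function; how panpipes forwards the kwargs entries is assumed here to be a plain key-value pass-through, and the labels_key value is hypothetical.

```yaml
# Sketch only: nested inside the Tangram options of pipeline.yml.
model:
  labels_key: cell_type    # hypothetical .obs column with cell type labels
  num_epochs: 1000
  device: cpu
  kwargs:                  # assumed to be passed on to map_cells_to_space
    learning_rate: 0.1
    lambda_d: 0
```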