Spatial Deconvolution YAML
This page explains the parameters of the deconvolution_spatial configuration YAML file.
This file is generated by running panpipes deconvolution_spatial config.
The individual steps run by the pipeline are described in the spatial deconvolution workflow.
When you run the deconvolution workflow, panpipes provides a basic pipeline.yml file.
To run the workflow on your own data, set the parameters described below in the pipeline.yml file to meet the requirements of your data.
We also provide pre-filled versions of the pipeline.yml file for individual tutorials.
You can download the different deconvolution pipeline.yml files here:
Basic pipeline.yml file (not pre-filled) that is generated when calling panpipes deconvolution_spatial config: Download here
pipeline.yml file for the Deconvoluting spatial data tutorial: Download here
For more information on the functionalities implemented in panpipes to read the configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.
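As a minimal illustration of standard YAML anchors (&) and aliases (*), the fragment below defines a value once and reuses it in a second block. It assumes that the Cell2Location and Tangram options live under top-level keys named Cell2Location and Tangram, and that the cell type column is called cell_type; both names are hypothetical here.

```yaml
# Hypothetical fragment: top-level block names and "cell_type" are assumptions.
Cell2Location:
  reference:
    # &celltype_col attaches an anchor to the value "cell_type"
    labels_key: &celltype_col cell_type
Tangram:
  model:
    # *celltype_col is an alias that reuses the anchored value
    labels_key: *celltype_col
```

With this, changing the column name in one place updates both methods.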
0. Compute Resource Options
resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires.
These are specified by the following three parameters:
threads_high
Integer, Default: 1
Number of threads used for high-intensity computing tasks.
threads_medium
Integer, Default: 1
Number of threads used for medium-intensity computing tasks. For each thread, there must be enough memory to load your MuData and do computationally light tasks.
threads_low
Integer, Default: 1
Number of threads used for low-intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
condaenv
String
Path to the conda environment that should be used to run panpipes. Leave blank if running natively or if your cluster automatically inherits the login node environment.
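A sketch of how these options might look in pipeline.yml, assuming condaenv sits at the top level next to resources. The thread counts and the environment path are placeholder values, not recommendations.

```yaml
resources:
  threads_high: 4     # high-intensity tasks
  threads_medium: 2   # needs memory to load the MuData per thread
  threads_low: 1      # reading text files and plotting
# Placeholder path; leave empty if running natively
condaenv: /path/to/conda/envs/panpipes
```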
1. Input Options
With the deconvolution_spatial workflow, one or multiple spatial slides can be deconvoluted in one run. A MuData object is expected for each slide, with the spatial data saved in mdata.mod["spatial"]. All spatial slides are deconvoluted using the same reference. For the reference, one MuData with the gene expression data saved in mdata.mod["rna"] is expected as input. Please note that the same parameter settings are used for each slide.
For the spatial input, the workflow therefore reads in all .h5mu objects of a directory (see below). The spatial and single-cell data thus need to be saved in different folders.
input
spatial
String, Mandatory parameter
Path to a folder containing one or multiple MuDatas of spatial data. The pipeline reads in all MuData files in that folder and assumes that they are MuDatas of spatial slides.
singlecell
String, Mandatory parameter
Path to the MuData file (not folder) of the reference single-cell data.
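A minimal input block, with hypothetical paths, shows the folder-versus-file distinction and keeps the spatial and single-cell data in separate folders:

```yaml
input:
  # Folder, not a file: every .h5mu file in it is read as one spatial slide
  spatial: ./data/spatial/
  # Single MuData file; the reference expression is expected in mdata.mod["rna"]
  singlecell: ./data/reference/sc_reference.h5mu
```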
2. Cell2Location Options
For each deconvolution method, you can specify whether to run it or not:
run
Boolean, Default: None
Whether to run Cell2Location.
2.1 Feature Selection
You can select the genes used for deconvolution in two ways. The first option is to provide a reduced feature set as a CSV file that is then used for deconvolution. The second option is to perform gene selection according to Cell2Location.
Please note that gene selection is not optional: if no CSV file is provided, feature selection according to Cell2Location is performed.
feature_selection
gene_list
String, Default: None
Path to a CSV file containing a reduced feature set. The CSV file is expected to have a header in the first row. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
remove_mt
Boolean, Default: True
Whether to remove mitochondrial genes from the dataset. This step is performed before running gene selection.
cell_count_cutoff
Integer, Default: 15
All genes detected in fewer than cell_count_cutoff cells will be excluded. Parameter of Cell2Location's gene selection function.
cell_percentage_cutoff2
Float, Default: 0.05
All genes detected in at least this fraction of cells (0.05 corresponds to 5%) will be included. Parameter of Cell2Location's gene selection function.
nonz_mean_cutoff
Float, Default: 1.12
Genes detected in a number of cells between the above-mentioned cutoffs are selected only when their average expression in non-zero cells is above this cutoff. Parameter of Cell2Location's gene selection function.
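For example, to run Cell2Location with its own gene selection, leave gene_list empty. The fragment below assumes the block is named Cell2Location and simply spells out the documented defaults.

```yaml
Cell2Location:
  run: True
  feature_selection:
    # Empty: no CSV, so Cell2Location's gene selection is used
    gene_list:
    remove_mt: True              # applied before gene selection
    cell_count_cutoff: 15        # drop genes seen in fewer than 15 cells
    cell_percentage_cutoff2: 0.05
    nonz_mean_cutoff: 1.12
```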
2.2 Reference Model
reference
labels_key
String, Default: None
Key in .obs for label (cell type) information.
batch_key
String, Default: None
Key in .obs for batch information.
layer
String, Default: None
Layer in .layers to use for the reference model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key
String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys
String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
max_epochs
Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
Number of training epochs. By default, this scales inversely with the number of cells (n_cells) and is capped at 400.
use_gpu
Boolean, Default: True
Whether to use GPU for training.
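A sketch of the reference model block, assuming the Cell2Location top-level key; the .obs columns and the layer name are hypothetical. The comment works through the documented max_epochs default for two dataset sizes.

```yaml
Cell2Location:
  reference:
    labels_key: cell_type    # hypothetical .obs column with cell types
    batch_key: sample        # hypothetical .obs column with batches
    layer: raw_counts        # hypothetical layer holding raw counts
    # Default: min(round(20000 / n_cells * 400), 400), e.g.
    #   40,000 cells -> round(0.5 * 400) = 200 epochs
    #   10,000 cells -> round(2 * 400) = 800, capped to 400 epochs
    max_epochs: 250          # explicit placeholder value
    use_gpu: True
```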
2.3 Spatial Model
spatial
batch_key
String, Default: None
Key in .obs for batch information.
layer
String, Default: None
Layer in .layers to use for the spatial model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.
categorical_covariate_key
String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
continuous_covariate_keys
String, Default: None
Comma-separated without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
N_cells_per_location
Integer, Mandatory parameter
Expected cell abundance per voxel. Please refer to the Cell2Location documentation for more information.
detection_alpha
Float, Mandatory parameter
Regularization of within-experiment variation in RNA detection sensitivity. Please refer to the Cell2Location documentation for more information.
max_epochs
Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
Number of training epochs. By default, this scales inversely with n_cells and is capped at 400.
use_gpu
Boolean, Default: True
Whether to use GPU for training.
You can specify whether both models (spatial and reference) should be saved with the following parameter:
save_models
Boolean, Default: False
Whether to save the reference and spatial mapping models.
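The spatial model block could look as follows, assuming the Cell2Location top-level key and that save_models sits directly under it. N_cells_per_location and detection_alpha are mandatory; the numbers below are placeholders to be chosen for your tissue as described in the Cell2Location documentation.

```yaml
Cell2Location:
  spatial:
    batch_key:                 # empty: no batch column
    layer: raw_counts          # hypothetical layer; raw counts are expected
    N_cells_per_location: 30   # mandatory; placeholder value
    detection_alpha: 20        # mandatory; placeholder value
    use_gpu: True
  save_models: False           # do not save reference and spatial models
```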
3. Tangram Options
For each deconvolution method, you can specify whether to run it or not:
run
Boolean, Default: None
Whether to run Tangram.
3.1 Feature Selection
You can select the genes used for deconvolution in two ways. The first option is to provide a reduced feature set as a CSV file that is then used for deconvolution. The second option is to perform gene selection via scanpy.tl.rank_genes_groups on the reference scRNA-Seq data, as suggested by Tangram. The top n_genes of each group make up the reduced gene set.
Please note that gene selection is not optional: if no CSV file is provided, feature selection via scanpy.tl.rank_genes_groups is performed.
feature_selection
gene_list
String, Default: None
Path to a CSV file containing a reduced feature set. The CSV file is expected to have a header in the first row. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
Parameters for gene selection with scanpy.tl.rank_genes_groups:
rank_genes
labels_key
String, Default: None
Which column in .obs of the reference to use for the groupby parameter of scanpy.tl.rank_genes_groups.
layer
String, Default: None
Which layer of the reference to use for scanpy.tl.rank_genes_groups. If None, .X is used.
n_genes
Integer, Default: 100
How many top genes to select from each groupby group.
test_method
['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], Default: 't-test_overestim_var'
Which test method to use.
correction_method
['benjamini-hochberg', 'bonferroni'], Default: 'benjamini-hochberg'
Which p-value correction method to use. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
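A sketch of Tangram's feature selection via scanpy.tl.rank_genes_groups, assuming the block is named Tangram; the cell_type column is hypothetical and the remaining values are the documented defaults.

```yaml
Tangram:
  run: True
  feature_selection:
    gene_list:                  # empty: rank_genes_groups is used instead
    rank_genes:
      labels_key: cell_type     # hypothetical .obs column used as groupby
      layer:                    # empty: .X is used
      n_genes: 100              # top genes kept per group
      test_method: t-test_overestim_var
      correction_method: benjamini-hochberg
```

For example, with 12 cell types and n_genes: 100, the reduced gene set contains at most 12 × 100 = 1,200 genes, and fewer if the top genes of different groups overlap.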
3.2 Model
model
labels_key
String, Default: None
Key in .obs for label (cell type) information.
num_epochs
Integer, Default: 1000
Number of epochs.
device
String, Default: 'cpu'
Which device to use.
kwargs
In kwargs, you can specify parameters for tangram.mapping_utils.map_cells_to_space. You can add or remove any of the function's parameters.
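A sketch of the model block, assuming the Tangram top-level key and that entries under kwargs are passed on unchanged to tangram.mapping_utils.map_cells_to_space. mode, learning_rate and density_prior are arguments of that function; the values shown are illustrative. num_epochs and device already have their own keys, so they are not repeated under kwargs.

```yaml
Tangram:
  model:
    labels_key: cell_type      # hypothetical .obs column
    num_epochs: 1000
    device: cuda:0             # or cpu (the default)
    kwargs:
      # Assumed pass-through to map_cells_to_space
      mode: cells
      learning_rate: 0.1
      density_prior: rna_count_based
```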