Spatial Deconvolution YAML

This documentation explains the parameters of the deconvolution_spatial configuration yaml file. The file is generated by running panpipes deconvolution_spatial config.
The individual steps run by the pipeline are described in the spatial deconvolution workflow.

When running the deconvolution workflow, panpipes provides a basic pipeline.yml file. To run the workflow on your own data, set the parameters described below in pipeline.yml so that they match your data. We also provide pre-filled versions of the pipeline.yml file for the individual tutorials. You can download the different deconvolution pipeline.yml files here:

For more information on the functionality implemented in panpipes to read the configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.

0. Compute Resource Options

resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires. Specified by the following three parameters:

  • threads_high Integer, Default: 1
    Number of threads used for high intensity computing tasks.

  • threads_medium Integer, Default: 1
    Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your mudata and do computationally light tasks.

  • threads_low Integer, Default: 1
    Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting. This requires much less memory than the other two.

condaenv String
Path to the conda environment that should be used to run panpipes. Leave blank if running natively or if your cluster automatically inherits the login node environment.

1. Input Options

With the deconvolution_spatial workflow, one or multiple spatial slides can be deconvoluted in one run. For that, a MuData object for each slide is expected, with the spatial data saved in mdata.mod["spatial"]. All spatial slides are deconvoluted using the same reference. For the reference, one MuData with the gene expression data saved in mdata.mod["rna"] is expected as input. Please note that the same parameter settings are used for each slide.
For the spatial input, the workflow reads in all .h5mu objects in a directory (see below). The spatial and single-cell data therefore need to be saved in different folders.

input

  • spatial String, Mandatory parameter
    Path to the folder containing one or multiple MuDatas of spatial data. The pipeline reads in all MuData files in that folder and assumes that they are MuDatas of spatial slides.

  • singlecell String, Mandatory parameter
    Path to the MuData file (not folder) of the reference single-cell data.
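
The two input rules above (a folder of .h5mu files for the slides, a single file for the reference, kept apart from each other) can be checked before starting a run. The following is a minimal sketch using only the Python standard library; the helper name collect_inputs is hypothetical and is not part of panpipes, and the pipeline's own file discovery may differ in detail.

```python
from pathlib import Path


def collect_inputs(spatial_dir, singlecell_file):
    """Return (sorted list of spatial .h5mu paths, reference path).

    Hypothetical pre-flight check, not a panpipes function.
    """
    spatial_dir = Path(spatial_dir)
    singlecell_file = Path(singlecell_file)

    if not spatial_dir.is_dir():
        raise NotADirectoryError(f"'spatial' must be a folder: {spatial_dir}")
    if not singlecell_file.is_file():
        raise FileNotFoundError(f"'singlecell' must be a file: {singlecell_file}")

    # Every .h5mu file in the folder is treated as one spatial slide.
    slides = sorted(spatial_dir.glob("*.h5mu"))
    if not slides:
        raise ValueError(f"no .h5mu files found in {spatial_dir}")

    # If the reference sat in the spatial folder it would be read as a slide.
    if singlecell_file.resolve() in {p.resolve() for p in slides}:
        raise ValueError("the reference file must not be in the spatial folder")

    return slides, singlecell_file
```

Calling collect_inputs with the values of input.spatial and input.singlecell returns the slides that would be processed, which makes it easy to spot a stray file in the spatial folder.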

2. Cell2Location Options

For each deconvolution method you can specify whether to run it or not:

run Boolean, Default: None
Whether to run Cell2Location.

2.1 Feature Selection

You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv file that is then used for deconvolution. The second option is to perform gene selection according to Cell2Location.
Please note that gene selection is not optional. If no csv file is provided, feature selection according to Cell2Location is performed.

feature_selection

  • gene_list String, Default: None
    Path to a csv file containing a reduced feature set. A header is expected in the first row of the csv. All genes in that gene list need to be present in both the spatial slides and the scRNA-Seq reference.

  • remove_mt Boolean, Default: True
    Whether to remove mitochondrial genes from the dataset. This step is performed before running gene selection.

  • cell_count_cutoff Integer, Default: 15
    All genes detected in fewer than cell_count_cutoff cells will be excluded. Parameter of Cell2Location’s gene selection function.

  • cell_percentage_cutoff2 Float, Default: 0.05
    All genes detected in at least this fraction of cells (0.05 corresponds to 5% of cells) will be included. Parameter of Cell2Location’s gene selection function.

  • nonz_mean_cutoff Float, Default: 1.12
    Genes detected in a number of cells between the two cutoffs above are selected only when their average expression in non-zero cells is above this cutoff. Parameter of Cell2Location’s gene selection function.
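
To make the interplay of the three cutoffs concrete, the rule described above can be written out in plain Python. This is an illustrative sketch of the rule as stated in this documentation, not Cell2Location's implementation: the function name select_genes, the list-of-rows input, and the handling of values that fall exactly on a cutoff are assumptions.

```python
def select_genes(counts, gene_names, cell_count_cutoff=15,
                 cell_percentage_cutoff2=0.05, nonz_mean_cutoff=1.12):
    """Apply the three-cutoff rule to a cells x genes table of raw counts.

    counts: sequence of rows, one row per cell, one column per gene.
    Returns the names of the selected genes in their original order.
    """
    n_cells_total = len(counts)
    selected = []
    for j, gene in enumerate(gene_names):
        column = [row[j] for row in counts]
        n_cells = sum(1 for value in column if value > 0)
        if n_cells == 0:
            continue  # never detected
        # Frequently detected genes are always kept.
        if n_cells >= cell_percentage_cutoff2 * n_cells_total:
            selected.append(gene)
            continue
        # Rarely detected genes are always dropped.
        if n_cells < cell_count_cutoff:
            continue
        # In between: keep only if expression in non-zero cells is high enough.
        nonz_mean = sum(column) / n_cells
        if nonz_mean > nonz_mean_cutoff:
            selected.append(gene)
    return selected
```

With the defaults, a gene detected in 5% or more of all cells is kept, a gene detected in fewer than 15 cells is dropped, and everything in between is decided by its mean expression in the cells where it is detected.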

2.2 Reference Model

reference

  • labels_key String, Default: None
    Key in .obs for label (cell type) information.

  • batch_key String, Default: None
    Key in .obs for batch information.

  • layer String, Default: None
    Layer in .layers to use for the reference model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.

  • categorical_covariate_key String, Default: None
    Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).

  • continuous_covariate_keys String, Default: None
    Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).

  • max_epochs Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
    Number of epochs.

  • use_gpu Boolean, Default: True
    Whether to use GPU for training.
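
The covariate parameters are given as a single comma-separated string without spaces. A small sketch of how such a value can be turned into a list of .obs keys is shown below; parse_keys is a hypothetical helper for illustration, and rejecting values that contain spaces or empty entries is an assumption about how strict a check should be, not documented panpipes behaviour.

```python
def parse_keys(value):
    """Split 'key1,key2,key3' into ['key1', 'key2', 'key3'].

    Returns None when the parameter is left empty (the default).
    """
    if value is None or value == "":
        return None
    keys = value.split(",")
    # The documented format has no spaces and no empty entries.
    invalid = [key for key in keys if key == "" or key != key.strip()]
    if invalid:
        raise ValueError(f"expected 'key1,key2,key3' without spaces, got {value!r}")
    return keys
```

For example, parse_keys("donor,chemistry") yields two keys, whereas "donor, chemistry" is rejected because of the space after the comma.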

2.3 Spatial Model

spatial

  • batch_key String, Default: None
    Key in .obs for batch information.

  • layer String, Default: None
    Layer in .layers to use for the spatial model. If None, .X will be used. Please note that Cell2Location expects raw counts as input.

  • categorical_covariate_key String, Default: None
    Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).

  • continuous_covariate_keys String, Default: None
    Comma-separated list without spaces, e.g. key1,key2,key3. Keys in .obs that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).

  • N_cells_per_location Integer, Mandatory parameter
    Expected cell abundance per voxel. Please refer to the Cell2Location documentation for more information.

  • detection_alpha Float, Mandatory parameter
    Regularization of within-experiment variation in RNA detection sensitivity. Please refer to the Cell2Location documentation for more information.

  • max_epochs Integer, Default: np.min([round((20000 / n_cells) * 400), 400])
    Number of epochs.

  • use_gpu Boolean, Default: True
    Whether to use GPU for training.
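
The default for max_epochs in both models is given as a formula rather than a fixed number. Written in plain Python, it caps training at 400 epochs and lowers the number for large datasets. In this sketch, n_cells is assumed to be the number of observations in the dataset the model is trained on, and default_max_epochs is a hypothetical name used only for illustration.

```python
def default_max_epochs(n_cells):
    """Evaluate min(round((20000 / n_cells) * 400), 400) for a dataset size."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return min(round((20000 / n_cells) * 400), 400)


# Datasets with up to 20000 observations train for the full 400 epochs,
# larger datasets get proportionally fewer.
for size in (1_000, 20_000, 40_000):
    print(size, default_max_epochs(size))
```

Setting max_epochs explicitly in the yaml replaces this default.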


You can specify whether both models (spatial and reference) should be saved with the following parameter:

save_models Boolean, Default: False
Whether to save the reference & spatial mapping models.

3. Tangram Options

For each deconvolution method you can specify whether to run it or not:

run Boolean, Default: None
Whether to run Tangram.

3.1 Feature Selection

You can select the genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv file that is then used for deconvolution. The second option is to perform gene selection via scanpy.tl.rank_genes_groups on the reference scRNA-Seq data, as suggested by Tangram. The top n_genes of each group make up the reduced gene set.
Please note that gene selection is not optional. If no csv file is provided, feature selection via scanpy.tl.rank_genes_groups is performed.
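
When a csv file is used, it needs a header row, and every listed gene must exist in both the spatial slides and the reference. The following sketch shows one way to verify this before a run. It assumes that the gene names are in the first column of the csv, which is not stated in this documentation, and both function names are hypothetical.

```python
import csv


def read_gene_list(path):
    """Read gene names from the first column of a csv, skipping the header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # the first row is the header
        return [row[0] for row in reader if row and row[0]]


def missing_genes(genes, spatial_genes, reference_genes):
    """Report which listed genes are absent from each dataset."""
    spatial = set(spatial_genes)
    reference = set(reference_genes)
    return {
        "spatial": [gene for gene in genes if gene not in spatial],
        "reference": [gene for gene in genes if gene not in reference],
    }
```

Here spatial_genes and reference_genes stand for the variable names of the respective modalities, for example the .var_names of mdata.mod["spatial"] and mdata.mod["rna"]. Two empty lists in the result mean the gene list can be used as is.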

feature_selection

  • gene_list String, Default: None
    Path to a csv file containing a reduced feature set. A header is expected in the first row of the csv. All genes in that gene list need to be present in both the spatial slides and the scRNA-Seq reference.

Parameters for scanpy.tl.rank_genes_groups gene selection

  • rank_genes

    • labels_key String, Default: None
      Which column in .obs of the reference to use for the groupby parameter of scanpy.tl.rank_genes_groups.

    • layer String, Default: None
      Which layer of the reference to use for scanpy.tl.rank_genes_groups. If None, .X is used.

    • n_genes Integer, Default: 100
      Number of top genes to select from each groupby group.

    • test_method ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var'], Default: ‘t-test_overestim_var’
      Which test method to use.

    • correction_method ['benjamini-hochberg', 'bonferroni'], Default: ‘benjamini-hochberg’
      Which p-value correction method to use. Used only for ‘t-test’, ‘t-test_overestim_var’, and ‘wilcoxon’.
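
How the per-group rankings become one reduced gene set can be sketched without scanpy. The example assumes the ranking has already been computed and is available as a mapping from group label to genes ordered from best to worst; that input format and the name reduced_gene_set are hypothetical. Taking the union of the top n_genes of each group is what the description above states, while the order of the result is an arbitrary choice made here.

```python
def reduced_gene_set(ranked_genes_per_group, n_genes=100):
    """Union of the top n_genes of every group, without duplicates.

    ranked_genes_per_group: dict mapping a group label to its ranked gene list.
    """
    selected = []
    seen = set()
    for ranked in ranked_genes_per_group.values():
        for gene in ranked[:n_genes]:
            if gene not in seen:  # a gene can rank highly in several groups
                seen.add(gene)
                selected.append(gene)
    return selected
```

Because marker genes can be shared between groups, the reduced set usually contains fewer than n_genes times the number of groups.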

3.2 Model

model

  • labels_key String, Default: None
    Key in .obs for label (cell type) information.

  • num_epochs Integer, Default: 1000
    Number of epochs.

  • device String, Default: ‘cpu’
    Which device to use.

  • kwargs
    Use kwargs to pass additional parameters to tangram.mapping_utils.map_cells_to_space. You can add or remove any parameter of that function.
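
The relationship between the named model parameters and kwargs can be pictured as building one set of keyword arguments for the mapping call. The sketch below is an assumption about how such a merge could work, not the panpipes implementation: build_mapping_arguments is a hypothetical helper, and letting entries in kwargs take precedence is a design choice made for this example. The entries learning_rate and lambda_d are used as example kwargs; check the Tangram documentation for the parameter names the function accepts.

```python
def build_mapping_arguments(model_cfg):
    """Combine num_epochs, device and kwargs into one keyword-argument dict."""
    num_epochs = model_cfg.get("num_epochs")
    device = model_cfg.get("device")
    arguments = {
        # Fall back to the documented defaults when a value is left empty.
        "num_epochs": 1000 if num_epochs is None else num_epochs,
        "device": "cpu" if device is None else device,
    }
    # Everything under kwargs is forwarded as-is (assumed to win on conflicts).
    arguments.update(model_cfg.get("kwargs") or {})
    return arguments


model_cfg = {
    "labels_key": "cell_type",
    "num_epochs": None,
    "device": "cuda:0",
    "kwargs": {"learning_rate": 0.1, "lambda_d": 0.5},
}
print(build_mapping_arguments(model_cfg))
```

The resulting dictionary could then be unpacked into the mapping function with **, next to the reference and spatial data objects.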