Spatial QC YAML

This page explains the parameters of the qc_spatial configuration YAML file. The file is generated by running panpipes qc_spatial config.
The individual steps run by the pipeline are described in the spatial QC workflow.

When running the qc workflow, panpipes provides a basic pipeline.yml file. To run the workflow on your own data, set the parameters described below in the pipeline.yml file so that they match the requirements of your data. We also provide pre-filled versions of the pipeline.yml file for the individual tutorials. You can download the different ingestion pipeline.yml files here:

Basic pipeline.yml file (not prefilled) that is generated when calling panpipes qc_spatial config: Download here
pipeline.yml file for Ingesting 10X Visium data Tutorial: Download here
pipeline.yml file for Ingesting MERFISH data Tutorial: Download here

For more information on how panpipes reads the configuration files, such as reading blocks of parameters and reusing blocks with YAML &anchors and *aliases, please check our documentation.

0. Compute Resource Options

resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which thread setting each specific task requires. The resources are specified by the following three parameters:

threads_high Integer, Default: 1
Number of threads used for high intensity computing tasks. For each thread, there must be enough memory to load all your input files at once and create the SpatialData object.
threads_medium Integer, Default: 1
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your SpatialData and do computationally light tasks.
threads_low Integer, Default: 1
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting. These tasks require much less memory than the other two.

condaenv String
Path to the conda environment that should be used to run panpipes. Leave blank if running natively or if your cluster automatically inherits the login node environment.

1. Loading Options

project String, Default: None
Project name.

submission_file String, Mandatory parameter
Path to the submission file. The submission file specifies the input files. Please refer to the general guidelines for details on the format of the file.

2. QC Options

This part of the workflow generates additional QC metrics that can be used for filtering/preprocessing. Basic QC metrics are always calculated using scanpy.pp.calculate_qc_metrics. The computation of the additional QC metrics is optional. Leave the corresponding parameters empty to skip it.

ccgenes String, Default: None
Path to the tsv file used to run the function scanpy.tl.score_genes_cell_cycle. The tsv file is expected to have two columns named cc_phase and gene_name. cc_phase can be either s or g2m. Using different column names or cc_phase values will result in an error. Please refer to the general guidelines for more information on the tsv file.
Instead of a path, the user can specify the parameter as “default” which then uses a provided tsv file.
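The layout described above can be sketched in Python. This is a minimal illustration using only the standard library, assuming a tab-separated file with a header row. The file name cc_genes_example.tsv and the four gene symbols are placeholders chosen for this example; they are not the list that panpipes ships.

```python
import csv
import os
import tempfile

# Hypothetical example rows: (cc_phase, gene_name). The gene symbols are
# placeholders for illustration, not the list shipped with panpipes.
rows = [
    ("s", "MCM5"),
    ("s", "PCNA"),
    ("g2m", "TOP2A"),
    ("g2m", "MKI67"),
]

# Hypothetical file name, written to a temporary directory.
cc_path = os.path.join(tempfile.mkdtemp(), "cc_genes_example.tsv")

with open(cc_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["cc_phase", "gene_name"])  # the two expected column names
    writer.writerows(rows)

# Read the file back and check the constraints described in the text.
with open(cc_path, newline="") as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    cc_header = reader.fieldnames
    cc_records = list(reader)

if cc_header != ["cc_phase", "gene_name"]:
    raise ValueError(f"unexpected column names: {cc_header}")

bad_phases = sorted({rec["cc_phase"] for rec in cc_records} - {"s", "g2m"})
if bad_phases:
    raise ValueError(f"unexpected cc_phase values: {bad_phases}")
```

Running a check like this before starting the workflow catches misspelled column names or phase labels early, rather than midway through a pipeline run.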

custom_genes_file String, Default: None
Path to a csv file containing a gene list with columns group and feature. Using different column names will result in an error. Please refer to the general guidelines for more information about the file.
The gene list is used to calculate the proportions of genes of a group in the cells/spots. More precisely, the groups and genes are passed to the qc_vars parameter of the function scanpy.pp.calculate_qc_metrics, which calculates the proportions accordingly.
Additionally, the gene list is used to compute gene scores with scanpy.tl.score_genes.
Instead of a path, the user can specify the parameter as “default”, which then uses a provided csv file.
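As a rough illustration of the file layout, the sketch below reads a small gene list and collects the features of each group. The group names mito, hp and rp follow the examples used on this page; the gene symbols and the helper read_gene_groups are hypothetical and only serve the example. The check assumes that group and feature must be present and does not assume they are the only columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file content; the gene symbols are placeholders.
custom_genes_csv = """group,feature
mito,MT-CO1
mito,MT-ND1
hp,HBB
rp,RPL3
rp,RPS6
"""


def read_gene_groups(text):
    """Return a dict mapping each group name to its list of features."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [name for name in ("group", "feature") if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    groups = defaultdict(list)
    for row in reader:
        groups[row["group"]].append(row["feature"])
    return dict(groups)


gene_groups = read_gene_groups(custom_genes_csv)
for group_name, features in gene_groups.items():
    print(group_name, features)
```

Each key of gene_groups corresponds to one group for which a proportion or a gene score could be requested.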

calc_proportions String, Default: None
Comma-separated string without spaces, e.g. mito,hp,rp.
Groups from the csv file specified in custom_genes_file for which to calculate percentages.

score_genes String, Default: None
Comma-separated string without spaces, e.g. mito,hp,rp.
Groups from the csv file specified in custom_genes_file for which to run scanpy.tl.score_genes.
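Both calc_proportions and score_genes take the same kind of value: a comma-separated string of group names without spaces. The sketch below shows one way to check such a value before launching the workflow. The helper parse_group_list and the set available_groups are hypothetical; whether panpipes performs an equivalent check internally is not stated here, so treat this as a user-side sanity check only.

```python
def parse_group_list(value, available_groups):
    """Split a comma-separated parameter value and validate the group names.

    An empty value returns an empty list, mirroring the convention of
    leaving a parameter empty to skip the optional computation.
    """
    if value is None or value == "":
        return []
    if " " in value:
        raise ValueError(f"value must not contain spaces: {value!r}")
    names = value.split(",")
    unknown = [name for name in names if name not in available_groups]
    if unknown:
        raise ValueError(f"groups not found in the gene list: {unknown}")
    return names


# Hypothetical groups, as they might appear in the group column of the csv file.
available_groups = {"mito", "hp", "rp"}

proportion_groups = parse_group_list("mito,hp,rp", available_groups)
score_groups = parse_group_list("", available_groups)
print(proportion_groups)
print(score_groups)
```

A value such as "mito, hp" (with a space) or a group name that does not occur in the gene list raises a ValueError in this sketch.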

The following parameters specify the QC metrics to plot in violin and spatial embedding plots. Plots are generated for each slide specified in the submission file separately.

plotqc

grouping_var String, Default: None
Comma-separated string without spaces of categorical columns in .obs, e.g. sample_id,batch. The violin plot contains one violin for each group. Not mandatory, can be left empty.
spatial_metrics String, Default: None
Comma-separated string without spaces of columns in .obs or .var, e.g. total_counts,n_genes_by_counts.
Specifies which metrics to plot. If a metric is present in both .obs and .var, both will be plotted.
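The lookup rule for spatial_metrics can be sketched as follows. The function locate_metrics and the example column sets are hypothetical stand-ins for the .obs and .var columns of a real object; the sketch only illustrates that a metric found in both places is selected twice, not how panpipes implements the plotting.

```python
def locate_metrics(spatial_metrics, obs_columns, var_columns):
    """Map each requested metric to the places where it was found."""
    plan = {}
    for metric in spatial_metrics.split(","):
        found_in = []
        if metric in obs_columns:
            found_in.append("obs")
        if metric in var_columns:
            found_in.append("var")
        plan[metric] = found_in
    return plan


# Hypothetical column names, standing in for .obs and .var of one slide.
obs_columns = {"total_counts", "n_genes_by_counts", "sample_id"}
var_columns = {"total_counts", "n_cells_by_counts"}

plot_plan = locate_metrics("total_counts,n_genes_by_counts", obs_columns, var_columns)
print(plot_plan)
```

In this example total_counts is found in both places and would therefore be plotted for both, while n_genes_by_counts is found only among the .obs columns.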