Spatial QC YAML
This page explains the parameters of the qc_spatial configuration YAML file.
The file is generated by running panpipes qc_spatial config.
The individual steps run by the pipeline are described in the spatial QC workflow.
When running the QC workflow, panpipes provides a basic pipeline.yml file.
To run the workflow on your own data, set the parameters described below in the pipeline.yml file to match the requirements of your data.
However, we also provide pre-filled versions of the pipeline.yml file for individual tutorials.
You can download the different ingestion pipeline.yml files here:
- Basic pipeline.yml file (not pre-filled) that is generated when calling panpipes qc_spatial config: Download here
- pipeline.yml file for the Ingesting 10X Visium data tutorial: Download here
- pipeline.yml file for the Ingesting MERFISH data tutorial: Download here
For more information on how panpipes reads the configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.
0. Compute Resource Options
resources
Computing resources to use, specifically the number of threads used for parallel jobs.
Check threads_tasks_panpipes for more information on which thread setting each task uses.
These resources are specified by the following three parameters:
threads_high Integer, Default: 1
Number of threads used for high-intensity computing tasks. For each thread, there must be enough memory to load all your input files at once and create the MuData object.
threads_medium Integer, Default: 1
Number of threads used for medium-intensity computing tasks. For each thread, there must be enough memory to load your MuData object and do computationally light tasks.
threads_low Integer, Default: 1
Number of threads used for low-intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting. This requires much less memory than the other two settings.
condaenv String
Path to the conda environment that should be used to run panpipes.
Leave blank if running natively or if your cluster automatically inherits the login node environment.
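As a rough sketch, the resource options might look as follows in pipeline.yml. The three thread parameters are nested under resources, as described above. The thread counts are illustrative values only, not recommendations. condaenv is left blank, as it would be for a native run.

```yaml
# Illustrative values only; adjust to the memory available per thread
resources:
  threads_high: 2     # loading all input files and creating the MuData object
  threads_medium: 2   # loading the MuData object and computationally light tasks
  threads_low: 1      # reading text files and plotting
# Leave blank if running natively or if the cluster inherits the login node environment
condaenv:
```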
1. Loading Options
project String, Default: None
Project name.
submission_file String, Mandatory parameter
Path to the submission file. The submission file specifies the input files. Please refer to the general guidelines for details on the format of the file.
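A minimal loading block could look like the sketch below. The project name and the submission file path are hypothetical placeholders. The format of the submission file itself is described in the general guidelines.

```yaml
# Hypothetical project name and placeholder path
project: spatial_qc_example
submission_file: path/to/submission_file
```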
2. QC Options
This part of the workflow generates additional QC metrics that can be used for filtering and preprocessing. Basic QC metrics are always calculated with scanpy.pp.calculate_qc_metrics. Computing the additional QC metrics is optional; leave the corresponding parameters empty to skip them.
ccgenes String, Default: None
Path to a tsv file used to run the function scanpy.tl.score_genes_cell_cycle. The tsv file is expected to have two columns named cc_phase and gene_name. Values in cc_phase can be either s or g2m. Using different column names or cc_phase values will result in an error. Please refer to the general guidelines for more information on the tsv file.
Instead of a path, you can set the parameter to “default”, which uses a provided tsv file.
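For example, ccgenes can either use the provided file or point to your own tsv file. The path in the second option is a hypothetical placeholder. The example rows in the comments only illustrate the required column layout; they are not the contents of the provided file.

```yaml
# Option 1: use the tsv file provided with panpipes
ccgenes: default
# Option 2 (alternative): hypothetical path to your own tsv file
# ccgenes: path/to/cell_cycle_genes.tsv
#
# Expected layout of your own file (tab-separated; illustrative rows):
# cc_phase  gene_name
# s         MCM5
# g2m       CDK1
```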
custom_genes_file String, Default: None
Path to a csv file containing a gene list with columns group and feature. Using different column names will result in an error. Please refer to the general guidelines for more information about the file.
The gene list is used to calculate, for each cell/spot, the proportion of counts that come from the genes of a group. More precisely, the groups and genes are passed to the qc_vars parameter of the function scanpy.pp.calculate_qc_metrics, which calculates the corresponding proportions.
Additionally, the gene list is used to compute gene scores with scanpy.tl.score_genes.
Instead of a path, you can set the parameter to “default”, which uses a provided csv file.
calc_proportions String, Default: None
Comma-separated string without spaces, e.g. mito,hp,rp.
Groups of the csv file specified in custom_genes_file for which to calculate percentages.
score_genes String, Default: None
Comma-separated string without spaces, e.g. mito,hp,rp.
Groups of the csv file specified in custom_genes_file for which to run scanpy.tl.score_genes.
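The three gene-list parameters work together: calc_proportions and score_genes select groups from the file given in custom_genes_file. Below is a sketch that assumes a hypothetical custom file defining groups named mito, hp and rp, the names used in the examples above. The rows in the comments are illustrative only.

```yaml
# Hypothetical path to your own gene list (or use: default)
custom_genes_file: path/to/custom_genes.csv
# Groups passed as qc_vars to scanpy.pp.calculate_qc_metrics to compute proportions
calc_proportions: mito,hp,rp
# Groups scored with scanpy.tl.score_genes
score_genes: mito,rp
#
# Expected layout of the csv file (illustrative rows):
# group,feature
# mito,MT-CO1
# rp,RPL3
```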
The following parameters specify the QC metrics to plot in violin plots and spatial embedding plots. Plots are generated separately for each slide specified in the submission file.
plotqc
grouping_var String, Default: None
Comma-separated string without spaces, e.g. sample_id,batch, listing categorical columns in .obs. One violin is created for each group in the violin plot. Not mandatory; can be left empty.
spatial_metrics String, Default: None
Comma-separated string without spaces, e.g. total_counts,n_genes_by_counts, listing columns in .obs or .var.
Specifies which metrics to plot. If a metric is present in both .obs and .var, both are plotted.
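Below is a sketch of the plotting block that uses the example values from the descriptions above. It assumes that sample_id exists as a categorical column in .obs. With scanpy's default naming, scanpy.pp.calculate_qc_metrics writes total_counts both per cell/spot (.obs) and per gene (.var). Under the rule above, requesting total_counts would therefore produce plots for both.

```yaml
plotqc:
  # One violin per group of this categorical .obs column; may be left empty
  grouping_var: sample_id
  # Metrics to plot; names are looked up in both .obs and .var
  spatial_metrics: total_counts,n_genes_by_counts
```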