Visualization YAML

This page explains the parameters of the visualization configuration YAML file, which is generated by running panpipes vis config.
The individual steps run by the pipeline are described in the visualization workflow. When running the visualization workflow, panpipes provides a basic pipeline.yml file. To run the workflow on your own data, you need to specify the parameters described below in the pipeline.yml file to meet the requirements of your data. However, we do provide pre-filled versions of the pipeline.yml file for individual tutorials.

For more information on the functionality panpipes provides for reading configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *aliases, please check our documentation.

You can download the different visualization pipeline.yml files here:

Basic pipeline.yml file (not prefilled) that is generated when calling panpipes vis config: Download here
pipeline.yml file for Visualizing data Tutorial: Download here

Compute resources options

resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires. Specified by the following three parameters:

threads_high Integer, Default: 1
Number of threads used for high intensity computing tasks. For each thread, there must be enough memory to load all your input files at once and create the MuData object.
threads_medium Integer, Default: 1
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your mudata and do computationally light tasks.
threads_low Integer, Default: 1
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.

condaenv String (Path)
Path to the conda environment that should be used to run panpipes. Leave blank if running natively or if your cluster automatically inherits the login node environment.
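As a rough sketch, the resource options above might appear in pipeline.yml as shown below. The nesting of the three thread parameters under resources is an assumption based on how they are grouped in this section, and the values are simply the listed defaults.

```yaml
# Sketch only: nesting under `resources` is assumed from the grouping above.
resources:
  threads_high: 1    # tasks that load all input files and create the MuData object
  threads_medium: 1  # tasks that load the MuData object for lighter computation
  threads_low: 1     # text-file loading and plotting tasks
# Left blank here: running natively, or the cluster inherits the login node environment.
condaenv:
```

Raising a thread count only helps if each thread still has the memory described for its tier.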

Loading and merging data options

Data format

sample_prefix String, Mandatory parameter, Default: test
Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.

mudata_obj String, Mandatory parameter
Path to the output file from preprocessing (e.g. ../vis/test.h5mu). Ensure that the submission file is in the right format and that the correct path is provided.

modalities
rna Boolean, Default: True
prot Boolean, Default: True
atac Boolean, Default: False
rep Boolean, Default: True
multimodal Boolean, Default: True
Set each modality to True or False depending on what is present in the input mudata_obj.
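The data format options above could be written along these lines. This is a minimal sketch using the documented defaults; the path is the illustrative one from the mudata_obj description, and placing the modality flags under a modalities key is assumed from the grouping shown above.

```yaml
# Sketch only: values are the documented defaults, not a tested configuration.
sample_prefix: test
mudata_obj: ../vis/test.h5mu   # example path from the description above
modalities:
  rna: True
  prot: True
  atac: False        # set to True only if the MuData object contains ATAC data
  rep: True
  multimodal: True
```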

grouping_vars String, Default: sample_id rna:leiden_res0.6
On dot plots and bar plots, grouping vars are used to group other features (for categorical, continuous, and feature plots). Should be provided as a list as follows:

grouping_vars: 
  - sample_id
  - rna:leiden_res0.6

Plot Markers

Check gene_list_format.md for Plot marker csv format instructions.

The csv files containing the long/short gene lists for visualisations can be specified in the vis configuration file as follows:

pipeline_vis config file: (pipeline.yml)

# the long list will be plotted in dot plots and matrix plots, one plot per group
full:
 - long_file1.csv
 - long_file2.csv
# the shorter list will be plotted on umaps as well as other plot types, one plot per group
minimal:
 - short_file1.csv

custom_markers

files
- full:
  The long list will be plotted in dot plots and matrix plots, with one plot per group.
- minimal:
  The shorter list will be plotted on umaps as well as other plot types, with one plot per group.
paired_scatter:String, Default:
Produces a scatter plot.
layers:
When different normalisations exist for a modality in the input MuData object, specify which layer to use for each modality. Set X or leave blank to use mdata[mod].X.
- rna:String, Default: logged_counts
- prot:String, Default: clr
- atac:String, Default: signac_norm
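Putting the marker files and layers together, the custom_markers block might look roughly like this. The exact nesting (files and layers as siblings under custom_markers) and the use of single values for each layer are assumptions inferred from the parameter listing above; the csv file names are the placeholder names used earlier in this section.

```yaml
# Sketch only: nesting and value types are assumed, file names are placeholders.
custom_markers:
  files:
    full:                # plotted in dot plots and matrix plots
      - long_file1.csv
      - long_file2.csv
    minimal:             # plotted on umaps and other plot types
      - short_file1.csv
    paired_scatter:      # blank by default
  layers:
    rna: logged_counts
    prot: clr
    atac: signac_norm
```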

Plot metadata variables

categorical_vars:String, Default: &categorical_vars
- all:String, Default: rep:receptor_subtype sample_id
  Metrics to be plotted on every modality.
- rna:String, Default: rna:predicted_doublets rna:phase
- prot:String, Default: prot:leiden_res0.2 prot:leiden_res1
- atac:String, Default:
- rep:String, Default: rep:has_ir
- multimodal:String, Default: leiden_totalVI mdata_colsr
continuous_vars:String, Default: &continuous_vars
- all:String, Default: leiden_res0.5
  Metrics to be plotted on every modality.
- rna:String, Default: rna:total_counts
- prot:String, Default: prot:total_counts
- atac:String, Default:
- multimodal:String, Default: rna:total_counts prot:total_counts
paired_scatter:String, Default: scatter_features.csv
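The metadata variables above could be laid out as in the following sketch. It assumes that each entry takes a list, in the same way as grouping_vars, and that the &categorical_vars and &continuous_vars defaults are YAML anchors attached to the two blocks so they can be reused elsewhere in the file. The values are the documented defaults.

```yaml
# Sketch only: list form is assumed by analogy with grouping_vars.
categorical_vars: &categorical_vars   # anchor, so the block can be reused
  all:                                # plotted on every modality
    - rep:receptor_subtype
    - sample_id
  rna:
    - rna:predicted_doublets
    - rna:phase
  prot:
    - prot:leiden_res0.2
    - prot:leiden_res1
  atac:                               # blank by default
  rep:
    - rep:has_ir
  multimodal:
    - leiden_totalVI
    - mdata_colsr
continuous_vars: &continuous_vars
  all:
    - leiden_res0.5
  rna:
    - rna:total_counts
  prot:
    - prot:total_counts
  atac:
  multimodal:
    - rna:total_counts
    - prot:total_counts
```

Entries prefixed with a modality name (for example rna:phase) refer to a column of that modality, while unprefixed entries such as sample_id are not tied to one modality.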

Plot style

Choose the plot type desired.

do_plots:

Plot each categorical variable as a bar plot. For example, categorical variable “cluster” on the x axis and the number of cells on the y axis.
- categorical_barplots:Boolean, Default: True
Plot each grouping var as a stacked bar plot, with categorical variables stacked. For example, grouping var “sample_id” on the x axis, the number of cells on the y axis, and bars colored by the categorical variable “cluster”.
- categorical_stacked_barplots:Boolean, Default: True
Plot each continuous variable as a violin plot. For example, grouping var “sample_id” on the x axis and the continuous variable “doublet_scores” on the y axis.
- continuous_violin:Boolean, Default: True
Plot marker dot plots as produced by scanpy.pl.dotplot.
- marker_dotplots:Boolean, Default: True
Plot marker matrix plots as produced by scanpy.pl.matrixplot.
- marker_matrixplots:Boolean, Default: True
Plot scatter plots as defined in the paired_scatters csv file (scatter_features.csv).
- paired_scatters:Boolean, Default: True
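The plot switches listed above might be set as in this sketch, which assumes they are nested under do_plots as the listing suggests. All values are the documented defaults; set a flag to False to skip that plot type.

```yaml
# Sketch only: nesting under `do_plots` is assumed from the listing above.
do_plots:
  categorical_barplots: True
  categorical_stacked_barplots: True
  continuous_violin: True
  marker_dotplots: True
  marker_matrixplots: True
  paired_scatters: True
```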
embedding:
Define the embedding plots (e.g. UMAP, PCA) using the modality and embedding basis specified. This plots all markers in the minimal markers csv, as well as the categorical and continuous variables.
- rna:
  - run:Boolean, Default: True
  - basis:String, Default: X_umap_mindist_0.25
- prot:
  - run:Boolean, Default: True
  - basis:String, Default: X_umap X_pca
- atac:
  - run:Boolean, Default: False
  - basis:String, Default: X_umap
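A possible layout for the embedding options is sketched below. It assumes embedding is its own block with one entry per modality, and that multiple bases (such as the two prot defaults) are given as a list; both points are inferred from the listing above rather than confirmed.

```yaml
# Sketch only: list form for `basis` is assumed; values are the documented defaults.
embedding:
  rna:
    run: True
    basis:
      - X_umap_mindist_0.25
  prot:
    run: True
    basis:
      - X_umap
      - X_pca
  atac:
    run: False       # no ATAC embedding plots by default
    basis:
      - X_umap
```

Each basis name is expected to match an embedding stored for that modality in the input MuData object.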