Visualization YAML
This documentation explains the parameters of the visualization configuration YAML file.
This file is generated by running panpipes vis config.
The individual steps run by the pipeline are described in the visualization workflow.
When running the visualization workflow, panpipes provides a basic pipeline.yml file.
To run the workflow on your own data, you need to adjust the parameters described below in the pipeline.yml file to meet the requirements of your data.
We also provide pre-filled versions of the pipeline.yml file for individual tutorials.
For more information on the functionality implemented in panpipes for reading configuration files, such as reading blocks of parameters and reusing blocks with &anchors and *scalars, please check our documentation.
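As a minimal sketch of the block reuse mentioned above: in standard YAML, a block labelled with `&name` (an anchor) can be referenced elsewhere with `*name` (which the YAML specification calls an alias). The keys `shared_vars` and `other_section` below are hypothetical and are not panpipes parameters; they only illustrate the syntax.

```yaml
# Hypothetical keys, used only to illustrate anchor/alias syntax.
shared_vars: &shared_vars      # "&shared_vars" labels this list with an anchor
  - sample_id
  - rna:leiden_res0.6

other_section:
  vars: *shared_vars           # "*shared_vars" reuses the anchored list unchanged
```

A YAML parser expands the alias when loading, so `other_section.vars` holds the same two entries as `shared_vars`.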
You can download the different visualization pipeline.yml files here:
Basic pipeline.yml file (not prefilled) that is generated when calling panpipes vis config: Download here
pipeline.yml file for the Visualizing data Tutorial: Download here
Compute resources options
resources
Computing resources to use, specifically the number of threads used for parallel jobs. Check threads_tasks_panpipes for more information on which threads each specific task requires.
Specified by the following three parameters:
threads_high Integer, Default: 1
Number of threads used for high intensity computing tasks. For each thread, there must be enough memory to load all your input files at once and create the MuData object.
threads_medium Integer, Default: 1
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your MuData object and do computationally light tasks.
threads_low Integer, Default: 1
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
condaenv String (Path)
Path to the conda environment that should be used to run panpipes.
Leave blank if running natively or if your cluster automatically inherits the login node environment.
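As an illustration, the compute resource options above might look like this in pipeline.yml. This is a sketch using the documented defaults; the nesting (the three thread settings under `resources`, with `condaenv` as a separate key) is inferred from the parameter listing and should be checked against the generated file.

```yaml
resources:
  threads_high: 1     # tasks that load all input files and build the MuData object
  threads_medium: 1   # tasks that load the MuData object for lighter computation
  threads_low: 1      # plotting and text file handling

# Left blank here: running natively, or the cluster inherits the login node environment.
condaenv:
```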
Loading and merging data options
Data format
sample_prefix String, Mandatory parameter, Default: test
Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.
mudata_obj String, Mandatory parameter
Path to the output file from preprocessing (e.g. ../vis/test.h5mu).
Ensure that the file is in the right format and that the correct path is provided.
modalities
rna Boolean, Default: True
prot Boolean, Default: True
atac Boolean, Default: False
rep Boolean, Default: True
multimodal Boolean, Default: True
Set each modality to True or False depending on what is present in the input mudata_obj.
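A sketch of how the data format options above could be written in pipeline.yml, using the documented defaults and the example path from the text. The exact nesting is an assumption based on the parameter listing.

```yaml
sample_prefix: test
mudata_obj: ../vis/test.h5mu   # example path from the text; point this at your own file

modalities:
  rna: True
  prot: True
  atac: False      # set to True only if the MuData object contains an atac modality
  rep: True
  multimodal: True
```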
grouping_vars String, Default: sample_id rna:leiden_res0.6
On dot plots and bar plots, grouping vars are used to group other features (for categorical, continuous, and feature plots).
They should be provided as a list as follows:
grouping_vars:
- sample_id
- rna:leiden_res0.6
Plot Markers
Check gene_list_format.md for instructions on the plot marker csv format.
The csv files containing the long/short gene lists for visualisations can be specified in the vis configuration file as follows:
pipeline_vis config file (pipeline.yml):
# the long list will be plotted in dot plots and matrix plots, one plot per group
full:
- long_file1.csv
- long_file2.csv
# the shorter list will be plotted on umaps as well as other plot types, one plot per group
minimal:
- short_file1.csv
custom_markers
files
full:
The long list will be plotted in dot plots and matrix plots, with one plot per group.
minimal:
The shorter list will be plotted on umaps as well as other plot types, with one plot per group.
paired_scatter: String, Default:
Produces a scatter plot. When different normalisations exist for a modality in the input MuData object, specify which layer to use, or set X or leave blank to use the mdata[mod].X assay.
layers:
rna: String, Default: logged_counts
prot: String, Default: clr
atac: String, Default: signac_norm
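Putting the marker options together, a possible layout is sketched below. The file names are the placeholders from the example above, the layer names are the documented defaults, and the nesting of `files` and `layers` under `custom_markers` is an assumption inferred from the order of the parameter listing; compare it with the generated pipeline.yml before use.

```yaml
custom_markers:
  files:
    # long lists: dot plots and matrix plots, one plot per group
    full:
      - long_file1.csv
      - long_file2.csv
    # short lists: umaps and other plot types, one plot per group
    minimal:
      - short_file1.csv
    paired_scatter:            # blank, matching the empty default listed above
  layers:
    rna: logged_counts
    prot: clr
    atac: signac_norm
```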
Plot metadata variables
categorical_vars: String, Default: &categorical_vars
all: String, Default: rep:receptor_subtype sample_id
Metrics to be plotted on every modality.
rna: String, Default: rna:predicted_doublets rna:phase
prot: String, Default: prot:leiden_res0.2 prot:leiden_res1
atac: String, Default:
rep: String, Default: rep:has_ir
multimodal: String, Default: leiden_totalVI mdata_colsr
continuous_vars: String, Default: &continuous_vars
all: String, Default: leiden_res0.5
Metrics to be plotted on every modality.
rna: String, Default: rna:total_counts
prot: String, Default: prot:total_counts
atac: String, Default:
multimodal: String, Default: rna:total_counts prot:total_counts
paired_scatter: String, Default: scatter_features.csv
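The metadata variables above could be laid out as follows. This sketch assumes that space-separated defaults are written as YAML list entries (as in the grouping_vars example earlier) and that `&categorical_vars` and `&continuous_vars` are anchors attached to the respective blocks; values are copied from the listed defaults.

```yaml
categorical_vars: &categorical_vars
  all:                         # plotted on every modality
    - rep:receptor_subtype
    - sample_id
  rna:
    - rna:predicted_doublets
    - rna:phase
  prot:
    - prot:leiden_res0.2
    - prot:leiden_res1
  atac:
  rep:
    - rep:has_ir
  multimodal:
    - leiden_totalVI
    - mdata_colsr

continuous_vars: &continuous_vars
  all:                         # plotted on every modality
    - leiden_res0.5
  rna:
    - rna:total_counts
  prot:
    - prot:total_counts
  atac:
  multimodal:
    - rna:total_counts
    - prot:total_counts

paired_scatter: scatter_features.csv
```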
Plot style
Choose the plot type desired.
do_plots:
categorical_barplots: Boolean, Default: True
Plot each categorical variable as a bar plot. For example, categorical variable “cluster” on the x axis and number of cells on the y axis.
categorical_stacked_barplots: Boolean, Default: True
Plot each grouping var as a bar plot, with categorical variables stacked. For example, grouping var “sample_id” on the x axis and number of cells on the y axis, colored by categorical variable “cluster” in a stack.
continuous_violin: Boolean, Default: True
Plot each continuous variable as a violin plot. For example, grouping var “sample_id” on the x axis and the continuous variable “doublet_scores” on the y axis.
marker_dotplots: Boolean, Default: True
Plot marker dot plots as produced by scanpy.pl.dotplot.
marker_matrixplots: Boolean, Default: True
Plot marker matrix plots as produced by scanpy.pl.matrixplot.
paired_scatters: Boolean, Default: True
Plot scatter plots as defined in the paired_scatters csv file (scatter_features.csv).
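For reference, the plot type switches above might appear in pipeline.yml roughly as shown here, with every plot type enabled as in the documented defaults. The nesting under `do_plots` is inferred from the listing.

```yaml
do_plots:
  categorical_barplots: True           # one bar plot per categorical variable
  categorical_stacked_barplots: True   # grouping var on x, categorical variable stacked
  continuous_violin: True              # one violin plot per continuous variable
  marker_dotplots: True
  marker_matrixplots: True
  paired_scatters: True                # uses the paired scatter csv file
```

Setting an entry to False should skip that plot type, which can be useful when a marker or scatter csv file has not been provided.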
embedding:
Define the embedding plots (e.g. UMAP, PCA) using the modality and embedding basis specified. This will plot all of the minimal markers csv, as well as the categorical and continuous variables.
rna:
run: Boolean, Default: True
basis: String, Default: X_umap_mindist_0.25
prot:
run: Boolean, Default: True
basis: String, Default: X_umap X_pca
atac:
run: Boolean, Default: False
basis: String, Default: X_umap
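A sketch of the embedding block with the documented defaults. It assumes that multiple bases for one modality (such as X_umap and X_pca for prot) are given as list entries, and that `embedding` sits at the level shown; both points should be confirmed against the generated pipeline.yml.

```yaml
embedding:
  rna:
    run: True
    basis:
      - X_umap_mindist_0.25
  prot:
    run: True
    basis:
      - X_umap
      - X_pca
  atac:
    run: False       # no atac embedding plots are produced while this is False
    basis:
      - X_umap
```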