Preprocessing spatial data
The preprocess_spatial workflow filters the data and then preprocesses it by normalization, highly variable gene (HVG) selection, and PCA computation. Multiple MuData objects of the same assay (Visium or Vizgen), each with a spatial modality, can be filtered and preprocessed in one run.
Steps
If multiple MuData objects are provided, the following steps are run for each with the same parameter setting.
The MuData object is filtered by the thresholds specified in the pipeline.yml. Filtering is optional: to skip it, set the run parameter under filtering to False in the pipeline.yml.
Post-filter plotting is performed (only when the data was filtered, i.e. run: True). The metrics specified in the pipeline.yml are plotted in violin and spatial embedding plots. Plots are saved into the ./figures/spatial directory.
Data is normalized and HVGs are selected. Before normalization, raw counts are saved into .layers["raw_counts"], if not already present. Normalized counts are saved into .X and into either .layers["lognorm"] or .layers["norm_pearson_resid"], depending on the chosen normalization. HVGs are saved into .var["highly_variable"].
PCA is computed and plotted. PCA plots are also saved into the ./figures/spatial directory.
The final MuData object is saved into the ./filtered.data directory.
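The layer bookkeeping in the normalization step can be illustrated with a small stand-in. This is a minimal sketch, not the panpipes implementation: it uses a plain dictionary instead of a MuData object, covers only log-normalization, and the function names and the target_sum value are assumptions made for illustration. Only the layer names raw_counts and lognorm come from the description above.

```python
import math

# Layer names are taken from the text above; everything else is illustrative.
RAW_LAYER = "raw_counts"
LOGNORM_LAYER = "lognorm"


def log_normalize(counts, target_sum=10_000):
    """Scale each row (one spot or cell) to target_sum, then apply log(1 + x).

    target_sum is an assumed value, not a documented panpipes default.
    """
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in row])
    return normalized


def normalize_in_place(obj):
    """Mimic the bookkeeping on a dict with the keys "X" and "layers"."""
    layers = obj.setdefault("layers", {})
    # Raw counts are copied only if the layer is not already present.
    if RAW_LAYER not in layers:
        layers[RAW_LAYER] = [list(row) for row in obj["X"]]
    normalized = log_normalize(obj["X"])
    # Normalized values go into both X and the method-specific layer.
    obj["X"] = normalized
    layers[LOGNORM_LAYER] = [list(row) for row in normalized]
    return obj


# Toy matrix: two spots, two genes.
toy = {"X": [[1, 3], [0, 2]], "layers": {}}
normalize_in_place(toy)
print(sorted(toy["layers"]))  # ['lognorm', 'raw_counts']
```

Keeping an untouched copy of the counts before overwriting X means the raw values remain available after normalization, for example to rerun the step with a different normalization.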
Steps to run
Activate conda environment
conda activate pipeline_env
Generate yaml and log file
panpipes preprocess_spatial config
Specify the parameter setting in the pipeline.yml file
Run complete preprocess workflow with
panpipes preprocess_spatial make full --local
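If the workflow is launched from a script rather than typed by hand, the two panpipes commands above can be wrapped as follows. This is a hedged sketch: the helper names build_steps and run_step and the dry_run flag are hypothetical, and the conda activation is left out because activating an environment in a child process does not affect later commands. The pipeline.yml still has to be edited between the config step and the full run.

```python
import shlex
import subprocess

# Commands copied verbatim from the steps above.
CONFIG_CMD = "panpipes preprocess_spatial config"
RUN_CMD = "panpipes preprocess_spatial make full --local"


def build_steps():
    """Return the two commands as argument lists, in the order they are run."""
    return [shlex.split(CONFIG_CMD), shlex.split(RUN_CMD)]


def run_step(argv, dry_run=True):
    """Print a command and execute it only when dry_run is False."""
    print("$ " + shlex.join(argv))
    if not dry_run:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(argv, check=True)


steps = build_steps()
for argv in steps:
    # Dry run by default: nothing is executed, the commands are only printed.
    run_step(argv, dry_run=True)
```

Calling run_step with dry_run=False executes a command for real, which assumes the environment that provides panpipes is already active.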
The Preprocessing spatial data tutorial guides you through the preprocessing step by step.