Preprocessing spatial data
The preprocess_spatial workflow filters the data and then preprocesses it by normalizing the counts, selecting highly variable genes (HVGs), and computing a PCA. Multiple SpatialData objects of the same assay (Visium, Vizgen, or Xenium) can be filtered and preprocessed in one run.
Steps
If multiple SpatialData objects are provided, the following steps are run for each of them with the same parameter settings.
1. The SpatialData object is filtered by the thresholds specified in the pipeline.yml. Note that the filtering step is optional: you can skip it by setting the run parameter under filtering to False in the pipeline.yml.
2. Post-filter plotting is performed (only when the data was filtered, i.e. run: True). The metrics specified in the pipeline.yml are plotted as violin and spatial embedding plots, which are saved into the ./figures/spatial directory.
3. The data is normalized and HVGs are selected. Before normalization, raw counts are saved into .layers["raw_counts"] if not already present. Normalized counts are saved into .X and into .layers["lognorm"] or .layers["norm_pearson_resid"], depending on the chosen normalization. HVGs are saved into .var["highly_variable"].
4. PCA is computed and plotted. PCA plots are also saved into the ./figures/spatial directory.
5. The final SpatialData object is saved into the ./filtered.data directory as a zarr file.
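The filtering step can be pictured as keeping only the cells whose QC metrics fall within configured bounds. The following is a minimal plain-Python sketch of that idea, not the panpipes implementation: the metric names (total_counts, n_genes_by_counts), the thresholds dictionary and the filter_cells function are hypothetical and do not reproduce the actual pipeline.yml keys. The run argument mimics the optional filtering switch described in step 1.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-metric bounds: (min, max); None means "no bound on that side".
Thresholds = Dict[str, Tuple[Optional[float], Optional[float]]]


def passes(metrics: Dict[str, float], thresholds: Thresholds) -> bool:
    """Return True if every thresholded metric lies within its (inclusive) bounds."""
    for name, (low, high) in thresholds.items():
        value = metrics[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def filter_cells(
    cells: Dict[str, Dict[str, float]],
    thresholds: Thresholds,
    run: bool = True,
) -> Dict[str, Dict[str, float]]:
    """Keep the cells that pass all thresholds; with run=False, skip filtering."""
    if not run:
        return dict(cells)
    return {cid: m for cid, m in cells.items() if passes(m, thresholds)}


# Illustrative data only (hypothetical metric names and values).
cells = {
    "cell_1": {"total_counts": 1200.0, "n_genes_by_counts": 300.0},
    "cell_2": {"total_counts": 40.0, "n_genes_by_counts": 15.0},
    "cell_3": {"total_counts": 90000.0, "n_genes_by_counts": 5000.0},
}
thresholds = {
    "total_counts": (100.0, 50000.0),
    "n_genes_by_counts": (50.0, None),
}
kept = filter_cells(cells, thresholds)
print(sorted(kept))  # ['cell_1']
```

Setting run=False returns all cells unchanged, which corresponds to skipping both the filtering and the post-filter plotting.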
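A hedged sketch of the normalization part of step 3, assuming the common approach of scaling each cell to a fixed total count followed by log1p. A plain dictionary stands in for the AnnData table (its "X" and "layers" keys are an assumption of this sketch), the function name lognormalize is hypothetical, and target_sum=1e4 is only an illustrative choice, not a documented panpipes default. The Pearson-residual normalization that writes .layers["norm_pearson_resid"] is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = cells, columns = genes


def lognormalize(adata: dict, target_sum: float = 1e4) -> None:
    """Scale each cell to target_sum total counts, then apply log1p, in place."""
    layers = adata.setdefault("layers", {})
    # Save raw counts once; later calls reuse them instead of overwriting.
    if "raw_counts" not in layers:
        layers["raw_counts"] = [row[:] for row in adata["X"]]
    normalized: Matrix = []
    for row in layers["raw_counts"]:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0  # empty cells stay at 0
        normalized.append([math.log1p(v * scale) for v in row])
    # Normalized values go to X and to a dedicated layer, as described in step 3.
    adata["X"] = normalized
    layers["lognorm"] = [row[:] for row in normalized]


adata = {"X": [[1.0, 3.0], [0.0, 10.0]]}
lognormalize(adata, target_sum=4.0)
print(adata["layers"]["raw_counts"])  # [[1.0, 3.0], [0.0, 10.0]]
```

Because raw counts are copied only when the raw_counts layer is missing, and normalization always starts from that layer, calling the function a second time gives the same result instead of normalizing already-normalized values.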
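HVG selection can be illustrated by flagging the genes with the largest variance across cells. This is a deliberately simplified stand-in, not the seurat, cell_ranger or seurat_v3 flavours offered by scanpy (which panpipes presumably builds on); flag_highly_variable and n_top_genes are illustrative names. The returned list of booleans plays the role of .var["highly_variable"].

```python
from statistics import pvariance
from typing import List


def flag_highly_variable(X: List[List[float]], n_top_genes: int) -> List[bool]:
    """Flag the n_top_genes columns (genes) with the largest variance across rows (cells)."""
    n_genes = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_genes)]
    # Stable sort by decreasing variance: ties keep the lower gene index first.
    ranked = sorted(range(n_genes), key=lambda j: -variances[j])
    top = set(ranked[:n_top_genes])
    return [j in top for j in range(n_genes)]


# Normalized values: rows = cells, columns = genes (illustrative only).
X = [
    [0.0, 1.0, 5.0],
    [0.0, 3.0, 0.0],
    [0.0, 2.0, 10.0],
]
var = {"highly_variable": flag_highly_variable(X, n_top_genes=2)}
print(var["highly_variable"])  # [False, True, True]
```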
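The PCA of step 4 can be sketched in plain Python with power iteration and deflation on the covariance matrix. This swaps in a textbook technique for illustration only; libraries such as scanpy compute PCA with SVD-based solvers, and the function names center, covariance, top_components and project are hypothetical. Rows are cells and columns are (for example, highly variable) genes.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = cells, columns = genes


def center(X: Matrix) -> Matrix:
    """Subtract each column's mean."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance (genes x genes) of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(X: Matrix, k: int, iters: int = 500, seed: int = 0) -> List[List[float]]:
    """Top-k principal axes (unit vectors) via power iteration plus deflation."""
    C = covariance(center(X))
    d = len(C)
    rng = random.Random(seed)
    components: List[List[float]] = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return components  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance) along v.
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        # Deflate so the next pass converges to the next-largest component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return components


def project(X: Matrix, components: List[List[float]]) -> Matrix:
    """Coordinates of each cell on the principal axes (a PCA embedding)."""
    return [[sum(a * b for a, b in zip(row, pc)) for pc in components] for row in center(X)]


X = [[-2.0, 0.0], [2.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
pcs = top_components(X, k=2)
scores = project(X, pcs)
```

The scores play the role of a per-cell PCA embedding; as in any PCA, the sign of each axis is arbitrary.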
If multiple SpatialData objects have been preprocessed, you have the option to concatenate all of them in the last step of the preprocessing:
(Optional) All SpatialData objects in ./filtered.data are concatenated and saved to ./concatenated.data/concatenated.zarr
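The optional concatenation step can be pictured as stacking the cells of all samples that share a gene set, while keeping cell names unique and recording each cell's sample of origin. This is a minimal sketch with plain dictionaries, assuming each sample holds obs_names, var_names and X; it does not reproduce how panpipes or the SpatialData library concatenate objects (images, shapes and coordinate systems are ignored), and concatenate is a hypothetical name.

```python
from typing import Dict, List, Optional


def concatenate(samples: Dict[str, dict]) -> dict:
    """Stack the cells of several samples that share the same genes.

    Cell names are prefixed with the sample key to keep them unique, and the
    originating sample is recorded per cell.
    """
    var_names: Optional[List[str]] = None
    obs_names: List[str] = []
    sample_col: List[str] = []
    X: List[List[float]] = []
    for key, s in samples.items():
        if var_names is None:
            var_names = list(s["var_names"])
        elif list(s["var_names"]) != var_names:
            raise ValueError(f"sample {key!r} has a different gene set")
        for name, row in zip(s["obs_names"], s["X"]):
            obs_names.append(f"{key}_{name}")
            sample_col.append(key)
            X.append(list(row))
    return {"obs_names": obs_names, "sample": sample_col, "var_names": var_names or [], "X": X}


a = {"obs_names": ["c1", "c2"], "var_names": ["g1", "g2"], "X": [[1.0, 0.0], [2.0, 3.0]]}
b = {"obs_names": ["c1"], "var_names": ["g1", "g2"], "X": [[5.0, 5.0]]}
merged = concatenate({"sampleA": a, "sampleB": b})
print(merged["obs_names"])  # ['sampleA_c1', 'sampleA_c2', 'sampleB_c1']
```

Prefixing with the sample key avoids clashes when different samples reuse the same cell barcodes, which is why both samples can contain a cell called c1.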
Steps to run
1. Activate the conda environment:
   conda activate pipeline_env
2. Generate the yaml and log file:
   panpipes preprocess_spatial config
3. Specify the parameter settings in the pipeline.yml file.
4. Run the complete preprocessing workflow with:
   panpipes preprocess_spatial make full --local
The Preprocessing spatial data tutorial guides you through the preprocessing step by step.