Ingesting spatial data
The qc_spatial workflow ingests Vizgen, Visium, or Xenium data and saves it as SpatialData objects.
A key difference from the cell suspension ingestion workflow is that the input data are not concatenated into a single matrix; instead, each sample is kept as a separate SpatialData object. This avoids introducing technical batch effects during processing when tissue slides differ strongly in cell composition.
Steps
- Data is ingested into SpatialData objects.
  - The workflow generates one SpatialData per dataset.
  - SpatialData objects of the raw data are saved into ./tmp as zarr files
- QC metrics are computed using scanpy functionalities:
  - Basic QC metrics are computed using sc.pp.calculate_qc_metrics
  - (Optional) Compute cell-cycle scores using sc.tl.score_genes_cell_cycle. For that, the default gene list can be used or a path to a tsv file can be specified.
  - (Optional) Custom genes actions. Default gene list can be used or a path to a csv file can be specified.
    - Calculate proportions of gene groups, e.g. mitochondrial genes
    - Score genes using sc.tl.score_genes
- SpatialData objects with calculated QC metrics are saved in qc.data
  - Metadata (.obs) is saved into the current directory as tsv files
- Specified QC metrics are plotted in violin and spatial embedding plots
  - For Vizgen data, additional histograms are plotted
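To make the basic QC metrics and the gene-group proportions more concrete, here is a minimal plain-Python sketch of three per-cell quantities: total counts, number of detected genes, and the percentage of counts from mitochondrial genes. It is an illustration under stated assumptions, not the workflow's code: the workflow computes its metrics with scanpy, and the function name basic_qc_metrics, the toy counts, and the "MT-" prefix used to recognize mitochondrial genes are all hypothetical.

```python
# Toy illustration only: plain-Python versions of three per-cell QC metrics.
# The gene names, counts, and the "MT-" prefix convention are assumptions.


def basic_qc_metrics(counts, gene_names, mt_prefix="MT-"):
    """Return one dict of QC metrics per cell (one row of a dense count matrix)."""
    # Flag the genes that belong to the mitochondrial group.
    is_mt = [name.upper().startswith(mt_prefix) for name in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)
        mt_total = sum(count for count, mt in zip(row, is_mt) if mt)
        metrics.append(
            {
                # Sum of all counts in the cell.
                "total_counts": total,
                # Number of genes with at least one count.
                "n_genes_by_counts": sum(1 for count in row if count > 0),
                # Share of counts from the gene group, in percent.
                # Returning 0.0 for empty cells is a choice made for this sketch.
                "pct_counts_mt": 100.0 * mt_total / total if total else 0.0,
            }
        )
    return metrics


gene_names = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
counts = [
    [5, 5, 30, 60],  # cell 1
    [0, 2, 0, 18],   # cell 2
]

qc = basic_qc_metrics(counts, gene_names)
for cell_metrics in qc:
    print(cell_metrics)
```

The same pattern extends to any other gene group: only the rule that flags the group's genes changes.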
Steps to run
1. Generate sample submission file. You can find more information about the generation here
2. (Optional) Generate QC gene lists as described in gene list format
3. Activate conda environment
   conda activate pipeline_env
4. Generate yaml and log file
   panpipes qc_spatial config
5. Specify the parameter setting in the pipeline.yml file
6. Run complete QC workflow with
   panpipes qc_spatial make full --local
7. Use outputs to decide filtering thresholds
   Note that the actual filtering occurs in the first step of the preprocess_spatial workflow
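One way to use the outputs when deciding filtering thresholds is to summarize the distribution of a QC metric from the saved metadata tsv. The sketch below uses only the Python standard library. The file name example_cell_metadata.tsv, the helper summarize_metric, and the column total_counts are assumptions for illustration; the demo writes its own toy file, so substitute the tsv and column names your run actually produces.

```python
import csv
import os
import statistics
import tempfile


def summarize_metric(tsv_path, column):
    """Summarize one numeric metadata column from a tab-separated file."""
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Skip empty entries; every other value is expected to be numeric.
        values = [float(row[column]) for row in reader if row[column] != ""]
    # 99 cut points; index 4 is the 5th percentile, index 94 the 95th.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "n_cells": len(values),
        "min": min(values),
        "p05": cuts[4],
        "median": statistics.median(values),
        "p95": cuts[94],
        "max": max(values),
    }


# Demo on a toy file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "example_cell_metadata.tsv")
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["cell_id", "total_counts"])
        for i in range(101):
            writer.writerow([f"cell_{i}", i])
    summary = summarize_metric(tsv_path, "total_counts")

print(summary)
```

Percentiles such as these are only a starting point; they are best read alongside the violin and spatial plots before any threshold is set for the preprocess_spatial workflow.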
The Ingesting 10X Visium data with Panpipes and Ingesting MERFISH data with Panpipes tutorials guide you through the ingestion step by step.