Scanpy `standard_scale`: var vs group, finally clear (with manual calculations)

Deep-dive into standard_scale='var' vs 'group' in Scanpy dotplots & matrixplots, complete with step-by-step math and when to choose each.

[Figure: Scanpy – Single-Cell Analysis in Python (source: Scanpy)]

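When creating dotplots or matrixplots in Scanpy, the standard_scale parameter dramatically changes what the reader perceives, even though both of its options use the same min-max formula (sc.pl.heatmap has a standard_scale parameter too, but it works on individual cells and accepts "var" or "obs" instead of "group"):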

\[\text{scaled} = \frac{x - \min}{\max - \min} \quad \in [0,1]\]

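Here x is one entry of the group × gene matrix of mean expression values. The only difference between the two options is the axis along which min and max are calculated.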

  • standard_scale="var" → column-wise (per gene, across groups)
  • standard_scale="group" → row-wise (per group, across genes)
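Written out with indices (the notation is my own, for illustration: x_{gj} is the mean expression of gene j in group g), the two options differ only in which index the min and max run over:

\[\text{var:}\quad \tilde{x}_{gj} = \frac{x_{gj} - \min_{g'} x_{g'j}}{\max_{g'} x_{g'j} - \min_{g'} x_{g'j}}\]

\[\text{group:}\quad \tilde{x}_{gj} = \frac{x_{gj} - \min_{j'} x_{gj'}}{\max_{j'} x_{gj'} - \min_{j'} x_{gj'}}\]

In "var" the min and max range over all groups g' for a fixed gene j; in "group" they range over all genes j' for a fixed group g.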

Let’s compute both versions by hand on a tiny but realistic toy dataset so the difference becomes intuitive.


Toy dataset: mean expression per cluster

| Cluster | CD68 | CD3D | CD19 | ACTB |
|---|---|---|---|---|
| Macrophages | 200 | 20 | 15 | 150 |
| T cells | 10 | 180 | 25 | 140 |
| B cells | 25 | 30 | 190 | 145 |

Classic lineage markers (CD68 for macrophages, CD3D for T cells, CD19 for B cells) plus one housekeeping gene (ACTB).
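To follow the manual calculations numerically, here is the same table as a NumPy array (the variable names are my own, chosen for illustration), together with the per-column and per-row min/max that the two options rely on:

```python
import numpy as np

# Toy table from above: rows = clusters, columns = genes (mean expression)
clusters = ["Macrophages", "T cells", "B cells"]
genes = ["CD68", "CD3D", "CD19", "ACTB"]
X = np.array([
    [200, 20, 15, 150],   # Macrophages
    [10, 180, 25, 140],   # T cells
    [25, 30, 190, 145],   # B cells
], dtype=float)

# "var" needs one min/max per gene: reduce down each column (axis=0)
col_min, col_max = X.min(axis=0), X.max(axis=0)
# "group" needs one min/max per cluster: reduce along each row (axis=1)
row_min, row_max = X.min(axis=1), X.max(axis=1)

for gene, lo, hi in zip(genes, col_min, col_max):
    print(f"{gene}: min={lo:g}, max={hi:g}, range={hi - lo:g}")
for cluster, lo, hi in zip(clusters, row_min, row_max):
    print(f"{cluster}: min={lo:g}, max={hi:g}, range={hi - lo:g}")
```

The printed ranges (190, 160, 175, 10 per gene; 185, 170, 165 per cluster) are the denominators used in the two walkthroughs that follow.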


Option 1: standard_scale="var" (scale each gene across clusters)

Goal: answer “Which cluster expresses this gene the most / least?”

For every column (gene), compute its own min & max. Values are listed in the order Macrophages | T cells | B cells.

CD68 (200, 10, 25) → min=10, max=200, range=190
Macrophages: (200-10)/190 = 1.00
T cells: (10-10)/190 = 0.00
B cells: (25-10)/190 ≈ 0.08

CD3D (20, 180, 30) → min=20, max=180, range=160
→ 0.00 | 1.00 | 0.06

CD19 (15, 25, 190) → min=15, max=190, range=175
→ 0.00 | 0.06 | 1.00

ACTB (150, 140, 145) → min=140, max=150, range=10
→ 1.00 | 0.00 | 0.50
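The column-wise calculation above can be reproduced in a few lines of NumPy. This is a minimal sketch, not Scanpy's own code: `minmax_scale` is a hypothetical helper, and mapping a constant column (max == min) to 0 instead of dividing by zero is my own choice for the sketch.

```python
import numpy as np

def minmax_scale(values: np.ndarray, axis: int) -> np.ndarray:
    """Min-max scale to [0, 1] along `axis` (0 = per column, 1 = per row).

    Hypothetical helper for illustration. A constant slice (max == min)
    would be 0/0; here it is divided by 1 instead, so it becomes 0.
    """
    lo = values.min(axis=axis, keepdims=True)
    rng = values.max(axis=axis, keepdims=True) - lo
    safe_rng = np.where(rng == 0, 1.0, rng)  # avoid division by zero
    return (values - lo) / safe_rng

X = np.array([
    [200, 20, 15, 150],   # Macrophages
    [10, 180, 25, 140],   # T cells
    [25, 30, 190, 145],   # B cells
], dtype=float)           # columns: CD68, CD3D, CD19, ACTB

scaled_var = minmax_scale(X, axis=0)  # standard_scale="var": one min/max per gene
print(np.round(scaled_var, 2))
```

Rounded to two decimals, the rows match the "var" matrix below.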

Scaled matrix ("var"):

| Cluster | CD68 | CD3D | CD19 | ACTB |
|---|---|---|---|---|
| Macrophages | 1.00 | 0.00 | 0.00 | 1.00 |
| T cells | 0.00 | 1.00 | 0.06 | 0.00 |
| B cells | 0.08 | 0.06 | 1.00 | 0.50 |

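Key insight
After "var" scaling you cannot compare intensity across genes: every gene is stretched to use the full [0, 1] range independently. ACTB, which only varies between 140 and 150, ends up looking just as cluster-specific as CD68, which spans 10 to 200.
Ideal for spotting marker specificity, as long as you remember that small differences get inflated.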
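A quick NumPy check of that caveat on the toy table: before scaling, ACTB's spread and fold change across clusters are tiny compared with the markers, yet "var" maps every gene onto the same 0 to 1 range. (Variable names are illustrative.)

```python
import numpy as np

genes = ["CD68", "CD3D", "CD19", "ACTB"]
X = np.array([
    [200, 20, 15, 150],   # Macrophages
    [10, 180, 25, 140],   # T cells
    [25, 30, 190, 145],   # B cells
], dtype=float)

# Spread of each gene across clusters before scaling
ranges = X.max(axis=0) - X.min(axis=0)
# Fold change between the highest and lowest cluster for each gene
fold = X.max(axis=0) / X.min(axis=0)

for gene, r, f in zip(genes, ranges, fold):
    print(f"{gene}: raw range {r:g}, fold change {f:.2f} -> scaled range 1")
```

CD68 differs 20-fold between clusters while ACTB differs by only about 7%, a difference that "var" scaling hides.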


Option 2: standard_scale="group" (scale each cluster across genes)

Goal: answer “Which genes stand out the most inside this cluster?”

For every row, compute its own min & max.

Macrophages (200, 20, 15, 150) → min=15, max=200, range=185
CD68: 1.00 | CD3D: 0.03 | CD19: 0.00 | ACTB: 0.73

T cells (10, 180, 25, 140) → min=10, max=180, range=170
→ 0.00 | 1.00 | 0.09 | 0.76

B cells (25, 30, 190, 145) → min=25, max=190, range=165
→ 0.00 | 0.03 | 1.00 | 0.73
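The row-wise version is the same sketch with `axis=1`. Again `minmax_scale` is a hypothetical helper, not Scanpy's implementation; the final check also shows that "group" scaling is just "var" scaling applied to the transposed matrix.

```python
import numpy as np

def minmax_scale(values: np.ndarray, axis: int) -> np.ndarray:
    """Min-max scale to [0, 1] along `axis` (hypothetical helper).

    Constant slices (max == min) are divided by 1, so they become 0.
    """
    lo = values.min(axis=axis, keepdims=True)
    rng = values.max(axis=axis, keepdims=True) - lo
    safe_rng = np.where(rng == 0, 1.0, rng)
    return (values - lo) / safe_rng

X = np.array([
    [200, 20, 15, 150],   # Macrophages
    [10, 180, 25, 140],   # T cells
    [25, 30, 190, 145],   # B cells
], dtype=float)           # columns: CD68, CD3D, CD19, ACTB

scaled_group = minmax_scale(X, axis=1)  # standard_scale="group": one min/max per cluster
print(np.round(scaled_group, 2))

# Scaling rows of X is the same as scaling columns of X transposed
assert np.allclose(scaled_group, minmax_scale(X.T, axis=0).T)
```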

Scaled matrix ("group"):

| Cluster | CD68 | CD3D | CD19 | ACTB |
|---|---|---|---|---|
| Macrophages | 1.00 | 0.03 | 0.00 | 0.73 |
| T cells | 0.00 | 1.00 | 0.09 | 0.76 |
| B cells | 0.00 | 0.03 | 1.00 | 0.73 |

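Key insight
Now a gene's values cannot be compared across clusters: each cluster has its own internal [0, 1] stretch. ACTB, for example, scores higher in T cells (0.76) than in macrophages (0.73), even though its raw mean is lower there (140 vs 150).
Excellent for showing intra-cluster gene ranking or signature composition.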
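The ACTB inversion can be verified directly: a minimal NumPy sketch of row-wise min-max scaling on the toy table (index variables are illustrative), comparing raw and scaled ACTB in macrophages and T cells.

```python
import numpy as np

X = np.array([
    [200, 20, 15, 150],   # Macrophages
    [10, 180, 25, 140],   # T cells
    [25, 30, 190, 145],   # B cells
], dtype=float)           # columns: CD68, CD3D, CD19, ACTB

# standard_scale="group": min-max along each row (cluster); no row is constant here
lo = X.min(axis=1, keepdims=True)
hi = X.max(axis=1, keepdims=True)
scaled_group = (X - lo) / (hi - lo)

actb = 3          # column index of ACTB
mac, tcell = 0, 1  # row indices of Macrophages and T cells
print("raw ACTB   :", X[mac, actb], "vs", X[tcell, actb])
print("scaled ACTB:", round(scaled_group[mac, actb], 2), "vs", round(scaled_group[tcell, actb], 2))
```

The raw order (macrophages higher) flips after scaling, because each row is normalized by its own range (185 for macrophages, 170 for T cells).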


Quick decision guide

| Question you want the figure to answer | Recommended standard_scale |
|---|---|
| Which cluster has the highest expression of gene X? | "var" |
| Within this cluster, which genes are relatively strongest? | "group" |
| I want absolute values comparable across both genes and clusters | None (no scaling, the default) |

Ready-to-copy Scanpy snippets

```python
import scanpy as sc

# Assumes an AnnData with log-normalized expression and a categorical
# obs column to group by; the file path is a placeholder.
adata = sc.read_h5ad("your_data.h5ad")

marker_genes = ["CD68", "CD3D", "CD19", "ACTB"]
groupby = "celltype"          # or "leiden", "anno", ...

# Most common: per-gene scaling (marker view)
sc.pl.matrixplot(adata, marker_genes, groupby,
                 standard_scale="var",
                 cmap="Blues", dendrogram=True,
                 title="Per-gene scaling: marker specificity")

# Per-group scaling (signature composition view)
sc.pl.matrixplot(adata, marker_genes, groupby,
                 standard_scale="group",
                 cmap="YlOrRd", dendrogram=True,
                 title="Per-group scaling: intra-cluster ranking")

# Raw / absolute reference (often surprisingly useful)
sc.pl.matrixplot(adata, marker_genes, groupby,
                 standard_scale=None,
                 cmap="viridis",
                 # layer="log1p",  # only if your AnnData has a layer with this name
                 title="No scaling: absolute means")
```

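Bonus: sc.pl.dotplot(…) accepts the same standard_scale values, and the same interpretation applies. In a dotplot the scaling affects only the dot color (mean expression); the dot size, the fraction of cells expressing the gene, is not rescaled.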
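To make the dotplot remark concrete, here is a NumPy sketch for a single gene on made-up per-cell values (hypothetical data, not derived from the toy table). It assumes what I understand to be dotplot's defaults, the mean taken over all cells in a group (mean_only_expressed=False) and an expression cutoff of 0 (expression_cutoff=0.0), and shows that "var" scaling changes only the color values.

```python
import numpy as np

# Hypothetical per-cell expression of ONE gene, grouped by cluster (made-up values)
cells = {
    "Macrophages": np.array([0.0, 2.0, 3.0, 3.0]),
    "T cells": np.array([0.0, 0.0, 0.0, 1.0]),
    "B cells": np.array([0.0, 0.0, 1.0, 1.0]),
}

# Dot color: mean expression over all cells in each group
means = np.array([v.mean() for v in cells.values()])
# Dot size: fraction of cells in each group with expression above 0
fractions = np.array([(v > 0).mean() for v in cells.values()])

# standard_scale="var" for this gene: min-max over the groups, color only
color = (means - means.min()) / (means.max() - means.min())

for name, c, f in zip(cells, color, fractions):
    print(f"{name}: scaled color {c:.2f}, dot size (fraction) {f:.2f}")
```

The fractions (0.75, 0.25, 0.50) are identical with or without scaling; only the color column is stretched to span 0 to 1.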


TL;DR takeaway

  • "var" = compare clusters within one gene → great for marker panels
  • "group" = compare genes within one cluster → great for cluster signatures
  • Always ask: “What comparison do I actually want to preserve?”

Happy plotting!
