Benchmarks Overview#

densitree includes a comprehensive benchmarking framework for comparing SPADE against other single-cell clustering methods on real and synthetic cytometry data.

What we benchmark#

We evaluate six clustering methods:

Method	Implementation	Type
densitree	This library	Density-dependent downsampling + agglomerative + MST
FlowSOM (official)	`flowsom` Python package (saeyslab)	Self-organizing maps + consensus metaclustering
FlowSOM-style	MiniBatchKMeans + agglomerative	Fast reimplementation of the FlowSOM two-stage approach
PhenoGraph-style	k-NN graph + Leiden community detection	Graph-based community detection
KMeans	scikit-learn	Centroid-based flat clustering (baseline)
Agglomerative	scikit-learn (with subsampling for large data)	Ward’s linkage hierarchical clustering (baseline)

Metrics#

Metric	What it measures	Range
ARI (Adjusted Rand Index)	Overall clustering agreement with ground truth, adjusted for chance	-1 to 1 (1 = perfect)
NMI (Normalized Mutual Information)	Information-theoretic cluster-label agreement	0 to 1 (1 = perfect)
Rare Population F1	Precision/recall for populations comprising <3% of cells	0 to 1 (1 = perfect)
Runtime	Wall-clock time in seconds	Lower is better

Datasets#

Benchmark Datasets — Levine_32dim: 104,184 cells (gated), 32 CyTOF markers, 14 populations
Benchmark Datasets — Synthetic: 50,000 cells, 15 features, 12 populations (3 rare)

Running benchmarks#

cd benchmarks

# Synthetic dataset (no download needed)
python run_benchmark.py synthetic

# Real dataset (downloads automatically)
python run_benchmark.py Levine_32dim

# Specific methods only
python run_benchmark.py Levine_32dim "densitree,flowsom_official" 5

Results are saved to benchmarks/results/ in JSON, CSV, and Markdown formats.