Benchmarks Overview

densitree includes a comprehensive benchmarking framework for comparing SPADE against other single-cell clustering methods on real and synthetic cytometry data.

What we benchmark

We evaluate six clustering methods:

| Method | Implementation | Type |
|---|---|---|
| densitree | This library | Density-dependent downsampling + agglomerative + MST |
| FlowSOM (official) | flowsom Python package (saeyslab) | Self-organizing maps + consensus metaclustering |
| FlowSOM-style | MiniBatchKMeans + agglomerative | Fast reimplementation of the FlowSOM two-stage approach |
| PhenoGraph-style | k-NN graph + Leiden community detection | Graph-based community detection |
| KMeans | scikit-learn | Centroid-based flat clustering (baseline) |
| Agglomerative | scikit-learn (with subsampling for large data) | Ward's-linkage hierarchical clustering (baseline) |
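To make the "FlowSOM-style" row concrete, here is a minimal sketch of the two-stage approach using scikit-learn: over-cluster into prototypes with `MiniBatchKMeans` (standing in for the SOM grid), then merge the prototype centroids hierarchically. The function name and parameters are illustrative, not densitree's API.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans, AgglomerativeClustering

def flowsom_style(X, n_prototypes=100, n_clusters=14, seed=0):
    # Stage 1: over-cluster cells into many small prototypes
    # (a stand-in for FlowSOM's self-organizing-map grid).
    km = MiniBatchKMeans(n_clusters=n_prototypes, random_state=seed, n_init=3)
    proto = km.fit_predict(X)
    # Stage 2: merge prototype centroids with Ward-linkage metaclustering.
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    meta = agg.fit_predict(km.cluster_centers_)
    # Map each cell to its prototype's metacluster.
    return meta[proto]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 15))
labels = flowsom_style(X, n_prototypes=50, n_clusters=8)
```

Clustering the prototypes instead of the raw cells is what makes this approach fast: the expensive hierarchical step runs on ~100 points rather than ~100,000.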

Metrics

| Metric | What it measures | Range |
|---|---|---|
| ARI (Adjusted Rand Index) | Overall clustering agreement with ground truth, adjusted for chance | -1 to 1 (1 = perfect) |
| NMI (Normalized Mutual Information) | Information-theoretic cluster-label agreement | 0 to 1 (1 = perfect) |
| Rare Population F1 | Precision/recall for populations comprising <3% of cells | 0 to 1 (1 = perfect) |
| Runtime | Wall-clock time in seconds | Lower is better |
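ARI and NMI come straight from scikit-learn and need no label alignment, since both are invariant to permuting cluster IDs. The rare-population score requires mapping predicted clusters onto true populations; the sketch below assumes a simple majority-vote mapping followed by macro F1 over populations under the 3% threshold. The function name and scheme are illustrative, not necessarily the framework's exact definition.

```python
import numpy as np
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score, f1_score)

def rare_population_f1(y_true, y_pred, threshold=0.03):
    """Macro F1 over true populations smaller than `threshold`.

    Assumed scheme: map each predicted cluster to its majority true
    label, then score only the rare classes.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mapped = np.empty_like(y_true)
    for c in np.unique(y_pred):
        mask = y_pred == c
        vals, counts = np.unique(y_true[mask], return_counts=True)
        mapped[mask] = vals[np.argmax(counts)]    # majority-vote mapping
    pops, sizes = np.unique(y_true, return_counts=True)
    rare = pops[sizes / y_true.size < threshold]
    if rare.size == 0:
        return float("nan")
    return f1_score(y_true, mapped, labels=rare, average="macro")

# ARI and NMI need no mapping; they are permutation-invariant.
y_true = [0] * 98 + [1] * 2
y_pred = [5] * 98 + [7] * 2          # same partition, different cluster IDs
ari = adjusted_rand_score(y_true, y_pred)        # 1.0
nmi = normalized_mutual_info_score(y_true, y_pred)
```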

Datasets

  • Levine_32dim: 104,184 cells (gated), 32 CyTOF markers, 14 populations
  • Synthetic: 50,000 cells, 15 features, 12 populations (3 rare)
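A dataset matching the synthetic spec above (50,000 cells, 15 features, 12 populations with 3 rare) could be generated along these lines: Gaussian blobs with three populations held near 2% of cells. `make_synthetic` and its parameters are hypothetical, not the benchmark's actual generator.

```python
import numpy as np

def make_synthetic(n_cells=50_000, n_features=15, n_pops=12, n_rare=3, seed=0):
    # Hypothetical generator: Gaussian blobs, with the first n_rare
    # populations kept below the 3% rare-population threshold.
    rng = np.random.default_rng(seed)
    weights = np.empty(n_pops)
    weights[:n_rare] = 0.02                                  # ~2% each
    weights[n_rare:] = (1 - weights[:n_rare].sum()) / (n_pops - n_rare)
    labels = rng.choice(n_pops, size=n_cells, p=weights)
    centers = rng.normal(scale=5.0, size=(n_pops, n_features))
    X = centers[labels] + rng.normal(size=(n_cells, n_features))
    return X, labels

X, labels = make_synthetic()
```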

Running benchmarks

```bash
cd benchmarks

# Synthetic dataset (no download needed)
python run_benchmark.py synthetic

# Real dataset (downloads automatically)
python run_benchmark.py Levine_32dim

# Specific methods only
python run_benchmark.py Levine_32dim "densitree,flowsom_official" 5
```

Results are saved to benchmarks/results/ in JSON, CSV, and Markdown formats.
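The CSV output lends itself to post-hoc analysis with pandas. The helper below is a hypothetical sketch: it assumes per-method rows with `method` and `ari` columns, which may not match the framework's actual result schema.

```python
from pathlib import Path
import pandas as pd

def summarize_results(results_dir="benchmarks/results"):
    # Hypothetical helper: concatenate all result CSVs in the directory
    # and rank methods by ARI (column names are assumptions).
    frames = [pd.read_csv(p) for p in sorted(Path(results_dir).glob("*.csv"))]
    df = pd.concat(frames, ignore_index=True)
    return df.sort_values("ari", ascending=False)
```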