Skip to content

Benchmark Results

All results averaged over 3 runs with different random seeds. Metrics computed against manually gated ground truth labels.

Levine_32dim (real CyTOF data)

104,184 gated cells, 32 markers, 14 populations (7 rare).

Method ARI NMI Rare F1 Runtime (s)
densitree 0.942 +/- 0.004 0.930 +/- 0.002 0.481 +/- 0.047 4.0
FlowSOM-style 0.934 +/- 0.011 0.920 +/- 0.008 0.536 +/- 0.009 0.1
FlowSOM (official) 0.914 +/- 0.011 0.914 +/- 0.003 0.563 +/- 0.031 3.6
PhenoGraph-style 0.908 +/- 0.000 0.906 +/- 0.001 0.221 +/- 0.000 88.0
KMeans 0.569 +/- 0.025 0.802 +/- 0.009 0.379 +/- 0.043 1.3

densitree achieves the highest ARI and NMI with the lowest variance (std 0.004) of any method tested. Its consensus overclustering approach combines the accuracy of exhaustive clustering with the stability of ensemble methods.


Synthetic dataset

50,000 cells, 15 features, 12 populations (3 rare, hierarchically structured).

Method ARI NMI Rare F1 Runtime (s)
PhenoGraph-style 0.743 +/- 0.002 0.843 +/- 0.001 0.588 +/- 0.125 12.7
Agglomerative 0.694 +/- 0.004 0.772 +/- 0.000 0.500 +/- 0.000 5.5
KMeans 0.678 +/- 0.005 0.764 +/- 0.003 0.500 +/- 0.000 0.9
densitree 0.674 +/- 0.016 0.765 +/- 0.005 0.500 +/- 0.000 6.1
FlowSOM (official) 0.584 +/- 0.013 0.745 +/- 0.006 0.500 +/- 0.000 1.7
FlowSOM-style 0.584 +/- 0.033 0.732 +/- 0.012 0.500 +/- 0.000 0.0

Key takeaways

  1. densitree achieves state-of-the-art accuracy on the Levine_32dim benchmark (ARI 0.942), surpassing FlowSOM (0.934) and PhenoGraph (0.908). Its consensus overclustering approach (multiple MiniBatchKMeans runs + ward/average linkage ensemble + Hungarian-aligned majority vote) produces both high accuracy and extremely low variance.

  2. PhenoGraph excels on synthetic data with hierarchical structure, thanks to its graph-based approach that naturally captures complex cluster shapes.

  3. FlowSOM is the fastest method tested (0.1s), making it ideal for interactive exploration. densitree trades speed for accuracy and stability.

  4. densitree uniquely combines clustering accuracy with tree structure -- the density-dependent downsampling and MST construction are preserved for rare population visualization, while the consensus clustering produces accurate flat labels.

  5. Runtime: FlowSOM-style < KMeans << densitree ~ FlowSOM official << PhenoGraph.