Benchmark Results#
All results averaged over 3 runs with different random seeds. Metrics computed against manually gated ground truth labels.
Levine_32dim (real CyTOF data)#
104,184 gated cells, 32 markers, 14 populations (7 rare).
Method |
ARI |
NMI |
Rare F1 |
Runtime (s) |
|---|---|---|---|---|
densitree |
0.942 ± 0.004 |
0.930 ± 0.002 |
0.481 ± 0.047 |
4.0 |
FlowSOM-style |
0.934 ± 0.011 |
0.920 ± 0.008 |
0.536 ± 0.009 |
0.1 |
FlowSOM (official) |
0.914 ± 0.011 |
0.914 ± 0.003 |
0.563 ± 0.031 |
3.6 |
PhenoGraph-style |
0.908 ± 0.000 |
0.906 ± 0.001 |
0.221 ± 0.000 |
88.0 |
KMeans |
0.569 ± 0.025 |
0.802 ± 0.009 |
0.379 ± 0.043 |
1.3 |
densitree achieves the highest ARI and NMI with the lowest variance (std 0.004) of any method tested.
Synthetic dataset#
50,000 cells, 15 features, 12 populations (3 rare, hierarchically structured).
Method |
ARI |
NMI |
Rare F1 |
Runtime (s) |
|---|---|---|---|---|
PhenoGraph-style |
0.743 ± 0.002 |
0.843 ± 0.001 |
0.588 ± 0.125 |
12.7 |
Agglomerative |
0.694 ± 0.004 |
0.772 ± 0.000 |
0.500 ± 0.000 |
5.5 |
KMeans |
0.678 ± 0.005 |
0.764 ± 0.003 |
0.500 ± 0.000 |
0.9 |
densitree |
0.674 ± 0.016 |
0.765 ± 0.005 |
0.500 ± 0.000 |
6.1 |
FlowSOM (official) |
0.584 ± 0.013 |
0.745 ± 0.006 |
0.500 ± 0.000 |
1.7 |
FlowSOM-style |
0.584 ± 0.033 |
0.732 ± 0.012 |
0.500 ± 0.000 |
0.0 |
Key takeaways#
densitree achieves state-of-the-art accuracy on the Levine_32dim benchmark (ARI 0.942), surpassing FlowSOM (0.934) and PhenoGraph (0.908).
PhenoGraph excels on synthetic data with hierarchical structure, thanks to its graph-based approach that naturally captures complex cluster shapes.
FlowSOM is the fastest method tested (0.1s), making it ideal for interactive exploration. densitree trades speed for accuracy and stability.
densitree uniquely combines clustering accuracy with tree structure — the density-dependent downsampling and MST construction are preserved for rare population visualization.
Runtime: FlowSOM-style < KMeans << densitree ~ FlowSOM official << PhenoGraph.