SPADE
densitree.spade.SPADE(n_clusters=50, downsample_target=0.1, knn=5, n_micro=None, n_consensus=10, transform='arcsinh', cofactor=150.0, backend='matplotlib', density_estimator=None, random_state=None)
SPADE clustering with a scikit-learn-compatible API.
An improved SPADE variant that combines density-dependent downsampling (to preserve rare populations and to build the tree) with consensus overclustering (for accurate cell assignment).
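To illustrate why density-dependent downsampling helps preserve rare populations, here is a minimal pure-Python sketch. It is not densitree's implementation: the density proxy (inverse distance to the k-th nearest neighbor), the keep-probability rule, and the names `knn_density` and `density_downsample` are all assumptions chosen for illustration.

```python
import math
import random


def knn_density(points, k=5):
    """Hypothetical density proxy: inverse distance to the k-th nearest neighbor."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        densities.append(1.0 / (dists[k - 1] + 1e-12))
    return densities


def density_downsample(points, target=0.1, k=5, seed=None):
    """Keep each point with probability inversely proportional to its density.

    Probabilities are scaled so that roughly target * len(points) points are
    kept in expectation (clipping at 1.0 can lower that slightly).
    """
    rng = random.Random(seed)
    weights = [1.0 / d for d in knn_density(points, k)]
    scale = target * len(points) / sum(weights)
    keep_prob = [min(1.0, w * scale) for w in weights]
    kept = [i for i, p in enumerate(keep_prob) if rng.random() < p]
    return kept, keep_prob


# Toy data: a dense blob of 200 points plus a sparse group of 10 points.
data_rng = random.Random(0)
dense = [(data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)) for _ in range(200)]
rare = [(data_rng.gauss(10, 1.0), data_rng.gauss(10, 1.0)) for _ in range(10)]
points = dense + rare

kept, keep_prob = density_downsample(points, target=0.1, k=5, seed=0)
dense_mean = sum(keep_prob[:200]) / 200
rare_mean = sum(keep_prob[200:]) / 10
print(f"mean keep probability: dense={dense_mean:.3f}, rare={rare_mean:.3f}")
```

Points in the sparse group receive a much higher keep probability than points in the dense blob, so the downsampled set is not dominated by the abundant population.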
The algorithm:
- Density estimation (k-NN) on all cells.
- Consensus clustering over multiple runs:
    a. Overcluster all cells into `n_micro` microclusters (MiniBatchKMeans).
    b. Merge microclusters into `n_clusters` metaclusters using both ward and average linkage agglomerative clustering.
    c. Align labels across runs (Hungarian algorithm) and take majority vote.
    d. Filter out low-agreement runs before voting.
- Density-dependent downsampling for tree construction.
- MST construction on metacluster centroids.
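The label-alignment and voting steps of the consensus stage can be sketched as follows. This is an illustrative sketch, not the library's code: it swaps in an exhaustive search over permutations for the Hungarian algorithm (same optimum, but only practical for a handful of clusters), and the choice of the first run as reference, the `min_agreement` threshold, and all function names are assumptions.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so that they agree with `reference` as much as possible.

    Exhaustive permutation search stands in for the Hungarian algorithm here.
    """
    # overlap[a][b] = number of cells labeled a in `labels` and b in `reference`
    overlap = [[0] * n_clusters for _ in range(n_clusters)]
    for ref, lab in zip(reference, labels):
        overlap[lab][ref] += 1
    best = max(
        permutations(range(n_clusters)),
        key=lambda perm: sum(overlap[a][perm[a]] for a in range(n_clusters)),
    )
    return [best[lab] for lab in labels]


def agreement(a, b):
    """Fraction of cells that carry the same label in both runs."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consensus(runs, n_clusters, min_agreement=0.5):
    """Align every run to the first one, drop poor runs, then majority-vote."""
    reference = runs[0]
    aligned = [align_labels(reference, run, n_clusters) for run in runs]
    # Filter out low-agreement runs before voting (threshold is hypothetical).
    kept = [run for run in aligned if agreement(reference, run) >= min_agreement]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*kept)]


run_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
run_b = [2, 2, 2, 0, 0, 0, 1, 1, 1]  # same partition, permuted label names
run_c = [1, 1, 0, 2, 2, 2, 0, 0, 0]  # permuted, one cell assigned differently
run_d = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # unrelated partition, filtered out
result = consensus([run_a, run_b, run_c, run_d], n_clusters=3)
print(result)
```

In this toy input the permuted runs map back onto the reference labels, the unrelated run falls below the agreement threshold, and the single disagreeing cell is outvoted. For realistic cluster counts, `scipy.optimize.linear_sum_assignment` applied to the negated overlap matrix would be the usual replacement for the permutation search.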
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_clusters` | `int` | Number of clusters (default 50). | `50` |
| `downsample_target` | `float` | Fraction of cells to retain for tree construction (default 0.1). | `0.1` |
| `knn` | `int` | k for k-NN density estimation (default 5). | `5` |
| `n_micro` | `int \| None` | Number of microclusters. | `None` |
| `n_consensus` | `int` | Number of MiniBatchKMeans runs per linkage type for consensus. Total runs = 2 * `n_consensus` (ward + average). Default 10. | `10` |
| `transform` | `str \| None` | | `'arcsinh'` |
| `cofactor` | `float` | Arcsinh cofactor (default 150.0). | `150.0` |
| `backend` | `str` | Default plotting backend. | `'matplotlib'` |
| `density_estimator` | `BaseStep \| None` | Custom density estimator step. | `None` |
| `random_state` | `int \| None` | Seed for reproducibility. | `None` |
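As a rough illustration of two of the parameters above, the sketch below shows what an arcsinh transform with a cofactor typically looks like and how the total number of consensus runs follows from `n_consensus`. The form `asinh(x / cofactor)` is the convention commonly used in cytometry and is assumed here, not confirmed from densitree's source; both helper names are hypothetical.

```python
import math


def arcsinh_transform(values, cofactor=150.0):
    """Assumed form of the transform: asinh(x / cofactor), applied per value."""
    return [math.asinh(v / cofactor) for v in values]


def total_consensus_runs(n_consensus=10):
    """One batch of runs for ward linkage plus one for average linkage."""
    return 2 * n_consensus


raw = [0.0, 15.0, 150.0, 1500.0, 15000.0]
transformed = arcsinh_transform(raw, cofactor=150.0)
for x, y in zip(raw, transformed):
    print(f"{x:>8.1f} -> {y:.3f}")

print(total_consensus_runs(10))
```

Under this assumed form, values well below the cofactor are scaled almost linearly, while values far above it are compressed roughly logarithmically.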
Source code in densitree/spade.py
fit(X)
Fit SPADE to data.