Pipeline Steps
BaseStep
DensityEstimator
DownsampleStep
- class densitree.steps.downsample.DownsampleStep(downsample_target: float = 0.05, random_state: int | None = None)
Bases: BaseStep
Density-normalized downsampling.
Cells in dense regions are sampled with lower probability so that rare populations (low density) are preserved after downsampling.
Inclusion probability for cell i:
$$p_i = \min\left(1,\ \frac{\text{target\_count} \cdot w_i}{\sum_j w_j}\right), \qquad w_i = \frac{1}{\text{density}_i}.$$
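A minimal NumPy sketch of the inclusion rule above. It assumes that downsample_target is a fraction of the input, so target_count = downsample_target * n_cells, and that per-cell densities are already available (for example from DensityEstimator). The function name density_normalized_downsample is hypothetical and not part of densitree's API.

```python
from __future__ import annotations

import numpy as np


def density_normalized_downsample(
    density: np.ndarray,
    downsample_target: float = 0.05,
    random_state: int | None = None,
) -> np.ndarray:
    """Return sorted indices of the cells kept by density-normalized sampling.

    Hypothetical helper; assumes target_count = downsample_target * n_cells.
    """
    rng = np.random.default_rng(random_state)
    n_cells = density.shape[0]
    target_count = downsample_target * n_cells
    # Rare (low-density) cells get large weights: w_i = 1 / density_i.
    w = 1.0 / density
    # p_i = min(1, target_count * w_i / sum(w))
    p = np.minimum(1.0, target_count * w / w.sum())
    # Keep each cell independently with probability p_i.
    keep = rng.random(n_cells) < p
    return np.flatnonzero(keep)
```

Because p_i is clamped at 1, the expected number of kept cells is at most target_count. Any cell whose weight satisfies w_i ≥ sum(w) / target_count is always kept, which is how the rule preserves rare populations.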
ClusterStep
- class densitree.steps.cluster.ClusterStep(n_clusters: int = 50, n_micro: int | None = None, linkage: str = 'average')
Bases: BaseStep
Two-stage clustering on the downsampled cell set.
Stage 1: Overcluster into n_micro microclusters using MiniBatchKMeans, which is fast, sees all downsampled cells, and captures fine structure.
Stage 2: Merge the microclusters into n_clusters metaclusters using agglomerative clustering on the microcluster centroids.
This approach produces better cluster boundaries than single-stage agglomerative clustering for three reasons: MiniBatchKMeans scales linearly with the number of cells and produces stable microclusters; agglomerative merging runs only on the n_micro centroids rather than on every cell, so it is fast; and upsampling against the fine-grained microcluster centroids substantially improves cell assignment accuracy.
Returns micro-level and meta-level labels plus centroids for both.
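The two stages can be sketched with scikit-learn's MiniBatchKMeans and AgglomerativeClustering. This is an illustration under assumptions, not densitree's implementation: the default for n_micro (10 × n_clusters, capped at the number of cells) is made up, metacluster centroids are taken as the mean of their member cells, and the function name two_stage_cluster is hypothetical. The returned keys reuse the names that UpsampleStep.run accepts (micro_centroids, micro_to_meta, cluster_labels_down, centroids); the micro_labels key is hypothetical.

```python
from __future__ import annotations

import numpy as np
from sklearn.cluster import AgglomerativeClustering, MiniBatchKMeans


def two_stage_cluster(
    X_down: np.ndarray,
    n_clusters: int = 50,
    n_micro: int | None = None,
    linkage: str = "average",
    random_state: int | None = 0,
) -> dict:
    """Hypothetical two-stage clustering: MiniBatchKMeans, then agglomerative merge."""
    if n_micro is None:
        # Made-up default for illustration only.
        n_micro = min(len(X_down), 10 * n_clusters)
    # Stage 1: overcluster all downsampled cells into n_micro microclusters.
    km = MiniBatchKMeans(n_clusters=n_micro, n_init=3, random_state=random_state)
    micro_labels = km.fit_predict(X_down)
    micro_centroids = km.cluster_centers_
    # Stage 2: merge microcluster centroids (not individual cells) into metaclusters.
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    micro_to_meta = agg.fit_predict(micro_centroids)
    # Each cell inherits the metacluster of its microcluster.
    meta_labels = micro_to_meta[micro_labels]
    # Metacluster centroid = mean of member cells; fall back to the mean of
    # member microcluster centroids if no cell landed in that metacluster.
    centroids = np.vstack([
        X_down[meta_labels == k].mean(axis=0)
        if np.any(meta_labels == k)
        else micro_centroids[micro_to_meta == k].mean(axis=0)
        for k in range(n_clusters)
    ])
    return {
        "micro_labels": micro_labels,
        "cluster_labels_down": meta_labels,
        "micro_centroids": micro_centroids,
        "micro_to_meta": micro_to_meta,
        "centroids": centroids,
    }
```

Only the n_micro centroids enter the agglomerative step, so its cost depends on n_micro rather than on the number of downsampled cells.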
UpsampleStep
- class densitree.steps.upsample.UpsampleStep
Bases: BaseStep
Assign every original cell to its nearest cluster.
Uses microcluster centroids (fine-grained) for assignment, then maps each microcluster to its metacluster. This is far more accurate than assigning to the few metacluster centroids directly, because microclusters capture local structure that coarse centroids miss.
- run(data: ndarray, *, centroids: ndarray, micro_centroids: ndarray | None = None, micro_to_meta: ndarray | None = None, down_idx: ndarray, cluster_labels_down: ndarray, **ctx) → dict
Run this step.
- Parameters:
data – The (possibly transformed) input array, shape (n_cells, n_features).
**ctx – Accumulated outputs from previous steps.
- Returns:
dict – New context keys produced by this step.
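A sketch of the assignment rule, using SciPy's cKDTree for the nearest-centroid query. The function upsample_labels is hypothetical: it mirrors only the centroids, micro_centroids and micro_to_meta arguments of run() and ignores down_idx and cluster_labels_down, whose exact use is not documented here. When no microcluster information is passed, it falls back to the nearest metacluster centroid.

```python
from __future__ import annotations

import numpy as np
from scipy.spatial import cKDTree


def upsample_labels(
    data: np.ndarray,
    centroids: np.ndarray,
    micro_centroids: np.ndarray | None = None,
    micro_to_meta: np.ndarray | None = None,
) -> np.ndarray:
    """Hypothetical helper: assign every cell in `data` to a metacluster."""
    if micro_centroids is not None and micro_to_meta is not None:
        # Nearest microcluster centroid for every cell ...
        _, micro_idx = cKDTree(micro_centroids).query(data)
        # ... then look up which metacluster that microcluster belongs to.
        return np.asarray(micro_to_meta)[micro_idx]
    # Fallback: nearest metacluster centroid directly (coarser).
    _, meta_idx = cKDTree(centroids).query(data)
    return meta_idx
```

For instance, take microcluster centroids at x = 0 and x = 6 (metacluster 0) and x = 10 (metacluster 1), with metacluster centroids at x = 3 and x = 10. A cell at x = 6.8 lands in metacluster 0 through its nearest microcluster, but in metacluster 1 if matched against the metacluster centroids directly.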
MSTBuilder
- class densitree.steps.mst.MSTBuilder
Bases: BaseStep
Build a minimum spanning tree connecting cluster centroids.
Each node in the resulting networkx.Graph represents one cluster. Node attributes:
- size: number of cells assigned to that cluster
- median: per-feature median of cells in that cluster (ndarray)
Edge weights are Euclidean distances between centroids.
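A sketch of the tree construction with networkx: build a complete graph over the centroids weighted by Euclidean distance, attach size and median to each node, then take nx.minimum_spanning_tree, which keeps node attributes. The function name build_centroid_mst and its (data, labels, centroids) inputs are assumptions. The sketch also assumes every cluster has at least one cell, since the median of an empty cluster is undefined.

```python
from __future__ import annotations

from itertools import combinations

import networkx as nx
import numpy as np


def build_centroid_mst(
    data: np.ndarray, labels: np.ndarray, centroids: np.ndarray
) -> nx.Graph:
    """Hypothetical helper: MST over cluster centroids with size/median node attributes."""
    G = nx.Graph()
    for k in range(len(centroids)):
        members = data[labels == k]  # assumes every cluster has >= 1 cell
        G.add_node(k, size=int(members.shape[0]), median=np.median(members, axis=0))
    # Complete graph: edge weight = Euclidean distance between centroids.
    for i, j in combinations(range(len(centroids)), 2):
        G.add_edge(i, j, weight=float(np.linalg.norm(centroids[i] - centroids[j])))
    # minimum_spanning_tree copies node attributes into the returned tree.
    return nx.minimum_spanning_tree(G, weight="weight")
```

A complete graph over k clusters has k(k - 1)/2 edges, which stays small at typical cluster counts: the default n_clusters = 50 gives 1,225 edges.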