Pipeline Steps

BaseStep

class densitree.steps.base.BaseStep

Bases: ABC

Abstract base for all SPADE pipeline steps.

Each step receives the shared pipeline context as keyword arguments and returns a dict of new keys to merge into that context.

abstractmethod run(data: ndarray, **ctx) → dict

Run this step.

Parameters:

data – The (possibly transformed) input array, shape (n_cells, n_features).
**ctx – Accumulated outputs from previous steps.

Returns:

dict – New context keys produced by this step.
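The contract described above can be sketched as follows. This is a minimal illustration, not densitree code: the `BaseStep` class here is a local stand-in that mirrors the documented interface, `CountCells` and `MeanPerFeature` are hypothetical steps, and the `run_steps` driver loop is an assumption about how a pipeline might merge each returned dict into the shared context.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseStep(ABC):
    """Local stand-in mirroring the documented interface (not the densitree class)."""

    @abstractmethod
    def run(self, data: np.ndarray, **ctx) -> dict:
        ...


class CountCells(BaseStep):
    """Hypothetical step: records how many cells it saw."""

    def run(self, data, **ctx):
        return {"n_cells": data.shape[0]}


class MeanPerFeature(BaseStep):
    """Hypothetical step: consumes a key written by an earlier step."""

    def run(self, data, *, n_cells, **ctx):
        # `n_cells` is pulled out of the context by name; everything else
        # stays in **ctx and is ignored here.
        return {"feature_mean": data.sum(axis=0) / n_cells}


def run_steps(data, steps):
    """Assumed driver loop: merge each step's output into the shared context."""
    ctx = {}
    for step in steps:
        ctx.update(step.run(data, **ctx))
    return ctx


data = np.arange(12, dtype=float).reshape(4, 3)
ctx = run_steps(data, [CountCells(), MeanPerFeature()])
print(ctx["n_cells"], ctx["feature_mean"])
```

Declaring required inputs as keyword-only parameters, as the steps below do in their `run` signatures, means a step fails immediately with a `TypeError` if an earlier step did not supply the key it needs.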

DensityEstimator

class densitree.steps.density.DensityEstimator(knn: int = 5, eps: float = 1e-08)

Bases: BaseStep

Estimate local density for each cell using k-NN.

Density is defined as 1 / (distance to k-th nearest neighbor + eps), so cells in dense regions get high density values.
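The density definition can be illustrated with a small NumPy sketch. It uses a brute-force pairwise distance matrix, which only suits small inputs, and it assumes that a cell does not count as its own neighbor; the source does not say how `DensityEstimator` handles either point, so treat `knn_density` as a hypothetical helper rather than the library's implementation.

```python
import numpy as np


def knn_density(data: np.ndarray, knn: int = 5, eps: float = 1e-8) -> np.ndarray:
    """Return 1 / (distance to the knn-th nearest neighbor + eps) for each row.

    Requires more than `knn` rows in `data`.
    """
    # Brute-force Euclidean distances, shape (n_cells, n_cells).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance of the cell to
    # itself, so column `knn` is the knn-th nearest other cell (assumption).
    kth = np.sort(dist, axis=1)[:, knn]
    return 1.0 / (kth + eps)


rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(20, 2))   # tight blob near the origin
outlier = np.array([[5.0, 5.0]])             # one isolated cell
density = knn_density(np.vstack([dense, outlier]), knn=5)
```

The isolated cell ends up with the lowest density value, because its fifth-nearest neighbor is far away. The `eps` term only matters when the k-th neighbor distance is zero, for example with duplicated cells.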

run(data: ndarray, **ctx) → dict

Run this step.

Parameters:

data – The (possibly transformed) input array, shape (n_cells, n_features).
**ctx – Accumulated outputs from previous steps.

Returns:

dict – New context keys produced by this step.

DownsampleStep

class densitree.steps.downsample.DownsampleStep(downsample_target: float = 0.05, random_state: int | None = None)

Bases: BaseStep

Density-normalized downsampling.

Cells in dense regions are sampled with lower probability so that rare populations (low density) are preserved after downsampling.

The inclusion probability for cell i is p_i = min(1, target_count * w_i / sum(w)), where w_i = 1 / density_i and sum(w) runs over all cells.
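A short worked example of the inclusion probability follows. Two things in it are assumptions rather than documented behavior: that `downsample_target` is a fraction of the input cells (so `target_count = downsample_target * n_cells`), and that each cell is kept by an independent random draw against its probability. `inclusion_probabilities` and `downsample_indices` are hypothetical helper names.

```python
import numpy as np


def inclusion_probabilities(density: np.ndarray, target_count: float) -> np.ndarray:
    """p_i = min(1, target_count * w_i / sum(w)) with w_i = 1 / density_i."""
    w = 1.0 / density
    return np.minimum(1.0, target_count * w / w.sum())


def downsample_indices(density, downsample_target=0.05, random_state=None):
    """Keep each cell independently with its inclusion probability (assumed scheme)."""
    rng = np.random.default_rng(random_state)
    # Assumption: the target is a fraction of the number of input cells.
    target_count = downsample_target * density.shape[0]
    p = inclusion_probabilities(density, target_count)
    return np.flatnonzero(rng.random(density.shape[0]) < p)


# Four cells in a dense region and one rare cell in a sparse region.
density = np.array([10.0, 10.0, 10.0, 10.0, 1.0])
p = inclusion_probabilities(density, target_count=2)
print(p)  # dense cells: 2 * 0.1 / 1.4 = 1/7 each; rare cell: clipped to 1
kept = downsample_indices(density, downsample_target=0.4, random_state=0)
```

Here the weights are 0.1 for each dense cell and 1.0 for the rare cell, so the rare cell is always kept while each dense cell is kept about one time in seven. When the `min(1, ...)` clip is active, the expected sample size falls below `target_count`.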

run(data: ndarray, *, density: ndarray, **ctx) → dict

Run this step.

Parameters:

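data – The (possibly transformed) input array, shape (n_cells, n_features).
density – Per-cell density values produced by an earlier step (required keyword argument).
**ctx – Remaining accumulated outputs from previous steps.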

Returns:

dict – New context keys produced by this step.

ClusterStep

class densitree.steps.cluster.ClusterStep(n_clusters: int = 50, n_micro: int | None = None, linkage: str = 'average')

Bases: BaseStep

Two-stage clustering on the downsampled cell set.

Stage 1: Overcluster into n_micro microclusters using MiniBatchKMeans (fast, sees all downsampled cells, captures fine structure).

Stage 2: Merge microclusters into n_clusters metaclusters using agglomerative clustering on the microcluster centroids.

Compared with single-stage agglomerative clustering, this approach is intended to give better cluster boundaries for three reasons:

- MiniBatchKMeans scales linearly with the number of cells and produces stable microclusters.
- Agglomerative merging runs only on the microcluster centroids, so it is fast.
- Upsampling against the fine-grained microcluster centroids improves cell assignment accuracy.

Returns micro-level and meta-level labels plus centroids for both.
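The two stages can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the `ClusterStep` source: the function name `two_stage_cluster`, the returned key names, the small `n_micro` value, and the choice to compute each metacluster centroid as the mean of its microcluster centroids are all hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, MiniBatchKMeans


def two_stage_cluster(X_down, n_clusters=3, n_micro=12, linkage="average",
                      random_state=0):
    """Overcluster with MiniBatchKMeans, then merge the centroids agglomeratively."""
    # Stage 1: many small microclusters fitted on all downsampled cells.
    km = MiniBatchKMeans(n_clusters=n_micro, random_state=random_state, n_init=3)
    micro_labels = km.fit_predict(X_down)
    micro_centroids = km.cluster_centers_

    # Stage 2: agglomerative clustering sees only n_micro points, not every cell.
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    micro_to_meta = agg.fit_predict(micro_centroids)

    # Each cell inherits the metacluster of its microcluster.
    meta_labels = micro_to_meta[micro_labels]
    # Assumption: a metacluster centroid is the mean of its microcluster centroids.
    meta_centroids = np.vstack(
        [micro_centroids[micro_to_meta == k].mean(axis=0) for k in range(n_clusters)]
    )
    return {
        "micro_labels": micro_labels,
        "meta_labels": meta_labels,
        "micro_centroids": micro_centroids,
        "micro_to_meta": micro_to_meta,
        "meta_centroids": meta_centroids,
    }


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_down = np.vstack([c + rng.normal(0.0, 0.3, size=(100, 2)) for c in centers])
out = two_stage_cluster(X_down, n_clusters=3, n_micro=12)
```

The agglomerative stage works on 12 centroids here instead of 300 cells, which is where the speed advantage of the second stage comes from.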

run(data: ndarray, *, X_down: ndarray, **ctx) → dict

Run this step.

Parameters:

data – The (possibly transformed) input array, shape (n_cells, n_features).
X_down – The downsampled cells produced by an earlier step (required keyword argument).
**ctx – Remaining accumulated outputs from previous steps.

Returns:

dict – New context keys produced by this step.

UpsampleStep

class densitree.steps.upsample.UpsampleStep

Bases: BaseStep

Assign every original cell to its nearest cluster.

Uses microcluster centroids (fine-grained) for assignment, then maps each microcluster to its metacluster. This is far more accurate than assigning to the few metacluster centroids directly, because microclusters capture local structure that coarse centroids miss.
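A toy example may help show why assignment through microclusters can differ from direct assignment to metacluster centroids. The sketch below is hypothetical: `assign_via_microclusters` is an illustrative helper, and the centroid coordinates are made up so that metacluster 0 is elongated (two distant microclusters) while metacluster 1 sits near its middle.

```python
import numpy as np


def nearest_centroid(data: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (Euclidean) for each row of `data`."""
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def assign_via_microclusters(data, micro_centroids, micro_to_meta):
    """Find the nearest microcluster, then look up its metacluster."""
    return micro_to_meta[nearest_centroid(data, micro_centroids)]


# Metacluster 0 has microclusters at (-4, 0) and (4, 0); metacluster 1 at (0, 1).
micro_centroids = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 1.0]])
micro_to_meta = np.array([0, 0, 1])
# Coarse centroids: the mean of each metacluster's microcluster centroids.
meta_centroids = np.array([[0.0, 0.0], [0.0, 1.0]])

cells = np.array([[3.5, 1.0], [0.0, 1.2]])
via_micro = assign_via_microclusters(cells, micro_centroids, micro_to_meta)
direct = nearest_centroid(cells, meta_centroids)
print(via_micro, direct)
```

The first cell lies next to the microcluster at (4, 0), so the two-step lookup places it in metacluster 0. Direct assignment puts it in metacluster 1, because the averaged centroid of metacluster 0 at (0, 0) is slightly farther away than (0, 1).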

run(data: ndarray, *, centroids: ndarray, micro_centroids: ndarray | None = None, micro_to_meta: ndarray | None = None, down_idx: ndarray, cluster_labels_down: ndarray, **ctx) → dict

Run this step.

Parameters:

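data – The (possibly transformed) input array, shape (n_cells, n_features).
centroids – Cluster centroids (required keyword argument).
micro_centroids – Microcluster centroids (optional keyword argument).
micro_to_meta – Mapping from each microcluster to its metacluster (optional keyword argument).
down_idx – Indices of the downsampled cells (required keyword argument).
cluster_labels_down – Cluster labels of the downsampled cells (required keyword argument).
**ctx – Remaining accumulated outputs from previous steps.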

Returns:

dict – New context keys produced by this step.

MSTBuilder

class densitree.steps.mst.MSTBuilder

Bases: BaseStep

Build a minimum spanning tree connecting cluster centroids.

Each node in the resulting networkx.Graph represents one cluster. Node attributes:

- size: number of cells assigned to that cluster
- median: per-feature median of cells in that cluster (ndarray)

Edge weights are Euclidean distances between centroids.
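The tree described above could be built roughly as follows. This is a sketch, not the `MSTBuilder` source: it assumes the tree is extracted from a complete graph over the centroids with `networkx.minimum_spanning_tree`, that every cluster has at least one cell, and that labels are integers `0..n_clusters-1`. `build_mst` is a hypothetical helper name.

```python
import networkx as nx
import numpy as np


def build_mst(data: np.ndarray, centroids: np.ndarray, labels: np.ndarray) -> nx.Graph:
    """Minimum spanning tree over cluster centroids with size/median node attributes."""
    n_clusters = centroids.shape[0]
    full = nx.Graph()
    for k in range(n_clusters):
        members = data[labels == k]  # assumed non-empty for every cluster
        full.add_node(k, size=int(members.shape[0]), median=np.median(members, axis=0))
    # Complete graph: every centroid pair is connected by its Euclidean distance.
    for i in range(n_clusters):
        for j in range(i + 1, n_clusters):
            dist = float(np.linalg.norm(centroids[i] - centroids[j]))
            full.add_edge(i, j, weight=dist)
    # Node attributes are carried over into the returned tree.
    return nx.minimum_spanning_tree(full, weight="weight")


data = np.array([[0.0, 0.0], [0.0, 0.2], [1.0, 0.0], [1.0, 0.2], [5.0, 0.0], [5.0, 0.2]])
labels = np.array([0, 0, 1, 1, 2, 2])
centroids = np.array([[0.0, 0.1], [1.0, 0.1], [5.0, 0.1]])
mst = build_mst(data, centroids, labels)
edges = sorted(tuple(sorted(e)) for e in mst.edges())
```

With three centroids on a line, the tree keeps the edges of length 1 and 4 and drops the direct edge of length 5 between the two outer clusters.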

run(data: ndarray, *, centroids: ndarray, labels_: ndarray, **ctx) → dict

Run this step.

Parameters:

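data – The (possibly transformed) input array, shape (n_cells, n_features).
centroids – Cluster centroids to connect (required keyword argument).
labels_ – Per-cell cluster labels (required keyword argument).
**ctx – Remaining accumulated outputs from previous steps.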

Returns:

dict – New context keys produced by this step.