Pipeline Steps
BaseStep
densitree.steps.base.BaseStep
Bases: ABC
Abstract base for all SPADE pipeline steps.
Each step receives the shared pipeline context as keyword arguments and returns a dict of new keys to merge into that context.
run(data, **ctx)
abstractmethod
Run this step.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `ndarray` | The (possibly transformed) input array, shape (n_cells, n_features). | required |
| `**ctx` | | Accumulated outputs from previous steps. | `{}` |
Returns:
| Type | Description |
|---|---|
| `dict` | New context keys produced by this step. |
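A minimal sketch of how a custom step might follow this contract. The `BaseStep` class below is a stand-in that mirrors the documented interface so the example runs without densitree installed; `FeatureMeanStep`, the `feature_mean` key, and the `run_pipeline` driver are hypothetical names, and the way the real pipeline merges context may differ.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseStep(ABC):
    """Stand-in mirroring the documented interface (not the densitree source)."""

    @abstractmethod
    def run(self, data: np.ndarray, **ctx) -> dict:
        """Return new context keys produced by this step."""


class FeatureMeanStep(BaseStep):
    """Hypothetical step that stores the per-feature mean under a new key."""

    def run(self, data: np.ndarray, **ctx) -> dict:
        return {"feature_mean": data.mean(axis=0)}


def run_pipeline(steps, data):
    """Assumed driver: call each step and merge its output into the context."""
    ctx = {}
    for step in steps:
        ctx.update(step.run(data, **ctx))
    return ctx


data = np.array([[0.0, 2.0], [2.0, 4.0]])
ctx = run_pipeline([FeatureMeanStep()], data)
```

Because each step only returns the keys it produces, later steps can read earlier results from `**ctx` without any step mutating shared state directly.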
Source code in densitree/steps/base.py
DensityEstimator
densitree.steps.density.DensityEstimator(knn=5, eps=1e-08)
Bases: BaseStep
Estimate local density for each cell using k-NN.
Density is defined as 1 / (distance to k-th nearest neighbor + eps), so cells in dense regions get high density values.
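The density formula can be illustrated with a brute-force NumPy sketch. This is not the library's implementation: `knn_density` is a hypothetical helper, it computes all pairwise distances (quadratic memory, suitable only for small arrays), and it assumes the cell itself is not counted as one of its own neighbors.

```python
import numpy as np


def knn_density(data: np.ndarray, knn: int = 5, eps: float = 1e-8) -> np.ndarray:
    """Density = 1 / (distance to k-th nearest neighbor + eps)."""
    # Pairwise Euclidean distances, shape (n_cells, n_cells).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the cell itself (distance 0),
    # so column `knn` is the distance to the k-th nearest other cell.
    kth = np.sort(dist, axis=1)[:, knn]
    return 1.0 / (kth + eps)


# Three tightly spaced cells and one isolated cell on a line.
cells = np.array([[0.0], [1.0], [2.0], [10.0]])
density = knn_density(cells, knn=1)
```

The isolated cell at 10.0 has its nearest neighbor 8 units away, so its density is about 0.125, while the three packed cells each get a density of about 1.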
Source code in densitree/steps/density.py
DownsampleStep
densitree.steps.downsample.DownsampleStep(downsample_target=0.05, random_state=None)
Bases: BaseStep
Density-normalized downsampling.
Cells in dense regions are sampled with lower probability so that rare populations (low density) are preserved after downsampling.
Inclusion probability for cell i: p_i = min(1, target_count * w_i / sum(w)) where w_i = 1 / density_i.
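The inclusion probability can be worked through in a few lines of NumPy. This sketch makes two assumptions that the text above does not confirm: that `downsample_target` is a fraction of the total cell count (so `target_count = downsample_target * n_cells`), and that each cell is kept by an independent random draw. The function names are hypothetical.

```python
import numpy as np


def inclusion_probabilities(density: np.ndarray, target_count: float) -> np.ndarray:
    """p_i = min(1, target_count * w_i / sum(w)) with w_i = 1 / density_i."""
    w = 1.0 / density
    return np.minimum(1.0, target_count * w / w.sum())


def density_downsample(density, downsample_target=0.05, random_state=None):
    rng = np.random.default_rng(random_state)
    # Assumption: downsample_target is a fraction of the total cell count.
    target_count = downsample_target * len(density)
    p = inclusion_probabilities(density, target_count)
    # Assumption: each cell is kept independently with probability p_i.
    keep = rng.random(len(density)) < p
    return np.flatnonzero(keep), p


# Four cells in a dense region and one rare, low-density cell.
density = np.array([4.0, 4.0, 4.0, 4.0, 1.0])
kept, p = density_downsample(density, downsample_target=0.4, random_state=0)
```

Here the weights are 0.25 for each dense cell and 1.0 for the rare cell, so with a target of 2 cells the probabilities are 0.25 and 1.0: the rare cell is always kept. When no probability is clipped at 1, the probabilities sum to `target_count`, which is the expected size of the downsampled set.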
Source code in densitree/steps/downsample.py
ClusterStep
densitree.steps.cluster.ClusterStep(n_clusters=50, n_micro=None, linkage='average')
Bases: BaseStep
Two-stage clustering on the downsampled cell set.
Stage 1: Overcluster into n_micro microclusters using MiniBatchKMeans (fast, sees all downsampled cells, captures fine structure).
Stage 2: Merge microclusters into n_clusters metaclusters using agglomerative clustering on the microcluster centroids.
This approach produces much better cluster boundaries than single-stage agglomerative clustering because:
- MiniBatchKMeans scales linearly and produces stable microclusters
- Agglomerative merging on ~n_micro centroids is fast and produces good metaclusters
- Upsampling to microcluster centroids (many, fine-grained) rather than metacluster centroids (few, coarse) dramatically improves the accuracy of cell assignment
Returns micro-level and meta-level labels plus centroids for both.
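The two stages might look roughly like the following scikit-learn sketch. It is an illustration, not the densitree source: `two_stage_cluster` and the returned key names are hypothetical, `n_micro` must be passed explicitly because the default used when `n_micro=None` is not documented here, and metacluster centroids are taken as the mean of their member microcluster centroids, which is an assumption.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, MiniBatchKMeans


def two_stage_cluster(data, n_clusters, n_micro, linkage="average", random_state=0):
    # Stage 1: overcluster the cells into n_micro microclusters.
    km = MiniBatchKMeans(n_clusters=n_micro, n_init=3, random_state=random_state)
    micro_labels = km.fit_predict(data)
    micro_centroids = km.cluster_centers_

    # Stage 2: merge the microcluster centroids into n_clusters metaclusters.
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    micro_to_meta = agg.fit_predict(micro_centroids)

    # Each cell inherits the metacluster of its microcluster.
    meta_labels = micro_to_meta[micro_labels]
    # Assumption: a metacluster centroid is the mean of its microcluster centroids.
    meta_centroids = np.vstack(
        [micro_centroids[micro_to_meta == m].mean(axis=0) for m in range(n_clusters)]
    )
    return {
        "micro_labels": micro_labels,
        "meta_labels": meta_labels,
        "micro_centroids": micro_centroids,
        "meta_centroids": meta_centroids,
        "micro_to_meta": micro_to_meta,
    }


# Three well-separated synthetic blobs of 50 cells each.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0.0, 20.0, 40.0)])
result = two_stage_cluster(data, n_clusters=3, n_micro=12)
```

The agglomerative stage only ever sees `n_micro` points, so its cost does not grow with the number of cells.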
Source code in densitree/steps/cluster.py
UpsampleStep
densitree.steps.upsample.UpsampleStep
Bases: BaseStep
Assign every original cell to its nearest cluster.
Uses microcluster centroids (fine-grained) for assignment, then maps each microcluster to its metacluster. This is far more accurate than assigning to the few metacluster centroids directly, because microclusters capture local structure that coarse centroids miss.
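A small NumPy sketch of this two-hop assignment, assuming Euclidean distance for "nearest" and a `micro_to_meta` lookup array that maps each microcluster index to its metacluster. Both the function name and the array name are hypothetical.

```python
import numpy as np


def upsample(data, micro_centroids, micro_to_meta):
    """Assign each cell to its nearest microcluster, then map to a metacluster."""
    # Squared Euclidean distance from every cell to every microcluster centroid.
    d2 = ((data[:, None, :] - micro_centroids[None, :, :]) ** 2).sum(axis=-1)
    micro_labels = d2.argmin(axis=1)
    meta_labels = micro_to_meta[micro_labels]
    return micro_labels, meta_labels


micro_centroids = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
micro_to_meta = np.array([0, 0, 1])  # microclusters 0 and 1 form metacluster 0
cells = np.array([[0.1, 0.0], [0.9, 0.0], [9.0, 0.0], [4.0, 0.0]])
micro_labels, meta_labels = upsample(cells, micro_centroids, micro_to_meta)
```

The cell at (4, 0) is closest to the microcluster centroid at (1, 0), so it lands in microcluster 1 and therefore in metacluster 0.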
MSTBuilder
densitree.steps.mst.MSTBuilder
Bases: BaseStep
Build a minimum spanning tree connecting cluster centroids.
Each node in the resulting networkx.Graph represents one cluster.
Node attributes:
- size: number of cells assigned to that cluster
- median: per-feature median of cells in that cluster (ndarray)
Edge weights are Euclidean distances between centroids.
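A graph with these attributes could be built along the following lines with networkx. This is a sketch under assumptions: `build_mst` is a hypothetical helper, it starts from the complete graph over centroids, and it expects every cluster to contain at least one cell.

```python
import networkx as nx
import numpy as np


def build_mst(data, labels, centroids):
    """Minimum spanning tree over cluster centroids with size/median node attributes."""
    n = len(centroids)
    complete = nx.Graph()
    for k in range(n):
        members = data[labels == k]
        complete.add_node(k, size=len(members), median=np.median(members, axis=0))
    # Connect every pair of clusters, weighted by Euclidean centroid distance.
    for i in range(n):
        for j in range(i + 1, n):
            weight = float(np.linalg.norm(centroids[i] - centroids[j]))
            complete.add_edge(i, j, weight=weight)
    # Node attributes are carried over to the returned tree.
    return nx.minimum_spanning_tree(complete, weight="weight")


data = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 0.0], [5.0, 1.0], [5.0, 2.0]])
labels = np.array([0, 0, 1, 2, 2, 2])
centroids = np.vstack([data[labels == k].mean(axis=0) for k in range(3)])
tree = build_mst(data, labels, centroids)
edges = sorted(tuple(sorted(e)) for e in tree.edges())
```

With three clusters the tree has two edges. Here it keeps 0-1 (weight 0.9) and 1-2 (weight about 4.12) and drops the longer 0-2 edge (weight about 5.0).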