Pipeline Steps

BaseStep

densitree.steps.base.BaseStep

Bases: ABC

Abstract base for all SPADE pipeline steps.

Each step receives the shared pipeline context as keyword arguments and returns a dict of new keys to merge into that context.

run(data, **ctx) abstractmethod

Run this step.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data` | `ndarray` | The (possibly transformed) input array, shape `(n_cells, n_features)`. | *required* |
| `**ctx` | | Accumulated outputs from previous steps. | `{}` |

Returns:

| Type | Description |
|------|-------------|
| `dict` | New context keys produced by this step. |

Source code in densitree/steps/base.py
@abstractmethod
def run(self, data: np.ndarray, **ctx) -> dict:
    """Run this step.

    Parameters
    ----------
    data:
        The (possibly transformed) input array, shape (n_cells, n_features).
    **ctx:
        Accumulated outputs from previous steps.

    Returns
    -------
    dict
        New context keys produced by this step.
    """
    ...
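A minimal concrete subclass might look like the following sketch. `ScaleStep` and its output key `scaled_data` are hypothetical illustrations, not part of densitree; only the `run(data, **ctx) -> dict` contract comes from the documented interface.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseStep(ABC):
    """Abstract base for all SPADE pipeline steps (contract as documented above)."""

    @abstractmethod
    def run(self, data: np.ndarray, **ctx) -> dict:
        ...


class ScaleStep(BaseStep):
    """Hypothetical step: standardize each feature to zero mean, unit variance."""

    def run(self, data: np.ndarray, **ctx) -> dict:
        scaled = (data - data.mean(axis=0)) / (data.std(axis=0) + 1e-12)
        # Return only NEW keys; the pipeline merges them into the shared context.
        return {"scaled_data": scaled}


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
out = ScaleStep().run(X)
print(out["scaled_data"].shape)  # (100, 4)
```

Because each step only returns its new keys, steps stay decoupled: a later step reads `scaled_data` out of `**ctx` without knowing which step produced it.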

DensityEstimator

densitree.steps.density.DensityEstimator(knn=5, eps=1e-08)

Bases: BaseStep

Estimate local density for each cell using k-NN.

Density is defined as 1 / (distance to k-th nearest neighbor + eps), so cells in dense regions get high density values.

Source code in densitree/steps/density.py
def __init__(self, knn: int = 5, eps: float = 1e-8) -> None:
    self.knn = knn
    self.eps = eps
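The density definition above can be sketched in plain NumPy. This is an illustrative brute-force version (the real step presumably uses a neighbor index for large inputs); the helper name `knn_density` is an assumption, not densitree API.

```python
import numpy as np


def knn_density(data: np.ndarray, knn: int = 5, eps: float = 1e-8) -> np.ndarray:
    """Density = 1 / (distance to k-th nearest neighbor + eps)."""
    # Pairwise squared distances; fine for small n, a sketch only.
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    # Column 0 of the sorted row is the point itself (distance 0),
    # so column `knn` is the k-th nearest *other* point.
    kth = np.sqrt(np.sort(d2, axis=1)[:, knn])
    return 1.0 / (kth + eps)


rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(200, 2))   # tight cluster -> small k-NN distance
sparse = rng.normal(5.0, 2.0, size=(20, 2))   # scattered points -> large k-NN distance
density = knn_density(np.vstack([dense, sparse]))
print(density[:200].mean() > density[200:].mean())  # True
```

Cells in the tight cluster get high density values, which is exactly what the downsampling step below exploits.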

DownsampleStep

densitree.steps.downsample.DownsampleStep(downsample_target=0.05, random_state=None)

Bases: BaseStep

Density-normalized downsampling.

Cells in dense regions are sampled with lower probability so that rare populations (low density) are preserved after downsampling.

Inclusion probability for cell i: p_i = min(1, target_count * w_i / sum(w)) where w_i = 1 / density_i.

Source code in densitree/steps/downsample.py
def __init__(self, downsample_target: float = 0.05, random_state: int | None = None) -> None:
    if not 0 < downsample_target <= 1:
        raise ValueError(f"downsample_target must be in (0, 1], got {downsample_target}")
    self.downsample_target = downsample_target
    self.random_state = random_state
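The inclusion-probability formula can be sketched as follows. The helper name `density_downsample` and the returned index array are illustrative assumptions; only the formula `p_i = min(1, target_count * w_i / sum(w))` with `w_i = 1 / density_i` comes from the docstring.

```python
import numpy as np


def density_downsample(density, downsample_target=0.05, random_state=None):
    """Return indices of cells kept by density-normalized downsampling."""
    rng = np.random.default_rng(random_state)
    w = 1.0 / density                          # rare (low-density) cells get high weight
    target_count = downsample_target * len(density)
    p = np.minimum(1.0, target_count * w / w.sum())
    return np.flatnonzero(rng.random(len(density)) < p)


density = np.array([100.0, 100.0, 100.0, 1.0])  # last cell belongs to a rare population
idx = density_downsample(density, downsample_target=0.5, random_state=0)
print(3 in idx)  # True: the rare cell has p = 1 and is always kept
```

Note how the rare cell's weight dominates the sum, pushing its inclusion probability to 1 while the dense cells are thinned out.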

ClusterStep

densitree.steps.cluster.ClusterStep(n_clusters=50, n_micro=None, linkage='average')

Bases: BaseStep

Two-stage clustering on the downsampled cell set.

Stage 1: Overcluster into n_micro microclusters using MiniBatchKMeans (fast, sees all downsampled cells, captures fine structure).
Stage 2: Merge microclusters into n_clusters metaclusters using agglomerative clustering on the microcluster centroids.

This approach produces much better cluster boundaries than single-stage agglomerative clustering because:

- MiniBatchKMeans scales linearly and produces stable microclusters
- Agglomerative merging on ~n_micro centroids is fast and produces good metaclusters
- Upsampling to microcluster centroids (many, fine-grained) rather than metacluster centroids (few, coarse) dramatically improves the accuracy of cell assignment

Returns micro-level and meta-level labels plus centroids for both.

Source code in densitree/steps/cluster.py
def __init__(
    self,
    n_clusters: int = 50,
    n_micro: int | None = None,
    linkage: str = "average",
) -> None:
    self.n_clusters = n_clusters
    self.n_micro = n_micro
    self.linkage = linkage
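A sketch of the two-stage scheme using scikit-learn, which provides both estimators named in the description. The function `two_stage_cluster`, its return values, and the toy blob data are illustrative assumptions, not the step's actual implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, MiniBatchKMeans


def two_stage_cluster(data, n_clusters=3, n_micro=15, linkage="average"):
    """Stage 1: overcluster with MiniBatchKMeans; Stage 2: merge centroids."""
    km = MiniBatchKMeans(n_clusters=n_micro, n_init=3, random_state=0).fit(data)
    micro_labels = km.labels_
    micro_centroids = km.cluster_centers_
    # Agglomerative merging runs on n_micro centroids, not on all cells.
    agg = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    micro_to_meta = agg.fit_predict(micro_centroids)
    meta_labels = micro_to_meta[micro_labels]
    return micro_labels, meta_labels, micro_centroids


# Three well-separated blobs -> three metaclusters expected.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (0.0, 5.0, 10.0)])
micro, meta, centroids = two_stage_cluster(blobs, n_clusters=3, n_micro=15)
print(len(np.unique(meta)))  # 3
```

The `micro_to_meta` mapping is the piece the upsampling step reuses: keeping it around lets every original cell be assigned via the fine-grained microcluster centroids.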

UpsampleStep

densitree.steps.upsample.UpsampleStep

Bases: BaseStep

Assign every original cell to its nearest cluster.

Uses microcluster centroids (fine-grained) for assignment, then maps each microcluster to its metacluster. This is far more accurate than assigning to the few metacluster centroids directly, because microclusters capture local structure that coarse centroids miss.
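The assignment logic reduces to nearest-centroid lookup followed by a label mapping. A minimal sketch, assuming a brute-force distance computation (a real implementation would likely use a neighbor index); the helper name `upsample` is illustrative.

```python
import numpy as np


def upsample(data, micro_centroids, micro_to_meta):
    """Assign each cell to its nearest microcluster, then map to a metacluster."""
    # Squared distance from every cell to every microcluster centroid.
    d2 = ((data[:, None, :] - micro_centroids[None, :, :]) ** 2).sum(-1)
    micro = d2.argmin(axis=1)
    return micro, micro_to_meta[micro]


micro_centroids = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])
micro_to_meta = np.array([0, 0, 1])  # two fine microclusters merge into metacluster 0
cells = np.array([[0.04, 0.0], [4.9, 5.1]])
micro, meta = upsample(cells, micro_centroids, micro_to_meta)
print(meta.tolist())  # [0, 1]
```

Note the first two centroids belong to the same metacluster: assigning against many fine centroids and then mapping is what recovers boundaries that a few coarse metacluster centroids would blur.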

MSTBuilder

densitree.steps.mst.MSTBuilder

Bases: BaseStep

Build a minimum spanning tree connecting cluster centroids.

Each node in the resulting networkx.Graph represents one cluster. Node attributes:

- size: number of cells assigned to that cluster
- median: per-feature median of cells in that cluster (ndarray)

Edge weights are Euclidean distances between centroids.
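The construction can be sketched with networkx: build the complete graph over centroids with Euclidean edge weights, then extract the minimum spanning tree. The helper name `build_mst` and its arguments are illustrative assumptions.

```python
import itertools

import networkx as nx
import numpy as np


def build_mst(centroids, sizes, medians):
    """MST over cluster centroids; edge weight = Euclidean centroid distance."""
    g = nx.Graph()
    for i, c in enumerate(centroids):
        g.add_node(i, size=sizes[i], median=medians[i])
    # Complete graph over centroids, then prune to the minimum spanning tree.
    for i, j in itertools.combinations(range(len(centroids)), 2):
        g.add_edge(i, j, weight=float(np.linalg.norm(centroids[i] - centroids[j])))
    return nx.minimum_spanning_tree(g)


centroids = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
mst = build_mst(centroids, sizes=[50, 30, 20], medians=list(centroids))
print(sorted(tuple(sorted(e)) for e in mst.edges()))  # [(0, 1), (1, 2)]
```

With distances 1, 9, and 10 between the three centroids, the MST keeps the two cheapest edges and drops the long 0–2 edge, linking the distant cluster through its nearest neighbor.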