Tutorial: Flow Cytometry Analysis with densitree

This tutorial walks through a complete SPADE (Spanning-tree Progression Analysis of Density-normalized Events) analysis of CyTOF (mass cytometry) data using the Levine_32dim benchmark dataset.

Setup

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for saving figures to files; omit this line in a notebook
import matplotlib.pyplot as plt
from densitree import SPADE

Load data

densitree works with numpy arrays or pandas DataFrames. Here we load the Levine_32dim benchmark dataset:

# If you have readfcs installed:
import readfcs
adata = readfcs.read("Levine_32dim.fcs")
df = adata.to_df()

# Select marker columns (exclude metadata like Time, DNA, etc.)
markers = [c for c in df.columns if c.startswith("CD") or c == "HLA-DR" or c == "Flt3"]
X = df[markers]
print(f"Data: {X.shape[0]} cells, {X.shape[1]} markers")
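
Marker names vary between FCS files, so it can be worth checking the selection before fitting. The following is a minimal sketch of the same filter as a reusable function, run on a made-up stand-in table; `select_marker_columns` and the `demo` values are hypothetical and not part of densitree:

```python
import pandas as pd


def select_marker_columns(columns, extra=("HLA-DR", "Flt3")):
    """Return columns that look like markers: names starting with 'CD' plus explicit extras."""
    return [c for c in columns if c.startswith("CD") or c in extra]


# Tiny stand-in for the real event table (values are invented for illustration).
demo = pd.DataFrame(
    {
        "Time": [1.0, 2.0],
        "DNA1": [200.0, 210.0],
        "CD3": [12.0, 0.5],
        "CD19": [0.1, 30.0],
        "HLA-DR": [3.0, 25.0],
    }
)

demo_markers = select_marker_columns(demo.columns)

# Confirm that the markers used later for coloring the tree were actually selected.
missing = [m for m in ("CD3", "CD19") if m not in demo_markers]
print("Selected:", demo_markers)
print("Missing:", missing)
```

If `missing` is non-empty on your own data, adjust the prefix or the `extra` names before calling `fit`.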

Run SPADE

For CyTOF data, use arcsinh transform with cofactor 5:

spade = SPADE(
    n_clusters=50,           # overcluster for tree exploration
    downsample_target=0.1,   # retain 10% of cells
    knn=10,                  # k for density estimation
    transform="arcsinh",
    cofactor=5.0,            # CyTOF standard
    random_state=42,
)
spade.fit(X)

print(f"Clusters: {len(np.unique(spade.labels_))}")
print(f"Tree nodes: {spade.result_.tree_.number_of_nodes()}")
print(f"Tree edges: {spade.result_.tree_.number_of_edges()}")
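
For reference, the arcsinh transform with a cofactor is conventionally computed as arcsinh(x / cofactor). The sketch below shows that formula in plain NumPy; whether densitree applies it in exactly this form internally is an assumption, and `arcsinh_transform` is a hypothetical helper name:

```python
import numpy as np


def arcsinh_transform(values, cofactor=5.0):
    """Apply the conventional cytometry transform arcsinh(x / cofactor)."""
    return np.arcsinh(np.asarray(values, dtype=float) / cofactor)


# Made-up raw intensities spanning several orders of magnitude.
raw = np.array([0.0, 5.0, 50.0, 500.0])
transformed = arcsinh_transform(raw, cofactor=5.0)

# The transform is invertible: sinh(y) * cofactor recovers the raw values.
recovered = np.sinh(transformed) * 5.0

print(transformed)
print(recovered)
```

The transform is roughly linear for values well below the cofactor and roughly logarithmic well above it, which is why the cofactor is chosen to match the noise level of the instrument.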

Explore results

Cluster statistics

stats = spade.result_.cluster_stats_
print(stats.head(10))

# Find the largest and smallest clusters
print("\nLargest clusters:")
print(stats.nlargest(5, "size")[["size"]])

print("\nSmallest clusters (potential rare populations):")
print(stats.nsmallest(5, "size")[["size"]])
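
To sanity-check the statistics, comparable per-cluster sizes and medians can be recomputed from the labels with pandas. This is a sketch on synthetic data; the column names `median_*` and `fraction` are illustrative choices, and the actual layout of `cluster_stats_` may differ:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the marker table and the fitted labels.
rng = np.random.default_rng(0)
cells = pd.DataFrame({"CD3": rng.normal(size=8), "CD19": rng.normal(size=8)})
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])

grouped = cells.groupby(labels)

# One row per cluster: cell count, per-marker median, and share of all cells.
manual_stats = grouped.median().add_prefix("median_")
manual_stats.insert(0, "size", grouped.size())
manual_stats["fraction"] = manual_stats["size"] / len(cells)

print(manual_stats)
```

On real data, replace `cells` with `X` and `labels` with `spade.labels_`.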

Visualize the tree

# Color by CD3 expression
fig = spade.result_.plot_tree(color_by="CD3", size_by="count", backend="matplotlib")
fig.savefig("spade_tree_cd3.png", dpi=150, bbox_inches="tight")

Figure: Tree colored by CD3

# Color by CD19 (B cell marker)
fig = spade.result_.plot_tree(color_by="CD19", size_by="count", backend="matplotlib")
fig.savefig("spade_tree_cd19.png", dpi=150, bbox_inches="tight")

Figure: Tree colored by CD19

Interactive exploration with plotly

fig = spade.result_.plot_tree(color_by="CD3", backend="plotly")
fig.write_html("spade_tree_interactive.html")
# or fig.show() in a notebook

Work with the tree directly

import networkx as nx

tree = spade.result_.tree_

# Find hub clusters (high degree = many connections)
degrees = sorted(tree.degree, key=lambda x: x[1], reverse=True)
print("Hub clusters:", degrees[:5])

# Find clusters connected to a specific node
neighbors = list(tree.neighbors(0))
print(f"Cluster 0 is connected to: {neighbors}")

# Get the path between two clusters (differentiation trajectory?)
path = nx.shortest_path(tree, source=0, target=25, weight="weight")
print(f"Path: {path}")
print("Markers along path:")
for node in path:
    median = tree.nodes[node]["median"]  # median marker values stored on the node
    print(f"  Cluster {node}: size={tree.nodes[node]['size']}, median={median}")
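
Because the result is a tree, any two clusters are joined by exactly one simple path, so `weight` changes the reported path length but not the route. The sketch below illustrates this and a few related queries on a small hand-built tree; the node ids and edge weights are made up:

```python
import networkx as nx

# Hand-built toy tree with weighted edges (weights are invented).
toy = nx.Graph()
toy.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.5), (1, 3, 0.7), (3, 4, 1.2)])

# A tree is connected and has exactly one edge fewer than it has nodes.
is_tree = nx.is_tree(toy)

# Leaves (degree 1) are the terminal clusters; the hub has the highest degree.
leaves = sorted(node for node, degree in toy.degree if degree == 1)
hub = max(toy.degree, key=lambda item: item[1])[0]

# The route is identical with or without weights; only the length differs.
weighted_path = nx.shortest_path(toy, source=0, target=4, weight="weight")
unweighted_path = nx.shortest_path(toy, source=0, target=4)
weighted_length = nx.shortest_path_length(toy, source=0, target=4, weight="weight")

print(is_tree, leaves, hub)
print(weighted_path, weighted_length)
```

The same calls work on `spade.result_.tree_`, provided the chosen `source` and `target` ids exist in the fitted tree.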

Export results

# Add cluster labels to original data
df_out = X.copy()
df_out["spade_cluster"] = spade.labels_
df_out.to_csv("annotated_cells.csv", index=False)

# Export cluster statistics
spade.result_.cluster_stats_.to_csv("cluster_stats.csv")

# Export tree for external tools
nx.write_graphml(spade.result_.tree_, "spade_tree.graphml")
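
GraphML stores only scalar attribute values (numbers, strings, booleans), and networkx's writer raises an error on values such as dicts. If the tree's nodes carry non-scalar attributes, which may be the case for the `median` entry used earlier (an assumption about its type), one option is to flatten them first. A minimal sketch on a toy graph, with `flatten_node_attributes` as a hypothetical helper:

```python
import io

import networkx as nx


def flatten_node_attributes(graph):
    """Copy a graph, expanding mapping-valued node attributes into scalar ones."""
    flat = nx.Graph()
    flat.add_edges_from(graph.edges(data=True))
    for node, attrs in graph.nodes(data=True):
        flat.add_node(node)
        for key, value in attrs.items():
            if hasattr(value, "items"):
                # e.g. median={"CD3": 2.1} becomes median_CD3=2.1
                for sub_key, sub_value in value.items():
                    flat.nodes[node][f"{key}_{sub_key}"] = float(sub_value)
            else:
                flat.nodes[node][key] = value
    return flat


# Toy graph whose nodes carry a dict-valued attribute (values are invented).
demo = nx.Graph()
demo.add_node(0, size=120, median={"CD3": 2.1, "CD19": 0.2})
demo.add_node(1, size=45, median={"CD3": 0.3, "CD19": 3.4})
demo.add_edge(0, 1, weight=1.5)

flat = flatten_node_attributes(demo)

# Write to an in-memory buffer here; pass a file path instead to write to disk.
buffer = io.BytesIO()
nx.write_graphml(flat, buffer)
graphml_text = buffer.getvalue().decode("utf-8")
print(graphml_text[:200])
```

If `write_graphml` already succeeds on your tree, this step is unnecessary.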

Parameter tuning tips

If you see...	Try...
Too many small clusters	Decrease `n_clusters`
Rare populations missing	Increase `downsample_target` (e.g., 0.2) or decrease `knn`
Noisy tree with many short edges	Increase `n_clusters` to spread out populations
Very slow	Decrease `downsample_target` (e.g., 0.05)
Fluorescence flow cytometry data (not CyTOF)	Use `cofactor=150.0` instead of 5.0