Overview
Bayesian Rips Active Learning (BRAL) is a framework that integrates topological data analysis with active learning strategies to discover rare lineages in high-dimensional biological datasets.
Motivation
Traditional active learning methods rely on geometric or density-based heuristics that struggle with manifold-structured data. BRAL leverages persistent homology from the Rips complex to identify topologically significant regions where rare lineages are likely to reside.
Method
Rips Complex Construction
We construct a Vietoris-Rips complex from the current labeled set, computing persistent homology to identify topological features (connected components, loops, voids) that persist across multiple scales.
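As a rough illustration of the simplest case, the sketch below computes only the degree-0 persistence (connected components) of a Vietoris-Rips filtration using a union-find structure and the standard library. It is not BRAL's implementation. The function name `h0_persistence` and the toy points are hypothetical, and it assumes an edge enters the filtration at a scale equal to the pairwise Euclidean distance. Loops and voids require higher-dimensional simplices and, in practice, a dedicated persistent homology library.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the death scales of H0 classes in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    first edge that connects it to another component. The one component
    that never dies (the infinite bar) is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumption: an edge appears at scale equal to the Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge, one class dies
            deaths.append(length)
    return deaths


# Toy data (hypothetical): a tight group of three points and a distant pair.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
deaths = h0_persistence(points)
```

On this toy input, three components die at scale 1.0 and the last merge happens only at about 13.45, the shortest distance between the two groups. A long-lived component of this kind is the sort of feature that could flag a small, well-separated population.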
Bayesian Acquisition Function
The acquisition function combines three terms, each evaluated per unlabeled candidate:
- Topological uncertainty: regions where the Rips complex exhibits unstable homological features
- Posterior predictive variance: standard Bayesian uncertainty from the classification model
- Diversity score: ensures spatial coverage across the manifold
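The text does not specify how the three terms are combined. One plausible reading, shown here purely as an assumption, is a weighted sum after min-max normalizing each term across the candidate pool so that no single term dominates by scale. The names `min_max`, `acquisition_scores`, and `weights` are hypothetical, and the input values are made up for illustration.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def acquisition_scores(topo, variance, diversity, weights=(1.0, 1.0, 1.0)):
    """Combine three per-candidate terms into one score per candidate.

    Assumption: a weighted sum of min-max normalized terms. The source
    does not state the actual combination rule.
    """
    w_topo, w_var, w_div = weights
    columns = zip(min_max(topo), min_max(variance), min_max(diversity))
    return [w_topo * t + w_var * v + w_div * d for t, v, d in columns]


# Made-up values for three candidates.
topo = [0.0, 1.0, 0.5]        # topological uncertainty
variance = [0.2, 0.2, 0.4]    # posterior predictive variance
diversity = [3.0, 1.0, 2.0]   # diversity score
scores = acquisition_scores(topo, variance, diversity)
best = max(range(len(scores)), key=lambda i: scores[i])
```

With equal weights, the third candidate ranks highest because it is moderately strong on every term rather than extreme on one. Changing `weights` shifts the balance between exploring topologically unstable regions and covering the manifold.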
Active Learning Loop
At each iteration:
- Compute the Rips complex on labeled data
- Extract persistent homology features
- Score unlabeled points using the topology-aware acquisition function
- Query the oracle for labels on the top-k candidates
- Update the model and repeat
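The steps above can be sketched as a generic loop. This is a skeleton under stated assumptions, not the BRAL code: `fit`, `score`, and `oracle` are hypothetical callables supplied by the caller, points are assumed to be hashable tuples, and the toy `score` below is a plain nearest-labeled-distance stand-in where the topology-aware acquisition function (Rips complex, persistent homology, scoring) would go.

```python
def active_learning_loop(labeled, unlabeled, oracle, fit, score,
                         k=1, n_iterations=3):
    """Run a pool-based active learning loop.

    labeled:   dict mapping point -> label
    unlabeled: iterable of points without labels
    oracle:    callable(point) -> label
    fit:       callable(labeled dict) -> model
    score:     callable(model, labeled dict, pool list) -> list of scores
    """
    labeled = dict(labeled)
    pool = list(unlabeled)
    model = fit(labeled)
    for _ in range(n_iterations):
        if not pool:
            break
        # Steps 1-3: the score callable is where the Rips complex and
        # persistent homology features would be computed and used.
        scores = score(model, labeled, pool)
        ranked = sorted(range(len(pool)), key=lambda i: scores[i],
                        reverse=True)
        chosen = set(ranked[:k])
        # Step 4: query the oracle for the top-k candidates.
        for i in chosen:
            labeled[pool[i]] = oracle(pool[i])
        pool = [p for i, p in enumerate(pool) if i not in chosen]
        # Step 5: update the model with the enlarged labeled set.
        model = fit(labeled)
    return model, labeled, pool


# Toy stand-ins (all hypothetical) on 1-D points.
def toy_oracle(point):
    return int(point[0] > 4)


def toy_fit(labeled):
    return len(labeled)  # placeholder "model": number of labeled points


def toy_score(model, labeled, pool):
    # Distance to the nearest labeled point, a simple coverage heuristic.
    return [min(abs(p[0] - q[0]) for q in labeled) for p in pool]


model, final_labeled, remaining = active_learning_loop(
    {(0.0,): 0}, [(1.0,), (9.0,), (5.0,), (2.0,)],
    toy_oracle, toy_fit, toy_score, k=1, n_iterations=2,
)
```

Passing the scoring step in as a callable keeps the loop independent of how the acquisition function is defined, so the same skeleton can be reused to compare a topology-aware score against a baseline.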
Results
BRAL demonstrates improved discovery rates for rare lineages compared to standard active learning baselines, particularly in settings where the rare class occupies a topologically distinct region of the data manifold.