plexus.analysis.topics.stability module
Module for assessing topic stability using bootstrap sampling.
This module provides functionality to evaluate how stable topics are across multiple runs of the topic modeling process with different data samples.
- plexus.analysis.topics.stability.assess_topic_stability(docs: List[str], n_runs: int = 10, sample_fraction: float = 0.8, random_seed: int = 42, **bertopic_params) -> Dict[str, Any]
Assess topic stability using bootstrap sampling.
This function runs BERTopic multiple times with different random samples of the data and measures how consistently topics emerge across runs.
- Args:
docs: List of documents to analyze
n_runs: Number of bootstrap runs to perform (default: 10)
sample_fraction: Fraction of data to sample each run (default: 0.8)
random_seed: Random seed for reproducibility (default: 42)
**bertopic_params: Additional parameters to pass to BERTopic
- Returns:
- Dictionary containing:
n_runs: Number of runs performed
sample_fraction: Fraction of data sampled per run
mean_stability: Overall mean stability score across all topics
per_topic_stability: Dict mapping topic_id to stability score
topic_consistency_matrix: Matrix of topic similarities across runs
methodology: Description of the methodology used
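The bootstrap procedure described above can be sketched without BERTopic. The following is an illustrative sketch only, not the plexus implementation: `bootstrap_stability`, `jaccard`, and the `fit_topics` callable are hypothetical names, each topic is assumed to be represented as a set of top words, sampling is assumed to be without replacement, and topics are assumed to be matched across runs by best Jaccard overlap.

```python
import random
from itertools import combinations
from typing import Callable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_stability(
    docs: List[str],
    fit_topics: Callable[[List[str]], List[Set[str]]],  # hypothetical stand-in for a topic model
    n_runs: int = 10,
    sample_fraction: float = 0.8,
    random_seed: int = 42,
) -> float:
    """Mean best-match Jaccard score of topics across all pairs of runs."""
    rng = random.Random(random_seed)
    sample_size = max(1, int(len(docs) * sample_fraction))

    # Fit the topic model once per run on a fresh random sample (no replacement).
    runs = [fit_topics(rng.sample(docs, sample_size)) for _ in range(n_runs)]

    scores = []
    for run_a, run_b in combinations(runs, 2):
        for topic in run_a:
            # Score each topic by its closest counterpart in the other run.
            best = max((jaccard(topic, other) for other in run_b), default=0.0)
            scores.append(best)

    return sum(scores) / len(scores) if scores else 0.0


# Usage: a toy extractor that always returns the same topic is perfectly stable.
example_docs = [f"document {i}" for i in range(20)]
print(bootstrap_stability(example_docs, lambda sample: [{"billing", "refund"}], n_runs=3))
```

A fixed seed makes the samples reproducible, which mirrors the role of `random_seed` in the signature above. How the real function matches topics and aggregates scores is not stated in this reference.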
- plexus.analysis.topics.stability.calculate_jaccard_similarity(set1: set, set2: set) -> float
Calculate Jaccard similarity between two sets.
- Args:
set1: First set of items
set2: Second set of items
- Returns:
Jaccard similarity score (0-1)
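For reference, Jaccard similarity is the size of the intersection divided by the size of the union. The sketch below shows the standard definition under a hypothetical name, `jaccard_similarity`. Returning 0.0 for two empty sets is an assumption, since this reference does not say how the module handles that case.

```python
def jaccard_similarity(set1: set, set2: set) -> float:
    """Return |set1 & set2| / |set1 | set2|, a value between 0 and 1."""
    union = set1 | set2
    if not union:
        # Assumption: two empty sets are treated as having no overlap.
        return 0.0
    return len(set1 & set2) / len(union)


# Two of the four distinct words are shared, so the score is 2 / 4.
print(jaccard_similarity({"refund", "billing", "card"}, {"billing", "card", "late"}))  # 0.5
```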