plexus.analysis.topics.stability module

Module for assessing topic stability using bootstrap sampling.

This module provides functionality to evaluate how stable topics are across multiple runs of the topic modeling process with different data samples.

plexus.analysis.topics.stability.assess_topic_stability(docs: List[str], n_runs: int = 10, sample_fraction: float = 0.8, random_seed: int = 42, **bertopic_params) -> Dict[str, Any]

Assess topic stability using bootstrap sampling.

This function runs BERTopic multiple times with different random samples of the data and measures how consistently topics emerge across runs.

Args:

  • docs: List of documents to analyze

  • n_runs: Number of bootstrap runs to perform (default: 10)

  • sample_fraction: Fraction of data to sample each run (default: 0.8)

  • random_seed: Random seed for reproducibility (default: 42)

  • **bertopic_params: Additional parameters to pass to BERTopic
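The roles of n_runs, sample_fraction and random_seed can be illustrated with a minimal sketch of the sampling step. This is not the module's implementation: the helper name draw_bootstrap_samples is hypothetical, and drawing each sample without replacement is an assumption (a classical bootstrap resamples with replacement, and the documentation does not say which the module uses).

```python
import random
from typing import List


def draw_bootstrap_samples(
    docs: List[str],
    n_runs: int = 10,
    sample_fraction: float = 0.8,
    random_seed: int = 42,
) -> List[List[str]]:
    """Hypothetical helper: draw one random subset of docs per run."""
    # A dedicated, seeded generator makes the samples reproducible.
    rng = random.Random(random_seed)
    # Assumption: at least one document per sample, never more than available.
    sample_size = min(len(docs), max(1, int(len(docs) * sample_fraction)))
    # Assumption: sampling without replacement within each run.
    return [rng.sample(docs, sample_size) for _ in range(n_runs)]


docs = [f"document {i}" for i in range(10)]
samples = draw_bootstrap_samples(docs, n_runs=3)
print(len(samples), len(samples[0]))
```

Each sample would then be passed to a separate topic-model fit, so that differences between runs reflect differences in the data seen.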

Returns:
Dictionary containing:
  • n_runs: Number of runs performed

  • sample_fraction: Fraction of data sampled per run

  • mean_stability: Overall mean stability score across all topics

  • per_topic_stability: Dict mapping topic_id to stability score

  • topic_consistency_matrix: Matrix of topic similarities across runs

  • methodology: Description of the methodology used
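How per_topic_stability and mean_stability relate can be sketched as follows. The documentation does not state how topics are matched across runs, so this sketch makes several assumptions: each topic is represented by a set of top words, a reference topic is matched to its best-overlapping topic in every run using Jaccard similarity, and the scores are averaged. The names jaccard and score_stability are hypothetical, and the word sets are made-up illustration data.

```python
from statistics import mean
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size; 0.0 for two empty sets (assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_stability(
    reference: Dict[int, Set[str]],
    runs: List[Dict[int, Set[str]]],
) -> Dict[str, object]:
    """Hypothetical scoring: average best-match overlap per reference topic."""
    per_topic: Dict[int, float] = {}
    for topic_id, words in reference.items():
        # For each run, keep the similarity of the closest topic in that run.
        best_matches = [
            max((jaccard(words, other) for other in run.values()), default=0.0)
            for run in runs
        ]
        per_topic[topic_id] = mean(best_matches) if best_matches else 0.0
    scores = list(per_topic.values())
    return {
        "per_topic_stability": per_topic,
        "mean_stability": mean(scores) if scores else 0.0,
    }


# Illustration data only: top-word sets for two topics across two runs.
reference = {0: {"cat", "dog", "pet"}, 1: {"stock", "market", "trade"}}
runs = [
    {0: {"cat", "dog", "pet"}, 1: {"stock", "market", "bank"}},
    {0: {"dog", "pet", "vet"}, 1: {"stock", "market", "trade"}},
]
result = score_stability(reference, runs)
print(result)
```

Under these assumptions, a topic whose top words reappear unchanged in every run scores 1.0, and a topic that never reappears scores 0.0.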

plexus.analysis.topics.stability.calculate_jaccard_similarity(set1: set, set2: set) -> float

Calculate Jaccard similarity between two sets.

Args:

  • set1: First set of items

  • set2: Second set of items

Returns:

Jaccard similarity score between 0 and 1: the size of the intersection of the two sets divided by the size of their union.
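The Jaccard similarity of two sets is |A ∩ B| / |A ∪ B|. A minimal sketch of that formula follows. It is not the module's source: the name jaccard_similarity is hypothetical, and returning 0.0 when both sets are empty is an assumed convention that the module may handle differently.

```python
def jaccard_similarity(set1: set, set2: set) -> float:
    """Illustrative Jaccard similarity: |intersection| / |union|."""
    union = set1 | set2
    if not union:
        # Assumption: two empty sets are treated as having no similarity.
        return 0.0
    return len(set1 & set2) / len(union)


# Two of the four distinct items are shared, so the score is 2 / 4.
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```

Identical non-empty sets score 1.0 and disjoint sets score 0.0.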