plexus.processors.RelevantWindowsTranscriptFilter module
- class plexus.processors.RelevantWindowsTranscriptFilter.RelevantWindowsTranscriptFilter(**parameters)
Bases: DataframeProcessor
Filter a transcript to extract relevant windows based on keywords or a classifier.
This processor can work in two modes:
1. Keyword mode: Provide a list of keywords with optional fuzzy matching.
2. Classifier mode: Provide a Score classifier (legacy mode).
- Parameters:
keywords (list): List of keywords/phrases to match (alternative to classifier)
fuzzy_match (bool): Enable fuzzy matching for keywords (default: False)
fuzzy_threshold (int): Minimum similarity score for fuzzy matching, 0-100 (default: 80)
case_sensitive (bool): Whether keyword matching is case sensitive (default: False)
classifier (Score): A Score classifier for determining relevance (legacy)
prev_count (int): Number of sentences to include before matched sentence (default: 1)
next_count (int): Number of sentences to include after matched sentence (default: 1)
window_unit (str): Unit for window size: 'sentences', 'words', or 'characters' (default: 'sentences')
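To make the documented defaults concrete, the following is a minimal sketch of a keyword-mode parameter set. The `with_defaults` helper and the `DOCUMENTED_DEFAULTS` name are hypothetical and not part of plexus; the range and unit checks are this sketch's own assumptions, not documented behavior of the processor.

```python
# Defaults copied from the parameter list above.
DOCUMENTED_DEFAULTS = {
    "fuzzy_match": False,
    "fuzzy_threshold": 80,
    "case_sensitive": False,
    "prev_count": 1,
    "next_count": 1,
    "window_unit": "sentences",
}

ALLOWED_UNITS = ("sentences", "words", "characters")


def with_defaults(**parameters):
    """Hypothetical helper: overlay user parameters on the documented defaults."""
    merged = {**DOCUMENTED_DEFAULTS, **parameters}
    # Assumption of this sketch: reject values outside the documented ranges.
    if not 0 <= merged["fuzzy_threshold"] <= 100:
        raise ValueError("fuzzy_threshold must be between 0 and 100")
    if merged["window_unit"] not in ALLOWED_UNITS:
        raise ValueError(f"window_unit must be one of {ALLOWED_UNITS}")
    return merged


# Keyword mode: fuzzy matching on, two sentences of leading context.
keyword_params = with_defaults(
    keywords=["refund", "cancel my policy"],
    fuzzy_match=True,
    prev_count=2,
)
```

A dictionary like `keyword_params` could then presumably be passed as `RelevantWindowsTranscriptFilter(**keyword_params)`, since the constructor accepts `**parameters`.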
- __init__(**parameters)
- combine_consecutive_ellipses(filtered_text)
- compute_inclusion_flags(relevance_flags)
- is_sentence_relevant(sentence: str) -> bool
Check if a sentence is relevant based on keywords or classifier.
- Args:
sentence: The sentence to check
- Returns:
True if sentence matches keywords or classifier returns True
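As an illustration of what keyword-mode relevance could look like, here is a self-contained sketch. It is not the plexus implementation: the function name `is_sentence_relevant_sketch` is hypothetical, and fuzzy matching is approximated with the standard library's `difflib.SequenceMatcher` scaled to 0-100, which may score differently from whatever fuzzy matcher the processor actually uses.

```python
from difflib import SequenceMatcher


def is_sentence_relevant_sketch(sentence, keywords, fuzzy_match=False,
                                fuzzy_threshold=80, case_sensitive=False):
    """Return True if any keyword matches the sentence exactly or fuzzily."""
    text = sentence if case_sensitive else sentence.lower()
    words = text.split()
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        # Exact substring match always counts.
        if needle in text:
            return True
        if fuzzy_match:
            # Compare the keyword against every run of words of the same length.
            n = max(1, len(needle.split()))
            for i in range(len(words) - n + 1):
                candidate = " ".join(words[i:i + n])
                score = 100 * SequenceMatcher(None, needle, candidate).ratio()
                if score >= fuzzy_threshold:
                    return True
    return False
```

With the defaults, `is_sentence_relevant_sketch("I want a Refund today.", ["refund"])` matches because comparison is case insensitive; a misspelling such as "refnd" only matches when `fuzzy_match=True`.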
- process(dataframe: DataFrame) -> DataFrame
- should_insert_ellipsis(index, include_flags, sentences)
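The helper methods listed above are undocumented, so the following is only a guess at the windowing idea their names suggest, for the default `window_unit` of 'sentences'. Per-sentence relevance flags are expanded by `prev_count` and `next_count`, and each skipped stretch is replaced by a single ellipsis. The function names and the choice to emit leading and trailing ellipses are assumptions of this sketch, not confirmed behavior.

```python
def compute_inclusion_flags_sketch(relevance_flags, prev_count=1, next_count=1):
    """Mark each relevant sentence plus its surrounding context for inclusion."""
    include = [False] * len(relevance_flags)
    for i, relevant in enumerate(relevance_flags):
        if relevant:
            start = max(0, i - prev_count)
            end = min(len(relevance_flags), i + next_count + 1)
            for j in range(start, end):
                include[j] = True
    return include


def render_windows_sketch(sentences, include_flags):
    """Join included sentences, collapsing each skipped run into one '...'."""
    parts = []
    for sentence, keep in zip(sentences, include_flags):
        if keep:
            parts.append(sentence)
        elif not parts or parts[-1] != "...":
            # Only one ellipsis per run of skipped sentences.
            parts.append("...")
    return " ".join(parts)


sentences = ["A.", "B.", "C.", "D.", "E.", "F."]
flags = compute_inclusion_flags_sketch([False, False, True, False, False, False])
filtered = render_windows_sketch(sentences, flags)
```

Here only the third sentence is relevant, so `flags` covers the second through fourth sentences and `filtered` is `"... B. C. D. ..."`.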