plexus.processors.RelevantWindowsTranscriptFilter module

class plexus.processors.RelevantWindowsTranscriptFilter.RelevantWindowsTranscriptFilter(**parameters)

Bases: DataframeProcessor

Filter transcript to extract relevant windows based on keywords or a classifier.

This processor works in one of two modes:

1. Keyword mode: provide a list of keywords, with optional fuzzy matching.
2. Classifier mode: provide a Score classifier (legacy mode).

Parameters:
    keywords (list): List of keywords/phrases to match (alternative to classifier)
    fuzzy_match (bool): Enable fuzzy matching for keywords (default: False)
    fuzzy_threshold (int): Minimum similarity score for fuzzy matching, 0-100 (default: 80)
    case_sensitive (bool): Whether keyword matching is case sensitive (default: False)
    classifier (Score): A Score classifier for determining relevance (legacy)
    prev_count (int): Number of sentences to include before a matched sentence (default: 1)
    next_count (int): Number of sentences to include after a matched sentence (default: 1)
    window_unit (str): Unit for window size: 'sentences', 'words', or 'characters' (default: 'sentences')
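To illustrate how prev_count and next_count might shape a window, the sketch below expands a list of per-sentence relevance flags into per-sentence inclusion flags. It is not taken from the plexus source: the function name expand_windows is hypothetical, and it covers only the 'sentences' window unit.

```python
from typing import List


def expand_windows(
    relevance_flags: List[bool],
    prev_count: int = 1,
    next_count: int = 1,
) -> List[bool]:
    """Hypothetical helper: mark each relevant sentence plus its neighbours.

    Assumption: a window is measured in whole sentences and is clipped
    at the start and end of the transcript.
    """
    include = [False] * len(relevance_flags)
    for i, relevant in enumerate(relevance_flags):
        if not relevant:
            continue
        # Clip the window so it never runs past either end of the list.
        start = max(0, i - prev_count)
        end = min(len(relevance_flags), i + next_count + 1)
        for j in range(start, end):
            include[j] = True
    return include


# One relevant sentence at index 2, with the default window of 1 before and 1 after.
print(expand_windows([False, False, True, False, False, False]))
# [False, True, True, True, False, False]
```

With the defaults, a single matched sentence yields a three-sentence window; overlapping windows from nearby matches simply merge because each flag is only ever set to True.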

__init__(**parameters)

combine_consecutive_ellipses(filtered_text)

compute_inclusion_flags(relevance_flags)

is_sentence_relevant(sentence: str) → bool

Check if a sentence is relevant based on keywords or classifier.

Args:
    sentence: The sentence to check

Returns:
    True if sentence matches keywords or classifier returns True
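A keyword-mode check along these lines could look like the following minimal sketch. It is an assumption-laden illustration, not the library's implementation: the function name sentence_matches is hypothetical, and the standard library's difflib.SequenceMatcher stands in for whatever fuzzy matcher the processor actually uses, with its 0-1 ratio scaled to the documented 0-100 range.

```python
from difflib import SequenceMatcher
from typing import Iterable


def sentence_matches(
    sentence: str,
    keywords: Iterable[str],
    fuzzy_match: bool = False,
    fuzzy_threshold: int = 80,
    case_sensitive: bool = False,
) -> bool:
    """Hypothetical stand-in for a keyword-mode relevance check."""
    text = sentence if case_sensitive else sentence.lower()
    words = text.split()
    for keyword in keywords:
        needle = keyword if case_sensitive else keyword.lower()
        # Exact substring match is tried first.
        if needle in text:
            return True
        if not fuzzy_match:
            continue
        # Assumption: compare the keyword against every run of words
        # that has the same word count as the keyword itself.
        n = max(1, len(needle.split()))
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            score = SequenceMatcher(None, needle, candidate).ratio() * 100
            if score >= fuzzy_threshold:
                return True
    return False


print(sentence_matches("Please cancel my subscription.", ["cancel"]))
print(sentence_matches("I want to cancle it", ["cancel"], fuzzy_match=True))
```

The misspelling "cancle" is found only when fuzzy_match is enabled; a different similarity measure would score it differently, so the threshold is not directly comparable across matchers.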

process(dataframe: DataFrame) → DataFrame

should_insert_ellipsis(index, include_flags, sentences)