plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter module

class plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter.RemoveSpeakerIdentifiersTranscriptFilter(**parameters)

Bases: Processor

Processor that removes speaker identifiers from transcript text.

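Removes speaker labels such as “Agent:” and “Customer:”, typically at the beginning of lines. The named labels in SPEAKER_LABEL_PATTERN are also matched after any whitespace, and generic labels are also matched after a period followed by whitespace.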
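As an illustration of this behavior, the sketch below copies the two compiled patterns documented for this class and applies them to a multi-line transcript with `re.sub`, which keeps the line structure and drops only the labels. The helper name `strip_speaker_labels` and the order in which the two patterns are applied are assumptions for illustration, not the processor's confirmed implementation.

```python
import re

# Patterns copied from the class attributes documented on this page.
GENERIC_SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\n)|(?<=\.\s))[A-Za-z][A-Za-z0-9_-]{1,31}\s*:\s*",
    re.MULTILINE,
)
SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\s))(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker"
    r"|agent|customer|contact|representative|rep)\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)


def strip_speaker_labels(text: str) -> str:
    """Hypothetical helper: remove named labels first, then generic ones (assumed order)."""
    text = SPEAKER_LABEL_PATTERN.sub("", text)
    return GENERIC_SPEAKER_LABEL_PATTERN.sub("", text)


transcript = "Agent: Thanks for calling.\nCustomer: My order is late.\nAgent: Let me check."
print(strip_speaker_labels(transcript))
# Thanks for calling.
# My order is late.
# Let me check.
```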

Initialize the processor with configuration parameters.

Args:

**parameters: Processor-specific configuration parameters

GENERIC_SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\n)|(?<=\\.\\s))[A-Za-z][A-Za-z0-9_-]{1,31}\\s*:\\s*', re.MULTILINE)
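The generic pattern removes any label of 2 to 32 characters (a letter followed by 1 to 31 letters, digits, underscores, or hyphens) that ends in a colon and sits at the start of a line or right after a period followed by one whitespace character. A quick check against a copy of the pattern shows the intended effect and one side effect worth knowing: ordinary prefixes such as `Note:` are removed too, while single-letter labels are left alone. The sample strings are illustrative.

```python
import re

# Copied from GENERIC_SPEAKER_LABEL_PATTERN above.
GENERIC_SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\n)|(?<=\.\s))[A-Za-z][A-Za-z0-9_-]{1,31}\s*:\s*",
    re.MULTILINE,
)

samples = {
    "Alice: hi. Bob: hello": "hi. hello",  # at line start and after ". "
    "Meet at 10:30": "Meet at 10:30",      # no label at an allowed position
    "Note: call back": "call back",        # non-speaker prefix is removed as well
    "A: too short": "A: too short",        # labels need at least 2 characters
}
for text, expected in samples.items():
    assert GENERIC_SPEAKER_LABEL_PATTERN.sub("", text) == expected, text
```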
SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\s))(?:speaker(?:\\s*[A-Za-z0-9_-]+)?|unknown\\s+speaker|agent|customer|contact|representative|rep)\\s*:\\s*', re.IGNORECASE|re.MULTILINE)
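The named-label pattern is case-insensitive, accepts an optional identifier after `speaker` (for example `Speaker 2` or `SPEAKER_A`), and, unlike the generic pattern, also matches after any whitespace, so a known label in the middle of a line is removed too. The checks below run against a copy of the pattern; the sample strings are illustrative.

```python
import re

# Copied from SPEAKER_LABEL_PATTERN above.
SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\s))(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker"
    r"|agent|customer|contact|representative|rep)\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)

samples = {
    "Speaker 2: yes": "yes",                                  # numbered speaker
    "SPEAKER_A: ok": "ok",                                    # case-insensitive, suffixed id
    "unknown speaker: hmm": "hmm",                            # two-word label
    "Thanks. Rep: you're welcome": "Thanks. you're welcome",  # mid-line, after whitespace
    "agents: pending": "agents: pending",                     # not an exact label, unchanged
}
for text, expected in samples.items():
    assert SPEAKER_LABEL_PATTERN.sub("", text) == expected, text
```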

process(score_input: Score.Input) → Score.Input

Process the Score.Input by removing speaker identifiers.

Args:

score_input: Score.Input with text containing speaker labels

Returns:

Score.Input with speaker labels removed
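The `process` contract can be sketched roughly as follows. `Score.Input` is not documented on this page, so the sketch substitutes a hypothetical frozen dataclass `ScoreInput` with `text` and `metadata` fields, and the class name `RemoveSpeakerIdentifiersSketch` is likewise hypothetical. It assumes the method returns a new input with only the text changed and applies the named-label pattern before the generic one; neither point is confirmed by this page.

```python
import re
from dataclasses import dataclass, field, replace
from typing import Any, Dict


@dataclass(frozen=True)
class ScoreInput:
    """Hypothetical stand-in for Score.Input; only the fields used here are modeled."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class RemoveSpeakerIdentifiersSketch:
    """Illustrative re-creation under stated assumptions, not the Plexus source."""

    GENERIC_SPEAKER_LABEL_PATTERN = re.compile(
        r"(?:(?<=^)|(?<=\n)|(?<=\.\s))[A-Za-z][A-Za-z0-9_-]{1,31}\s*:\s*",
        re.MULTILINE,
    )
    SPEAKER_LABEL_PATTERN = re.compile(
        r"(?:(?<=^)|(?<=\s))(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker"
        r"|agent|customer|contact|representative|rep)\s*:\s*",
        re.IGNORECASE | re.MULTILINE,
    )

    def __init__(self, **parameters: Any) -> None:
        # Mirrors the documented signature; parameters are simply stored.
        self.parameters = parameters

    def process(self, score_input: ScoreInput) -> ScoreInput:
        # Assumed order: named labels first, then generic "Name:" labels.
        text = self.SPEAKER_LABEL_PATTERN.sub("", score_input.text)
        text = self.GENERIC_SPEAKER_LABEL_PATTERN.sub("", text)
        # Return a new input; metadata is carried over unchanged.
        return replace(score_input, text=text)


original = ScoreInput(text="Customer: Hi.\nAgent: Hello!", metadata={"call_id": "demo"})
cleaned = RemoveSpeakerIdentifiersSketch().process(original)
print(cleaned.text)
# Hi.
# Hello!
```

Returning a copy rather than mutating the input keeps the original transcript available to other processors; whether Plexus does the same is an assumption here.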