plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter module

class plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter.RemoveSpeakerIdentifiersTranscriptFilter(**parameters)

Processor that removes speaker identifiers from transcript text.

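Removes speaker labels such as “Agent:” and “Customer:” that appear at the beginning of a line. The class-level patterns listed below also match in some mid-line positions: SPEAKER_LABEL_PATTERN matches after any whitespace, and GENERIC_SPEAKER_LABEL_PATTERN matches after a period followed by whitespace.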

The constructor initializes the processor with configuration parameters passed as keyword arguments (**parameters).

GENERIC_SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\n)|(?<=\\.\\s))[A-Za-z][A-Za-z0-9_-]{1,31}\\s*:\\s*', re.MULTILINE)

SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\s))(?:speaker(?:\\s*[A-Za-z0-9_-]+)?|unknown\\s+speaker|agent|customer|contact|representative|rep)\\s*:\\s*', re.IGNORECASE|re.MULTILINE)
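To make the two patterns above easier to read, here is a minimal sketch that copies them verbatim and applies them with `re.sub` to made-up sample strings. It only illustrates what each regular expression matches; how `process` combines or selects the patterns is not stated on this page, so the direct `sub("", ...)` calls are an assumption for demonstration.

```python
import re

# Patterns copied from the class attributes documented above.
GENERIC_SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\n)|(?<=\.\s))[A-Za-z][A-Za-z0-9_-]{1,31}\s*:\s*",
    re.MULTILINE,
)
SPEAKER_LABEL_PATTERN = re.compile(
    r"(?:(?<=^)|(?<=\s))"
    r"(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker|agent|customer"
    r"|contact|representative|rep)\s*:\s*",
    re.IGNORECASE | re.MULTILINE,
)

# Made-up sample transcripts (not taken from the library).
known = "Agent: Hello there.\nSpeaker 1: Hi. customer: I need help."
generic = "Alice: Hi. Bob: Hello\nCarol_2 : Bye"

# The fixed-vocabulary pattern is case-insensitive and also matches mid-line,
# as long as the label follows whitespace.
known_clean = SPEAKER_LABEL_PATTERN.sub("", known)
print(known_clean)    # Hello there.\nHi. I need help. (on two lines)

# The generic pattern accepts any 2-32 character label starting with a letter,
# but only at a line start or directly after ". ".
generic_clean = GENERIC_SPEAKER_LABEL_PATTERN.sub("", generic)
print(generic_clean)  # Hi. Hello\nBye (on two lines)
```

Neither pattern touches a colon that is not part of a label in one of those positions: a string such as "Meet at 10:30" passes through both unchanged.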

process(score_input: Score.Input) → Score.Input

Process the Score.Input by removing speaker identifiers.
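The input-in, input-out shape of `process` can be sketched as follows. This is not the library's implementation: `TranscriptInput` is a hypothetical stand-in for `Score.Input` (its `text` and `metadata` fields are assumptions), `LABEL` is a simplified illustrative pattern, and `strip_speaker_labels` is an invented helper name.

```python
import re
from dataclasses import dataclass, field, replace


# Hypothetical stand-in for Score.Input; the real class and its fields may differ.
@dataclass(frozen=True)
class TranscriptInput:
    text: str
    metadata: dict = field(default_factory=dict)


# Simplified illustrative pattern, not one of the class's own patterns.
LABEL = re.compile(r"^(?:agent|customer)\s*:\s*", re.IGNORECASE | re.MULTILINE)


def strip_speaker_labels(score_input: TranscriptInput) -> TranscriptInput:
    """Return a copy with labels removed from the text; metadata is carried over."""
    return replace(score_input, text=LABEL.sub("", score_input.text))


original = TranscriptInput(text="Agent: Hello\nCustomer: Hi", metadata={"id": 1})
cleaned = strip_speaker_labels(original)
print(cleaned.text)  # Hello\nHi (on two lines)
```

Returning a new object rather than mutating the argument keeps the original transcript available to the caller; whether the real `process` copies or mutates its input is not specified here.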