plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter module
- class plexus.processors.RemoveSpeakerIdentifiersTranscriptFilter.RemoveSpeakerIdentifiersTranscriptFilter(**parameters)
Bases: Processor
Processor that removes speaker identifiers from transcript text.
Removes patterns like “Agent:”, “Customer:”, etc. from the beginning of lines.
Initialize the processor with configuration parameters.
- Args:
**parameters: Processor-specific configuration parameters
- GENERIC_SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\n)|(?<=\\.\\s))[A-Za-z][A-Za-z0-9_-]{1,31}\\s*:\\s*', re.MULTILINE)
- SPEAKER_LABEL_PATTERN = re.compile('(?:(?<=^)|(?<=\\s))(?:speaker(?:\\s*[A-Za-z0-9_-]+)?|unknown\\s+speaker|agent|customer|contact|representative|rep)\\s*:\\s*', re.IGNORECASE|re.MULTILINE)
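To make the two patterns above concrete, here is a minimal sketch that compiles them exactly as documented and applies each with `re.sub`. The helper names `strip_known_labels` and `strip_generic_labels` are hypothetical and are not part of the plexus API. Replacing each match with an empty string is an assumption about how the patterns are meant to be used, not a description of what `process` does internally.

```python
import re

# Patterns copied from the class attributes documented above.
GENERIC_SPEAKER_LABEL_PATTERN = re.compile(
    r'(?:(?<=^)|(?<=\n)|(?<=\.\s))[A-Za-z][A-Za-z0-9_-]{1,31}\s*:\s*',
    re.MULTILINE,
)
SPEAKER_LABEL_PATTERN = re.compile(
    r'(?:(?<=^)|(?<=\s))(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker'
    r'|agent|customer|contact|representative|rep)\s*:\s*',
    re.IGNORECASE | re.MULTILINE,
)


def strip_known_labels(text: str) -> str:
    """Hypothetical helper: delete labels from the fixed vocabulary."""
    return SPEAKER_LABEL_PATTERN.sub("", text)


def strip_generic_labels(text: str) -> str:
    """Hypothetical helper: delete any short 'Name:' style label."""
    return GENERIC_SPEAKER_LABEL_PATTERN.sub("", text)


if __name__ == "__main__":
    print(strip_known_labels("Agent: Hello there.\nCustomer: Hi, I need help."))
    print(strip_generic_labels("Alice: Good morning. Bob: Hello."))
```

Two properties of the patterns themselves are worth noting. The generic pattern only fires at the start of the text, after a newline, or after a period plus whitespace, and it requires a leading letter, so a time such as `10:30` in the middle of a sentence is left alone. The known-label pattern also accepts any preceding whitespace, so a phrase like `the agent: no` inside a sentence loses `agent: ` as well.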
- process(score_input: Score.Input) → Score.Input
Process the Score.Input by removing speaker identifiers.
- Args:
score_input: Score.Input with text containing speaker labels
- Returns:
Score.Input with speaker labels removed
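The input-in, input-out shape of `process` can be sketched without importing plexus. In the example below, `StandInInput` is a hypothetical stand-in for `Score.Input`, and its `text` and `metadata` fields are assumptions made only for illustration. `process_sketch` is likewise a hypothetical function: it applies only the documented `SPEAKER_LABEL_PATTERN` and returns a new object, which may differ from what the real method does.

```python
import re
from dataclasses import dataclass, field, replace

# Copied from the SPEAKER_LABEL_PATTERN attribute documented above.
SPEAKER_LABEL_PATTERN = re.compile(
    r'(?:(?<=^)|(?<=\s))(?:speaker(?:\s*[A-Za-z0-9_-]+)?|unknown\s+speaker'
    r'|agent|customer|contact|representative|rep)\s*:\s*',
    re.IGNORECASE | re.MULTILINE,
)


@dataclass(frozen=True)
class StandInInput:
    """Hypothetical stand-in for Score.Input; the field names are assumed."""
    text: str
    metadata: dict = field(default_factory=dict)


def process_sketch(score_input: StandInInput) -> StandInInput:
    """Return a copy of the input whose text has known speaker labels removed."""
    cleaned = SPEAKER_LABEL_PATTERN.sub("", score_input.text)
    # Copy instead of mutating, so the caller's original object is unchanged.
    return replace(score_input, text=cleaned)


if __name__ == "__main__":
    original = StandInInput(
        text="Rep: Thanks for calling.\nSpeaker 1: Hi.",
        metadata={"id": "example"},
    )
    print(process_sketch(original).text)
```

Returning a fresh object rather than editing in place is a design choice of this sketch; it keeps the untouched transcript available for comparison or for other processors.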