plexus.processors.FilterCustomerOnlyProcessor module

class plexus.processors.FilterCustomerOnlyProcessor.FilterCustomerOnlyProcessor(**parameters)

Bases: DataframeProcessor

Processor that filters transcript text to include only customer utterances.

This processor extracts only the portions of a transcript where the customer is speaking, removing all agent/representative utterances. It handles various speaker label formats (Customer:, Contact:, etc.).

Note: This processor does NOT remove the speaker identifiers themselves. To remove speaker labels like “Customer:”, chain this with RemoveSpeakerIdentifiersTranscriptFilter.
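The filtering described above can be sketched on a single transcript string. This is only an illustration, not the processor's actual implementation: the function name keep_customer_utterances, the agent label set, and the assumption that each turn starts with a "Label:" prefix are all hypothetical. Only the "Customer:" and "Contact:" labels come from the description above. As the note says, the speaker labels are kept in the output.

```python
import re

# "Customer" and "Contact" come from the docs above; the agent labels are
# assumptions made for this sketch only.
CUSTOMER_LABELS = ("Customer", "Contact")
AGENT_LABELS = ("Agent", "Representative")

# Matches any known speaker label followed by a colon, e.g. "Customer:".
_SPEAKER = re.compile(r"\b(" + "|".join(CUSTOMER_LABELS + AGENT_LABELS) + r"):")


def keep_customer_utterances(text: str) -> str:
    """Return only the customer turns of a transcript, labels included."""
    matches = list(_SPEAKER.finditer(text))
    kept = []
    for i, match in enumerate(matches):
        # A turn runs from its label to the start of the next label (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        if match.group(1) in CUSTOMER_LABELS:
            kept.append(text[match.start():end].strip())
    return " ".join(kept)


transcript = (
    "Agent: Hello, how can I help? Customer: My bill is wrong. "
    "Agent: Let me check. Contact: Thanks."
)
filtered = keep_customer_utterances(transcript)
```

In this sketch a transcript with no recognized labels yields an empty string; the real processor may behave differently in that case.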

Example usage in YAML:

data:
  processors:
    - class: FilterCustomerOnlyProcessor
    - class: RemoveSpeakerIdentifiersTranscriptFilter

process(dataframe: DataFrame) → DataFrame

Process the dataframe by filtering text to customer utterances only.

Args:
    dataframe: DataFrame with a ‘text’ column containing transcripts.
Returns:
    DataFrame with the ‘text’ column filtered to customer speech only.
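The DataFrame contract of process() can be illustrated with a small stand-in. This is a sketch under stated assumptions, not the Plexus implementation: the helper customer_lines is hypothetical, it assumes one utterance per line, and copying the input rather than modifying it in place is a choice made for this example only.

```python
import pandas as pd


def customer_lines(text: str) -> str:
    # Hypothetical helper: assumes one utterance per line and keeps only
    # lines that begin with a customer label.
    labels = ("Customer:", "Contact:")
    return "\n".join(
        line for line in text.splitlines() if line.strip().startswith(labels)
    )


def process(dataframe: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is left untouched
    # (an assumption of this sketch, not documented behavior).
    result = dataframe.copy()
    result["text"] = result["text"].map(customer_lines)
    return result


df = pd.DataFrame(
    {
        "id": [1],
        "text": ["Agent: Hi.\nCustomer: My bill is wrong.\nAgent: One moment."],
    }
)
out = process(df)
```

Only the ‘text’ column changes here; other columns, such as the illustrative id column, pass through as they are.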