plexus.processors.DataframeProcessor module

plexus.processors.DataframeProcessor.DataframeProcessor

alias of Processor

class plexus.processors.DataframeProcessor.DatasetProcessor(**parameters)

Bases: ABC

Base class for dataset-level processors that operate on DataFrames.

These processors are used during evaluation/training to filter, transform, or balance datasets. They operate on entire DataFrames, not individual items.

Examples: ByColumnValueDatasetFilter, DownsampleClassDatasetFilter, etc.

Note: Dataset processors are NOT used in production; they are for development and evaluation only. They belong in the ‘data:’ section of YAML configs, not the ‘item:’ section.

__init__(**parameters)

display_summary()

Display before/after summary (for debugging/logging).

abstractmethod process(dataframe: DataFrame) → DataFrame

Transform a DataFrame (dataset-level operation).

Args:

dataframe: Input DataFrame

Returns:

Transformed DataFrame
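
As an illustration of the dataset-level contract described above, here is a minimal sketch of a subclass. It does not import plexus: the `DatasetProcessor` below is a local stand-in that mirrors only the documented signature, and storing the keyword arguments on `self.parameters` is an assumption, not confirmed behavior. `KeepRowsWithValue` and its `column` and `value` parameters are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod

import pandas as pd


class DatasetProcessor(ABC):
    """Local stand-in for the documented base class (assumed shape)."""

    def __init__(self, **parameters):
        # Assumption: configuration is kept as a plain dict of keyword arguments.
        self.parameters = parameters

    @abstractmethod
    def process(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        """Transform a DataFrame (dataset-level operation)."""


class KeepRowsWithValue(DatasetProcessor):
    """Hypothetical filter: keep only rows where `column` equals `value`."""

    def process(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        column = self.parameters["column"]
        value = self.parameters["value"]
        # Select matching rows from the whole DataFrame and renumber the index;
        # the input DataFrame itself is left unmodified.
        return dataframe[dataframe[column] == value].reset_index(drop=True)


df = pd.DataFrame({"text": ["a", "b", "c"], "label": ["yes", "no", "yes"]})
filtered = KeepRowsWithValue(column="label", value="yes").process(df)
print(filtered)
```

The processor receives and returns an entire DataFrame, which is what separates it from the per-item `Processor` class.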

class plexus.processors.DataframeProcessor.Processor(**parameters)

Bases: ABC

Base class for processors that transform Score.Input → Score.Input.

These processors work on one item at a time, so the same transformation runs in production (a single item) and in development (each item of a dataset).

Use this base class for per-item text transformations that must behave identically in both settings.

Examples: FilterCustomerOnlyProcessor, RemoveSpeakerIdentifiersTranscriptFilter, etc.

__init__(**parameters)

Initialize the processor with configuration parameters.

Args:

**parameters: Processor-specific configuration parameters

display_summary()

Display before/after summary (for debugging/logging).

abstractmethod process(score_input: Score.Input) → Score.Input

Transform a Score.Input.

Args:

score_input: Input to transform (contains text, metadata, results)

Returns:

Transformed Score.Input with modified text/metadata
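
The per-item contract can be sketched in the same way. The example below is self-contained and does not import plexus: `ScoreInput` is a simplified stand-in for `Score.Input` with only `text` and `metadata` fields (the real class also carries results), the `Processor` base mirrors only the documented signature, and `CollapseWhitespaceProcessor` is a hypothetical name invented for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field, replace


@dataclass
class ScoreInput:
    """Simplified stand-in for Score.Input (assumed fields only)."""
    text: str
    metadata: dict = field(default_factory=dict)


class Processor(ABC):
    """Local stand-in for the documented base class (assumed shape)."""

    def __init__(self, **parameters):
        # Assumption: configuration is kept as a plain dict of keyword arguments.
        self.parameters = parameters

    @abstractmethod
    def process(self, score_input: ScoreInput) -> ScoreInput:
        """Transform a single input item."""


class CollapseWhitespaceProcessor(Processor):
    """Hypothetical processor: collapse runs of whitespace into single spaces."""

    def process(self, score_input: ScoreInput) -> ScoreInput:
        cleaned = " ".join(score_input.text.split())
        # Return a new object with the cleaned text; the input is not mutated.
        return replace(score_input, text=cleaned)


item = ScoreInput(text="Hello   there,\n  world", metadata={"id": 1})
result = CollapseWhitespaceProcessor().process(item)
print(result.text)
```

Because `process` takes one item and returns one item, the same class can be applied to a single production input or mapped over every row of a development dataset.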