plexus.processors.DataframeProcessor module
- class plexus.processors.DataframeProcessor.DatasetProcessor(**parameters)
Bases: ABC
Base class for dataset-level processors that operate on DataFrames.
These processors are used during evaluation/training to filter, transform, or balance datasets. They operate on entire DataFrames, not individual items.
Examples: ByColumnValueDatasetFilter, DownsampleClassDatasetFilter, etc.
Note: Dataset processors are NOT used in production; they are for development and evaluation only. They belong in the ‘data:’ section of YAML configs, not the ‘item:’ section.
- __init__(**parameters)
- display_summary()
Display before/after summary (for debugging/logging).
- abstractmethod process(dataframe: DataFrame) → DataFrame
Transform a DataFrame (dataset-level operation).
- Args:
dataframe: Input DataFrame
- Returns:
Transformed DataFrame
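A dataset-level processor can be sketched as below. This is a minimal illustration, not the library's implementation: the `DatasetProcessor` class here is a local stand-in that only mirrors the documented signature, the `self.parameters` attribute is an assumption about how configuration is stored, and `KeepColumnValueFilter` with its `column` and `value` parameters is a hypothetical subclass.

```python
from abc import ABC, abstractmethod

import pandas as pd


class DatasetProcessor(ABC):
    """Local stand-in mirroring the documented interface (assumed shape)."""

    def __init__(self, **parameters):
        # Assumption: configuration is kept as a plain dict on the instance.
        self.parameters = parameters

    @abstractmethod
    def process(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        """Transform a DataFrame (dataset-level operation)."""


class KeepColumnValueFilter(DatasetProcessor):
    """Hypothetical filter: keep only rows where `column` equals `value`."""

    def process(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        column = self.parameters["column"]
        value = self.parameters["value"]
        # Select matching rows with a boolean mask; the input frame is not mutated.
        return dataframe[dataframe[column] == value].reset_index(drop=True)


df = pd.DataFrame({"text": ["a", "b", "c"], "label": ["yes", "no", "yes"]})
filtered = KeepColumnValueFilter(column="label", value="yes").process(df)
```

Because `process` receives and returns the whole DataFrame, row-dropping operations such as filtering or downsampling fit naturally here, which is also why this kind of processor only makes sense for datasets and not for single production items.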
- class plexus.processors.DataframeProcessor.Processor(**parameters)
Bases: ABC
Base class for processors that transform Score.Input → Score.Input.
These processors work on individual items (per-item processing), ensuring the same transformation in production and development.
Use this for per-item text transformations that should work identically in production (single item) and development (datasets).
Examples: FilterCustomerOnlyProcessor, RemoveSpeakerIdentifiersTranscriptFilter, etc.
- __init__(**parameters)
Initialize the processor with configuration parameters.
- Args:
**parameters: Processor-specific configuration parameters
- display_summary()
Display before/after summary (for debugging/logging).
- abstractmethod process(score_input: Score.Input) → Score.Input
Transform a Score.Input.
- Args:
score_input: Input to transform (contains text, metadata, results)
- Returns:
Transformed Score.Input with modified text/metadata
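A per-item processor can be sketched in the same way. Everything below is illustrative: `ScoreInput` is a simplified stand-in for `Score.Input` whose field names are assumptions, `Processor` is a local stand-in mirroring the documented signature, and `StripSpeakerLabels` is a hypothetical subclass, not one of the library's processors.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ScoreInput:
    """Simplified stand-in for Score.Input; field names are assumptions."""

    text: str
    metadata: dict = field(default_factory=dict)


class Processor(ABC):
    """Local stand-in mirroring the documented interface (assumed shape)."""

    def __init__(self, **parameters):
        self.parameters = parameters

    @abstractmethod
    def process(self, score_input: ScoreInput) -> ScoreInput:
        """Transform a single input item."""


class StripSpeakerLabels(Processor):
    """Hypothetical processor: drop a leading 'Name:' label from each line."""

    _label = re.compile(r"^[ \t]*\w+:[ \t]*", flags=re.MULTILINE)

    def process(self, score_input: ScoreInput) -> ScoreInput:
        cleaned = self._label.sub("", score_input.text)
        # Return a new item so the original input is left untouched.
        return replace(score_input, text=cleaned)


processor = StripSpeakerLabels()

# Production-style use: one item at a time.
single = processor.process(ScoreInput(text="Agent: Hello\nCustomer: Hi there"))

# Development-style use: the same call applied across a dataset of items.
items = [ScoreInput(text="Agent: Thanks"), ScoreInput(text="no label here")]
batch = [processor.process(item) for item in items]
```

Since the transformation is defined on a single item, applying it to a dataset is just a loop over the same `process` call, so both paths produce identical results.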