plexus.data.FeedbackItems module

FeedbackItems data cache for loading datasets from feedback items.

This data cache loads feedback items for a specific scorecard and score, builds a confusion matrix, and samples items from each cell to create a balanced training dataset.

class plexus.data.FeedbackItems.FeedbackItems(**parameters)

Bases: DataCache

Data cache that loads datasets from feedback items.

This class fetches feedback items for a given scorecard and score, analyzes the confusion matrix, and samples items from each matrix cell to create a balanced dataset for training/evaluation.

Confusion Matrix Sampling:

Items are grouped by (initial_value, final_value) pairs, forming confusion matrix cells:

- (No→No): AI predicted No, human kept No (agreement)
- (No→Yes): AI predicted No, human changed to Yes (false negative)
- (Yes→No): AI predicted Yes, human changed to No (false positive)
- (Yes→Yes): AI predicted Yes, human kept Yes (agreement)

Sampling behavior with limit_per_cell:

- Samples up to limit_per_cell items from EACH cell independently
- This balances the training data across prediction patterns instead of mirroring raw frequency
- Example: With limit_per_cell=50 and raw data [2,592 No→Yes, 315 No→No, 5 Yes→Yes]:

  Result: 50 No→Yes + 50 No→No + 5 Yes→Yes = 105 items (52% Yes, 48% No final values), instead of the raw 89% Yes, 11% No distribution

This balancing helps models learn from both agreements and corrections across different prediction types, rather than being dominated by the most common pattern.
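The per-cell sampling described above can be illustrated with a minimal standard-library sketch. It is not the class's actual implementation: the helper names `sample_per_cell` and `make_items`, the dictionary layout of each item, and the fixed seed are all assumptions made for illustration. It reproduces the example counts from the docstring.

```python
import random
from collections import defaultdict


def sample_per_cell(items, limit_per_cell, seed=0):
    """Hypothetical helper: keep at most limit_per_cell items per (initial, final) cell."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    cells = defaultdict(list)
    for item in items:
        # Each confusion matrix cell is keyed by the (initial_value, final_value) pair.
        cells[(item["initial_value"], item["final_value"])].append(item)

    sampled = []
    for cell_items in cells.values():
        if len(cell_items) > limit_per_cell:
            cell_items = rng.sample(cell_items, limit_per_cell)
        sampled.extend(cell_items)  # small cells are kept whole
    return sampled


def make_items(initial, final, count):
    """Hypothetical helper: build placeholder feedback items for one cell."""
    return [{"initial_value": initial, "final_value": final} for _ in range(count)]


# Raw counts taken from the example above.
raw = (
    make_items("No", "Yes", 2592)
    + make_items("No", "No", 315)
    + make_items("Yes", "Yes", 5)
)

balanced = sample_per_cell(raw, limit_per_cell=50)
yes_count = sum(item["final_value"] == "Yes" for item in balanced)

print(len(balanced))                          # 105
print(round(100 * yes_count / len(balanced)))  # 52
```

The Yes→Yes cell holds only 5 items, so it contributes all of them; the other two cells are capped at 50 each.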

Initialize the DataCache instance with the given parameters.

Parameters

**parameters : dict

Arbitrary keyword arguments that are used to initialize the Parameters instance.

Raises

ValidationError

If the provided parameters do not pass validation.

class Parameters(*, class_name: str = 'DataCache', scorecard: str | int, score: str | int, days: int | None = None, limit: int | None = None, limit_per_cell: int | None = None, initial_value: str | None = None, final_value: str | None = None, feedback_id: str | None = None, backfill_cells: bool | None = False, identifier_extractor: str | None = None, column_mappings: Dict[str, str] | None = None, cache_file: str = 'feedback_items_cache.parquet', local_cache_directory: str = './.plexus_training_data_cache/')

Bases: Parameters

Parameters for FeedbackItems data cache.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

backfill_cells: bool | None
cache_file: str
column_mappings: Dict[str, str] | None
days: int | None
classmethod days_must_be_positive(v)
feedback_id: str | None
final_value: str | None
identifier_extractor: str | None
initial_value: str | None
limit: int | None
classmethod limit_must_be_positive(v)
limit_per_cell: int | None
classmethod limit_per_cell_must_be_positive(v)
local_cache_directory: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

score: str | int
scorecard: str | int
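The validator names `days_must_be_positive`, `limit_must_be_positive`, and `limit_per_cell_must_be_positive` suggest that these optional integer fields are rejected when they are not positive. The sketch below mimics that behavior with a plain dataclass, under stated assumptions: the real class is a pydantic model that raises ValidationError, the class name `ParametersSketch` is hypothetical, only a subset of fields is shown, and treating zero as invalid is a guess based on the validator names.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ParametersSketch:
    """Hypothetical stand-in for a subset of FeedbackItems.Parameters."""

    scorecard: Union[str, int]
    score: Union[str, int]
    days: Optional[int] = None
    limit: Optional[int] = None
    limit_per_cell: Optional[int] = None

    def __post_init__(self):
        # Assumed rule: None means "not set"; any provided value must be > 0.
        for name in ("days", "limit", "limit_per_cell"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


# Placeholder scorecard and score names, not real identifiers.
ok = ParametersSketch(
    scorecard="Example Scorecard",
    score="Example Score",
    days=30,
    limit_per_cell=50,
)

try:
    ParametersSketch(scorecard="Example Scorecard", score="Example Score", days=0)
    rejected = False
except ValueError:
    rejected = True
```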
__init__(**parameters)

Initialize the DataCache instance with the given parameters.

Parameters

**parameters : dict

Arbitrary keyword arguments that are used to initialize the Parameters instance.

Raises

ValidationError

If the provided parameters do not pass validation.

load_dataframe(*, data=None, fresh=False, reload=False) → DataFrame

Load a dataframe of feedback items sampled from confusion matrix cells.

Args:

data: Not used; parameters come from class initialization.
fresh: If True, bypass the cache and fetch fresh data (generates a new parquet file).
reload: If True, reload the existing cache with current values, preserving form IDs.

Returns:

DataFrame with sampled feedback items
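A typical call sequence, based only on the signatures documented above, might look like the following sketch. The scorecard and score names are placeholders, the wrapper function `load_feedback_dataframe` is hypothetical, and the import is deferred so that the parameter dictionary can be inspected without the plexus package installed.

```python
# Placeholder values; scorecard and score are the only required parameters.
parameters = {
    "scorecard": "Example Scorecard",
    "score": "Example Score",
    "days": 30,            # optional; the exact time-window semantics are not documented here
    "limit_per_cell": 50,  # cap on items sampled from each confusion matrix cell
}


def load_feedback_dataframe(parameters, fresh=False):
    """Hypothetical wrapper: build the data cache and return its DataFrame."""
    # Deferred import: requires the plexus package and a configured environment.
    from plexus.data.FeedbackItems import FeedbackItems

    cache = FeedbackItems(**parameters)       # raises ValidationError on bad parameters
    return cache.load_dataframe(fresh=fresh)  # all arguments are keyword-only
```

Calling `load_feedback_dataframe(parameters, fresh=True)` would bypass any existing cache file, as described for the `fresh` argument.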