plexus.data.FeedbackItems module
FeedbackItems data cache for loading datasets from feedback items.
This data cache loads feedback items for a specific scorecard and score, builds a confusion matrix, and samples items from each cell to create a balanced training dataset.
- class plexus.data.FeedbackItems.FeedbackItems(**parameters)
Bases: DataCache
Data cache that loads datasets from feedback items.
This class fetches feedback items for a given scorecard and score, analyzes the confusion matrix, and samples items from each matrix cell to create a balanced dataset for training/evaluation.
Confusion Matrix Sampling:
Items are grouped by (initial_value, final_value) pairs, forming confusion matrix cells:
- (No→No): AI predicted No, human kept No (agreement)
- (No→Yes): AI predicted No, human changed to Yes (false negative)
- (Yes→No): AI predicted Yes, human changed to No (false positive)
- (Yes→Yes): AI predicted Yes, human kept Yes (agreement)
Sampling behavior with limit_per_cell:
- Samples up to limit_per_cell items from EACH cell independently
- This creates balanced training across prediction patterns, not raw frequency
- Example: With limit_per_cell=50 and raw data [2,592 No→Yes, 315 No→No, 5 Yes→Yes]:
  Result: 50 No→Yes + 50 No→No + 5 Yes→Yes = 105 items (52% Yes, 48% No final values), instead of the raw 89% Yes, 11% No distribution
This balancing helps models learn from both agreements and corrections across different prediction types, rather than being dominated by the most common pattern.
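The per-cell sampling described above can be illustrated with a minimal standalone sketch. It is not the library's implementation: `sample_per_cell` and `make_items` are hypothetical helpers, items are assumed to be plain dicts with `initial_value` and `final_value` keys, and the fixed seed exists only to make the sketch repeatable.

```python
import random
from collections import defaultdict


def sample_per_cell(items, limit_per_cell, seed=0):
    """Sample up to limit_per_cell items from each (initial, final) cell."""
    rng = random.Random(seed)

    # Group items into confusion matrix cells keyed by (initial, final).
    cells = defaultdict(list)
    for item in items:
        cells[(item["initial_value"], item["final_value"])].append(item)

    # Sample each cell independently; small cells are kept whole.
    sampled = []
    for cell_items in cells.values():
        k = min(limit_per_cell, len(cell_items))
        sampled.extend(rng.sample(cell_items, k))
    return sampled


def make_items(initial, final, count):
    """Build placeholder feedback items for one confusion matrix cell."""
    return [{"initial_value": initial, "final_value": final} for _ in range(count)]


# Raw counts taken from the example above.
raw_items = (
    make_items("No", "Yes", 2592)
    + make_items("No", "No", 315)
    + make_items("Yes", "Yes", 5)
)

balanced = sample_per_cell(raw_items, limit_per_cell=50)
yes_count = sum(item["final_value"] == "Yes" for item in balanced)
print(len(balanced), yes_count)  # 105 55
```

With these counts, 55 of the 105 sampled items have a final value of Yes, which matches the 52% figure quoted in the example.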
Initialize the DataCache instance with the given parameters.
Parameters
- **parameters : dict
Arbitrary keyword arguments that are used to initialize the Parameters instance.
Raises
- ValidationError
If the provided parameters do not pass validation.
- class Parameters(*, class_name: str = 'DataCache', scorecard: str | int, score: str | int, days: int | None = None, limit: int | None = None, limit_per_cell: int | None = None, initial_value: str | None = None, final_value: str | None = None, feedback_id: str | None = None, backfill_cells: bool | None = False, identifier_extractor: str | None = None, column_mappings: Dict[str, str] | None = None, cache_file: str = 'feedback_items_cache.parquet', local_cache_directory: str = './.plexus_training_data_cache/')
Bases: Parameters
Parameters for FeedbackItems data cache.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- backfill_cells: bool | None
- cache_file: str
- column_mappings: Dict[str, str] | None
- days: int | None
- classmethod days_must_be_positive(v)
- feedback_id: str | None
- final_value: str | None
- identifier_extractor: str | None
- initial_value: str | None
- limit: int | None
- classmethod limit_must_be_positive(v)
- limit_per_cell: int | None
- classmethod limit_per_cell_must_be_positive(v)
- local_cache_directory: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- score: str | int
- scorecard: str | int
- __init__(**parameters)
- load_dataframe(*, data=None, fresh=False, reload=False) → DataFrame
Load a dataframe of feedback items sampled from confusion matrix cells.
- Args:
data: Not used - parameters come from class initialization
fresh: If True, bypass cache and fetch fresh data (generates new parquet)
reload: If True, reload existing cache with current values, preserving form IDs
- Returns:
DataFrame with sampled feedback items