plexus.cli.evaluation.evaluations module
- plexus.cli.evaluation.evaluations.await_run_single_feedback_evaluation(client, account_id: str, scorecard_id: str, scorecard_name: str, score_id: str, score_name: str, days: int, start_date: datetime, end_date: datetime, task_id: str | None = None)
Helper function to run a single feedback evaluation.
This is used by both the ‘feedback’ and ‘feedback-all’ commands.
- Args:
client: API client
account_id: Account ID
scorecard_id: Scorecard ID
scorecard_name: Scorecard name (for display)
score_id: Score ID
score_name: Score name (for display)
days: Number of days to look back
start_date: Start date for filtering (UTC aware)
end_date: End date for filtering (UTC aware)
task_id: Optional task ID for progress tracking
- Returns:
Evaluation record if successful, None otherwise
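A minimal usage sketch, assuming a client from create_client() and hypothetical account/scorecard/score IDs; the rendered signature shows no async prefix, so the call is written synchronously:

```python
from datetime import datetime, timedelta, timezone

from plexus.cli.evaluation.evaluations import (
    await_run_single_feedback_evaluation,
    create_client,
)

client = create_client()
end_date = datetime.now(timezone.utc)          # filter dates must be UTC-aware
start_date = end_date - timedelta(days=14)

evaluation = await_run_single_feedback_evaluation(
    client,
    account_id="acct-123",                     # hypothetical IDs for illustration
    scorecard_id="sc-456",
    scorecard_name="QA Scorecard",
    score_id="score-789",
    score_name="Greeting",
    days=14,
    start_date=start_date,
    end_date=end_date,
)
if evaluation is None:
    print("Feedback evaluation did not produce a record.")
```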
- plexus.cli.evaluation.evaluations.check_dict_serializability(d, path='')
- plexus.cli.evaluation.evaluations.create_client() → PlexusDashboardClient
Create a client and log its configuration
- plexus.cli.evaluation.evaluations.format_confusion_matrix_summary(final_metrics)
Format confusion matrix and detailed metrics for the evaluation summary.
- plexus.cli.evaluation.evaluations.get_amplify_bucket()
Get the S3 bucket name from environment variables or fall back to reading amplify_outputs.json.
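A rough sketch of the fallback described above; the environment variable name and JSON layout are assumptions, not the module's actual keys:

```python
import json
import os

def get_amplify_bucket_sketch() -> str | None:
    # Assumed env var name; the real module may read a different one
    bucket = os.environ.get("AMPLIFY_STORAGE_BUCKET")
    if bucket:
        return bucket
    try:
        with open("amplify_outputs.json") as f:
            outputs = json.load(f)
        # Assumed JSON layout for the bucket name
        return outputs.get("storage", {}).get("bucket_name")
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```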
- plexus.cli.evaluation.evaluations.get_csv_samples(csv_filename)
- plexus.cli.evaluation.evaluations.get_data_driven_samples(scorecard_instance, scorecard_name, score_name, score_config, fresh, reload, content_ids_to_sample_set, progress_callback=None, number_of_samples=None, random_seed=None)
- plexus.cli.evaluation.evaluations.get_dataset_by_id(client: PlexusDashboardClient, dataset_id: str) → dict
Get a specific DataSet by ID
- plexus.cli.evaluation.evaluations.get_latest_dataset_for_data_source(client: PlexusDashboardClient, data_source_id: str) → dict
Get the most recent DataSet for a DataSource by finding its current version
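A sketch chaining lookup_data_source (documented below) with this helper, assuming both return dicts that carry an id field; the data source key is hypothetical:

```python
from plexus.cli.evaluation.evaluations import (
    create_client,
    get_latest_dataset_for_data_source,
    lookup_data_source,
)

client = create_client()
source = lookup_data_source(client, key="call-transcripts")  # hypothetical key
dataset = get_latest_dataset_for_data_source(client, source["id"])
```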
- plexus.cli.evaluation.evaluations.get_latest_score_version(client, score_id: str) → str | None
Get the most recent ScoreVersion ID for a given score using the scoreId index sorted by createdAt.
- Args:
client: GraphQL API client
score_id: The score ID to get the latest version for
- Returns:
The latest ScoreVersion ID, or None if no versions found
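For example, the returned version ID can be fed to the specific_version parameter of load_scorecard_from_api (documented below); the score ID here is hypothetical:

```python
version_id = get_latest_score_version(client, score_id="score-789")
if version_id is None:
    print("No ScoreVersions recorded for this score.")
```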
- plexus.cli.evaluation.evaluations.is_json_serializable(obj)
- plexus.cli.evaluation.evaluations.load_configuration_from_yaml_file(configuration_file_path)
Load configuration from a YAML file.
- plexus.cli.evaluation.evaluations.load_samples_from_cloud_dataset(dataset: dict, score_name: str, score_config: dict, number_of_samples: int | None = None, random_seed: int | None = None, progress_callback=None) → list
Load samples from a cloud dataset (Parquet file) and convert to evaluation format
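A sketch combining get_dataset_by_id with this loader; the dataset ID is hypothetical and score_config is assumed to be the score's parsed configuration dict:

```python
dataset = get_dataset_by_id(client, dataset_id="ds-abc123")  # hypothetical ID
samples = load_samples_from_cloud_dataset(
    dataset,
    score_name="Greeting",
    score_config={},          # assumption: the score's parsed YAML config
    number_of_samples=100,    # optional down-sampling
    random_seed=42,           # reproducible sampling
)
```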
- plexus.cli.evaluation.evaluations.load_scorecard_from_api(scorecard_identifier: str, score_names=None, use_cache=False, specific_version=None)
Load a scorecard from the Plexus Dashboard API.
- Args:
scorecard_identifier: A string that can identify the scorecard (id, key, name, etc.)
score_names: Optional list of specific score names to load
use_cache: Whether to prefer local cache files over the API (default: False). When False, always fetch from the API but still write cache files; when True, check the local cache first and fetch only the missing configs.
specific_version: Optional specific score version ID to use instead of the champion version
- Returns:
Scorecard: An initialized Scorecard instance with required scores loaded
- Raises:
ValueError: If the scorecard cannot be found
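A usage sketch with a hypothetical scorecard key, showing the documented ValueError path:

```python
try:
    scorecard = load_scorecard_from_api(
        "qa-scorecard",             # hypothetical key; an id or name also resolves
        score_names=["Greeting"],   # omit to load every score
        use_cache=False,            # fetch from the API, still writing cache files
    )
except ValueError:
    print("Scorecard not found via the dashboard API.")
```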
- plexus.cli.evaluation.evaluations.load_scorecard_from_yaml_files(scorecard_identifier: str, score_names=None, specific_version=None)
Load a scorecard from individual YAML configuration files saved by fetch_score_configurations.
- Args:
scorecard_identifier: A string that identifies the scorecard (ID, name, key, or external ID)
score_names: Optional list of specific score names to load
specific_version: Optional specific score version ID (note: YAML files contain champion versions only)
- Returns:
Scorecard: An initialized Scorecard instance with required scores loaded from YAML files
- Raises:
ValueError: If the scorecard cannot be constructed from YAML files
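Since this loader raises ValueError when the YAML files are missing, one plausible pattern (an assumption, not a documented recommendation) is falling back to the API loader:

```python
try:
    scorecard = load_scorecard_from_yaml_files("qa-scorecard", score_names=["Greeting"])
except ValueError:
    # Local YAML configs absent or incomplete; fetch from the API instead
    scorecard = load_scorecard_from_api("qa-scorecard", score_names=["Greeting"])
```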
- plexus.cli.evaluation.evaluations.log_scorecard_configurations(scorecard_instance, context='')
Log the actual configurations being used by the scorecard instance.
- plexus.cli.evaluation.evaluations.lookup_data_source(client: PlexusDashboardClient, name: str | None = None, key: str | None = None, id: str | None = None) → dict
Look up a DataSource by name, key, or ID
- plexus.cli.evaluation.evaluations.resolve_score_external_id_to_uuid(client: PlexusDashboardClient, external_id: str, scorecard_id: str = None) → str
Resolve a score external ID to its DynamoDB UUID using the GraphQL API.
- Args:
client: PlexusDashboardClient instance
external_id: The external ID to resolve (e.g., "45925")
scorecard_id: Optional scorecard ID to narrow the search
- Returns:
str: DynamoDB UUID for the score, or None if not found
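A usage sketch reusing the docstring's own example external ID; the scorecard ID is hypothetical:

```python
score_uuid = resolve_score_external_id_to_uuid(
    client,
    external_id="45925",
    scorecard_id="sc-456",   # optional: narrows the search
)
if score_uuid is None:
    print("No score matches that external ID.")
```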
- plexus.cli.evaluation.evaluations.score_text_wrapper(scorecard_instance, text, score_name, scorecard_name=None, executor=None)
Wrapper to handle the scoring of text with proper error handling and logging.
This function is called from within an async context (_run_accuracy), so we expect an event loop to be running. We use ThreadPoolExecutor to run the async score_entire_text method in a separate thread to avoid nested loop issues.
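The nested-loop workaround the docstring describes follows a common pattern; a generic sketch of that pattern (not the module's exact code):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def run_coro_from_running_loop(coro):
    """Run a coroutine to completion on a worker thread with its own event
    loop, so it can be invoked from code that is already inside a loop."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(asyncio.run, coro)
        return future.result()
```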
- plexus.cli.evaluation.evaluations.truncate_dict_strings(d, max_length=100)
Recursively truncate long string values in a dictionary.
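A plausible sketch of the recursion; details such as the truncation marker and list handling are assumptions:

```python
def truncate_dict_strings_sketch(d, max_length=100):
    if isinstance(d, dict):
        return {k: truncate_dict_strings_sketch(v, max_length) for k, v in d.items()}
    if isinstance(d, list):
        return [truncate_dict_strings_sketch(v, max_length) for v in d]
    if isinstance(d, str) and len(d) > max_length:
        return d[:max_length] + "..."   # assumed truncation marker
    return d
```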