plexus.cli.evaluation.evaluations module
- plexus.cli.evaluation.evaluations.assert_dataset_materialized_for_accuracy(dataset: dict) Dict[str, Any]
Fail fast when a dataset-backed accuracy run points to a non-materialized dataset.
- plexus.cli.evaluation.evaluations.build_dataset_materialization_failure_message(*, dataset_id: str, reason: str, dataset_file: str | None, next_step_hint: str) str
- plexus.cli.evaluation.evaluations.check_dict_serializability(d, path='')
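Neither helper carries a docstring, so here is a minimal sketch of what a serializability probe plus a recursive path-reporting checker plausibly look like; the actual return types and path format in the module may differ.

```python
import json

def is_json_serializable(obj) -> bool:
    """Return True if obj survives a round-trip through json.dumps."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False

def check_dict_serializability(d, path=""):
    """Walk a dict and collect dotted paths of values JSON cannot encode."""
    problems = []
    for key, value in d.items():
        current = f"{path}.{key}" if path else str(key)
        if isinstance(value, dict):
            problems.extend(check_dict_serializability(value, current))
        elif not is_json_serializable(value):
            problems.append(current)
    return problems
```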
- plexus.cli.evaluation.evaluations.create_client() PlexusDashboardClient
Create a client and log its configuration
- plexus.cli.evaluation.evaluations.format_confusion_matrix_summary(final_metrics)
Format confusion matrix and detailed metrics for the evaluation summary.
- plexus.cli.evaluation.evaluations.get_amplify_bucket()
Get the S3 bucket name from environment variables or fall back to reading amplify_outputs.json.
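A sketch of the documented env-var-then-file fallback. The environment variable name (`AMPLIFY_STORAGE_BUCKET`) and the `storage.bucket_name` key path inside `amplify_outputs.json` are assumptions for illustration, not confirmed by the source.

```python
import json
import os

def get_amplify_bucket(outputs_path="amplify_outputs.json"):
    # Prefer an explicit environment variable (name is an assumption here).
    bucket = os.environ.get("AMPLIFY_STORAGE_BUCKET")
    if bucket:
        return bucket
    # Fall back to the Amplify outputs file; the exact JSON key path is
    # assumed for this sketch.
    try:
        with open(outputs_path) as f:
            outputs = json.load(f)
        return outputs.get("storage", {}).get("bucket_name")
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```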
- plexus.cli.evaluation.evaluations.get_csv_samples(csv_filename)
- plexus.cli.evaluation.evaluations.get_data_driven_samples(scorecard_instance, scorecard_name, score_name, score_config, fresh, reload, content_ids_to_sample_set, progress_callback=None, number_of_samples=None, random_seed=None)
- plexus.cli.evaluation.evaluations.get_dataset_by_id(client: PlexusDashboardClient, dataset_id: str) dict
Get a specific DataSet by ID
- plexus.cli.evaluation.evaluations.get_latest_accuracy_evaluation_for_score_since(client: PlexusDashboardClient, score_id: str, created_after_iso: str) dict | None
- plexus.cli.evaluation.evaluations.get_latest_associated_dataset_for_score(client: PlexusDashboardClient, score_id: str) dict
- plexus.cli.evaluation.evaluations.get_latest_dataset_for_data_source(client: PlexusDashboardClient, data_source_id: str) dict
Get the most recent DataSet for a DataSource by finding its current version
- plexus.cli.evaluation.evaluations.get_latest_score_version(client, score_id: str) str | None
Get the most recent ScoreVersion ID for a given score using the scoreId index sorted by createdAt.
- Args:
client: GraphQL API client
score_id: The score ID to get the latest version for
- Returns:
The latest ScoreVersion ID, or None if no versions found
- plexus.cli.evaluation.evaluations.is_json_serializable(obj)
- plexus.cli.evaluation.evaluations.list_associated_datasets_for_score(client: PlexusDashboardClient, score_id: str, limit: int = 200) list[dict]
List datasets associated with a score, ordered newest-first by createdAt, then id.
- plexus.cli.evaluation.evaluations.load_configuration_from_yaml_file(configuration_file_path)
Load configuration from a YAML file.
- plexus.cli.evaluation.evaluations.load_samples_from_cloud_dataset(dataset: dict, score_name: str, score_config: dict, number_of_samples: int | None = None, random_seed: int | None = None, progress_callback=None) list
Load samples from a cloud dataset (Parquet file) and convert to evaluation format
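The sampling step of this loader can be sketched deterministically with a seeded RNG. This covers only the down-sampling logic; the real loader also fetches and reads the Parquet file and converts rows to the evaluation format, and `sample_rows` is a hypothetical helper name.

```python
import random

def sample_rows(rows, number_of_samples=None, random_seed=None):
    # No cap requested, or cap exceeds the dataset: return everything.
    if number_of_samples is None or number_of_samples >= len(rows):
        return list(rows)
    # Seeded RNG makes repeated runs reproducible for a given random_seed.
    rng = random.Random(random_seed)
    return rng.sample(rows, number_of_samples)
```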
- plexus.cli.evaluation.evaluations.load_scorecard_from_api(scorecard_identifier: str, score_names=None, use_cache=False, specific_version=None)
Load a scorecard from the Plexus Dashboard API.
- Args:
scorecard_identifier: A string that can identify the scorecard (id, key, name, etc.)
score_names: Optional list of specific score names to load
use_cache: Whether to prefer local cache files over API (default: False). When False, will always fetch from API but still write cache files; when True, will check local cache first and only fetch missing configs.
specific_version: Optional specific score version ID to use instead of champion version
- Returns:
Scorecard: An initialized Scorecard instance with required scores loaded
- Raises:
ValueError: If the scorecard cannot be found
- plexus.cli.evaluation.evaluations.load_scorecard_from_yaml_files(scorecard_identifier: str, score_names=None, specific_version=None)
Load a scorecard from individual YAML configuration files saved by fetch_score_configurations.
- Args:
scorecard_identifier: A string that identifies the scorecard (ID, name, key, or external ID)
score_names: Optional list of specific score names to load
specific_version: Optional specific score version ID (Note: YAML files contain champion versions only)
- Returns:
Scorecard: An initialized Scorecard instance with required scores loaded from YAML files
- Raises:
ValueError: If the scorecard cannot be constructed from YAML files
- plexus.cli.evaluation.evaluations.log_scorecard_configurations(scorecard_instance, context='')
Log the actual configurations being used by the scorecard instance.
- plexus.cli.evaluation.evaluations.lookup_data_source(client: PlexusDashboardClient, name: str | None = None, key: str | None = None, id: str | None = None) dict
Look up a DataSource by name, key, or ID
- plexus.cli.evaluation.evaluations.resolve_cloud_dataset_sample_limit(*, number_of_samples: int | None, number_of_samples_explicit: bool) int | None
Determine dataset-backed sample cap.
For cloud/associated datasets, the default CLI sample size should not silently cap the dataset. A cap is applied only when the operator explicitly sets --number-of-samples.
- plexus.cli.evaluation.evaluations.resolve_primary_score_id_for_accuracy(client: PlexusDashboardClient, scorecard_identifier: str, score_identifier: str, use_yaml: bool, specific_version: str | None) str
- plexus.cli.evaluation.evaluations.resolve_score_external_id_to_uuid(client: PlexusDashboardClient, external_id: str, scorecard_id: str = None) str
Resolve a score external ID to its DynamoDB UUID using GraphQL API.
- Args:
client: PlexusDashboardClient instance
external_id: The external ID to resolve (e.g., "45925")
scorecard_id: Optional scorecard ID to narrow the search
- Returns:
str: DynamoDB UUID for the score, or None if not found
- plexus.cli.evaluation.evaluations.score_text_wrapper(scorecard_instance, text, score_name, scorecard_name=None, executor=None)
Wrapper to handle the scoring of text with proper error handling and logging.
This function is called from within an async context (_run_accuracy), so we expect an event loop to be running. We use ThreadPoolExecutor to run the async score_entire_text method in a separate thread to avoid nested loop issues.
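The nested-loop-avoidance pattern described here can be sketched as follows. The coroutine body and helper name are stand-ins; only the pattern (submit `asyncio.run` to a `ThreadPoolExecutor` so the worker thread gets its own event loop, leaving the caller's running loop untouched) reflects the docstring.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def score_entire_text(text: str) -> dict:
    # Stand-in for the scorecard's async scoring method (hypothetical body).
    await asyncio.sleep(0)
    return {"value": "Yes", "length": len(text)}

def score_text_in_thread(text: str, executor: ThreadPoolExecutor) -> dict:
    # asyncio.run in the worker thread creates a fresh event loop there,
    # so the caller's already-running loop is never re-entered.
    future = executor.submit(asyncio.run, score_entire_text(text))
    return future.result()

async def main():
    # We are inside a running event loop, as _run_accuracy would be.
    with ThreadPoolExecutor(max_workers=1) as executor:
        return score_text_in_thread("hello", executor)
```

Note that `future.result()` blocks the calling thread; in a real async caller you would typically wrap the wait with `loop.run_in_executor` to keep the outer loop responsive.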
- plexus.cli.evaluation.evaluations.truncate_dict_strings(d, max_length=100)
Recursively truncate long string values in a dictionary.
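A minimal sketch of the described recursive truncation; whether the real helper also descends into lists, mutates in place, or marks truncation with a suffix is assumed here, not stated in the source.

```python
def truncate_dict_strings(d, max_length=100):
    """Recursively truncate long string values, leaving other types intact."""
    if isinstance(d, dict):
        return {k: truncate_dict_strings(v, max_length) for k, v in d.items()}
    if isinstance(d, list):
        return [truncate_dict_strings(v, max_length) for v in d]
    if isinstance(d, str) and len(d) > max_length:
        return d[:max_length] + "..."
    return d
```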
- plexus.cli.evaluation.evaluations.validate_dataset_materialization(dataset: dict) Dict[str, Any]
Validate dataset-backed accuracy readiness from canonical DataSet.file.