plexus.cli.evaluation.evaluations module

plexus.cli.evaluation.evaluations.await_run_single_feedback_evaluation(client, account_id: str, scorecard_id: str, scorecard_name: str, score_id: str, score_name: str, days: int, start_date: datetime, end_date: datetime, task_id: str | None = None)

Helper function to run a single feedback evaluation.

This is used by both the 'feedback' and 'feedback-all' commands.

Args:

client: API client
account_id: Account ID
scorecard_id: Scorecard ID
scorecard_name: Scorecard name (for display)
score_id: Score ID
score_name: Score name (for display)
days: Number of days to look back
start_date: Start date for filtering (UTC aware)
end_date: End date for filtering (UTC aware)
task_id: Optional task ID for progress tracking

Returns:

Evaluation record if successful, None otherwise
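
A minimal usage sketch based on the signature above. All identifiers are hypothetical placeholders, and the call is awaited on the assumption that the helper is a coroutine:

    import asyncio
    from datetime import datetime, timedelta, timezone

    from plexus.cli.evaluation.evaluations import (
        await_run_single_feedback_evaluation,
        create_client,
    )

    async def main():
        client = create_client()
        end_date = datetime.now(timezone.utc)       # end of the filtering window (UTC aware)
        start_date = end_date - timedelta(days=14)  # look back 14 days

        evaluation = await await_run_single_feedback_evaluation(
            client=client,
            account_id="acct-123",                  # hypothetical account ID
            scorecard_id="sc-456",                  # hypothetical scorecard ID
            scorecard_name="Quality Scorecard",
            score_id="score-789",                   # hypothetical score ID
            score_name="Greeting",
            days=14,
            start_date=start_date,
            end_date=end_date,
        )
        if evaluation is None:
            print("Feedback evaluation did not complete")

    asyncio.run(main())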

plexus.cli.evaluation.evaluations.check_dict_serializability(d, path='')
plexus.cli.evaluation.evaluations.create_client() → PlexusDashboardClient

Create a client and log its configuration

plexus.cli.evaluation.evaluations.format_confusion_matrix_summary(final_metrics)

Format confusion matrix and detailed metrics for the evaluation summary.

plexus.cli.evaluation.evaluations.get_amplify_bucket()

Get the S3 bucket name from environment variables or fall back to reading amplify_outputs.json.
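
A sketch of the described lookup pattern, not the module's actual logic; the environment variable name and the key path inside amplify_outputs.json are assumptions for illustration:

    import json
    import os

    def amplify_bucket_sketch(outputs_path: str = "amplify_outputs.json") -> str | None:
        # Prefer an environment variable (the name here is an assumption).
        bucket = os.environ.get("AMPLIFY_STORAGE_BUCKET_NAME")
        if bucket:
            return bucket
        # Fall back to reading amplify_outputs.json; the key path is a guess
        # at a typical Amplify outputs layout.
        try:
            with open(outputs_path) as f:
                outputs = json.load(f)
            return outputs.get("storage", {}).get("bucket_name")
        except (FileNotFoundError, json.JSONDecodeError):
            return None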

plexus.cli.evaluation.evaluations.get_csv_samples(csv_filename)
plexus.cli.evaluation.evaluations.get_data_driven_samples(scorecard_instance, scorecard_name, score_name, score_config, fresh, reload, content_ids_to_sample_set, progress_callback=None, number_of_samples=None, random_seed=None)
plexus.cli.evaluation.evaluations.get_dataset_by_id(client: PlexusDashboardClient, dataset_id: str) → dict

Get a specific DataSet by ID

plexus.cli.evaluation.evaluations.get_latest_dataset_for_data_source(client: PlexusDashboardClient, data_source_id: str) → dict

Get the most recent DataSet for a DataSource by finding its current version

plexus.cli.evaluation.evaluations.get_latest_score_version(client, score_id: str) → str | None

Get the most recent ScoreVersion ID for a given score using the scoreId index sorted by createdAt.

Args:

client: GraphQL API client
score_id: The score ID to get the latest version for

Returns:

The latest ScoreVersion ID, or None if no versions found
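
A minimal usage sketch; the score ID is a hypothetical placeholder:

    from plexus.cli.evaluation.evaluations import create_client, get_latest_score_version

    client = create_client()
    version_id = get_latest_score_version(client, score_id="score-789")  # hypothetical ID
    if version_id is None:
        print("No versions found for this score")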

plexus.cli.evaluation.evaluations.is_json_serializable(obj)
plexus.cli.evaluation.evaluations.load_configuration_from_yaml_file(configuration_file_path)

Load configuration from a YAML file.

plexus.cli.evaluation.evaluations.load_samples_from_cloud_dataset(dataset: dict, score_name: str, score_config: dict, number_of_samples: int | None = None, random_seed: int | None = None, progress_callback=None) → list

Load samples from a cloud dataset (Parquet file) and convert to evaluation format
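
A sketch chaining this function with the lookup helpers documented on this page. The DataSource key and score name are hypothetical, and it assumes the returned DataSource dict carries an "id" field:

    from plexus.cli.evaluation.evaluations import (
        create_client,
        get_latest_dataset_for_data_source,
        load_samples_from_cloud_dataset,
        lookup_data_source,
    )

    client = create_client()
    data_source = lookup_data_source(client, key="call-transcripts")  # hypothetical key
    dataset = get_latest_dataset_for_data_source(client, data_source["id"])

    samples = load_samples_from_cloud_dataset(
        dataset,
        score_name="Greeting",
        score_config={},        # hypothetical minimal score configuration
        number_of_samples=100,  # cap the sample size
        random_seed=42,         # fixed seed for a reproducible sample
    )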

plexus.cli.evaluation.evaluations.load_scorecard_from_api(scorecard_identifier: str, score_names=None, use_cache=False, specific_version=None)

Load a scorecard from the Plexus Dashboard API.

Args:

scorecard_identifier: A string that can identify the scorecard (id, key, name, etc.)
score_names: Optional list of specific score names to load
use_cache: Whether to prefer local cache files over the API (default: False). When False, always fetches from the API but still writes cache files; when True, checks the local cache first and only fetches missing configs.
specific_version: Optional specific score version ID to use instead of the champion version

Returns:

Scorecard: An initialized Scorecard instance with required scores loaded

Raises:

ValueError: If the scorecard cannot be found
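
A minimal usage sketch; the identifier and score names are hypothetical placeholders:

    from plexus.cli.evaluation.evaluations import load_scorecard_from_api

    try:
        scorecard = load_scorecard_from_api(
            "quality-scorecard",                     # id, key, or name
            score_names=["Greeting", "Compliance"],
            use_cache=False,                         # always fetch from the API
        )
    except ValueError:
        print("Scorecard not found")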

plexus.cli.evaluation.evaluations.load_scorecard_from_yaml_files(scorecard_identifier: str, score_names=None, specific_version=None)

Load a scorecard from individual YAML configuration files saved by fetch_score_configurations.

Args:

scorecard_identifier: A string that identifies the scorecard (ID, name, key, or external ID)
score_names: Optional list of specific score names to load
specific_version: Optional specific score version ID (Note: YAML files contain champion versions only)

Returns:

Scorecard: An initialized Scorecard instance with required scores loaded from YAML files

Raises:

ValueError: If the scorecard cannot be constructed from YAML files
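
A minimal usage sketch, assuming fetch_score_configurations has already written the YAML files locally; the identifier and score name are hypothetical:

    from plexus.cli.evaluation.evaluations import load_scorecard_from_yaml_files

    try:
        scorecard = load_scorecard_from_yaml_files(
            "quality-scorecard",
            score_names=["Greeting"],
        )
    except ValueError:
        print("Could not construct scorecard from local YAML files")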

plexus.cli.evaluation.evaluations.log_scorecard_configurations(scorecard_instance, context='')

Log the actual configurations being used by the scorecard instance.

plexus.cli.evaluation.evaluations.lookup_data_source(client: PlexusDashboardClient, name: str | None = None, key: str | None = None, id: str | None = None) → dict

Look up a DataSource by name, key, or ID

plexus.cli.evaluation.evaluations.resolve_score_external_id_to_uuid(client: PlexusDashboardClient, external_id: str, scorecard_id: str = None) → str

Resolve a score external ID to its DynamoDB UUID using GraphQL API.

Args:

client: PlexusDashboardClient instance
external_id: The external ID to resolve (e.g., "45925")
scorecard_id: Optional scorecard ID to narrow the search

Returns:

str: DynamoDB UUID for the score, or None if not found
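
A minimal usage sketch; the scorecard ID is a hypothetical placeholder, and the external ID reuses the docstring's example:

    from plexus.cli.evaluation.evaluations import (
        create_client,
        resolve_score_external_id_to_uuid,
    )

    client = create_client()
    score_uuid = resolve_score_external_id_to_uuid(
        client,
        external_id="45925",    # example external ID from the docstring above
        scorecard_id="sc-456",  # hypothetical; narrows the search
    )
    if score_uuid is None:
        print("No score matches that external ID")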

plexus.cli.evaluation.evaluations.score_text_wrapper(scorecard_instance, text, score_name, scorecard_name=None, executor=None)

Wrapper to handle the scoring of text with proper error handling and logging.

This function is called from within an async context (_run_accuracy), so we expect an event loop to be running. We use ThreadPoolExecutor to run the async score_entire_text method in a separate thread to avoid nested loop issues.
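
A minimal sketch of the pattern the docstring describes, not the module's actual implementation: run a coroutine to completion on a fresh event loop inside a worker thread, so it can be invoked from code that is already inside a running loop:

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    async def score_entire_text_stub(text: str) -> str:
        # Stand-in for the real async scoring call.
        await asyncio.sleep(0.01)
        return f"scored {len(text)} chars"

    def run_coroutine_in_thread(coro):
        # asyncio.run() creates a fresh event loop in the worker thread, so
        # the coroutine never touches the caller's already-running loop.
        # Note: future.result() blocks the calling thread until completion.
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(asyncio.run, coro)
            return future.result()

    async def caller():
        # We are already inside a running loop here (as in _run_accuracy),
        # so calling asyncio.run() directly in this thread would raise.
        result = run_coroutine_in_thread(score_entire_text_stub("hello world"))
        print(result)

    asyncio.run(caller())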

plexus.cli.evaluation.evaluations.truncate_dict_strings(d, max_length=100)

Recursively truncate long string values in a dictionary.
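
A sketch of the recursive truncation the docstring describes; the "..." marker and the handling of nested lists are assumptions:

    def truncate_dict_strings_sketch(d, max_length=100):
        # Walk dicts and lists recursively, shortening any over-long string.
        if isinstance(d, dict):
            return {k: truncate_dict_strings_sketch(v, max_length) for k, v in d.items()}
        if isinstance(d, list):
            return [truncate_dict_strings_sketch(v, max_length) for v in d]
        if isinstance(d, str) and len(d) > max_length:
            return d[:max_length] + "..."  # truncation marker is an assumption
        return d

    sample = {"transcript": "x" * 500, "meta": {"note": "short"}}
    print(truncate_dict_strings_sketch(sample, max_length=10))
    # {'transcript': 'xxxxxxxxxx...', 'meta': {'note': 'short'}}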