plexus.scores.nodes.test_classifier_confidence_harmonized module

Tests for the harmonized confidence calculation, which proceeds in three steps: it uses string-based parsing to find the classification in the response text, locates the token position that corresponds to that match, and calculates confidence from that token’s logprobs.

Tests both parse_from_start=True and parse_from_start=False scenarios.
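The three steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the code under test: the function name harmonized_confidence, its return shape, the case-insensitive matching, and the choice to sum probabilities over matching alternatives are all assumptions made for the example.

```python
import math


def harmonized_confidence(tokens, valid_classes, parse_from_start=True):
    """Hypothetical sketch. `tokens` is a list of
    (actual_token, [(alt_token, logprob), ...]) tuples."""
    text = "".join(token for token, _ in tokens)
    lowered = text.lower()

    # Step 1: string-based parsing. Record where each class occurs,
    # using the first occurrence or the last depending on the flag.
    hits = []
    for cls in valid_classes:
        needle = cls.lower()
        idx = lowered.find(needle) if parse_from_start else lowered.rfind(needle)
        if idx != -1:
            hits.append((idx, cls))
    if not hits:
        return None, None
    idx, cls = min(hits) if parse_from_start else max(hits)

    # Step 2: find the token whose character span covers the match.
    offset = 0
    for token, alternatives in tokens:
        if offset <= idx < offset + len(token):
            # Step 3: convert logprobs to probabilities and sum the
            # alternatives that normalize to the parsed class (assumed rule).
            prob = sum(
                math.exp(lp)
                for alt, lp in alternatives
                if alt.strip().lower() == cls.lower()
            )
            return cls, prob
        offset += len(token)
    return cls, None


# Example response "Yes, then No" split into four tokens.
tokens = [
    ("Yes", [("Yes", math.log(0.9)), ("No", math.log(0.1))]),
    (",", [(",", 0.0)]),
    (" then", [(" then", 0.0)]),
    (" No", [(" No", math.log(0.6)), (" Yes", math.log(0.4))]),
]
first = harmonized_confidence(tokens, ["Yes", "No"], parse_from_start=True)
last = harmonized_confidence(tokens, ["Yes", "No"], parse_from_start=False)
```

With parse_from_start=True the sketch reads confidence from the first token ("Yes"), and with parse_from_start=False it reads it from the last token (" No"), so the same response can yield different classifications and confidences.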

plexus.scores.nodes.test_classifier_confidence_harmonized.create_mock_logprobs_response(tokens_and_alternatives)

Create a mock logprobs response structure.

Args: tokens_and_alternatives: List of tuples (actual_token, [(alt_token, logprob), …])
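The return structure of this helper is not documented here. As a rough illustration of how such an argument could be turned into a mock, the sketch below assumes an OpenAI-style layout (a "content" list whose entries carry "token", "logprob", and "top_logprobs"). The name build_mock_logprobs and the 0.0 fallback logprob are hypothetical and not taken from the module.

```python
def build_mock_logprobs(tokens_and_alternatives):
    """Hypothetical stand-in for a mock logprobs builder."""
    content = []
    for actual_token, alternatives in tokens_and_alternatives:
        alt_map = dict(alternatives)
        content.append({
            "token": actual_token,
            # Assumption: fall back to 0.0 when the actual token is not
            # listed among its own alternatives.
            "logprob": alt_map.get(actual_token, 0.0),
            "top_logprobs": [
                {"token": alt_token, "logprob": logprob}
                for alt_token, logprob in alternatives
            ],
        })
    return {"content": content}


mock = build_mock_logprobs([
    ("Yes", [("Yes", -0.1), ("No", -2.3)]),
])
```

Each input tuple becomes one entry in "content", so a test can describe a whole response as a short list of tokens with their alternatives.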

async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_multiple_matches_same_class(): Test confidence calculation when multiple tokens map to the same classification.

async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_no_matching_tokens(): Test confidence calculation when no token alternatives match the classification.

async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_parse_from_start_false(): Test confidence calculation with parse_from_start=False (last occurrence).

async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_parse_from_start_true(): Test confidence calculation with parse_from_start=True (first occurrence).

async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_single_token_response(): Test confidence calculation for single-token responses.
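The cases covered by test_confidence_multiple_matches_same_class and test_confidence_no_matching_tokens can be pictured with a small aggregation sketch. It assumes that alternatives are normalized by stripping whitespace and lowercasing, that matching probabilities are summed, and that no match yields 0.0. None of these rules is confirmed by the documentation above, and class_probability is a hypothetical name.

```python
import math


def class_probability(alternatives, target_class):
    """Hypothetical sketch: total probability mass of the alternatives
    that normalize to `target_class`."""
    def normalize(value):
        return value.strip().lower()

    target = normalize(target_class)
    return sum(
        math.exp(logprob)
        for alt_token, logprob in alternatives
        if normalize(alt_token) == target
    )


alternatives = [
    ("Yes", math.log(0.6)),
    ("yes", math.log(0.2)),
    (" YES", math.log(0.1)),
    ("No", math.log(0.1)),
]
yes_confidence = class_probability(alternatives, "Yes")      # three variants match
maybe_confidence = class_probability(alternatives, "Maybe")  # nothing matches
```

Summing over surface variants keeps the confidence from being understated when the model splits probability between spellings such as "Yes" and "yes".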