plexus.scores.nodes.test_classifier_confidence_harmonized module
Tests for the harmonized confidence calculation, which proceeds in three steps: string-based parsing finds the classification in the response text, the token position corresponding to that match is located, and confidence is calculated from that token’s logprobs.
Tests both parse_from_start=True and parse_from_start=False scenarios.
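The three steps described above can be illustrated with a minimal sketch. This is not the Plexus implementation: the function name harmonized_confidence, its arguments, the substring matching, and the rule of summing the probabilities of every alternative that normalizes to the chosen class are all assumptions made for illustration.

```python
import math


def harmonized_confidence(tokens, top_logprobs, valid_classes, parse_from_start=True):
    """Hypothetical sketch: return (classification, confidence) or (None, None).

    tokens:        generated tokens, in order (assumed to concatenate to the text)
    top_logprobs:  one list of (alt_token, logprob) pairs per token position
    valid_classes: class labels to search for in the text
    """
    lowered = "".join(tokens).lower()

    # Step 1: string-based parsing. Pick the first (or last) class occurrence.
    best = None  # (character index, class label)
    for cls in valid_classes:
        needle = cls.lower()
        idx = lowered.find(needle) if parse_from_start else lowered.rfind(needle)
        if idx == -1:
            continue
        if (
            best is None
            or (parse_from_start and idx < best[0])
            or (not parse_from_start and idx > best[0])
        ):
            best = (idx, cls)
    if best is None:
        return None, None
    char_idx, cls = best

    # Step 2: find the token whose character span covers the match start.
    offset = 0
    for pos, tok in enumerate(tokens):
        if offset <= char_idx < offset + len(tok):
            break
        offset += len(tok)

    # Step 3: sum the probabilities of all alternatives at that position that
    # normalize to the chosen class (assumed rule; yields 0.0 if none match).
    confidence = sum(
        (math.exp(lp) for alt, lp in top_logprobs[pos] if alt.strip().lower() == cls.lower()),
        0.0,
    )
    return cls, confidence
```

Under these assumptions, a response tokenized as ["Yes", ",", " but", " later", " No"] resolves to "Yes" with parse_from_start=True and to "No" with parse_from_start=False, and the confidence is read from the alternatives at the first or last token respectively. Plain substring matching is a simplification: it would also match a label that appears inside a longer word.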
- plexus.scores.nodes.test_classifier_confidence_harmonized.create_mock_logprobs_response(tokens_and_alternatives)
Create a mock logprobs response structure.
- Args:
tokens_and_alternatives: List of tuples (actual_token, [(alt_token, logprob), …]), pairing each emitted token with its alternative tokens and their log-probabilities.
- async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_multiple_matches_same_class()
Test confidence calculation when multiple tokens map to the same classification.
- async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_no_matching_tokens()
Test confidence calculation when no token alternatives match the classification.
- async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_parse_from_start_false()
Test confidence calculation with parse_from_start=False (last occurrence).
- async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_parse_from_start_true()
Test confidence calculation with parse_from_start=True (first occurrence).
- async plexus.scores.nodes.test_classifier_confidence_harmonized.test_confidence_single_token_response()
Test confidence calculation for single-token responses.
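For reference, the tokens_and_alternatives input documented for create_mock_logprobs_response could be turned into a mock structure along the following lines. The real helper's return shape is not shown in this page, so the sketch uses a hypothetical function name and assumes a layout modeled on the OpenAI chat completions logprobs format (a content list whose entries carry token, logprob, and top_logprobs).

```python
def sketch_mock_logprobs_response(tokens_and_alternatives):
    """Hypothetical stand-in for the documented helper; the output shape is assumed.

    tokens_and_alternatives: list of (actual_token, [(alt_token, logprob), ...])
    """
    content = []
    for actual_token, alternatives in tokens_and_alternatives:
        alt_logprobs = dict(alternatives)
        content.append(
            {
                "token": actual_token,
                # Logprob of the emitted token, if it appears among the alternatives.
                "logprob": alt_logprobs.get(actual_token),
                "top_logprobs": [
                    {"token": alt_token, "logprob": logprob}
                    for alt_token, logprob in alternatives
                ],
            }
        )
    return {"content": content}


# Example input: a single-token response with two alternatives.
mock = sketch_mock_logprobs_response([("Yes", [("Yes", -0.1), ("No", -2.4)])])
```

A fixture like this lets each test control exactly which alternatives appear at each token position, which is what the multiple-match and no-match scenarios depend on.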