plexus.analysis.metrics package
Metrics module for Plexus.
This module provides various metrics for measuring agreement and accuracy between predicted and reference values.
- class plexus.analysis.metrics.Accuracy
Bases: Metric
Implementation of the accuracy metric for classification tasks.
Accuracy is calculated as the number of correct predictions divided by the total number of predictions, expressed as a value between 0 and 1.
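A minimal usage sketch, assuming Accuracy follows the Metric.calculate interface documented below; the label values are illustrative:
    from plexus.analysis.metrics import Accuracy, Metric

    metric = Accuracy()
    result = metric.calculate(Metric.Input(
        reference=["Yes", "No", "Yes", "Yes"],
        predictions=["Yes", "No", "No", "Yes"],
    ))
    print(result.value)  # 3 of 4 predictions match the reference -> 0.75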
- class plexus.analysis.metrics.GwetAC1
Bases: Metric
Implementation of Gwet’s AC1 statistic for measuring inter-rater agreement.
Gwet’s AC1 is an alternative to Cohen’s Kappa and Fleiss’ Kappa that is more robust to the “Kappa paradox” where high observed agreement can result in low or negative Kappa values when there is high class imbalance.
References:
- Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29-48.
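A minimal usage sketch, assuming GwetAC1 follows the Metric.calculate interface documented below; the label values are illustrative:
    from plexus.analysis.metrics import GwetAC1, Metric

    metric = GwetAC1()
    result = metric.calculate(Metric.Input(
        reference=["Yes", "Yes", "Yes", "No"],     # heavily imbalanced classes
        predictions=["Yes", "Yes", "Yes", "Yes"],
    ))
    print(result.name, result.value)  # chance-corrected agreement coefficient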
- class plexus.analysis.metrics.Metric
Bases: ABC
Abstract base class for implementing evaluation metrics in Plexus.
Metric is the foundation for standardized evaluation metrics. Each implementation represents a specific way to measure agreement or performance, such as:
- Agreement coefficients (Gwet’s AC1, Cohen’s Kappa)
- Accuracy metrics (raw accuracy, F1 score)
- Distance metrics (RMSE, MAE)
The Metric class provides:
- Standard input/output interfaces using Pydantic models
- Consistent calculation methods
- Range information for proper visualization
Common usage patterns:
1. Creating a custom metric:
   class MyMetric(Metric):
       def calculate(self, input_data: Metric.Input) -> Metric.Result:
           # Custom metric calculation logic
           return Metric.Result(
               name="My Custom Metric", value=calculated_value, range=[0, 1]
           )
2. Using a metric:
   metric = MyMetric()
   result = metric.calculate(Metric.Input(
       reference=["Yes", "No", "Yes"], predictions=["Yes", "No", "No"]
   ))
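A self-contained sketch of this pattern, using only the documented Metric.Input and Metric.Result interfaces; ExactMatchRate is a hypothetical example class, not part of the package:
    from plexus.analysis.metrics import Metric

    class ExactMatchRate(Metric):
        def calculate(self, input_data: Metric.Input) -> Metric.Result:
            # Pair up reference and predicted values and count exact matches.
            pairs = list(zip(input_data.reference, input_data.predictions))
            matches = sum(1 for ref, pred in pairs if ref == pred)
            return Metric.Result(
                name="Exact Match Rate",
                value=matches / len(pairs) if pairs else 0.0,
                range=[0, 1],
                metadata={"sample_size": len(pairs)},
            )

    result = ExactMatchRate().calculate(Metric.Input(
        reference=["Yes", "No", "Yes"],
        predictions=["Yes", "No", "No"],
    ))
    print(result.value)  # 2 of 3 predictions match -> 0.666...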
- class Input(*, reference: List[Any], predictions: List[Any])
Bases: BaseModel
Standard input structure for all metric calculations in Plexus.
The Input class standardizes how data is passed to metric calculations, typically consisting of two lists: reference (ground truth) and predictions.
- Attributes:
reference: List of reference/gold standard values
predictions: List of predicted values to compare against reference
- Common usage:
  input_data = Metric.Input(
      reference=["Yes", "No", "Yes", "Yes"], predictions=["Yes", "No", "No", "Yes"]
  )
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- predictions: List[Any]
- reference: List[Any]
- class Result(*, name: str, value: float, range: List[float], metadata: dict = {})
Bases: BaseModel
Standard output structure for all metric calculations in Plexus.
The Result class provides a consistent way to represent metric outcomes, including the metric name, calculated value, and valid range.
- Attributes:
name: The name of the metric (e.g., “Gwet’s AC1”)
value: The calculated metric value
range: Valid range for the metric as [min, max]
metadata: Optional additional information about the calculation
- Common usage:
  result = Metric.Result(
      name="Accuracy", value=0.75, range=[0, 1], metadata={"sample_size": 100}
  )
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- metadata: dict
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: str
- range: List[float]
- value: float
- abstractmethod calculate(input_data: Input) → Result
Calculate the metric value based on the provided input data.
This abstract method must be implemented by all concrete metric classes.
- Args:
input_data: Metric.Input object containing reference and prediction data
- Returns:
Metric.Result object with the calculated metric value and metadata
- class plexus.analysis.metrics.Precision(positive_labels=None)
Bases: Metric
Implementation of the precision metric for binary classification tasks.
Precision is calculated as the number of true positives divided by the total number of items predicted as positive (true positives + false positives). It represents the ability of a classifier to avoid labeling negative samples as positive.
For binary classification, labels must be strings like ‘yes’/’no’ or ‘true’/’false’. The first label in self.positive_labels is considered the “positive” class.
Initialize the Precision metric with specified positive labels.
- Args:
- positive_labels: List of values to consider as positive class.
If None, defaults to ['yes', 'true', '1', 1, True]
- __init__(positive_labels=None)
Initialize the Precision metric with specified positive labels.
- Args:
- positive_labels: List of values to consider as positive class.
If None, defaults to ['yes', 'true', '1', 1, True]
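A minimal usage sketch, assuming Precision follows the Metric.calculate interface documented above; the labels and counts are illustrative:
    from plexus.analysis.metrics import Metric, Precision

    metric = Precision(positive_labels=["yes"])  # "yes" is the positive class
    result = metric.calculate(Metric.Input(
        reference=["yes", "no", "yes", "no"],
        predictions=["yes", "yes", "yes", "no"],
    ))
    print(result.value)  # 2 true positives / 3 predicted positives -> 0.666...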
- class plexus.analysis.metrics.Recall(positive_labels=None)
Bases: Metric
Implementation of the recall metric for binary classification tasks.
Recall is calculated as the number of true positives divided by the total number of actual positive instances (true positives + false negatives). It represents the ability of a classifier to find all positive samples.
For binary classification, labels must be strings like ‘yes’/’no’ or ‘true’/’false’. The first label in self.positive_labels is considered the “positive” class.
Initialize the Recall metric with specified positive labels.
- Args:
- positive_labels: List of values to consider as positive class.
If None, defaults to ['yes', 'true', '1', 1, True]
- __init__(positive_labels=None)
Initialize the Recall metric with specified positive labels.
- Args:
- positive_labels: List of values to consider as positive class.
If None, defaults to ['yes', 'true', '1', 1, True]
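A minimal usage sketch, assuming Recall follows the Metric.calculate interface documented above and the default positive labels:
    from plexus.analysis.metrics import Metric, Recall

    metric = Recall()  # defaults treat 'yes', 'true', '1', 1, and True as positive
    result = metric.calculate(Metric.Input(
        reference=["yes", "no", "yes", "yes"],
        predictions=["yes", "no", "no", "yes"],
    ))
    print(result.value)  # 2 true positives / 3 actual positives -> 0.666...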