plexus.analysis.metrics package

Metrics module for Plexus.

This module provides various metrics for measuring agreement and accuracy between predicted and reference values.

class plexus.analysis.metrics.Accuracy

Bases: Metric

Implementation of accuracy metric for classification tasks.

Accuracy is calculated as the number of correct predictions divided by the total number of predictions, expressed as a value between 0 and 1.

calculate(input_data: Input) → Result

Calculate accuracy between prediction and reference data.

Args:

input_data: Metric.Input containing reference and prediction lists

Returns:

Metric.Result with the accuracy value and metadata
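A minimal usage sketch built only on the interfaces documented on this page; the value noted in the comment is worked arithmetic (3 of 4 predictions match), not captured output:

    from plexus.analysis.metrics import Accuracy, Metric

    accuracy = Accuracy()
    result = accuracy.calculate(Metric.Input(
        reference=["Yes", "No", "Yes", "Yes"],
        predictions=["Yes", "No", "No", "Yes"],
    ))
    # 3 of the 4 predictions match the reference, so result.value should be 0.75
    print(result.name, result.value, result.range)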

class plexus.analysis.metrics.GwetAC1

Bases: Metric

Implementation of Gwet’s AC1 statistic for measuring inter-rater agreement.

Gwet’s AC1 is an alternative to Cohen’s Kappa and Fleiss’ Kappa that is more robust to the “Kappa paradox” where high observed agreement can result in low or negative Kappa values when there is high class imbalance.

References:

  • Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1), 29-48.
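As a sketch of the standard formulation from Gwet (2008) for two raters and q categories (the module's exact computation, e.g. its handling of degenerate inputs, is not shown here):

    AC1 = (p_a - p_e) / (1 - p_e)
    p_e = (1 / (q - 1)) * sum_k [ pi_k * (1 - pi_k) ]

where p_a is the observed agreement and pi_k is the mean proportion, across the two raters, of items classified into category k. In the binary case this reduces to p_e = 2 * pi * (1 - pi), which stays small under heavy class imbalance, so AC1 tracks observed agreement rather than collapsing the way Kappa can.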

calculate(input_data: Input) → Result

Calculate Gwet’s AC1 agreement coefficient.

Args:

input_data: Metric.Input containing reference and prediction lists

Returns:

Metric.Result with the Gwet’s AC1 value and metadata
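A minimal usage sketch, analogous to the Accuracy example above; the interfaces are those documented on this page, and any particular output value would depend on the implementation:

    from plexus.analysis.metrics import GwetAC1, Metric

    ac1 = GwetAC1()
    result = ac1.calculate(Metric.Input(
        reference=["Yes", "Yes", "Yes", "No"],
        predictions=["Yes", "Yes", "Yes", "Yes"],
    ))
    print(result.value)  # AC1 coefficient; 1.0 would indicate perfect agreement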

class plexus.analysis.metrics.Metric

Bases: ABC

Abstract base class for implementing evaluation metrics in Plexus.

Metric is the foundation for standardized evaluation metrics. Each implementation represents a specific way to measure agreement or performance, such as:

  • Agreement coefficients (Gwet’s AC1, Cohen’s Kappa)

  • Accuracy metrics (raw accuracy, F1 score)

  • Distance metrics (RMSE, MAE)

The Metric class provides:

  • Standard input/output interfaces using Pydantic models

  • Consistent calculation methods

  • Range information for proper visualization

Common usage patterns:

  1. Creating a custom metric:

        class MyMetric(Metric):
            def calculate(self, input_data: Metric.Input) -> Metric.Result:
                # Custom metric calculation logic
                return Metric.Result(
                    name="My Custom Metric",
                    value=calculated_value,
                    range=[0, 1],
                )

  2. Using a metric:

        metric = MyMetric()
        result = metric.calculate(Metric.Input(
            reference=["Yes", "No", "Yes"],
            predictions=["Yes", "No", "No"],
        ))
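Putting the two patterns together, the following self-contained sketch defines a hypothetical ExactMatchRate metric (not part of the package; only the Metric, Input, and Result interfaces documented here are assumed) and runs it end to end:

    from plexus.analysis.metrics import Metric


    class ExactMatchRate(Metric):
        """Hypothetical metric: fraction of positions where prediction equals reference."""

        def calculate(self, input_data: Metric.Input) -> Metric.Result:
            pairs = list(zip(input_data.reference, input_data.predictions))
            matches = sum(1 for ref, pred in pairs if ref == pred)
            value = matches / len(pairs) if pairs else 0.0
            return Metric.Result(
                name="Exact Match Rate",
                value=value,
                range=[0, 1],
                metadata={"sample_size": len(pairs)},
            )


    result = ExactMatchRate().calculate(Metric.Input(
        reference=["Yes", "No", "Yes"],
        predictions=["Yes", "No", "No"],
    ))
    print(result.value)  # 2 of 3 positions match -> roughly 0.667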

class Input(*, reference: List[Any], predictions: List[Any])

Bases: BaseModel

Standard input structure for all metric calculations in Plexus.

The Input class standardizes how data is passed to metric calculations, typically consisting of two lists: reference (ground truth) and predictions.

Attributes:

reference: List of reference/gold standard values
predictions: List of predicted values to compare against reference

Common usage:

    input_data = Metric.Input(
        reference=["Yes", "No", "Yes", "Yes"],
        predictions=["Yes", "No", "No", "Yes"],
    )

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

predictions: List[Any]
reference: List[Any]
class Result(*, name: str, value: float, range: List[float], metadata: dict = {})

Bases: BaseModel

Standard output structure for all metric calculations in Plexus.

The Result class provides a consistent way to represent metric outcomes, including the metric name, calculated value, and valid range.

Attributes:

name: The name of the metric (e.g., "Gwet's AC1")
value: The calculated metric value
range: Valid range for the metric as [min, max]
metadata: Optional additional information about the calculation

Common usage:

    result = Metric.Result(
        name="Accuracy",
        value=0.75,
        range=[0, 1],
        metadata={"sample_size": 100},
    )

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

metadata: dict
model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.

name: str
range: List[float]
value: float
abstractmethod calculate(input_data: Input) → Result

Calculate the metric value based on the provided input data.

This abstract method must be implemented by all concrete metric classes.

Args:

input_data: Metric.Input object containing reference and prediction data

Returns:

Metric.Result object with the calculated metric value and metadata

class plexus.analysis.metrics.Precision(positive_labels=None)

Bases: Metric

Implementation of precision metric for binary classification tasks.

Precision is calculated as the number of true positives divided by the total number of items predicted as positive (true positives + false positives). It represents the ability of a classifier to avoid labeling negative samples as positive.

For binary classification, labels must be strings like 'yes'/'no' or 'true'/'false'. The first label in self.positive_labels is considered the "positive" class.

Initialize the Precision metric with specified positive labels.

Args:
    positive_labels: List of values to consider as the positive class.
        If None, defaults to ['yes', 'true', '1', 1, True].

__init__(positive_labels=None)

Initialize the Precision metric with specified positive labels.

Args:
    positive_labels: List of values to consider as the positive class.
        If None, defaults to ['yes', 'true', '1', 1, True].

calculate(input_data: Input) → Result

Calculate precision between prediction and reference data.

Args:

input_data: Metric.Input containing reference and prediction lists

Returns:

Metric.Result with the precision value and metadata
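A minimal sketch, assuming the constructor and positive_labels behavior documented above; the commented value is worked arithmetic, not captured output:

    from plexus.analysis.metrics import Metric, Precision

    precision = Precision(positive_labels=["yes"])  # values equal to "yes" count as positive
    result = precision.calculate(Metric.Input(
        reference=["yes", "no", "yes", "no"],
        predictions=["yes", "yes", "no", "no"],
    ))
    # Predicted positives: items 0 and 1 -> one true positive, one false positive
    # precision = TP / (TP + FP) = 1 / 2 = 0.5
    print(result.value)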

class plexus.analysis.metrics.Recall(positive_labels=None)

Bases: Metric

Implementation of recall metric for binary classification tasks.

Recall is calculated as the number of true positives divided by the total number of actual positive instances (true positives + false negatives). It represents the ability of a classifier to find all positive samples.

For binary classification, labels must be strings like 'yes'/'no' or 'true'/'false'. The first label in self.positive_labels is considered the "positive" class.

Initialize the Recall metric with specified positive labels.

Args:
    positive_labels: List of values to consider as the positive class.
        If None, defaults to ['yes', 'true', '1', 1, True].

__init__(positive_labels=None)

Initialize the Recall metric with specified positive labels.

Args:
    positive_labels: List of values to consider as the positive class.
        If None, defaults to ['yes', 'true', '1', 1, True].

calculate(input_data: Input) → Result

Calculate recall between prediction and reference data.

Args:

input_data: Metric.Input containing reference and prediction lists

Returns:

Metric.Result with the recall value and metadata
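A minimal sketch mirroring the Precision example above; again the commented value is worked arithmetic rather than captured output:

    from plexus.analysis.metrics import Metric, Recall

    recall = Recall(positive_labels=["yes"])  # values equal to "yes" count as positive
    result = recall.calculate(Metric.Input(
        reference=["yes", "no", "yes", "no"],
        predictions=["yes", "yes", "no", "no"],
    ))
    # Actual positives: items 0 and 2 -> one true positive, one false negative
    # recall = TP / (TP + FN) = 1 / 2 = 0.5
    print(result.value)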

Submodules