plexus.scores.nodes.Extractor module

class plexus.scores.nodes.Extractor.Extractor(**parameters)

Bases: BaseNode, LangChainUser

A node that extracts a specific quote from the input text using a hybrid approach:

  1. LLM Extraction: It first uses a Large Language Model (LLM) guided by the provided prompt (system_message and user_message) to identify a potential quote within the input text.

  2. Fuzzy Match Verification (Optional): Unless trust_model_output is true, it then attempts to verify that the LLM’s extracted quote actually exists in the original input text using fuzzy string matching (rapidfuzz). This step helps ground the LLM’s output.

  3. Fallback: If verification fails (the match score is below fuzzy_match_score_cutoff or no match is found), it logs a warning and falls back to the raw (cleaned) LLM output. It does not signal an error or return None in this case, prioritizing that some extraction is always returned.

The final extracted text is stored in the extracted_text field of the graph state.
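The verification step (2) can be sketched as follows. This is an illustrative stand-in only: the function name verify_quote is hypothetical, and it approximates fuzzy matching with the standard library's difflib, whereas the module itself uses rapidfuzz.

```python
import difflib

def verify_quote(quote: str, source: str, score_cutoff: int = 50) -> tuple[str, bool]:
    """Return (text, verified). On a failed match, fall back to the raw
    quote instead of raising, mirroring the node's fallback policy."""
    q, s = quote.lower(), source.lower()
    if len(q) >= len(s):
        best = difflib.SequenceMatcher(None, q, s).ratio() * 100
    else:
        # Score the quote against every same-length window of the source,
        # approximating a partial-ratio score (0-100) with the stdlib.
        best = max(
            difflib.SequenceMatcher(None, q, s[i:i + len(q)]).ratio() * 100
            for i in range(len(s) - len(q) + 1)
        )
    if best >= score_cutoff:
        return quote, True
    return quote, False  # grounding failed; the caller logs a warning
```

Either way the quote itself is returned; the boolean only tells the caller whether to log the fallback warning.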

class ExtractionOutputParser(*args, name: str | None = None, FUZZY_MATCH_SCORE_CUTOFF: int, text: str, use_exact_matching: bool = False, trust_model_output: bool = False)

Bases: BaseOutputParser[dict]

class Config

Bases: object

arbitrary_types_allowed = True
underscore_attrs_are_private = True
FUZZY_MATCH_SCORE_CUTOFF: int
__init__(*args, **kwargs)
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'protected_namespaces': (), 'underscore_attrs_are_private': True}

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Args:

self: The BaseModel instance.

context: The context.

parse(output: str) → Dict[str, Any]

Parse a single string model output into a structured dictionary.

Args:

output: String output of a language model.

Returns:

Structured output.

text: str
tokenize(text: str) → list[str]
trust_model_output: bool
use_exact_matching: bool
class GraphState(*, text: str, metadata: dict | None = None, results: dict | None = None, messages: List[Dict[str, Any]] | None = None, is_not_empty: bool | None = None, value: str | None = None, explanation: str | None = None, reasoning: str | None = None, chat_history: List[Any] = <factory>, completion: str | None = None, classification: str | None = None, confidence: float | None = None, retry_count: int | None = 0, at_llm_breakpoint: bool | None = False, good_call: str | None = None, good_call_explanation: str | None = None, non_qualifying_reason: str | None = None, non_qualifying_explanation: str | None = None, extracted_text: str | None = None, **extra_data: Any)

Bases: GraphState

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

extracted_text: str | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'validate_default': True}

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

reasoning: str | None
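As an illustration, a state payload for this graph might carry the fields listed in the signature above. The values below are invented examples, not module defaults.

```python
# Example state dict using field names from the GraphState signature above.
# All values are illustrative.
state = {
    "text": "Agent: the total comes to forty-nine dollars per month.",
    "metadata": {"source": "transcript"},
    "chat_history": [],
    "retry_count": 0,
    "at_llm_breakpoint": False,
    "completion": None,
    "extracted_text": None,  # populated by the extractor node on success
}
```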
class Parameters(*, model_provider: Literal['ChatOpenAI', 'AzureChatOpenAI', 'BedrockChat', 'ChatVertexAI', 'ChatOllama'] = 'AzureChatOpenAI', model_name: str | None = None, base_model_name: str | None = None, reasoning_effort: str | None = 'low', verbosity: str | None = 'medium', model_region: str | None = None, temperature: float | None = 0, top_p: float | None = 0.03, max_tokens: int | None = 500, logprobs: bool | None = False, top_logprobs: int | None = None, input: dict | None = None, output: dict | None = None, system_message: str | None = None, user_message: str | None = None, example_refinement_message: str | None = None, single_line_messages: bool = False, name: str | None = None, fuzzy_match_score_cutoff: int = 50, use_exact_matching: bool = False, trust_model_output: bool = False)

Bases: Parameters

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

fuzzy_match_score_cutoff: int
model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}

Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

trust_model_output: bool
use_exact_matching: bool
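A minimal sketch of assembling these parameters as keyword arguments. The field names follow the Parameters signature above; the values and prompt wording are illustrative, not recommended settings.

```python
# Keyword arguments matching the Parameters signature documented above.
# Values are examples only.
params = {
    "model_provider": "AzureChatOpenAI",  # the documented default
    "system_message": "You extract verbatim quotes from call transcripts.",
    "user_message": "Find the sentence where the agent states the price.",
    "fuzzy_match_score_cutoff": 70,       # default is 50
    "use_exact_matching": False,
    "trust_model_output": False,          # keep fuzzy verification enabled
}
```

With trust_model_output set to False and use_exact_matching set to False, the node takes the hybrid path described at the top of this page: LLM extraction followed by fuzzy-match verification.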
__init__(**parameters)
add_core_nodes(workflow: StateGraph) → StateGraph

Build and return a core LangGraph workflow. The node name is available as self.node_name when needed.

execute(*args, **kwargs)
get_extractor_node() → LambdaType