plexus.scores.nodes.Extractor module
- class plexus.scores.nodes.Extractor.Extractor(**parameters)
Bases:
BaseNode, LangChainUser

A node that extracts a specific quote from the input text using a hybrid approach:
LLM Extraction: It first uses a Large Language Model (LLM) guided by the provided prompt (system_message and user_message) to identify a potential quote within the input text.
Fuzzy Match Verification (Optional): Unless trust_model_output is true, it then attempts to verify that the LLM’s extracted quote actually exists in the original input text using fuzzy string matching (rapidfuzz). This step helps ground the LLM’s output.
Fallback: If verification fails (the match score is below fuzzy_match_score_cutoff or no match is found), it logs a warning and falls back to the raw (cleaned) LLM output. It does not raise an error or return None in this case, prioritizing a best-effort extraction over returning nothing.
The final extracted text is stored in the extracted_text field of the graph state.
- class ExtractionOutputParser(*args, name: str | None = None, FUZZY_MATCH_SCORE_CUTOFF: int, text: str, use_exact_matching: bool = False, trust_model_output: bool = False)
Bases:
BaseOutputParser[dict]

- FUZZY_MATCH_SCORE_CUTOFF: int
- __init__(*args, **kwargs)
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'protected_namespaces': (), 'underscore_attrs_are_private': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) → None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Args:
self: The BaseModel instance. context: The context.
- parse(output: str) → Dict[str, Any]
Parse a single string model output into some structure.
- Args:
output: String output of a language model.
- Returns:
Structured output.
- text: str
- tokenize(text: str) → list[str]
- trust_model_output: bool
- use_exact_matching: bool
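The tokenize helper above presumably normalizes text before matching. A plausible stand-in (hypothetical; the real implementation may differ) is a lowercasing word tokenizer:

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters, dropping punctuation.
    # Hypothetical stand-in for ExtractionOutputParser.tokenize.
    return re.findall(r"\w+", text.lower())
```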
- class GraphState(*, text: str, metadata: dict | None = None, results: dict | None = None, messages: ~typing.List[~typing.Dict[str, ~typing.Any]] | None = None, is_not_empty: bool | None = None, value: str | None = None, explanation: str | None = None, reasoning: str | None = None, chat_history: ~typing.List[~typing.Any] = <factory>, completion: str | None = None, classification: str | None = None, confidence: float | None = None, retry_count: int | None = 0, at_llm_breakpoint: bool | None = False, good_call: str | None = None, good_call_explanation: str | None = None, non_qualifying_reason: str | None = None, non_qualifying_explanation: str | None = None, extracted_text: str | None = None, **extra_data: ~typing.Any)
Bases:
GraphState

Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- extracted_text: str | None
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'validate_default': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- reasoning: str | None
- class Parameters(*, model_provider: Literal['ChatOpenAI', 'AzureChatOpenAI', 'BedrockChat', 'ChatVertexAI', 'ChatOllama'] = 'AzureChatOpenAI', model_name: str | None = None, base_model_name: str | None = None, reasoning_effort: str | None = 'low', verbosity: str | None = 'medium', model_region: str | None = None, temperature: float | None = 0, top_p: float | None = 0.03, max_tokens: int | None = 500, logprobs: bool | None = False, top_logprobs: int | None = None, input: dict | None = None, output: dict | None = None, system_message: str | None = None, user_message: str | None = None, example_refinement_message: str | None = None, single_line_messages: bool = False, name: str | None = None, fuzzy_match_score_cutoff: int = 50, use_exact_matching: bool = False, trust_model_output: bool = False)
Bases:
Parameters

Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- fuzzy_match_score_cutoff: int
- model_config: ClassVar[ConfigDict] = {'protected_namespaces': ()}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- trust_model_output: bool
- use_exact_matching: bool
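A hypothetical instantiation, assuming the keyword names shown in the Parameters signature above (the values and the node name are illustrative only):

```python
# Keys mirror the Parameters model above; any omitted field keeps its default.
extractor_params = {
    "name": "quote_extractor",
    "model_provider": "AzureChatOpenAI",  # the default provider
    "temperature": 0,
    "system_message": "Return the exact quote that answers the question.",
    "user_message": "{text}",
    "fuzzy_match_score_cutoff": 50,       # verification threshold (0-100)
    "use_exact_matching": False,
    "trust_model_output": False,          # keep fuzzy verification enabled
}
# node = Extractor(**extractor_params)  # requires plexus; shown for shape only
```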
- __init__(**parameters)
- add_core_nodes(workflow: StateGraph) → StateGraph
Build and return a core LangGraph workflow. The node name is available as self.node_name when needed.
- execute(*args, **kwargs)
- get_extractor_node() → LambdaType
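How these two methods fit together can be sketched with a minimal stand-in for LangGraph's StateGraph (hypothetical: the real class is langgraph.graph.StateGraph, and the real node body invokes the LLM and ExtractionOutputParser rather than the trivial lambda shown here):

```python
class StateGraph:
    """Minimal stand-in exposing only the add_node interface used here."""
    def __init__(self):
        self.nodes = {}

    def add_node(self, name, fn):
        self.nodes[name] = fn
        return self

def get_extractor_node():
    # Returns a lambda over the graph state, matching the LambdaType
    # signature above; the real closure runs the LLM and the parser.
    return lambda state: {**state, "extracted_text": state.get("completion", "").strip()}

def add_core_nodes(workflow, node_name="extractor"):
    # Mirrors add_core_nodes: register this node under self.node_name.
    workflow.add_node(node_name, get_extractor_node())
    return workflow

graph = add_core_nodes(StateGraph())
```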