plexus.reports.blocks.vector_topic_memory module

VectorTopicMemory ReportBlock — full orchestration.

Rebuilds topic memory by re-indexing datasets into S3 Vectors. Uses S3 embedding cache, global clustering, and memory weights.

Supports three data sources:

  • Feedback items: scorecard + days (same as FeedbackAnalysis). Uses transcript text from Items.

  • ScoreResults: scorecard + days with content_source=score_result_no_explanation. Uses ScoreResult.explanation where value='No' from normal production predictions.

  • DataSource/DataSet: data.source or data.dataset. Uses Parquet from DatasetResolver.
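The selection rules above can be sketched as a small function. This is only an illustration, not the block's actual logic: `pick_data_source` and its return labels are hypothetical, and the precedence shown (an explicit data.source or data.dataset wins over scorecard + days) is an assumption the docstring does not confirm.

```python
from typing import Any, Dict


def pick_data_source(config: Dict[str, Any]) -> str:
    """Return a label for the data source a config selects (hypothetical helper)."""
    data = config.get("data") or {}

    # Assumption: an explicit DataSource/DataSet takes precedence.
    if data.get("source") or data.get("dataset"):
        return "dataset"

    if config.get("scorecard") and config.get("days"):
        # content_source lives inside the data mapping, per the Config section.
        if data.get("content_source") == "score_result_no_explanation":
            return "score_results"
        return "feedback_items"

    raise ValueError("config needs scorecard + days, or data.source / data.dataset")
```

For example, `pick_data_source({"scorecard": "my-scorecard", "days": 30})` selects the feedback-items path under these assumptions.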

class plexus.reports.blocks.vector_topic_memory.VectorTopicMemory(config: Dict[str, Any], params: Dict[str, Any] | None, api_client: PlexusDashboardClient)

Bases: BaseReportBlock

ReportBlock that rebuilds a full topic memory view by re-indexing datasets into an AWS S3 Vectors index.

Config:

scorecard (str): Scorecard identifier. When provided with days, uses feedback items (same as FeedbackAnalysis).
days (int): Number of days of feedback to include. Used with scorecard.
data: { source?: str, dataset?: str, content_column?: str, fresh?: bool, content_source?: str }

content_source:
  • edit_comment (default): FeedbackItem.editCommentValue for mismatches

  • transcript: Item transcript text linked from feedback items

  • score_result_no_explanation: ScoreResult.explanation for normal production predictions with value='No'

s3_vectors: { bucket_name: str, index_name: str, index_arn?: str, region?: str }
embedding: { model_id?: str, preprocessing_version?: str }
clustering: { min_topic_size?: int }
    coarse_min_topic_fraction?: float  # default 0.02 when min_topic_fraction is not set
    coarse_target_max_topics_per_score?: int  # default 12 when target_max_topics_per_score is not set
mode: "full" | "incremental"  # incremental = KNN assign only, no re-cluster
label:
    batch_one_pass?: bool  # default true: one LLM call for all selected buckets
    batch_model?: str  # default gpt-4o
    batch_prompt?: str  # optional user prompt override; supports {{topic_bundles_json}}
    request_timeout_seconds?: int  # default 60
    batch_request_timeout_seconds?: int  # default request_timeout_seconds
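As a rough illustration of the Config section, the sketch below builds one possible config as a Python dict and resolves the documented label defaults. The scorecard, bucket, and index names are made-up placeholders, and `resolve_label_options` is a hypothetical helper written for this example; the block's real default handling may differ.

```python
from typing import Any, Dict

# Illustrative values only: the scorecard, bucket and index names are placeholders.
example_config: Dict[str, Any] = {
    "scorecard": "example-scorecard",
    "days": 30,
    "data": {"content_source": "edit_comment"},
    "s3_vectors": {
        "bucket_name": "example-vector-bucket",
        "index_name": "example-topic-index",
    },
    "clustering": {"min_topic_size": 10},
    "mode": "full",
    "label": {"batch_one_pass": True, "request_timeout_seconds": 90},
}


def resolve_label_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the defaults documented for the label options (hypothetical helper)."""
    label = config.get("label") or {}
    timeout = label.get("request_timeout_seconds", 60)
    return {
        "batch_one_pass": label.get("batch_one_pass", True),
        "batch_model": label.get("batch_model", "gpt-4o"),
        "request_timeout_seconds": timeout,
        # Documented default: falls back to request_timeout_seconds.
        "batch_request_timeout_seconds": label.get(
            "batch_request_timeout_seconds", timeout
        ),
    }


label_options = resolve_label_options(example_config)
```

Here the batch timeout is not set explicitly, so it follows the per-request timeout of 90 seconds rather than the global default of 60.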

DEFAULT_DESCRIPTION: str | None = 'Persistent vector-based topic memory from S3 Vectors index'
DEFAULT_NAME: str | None = 'Vector Topic Memory'
MEDIUM_TERM_DAYS = 30
SHORT_TERM_DAYS = 14
async generate() -> Tuple[Dict[str, Any] | None, str | None]

Orchestrates: resolve dataset -> embed -> index -> cluster -> memory weights.
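The stage order above can be pictured with a toy pipeline. Everything in this sketch is a stand-in: the stage functions, the fake two-number "embedding", the mean-split "clustering", and the share-of-items "memory weights" are all assumptions made for illustration and do not reflect the real implementation, which calls an embedding model and an S3 Vectors index. Treating the returned tuple as (output data, log text) is also an assumption.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


async def resolve_dataset(config: Dict[str, Any]) -> List[str]:
    # Stand-in: the real block reads feedback items, ScoreResults or Parquet.
    return list(config.get("texts", []))


async def embed(texts: List[str]) -> List[List[float]]:
    # Stand-in embedding: text length and vowel count, not a real model.
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


async def index_vectors(vectors: List[List[float]]) -> int:
    # Stand-in for writing to a vector index; just reports the count.
    return len(vectors)


def cluster(vectors: List[List[float]]) -> Dict[int, List[int]]:
    # Stand-in clustering: split items at the mean of the first component.
    mean = sum(v[0] for v in vectors) / len(vectors)
    topics: Dict[int, List[int]] = {}
    for i, v in enumerate(vectors):
        topics.setdefault(1 if v[0] > mean else 0, []).append(i)
    return topics


def memory_weights(topics: Dict[int, List[int]], total: int) -> Dict[int, float]:
    # Stand-in weighting: each topic's share of all items.
    return {topic: len(members) / total for topic, members in topics.items()}


async def generate_sketch(
    config: Dict[str, Any],
) -> Tuple[Optional[Dict[str, Any]], Optional[str]]:
    texts = await resolve_dataset(config)
    if not texts:
        return None, "no texts resolved"
    vectors = await embed(texts)
    indexed = await index_vectors(vectors)
    topics = cluster(vectors)
    weights = memory_weights(topics, len(texts))
    log = f"indexed {indexed} items into {len(topics)} topics"
    return {"indexed": indexed, "topics": topics, "memory_weights": weights}, log


output, log = asyncio.run(
    generate_sketch({"texts": ["a", "bb", "a long complaint about billing"]})
)
```

The two I/O-bound stand-ins are written as coroutines so the sketch mirrors the async signature of generate(); the CPU-only steps stay synchronous.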