plexus.reports.blocks.vector_topic_memory module
VectorTopicMemory ReportBlock — full orchestration.
Rebuilds topic memory by re-indexing datasets into S3 Vectors. Uses S3 embedding cache, global clustering, and memory weights.
Supports three data sources:
- Feedback items: scorecard + days (same as FeedbackAnalysis) — uses transcript text from Items
- ScoreResults: scorecard + days with content_source=score_result_no_explanation — uses ScoreResult.explanation where value='No' from normal production predictions
- DataSource/DataSet: data.source or data.dataset — uses Parquet from DatasetResolver
- class plexus.reports.blocks.vector_topic_memory.VectorTopicMemory(config: Dict[str, Any], params: Dict[str, Any] | None, api_client: PlexusDashboardClient)
Bases: BaseReportBlock

ReportBlock that rebuilds a full topic memory view by re-indexing datasets into an AWS S3 Vectors index.
- Config:
scorecard (str): Scorecard identifier. When provided with days, uses feedback items (same as FeedbackAnalysis).
days (int): Number of days of feedback to include. Used with scorecard.
data: { source?: str, dataset?: str, content_column?: str, fresh?: bool, content_source?: str }
- content_source:
edit_comment (default): FeedbackItem.editCommentValue for mismatches
transcript: Item transcript text linked from feedback items
score_result_no_explanation: ScoreResult.explanation for normal production predictions with value='No'
s3_vectors: { bucket_name: str, index_name: str, index_arn?: str, region?: str }
embedding: { model_id?: str, preprocessing_version?: str }
clustering: { min_topic_size?: int }
  coarse_min_topic_fraction?: float  # default 0.02 when min_topic_fraction is not set
  coarse_target_max_topics_per_score?: int  # default 12 when target_max_topics_per_score is not set
mode: "full" | "incremental"  # incremental = KNN assign only, no re-cluster
label:
  batch_one_pass?: bool  # default true: one LLM call for all selected buckets
  batch_model?: str  # default gpt-4o
  batch_prompt?: str  # optional user prompt override; supports {{topic_bundles_json}}
  request_timeout_seconds?: int  # default 60
  batch_request_timeout_seconds?: int  # default request_timeout_seconds
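Assembled from the fields listed above, a complete config passed to the block might look like the following sketch. The field names come from this listing; all values (bucket name, region, model ids) are illustrative placeholders, not defaults taken from the source:

```python
# Illustrative VectorTopicMemory config; values are placeholders.
config = {
    "scorecard": "example-scorecard",  # with "days", selects feedback items
    "days": 30,
    "data": {
        "content_source": "score_result_no_explanation",
        "fresh": True,
    },
    "s3_vectors": {
        "bucket_name": "example-vectors-bucket",  # placeholder
        "index_name": "topic-memory",
        "region": "us-east-1",
    },
    "embedding": {"model_id": "example-embedding-model"},  # placeholder
    "clustering": {"min_topic_size": 10},
    "mode": "full",  # "incremental" would KNN-assign without re-clustering
    "label": {
        "batch_one_pass": True,
        "batch_model": "gpt-4o",
        "request_timeout_seconds": 60,
    },
}
```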
- DEFAULT_DESCRIPTION: str | None = 'Persistent vector-based topic memory from S3 Vectors index'
- DEFAULT_NAME: str | None = 'Vector Topic Memory'
- MEDIUM_TERM_DAYS = 30
- SHORT_TERM_DAYS = 14
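SHORT_TERM_DAYS and MEDIUM_TERM_DAYS suggest age-based tiers for the memory weights. The source does not show the weighting formula; the sketch below is one hypothetical way such constants could bucket observations by age (the tier names and the function itself are invented for illustration):

```python
from datetime import datetime, timedelta, timezone

SHORT_TERM_DAYS = 14
MEDIUM_TERM_DAYS = 30

def memory_tier(observed_at: datetime, now: datetime) -> str:
    """Hypothetical tiering: bucket an observation by its age in days."""
    age_days = (now - observed_at).days
    if age_days <= SHORT_TERM_DAYS:
        return "short_term"
    if age_days <= MEDIUM_TERM_DAYS:
        return "medium_term"
    return "long_term"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(memory_tier(now - timedelta(days=7), now))   # short_term
print(memory_tier(now - timedelta(days=21), now))  # medium_term
print(memory_tier(now - timedelta(days=90), now))  # long_term
```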
- async generate() → Tuple[Dict[str, Any] | None, str | None]
Orchestrates: resolve dataset -> embed -> index -> cluster -> memory weights.