plexus.input_sources package
- class plexus.input_sources.DeepgramInputSource(pattern: str = None, **options)
Bases: TextFileInputSource
Extracts and formats text from Deepgram JSON transcription files. Supports multiple output formats: paragraphs, utterances, words, raw.
- Args:
pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options
- extract(item, default_text: str) -> str
Parse Deepgram JSON and format transcript.
- Options:
format: "paragraphs" (default), "utterances", "words", "raw"
include_timestamps: bool (default False)
speaker_labels: bool (default False)
time_range_start: float (default 0.0) - Start time in seconds
time_range_duration: float or None (default None) - Duration in seconds, None = no end limit
- Returns:
Formatted transcript text
- Raises:
ValueError: If no matching attachment, invalid format, or invalid time range parameters
KeyError: If Deepgram JSON structure is invalid
Exception: If file download or parsing fails
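To illustrate how the documented options could interact, here is a minimal sketch of utterance formatting with a time window. It is not the Plexus implementation: the function name `format_utterances` is hypothetical, the input is assumed to be a list of utterance dicts with `start`, `speaker`, and `transcript` keys (as in Deepgram's `results.utterances`), and the rule that an utterance is kept when its start time falls inside the window is an assumption.

```python
from typing import Optional


def format_utterances(
    utterances: list,
    include_timestamps: bool = False,
    speaker_labels: bool = False,
    time_range_start: float = 0.0,
    time_range_duration: Optional[float] = None,
) -> str:
    """Hypothetical helper: format utterances whose start lies in the window."""
    # Reject invalid time range parameters, mirroring the documented ValueError.
    if time_range_start < 0:
        raise ValueError("time_range_start must be >= 0")
    if time_range_duration is not None and time_range_duration <= 0:
        raise ValueError("time_range_duration must be positive or None")

    # None duration means the window has no end limit.
    end = None if time_range_duration is None else time_range_start + time_range_duration

    lines = []
    for utt in utterances:
        start = utt["start"]  # raises KeyError if the structure is malformed
        if start < time_range_start:
            continue
        if end is not None and start >= end:
            continue
        prefix = ""
        if include_timestamps:
            prefix += f"[{start:.2f}s] "
        if speaker_labels:
            prefix += f"Speaker {utt['speaker']}: "
        lines.append(prefix + utt["transcript"])
    return "\n".join(lines)


# Made-up sample data for illustration only.
sample = [
    {"start": 0.5, "end": 2.0, "speaker": 0, "transcript": "Hello, thanks for calling."},
    {"start": 2.4, "end": 4.1, "speaker": 1, "transcript": "Hi, I have a billing question."},
    {"start": 12.0, "end": 13.5, "speaker": 0, "transcript": "Sure, let me check."},
]
print(format_utterances(sample, speaker_labels=True, time_range_duration=10.0))
```

With a 10-second window starting at 0.0, only the first two utterances are kept; the third starts at 12.0 seconds and is dropped.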
- class plexus.input_sources.InputSource(pattern: str = None, **options)
Bases: ABC
Base class for input sources that extract text from various sources. Input sources run BEFORE processors in the pipeline.
- Args:
pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options
- __init__(pattern: str = None, **options)
- Args:
pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options
- abstractmethod extract(item, default_text: str) -> str
Extract text from the specified source.
- Args:
item: Item object (may have attachedFiles)
default_text: Fallback text from item.text
- Returns:
Extracted text string
- find_matching_attachment(item) -> str | None
Find first attachment matching the regex pattern.
- Args:
item: Item object with attachedFiles list
- Returns:
S3 key path of matching attachment, or None
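The base-class contract described above can be sketched as follows. This is a stand-in, not the Plexus source: `SketchInputSource` and `FallbackSource` are hypothetical names, `attachedFiles` is assumed to be a list of S3 key strings, and the use of `re.search` (rather than `re.match` or `re.fullmatch`) is an assumption.

```python
import re
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Optional


class SketchInputSource(ABC):
    """Stand-in for the documented base class (hypothetical name)."""

    def __init__(self, pattern: Optional[str] = None, **options):
        self.pattern = pattern
        self.options = options

    @abstractmethod
    def extract(self, item, default_text: str) -> str:
        """Subclasses return the text to hand to downstream processors."""

    def find_matching_attachment(self, item) -> Optional[str]:
        # Without a pattern there is nothing to match against.
        if not self.pattern:
            return None
        regex = re.compile(self.pattern)
        # Return the first key that matches, preserving list order.
        for key in getattr(item, "attachedFiles", None) or []:
            if regex.search(key):
                return key
        return None


class FallbackSource(SketchInputSource):
    """Hypothetical subclass that simply returns the fallback text."""

    def extract(self, item, default_text: str) -> str:
        return default_text


# SimpleNamespace stands in for an Item object here.
item = SimpleNamespace(
    attachedFiles=["items/42/audio.mp3", "items/42/transcript.json"],
    text="fallback",
)
source = FallbackSource(pattern=r"transcript\.json$")
print(source.find_matching_attachment(item))
```

Because `extract` is abstract, instantiating `SketchInputSource` directly raises `TypeError`; only concrete subclasses can be created.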
- class plexus.input_sources.InputSourceFactory
Bases: object
Factory for creating input source instances from class names. Mirrors the ProcessorFactory pattern for consistency.
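The reference lists no methods for this class, so the following is only a generic sketch of how a factory might create instances from class names. Everything in it is hypothetical: `SketchFactory`, its `register` and `create` methods, and `DummySource` are illustrative names, not part of the Plexus API.

```python
class SketchFactory:
    """Hypothetical registry-based factory; not the Plexus implementation."""

    _registry = {}

    @classmethod
    def register(cls, source_class):
        # Index each class by its name so it can be looked up from a string.
        cls._registry[source_class.__name__] = source_class
        return source_class

    @classmethod
    def create(cls, class_name: str, **kwargs):
        try:
            source_class = cls._registry[class_name]
        except KeyError:
            raise ValueError(f"Unknown input source: {class_name}") from None
        return source_class(**kwargs)


@SketchFactory.register
class DummySource:
    """Placeholder with the same constructor shape as the documented sources."""

    def __init__(self, pattern=None, **options):
        self.pattern = pattern
        self.options = options


source = SketchFactory.create("DummySource", pattern=r".*\.txt$", format="raw")
print(type(source).__name__, source.options)
```

A name-based lookup like this lets a configuration file select an input source with a plain string, which is one plausible reason for mirroring a processor factory.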
- class plexus.input_sources.TextFileInputSource(pattern: str = None, **options)
Bases: InputSource
Extracts raw text from a file attachment matching a pattern.
- Args:
pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options
- extract(item, default_text: str) -> str
Find and return text from matching attachment.
- Args:
item: Item with attachedFiles
default_text: Not used (kept for interface compatibility)
- Returns:
Text content from file
- Raises:
ValueError: If no matching attachment found
Exception: If file download or parsing fails
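The behavior documented for `extract` (return the text of the first matching attachment, raise `ValueError` when none matches) might look roughly like the sketch below. The function `extract_text` and its `fetch` parameter are hypothetical; `fetch` stands in for whatever download mechanism the real class uses, and an in-memory dict replaces remote storage so the example runs on its own.

```python
import re
from types import SimpleNamespace
from typing import Callable


def extract_text(item, pattern: str, fetch: Callable[[str], str]) -> str:
    """Hypothetical helper: return text of the first attachment matching pattern."""
    for key in getattr(item, "attachedFiles", None) or []:
        if re.search(pattern, key):
            # Any error raised by fetch propagates to the caller.
            return fetch(key)
    raise ValueError(f"No attachment matching pattern: {pattern}")


# In-memory stand-in for file storage (made-up data).
fake_store = {"items/7/notes.txt": "Customer asked about a refund."}
item = SimpleNamespace(attachedFiles=["items/7/call.wav", "items/7/notes.txt"])

result = extract_text(item, r"\.txt$", fake_store.__getitem__)
print(result)
```

Note that no fallback text is involved: consistent with the `default_text` parameter being unused, a missing attachment is treated as an error rather than silently falling back.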