plexus.input_sources package

class plexus.input_sources.DeepgramInputSource(pattern: str = None, **options)

Bases: TextFileInputSource

Extracts and formats text from Deepgram JSON transcription files. Supports multiple output formats: paragraphs, utterances, words, raw.

Args:

pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options

extract(item, default_text: str) → str

Parse Deepgram JSON and format transcript.

Options:

format: “paragraphs” (default), “utterances”, “words”, “raw”
include_timestamps: bool (default False)
speaker_labels: bool (default False)
time_range_start: float (default 0.0) - Start time in seconds
time_range_duration: float or None (default None) - Duration in seconds, None = no end limit

Returns:

Formatted transcript text

Raises:

ValueError: If no matching attachment, invalid format, or invalid time range parameters
KeyError: If Deepgram JSON structure is invalid
Exception: If file download or parsing fails
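The time-range options can be illustrated with a minimal standalone sketch. It does not use plexus itself: `words_in_range` and `format_words` are hypothetical helpers, the `results.channels[0].alternatives[0].words` path follows the usual layout of a Deepgram pre-recorded response, and the rule of keeping words whose start time falls inside the window is an assumption about how the range is applied.

```python
from typing import Any, Optional


def words_in_range(
    deepgram_json: dict[str, Any],
    time_range_start: float = 0.0,
    time_range_duration: Optional[float] = None,
) -> list[dict[str, Any]]:
    """Return word entries whose start time falls inside the window (hypothetical helper)."""
    # Assumed validation rules; the real class may check different conditions.
    if time_range_start < 0:
        raise ValueError("time_range_start must be >= 0")
    if time_range_duration is not None and time_range_duration <= 0:
        raise ValueError("time_range_duration must be > 0 or None")

    # A missing key raises KeyError, in line with the documented failure mode.
    words = deepgram_json["results"]["channels"][0]["alternatives"][0]["words"]

    # None means no end limit.
    end = None if time_range_duration is None else time_range_start + time_range_duration
    return [
        w for w in words
        if w["start"] >= time_range_start and (end is None or w["start"] < end)
    ]


def format_words(words: list[dict[str, Any]], include_timestamps: bool = False) -> str:
    """Join words into one line, optionally prefixing each with its start time."""
    if include_timestamps:
        return " ".join(f"[{w['start']:.2f}] {w['word']}" for w in words)
    return " ".join(w["word"] for w in words)


sample = {
    "results": {"channels": [{"alternatives": [{"words": [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "there", "start": 0.5, "end": 0.9},
        {"word": "world", "start": 2.0, "end": 2.4},
    ]}]}]}
}

print(format_words(words_in_range(sample, 0.5, 2.0)))  # there world
```

With `time_range_duration=None` the window has no upper bound, so the same call with only a start time returns every word from that point onward.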

class plexus.input_sources.InputSource(pattern: str = None, **options)

Bases: ABC

Base class for input sources that extract text from various sources. Input sources run BEFORE processors in the pipeline.

Args:

pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options

__init__(pattern: str = None, **options)
Args:

pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options

abstractmethod extract(item, default_text: str) → str

Extract text from the specified source.

Args:

item: Item object (may have attachedFiles)
default_text: Fallback text from item.text

Returns:

Extracted text string

find_matching_attachment(item) → str | None

Find first attachment matching the regex pattern.

Args:

item: Item object with attachedFiles list

Returns:

S3 key path of matching attachment, or None
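The first-match lookup described above can be sketched as a standalone function. This is an illustrative stand-in, not the plexus implementation: it assumes `attachedFiles` is a list of S3 key strings, that the list may be missing or None, and that the pattern is applied with `re.search` (the real method may anchor the match differently).

```python
import re
from types import SimpleNamespace
from typing import Optional


def find_matching_attachment(item, pattern: str) -> Optional[str]:
    """Return the first attachment key matching `pattern`, or None (illustrative stand-in)."""
    compiled = re.compile(pattern)
    # Assumption: attachedFiles holds S3 key strings and may be absent or None.
    for key in getattr(item, "attachedFiles", None) or []:
        if compiled.search(key):
            return key
    return None


# Hypothetical item with three attachments.
item = SimpleNamespace(
    text="fallback text",
    attachedFiles=[
        "calls/123/audio.mp3",
        "calls/123/deepgram.json",
        "calls/123/notes.txt",
    ],
)

print(find_matching_attachment(item, r"deepgram.*\.json$"))  # calls/123/deepgram.json
print(find_matching_attachment(item, r"\.csv$"))             # None
```

Because only the first match is returned, the order of `attachedFiles` matters when a pattern is loose enough to match several keys.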

class plexus.input_sources.InputSourceFactory

Bases: object

Factory for creating input source instances from class names. Mirrors ProcessorFactory pattern for consistency.

static create_input_source(source_name: str, **options)

Create an input source instance dynamically.

Args:

source_name: Class name (e.g., “TextFileInputSource”)
**options: Configuration options passed to __init__

Returns:

Instantiated InputSource subclass
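A name-based factory of this kind can be sketched with a small registry. The classes below are empty stand-ins that only mirror the documented constructor signature, and both the registry lookup and the ValueError for an unknown name are assumptions; the real factory may resolve class names and report errors differently.

```python
class InputSource:
    """Stand-in base class; not the real plexus implementation."""

    def __init__(self, pattern=None, **options):
        self.pattern = pattern
        self.options = options


class TextFileInputSource(InputSource):
    pass


class DeepgramInputSource(TextFileInputSource):
    pass


# Assumption: a simple name-to-class registry built from the known subclasses.
_REGISTRY = {cls.__name__: cls for cls in (TextFileInputSource, DeepgramInputSource)}


def create_input_source(source_name: str, **options) -> InputSource:
    """Instantiate the class registered under `source_name` with the given options."""
    try:
        cls = _REGISTRY[source_name]
    except KeyError:
        # Assumed error type for an unknown class name.
        raise ValueError(f"Unknown input source: {source_name}") from None
    return cls(**options)


source = create_input_source(
    "DeepgramInputSource",
    pattern=r".*deepgram.*\.json$",
    format="utterances",
)
print(type(source).__name__, source.options)
```

Keyword arguments other than `pattern` land in `options`, which is how source-specific settings such as `format` would reach the instance in this sketch.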

class plexus.input_sources.TextFileInputSource(pattern: str = None, **options)

Bases: InputSource

Extracts raw text from a file attachment matching a pattern.

Args:

pattern: Regex pattern to match attachments (used by file-based sources)
**options: Additional source-specific options

extract(item, default_text: str) → str

Find and return text from matching attachment.

Args:

item: Item with attachedFiles
default_text: Not used (kept for interface compatibility)

Returns:

Text content from file

Raises:

ValueError: If no matching attachment found
Exception: If file download or parsing fails

Submodules