# Speech-to-Text (STT) Audio transcription extractors for converting spoken content into text. ```{toctree} :maxdepth: 1 :caption: Speech-to-Text Extractors openai deepgram aldea deepgram-transform ``` ## Overview Speech-to-text extractors transcribe audio from video and audio files. They are ideal for: - Podcast transcription - Lecture and presentation recordings - Interview transcripts - Video content with narration - Audio messages and recordings The raw audio bytes remain unchanged in the corpus; only transcribed text is stored in extraction results. ## Available Extractors ### [stt-openai](openai.md) OpenAI Whisper API for audio transcription: - **Model**: Whisper-1 (OpenAI hosted) - **Accuracy**: Excellent general-purpose accuracy - **Languages**: 50+ languages supported - **Features**: Automatic language detection, translation - **Formats**: MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM **Installation**: `pip install biblicus[openai]` **Best for**: General transcription, multi-language content, OpenAI ecosystem integration ### [stt-deepgram](deepgram.md) Deepgram Nova-3 for fast, accurate transcription: - **Model**: Nova-3 (default), Nova-2, other Deepgram models - **Accuracy**: Lower word error rate than Whisper - **Features**: Smart formatting, speaker diarization, filler word filtering - **Languages**: 30+ languages supported - **Formats**: Most audio formats **Installation**: `pip install biblicus[deepgram]` **Best for**: High-accuracy transcription, speaker diarization, professional content ### [stt-aldea](aldea.md) Aldea Speech-to-Text API for audio transcription: - **API**: REST pre-recorded audio (`POST /v1/listen`) - **Response**: Deepgram-compatible (channels, alternatives, transcript) - **Features**: Language hint, speaker diarization, word timestamps - **Formats**: MP3, AAC, FLAC, WAV, OGG, WebM, Opus, M4A **Installation**: `pip install biblicus[aldea]` **Best for**: Aldea-hosted transcription, Deepgram-compatible workflows ### [deepgram-transform](deepgram-transform.md) Render Deepgram structured metadata into text: - **Source**: transcript, utterances, or words - **Filters**: channel and speaker selection - **Labels**: optional channel/speaker prefixes **Best for**: Diarized filtering, channel selection, and structured transcript rendering ## Choosing an Extractor | Use Case | Recommended | Notes | |----------|-------------|-------| | General transcription | [stt-deepgram](deepgram.md) | Better accuracy, formatting | | Multi-language content | [stt-openai](openai.md) | More languages supported | | Speaker identification | [stt-deepgram](deepgram.md) | Has diarization feature | | Translation to English | [stt-openai](openai.md) | Built-in translation | | Cost-sensitive | [stt-deepgram](deepgram.md) | Competitive pricing | | OpenAI workflow | [stt-openai](openai.md) | Single API key | | Aldea / Deepgram-shaped API | [stt-aldea](aldea.md) | Aldea-hosted, same response shape as Deepgram | ## Performance Comparison ### OpenAI Whisper - **Accuracy**: Excellent (WER ~5-10%) - **Speed**: Moderate - **Languages**: 50+ - **Max file size**: 25 MB - **Pricing**: $0.006/minute ### Deepgram Nova-3 - **Accuracy**: Superior (WER ~3-7%) - **Speed**: Fast (real-time capable) - **Languages**: 30+ - **Max file size**: No limit - **Pricing**: Competitive (volume discounts) ## Common Patterns ### Fallback Chain Try Deepgram first, fall back to OpenAI: ```yaml extractor_id: select-text config: extractors: - stt-deepgram - stt-openai ``` ### Language-Specific Routing Route by media type or use overrides: ```yaml extractor_id: select-smart-override config: default_extractor: stt-deepgram overrides: - media_type_pattern: "audio/.*" extractor: stt-deepgram - media_type_pattern: "video/.*" extractor: stt-openai ``` ### Speaker Diarization Use Deepgram with diarization enabled: ```yaml extractor_id: stt-deepgram config: diarize: true smart_format: true ``` ## Authentication Both extractors require API keys: ### Environment Variables ```bash export OPENAI_API_KEY="your-openai-key" export DEEPGRAM_API_KEY="your-deepgram-key" ``` ### Configuration File Add to `~/.biblicus/config.yml`: ```yaml openai: api_key: YOUR_OPENAI_KEY deepgram: api_key: YOUR_DEEPGRAM_KEY aldea: api_key: org_YOUR_ALDEA_KEY ``` ## Supported Audio Formats Both extractors support common audio formats: - MP3 - MP4 (audio track) - MPEG - MPGA - M4A - WAV - WEBM - OGG - FLAC ## See Also - [Extractors Overview](../index.md) - [stt-openai](openai.md) - OpenAI Whisper extractor details - [stt-deepgram](deepgram.md) - Deepgram Nova-3 extractor details - [Pipeline Utilities](../pipeline-utilities/index.md) - Combining extraction strategies