Deepgram Speech-to-Text Extractor

Extractor ID: stt-deepgram

Overview

The Deepgram speech-to-text extractor transcribes audio files with Deepgram’s neural-network-based speech recognition API. It provides fast, accurate transcription with lower word error rates than traditional ASR systems, plus features such as speaker diarization and smart formatting.

Deepgram’s Nova-3 model delivers state-of-the-art accuracy and performs well across diverse audio conditions. The API is optimized for speed and scale, which makes it well suited to processing large corpora.

Installation

Install Biblicus with the deepgram extra, which includes the Deepgram Python SDK:

pip install "biblicus[deepgram]"

You’ll also need a Deepgram API key. See API Configuration below for where to set it.

Supported Media Types

audio/mpeg - MP3 audio
audio/mp4 - M4A audio
audio/wav - WAV audio
audio/webm - WebM audio
audio/flac - FLAC audio
audio/ogg - OGG audio
audio/* - Any audio format supported by Deepgram

Only audio items are processed. Other media types are automatically skipped.
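The skip rule can be pictured as a prefix check on each item’s media type. This is a minimal sketch that assumes the extractor keys off the audio/ prefix; the helper name should_transcribe is hypothetical and not part of Biblicus.

```python
def should_transcribe(media_type: str) -> bool:
    """Return True for audio media types such as "audio/mpeg" (hypothetical helper)."""
    # Drop parameters like "; codecs=opus" and normalize case before comparing.
    base_type = media_type.split(";", 1)[0].strip().lower()
    return base_type.startswith("audio/")


for media_type in ["audio/mpeg", "audio/webm; codecs=opus", "text/plain", "application/pdf"]:
    print(media_type, "->", "transcribe" if should_transcribe(media_type) else "skip")
```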

Configuration

Config Schema

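from typing import Optional

from pydantic import BaseModel


class DeepgramSpeechToTextExtractorConfig(BaseModel):
    model: str = "nova-3"  # Deepgram model name
    language: Optional[str] = None  # optional language hint, e.g. "en" or "es"
    punctuate: bool = True  # add punctuation
    smart_format: bool = True  # format numbers, dates, currency, etc.
    diarize: bool = False  # label speakers in the detailed response
    filler_words: bool = False  # keep "um", "uh", etc.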

Configuration Options

Option	Type	Default	Description
`model`	str	`nova-3`	Deepgram model: `nova-3`, `nova-2`, `base`, `enhanced`
`language`	str or null	`null`	Language code hint (e.g., `en`, `es`, `fr`)
`punctuate`	bool	`true`	Add punctuation to transcript
`smart_format`	bool	`true`	Apply smart formatting (numbers, dates, etc.)
`diarize`	bool	`false`	Enable speaker diarization
`filler_words`	bool	`false`	Include filler words (um, uh, etc.)
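Each option corresponds to a query parameter of Deepgram’s pre-recorded transcription endpoint (POST https://api.deepgram.com/v1/listen, authenticated with an "Authorization: Token YOUR_KEY" header and the audio bytes as the request body). Biblicus talks to Deepgram through the Python SDK, so the exact request it sends may differ; this sketch only shows how the options above line up with Deepgram’s parameters. The function build_listen_url is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Deepgram's pre-recorded transcription endpoint.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(
    model: str = "nova-3",
    language: Optional[str] = None,
    punctuate: bool = True,
    smart_format: bool = True,
    diarize: bool = False,
    filler_words: bool = False,
) -> str:
    """Map the extractor options onto Deepgram query parameters (illustrative sketch)."""
    params = {
        "model": model,
        "punctuate": str(punctuate).lower(),
        "smart_format": str(smart_format).lower(),
        "diarize": str(diarize).lower(),
        "filler_words": str(filler_words).lower(),
    }
    # Only send a language hint when one is configured.
    if language is not None:
        params["language"] = language
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


print(build_listen_url(diarize=True, language="es"))
```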

Model Options

nova-3 (default): Latest model, best accuracy, lowest WER
nova-2: Previous generation, good accuracy
base: Basic model, faster, lower accuracy
enhanced: Enhanced accuracy for challenging audio

Usage

Command Line

Basic Usage

# Configure API key
export DEEPGRAM_API_KEY="your-key-here"

# Extract audio transcripts
biblicus extract my-corpus --extractor stt-deepgram

Custom Configuration

# Enable speaker diarization
biblicus extract my-corpus --extractor stt-deepgram \
  --config diarize=true

# Transcribe Spanish audio
biblicus extract my-corpus --extractor stt-deepgram \
  --config language=es

# Disable smart formatting
biblicus extract my-corpus --extractor stt-deepgram \
  --config smart_format=false
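Conceptually, each --config key=value flag overrides one default from the schema, with "true" and "false" becoming booleans. The sketch below models that merge; how the Biblicus CLI actually parses these flags is not described here, so the helper apply_overrides and its strict true/false coercion are assumptions for illustration.

```python
from typing import Any, Dict, Iterable

# Defaults copied from the config schema above.
DEFAULTS: Dict[str, Any] = {
    "model": "nova-3",
    "language": None,
    "punctuate": True,
    "smart_format": True,
    "diarize": False,
    "filler_words": False,
}


def apply_overrides(overrides: Iterable[str]) -> Dict[str, Any]:
    """Merge "key=value" strings into the defaults, coercing booleans (hypothetical)."""
    config = dict(DEFAULTS)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"Unrecognized override: {item!r}")
        if isinstance(DEFAULTS[key], bool):
            lowered = raw.strip().lower()
            if lowered not in ("true", "false"):
                raise ValueError(f"{key} expects true or false, got {raw!r}")
            config[key] = lowered == "true"
        else:
            config[key] = raw
    return config


# Equivalent to: --config diarize=true --config language=es
print(apply_overrides(["diarize=true", "language=es"]))
```

Repeating --config, as in the call center example later on, simply applies several overrides in order.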

Configuration File

Save the following as configuration.yml:

extractor_id: stt-deepgram
config:
  model: nova-3
  punctuate: true
  smart_format: true
  diarize: false
  filler_words: false

biblicus extract my-corpus --configuration configuration.yml

Python API

from biblicus import Corpus

# Load corpus
corpus = Corpus.from_directory("my-corpus")

# Extract with defaults
results = corpus.extract_text(extractor_id="stt-deepgram")

# Extract with speaker diarization
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"diarize": True}
)

# Extract with language hint
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={
        "language": "es",
        "model": "nova-3"
    }
)

In Pipeline

Audio Processing

extractor_id: pipeline
config:
  stages:
    - extractor_id: pass-through-text
    - extractor_id: stt-deepgram
    - extractor_id: select-text

Media Type Routing

extractor_id: select-smart-override
config:
  default_extractor: pass-through-text
  overrides:
    - media_type_pattern: "audio/.*"
      extractor: stt-deepgram
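The routing rule can be sketched as: try each override pattern against the item’s media type and fall back to the default extractor. This assumes the pattern must match the whole media type and that the first matching override wins; select-smart-override may behave differently in detail, and the route function is hypothetical.

```python
import re
from typing import List, Tuple

# (pattern, extractor) pairs mirroring the `overrides` list above.
OVERRIDES: List[Tuple[str, str]] = [("audio/.*", "stt-deepgram")]
DEFAULT_EXTRACTOR = "pass-through-text"


def route(media_type: str) -> str:
    """Return the extractor for a media type; the first matching override wins (assumed)."""
    for pattern, extractor in OVERRIDES:
        # Assumes the pattern must match the entire media type string.
        if re.fullmatch(pattern, media_type):
            return extractor
    return DEFAULT_EXTRACTOR


for media_type in ["audio/mpeg", "text/markdown", "image/png"]:
    print(f"{media_type}: {route(media_type)}")
```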

Examples

Podcast Transcription

Transcribe podcast episodes with smart formatting:

export DEEPGRAM_API_KEY="your-key"
biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true

Multi-Speaker Audio

Enable speaker diarization for interviews or meetings:

biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true

Multilingual Content

Transcribe Spanish audio:

from biblicus import Corpus

corpus = Corpus.from_directory("spanish-audio")

results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"language": "es"}
)

Include Filler Words

Preserve filler words for linguistic analysis:

biblicus extract interviews --extractor stt-deepgram \
  --config filler_words=true

API Configuration

Environment Variable

export DEEPGRAM_API_KEY="your-api-key-here"

User Config File

Add to ~/.biblicus/config.yml:

deepgram:
  api_key: YOUR_API_KEY_HERE

Local Config File

Add to .biblicus/config.yml in your project:

deepgram:
  api_key: YOUR_API_KEY_HERE
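These three sources can be thought of as a lookup chain. This document does not say which source wins when several are set, so the order below (environment variable, then project config, then user config) is an assumption. The function works on already-parsed YAML dictionaries so it needs no YAML library; resolve_api_key is a hypothetical name.

```python
import os
from typing import Any, Mapping, Optional


def resolve_api_key(
    environ: Mapping[str, str],
    local_config: Optional[Mapping[str, Any]] = None,
    user_config: Optional[Mapping[str, Any]] = None,
) -> Optional[str]:
    """Return the first Deepgram API key found, or None (precedence is assumed)."""
    if environ.get("DEEPGRAM_API_KEY"):
        return environ["DEEPGRAM_API_KEY"]
    # Parsed YAML documents shaped like {"deepgram": {"api_key": "..."}}.
    for config in (local_config, user_config):
        section = (config or {}).get("deepgram") or {}
        key = section.get("api_key")
        if key:
            return key
    return None


key = resolve_api_key(os.environ, local_config={"deepgram": {"api_key": "local-key"}})
print("API key found" if key else "No Deepgram API key configured")
```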

Language Support

Deepgram supports 30+ languages including:

English (en)
Spanish (es)
French (fr)
German (de)
Italian (it)
Portuguese (pt)
Dutch (nl)
Russian (ru)
Chinese (zh)
Japanese (ja)
Korean (ko)
Hindi (hi)

And many more; language availability varies by model. See the Deepgram documentation for the full list.

Smart Formatting

With smart_format: true, Deepgram automatically formats:

Numbers: “one hundred” → “100”
Dates: “january first” → “January 1st”
Times: “three thirty pm” → “3:30 PM”
Currency: “fifty dollars” → “$50”
Addresses: Street numbers and names
Phone numbers: Digit sequences

Example:

Input audio: "Call me at five five five one two three four"
Output: "Call me at 555-1234"

Speaker Diarization

With diarize: true, Deepgram labels each part of the transcript with a speaker number:

Speaker 0: Hello, how are you?
Speaker 1: I'm doing well, thanks for asking.
Speaker 0: Great to hear!

Note: Deepgram returns these speaker labels in its detailed response, but the Biblicus extractor’s text output combines all speaker segments into a single transcript. To work with individual speakers, use the structured metadata described below.
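With diarization on, each entry in Deepgram’s results.channels[0].alternatives[0].words list carries a speaker number, so a speaker-labelled view like the one above can be rebuilt by grouping consecutive words. This is a sketch, not Biblicus code: speaker_lines is a hypothetical helper, and the sample response is hand-written and trimmed to the fields used (real responses also include start, end and confidence for each word).

```python
from typing import Any, Dict, List, Optional


def speaker_lines(response: Dict[str, Any]) -> List[str]:
    """Group consecutive words by speaker into "Speaker N: ..." lines."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    lines: List[str] = []
    current_speaker: Optional[int] = None
    current_words: List[str] = []
    for word in words:
        # Prefer the punctuated form when Deepgram provides it.
        text = word.get("punctuated_word", word["word"])
        if word.get("speaker") != current_speaker and current_words:
            lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_words = []
        current_speaker = word.get("speaker")
        current_words.append(text)
    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return lines


sample = {"results": {"channels": [{"alternatives": [{"words": [
    {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
    {"word": "how", "punctuated_word": "how", "speaker": 0},
    {"word": "are", "punctuated_word": "are", "speaker": 0},
    {"word": "you", "punctuated_word": "you?", "speaker": 0},
    {"word": "great", "punctuated_word": "Great,", "speaker": 1},
    {"word": "thanks", "punctuated_word": "thanks.", "speaker": 1},
]}]}]}}

for line in speaker_lines(sample):
    print(line)
```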

Structured Metadata

Biblicus stores the full Deepgram response payload as structured metadata on the extraction stage. This lets downstream stages transform the transcript using Deepgram’s words or utterances representations (for example, to filter by speaker or channel).

To render a specific representation, add the deepgram-transform stage after stt-deepgram:

extractor_id: pipeline
config:
  stages:
    - extractor_id: stt-deepgram
      config:
        diarize: true
    - extractor_id: deepgram-transform
      config:
        source: utterances
        speakers: [0]
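The source: utterances plus speakers: [0] configuration amounts to keeping only the utterances spoken by speaker 0. The sketch below approximates that over Deepgram’s results.utterances list (each utterance has speaker and transcript fields). It is not deepgram-transform’s implementation: the function name render_utterances and the choice to join utterances with newlines are assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional


def render_utterances(
    response: Dict[str, Any], speakers: Optional[Iterable[int]] = None
) -> str:
    """Join utterance transcripts, keeping only the listed speakers (all if None)."""
    wanted = set(speakers) if speakers is not None else None
    kept: List[str] = []
    for utterance in response["results"].get("utterances", []):
        if wanted is None or utterance.get("speaker") in wanted:
            kept.append(utterance["transcript"])
    return "\n".join(kept)


sample = {"results": {"utterances": [
    {"speaker": 0, "transcript": "Hello, how are you?"},
    {"speaker": 1, "transcript": "I'm doing well, thanks for asking."},
    {"speaker": 0, "transcript": "Great to hear!"},
]}}

print(render_utterances(sample, speakers=[0]))
```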

Performance

Speed: Fast (about 0.05x real time for Nova-3, i.e. roughly 3 seconds of processing per minute of audio)
Accuracy: Excellent (lower WER than Whisper for English)
Word Error Rate: ~8-10% for Nova-3 on clean audio
Cost: Per-minute API pricing (check Deepgram pricing)

Error Handling

Missing Dependency

If the Deepgram SDK is not installed:

ExtractionRunFatalError: Deepgram speech to text extractor requires an optional dependency.
Install it with pip install "biblicus[deepgram]".

Missing API Key

If no API key is configured:

ExtractionRunFatalError: Deepgram speech to text extractor requires a Deepgram API key.
Set DEEPGRAM_API_KEY or configure it in ~/.biblicus/config.yml or ./.biblicus/config.yml under deepgram.api_key.
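Both fatal errors can be caught before starting a long run with a quick pre-flight check. This sketch assumes the Deepgram SDK is importable as deepgram (the import name of the deepgram-sdk package) and checks only the environment variable, not the config files; preflight_problems is a hypothetical helper.

```python
import importlib.util
import os
from typing import List


def preflight_problems() -> List[str]:
    """Return human-readable setup problems; an empty list means ready to run."""
    problems: List[str] = []
    # The deepgram-sdk package is imported as `deepgram`.
    if importlib.util.find_spec("deepgram") is None:
        problems.append('Deepgram SDK missing: pip install "biblicus[deepgram]"')
    # Only the environment variable is checked; config files are not read here.
    if not os.environ.get("DEEPGRAM_API_KEY"):
        problems.append("DEEPGRAM_API_KEY is not set")
    return problems


for problem in preflight_problems():
    print("Setup problem:", problem)
```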

Non-Audio Items

Non-audio items are skipped silently: the extractor returns None for them.

API Errors

API errors (rate limits, invalid audio, etc.) are recorded as per-item errors but don’t halt extraction.
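The difference between fatal and per-item errors can be sketched as a loop that re-raises configuration problems but records everything else against the item and moves on. The names FatalError, RunResult, run_extraction and fake_transcribe are hypothetical stand-ins, not Biblicus classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


class FatalError(Exception):
    """Stand-in for run-level problems, such as a missing SDK or API key."""


@dataclass
class RunResult:
    texts: Dict[str, str] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)
    skipped: List[str] = field(default_factory=list)


def run_extraction(
    item_ids: Iterable[str], transcribe: Callable[[str], Optional[str]]
) -> RunResult:
    """Transcribe each item, recording per-item failures instead of stopping."""
    result = RunResult()
    for item_id in item_ids:
        try:
            text = transcribe(item_id)
        except FatalError:
            raise  # configuration problems still halt the whole run
        except Exception as error:  # e.g. a rate limit or invalid audio
            result.errors[item_id] = str(error)
            continue
        if text is None:
            result.skipped.append(item_id)  # non-audio items return None
        else:
            result.texts[item_id] = text
    return result


def fake_transcribe(item_id: str) -> Optional[str]:
    """Simulated extractor used only for this demo."""
    if item_id.endswith(".txt"):
        return None
    if item_id == "corrupt.mp3":
        raise RuntimeError("invalid audio")
    return f"transcript of {item_id}"


result = run_extraction(["episode1.mp3", "corrupt.mp3", "notes.txt"], fake_transcribe)
print(result.texts, result.errors, result.skipped)
```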

Use Cases

Podcast Archives

Transcribe podcast episodes for search:

biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true

Meeting Recordings

Create searchable meeting transcripts with speaker identification:

biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true

Call Center Audio

Process customer service calls:

biblicus extract calls --extractor stt-deepgram \
  --config model=nova-3 \
  --config diarize=true

Lecture Capture

Transcribe educational content with smart formatting:

biblicus extract lectures --extractor stt-deepgram \
  --config smart_format=true \
  --config punctuate=true

When to Use Deepgram vs OpenAI

Use Deepgram when:

You need the fastest processing speed
Speaker diarization is required
Lower word error rate for English is critical
Smart formatting is desired
Processing large volumes

Use OpenAI Whisper when:

You need broader language support
Audio quality varies significantly
You prefer the OpenAI ecosystem
Your content spans many different languages

Comparison

Feature	Deepgram	OpenAI Whisper
Speed	⭐⭐⭐⭐⭐	⭐⭐⭐
English WER	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Languages	30+	50+
Diarization	✅	❌
Smart Formatting	✅	❌
Filler Words	✅	❌

Best Practices

Use Nova-3 for Best Results

Nova-3 provides the lowest word error rate:

config:
  model: nova-3

Enable Smart Formatting

Make transcripts more readable:

config:
  smart_format: true
  punctuate: true

Use Diarization for Multi-Speaker Audio

Identify speakers in meetings and interviews:

config:
  diarize: true

Provide Language Hints

When you know the language, specify it:

config:
  language: en

Monitor API Usage

Deepgram charges per minute of audio, so keep track of how many items each run sends to the API:

print(f"Processed items: {results.stats.processed_items}")
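Because pricing is per minute, a rough spend estimate is total audio minutes times the per-minute rate. In this sketch the rate of 0.01 is a hypothetical placeholder, not Deepgram’s price; substitute the current figure from Deepgram’s pricing page. The estimate ignores any rounding or minimums in actual billing, and estimate_cost is a hypothetical helper.

```python
from typing import Iterable


def estimate_cost(durations_seconds: Iterable[float], price_per_minute: float) -> float:
    """Estimate spend as total audio minutes times a per-minute rate you supply."""
    total_minutes = sum(durations_seconds) / 60.0
    return total_minutes * price_per_minute


durations = [1800.0, 2400.0, 3600.0]  # three audio items, in seconds
HYPOTHETICAL_RATE = 0.01  # placeholder, not Deepgram's actual price

print(f"Audio minutes: {sum(durations) / 60:.1f}")
print(f"Estimated cost: ${estimate_cost(durations, HYPOTHETICAL_RATE):.2f}")
```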

Advanced Features

Filler Words

Include or exclude filler words:

config:
  filler_words: true  # Include "um", "uh", etc.

Custom Model Selection

Choose a model based on your needs:

# Best accuracy
config:
  model: nova-3

# Faster processing
config:
  model: base