Deepgram Speech-to-Text Extractor

Extractor ID: stt-deepgram

Category: Speech-to-Text Extractors

Overview

The Deepgram speech-to-text extractor transcribes audio files through Deepgram’s neural network-based API. It provides fast, accurate transcription, supports speaker diarization and smart formatting, and achieves lower word error rates than traditional ASR systems.

Deepgram’s Nova-3 model delivers state-of-the-art accuracy and performs well across diverse audio conditions. The API is optimized for speed and scale, which makes it well suited to processing large corpora.

Installation

Install Biblicus with the Deepgram extra, which pulls in the Deepgram Python SDK:

pip install "biblicus[deepgram]"

You’ll also need a Deepgram API key. See API Configuration below for where to set it.

Supported Media Types

  • audio/mpeg - MP3 audio

  • audio/mp4 - M4A audio

  • audio/wav - WAV audio

  • audio/webm - WebM audio

  • audio/flac - FLAC audio

  • audio/ogg - OGG audio

  • audio/* - Any audio format supported by Deepgram

Only audio items are processed. Other media types are automatically skipped.
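The skip rule can be pictured as a prefix check on each item's media type. The following is a minimal sketch, not the Biblicus implementation; the helper name is_audio_media_type is hypothetical.

```python
def is_audio_media_type(media_type: str) -> bool:
    """Return True for any audio/* media type (hypothetical helper)."""
    # Media types are case-insensitive and may carry parameters
    # such as "; codecs=opus", so normalize before checking the prefix.
    base = media_type.split(";", 1)[0].strip().lower()
    return base.startswith("audio/")


items = ["audio/mpeg", "audio/ogg; codecs=opus", "text/plain", "video/mp4"]
processed = [m for m in items if is_audio_media_type(m)]
print(processed)  # ['audio/mpeg', 'audio/ogg; codecs=opus']
```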

Configuration

Config Schema

class DeepgramSpeechToTextExtractorConfig(BaseModel):
    model: str = "nova-3"
    language: Optional[str] = None
    punctuate: bool = True
    smart_format: bool = True
    diarize: bool = False
    filler_words: bool = False

Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| model | str | nova-3 | Deepgram model: nova-3, nova-2, base, enhanced |
| language | str or null | null | Language code hint (e.g., en, es, fr) |
| punctuate | bool | true | Add punctuation to transcript |
| smart_format | bool | true | Apply smart formatting (numbers, dates, etc.) |
| diarize | bool | false | Enable speaker diarization |
| filler_words | bool | false | Include filler words (um, uh, etc.) |

Model Options

  • nova-3 (default): Latest model, best accuracy, lowest WER

  • nova-2: Previous generation, good accuracy

  • base: Basic model, faster, lower accuracy

  • enhanced: Enhanced accuracy for challenging audio
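To see how overrides combine with the documented defaults, here is a minimal sketch that mirrors the schema with a standard-library dataclass. It is a stand-in for illustration, not the real DeepgramSpeechToTextExtractorConfig; the names DeepgramConfigSketch and build_config are hypothetical, and rejecting unknown model names is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Model names taken from the Model Options list.
KNOWN_MODELS = {"nova-3", "nova-2", "base", "enhanced"}


@dataclass(frozen=True)
class DeepgramConfigSketch:
    """Stand-in with the same fields and defaults as the config schema."""

    model: str = "nova-3"
    language: Optional[str] = None
    punctuate: bool = True
    smart_format: bool = True
    diarize: bool = False
    filler_words: bool = False

    def __post_init__(self) -> None:
        # Assumption: this sketch rejects model names outside the documented list.
        if self.model not in KNOWN_MODELS:
            raise ValueError(f"unknown model: {self.model!r}")


def build_config(overrides: dict) -> dict:
    """Merge user overrides onto the documented defaults."""
    return asdict(DeepgramConfigSketch(**overrides))


print(build_config({"diarize": True, "language": "es"}))
```

Any option you leave out keeps its default, so an empty override dictionary yields the default configuration.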

Usage

Command Line

Basic Usage

# Configure API key
export DEEPGRAM_API_KEY="your-key-here"

# Extract audio transcripts
biblicus extract my-corpus --extractor stt-deepgram

Custom Configuration

# Enable speaker diarization
biblicus extract my-corpus --extractor stt-deepgram \
  --config diarize=true

# Transcribe Spanish audio
biblicus extract my-corpus --extractor stt-deepgram \
  --config language=es

# Disable smart formatting
biblicus extract my-corpus --extractor stt-deepgram \
  --config smart_format=false
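Each --config key=value pair corresponds to one entry in the config dictionary used by the Python API. The sketch below shows one way such pairs could be turned into typed values. How Biblicus actually parses these flags is not covered here, so treat parse_config_pairs as a hypothetical helper and its coercion rules as assumptions.

```python
from typing import Dict, List, Union

ConfigValue = Union[str, bool, None]


def parse_config_pairs(pairs: List[str]) -> Dict[str, ConfigValue]:
    """Turn ["diarize=true", "language=es"] into a typed config dict."""
    config: Dict[str, ConfigValue] = {}
    for pair in pairs:
        key, separator, raw = pair.partition("=")
        key = key.strip()
        if not separator or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        value = raw.strip()
        lowered = value.lower()
        if lowered in ("true", "false"):
            # Assumption: boolean options are written as true/false.
            config[key] = lowered == "true"
        elif lowered in ("null", "none", ""):
            config[key] = None
        else:
            config[key] = value
    return config


print(parse_config_pairs(["model=nova-3", "diarize=true"]))
# {'model': 'nova-3', 'diarize': True}
```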

Configuration File

extractor_id: stt-deepgram
config:
  model: nova-3
  punctuate: true
  smart_format: true
  diarize: false
  filler_words: false

Save this as configuration.yml and pass it to the extract command:

biblicus extract my-corpus --configuration configuration.yml

Python API

from biblicus import Corpus

# Load corpus
corpus = Corpus.from_directory("my-corpus")

# Extract with defaults
results = corpus.extract_text(extractor_id="stt-deepgram")

# Extract with speaker diarization
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"diarize": True}
)

# Extract with language hint
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={
        "language": "es",
        "model": "nova-3"
    }
)

In Pipeline

Audio Processing

extractor_id: pipeline
config:
  stages:
    - extractor_id: pass-through-text
    - extractor_id: stt-deepgram
    - extractor_id: select-text

Media Type Routing

extractor_id: select-smart-override
config:
  default_extractor: pass-through-text
  overrides:
    - media_type_pattern: "audio/.*"
      extractor: stt-deepgram
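The routing configuration pairs a regular expression with an extractor. The sketch below illustrates only the pattern-matching idea; it is not how select-smart-override is implemented. The choose_extractor helper is hypothetical, and requiring the pattern to match the whole media type is an assumption.

```python
import re

DEFAULT_EXTRACTOR = "pass-through-text"
# (media_type_pattern, extractor) pairs, copied from the configuration.
OVERRIDES = [("audio/.*", "stt-deepgram")]


def choose_extractor(media_type: str) -> str:
    """Return the first override whose pattern matches, else the default."""
    for pattern, extractor in OVERRIDES:
        # Assumption: the pattern must match the entire media type.
        if re.fullmatch(pattern, media_type):
            return extractor
    return DEFAULT_EXTRACTOR


print(choose_extractor("audio/wav"))      # stt-deepgram
print(choose_extractor("text/markdown"))  # pass-through-text
```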

Examples

Podcast Transcription

Transcribe podcast episodes with smart formatting:

export DEEPGRAM_API_KEY="your-key"
biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true

Multi-Speaker Audio

Enable speaker diarization for interviews or meetings:

biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true

Multilingual Content

Transcribe Spanish audio:

from biblicus import Corpus

corpus = Corpus.from_directory("spanish-audio")

results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"language": "es"}
)

Include Filler Words

Preserve filler words for linguistic analysis:

biblicus extract interviews --extractor stt-deepgram \
  --config filler_words=true

API Configuration

Environment Variable

export DEEPGRAM_API_KEY="your-api-key-here"

User Config File

Add to ~/.biblicus/config.yml:

deepgram:
  api_key: YOUR_API_KEY_HERE

Local Config File

Add to .biblicus/config.yml in your project:

deepgram:
  api_key: YOUR_API_KEY_HERE
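With three possible sources for the key, it helps to reason about which one wins. The precedence is not stated in this document, so the sketch below assumes one order (environment variable, then the project file, then the user file) purely for illustration. The resolve_api_key helper is hypothetical, and the config files are represented as already-parsed dictionaries.

```python
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str],
    local_config: Mapping[str, dict],
    user_config: Mapping[str, dict],
) -> Optional[str]:
    """Return the first non-empty key found, or None if none is configured."""
    # Assumed order: DEEPGRAM_API_KEY, ./.biblicus/config.yml, ~/.biblicus/config.yml.
    candidates = [
        env.get("DEEPGRAM_API_KEY"),
        (local_config.get("deepgram") or {}).get("api_key"),
        (user_config.get("deepgram") or {}).get("api_key"),
    ]
    for candidate in candidates:
        if candidate:
            return candidate
    return None


key = resolve_api_key(
    env={},
    local_config={"deepgram": {"api_key": "project-key"}},
    user_config={"deepgram": {"api_key": "user-key"}},
)
print(key)  # project-key
```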

Language Support

Deepgram supports 30+ languages including:

  • English (en)

  • Spanish (es)

  • French (fr)

  • German (de)

  • Italian (it)

  • Portuguese (pt)

  • Dutch (nl)

  • Russian (ru)

  • Chinese (zh)

  • Japanese (ja)

  • Korean (ko)

  • Hindi (hi)

And many more. See the Deepgram documentation for the full list.

Smart Formatting

With smart_format: true, Deepgram automatically formats:

  • Numbers: “one hundred” → “100”

  • Dates: “january first” → “January 1st”

  • Times: “three thirty pm” → “3:30 PM”

  • Currency: “fifty dollars” → “$50”

  • Addresses: Street numbers and names

  • Phone numbers: Digit sequences

Example:

Input audio: "Call me at five five five one two three four"
Output: "Call me at 555-1234"

Speaker Diarization

With diarize: true, Deepgram identifies different speakers:

Speaker 0: Hello, how are you?
Speaker 1: I'm doing well, thanks for asking.
Speaker 0: Great to hear!

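Note: Deepgram’s transcription API returns speaker labels only in the detailed response. The Biblicus extractor combines all speaker segments into a single transcript, so the extracted text does not carry the Speaker N prefixes shown above. Use the structured metadata described below to work with speakers.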
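Speaker-labelled lines like the ones above can be rebuilt from per-word speaker labels. The following sketch groups consecutive words by speaker. The word list is hand-written and trimmed to the two fields the sketch uses (speaker and punctuated_word, as found on Deepgram word objects when diarization and punctuation are enabled); the speaker_turns helper is hypothetical.

```python
from itertools import groupby

# Hand-written sample: (speaker, punctuated_word) pairs.
sample = [
    (0, "Hello,"), (0, "how"), (0, "are"), (0, "you?"),
    (1, "I'm"), (1, "doing"), (1, "well,"), (1, "thanks"), (1, "for"), (1, "asking."),
    (0, "Great"), (0, "to"), (0, "hear!"),
]
words = [{"speaker": s, "punctuated_word": w} for s, w in sample]


def speaker_turns(words: list) -> list:
    """Merge consecutive words from the same speaker into one line per turn."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        text = " ".join(w["punctuated_word"] for w in group)
        turns.append(f"Speaker {speaker}: {text}")
    return turns


for line in speaker_turns(words):
    print(line)
```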

Structured Metadata

Biblicus stores the full Deepgram response payload as structured metadata on the extraction stage. This lets downstream stages transform the transcript using Deepgram’s words or utterances representations (for example, to filter by speaker or channel).

To render a specific representation, add the deepgram-transform stage after stt-deepgram:

extractor_id: pipeline
config:
  stages:
    - extractor_id: stt-deepgram
      config:
        diarize: true
    - extractor_id: deepgram-transform
      config:
        source: utterances
        speakers: [0]

Performance

  • Speed: Fast (processing time is roughly 0.05x the audio duration for Nova-3, or about 3 seconds per minute of audio)

  • Accuracy: Excellent (lower WER than Whisper for English)

  • Word Error Rate: ~8-10% for Nova-3 on clean audio

  • Cost: Per-minute API pricing (check Deepgram pricing)
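Word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. If you want to measure it on your own audio, a minimal sketch using word-level edit distance follows. The word_error_rate helper is hypothetical, and lowercasing plus whitespace splitting is a simplifying assumption; real evaluations usually normalize punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "call me at five five five one two three four"
hypothesis = "call me at five five nine one two three four"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}")  # WER: 10%
```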

Error Handling

Missing Dependency

If the Deepgram SDK is not installed:

ExtractionRunFatalError: Deepgram speech to text extractor requires an optional dependency.
Install it with pip install "biblicus[deepgram]".

Missing API Key

If the API key is not configured:

ExtractionRunFatalError: Deepgram speech to text extractor requires a Deepgram API key.
Set DEEPGRAM_API_KEY or configure it in ~/.biblicus/config.yml or ./.biblicus/config.yml under deepgram.api_key.

Non-Audio Items

Non-audio items are silently skipped (the extractor returns None for them).

API Errors

API errors (rate limits, invalid audio, etc.) are recorded as per-item errors but don’t halt extraction.
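The difference between fatal errors and per-item errors can be sketched as a loop that records failures and keeps going. This is an illustration of the pattern only, not Biblicus code. The run_extraction and fake_transcribe names are hypothetical, and the failing item is simulated.

```python
from typing import Callable, Dict, Iterable, Tuple


def run_extraction(
    items: Iterable[Tuple[str, str]],
    transcribe: Callable[[str], str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Transcribe audio items, recording per-item errors instead of stopping."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for item_id, media_type in items:
        if not media_type.startswith("audio/"):
            continue  # non-audio items are skipped without recording an error
        try:
            texts[item_id] = transcribe(item_id)
        except Exception as exc:  # one bad item must not halt the run
            errors[item_id] = f"{type(exc).__name__}: {exc}"
    return texts, errors


def fake_transcribe(item_id: str) -> str:
    """Simulated API call that fails for one item."""
    if item_id == "b.mp3":
        raise RuntimeError("rate limit exceeded")
    return f"transcript of {item_id}"


texts, errors = run_extraction(
    [("a.mp3", "audio/mpeg"), ("b.mp3", "audio/mpeg"), ("notes.txt", "text/plain")],
    fake_transcribe,
)
print(texts)
print(errors)
```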

Use Cases

Podcast Archives

Transcribe podcast episodes for search:

biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true

Meeting Recordings

Create searchable meeting transcripts with speaker identification:

biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true

Call Center Audio

Process customer service calls:

biblicus extract calls --extractor stt-deepgram \
  --config model=nova-3 \
  --config diarize=true

Lecture Capture

Transcribe educational content with smart formatting:

biblicus extract lectures --extractor stt-deepgram \
  --config smart_format=true \
  --config punctuate=true

When to Use Deepgram vs OpenAI

Use Deepgram when:

  • You need the fastest processing speed

  • Speaker diarization is required

  • Lower word error rate for English is critical

  • Smart formatting is desired

  • Processing large volumes

Use OpenAI Whisper when:

  • You need broader language support

  • Audio quality varies significantly

  • You prefer the OpenAI ecosystem

  • Multilingual content is diverse

Comparison

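| Feature | Deepgram | OpenAI Whisper |
|---------|----------|----------------|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| English WER | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Languages | 30+ | 50+ |
| Diarization | Yes | |
| Smart Formatting | Yes | |
| Filler Words | Yes | |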

Best Practices

Use Nova-3 for Best Results

Nova-3 provides the lowest word error rate:

config:
  model: nova-3

Enable Smart Formatting

Make transcripts more readable:

config:
  smart_format: true
  punctuate: true

Use Diarization for Multi-Speaker Audio

Identify speakers in meetings and interviews:

config:
  diarize: true

Provide Language Hints

When you know the language, specify it:

config:
  language: en

Monitor API Usage

Track how many items were processed as a rough proxy for API cost:

print(f"Processed items: {results.stats.processed_items}")
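Because pricing is per minute of audio, an item count alone does not tell you what a run cost. If you know each item's duration, a rough estimate is straightforward. The sketch below assumes billing is linear in audio minutes with no rounding or minimums; the estimate_cost helper is hypothetical, and the rate used is a placeholder, not Deepgram's actual price.

```python
from typing import Iterable


def estimate_cost(durations_seconds: Iterable[float], price_per_minute: float) -> float:
    """Estimate cost as total audio minutes times a per-minute rate."""
    total_minutes = sum(durations_seconds) / 60.0
    return total_minutes * price_per_minute


# Durations of three hypothetical audio items, in seconds.
durations = [120.0, 600.0, 60.0]
PLACEHOLDER_RATE = 0.5  # placeholder value; check Deepgram pricing for the real rate

print(f"Estimated cost: {estimate_cost(durations, PLACEHOLDER_RATE):.2f}")
```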

Advanced Features

Filler Words

Include or exclude filler words:

config:
  filler_words: true  # Include "um", "uh", etc.

Custom Model Selection

Choose model based on needs:

# Best accuracy
config:
  model: nova-3

# Faster processing
config:
  model: base

See Also