# Deepgram Speech-to-Text Extractor

**Extractor ID:** `stt-deepgram`

**Category:** [Speech-to-Text Extractors](index.md)

## Overview

The Deepgram speech-to-text extractor uses Deepgram's neural network-based API to transcribe audio files. It provides fast, accurate transcription with advanced features like speaker diarization, smart formatting, and lower word error rates than traditional ASR systems.

Deepgram's Nova-3 model delivers state-of-the-art accuracy with excellent performance on diverse audio conditions. The API is optimized for speed and scale, making it ideal for large corpus processing.

## Installation

Install the Deepgram Python SDK:

```bash
pip install "biblicus[deepgram]"
```

You'll also need a Deepgram API key.

## Supported Media Types

- `audio/mpeg` - MP3 audio
- `audio/mp4` - M4A audio
- `audio/wav` - WAV audio
- `audio/webm` - WebM audio
- `audio/flac` - FLAC audio
- `audio/ogg` - OGG audio
- `audio/*` - Any audio format supported by Deepgram

Only audio items are processed. Other media types are automatically skipped.

## Configuration

### Config Schema

```python
class DeepgramSpeechToTextExtractorConfig(BaseModel):
    model: str = "nova-3"
    language: Optional[str] = None
    punctuate: bool = True
    smart_format: bool = True
    diarize: bool = False
    filler_words: bool = False
```

### Configuration Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `model` | str | `nova-3` | Deepgram model: `nova-3`, `nova-2`, `base`, `enhanced` |
| `language` | str or null | `null` | Language code hint (e.g., `en`, `es`, `fr`) |
| `punctuate` | bool | `true` | Add punctuation to transcript |
| `smart_format` | bool | `true` | Apply smart formatting (numbers, dates, etc.) |
| `diarize` | bool | `false` | Enable speaker diarization |
| `filler_words` | bool | `false` | Include filler words (um, uh, etc.) |

### Model Options

- **nova-3** (default): Latest model, best accuracy, lowest WER
- **nova-2**: Previous generation, good accuracy
- **base**: Basic model, faster, lower accuracy
- **enhanced**: Enhanced accuracy for challenging audio

## Usage

### Command Line

#### Basic Usage

```bash
# Configure API key
export DEEPGRAM_API_KEY="your-key-here"

# Extract audio transcripts
biblicus extract my-corpus --extractor stt-deepgram
```

#### Custom Configuration

```bash
# Enable speaker diarization
biblicus extract my-corpus --extractor stt-deepgram \
  --config diarize=true

# Transcribe Spanish audio
biblicus extract my-corpus --extractor stt-deepgram \
  --config language=es

# Disable smart formatting
biblicus extract my-corpus --extractor stt-deepgram \
  --config smart_format=false
```

#### Configuration File

```yaml
extractor_id: stt-deepgram
config:
  model: nova-3
  punctuate: true
  smart_format: true
  diarize: false
  filler_words: false
```

```bash
biblicus extract my-corpus --configuration configuration.yml
```

### Python API

```python
from biblicus import Corpus

# Load corpus
corpus = Corpus.from_directory("my-corpus")

# Extract with defaults
results = corpus.extract_text(extractor_id="stt-deepgram")

# Extract with speaker diarization
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"diarize": True}
)

# Extract with language hint
results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={
        "language": "es",
        "model": "nova-3"
    }
)
```

### In Pipeline

#### Audio Processing

```yaml
extractor_id: pipeline
config:
  stages:
    - extractor_id: pass-through-text
    - extractor_id: stt-deepgram
    - extractor_id: select-text
```

#### Media Type Routing

```yaml
extractor_id: select-smart-override
config:
  default_extractor: pass-through-text
  overrides:
    - media_type_pattern: "audio/.*"
      extractor: stt-deepgram
```

## Examples

### Podcast Transcription

Transcribe podcast episodes with smart formatting:

```bash
export DEEPGRAM_API_KEY="your-key"
biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true
```

### Multi-Speaker Audio

Enable speaker diarization for interviews or meetings:

```bash
biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true
```

### Multilingual Content

Transcribe Spanish audio:

```python
from biblicus import Corpus

corpus = Corpus.from_directory("spanish-audio")

results = corpus.extract_text(
    extractor_id="stt-deepgram",
    config={"language": "es"}
)
```

### Include Filler Words

Preserve filler words for linguistic analysis:

```bash
biblicus extract interviews --extractor stt-deepgram \
  --config filler_words=true
```

## API Configuration

### Environment Variable

```bash
export DEEPGRAM_API_KEY="your-api-key-here"
```

### User Config File

Add to `~/.biblicus/config.yml`:

```yaml
deepgram:
  api_key: YOUR_API_KEY_HERE
```

### Local Config File

Add to `.biblicus/config.yml` in your project:

```yaml
deepgram:
  api_key: YOUR_API_KEY_HERE
```

## Language Support

Deepgram supports 30+ languages including:

- English (`en`)
- Spanish (`es`)
- French (`fr`)
- German (`de`)
- Italian (`it`)
- Portuguese (`pt`)
- Dutch (`nl`)
- Russian (`ru`)
- Chinese (`zh`)
- Japanese (`ja`)
- Korean (`ko`)
- Hindi (`hi`)

And many more. See Deepgram documentation for the full list.

## Smart Formatting

With `smart_format: true`, Deepgram automatically formats:

- **Numbers**: "one hundred" → "100"
- **Dates**: "january first" → "January 1st"
- **Times**: "three thirty pm" → "3:30 PM"
- **Currency**: "fifty dollars" → "$50"
- **Addresses**: Street numbers and names
- **Phone numbers**: Digit sequences

Example:

```
Input audio: "Call me at five five five one two three four"
Output: "Call me at 555-1234"
```

## Speaker Diarization

With `diarize: true`, Deepgram identifies different speakers:

```
Speaker 0: Hello, how are you?
Speaker 1: I'm doing well, thanks for asking.
Speaker 0: Great to hear!
```

Note: Deepgram's transcription API returns speaker labels in the detailed response. The Biblicus extractor combines all speaker segments into a single transcript.

## Structured Metadata

Biblicus stores the full Deepgram response payload as structured metadata on the extraction stage.
This lets downstream stages transform the transcript using Deepgram's `words` or `utterances`
representations (for example, to filter by speaker or channel).

To render a specific representation, add the `deepgram-transform` stage after `stt-deepgram`:

```yaml
extractor_id: pipeline
config:
  stages:
    - extractor_id: stt-deepgram
      config:
        diarize: true
    - extractor_id: deepgram-transform
      config:
        source: utterances
        speakers: [0]
```

## Performance

- **Speed**: Fast (~0.05x realtime for Nova-3)
- **Accuracy**: Excellent (lower WER than Whisper for English)
- **Word Error Rate**: ~8-10% for Nova-3 on clean audio
- **Cost**: Per-minute API pricing (check Deepgram pricing)

## Error Handling

### Missing Dependency

If Deepgram SDK is not installed:

```
ExtractionRunFatalError: Deepgram speech to text extractor requires an optional dependency.
Install it with pip install "biblicus[deepgram]".
```

### Missing API Key

If API key is not configured:

```
ExtractionRunFatalError: Deepgram speech to text extractor requires a Deepgram API key.
Set DEEPGRAM_API_KEY or configure it in ~/.biblicus/config.yml or ./.biblicus/config.yml under deepgram.api_key.
```

### Non-Audio Items

Non-audio items are silently skipped (returns `None`).

### API Errors

API errors (rate limits, invalid audio, etc.) are recorded as per-item errors but don't halt extraction.

## Use Cases

### Podcast Archives

Transcribe podcast episodes for search:

```bash
biblicus extract podcasts --extractor stt-deepgram \
  --config smart_format=true
```

### Meeting Recordings

Create searchable meeting transcripts with speaker identification:

```bash
biblicus extract meetings --extractor stt-deepgram \
  --config diarize=true
```

### Call Center Audio

Process customer service calls:

```bash
biblicus extract calls --extractor stt-deepgram \
  --config model=nova-3 \
  --config diarize=true
```

### Lecture Capture

Transcribe educational content with smart formatting:

```bash
biblicus extract lectures --extractor stt-deepgram \
  --config smart_format=true \
  --config punctuate=true
```

## When to Use Deepgram vs OpenAI

### Use Deepgram when:
- You need fastest processing speed
- Speaker diarization is required
- Lower word error rate for English is critical
- Smart formatting is desired
- Processing large volumes

### Use OpenAI Whisper when:
- You need broader language support
- Audio quality varies significantly
- You prefer OpenAI ecosystem
- Multilingual content is diverse

### Comparison

| Feature | Deepgram | OpenAI Whisper |
|---------|----------|----------------|
| Speed | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| English WER | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Languages | 30+ | 50+ |
| Diarization | ✅ | ❌ |
| Smart Formatting | ✅ | ❌ |
| Filler Words | ✅ | ❌ |

## Best Practices

### Use Nova-3 for Best Results

Nova-3 provides the lowest word error rate:

```yaml
config:
  model: nova-3
```

### Enable Smart Formatting

Make transcripts more readable:

```yaml
config:
  smart_format: true
  punctuate: true
```

### Use Diarization for Multi-Speaker Audio

Identify speakers in meetings and interviews:

```yaml
config:
  diarize: true
```

### Provide Language Hints

When you know the language, specify it:

```yaml
config:
  language: en
```

### Monitor API Usage

Track API costs:

```python
print(f"Processed items: {results.stats.processed_items}")
```

## Advanced Features

### Filler Words

Include or exclude filler words:

```yaml
config:
  filler_words: true  # Include "um", "uh", etc.
```

### Custom Model Selection

Choose model based on needs:

```yaml
# Best accuracy
config:
  model: nova-3

# Faster processing
config:
  model: base
```

## Related Extractors

### Same Category

- [stt-openai](openai.md) - OpenAI Whisper speech-to-text

### Alternatives

- [stt-openai](openai.md) - More languages, different accuracy profile
- [pass-through-text](../text-document/pass-through.md) - Direct text files
- [metadata-text](../text-document/metadata.md) - Metadata-based text

### Pipeline Utilities

- [select-text](../pipeline-utilities/select-text.md) - First non-empty selection
- [select-longest-text](../pipeline-utilities/select-longest.md) - Choose longest output
- [select-smart-override](../pipeline-utilities/select-smart-override.md) - Media type routing
- [pipeline](../pipeline-utilities/pipeline.md) - Multi-step extraction

## See Also

- [Speech-to-Text Extractors Overview](index.md)
- [Extractors Index](../index.md)
- [extraction.md](../../extraction.md) - Extraction pipeline concepts
- [User Configuration](../../user-configuration.md)
- [Deepgram API Documentation](https://developers.deepgram.com/)