Deepgram Transform Extractor
Extractor ID: deepgram-transform
Category: Speech-to-Text
Overview
The Deepgram transform extractor renders text from the structured metadata produced by
stt-deepgram. Use it when you want to choose a specific Deepgram representation
(transcript, utterances, or words) or filter by channel or speaker.
This extractor does not call Deepgram. It only transforms metadata from a prior pipeline stage.
Installation
No additional dependencies required. The transform runs on existing Deepgram metadata.
Configuration
class DeepgramTranscriptTransformConfig(BaseModel):
source: str = "transcript" # transcript | utterances | words
channels: Optional[List[int]] = None
speakers: Optional[List[int]] = None
join_with: str = " "
include_channel_labels: bool = False
include_speaker_labels: bool = False
Configuration Options
Option |
Type |
Required |
Description |
|---|---|---|---|
|
str |
✅ |
|
|
list[int] |
❌ |
Channel indices to include (default: all) |
|
list[int] |
❌ |
Speaker indices to include (default: all) |
|
str |
❌ |
Separator when joining utterances or words |
|
bool |
❌ |
Prefix output with |
|
bool |
❌ |
Prefix output with |
Usage
Pipeline Example
extractor_id: pipeline
config:
stages:
- extractor_id: stt-deepgram
config:
diarize: true
- extractor_id: deepgram-transform
config:
source: utterances
speakers: [0]
Command Line
biblicus extract build --corpus corpora/Alfa \
--stage "stt-deepgram:diarize=true" \
--stage "deepgram-transform:source=utterances,speakers=[0]"
Notes
deepgram-transformrequires a prior stage that provides Deepgram metadata, typicallystt-deepgram.When
source=words, the extractor joins word entries withjoin_with.When
source=utterances, each utterance transcript is joined withjoin_with.