transcriber
Utility module for transcribing audio clips or files.
It uses the whisper model for real-time transcription of short audio clips and the pyannote model for diarization. Diarization is not yet integrated with the frontend API, but this is on the roadmap.
What is real time?
The STT module originally included methods for processing incoming audio streams sent over a websocket to produce live transcription results. This was abandoned in favor of doing the audio processing on the frontend, for two reasons. First, setting up websockets for remote connections is unnecessarily complicated. Second, doing the processing in the frontend makes it easy to expose parameters for controlling speech sensitivity, which vary with environment and microphone.
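A speech-sensitivity parameter of the kind mentioned above is often a threshold on signal energy. The following is a minimal sketch, assuming a simple root-mean-square (RMS) gate over frames of samples in [-1.0, 1.0]. The functions `rms` and `is_speech` and the 0.02 default are hypothetical and do not describe the actual frontend code.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: Sequence[float], threshold: float = 0.02) -> bool:
    """Hypothetical gate: treat a frame as speech when its RMS exceeds `threshold`.

    `threshold` acts as the "speech sensitivity" knob. A noisy room or a hot
    microphone raises the background RMS, so a value tuned for a quiet room
    can trigger on noise elsewhere.
    """
    return rms(frame) > threshold


quiet_room_noise = [0.005, -0.004, 0.006, -0.005]
speech = [0.3, -0.25, 0.28, -0.31]
print(is_speech(quiet_room_noise))  # False
print(is_speech(speech))            # True
```

Because the right threshold depends on the room and the microphone, exposing it in the frontend lets each user tune it locally.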
Transcriber
Wraps the whisper module with pyannote diarization and microphone recording.
When transcribing a clip, it returns only the text; when transcribing a file, it can optionally return diarized speech. When diarizing, it saves the diarized text to a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_size | str | Size of the whisper model to use. Defaults to "medium". | 'medium' |
| save_dir | str | Location to save the transcribed audio. Defaults to None. | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| stt | obj | Exposes the transcribe_clip and transcribe_file methods. |
| diarizer | obj | Exposes the diarize_file method. |
| common_hallucinations | list[str] | List of common hallucinations produced by the whisper STT model. |
Source code in backend/app/utils/stt/transcriber.py
diarize_file(file_name)
Wrapper to diarize a file.
Yes. I know I am probably playing fast and loose with grammar here.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_name | str | Path to the file. | required |
Returns:
| Type | Description |
|---|---|
| list[dict] | List of speaker and time segments. |
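pyannote's `Annotation.itertracks(yield_label=True)` yields `(segment, track, label)` triples, where each segment has `.start` and `.end` in seconds. A minimal sketch of flattening such output into the list of dicts described above follows. The helper name `tracks_to_segments` and the keys `"speaker"`, `"start"`, and `"end"` are assumptions; the module's real schema may differ. `FakeSegment` is a stand-in so the example runs without pyannote installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable, Tuple


def tracks_to_segments(tracks: Iterable[Tuple[Any, Any, str]]) -> list[dict]:
    """Flatten (segment, track, label) triples into speaker/time dicts.

    Hypothetical helper: `tracks` should look like the output of pyannote's
    `Annotation.itertracks(yield_label=True)`. The dict keys are assumed.
    """
    return [
        {"speaker": label, "start": segment.start, "end": segment.end}
        for segment, _track, label in tracks
    ]


@dataclass
class FakeSegment:
    """Stand-in for pyannote.core.Segment, used only for this example."""
    start: float
    end: float


demo = [
    (FakeSegment(0.0, 2.5), "A", "SPEAKER_00"),
    (FakeSegment(2.5, 4.0), "B", "SPEAKER_01"),
]
print(tracks_to_segments(demo))
# [{'speaker': 'SPEAKER_00', 'start': 0.0, 'end': 2.5}, {'speaker': 'SPEAKER_01', 'start': 2.5, 'end': 4.0}]
```

With a real pipeline, the argument would be `diarization.itertracks(yield_label=True)`, where `diarization` is the annotation returned by calling the pyannote pipeline on a file.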
save_csv(diarization, filename='diarization.csv')
Save a diarization to a CSV file.
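A minimal sketch of what saving a diarization to CSV can look like with the standard `csv` module follows. It assumes each segment is a dict with `"speaker"`, `"start"`, and `"end"` keys. That schema is hypothetical; the real method may write different columns.

```python
import csv
import os
import tempfile
from typing import Iterable


def save_csv(diarization: Iterable[dict], filename: str = "diarization.csv") -> None:
    """Write one CSV row per speaker segment (assumed keys: speaker, start, end)."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" skips any keys beyond the three listed columns
        writer = csv.DictWriter(f, fieldnames=["speaker", "start", "end"], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(diarization)


# Usage: write two segments to a temporary directory instead of the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "diarization.csv")
save_csv(
    [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
        {"speaker": "SPEAKER_01", "start": 2.5, "end": 4.0},
    ],
    out_path,
)
```

Opening the file with `newline=""` is what the `csv` module documentation recommends, so the writer controls line endings itself.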
transcribe_clip(audio_clip)
Transcribe an audio segment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_clip | AudioSegment | Bytes read from a file containing speech. | required |
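whisper's `transcribe` accepts either a file path or a NumPy array of 16 kHz mono float32 samples. Raw clip bytes therefore have to be normalized before transcription. The following is a minimal sketch, assuming the clip is 16-bit signed little-endian mono PCM already at 16 kHz. With pydub, `clip.set_frame_rate(16000).set_channels(1).set_sample_width(2).raw_data` yields bytes in that layout. The helper `pcm16_to_float` is hypothetical.

```python
import struct
from typing import List


def pcm16_to_float(raw: bytes) -> List[float]:
    """Convert 16-bit signed little-endian mono PCM bytes to floats in [-1.0, 1.0).

    Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0).
    """
    if len(raw) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    return [s / 32768.0 for s in samples]


raw = struct.pack("<3h", 0, 16384, -32768)
print(pcm16_to_float(raw))  # [0.0, 0.5, -1.0]
```

Wrapping the result with `numpy.asarray(..., dtype=numpy.float32)` gives an array in the form whisper's `transcribe` expects.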
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The transcribed text; returns "" if the audio was a hallucination. |
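The return value above depends on checking the output against `common_hallucinations`. A minimal sketch follows, assuming an exact match after normalizing case, whitespace, and surrounding punctuation. The real method may use a different comparison, such as substring matching. The two phrases in `hallucinations` are illustrative only and are not the module's actual list.

```python
import string
from typing import Iterable


def normalize(text: str) -> str:
    """Lower-case and strip surrounding whitespace and punctuation."""
    return text.strip().strip(string.punctuation + " ").lower()


def filter_hallucination(text: str, common_hallucinations: Iterable[str]) -> str:
    """Return "" if `text` matches a known hallucination, otherwise `text` unchanged."""
    known = {normalize(h) for h in common_hallucinations}
    return "" if normalize(text) in known else text


# Illustrative list; whisper is known to emit phrases like these on silence.
hallucinations = ["Thank you.", "Thanks for watching!"]
print(repr(filter_hallucination(" Thanks for watching! ", hallucinations)))  # ''
print(filter_hallucination("Turn left at the light.", hallucinations))     # Turn left at the light.
```

Building a set of normalized phrases keeps each lookup constant-time, and the set can be cached if the list is large.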
transcribe_file(file_name, diarize=False)
Transcribe a file, diarize if specified.
Transcribes a complete audio file and, if diarize is True, saves the diarized speech.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_name | str | Path to the audio file. | required |
| diarize | bool | Diarize the audio. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Transcription of the audio. |
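When `diarize` is True, transcribed text has to be matched to speaker turns somehow. One common approach assigns each whisper segment to the speaker whose turn overlaps it the most. It is sketched below as an assumption; the module's own method may differ. whisper's `transcribe` result includes a `"segments"` list whose entries carry `"start"`, `"end"`, and `"text"`. The speaker-turn schema and the names `overlap` and `assign_speakers` are hypothetical.

```python
from typing import Dict, List, Optional


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(text_segments: List[Dict], speaker_segments: List[Dict]) -> List[Dict]:
    """Label each transcribed segment with the speaker whose turn overlaps it most.

    text_segments: dicts with "start", "end", "text" (as in whisper's "segments").
    speaker_segments: dicts with "speaker", "start", "end" (hypothetical schema).
    A segment that overlaps no turn gets speaker None.
    """
    labelled = []
    for seg in text_segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in speaker_segments:
            o = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if o > best_overlap:
                best_speaker, best_overlap = turn["speaker"], o
        labelled.append({"speaker": best_speaker, "start": seg["start"],
                         "end": seg["end"], "text": seg["text"]})
    return labelled


text = [{"start": 0.0, "end": 2.0, "text": " Hello there."},
        {"start": 2.2, "end": 4.0, "text": " Hi!"}]
speakers = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 2.1},
            {"speaker": "SPEAKER_01", "start": 2.1, "end": 4.2}]
for row in assign_speakers(text, speakers):
    print(row["speaker"], row["text"].strip())
# SPEAKER_00 Hello there.
# SPEAKER_01 Hi!
```

This nested loop is quadratic. That is acceptable for typical meeting-length files, but a sweep over both sorted lists would scale better.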
main()
Run the transcriber on a WAV file.