Automatic Speech Recognition (as a service)
1 - Getting Started
Introduction
This guide will get you started with Agile Content's Automatic Speech Recognition service. It shows how WebVTT subtitles are generated live from an audio clip ingested with FFmpeg.
Prerequisites
Before following these steps, make sure you have the following:
- FFmpeg (or another means of ingesting WAV audio over SRT to the ASR service)
- An audio clip in a format decodable by FFmpeg
- curl or another HTTP tool to use the API
- An account for the Automatic Speech Recognition Service (contact sales@agilecontent.com for info)
You will need the following account information:
- URL in the form <your-id>.asr.agilecontent.com
- username
- password
- Audio ingest IP
- SRT secret
Replace <your-id>, <username>, <password>, <ingest-ip> and <srt-secret> with those values in the examples below.
Validate API Access
Validate you can access the service API.
$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels
You should get a response similar to this
HTTP/2 200
date: Thu, 08 Aug 2024 09:05:13 GMT
content-type: text/plain; charset=utf-8
content-length: 3
access-control-allow-origin: *
x-request-id: ip-10-0-0-138.eu-north-1.compute.internal/nxlyokd96h-000003
[]
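The same check can be scripted. The sketch below uses only the Python standard library to build the authenticated request that the curl command sends. The helper name `build_request` and the example host are illustrative assumptions, not part of the service.

```python
import base64
import urllib.request


def build_request(base_url, username, password, path="/api/v1/channels", method="GET"):
    """Build an HTTP Basic Auth request, equivalent to `curl -u user:pass`."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    req.add_header("Authorization", f"Basic {token}")
    return req


if __name__ == "__main__":
    # Placeholder values; substitute your own account details.
    req = build_request("https://example.invalid", "user", "pass")
    print(req.get_method(), req.full_url)
    # To actually send it (requires a real account):
    # with urllib.request.urlopen(req) as res:
    #     print(res.status, res.read().decode("utf-8"))
```

An empty list (`[]`) in the response body means the account is reachable and has no channels yet.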
Create a channel
To set up a channel, first prepare a file mychannel.json with the following content, replacing the language with the language code of your audio clip
{
  "id": "mychannel",
  "name": "My Channel",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ]
}
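If you create channels for several languages, generating the file can be less error-prone than editing it by hand. A minimal sketch, assuming only the fields shown above; `make_channel_config` is a hypothetical helper, not part of the service.

```python
import json


def make_channel_config(channel_id, name, language, port=10000, outputs=("webvtt",)):
    """Return a channel configuration with the fields used in this guide."""
    return {
        "id": channel_id,
        "name": name,
        "input": {"type": "srt", "port": port},
        "language": language,
        "outputs": list(outputs),
    }


if __name__ == "__main__":
    # Print the JSON; redirect the output to mychannel.json to use it with curl.
    print(json.dumps(make_channel_config("mychannel", "My Channel", "en-US"), indent=2))
```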
Then apply the channel configuration to the API
$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json
You will receive a 200 OK response with a JSON payload similar to this
{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}
Ingest the Audio
Start ingesting audio from your audio file (named audio.ts here).
ffmpeg -re -i audio.ts -vn -ac 1 -acodec pcm_s16le -f wav -bitexact "srt://<ingest-ip>:10000?mode=caller&passphrase=<srt-secret>"
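To launch the ingest from a script, the same command can be assembled as an argument list. This is a sketch that mirrors the command above flag for flag; `build_ingest_command` is a hypothetical helper, and the IP and secret in the example are placeholders.

```python
import shlex


def build_ingest_command(audio_file, ingest_ip, srt_secret, port=10000):
    """Return the ffmpeg argument list used in this guide."""
    return [
        "ffmpeg",
        "-re",                   # read the input at its native rate (simulates a live feed)
        "-i", audio_file,
        "-vn",                   # drop any video stream
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-f", "wav",             # WAV container
        "-bitexact",
        f"srt://{ingest_ip}:{port}?mode=caller&passphrase={srt_secret}",
    ]


if __name__ == "__main__":
    cmd = build_ingest_command("audio.ts", "192.0.2.10", "example-secret")
    print(shlex.join(cmd))
    # To start the ingest: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with the `&` in the SRT URL.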
Check the Subtitles
First request an HLS subtitle variant manifest to see the available WebVTT files
$ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8
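The manifest is a plain-text HLS playlist in which every line that does not start with `#` is a segment URI. As a rough sketch, the segment list can be extracted with the standard library alone; the sample manifest and segment names below are made up for illustration, real names come from the service.

```python
from urllib.parse import urljoin


def list_segments(manifest_text, manifest_url):
    """Return absolute URIs for every segment line in an HLS media playlist."""
    uris = []
    for line in manifest_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; everything else is a URI.
        if line and not line.startswith("#"):
            uris.append(urljoin(manifest_url, line))
    return uris


if __name__ == "__main__":
    # Hypothetical manifest content.
    sample = "#EXTM3U\n#EXT-X-TARGETDURATION:4\n#EXTINF:4.0,\nseg-1.vtt\n#EXTINF:4.0,\nseg-2.vtt\n"
    base = "https://example.invalid/subtitles/mychannel/subtitles.m3u8"
    for uri in list_segments(sample, base):
        print(uri)
```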
Then request one of the WebVTT files listed in the manifest to see the generated subtitles
$ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/<replace with vtt file in manifest>.vtt
You will receive a WebVTT file, like
WEBVTT

00:00:47.717 --> 00:00:49.317
Nice subtitles you generated for me.

00:00:50.557 --> 00:00:51.117
No problem, it's been my pleasure.
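If you want to post-process the subtitles, each cue consists of a timing line containing `-->` followed by one or more text lines. The following is a minimal parsing sketch for files shaped like the example above, not a complete WebVTT parser; it ignores cue identifiers, NOTE blocks and styling.

```python
def to_seconds(timestamp):
    """Convert a WebVTT timestamp (hh:mm:ss.ttt or mm:ss.ttt) to seconds."""
    parts = timestamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_cues(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        if "-->" not in lines[i]:
            i += 1
            continue
        start, rest = lines[i].split("-->", 1)
        end = rest.split()[0]  # ignore any cue settings after the end time
        i += 1
        text = []
        # Cue text runs until a blank line or the next timing line.
        while i < len(lines) and lines[i].strip() and "-->" not in lines[i]:
            text.append(lines[i])
            i += 1
        cues.append((to_seconds(start.strip()), to_seconds(end), "\n".join(text)))
    return cues


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:47.717 --> 00:00:49.317\nNice subtitles you generated for me.\n\n"
        "00:00:50.557 --> 00:00:51.117\nNo problem, it's been my pleasure.\n"
    )
    for start, end, text in parse_cues(sample):
        print(f"{start:.3f} {end:.3f} {text}")
```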
You could also run the following Python code to progressively see all WebVTT files as they are generated
import sys
import time

import m3u8
import requests


def watch(uri):
    """Poll the HLS manifest and print each new WebVTT segment once."""
    seen_segments = set()
    while True:
        manifest = m3u8.load(uri)
        for s in manifest.segments:
            if s.absolute_uri not in seen_segments:
                print("#", s.absolute_uri)
                res = requests.get(s.absolute_uri)
                print(res.text)
                seen_segments.add(s.absolute_uri)
        # Wait half the target duration before polling again;
        # fall back to 1 second if it is missing or below 1.
        target = manifest.target_duration or 0
        if target >= 1:
            time.sleep(target / 2)
        else:
            time.sleep(1)


if __name__ == "__main__":
    watch(sys.argv[1])
Add it to a file show_subs.py and run
$ pip3 install m3u8 requests
$ python3 show_subs.py https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8
Adjust Subtitles
To get the current settings for mychannel, use
curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel
Put the JSON data in a file channel-change.json, adjust a setting such as chars_per_row, and apply the change with
curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel -XPUT -H "Content-Type: application/json" -d @channel-change.json
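The edit step between the GET and the PUT can also be done programmatically. A small sketch, assuming the payload has the shape shown in the channel creation response; `with_chars_per_row` is a hypothetical helper name.

```python
import copy
import json


def with_chars_per_row(channel, chars_per_row):
    """Return a copy of a channel configuration with a new chars_per_row value."""
    updated = copy.deepcopy(channel)
    updated.setdefault("segmentation", {})["chars_per_row"] = chars_per_row
    return updated


if __name__ == "__main__":
    # Payload as returned by GET /api/v1/channels/mychannel in this guide.
    current = {
        "id": "mychannel",
        "name": "My Channel",
        "enabled": True,
        "engine": "google",
        "input": {"type": "srt", "port": 10000},
        "language": "en-US",
        "outputs": ["webvtt"],
        "segmentation": {"rows": 2, "chars_per_row": 40, "progressive": False},
    }
    # Print the changed configuration; save it as channel-change.json for the PUT.
    print(json.dumps(with_chars_per_row(current, 32), indent=2))
```

Working on a deep copy keeps the fetched configuration untouched, which makes it easy to compare before and after.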
Cleanup channel
To delete the channel mychannel after you're done, use
$ curl -i -u "<username>:<password>" -XDELETE https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel
2 - ASR Engine Configuration
Introduction
This section will show you how to configure the Automatic Speech Recognition (ASR) engine and language settings for a channel.
There are two ASR engines available for the ASR service:
- google (Google Cloud Speech-to-Text)
- speechmatics (Speechmatics)
Configure ASR Engine and Language
Add the selected engine to the channel configuration
{
  "id": "mychannel",
  "name": "My Channel",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ],
  "engine": "google"
}
If the engine is not specified, the default engine is google.
Google Cloud Speech-to-Text
The language code for the Google ASR engine is in the form ll-CC, where ll is the language code and CC is the country code. For example, en-US is English (United States) and es-ES is Spanish (Spain).
Speechmatics
Speechmatics uses the short language code ll. For example, ja is Japanese and es is Spanish.
The only exceptions are English and Mandarin Chinese, where the expected output must be specified as one of the following: en-GB, en-US, en-AU, cmn-Hans (Simplified), cmn-Hant (Traditional).
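The two code formats are easy to mix up, so a rough pre-flight check before posting a channel can help. The sketch below encodes only the rules stated in this section; it does not know which languages each engine actually supports, and `looks_valid` is a hypothetical helper.

```python
import re

# Codes that Speechmatics expects in long form, as listed in this section.
SPEECHMATICS_LONG_CODES = {"en-GB", "en-US", "en-AU", "cmn-Hans", "cmn-Hant"}


def looks_valid(engine, language):
    """Check the language code format against the rules described above."""
    if engine == "google":
        # ll-CC, for example en-US or es-ES
        return re.fullmatch(r"[a-z]{2}-[A-Z]{2}", language) is not None
    if engine == "speechmatics":
        if language in SPEECHMATICS_LONG_CODES:
            return True
        # Short code ll, except English, which needs one of the long forms.
        return re.fullmatch(r"[a-z]{2}", language) is not None and language != "en"
    raise ValueError(f"unknown engine: {engine}")


if __name__ == "__main__":
    for engine, code in [("google", "es-ES"), ("speechmatics", "es"), ("speechmatics", "es-ES")]:
        print(engine, code, looks_valid(engine, code))
```

A passing check only means the format matches; the API remains the authority on whether a language is accepted.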
Once the desired engine and language have been selected, apply the channel configuration to the API
$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json
You will receive a 200 OK response with a JSON payload similar to this
{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}
3 - Integrate ASR output with Cavena STU
Introduction
This section will show you how to use Cavena STU as an output in the Automatic Speech Recognition (ASR) service.
There are two modes of integration with Cavena STU.
Mode 1: If the STU Subtitle Input Protocol of your STU is available on a public IP, the ASR service can connect directly to the STU.
Mode 2: If it is not, you need to install a STU agent alongside the STU. The agent connects to the ASR service and proxies the subtitle traffic.
Setup ASR service to connect to a STU on a public IP
You need to configure the public IP and port where the ASR service can connect to the STU Subtitle Input Protocol.
These settings are available in our settings API, which contains system-wide settings that are shared by all channels.
Access the current settings with
$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/settings
The output has a JSON data structure similar to this
{
  "srt_passphrase": "<SRT Passphrase>",
  "subtitle_filename": "subtitle.vtt",
  "subtitle_rotation_interval": 4,
  "stu_settings": {
    "hostname": "<Public IP of the STU>",
    "port": 4621,
    "subtitle_type": 0,
    "user_id": "test-0",
    "username": "test",
    "offset": 0,
    "remove_accents": false,
    "server_mode": false
  }
}
Put the JSON data in a file settings.json and update the following
- Set stu_settings.hostname to the public IP of your STU
- Update stu_settings.port if you use a non-default port
- Update stu_settings.server_mode to false
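The three changes listed above can be applied to the fetched settings in one step. A minimal sketch, assuming the settings payload has the structure shown earlier; `with_stu_target` is a hypothetical helper and the IP address is a documentation placeholder.

```python
import copy
import json


def with_stu_target(settings, hostname, port=None):
    """Return a copy of the settings pointing at a STU on a public IP."""
    updated = copy.deepcopy(settings)
    stu = updated.setdefault("stu_settings", {})
    stu["hostname"] = hostname
    if port is not None:  # keep the existing port unless a new one is given
        stu["port"] = port
    stu["server_mode"] = False  # the ASR service connects out to the STU
    return updated


if __name__ == "__main__":
    # Subset of the settings payload shown above.
    current = {
        "subtitle_filename": "subtitle.vtt",
        "stu_settings": {"hostname": "", "port": 4621, "server_mode": True},
    }
    # Print the updated settings; save them as settings.json for the PUT.
    print(json.dumps(with_stu_target(current, "203.0.113.20"), indent=2))
```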
Now apply the new settings with the following command
$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/settings -XPUT -H "Content-Type: application/json" -d @settings.json
Continue to Create a channel further down to create a channel with a STU output.
Setup ASR service to listen for connections from a STU
Please contact your Agile Content representative for access and instructions on how to set up this mode.
Create a channel
To set up a channel, first prepare a file mychannel.json with the following content, replacing the language with the language code of your audio clip
{
  "id": "mychannel",
  "name": "My Channel",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "stu_ip"
  ]
}
Then apply the channel configuration to the API
$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json
You will receive a 200 OK response with a JSON payload similar to this
{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "stu_ip"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}
See the Getting Started guide for guidance on how to ingest audio and validate generated subtitles.
4 - API Reference
The Swagger API reference for your account is available at https://<your-id>.asr.agilecontent.com/swagger/index.html. Replace <your-id> in the URL so that it matches the URL given with your account. To interact directly with the API through the Swagger interface, you also need to provide your account username and password when prompted.