Automatic Speech Recognition (as a service)

Subtitling for live content from the audio track

1: Getting Started

2: ASR Engine Configuration

3: Integrate ASR output with Cavena STU

4: Use with StreamBuilder

5: API Reference

1 - Getting Started

Guide to getting started with the Automatic Speech Recognition (ASR) service

Introduction

This guide gets you started with Agile Content's Automatic Speech Recognition service. It shows how WebVTT subtitles are generated live from an audio clip ingested with FFmpeg.

Prerequisites

Before following these steps, make sure you have the following

FFmpeg (or another means of ingesting WAV audio over SRT to the ASR service)
An audio clip in a format decodable by FFmpeg
Curl or another HTTP tool to use the API
An account for the Automatic Speech Recognition Service (contact sales@agilecontent.com for info)

You will need the following account information

URL in the form <your-id>.asr.agilecontent.com
username
password
Audio ingest IP
SRT secret

Replace <your-id>, <username>, <password>, <ingest-ip> and <srt-secret> with those values in the examples below

Validate API Access

Validate you can access the service API.

$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels

You should get a response similar to this

HTTP/2 200
date: Thu, 08 Aug 2024 09:05:13 GMT
content-type: text/plain; charset=utf-8
content-length: 3
access-control-allow-origin: *
x-request-id: ip-10-0-0-138.eu-north-1.compute.internal/nxlyokd96h-000003

[]
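
The same check can be scripted. The following is a minimal Python sketch using only the standard library. It assumes the API accepts HTTP Basic authentication, which is what curl -u sends. The helper names build_request and list_channels are illustrative and not part of the service.

```python
import base64
import urllib.request


def build_request(base_url, path, username, password, method="GET", body=None):
    """Build (but do not send) an HTTP Basic authenticated request.

    base_url is assumed to look like https://<your-id>.asr.agilecontent.com.
    body, if given, must be JSON encoded as bytes.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method=method
    )


def list_channels(base_url, username, password):
    """Send GET /api/v1/channels and return (status code, response text)."""
    req = build_request(base_url, "/api/v1/channels", username, password)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8")
```

On a new account, list_channels would be expected to return status 200 and the text [] as in the response shown above. The same build_request helper can carry the POST, PUT and DELETE calls used later in this guide by passing method and body.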

Create a channel

To set up a channel, first prepare a file mychannel.json with the following content, replacing the language with the language code of your audio clip

{
  "id": "mychannel",
  "name": "My Channel",
  "input":{
    "type":"srt",
    "port":10000
  },
  "language":"en-US",
  "outputs":[
    "webvtt"
  ]
}

Then apply the channel configuration to the API

$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json

You will receive a 200 OK response with a JSON payload similar to this

{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}
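
If you create several channels, the request body can be generated rather than edited by hand. The sketch below builds the same fields as mychannel.json above. The function names are hypothetical, and the default port and output simply mirror this example.

```python
import json


def channel_config(channel_id, name, language, port=10000, outputs=("webvtt",)):
    """Return a channel definition with the fields used in mychannel.json."""
    return {
        "id": channel_id,
        "name": name,
        "input": {"type": "srt", "port": port},
        "language": language,
        "outputs": list(outputs),
    }


def channel_json(config):
    """Serialize a channel definition as an indented JSON document."""
    return json.dumps(config, indent=2) + "\n"
```

Writing channel_json(channel_config("mychannel", "My Channel", "en-US")) to mychannel.json gives a file that can be posted with the curl command above.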

Ingest the Audio

Start ingesting audio from your audio file (named audio.ts here).

$ ffmpeg -re -i audio.ts -vn -ac 1 -acodec pcm_s16le -f wav -bitexact "srt://<ingest-ip>:10000?mode=caller&passphrase=<srt-secret>"
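
To start the ingest from a script, the FFmpeg invocation above can be assembled as an argument list. This is a sketch only: the function names are illustrative, and the SRT secret is inserted into the URL unchanged, so it assumes the secret contains no characters such as & that would break the query string.

```python
import shlex
import subprocess


def ffmpeg_ingest_command(audio_file, ingest_ip, srt_secret, port=10000):
    """Return the FFmpeg argument list that sends mono 16-bit PCM WAV over SRT."""
    srt_url = f"srt://{ingest_ip}:{port}?mode=caller&passphrase={srt_secret}"
    return [
        "ffmpeg",
        "-re",                    # read the input at its native rate, like a live source
        "-i", audio_file,
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to one audio channel
        "-acodec", "pcm_s16le",
        "-f", "wav",
        "-bitexact",
        srt_url,
    ]


def run_ingest(audio_file, ingest_ip, srt_secret, port=10000):
    """Print the command and run FFmpeg until the clip ends."""
    cmd = ffmpeg_ingest_command(audio_file, ingest_ip, srt_secret, port)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```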

Check the Subtitles

First, request the HLS subtitle variant manifest to see the available WebVTT files

$ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8

Then request one of the WebVTT files listed in the manifest to see the generated subtitles

$ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/<replace with vtt file in manifest>.vtt

You will receive a WebVTT file, like

WEBVTT

00:00:47.717 --> 00:00:49.317
Nice subtitles you generated for me.

00:00:50.557 --> 00:00:51.117
No problem, it's been my pleasure.
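
If you want to process the cues rather than just read them, a small parser is enough for files shaped like the one above. The sketch below assumes timestamps in the hh:mm:ss.mmm form shown here and ignores other WebVTT features such as cue settings, cue identifiers and NOTE blocks. The function names are illustrative.

```python
import re

# Matches a cue timing line such as "00:00:47.717 --> 00:00:49.317".
CUE_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def to_seconds(timestamp):
    """Convert an hh:mm:ss.mmm timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_cues(vtt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines; blocks without a timing line are skipped.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.match(line.strip())
            if match:
                text = "\n".join(lines[i + 1:])
                cues.append((to_seconds(match.group(1)), to_seconds(match.group(2)), text))
                break
    return cues
```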

You can also run the following Python code to print each WebVTT file as it is generated

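import sys
import time

import m3u8
import requests


def watch(uri):
    """Poll the subtitle manifest and print each new WebVTT segment once."""
    seen_segments = set()
    while True:
        # Reload the manifest to pick up newly published segments.
        manifest = m3u8.load(uri)
        for s in manifest.segments:
            if s.absolute_uri not in seen_segments:
                print("#", s.absolute_uri)
                res = requests.get(s.absolute_uri, timeout=10)
                print(res.text)

                seen_segments.add(s.absolute_uri)

        # Sleep for half the target duration; fall back to one second when the
        # target duration is missing (None) or below one second.
        if manifest.target_duration and manifest.target_duration >= 1:
            time.sleep(manifest.target_duration / 2)
        else:
            time.sleep(1)


if __name__ == "__main__":
    watch(sys.argv[1])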

Add it to a file show_subs.py and run

$ pip3 install m3u8 requests
$ python3 show_subs.py https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8

Adjust Subtitles

To get the current settings for mychannel use

$ curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel

Save the returned JSON in a file channel-change.json, adjust a setting such as chars_per_row, and apply the change with

$ curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel -XPUT -H "Content-Type: application/json" -d @channel-change.json
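
The edit step in between can also be done in code. The following sketch changes segmentation values on a channel definition that has already been fetched and parsed into a dict. The helper name is hypothetical, and the set of allowed keys is taken only from the segmentation object shown in the responses in this guide.

```python
import copy

# Segmentation keys that appear in the channel responses in this guide.
SEGMENTATION_KEYS = {"rows", "chars_per_row", "progressive"}


def with_segmentation(channel, **changes):
    """Return a copy of the channel with the given segmentation values changed."""
    unknown = set(changes) - SEGMENTATION_KEYS
    if unknown:
        raise ValueError(f"unknown segmentation keys: {sorted(unknown)}")
    updated = copy.deepcopy(channel)  # leave the fetched definition untouched
    updated.setdefault("segmentation", {}).update(changes)
    return updated
```

For example, with_segmentation(channel, chars_per_row=32) returns the body to save as channel-change.json after serializing it with json.dumps.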

Cleanup channel

To delete the channel mychannel after you’re done use

$ curl -i -u "<username>:<password>" -XDELETE https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel

2 - ASR Engine Configuration

Configuration of the ASR Engine and language settings

Introduction

This section will show you how to configure the Automatic Speech Recognition (ASR) engine and language settings for a channel.

There are two ASR engines available for the ASR service:

google (Google Cloud Speech-to-Text)
speechmatics (Speechmatics)

Configure ASR Engine and Language

Add the selected engine to the channel configuration

{
  "id": "mychannel",
  "name": "My Channel",
  "input":{
    "type":"srt",
    "port":10000
  },
  "language":"en-US",
  "outputs":[
    "webvtt"
  ],
  "engine": "google"
}

If the engine is not specified, the default engine is google.

Google Cloud Speech-to-Text

The language code for the Google ASR engine is in the form ll-CC where ll is the language code and CC is the country code. For example, en-US is English (United States) and es-ES is Spanish (Spain).

Speechmatics

Speechmatics uses the short language code ll, for example, ja is Japanese and es is Spanish. The only exceptions are English and Mandarin Chinese, where the expected output variant is specified with one of the following codes: en-GB, en-US, en-AU, cmn-Hans (Simplified), cmn-Hant (Traditional).
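
A rough format check can catch a language code written for the wrong engine before the configuration is sent. The sketch below encodes only the conventions described above. It does not know which languages each engine actually supports, and rejecting a bare en or cmn for Speechmatics is this sketch's reading of the exception rule, not confirmed service behavior. The names are illustrative.

```python
import re

GOOGLE_CODE = re.compile(r"[a-z]{2,3}-[A-Z]{2}")       # ll-CC, for example en-US
SPEECHMATICS_SHORT = re.compile(r"[a-z]{2,3}")          # ll, for example ja
SPEECHMATICS_VARIANTS = {"en-GB", "en-US", "en-AU", "cmn-Hans", "cmn-Hant"}
NEEDS_VARIANT = {"en", "cmn"}  # assumption: these need an explicit variant


def looks_valid(engine, language):
    """Return True if the code has the shape the given engine expects."""
    if engine == "google":
        return GOOGLE_CODE.fullmatch(language) is not None
    if engine == "speechmatics":
        if language in SPEECHMATICS_VARIANTS:
            return True
        return (
            SPEECHMATICS_SHORT.fullmatch(language) is not None
            and language not in NEEDS_VARIANT
        )
    raise ValueError(f"unknown engine: {engine}")
```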

Once the desired engine and language have been selected, apply the channel configuration to the API

$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json

You will receive a 200 OK response with a JSON payload similar to this

{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "webvtt"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}

Translation

Speechmatics also supports translation. Enable it with the translation parameter in the channels API. For example, to translate an American English channel into Spanish, use the following channel configuration

{
  "id": "mychannel",
  "name": "My Translated Channel",
  "input":{
    "type":"srt",
    "port":10000
  },
  "language":"en-US",
  "translation": "es",
  "outputs":[
    "webvtt"
  ],
  "engine": "speechmatics"
}

See the Speechmatics documentation for the supported translation language pairs.
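
Because translation is described here as a Speechmatics feature, a configuration that sets translation with another engine is probably a mistake. The sketch below flags that case before the channel is posted. How the service itself responds to such a configuration is not documented here, so this is a local sanity check with a hypothetical function name.

```python
def check_translation(channel):
    """Raise ValueError if translation is set without the speechmatics engine."""
    # google is the documented default when no engine is given.
    engine = channel.get("engine", "google")
    if channel.get("translation") and engine != "speechmatics":
        raise ValueError(
            f"translation to {channel['translation']!r} requires engine "
            f"'speechmatics', but engine is {engine!r}"
        )
    return channel
```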

3 - Integrate ASR output with Cavena STU

How to integrate Automatic Speech Recognition (ASR) output with Cavena STU

Introduction

This section will show you how to use Cavena STU as an output in the Automatic Speech Recognition (ASR) service.

There are 2 modes of integration with Cavena STU.

Mode 1: If the STU Subtitle Input Protocol of your STU is reachable on a public IP, the ASR service can connect directly to the STU.

Mode 2: If not, you need to install a STU agent together with the STU. The agent connects to the ASR service and proxies the subtitle traffic.

Setup ASR service to connect to a STU on a public IP

You need to configure the public IP and port where the ASR service can connect to the STU Subtitle Input Protocol. These values are part of the settings API, which holds system-wide settings that are shared by all channels.

Access the current settings with

$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/settings

The output has a JSON data structure similar to this

{
  "srt_passphrase": "<SRT Passphrase>",
  "subtitle_filename": "subtitle.vtt",
  "subtitle_rotation_interval": 4,
  "stu_settings": {
    "hostname": "<Public IP of the STU>",
    "port": 4621,
    "subtitle_type": 0,
    "user_id": "test-0",
    "username": "test",
    "offset": 0,
    "remove_accents": false,
    "server_mode": false
  }
}

Put the JSON data in a file settings.json and update the following

Set stu_settings.hostname to the public IP of your STU
Update stu_settings.port if you use a non-default port
Update stu_settings.server_mode to false

Now apply the new settings with the following command

$ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/settings -XPUT -H "Content-Type: application/json" -d @settings.json
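
The three changes to settings.json listed above can be applied in code once the current settings have been fetched and parsed. This is a minimal sketch with a hypothetical helper name. It touches only the stu_settings fields named in this section and leaves every other setting as returned by the API.

```python
import copy


def with_stu_target(settings, hostname, port=None):
    """Return a copy of the settings pointing at a STU on a public IP."""
    updated = copy.deepcopy(settings)
    stu = updated.setdefault("stu_settings", {})
    stu["hostname"] = hostname
    if port is not None:          # keep the existing port unless one is given
        stu["port"] = port
    stu["server_mode"] = False    # the ASR service connects out to the STU
    return updated
```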

Continue to Create a channel further down to create a channel with a STU output.

Setup ASR service to listen for connections from a STU

Please contact your Agile Content representative for access and instructions on how to set up this mode.

Create a channel

To set up a channel, first prepare a file mychannel.json with the following content, replacing the language with the language code of your audio clip

{
  "id": "mychannel",
  "name": "My Channel",
  "input":{
    "type":"srt",
    "port":10000
  },
  "language":"en-US",
  "outputs":[
    "stu_ip"
  ]
}

Then apply the channel configuration to the API

$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json

You will receive a 200 OK response with a JSON payload similar to this

{
  "id": "mychannel",
  "name": "My Channel",
  "enabled": true,
  "engine": "google",
  "input": {
    "type": "srt",
    "port": 10000
  },
  "language": "en-US",
  "outputs": [
    "stu_ip"
  ],
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}

See the Getting Started guide for guidance on how to ingest audio and validate generated subtitles.

4 - Use with StreamBuilder

How to configure for integration with StreamBuilder

StreamBuilder Configuration

StreamBuilder communicates with ASR over a WebSocket URL that is used both to send audio to ASR and to receive subtitles back. The WebSocket URL is based on the id of your ASR account and has the form wss://<your-id>.asr.agilecontent.com/api/v1/sbapi.
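
If the URL is generated as part of a deployment script, a small helper avoids typos in the fixed parts. The sketch below assumes the account id is a single host name label made of letters, digits and hyphens. That restriction and the function name are assumptions, not something the service documents.

```python
import re


def streambuilder_ws_url(account_id):
    """Return the StreamBuilder WebSocket URL for an ASR account id."""
    # Assumption: the id is one DNS label (letters, digits, hyphens).
    if not re.fullmatch(r"[A-Za-z0-9-]+", account_id):
        raise ValueError(f"unexpected account id: {account_id!r}")
    return f"wss://{account_id}.asr.agilecontent.com/api/v1/sbapi"
```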

When integrated with StreamBuilder, the audio and subtitle track languages are configured in StreamBuilder. The language codes to use are described under ASR Engine Configuration. We currently support 2 variants of operation

One audio and one subtitle track with the same language code
When the Speechmatics ASR engine is used, it is also possible to specify the subtitle track language as a translated language instead of the original audio language, as long as it is a combination supported by Speechmatics.

See the StreamBuilder documentation for details on StreamBuilder configuration.

Configure a StreamBuilder Channel

Channels are configured in ASR to tune ASR settings for a StreamBuilder channel. Create a channel configuration with input type sbapi in a file mychannel.json

{
  "id": "sbchannel",
  "name": "StreamBuilder Channel",
  "input": {
    "type":"sbapi"
  },
  "engine": "google",
  "segmentation": {
    "rows": 2,
    "chars_per_row": 40,
    "progressive": false
  }
}

and apply it to the configuration API with

$ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json

Tune ASR settings by changing the engine and segmentation settings. Note that both the port in the input section and the outputs are unused when integrated with StreamBuilder.

Remove a StreamBuilder Channel

To remove a StreamBuilder channel delete the corresponding resource in the API

$ curl -i -u "<username>:<password>" -XDELETE https://<your-id>.asr.agilecontent.com/api/v1/channels/sbchannel

This will also terminate any active StreamBuilder sessions on the channel.

5 - API Reference

Swagger API for Automatic Speech Recognition (ASR)

The Swagger API reference for your account is available at https://<your-id>.asr.agilecontent.com/swagger/index.html. Replace <your-id> in the URL so it matches the URL given with your account. To interact directly with the API through the Swagger interface, you also need to provide your account username and password when prompted.