Getting Started

A guide to getting started with the Automatic Speech Recognition (ASR) service

    Introduction

    This guide gets you started with the Agile Content Automatic Speech Recognition service. It shows how WebVTT subtitles are generated live from an audio clip ingested with FFmpeg.

    Prerequisites

    Before following these steps, make sure you have the following:

    1. FFmpeg (or another means of ingesting WAV audio over SRT to the ASR service)
    2. An audio clip in a format that FFmpeg can decode
    3. curl or another HTTP tool to call the API
    4. An account for the Automatic Speech Recognition Service (contact sales@agilecontent.com for info)

    You will need the following account information:

    • URL in the form <your-id>.asr.agilecontent.com
    • username
    • password
    • Audio ingest IP
    • SRT secret

    In the examples below, replace <your-id>, <username>, <password>, <ingest-ip> and <srt-secret> with these values.

    Validate API Access

    Verify that you can access the service API:

    $ curl -i -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels
    

    You should get a response similar to this:

    HTTP/2 200
    date: Thu, 08 Aug 2024 09:05:13 GMT
    content-type: text/plain; charset=utf-8
    content-length: 3
    access-control-allow-origin: *
    x-request-id: ip-10-0-0-138.eu-north-1.compute.internal/nxlyokd96h-000003
    
    []
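
    The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function names build_channels_request and check_access are hypothetical helpers, not part of the service, and the sketch assumes the API accepts HTTP Basic authentication exactly as in the curl call above.

```python
import base64
import urllib.request


def build_channels_request(host, username, password):
    """Build (but do not send) an authenticated GET for the channel list."""
    # HTTP Basic auth: base64 of "username:password", the same scheme curl -u uses.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"https://{host}/api/v1/channels",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def check_access(host, username, password):
    """Send the request and return the HTTP status code (200 means access works)."""
    request = build_channels_request(host, username, password)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # for example when the credentials are rejected.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

    Calling check_access("<your-id>.asr.agilecontent.com", "<username>", "<password>") should return 200 when the account is set up correctly.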
    

    Create a channel

    To set up a channel, first prepare a file mychannel.json with the following content, replacing the language with the language code of your audio clip:

    {
      "id": "mychannel",
      "name": "My Channel",
      "input":{
        "type":"srt",
        "port":10000
      },
      "language":"en-US",
      "outputs":[
        "webvtt"
      ]
    }
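
    If you create channels for several clips, the file can be generated rather than edited by hand. This is a sketch only: channel_config and write_channel_file are hypothetical helper names, and the field set simply mirrors the example above.

```python
import json


def channel_config(channel_id, name, language, port=10000):
    """Return a channel definition shaped like the mychannel.json example."""
    return {
        "id": channel_id,
        "name": name,
        "input": {"type": "srt", "port": port},
        "language": language,
        "outputs": ["webvtt"],
    }


def write_channel_file(path, config):
    """Write the channel definition as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

    For example, write_channel_file("mychannel.json", channel_config("mychannel", "My Channel", "en-US")) reproduces the file shown above.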
    

    Then post the channel configuration to the API:

    $ curl -i -u "<username>:<password>" -XPOST -H "Content-Type: application/json" https://<your-id>.asr.agilecontent.com/api/v1/channels -d @mychannel.json
    

    You will receive a 200 OK response with a JSON payload similar to this:

    {
      "id": "mychannel",
      "name": "My Channel",
      "enabled": true,
      "engine": "google",
      "input": {
        "type": "srt",
        "port": 10000
      },
      "language": "en-US",
      "outputs": [
        "webvtt"
      ],
      "segmentation": {
        "rows": 2,
        "chars_per_row": 40,
        "progressive": false
      }
    }
    

    Ingest the Audio

    Start ingesting audio from your audio file (named audio.ts here):

    $ ffmpeg -re -i audio.ts -vn -ac 1 -acodec pcm_s16le -f wav -bitexact "srt://<ingest-ip>:10000?mode=caller&passphrase=<srt-secret>"
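
    The ingest can also be started from a script. The sketch below passes the arguments to ffmpeg as a list, so no shell is involved and the "?" and "&" in the SRT URL need no quoting. The helper names srt_url, ffmpeg_command and ingest are hypothetical; the flags are the ones from the command above.

```python
import subprocess


def srt_url(ingest_ip, secret, port=10000):
    """Build the caller-mode SRT URL used in the ffmpeg example."""
    return f"srt://{ingest_ip}:{port}?mode=caller&passphrase={secret}"


def ffmpeg_command(audio_path, url):
    """Return the ffmpeg argument list for real-time mono 16-bit PCM WAV ingest."""
    return [
        "ffmpeg",
        "-re",                   # read the input at its native rate
        "-i", audio_path,
        "-vn",                   # ignore any video stream
        "-ac", "1",              # downmix to a single audio channel
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-f", "wav",
        "-bitexact",             # kept from the original command
        url,
    ]


def ingest(audio_path, ingest_ip, secret, port=10000):
    """Run ffmpeg until the clip ends; raises CalledProcessError on failure."""
    command = ffmpeg_command(audio_path, srt_url(ingest_ip, secret, port))
    subprocess.run(command, check=True)
```

    Calling ingest("audio.ts", "<ingest-ip>", "<srt-secret>") then behaves like the command line above.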
    

    Check the Subtitles

    First request an HLS subtitle variant manifest to see the available WebVTT files:

    $ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8
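
    To pick the WebVTT file names out of the manifest programmatically, a few lines of standard-library Python are enough. This is a sketch: segment_uris is a hypothetical helper, and the SAMPLE manifest is made up for illustration, so the real segment names and durations will differ.

```python
from urllib.parse import urljoin


def segment_uris(manifest_text, manifest_url):
    """Return absolute URIs for every segment listed in an HLS playlist."""
    uris = []
    for line in manifest_text.splitlines():
        line = line.strip()
        # In an HLS playlist, lines starting with "#" are tags or comments;
        # every other non-empty line is a URI.
        if line and not line.startswith("#"):
            # Relative names are resolved against the manifest location.
            uris.append(urljoin(manifest_url, line))
    return uris


# Hypothetical manifest contents, for illustration only.
SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment-0.vtt
#EXTINF:6.0,
segment-1.vtt
"""
```

    Passing the manifest text and the manifest URL to segment_uris yields URLs that can be fetched directly with curl.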
    

    Then request one of the WebVTT files listed in the manifest to see the generated subtitles:

    $ curl https://<your-id>.asr.agilecontent.com/subtitles/mychannel/<replace with vtt file in manifest>.vtt
    

    You will receive a WebVTT file like this:

    WEBVTT
    
    00:00:47.717 --> 00:00:49.317
    Nice subtitles you generated for me.
    
    00:00:50.557 --> 00:00:51.117
    No problem, it's been my pleasure.
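
    If you want to process the cues rather than just read them, the file is simple to parse. The following sketch handles only cues with full hh:mm:ss.mmm timings, as in the sample above, and ignores cue settings and identifiers; parse_cues and to_millis are hypothetical helper names.

```python
import re

# Matches a cue timing line such as "00:00:47.717 --> 00:00:49.317".
CUE_TIMING = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d{2,}):(\d{2}):(\d{2})\.(\d{3})"
)


def to_millis(hours, minutes, seconds, millis):
    """Convert timestamp fields to integer milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_cues(vtt_text):
    """Return (start_ms, end_ms, text) for each cue in a WebVTT document."""
    cues = []
    # Blocks (header, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        for index, line in enumerate(lines):
            match = CUE_TIMING.match(line)
            if match:
                parts = match.groups()
                # Everything after the timing line is the cue text.
                text = "\n".join(lines[index + 1:])
                cues.append((to_millis(*parts[:4]), to_millis(*parts[4:]), text))
                break
    return cues
```

    Integer milliseconds are used instead of floating-point seconds so that timestamps compare exactly.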
    

    You can also run the following Python code to see all WebVTT files progressively as they are generated:

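    import sys
    import time

    import m3u8
    import requests


    def watch(uri):
        """Poll the subtitle manifest and print each new WebVTT segment once."""
        seen_segments = set()
        while True:
            # Reload the live manifest to discover newly published segments.
            manifest = m3u8.load(uri)
            for s in manifest.segments:
                if s.absolute_uri not in seen_segments:
                    print("#", s.absolute_uri)
                    res = requests.get(s.absolute_uri, timeout=10)
                    print(res.text)

                    seen_segments.add(s.absolute_uri)

            # Wait half the target duration between polls; fall back to
            # 1 second when the target duration is missing or below 1.
            target_duration = manifest.target_duration or 0
            if target_duration >= 1:
                time.sleep(target_duration / 2)
            else:
                time.sleep(1)


    if __name__ == "__main__":
        watch(sys.argv[1])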
    

    Save it to a file named show_subs.py and run:

    $ pip3 install m3u8 requests
    $ python3 show_subs.py https://<your-id>.asr.agilecontent.com/subtitles/mychannel/subtitles.m3u8
    

    Adjust Subtitles

    To get the current settings for mychannel, use:

    $ curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel
    

    Save the returned JSON in a file channel-change.json, adjust a setting such as chars_per_row, and apply the change with:

    $ curl -u "<username>:<password>" https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel -XPUT -H "Content-Type: application/json" -d @channel-change.json
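
    The edit itself can be scripted instead of done by hand. The sketch below assumes the channel JSON has the shape shown in the channel creation response earlier in this guide; with_chars_per_row and save_change are hypothetical helper names, and the value you choose for chars_per_row is up to you.

```python
import copy
import json


def with_chars_per_row(channel, chars_per_row):
    """Return a copy of the channel JSON with segmentation.chars_per_row changed."""
    # Work on a deep copy so the caller's original settings stay untouched.
    updated = copy.deepcopy(channel)
    updated.setdefault("segmentation", {})["chars_per_row"] = chars_per_row
    return updated


def save_change(channel, path="channel-change.json"):
    """Write the modified channel JSON, ready to send with curl -d @<path>."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(channel, handle, indent=2)
```

    Load the JSON returned by the GET request with json.load, pass it through with_chars_per_row, and write the result with save_change before running the PUT.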
    

    Clean up the channel

    To delete the channel mychannel when you are done, use:

    $ curl -i -u "<username>:<password>" -XDELETE https://<your-id>.asr.agilecontent.com/api/v1/channels/mychannel