ESF Format

Introduction to the Edgeware Storage Format (ESF)

ESF is Edgeware’s CMAF-based storage format for both live and VoD content. It stores media data in CMAF tracks, alongside additional metadata files. It supports H.264/AVC and HEVC/H.265 video, AAC and AC-3 audio, and wvtt subtitles.

Asset media data

The media data is stored as CMAF tracks in files with extensions “.cmfv”, “.cmfa”, and “.cmft” for video, audio, and subtitles, respectively. The subtitle format is “wvtt”, which is much more storage-efficient than “stpp”.

Asset metadata

The metadata for an asset is stored in two places:

  1. a single content_info.json file describing the tracks
  2. segment info (.dat) files describing the segments, one per track

Content info

Content info is stored in a file named content_info.json inside the asset directory. It is a JSON file containing enough information about the tracks of the asset to fill in all information about media tracks in HLS, DASH, or MSS manifests. This includes codecs, languages, time scales, bitrates, etc. However, it does not contain information about individual segments. That information is stored in segment info files.

Segment info

Metadata about the segments is stored in segment info files with the extension .dat. There is exactly one such file for each media file.

These files describe the data of the individual segments of the CMAF tracks with a 32-byte entry for each segment:

Field   Type    Description
Nr      uint32  Segment number
Time    uint64  Time (normally presentationTime=DecodeTime)
Dur     uint32  Duration of segment in timescale specified in content info
Size    uint32  Size in bytes
Offset  uint64  Byte offset inside track file
Rest    uint32  Flags for SCTE-markers and other information
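
As an illustration of the entry layout above, the following is a minimal Python sketch for decoding a segment info file. It assumes the six fields are packed in the listed order with no padding (4 + 8 + 4 + 4 + 8 + 4 = 32 bytes) and that integers are big-endian. The byte order is not stated here and is purely an assumption, and the names ENTRY, SegmentEntry, and parse_entries are hypothetical.

```python
import struct
from typing import List, NamedTuple

# ASSUMPTION: big-endian, fields in table order, no padding.
# ">" disables alignment, so the struct is exactly 32 bytes.
ENTRY = struct.Struct(">IQIIQI")


class SegmentEntry(NamedTuple):
    nr: int       # segment number
    time: int     # start time in the track timescale
    dur: int      # duration in the track timescale
    size: int     # segment size in bytes
    offset: int   # byte offset inside the track file
    rest: int     # flags (SCTE markers and other information)


def parse_entries(data: bytes) -> List[SegmentEntry]:
    """Decode the raw bytes of a segment info (.dat) file."""
    if len(data) % ENTRY.size != 0:
        raise ValueError("length is not a multiple of the 32-byte entry size")
    return [SegmentEntry(*fields) for fields in ENTRY.iter_unpack(data)]
```

If the real files turn out to be little-endian, only the ">" in the format string would need to change to "<".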

For VoD assets, an init segment is stored at the start of the track file. Its size is given by the Offset of the first media segment in the segment info file.
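
The relationship between the first Offset and the init segment can be sketched as follows. This is only an illustration under assumptions: the entry is taken to be six fields packed in the documented order without padding, big-endian byte order is assumed (it is not specified here), and the function names are hypothetical.

```python
import struct
from typing import Tuple

# ASSUMPTION: big-endian entry with fields Nr, Time, Dur, Size, Offset, Rest.
ENTRY = struct.Struct(">IQIIQI")


def init_segment_size(segment_info: bytes) -> int:
    """Return the Offset of the first entry, i.e. the number of bytes
    that precede the first media segment in a VoD track file."""
    if len(segment_info) < ENTRY.size:
        raise ValueError("segment info contains no entries")
    _nr, _time, _dur, _size, offset, _rest = ENTRY.unpack_from(segment_info, 0)
    return offset


def split_init(track: bytes, segment_info: bytes) -> Tuple[bytes, bytes]:
    """Split a VoD track file into (init segment, media data)."""
    n = init_segment_size(segment_info)
    return track[:n], track[n:]
```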

Commonalities with DASH OnDemand format

DASH OnDemand stores media data in the same type of CMAF tracks as ESF. However, the metadata is stored in other structures.

To describe the asset and its variants, there is a manifest called Media Presentation Description (MPD) which is an XML file with file extension .mpd.

It is similar to the ESF content_info.json file. It has explicit switching groups called Adaptation Sets, but lacks some information that ESF provides, such as the video parameter sets needed to generate MSS manifests.

Similar to ESF, there is a second structure that contains the information about the segments. In the case of DASH OnDemand, this information is stored in a sidx box inside the CMAF track itself. The box is located at the beginning of the media file, right after the init segment and before the actual media data.
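
To make the position of the sidx box concrete, here is a small Python sketch that walks the top-level boxes of a track file and reports where the sidx box sits. It relies only on the generic ISO BMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file). The function names are hypothetical, and the sketch does not decode the sidx contents.

```python
import struct
from typing import Iterator, Optional, Tuple


def top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, start, size) for each top-level box in data."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - pos
        if size < header:
            raise ValueError("corrupt box header at offset %d" % pos)
        yield box_type.decode("ascii", "replace"), pos, size
        pos += size


def find_sidx(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, size) of the first top-level sidx box, or None."""
    for box_type, start, size in top_level_boxes(data):
        if box_type == "sidx":
            return start, size
    return None
```

Since the sidx box precedes the media data, reading only the head of the file should be enough in practice. Passing a truncated buffer still works here because only the box headers are inspected.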

By generating a DASH MPD file and inserting a sidx box in the media tracks, it is possible to make VoD assets that are compatible with both DASH OnDemand and the ESF format. The new ESB3031 ew-vodingest tool generates such combined ingested files. For DASH OnDemand, a complete WebVTT file is used instead of wvtt segments. That complete WebVTT file is generated by extracting and concatenating all subtitle cues from the wvtt subtitle tracks used in the ESF format.

Live storage using ESF

The main reason to use ESF instead of DASH OnDemand is that the latter does not support live content, but requires a static structure. In fact, since the sidx box must be placed before the media, it is not even possible to concatenate media segments and write a sidx box at the end. With ESF we can have the same format for both VoD and live.

The segment info files are separate from the media files and use 32 bytes per segment. They are therefore easy to seek in, since the entry with index n starts at byte n × 32, and easy to grow, by just appending another 32 bytes for each new segment. In the ESB3003 catchup buffer, we store live content in one-minute files.
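
The fixed entry size is what makes growing and seeking cheap. A minimal Python sketch of both operations follows. It assumes a big-endian entry with the six documented fields packed without padding (the byte order is an assumption), and the helper names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Tuple

# ASSUMPTION: big-endian entry with fields Nr, Time, Dur, Size, Offset, Rest.
ENTRY = struct.Struct(">IQIIQI")


def append_entry(f: BinaryIO, nr: int, time: int, dur: int,
                 size: int, offset: int, rest: int = 0) -> None:
    """Grow the segment info file by one 32-byte entry."""
    f.seek(0, io.SEEK_END)
    f.write(ENTRY.pack(nr, time, dur, size, offset, rest))


def entry_count(f: BinaryIO) -> int:
    """Number of complete entries currently in the file."""
    return f.seek(0, io.SEEK_END) // ENTRY.size


def read_entry(f: BinaryIO, index: int) -> Tuple[int, ...]:
    """Read the entry at a zero-based index with a single seek."""
    f.seek(index * ENTRY.size)
    raw = f.read(ENTRY.size)
    if len(raw) != ENTRY.size:
        raise IndexError("no complete entry at index %d" % index)
    return ENTRY.unpack(raw)
```

The same functions work on a file opened with open(path, "r+b") or on an in-memory io.BytesIO, which is convenient for experiments.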