Adaptive Demuxers for DASH, HLS and Smooth Streaming

There are two sets of elements implementing client-side adaptive streaming (HLS, DASH, Microsoft Smooth Streaming) in GStreamer:

  • The old legacy elements dashdemux, hlsdemux, mssdemux in the gst-plugins-bad module.

  • New dashdemux2, hlsdemux2, mssdemux2 elements in gst-plugins-good (added in GStreamer 1.22).

The legacy adaptive streaming support in gst-plugins-bad had several pitfalls that prevented improving it easily. The legacy design used a model where an adaptive streaming element (dashdemux, hlsdemux) downloaded multiplexed fragments of media, but then relied on other components in the pipeline to provide download buffering, demuxing, elementary stream handling and decoding.

The problems with the old design included:

  1. An assumption that fragment streams (to download) are equal to output (elementary) streams.

    • This made it hard to expose GstStream and GstStreamCollection describing the available media streams, and by extension made it difficult to provide efficient stream selection support
  2. By performing download buffering outside the adaptive streaming elements, the download scheduling had no visibility into the presentation timeline.

    • This made it impossible to handle more efficient variant selection and download strategy
  3. Several issues with establishing accurate timing/duration of fragments due to not dealing with parsed data

    • Especially with HLS, which does not provide detailed timing information about the underlying media streams to the same extent that DASH does.
  4. Aging design that grew organically since the initial adaptive demuxer implementation with a much more limited feature set, and misses a better understanding of how a feature-rich implementation should work nowadays.

    • The code was complicated and interwoven in ways that were hard to follow and reason about.
  5. Use of GStreamer pipeline sources for downloading.

    • An internal download pipeline that contained a httpsrc -> queue2 -> src chain made download management, bandwidth estimation and stream parsing more difficult, and used a new thread for each download.

New design

The rest of this document describes the new adaptive streaming client implementation that landed in gst-plugins-good in GStreamer 1.22.

The new elements only work in combination with the "streams-aware" playbin3 and uridecodebin3 elements that support advanced stream selection functionality, they won't work with the legacy playbin element.

High-level overview of the new internal AdaptiveDemux2 base class:

  • Buffering is handled inside the adaptive streaming element, based on elementary streams (i.e. de-multiplexed from the downloaded fragments) and stored inside the adaptivedemux-based element.

  • The download strategy has full visibility on bitrates, bandwidth, per-stream queueing level (in time and bytes), playback position, etc. This opens up the possibility of much more intelligent adaptive download strategies.

  • Output pads are not handled directly by the subclasses. Instead subclasses specify which tracks of elementary streams they can provide and what "download streams" can provide contents for those tracks. The baseclass handles usage and activation of the tracks based on application select-streams requests, and activation of the stream needed to feed each selected track.

  • Output is done from a single thread, with the various elementary streams packets being output in time order (i.e. behaving like a regular demuxer, with interleaving reduced to its minimum). There is minimal buffering downstream in the pipeline - only the amount required to perform decode and display.

  • The adaptive streaming element only exposes src pads for the selected GstStreams. Typically, there will be one video track, one audio track and perhaps one subtitle track exposed at a time, for example.

  • Stream selection is handled by the element directly. When switching on a new media stream, the output to the relevant source pad is switched once there is enough content buffered on the newly requested stream, providing rapid and seamless stream switching.

  • Only 3 threads are used regardless of the number of streams/tracks. One is dedicated to download, one for output, and one for scheduling and feeding contents to the tracks.

The main components of the new adaptive demuxers are:

  • GstAdaptiveDemuxTrack : end-user meaningful elementary streams. Those can be selected by the user. They are provided by the subclasses based on the manifest.

    • They each correspond to a GstStream of a GstStreamCollection
    • They are unique by GstStreamType and any other unique identifier specified by the manifest (ex: language)
    • The caps can change through time
  • OutputSlot : A track being exposed via a source pad. This is handled by the parent class.

  • GstAdaptiveDemuxStream : implementation-specific download stream. This is linked to one or more GstAdaptiveDemuxTrack. The contents of that stream will be parsed (via parsebin) and fed to the target tracks.

    • What tracks are provided by a given GstAdaptiveDemuxStream is specified by the subclass. But can also be discovered at runtime if the manifest did not provide enough information (for example with HLS).
  • Download thread : Receives download requests from the scheduling thread that can be queried and interrupted. Performs all download servicing in a single dedicated thread that can estimate download bandwidth across all simultaneous requests.

  • Scheduling thread : In charge of deciding what new downloads should be started based on overall position, track buffering levels, selected tracks, current time ... It is also in charge of handling completed downloads. Fragment downloads are sent to dedicated parsebin elements that feed the parsed elementary data to GstAdaptiveDemuxTrack

  • Output thread : In charge of deciding which track should be outputted/removed/switched (via OutputSlot) based on requested selection and track levels.

Track(s) and Stream(s)

Adaptive Demuxers provide one or more Track of elementary streams. They are each unique in terms of:

  • Their type (audio, video, text, ..). Ex : GST_STREAM_TYPE_AUDIO
  • Optional: Their codec. Ex : video/x-h264
  • Optional: Their language. ex : GST_TAG_LANGUAGE : "fr"
  • Optional: Their number of channels (ex: stereo vs 5.1). ex audio/x-vorbis,channels=2
  • Any other feature which would make the stream "unique" either because of their nature (ex: video angle) or specified by the manifest as being "unique".

But tracks can vary over time by:

  • bitrate
  • profile or level
  • resolution

They correspond to what an end-user might want to select (i.e. will be exposed in a GstStreamCollection). They are each identified by a stream_id provided by the subclass.

Note: A manifest can specify that tracks that would normally be separate based on the above rules (for example different codecs or channels) are actually the same "end-user selectable stream" (i.e. track). In such case only one track is provided and the nature of the elementary stream can change through time.

Adaptive Demuxers subclasses also need to provide one or more Download Stream (GstAdaptiveDemuxStream) which are the implementation-/manifest-specific "streams" that each feed one or more Track. Those streams can also vary over time by bitrate/profile/resolution/... but always target the same tracks.

The downloaded data from each of those GstAdaptiveDemuxStream is fed to a parsebin element which will put the output in the associated GstAdaptiveDemuxTrack.

The tracks have some buffering capability, handled by the baseclass.

This separation allows the base-class to:

  • decide which download stream(s) should be (de)activated based on the current track selection
  • decide when to (re)start download requests based on buffering levels, positions and external actions.
  • Handle buffering, output and stream selection.

The subclass is responsible for deciding:

  • Which next download should be requested for that stream based on current playback position, the provided encoded bitrates, estimates of download bandwidth, buffering levels, etc..

Subclasses can also decide, before passing the downloaded data over, to:

  • pre-parse specific headers (ex: ID3 and webvtt headers in HLS, MP4 fragment position, etc..).

  • pre-parse actual content if needed because a position estimation is needed (ex: HLS missing accurate positioning of fragments in the overall timeline)

  • rewrite the content altogether (for example webvtt fragments which require timing to be re-computed)

Timeline, position, playout

Adaptive Demuxers decide what to download based on a Timeline made of one or more Tracks.

The output of that Timeline is synchronized (each Track pushes downstream at more or less the same position in time). That position is the "Global Output Position".

The Timeline should have sufficient data in each track to allow all tracks to be decoded and played back downstream without introducing stalls. It is the goal of the Scheduling thread of adaptive demuxers to determine which fragment of data to download and at which moment, in order for:

  • each track to have sufficient data for continuous playback downstream
  • the overall buffering to not exceed specified limits (in time and/or bytes)
  • the playback position to not stray away in case of live sources and low-latency scenarios.

Which Track is selected on that Timeline is either:

  • decided by the element (default choices)
  • decided by the user (via GST_EVENT_SELECT_STREAMS)

The goal of an Adaptive Demuxer is to establish which fragment to download and when based on:

  • The selected Tracks
  • The current Timeline output position
  • The current Track download position (i.e. how much is buffered)
  • The available bandwidth (calculated based on download speed)
  • The bitrate of each fragment stream provided
  • The current time (for live sources)

In the future, an Adaptive Demuxer will be able to decide to discard a fragment if it estimates it can switch to a higher/lower variant in time to still satisfy the above requirements.

Download helper and thread

Based on the above, each Adaptive Demuxer implementation specifies to a Download Loop which fragment to download next and when.

Multiple downloads can be requested at the same time on that thread. It is the responsibility of the Scheduler thread to decide what to do when a download is completed.

Since all downloads are done in a dedicated thread without any blocking, it can estimate current bandwidth and latency, which the element can use to switch variants and improve buffering strategy.

Note: Unlike the old design, the libsoup library is used directly for downloading, and not via external GStreamer elements. In the future, this could be made modular so that other HTTP libraries can be used instead.

Stream Selection

When sending GST_EVENT_STREAM_COLLECTION downstream, the adaptive demuxer also specifies on the event that it can handle stream-selection. Downstream elements (i.e. decodebin3) won't attempt to do any selection but will handle/decode/expose all the streams provided by the adaptive demuxer (including streams that get added/removed at runtime).

When handling a GST_EVENT_SELECT_STREAMS, the adaptive demuxer will:

  • mark the requested tracks as selected (and no-longer requested ones as not selected)
  • instruct the streams associated to no-longer selected tracks to stop
  • set the current output position on streams associated to newly selected tracks and instruct them to be started
  • return

The actual changes in output (because of a stream selection change) are done in the output thread

  • If a track is no longer selected and there are no candidate replacement tracks of the same type, the associated output/pad is removed and the track is drained.

  • If a track is selected and doesn't have a candidate replacement slot of the same type, a new output/pad is added for that track

  • If a track is selected and has a candidate replacement slot, it will only be switched if the track it is replacing is empty OR when it has enough buffering so the switch can happen without re-triggering buffering.

Periods

The number and type of GstAdaptiveDemuxTrack and GstAdaptiveDemuxStream can not change once the initial manifests are parsed.

In order to change that (for example in the case of a new DASH period), a new GstAdaptiveDemuxPeriod must be started.

All the tracks and streams that are created at any given time are associated to the current input period. The streams of the input period are the ones that are active (i.e. downloading), and by extension the tracks of that input period are the ones that are being filled (if selected).

That period could also be the output period. The (selected) tracks of that period are the ones that are used for output by the output thread.

But due to buffering, the input and output period could be different, the baseclass will automatically handle switch over.

The only requirement for subclasses is to ask the parent class to start a new period when needed and then create the new tracks and streams.

Responsibility split

The GstAdaptiveDemux2 base class is in charge of:

  • helper for all downloads.
  • helper for parsing (using parsebin and custom parsing functions) stream data.
  • provides parsed elementary content for each fragment (note: could be more than one output stream for a given fragment)
  • helper for providing Tracks that can be filled by subclasses.
  • dealing with stream selection and output, including notifying subclasses which of those are active or not
  • handling buffering and deciding when to request new data from associated stream

Subclasses are in charge of:

  • specifying which GstAdaptiveDemuxTrack and GstAdaptiveDemuxStream they provide (based on the manifest) and their relationship.
  • when requested by the base class, specify which GstAdaptiveDemuxFragment should be downloaded next for a given (selected) stream.

The results of the search are