scaletempo

Scale tempo while maintaining pitch (WSOLA-like technique with cross correlation). Inspired by the SoundTouch library by Olli Parviainen.

Use Scaletempo to apply playback rates without the chipmunk effect.

Example pipelines

 filesrc location=media.ext ! decodebin name=d \
     d. ! queue ! audioconvert ! audioresample ! scaletempo ! audioconvert ! audioresample ! autoaudiosink \
     d. ! queue ! videoconvert ! autovideosink

OR

 playbin uri=... audio_sink="scaletempo ! audioconvert ! audioresample ! autoaudiosink"

When an application sends a seek event with rate != 1.0, Scaletempo applies the rate change by scaling the tempo without scaling the pitch.

Scaletempo works by producing audio in constant sized chunks (#GstScaletempo:stride) but consuming chunks proportional to the playback rate.
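The stride arithmetic described above can be sketched as follows. This is an illustrative calculation only, not the element's code: `frames_per_stride` and `input_frames_consumed` are hypothetical helper names, and the 44100 Hz sample rate used in the note below is an assumed example value.

```python
def frames_per_stride(sample_rate_hz, stride_ms):
    """Output frames in one stride; constant regardless of playback rate."""
    return int(sample_rate_hz * stride_ms / 1000)


def input_frames_consumed(sample_rate_hz, stride_ms, rate):
    """Input frames advanced per output stride; proportional to the rate.

    Assumption: consumption scales linearly with rate, as the text describes.
    """
    return frames_per_stride(sample_rate_hz, stride_ms) * rate
```

With the default 30 ms stride at an assumed 44100 Hz, `frames_per_stride(44100, 30)` gives 1323 output frames per stride, while `input_frames_consumed(44100, 30, 2.0)` advances 2646 input frames, so the audio plays twice as fast without each stride being resampled.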

Scaletempo then smooths the output by blending the end of one stride with the next (#GstScaletempo:overlap).
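A minimal sketch of the blending step, assuming a simple linear cross-fade over the overlap region. The window shape the element actually uses is not stated here, so treat the linear ramp as an assumption; `blend_overlap` is a hypothetical name.

```python
def blend_overlap(prev_tail, next_head):
    """Cross-fade the end of one stride into the start of the next.

    Both arguments are equal-length lists of samples covering the overlap.
    Assumption: a linear ramp; the real element may weight samples differently.
    """
    if len(prev_tail) != len(next_head):
        raise ValueError("overlap regions must have the same length")
    n = len(prev_tail)
    blended = []
    for i in range(n):
        w = i / n  # weight of the next stride: 0 at the start, rising toward 1
        blended.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return blended
```

For example, `blend_overlap([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])` fades steadily from the old stride's level toward the new one instead of jumping at the stride boundary.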

Scaletempo smooths the overlap further by searching within the input buffer for the best overlap position, using a statistical cross correlation (roughly a dot product). Scaletempo consumes most of its CPU cycles here. Use the #GstScaletempo:search property to tune how far the algorithm looks.
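The search can be pictured as sliding the previous stride's tail across a window of input and keeping the offset with the highest dot product. The sketch below is a plain, unoptimized illustration of that idea, not the element's implementation; `best_overlap_offset` is a hypothetical name, and the unnormalized dot product is an assumption.

```python
def best_overlap_offset(prev_tail, search_buf):
    """Return the offset in search_buf that best matches prev_tail.

    Assumption: similarity is a raw dot product over the overlap length.
    """
    n = len(prev_tail)
    candidates = len(search_buf) - n + 1
    if n == 0 or candidates <= 0:
        raise ValueError("search_buf must be at least as long as prev_tail")
    best_offset, best_score = 0, float("-inf")
    for offset in range(candidates):
        window = search_buf[offset:offset + n]
        # Dot product of the previous tail with this candidate window.
        score = sum(a * b for a, b in zip(prev_tail, window))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

The cost is roughly the number of candidate offsets times the overlap length, which is why a larger search range increases CPU use.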

Scaletempo also supports an alternative mode where a scaling factor is dynamically selected to scale input data down to the duration of the input buffers.

The use case for this is when text-to-speech / speech synthesis elements are placed upstream: they attach the duration of the input text as a custom GstScaletempoTargetDurationMeta to the audio buffers they output, and scaletempo can then rescale the audio down to the expected duration.

When this mode is selected, using a rate != 1.0 is not supported.
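As a rough illustration of the fit-down idea, the scaling factor can be thought of as the ratio between how long the audio actually is and how long it is expected to be. The function below is an assumption about the arithmetic, not the element's code: `fit_down_rate` is a hypothetical name, and clamping to 1.0 (never slowing audio down) is inferred from the "fit down" wording.

```python
def fit_down_rate(audio_duration_ns, target_duration_ns):
    """Speed-up factor needed to squeeze audio into the target duration.

    Assumption: audio shorter than the target is left at its normal tempo.
    """
    if audio_duration_ns <= 0 or target_duration_ns <= 0:
        raise ValueError("durations must be positive")
    return max(1.0, audio_duration_ns / target_duration_ns)
```

Under these assumptions, 3 seconds of synthesized speech with a 2 second target gives a factor of 1.5, while 1 second of speech with the same target is left untouched at 1.0.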

Hierarchy

GObject
    ╰──GInitiallyUnowned
        ╰──GstObject
            ╰──GstElement
                ╰──GstBaseTransform
                    ╰──scaletempo

Factory details

Authors – Rov Juvano

Classification – Filter/Effect/Rate/Audio

Rank – none

Plugin – audiofx

Package – GStreamer Good Plug-ins

Pad Templates

sink

audio/x-raw:
         format: F32LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved
audio/x-raw:
         format: F64LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved
audio/x-raw:
         format: S16LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved

Presence – always

Direction – sink

Object type – GstPad


src

audio/x-raw:
         format: F32LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved
audio/x-raw:
         format: F64LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved
audio/x-raw:
         format: S16LE
           rate: [ 1, 2147483647 ]
       channels: [ 1, 2147483647 ]
         layout: interleaved

Presence – always

Direction – src

Object type – GstPad


Properties

mode

“mode” Scaletempo-mode *

Control how the scaling factor is selected.

Flags : Read / Write

Default value : none

Since : 1.26


overlap

“overlap” gdouble

Fraction of stride to overlap (0.2 means 20% of the stride)

Flags : Read / Write

Default value : 0.2


rate

“rate” gdouble

Current playback rate

Flags : Read

Default value : 1


search

“search” guint

Length in milliseconds to search for best overlap position

Flags : Read / Write

Default value : 14


stride

“stride” guint

Length in milliseconds to output each stride

Flags : Read / Write

Default value : 30


Named constants

Scaletempo-mode

Possible values for the GstScaletempo:mode property.

Members

none (0x00000000) – default behavior, scale according to segment rate
fit-down (0x00000001) – fit audio data down to buffer duration, only supported with rate == 1.0

Since : 1.26

