scaletempo
Scale tempo while maintaining pitch (WSOLA-like technique with cross correlation). Inspired by the SoundTouch library by Olli Parviainen.
Use Scaletempo to apply playback rates without the chipmunk effect.
Example pipelines
filesrc location=media.ext ! decodebin name=d \
d. ! queue ! audioconvert ! audioresample ! scaletempo ! audioconvert ! audioresample ! autoaudiosink \
d. ! queue ! videoconvert ! autovideosink
OR
playbin uri=... audio_sink="scaletempo ! audioconvert ! audioresample ! autoaudiosink"
When an application sends a seek event with rate != 1.0, Scaletempo applies the rate change by scaling the tempo without scaling the pitch.
Scaletempo works by producing audio in constant sized chunks (#GstScaletempo:stride) but consuming chunks proportional to the playback rate.
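The relationship between constant-sized output chunks and rate-proportional input consumption can be illustrated with a small sketch. This is not the element's implementation: the function name `stride_plan` and its parameters are hypothetical, and overlap and search are deliberately ignored here.

```python
def stride_plan(sample_rate, stride_ms, rate, total_in_frames):
    """Return (input_start, output_start, length) for each output stride.

    Hypothetical helper: every output chunk has the same length, while the
    read position in the input advances by that length times the rate.
    """
    stride_frames = sample_rate * stride_ms // 1000  # constant output chunk size
    in_step = stride_frames * rate                   # input frames consumed per chunk
    plan = []
    in_pos = 0.0
    out_pos = 0
    # Stop once a full stride can no longer be read from the input.
    while int(in_pos) + stride_frames <= total_in_frames:
        plan.append((int(in_pos), out_pos, stride_frames))
        in_pos += in_step
        out_pos += stride_frames
    return plan


if __name__ == "__main__":
    for entry in stride_plan(1000, 30, 2.0, 300):
        print(entry)
```

With a rate of 2.0 the read position skips ahead by two strides for every stride written, so the output is roughly half as long as the input while each chunk keeps its original pitch.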
Scaletempo then smooths the output by blending the end of one stride with the next (#GstScaletempo:overlap).
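A minimal sketch of such a blend is shown here, assuming a linear ramp between the two regions. The ramp shape and the function name `crossfade` are assumptions made for illustration; the text above does not specify the window the element uses.

```python
def crossfade(prev_tail, next_head):
    """Blend two equal-length sample lists with a linear ramp.

    prev_tail: last samples of the previous stride (fades out)
    next_head: first samples of the next stride (fades in)
    """
    n = len(prev_tail)
    if n != len(next_head):
        raise ValueError("overlap regions must have the same length")
    out = []
    for i in range(n):
        w = i / n  # weight of the new stride, rising from 0 towards 1
        out.append((1.0 - w) * prev_tail[i] + w * next_head[i])
    return out


if __name__ == "__main__":
    print(crossfade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

The blended region replaces the hard cut between strides, which would otherwise be audible as a click at every chunk boundary.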
Scaletempo smooths the overlap further by searching within the input buffer for the best overlap position, using a statistical cross correlation (roughly a dot product). Scaletempo consumes most of its CPU cycles here. Use the #GstScaletempo:search property to tune how far the algorithm looks.
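The search step can be sketched as a brute-force scan that scores each candidate offset with a plain dot product and keeps the highest score. This is an illustration under that assumption only: `best_offset` and its parameters are hypothetical names, and the element's actual correlation may weight or normalize samples differently.

```python
def best_offset(prev_tail, candidates, search_frames):
    """Return the offset in [0, search_frames] whose window best matches prev_tail.

    prev_tail:     samples at the end of the previous stride
    candidates:    input samples in which the next stride may start
    search_frames: how far (in frames) to look for a better start position
    """
    n = len(prev_tail)
    best_off, best_score = 0, float("-inf")
    for off in range(search_frames + 1):
        window = candidates[off:off + n]
        if len(window) < n:
            break  # not enough input left for a full overlap window
        # Dot product as a rough cross-correlation score.
        score = sum(a * b for a, b in zip(prev_tail, window))
        if score > best_score:
            best_off, best_score = off, score
    return best_off


if __name__ == "__main__":
    tail = [0, 1, 0, -1]
    data = [1, 0, -1, 0, 1, 0, -1, 0]
    print(best_offset(tail, data, 4))
```

The cost grows with the product of the overlap length and the search length, which is why a larger search window makes the element noticeably more expensive.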
Scaletempo also supports an alternative mode where a scaling factor is dynamically selected to scale input data down to the duration of the input buffers.
The use case for this is when text-to-speech / speech synthesis elements are placed upstream: they attach the duration of the input text as a custom GstScaletempoTargetDurationMeta to the audio buffers they output, and scaletempo can then rescale the audio down to the expected duration.
When this mode is selected, using a rate != 1.0 is not supported.
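One plausible way to picture the dynamically selected factor is as the ratio between the length of the synthesized audio and the target duration carried by the meta. The sketch below assumes exactly that, and also assumes that audio already shorter than the target is left untouched (suggested by the word "down"). The formula and the name `fit_down_scale` are assumptions, not taken from the element's source.

```python
def fit_down_scale(audio_duration_ns, target_duration_ns):
    """Return a tempo scaling factor that fits the audio into the target duration.

    Assumption: the factor is audio length divided by target length, and it is
    never below 1.0, so audio is only ever sped up, not stretched.
    """
    if target_duration_ns <= 0:
        raise ValueError("target duration must be positive")
    return max(1.0, audio_duration_ns / target_duration_ns)


if __name__ == "__main__":
    # 3 s of synthesized speech that should fit into a 2 s slot.
    print(fit_down_scale(3_000_000_000, 2_000_000_000))
```

Under this assumption, three seconds of speech attached to a two-second target would be played at 1.5 times its natural tempo, with pitch preserved by the same stride, overlap, and search machinery described above.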
Hierarchy
GObject
    ╰──GInitiallyUnowned
        ╰──GstObject
            ╰──GstElement
                ╰──GstBaseTransform
                    ╰──scaletempo
Factory details
Authors – Rov Juvano
Classification – Filter/Effect/Rate/Audio
Rank – none
Plugin – audiofx
Package – GStreamer Good Plug-ins
Pad Templates
sink
audio/x-raw:
    format: F32LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
audio/x-raw:
    format: F64LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
audio/x-raw:
    format: S16LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
src
audio/x-raw:
    format: F32LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
audio/x-raw:
    format: F64LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
audio/x-raw:
    format: S16LE
    rate: [ 1, 2147483647 ]
    channels: [ 1, 2147483647 ]
    layout: interleaved
Properties
mode
“mode” Scaletempo-mode *
Control how the scaling factor is selected.
Flags : Read / Write
Default value : none
Since : 1.26
search
“search” guint
Length in milliseconds to search for best overlap position
Flags : Read / Write
Default value : 14
stride
“stride” guint
Length in milliseconds to output each stride
Flags : Read / Write
Default value : 30
Named constants
Scaletempo-mode
Possible values for the GstScaletempo:mode property.
Members
none
(0x00000000) – default behavior, scale according to segment rate
fit-down
(0x00000001) – fit audio data down to buffer duration, only supported with rate == 1.0
Since : 1.26