Seven of the spectral shape descriptors, computed by default on a linear scale for both amplitude and frequency.
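The descriptors are:
1. the spectral centroid, in the chosen frequency unit
2. the spectral spread, in the chosen frequency unit
3. the normalised skewness, as a ratio
4. the normalised kurtosis, as a ratio
5. the rolloff, in the chosen frequency unit
6. the flatness, in dB
7. the crest, in dB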
The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically inclined reader, the tutorials and code at https://www.audiocontentanalysis.org/ are a good way to deepen the understanding. For examples of the impact of computing the moments on power magnitudes and/or on an exponential frequency scale, please refer to the helpfile.
The process returns a multichannel control-rate stream of the seven values. The values are repeated whenever no new analysis frame has been computed, i.e. when the hopSize is larger than the signal vector size.
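To make the seven descriptors concrete, here is a minimal Python sketch using common textbook definitions on one frame of linear magnitudes. It is not FluCoMa's implementation, and several details are assumptions: the rolloff accumulates squared magnitudes as "energy", flatness and crest use 20 * log10 on amplitudes, and the hypothetical `min_freq`/`max_freq` arguments only mimic minFreq/maxFreq (with -1 meaning no upper limit).

```python
import math

def spectral_shape(mags, freqs, min_freq=0.0, max_freq=-1.0, rolloff_percent=95.0):
    """Seven spectral shape descriptors from one frame of linear magnitudes.

    Textbook definitions (a sketch, not FluCoMa's code). Returns
    [centroid, spread, skewness, kurtosis, rolloff, flatness_db, crest_db].
    """
    # Keep only bins inside [min_freq, max_freq]; max_freq = -1 means no upper limit.
    band = [(f, m) for f, m in zip(freqs, mags)
            if f >= min_freq and (max_freq < 0 or f <= max_freq)]
    if not band:
        raise ValueError("no bins inside the requested frequency range")
    fs = [f for f, _ in band]
    ms = [m for _, m in band]
    total = sum(ms)
    if total <= 0:
        raise ValueError("frame is silent")

    def central_moment(k, centre):
        # Magnitude-weighted k-th moment around `centre`.
        return sum((f - centre) ** k * m for f, m in band) / total

    centroid = sum(f * m for f, m in band) / total      # 1. weighted mean frequency
    spread = math.sqrt(central_moment(2, centroid))     # 2. weighted standard deviation
    skewness = central_moment(3, centroid) / spread ** 3 if spread > 0 else 0.0  # 3.
    kurtosis = central_moment(4, centroid) / spread ** 4 if spread > 0 else 0.0  # 4.

    # 5. rolloff: first bin at which the cumulative energy (squared magnitude)
    #    reaches rolloff_percent of the total energy.
    energies = [m * m for m in ms]
    threshold = sum(energies) * rolloff_percent / 100.0
    rolloff, acc = fs[-1], 0.0
    for f, e in zip(fs, energies):
        acc += e
        if acc >= threshold:
            rolloff = f
            break

    # 6./7. flatness and crest compare the geometric mean and the peak with the
    #       arithmetic mean, in dB (a tiny floor avoids log(0)).
    n = len(ms)
    arith = total / n
    geo = math.exp(sum(math.log(max(m, 1e-12)) for m in ms) / n)
    flatness_db = 20.0 * math.log10(geo / arith)
    crest_db = 20.0 * math.log10(max(ms) / arith)

    return [centroid, spread, skewness, kurtosis, rolloff, flatness_db, crest_db]
```

For five equal magnitudes at 0 to 4 Hz, this gives a centroid of 2 Hz, zero skewness, and 0 dB for both flatness and crest, as expected for a flat, symmetric spectrum.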
Read more about FluidSpectralShape on the learn platform.
in |
Audio-rate signal to analyze |
select |
An array of |
minFreq |
The minimum frequency that the algorithm will consider for computing the spectral shape. Frequencies below will be ignored. The default of 0 goes down to DC when possible. Constraints
|
maxFreq |
The maximum frequency that the algorithm will consider for computing the spectral shape. Frequencies above will be ignored. The default of -1 goes up to Nyquist. Constraints
|
rolloffPercent |
The percentage of the frame's energy used to find the rolloff: the reported rolloff frequency is the one below which this percentage of the energy lies. The default is 95%. Constraints
|
unit |
The frequency unit in which the spectral shapes are computed and reported. The default (0) is in Hertz and computes the moments on a linear spectrum. The alternative (1) is in MIDI note numbers, which computes the moments on an exponential spectrum. |
power |
This flag sets the scaling of the magnitudes in the moment calculation: either the amplitude (0, the default) or the power (1) of each bin. |
windowSize |
The window size. As spectral shape analysis relies on spectral frames, we need to decide what precision we give it spectrally and temporally. For more information visit https://learn.flucoma.org/learn/fourier-transform/ |
hopSize |
The window hop size. As the analysis relies on spectral frames, we need to move the window forward. It can be any size, but low overlap means fewer, more widely spaced analysis frames. The -1 default value will default to half of windowSize (overlap of 2). |
fftSize |
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger than the window oversamples the spectrum (zero-padding), which gives finer but interpolated frequency bins. The -1 default value will default to windowSize. |
maxFFTSize |
Set an explicit upper bound on the FFT size at object instantiation. The default of |
A 7-channel KR signal with the seven spectral shape descriptors. The latency is windowSize samples.
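The default and timing rules stated above can be restated in a small Python sketch. The helper names are hypothetical, and the control block size of 64 samples is an assumption (SuperCollider's usual default); the code only encodes what this helpfile says: a hopSize of -1 becomes half the windowSize, an fftSize of -1 becomes the windowSize, the FFT size must be a power of 2 of at least 4 samples and at least the window size, one new set of values appears per hop, and the latency is windowSize samples.

```python
def resolve_fft_params(window_size, hop_size=-1, fft_size=-1):
    """Apply the -1 defaults for hopSize and fftSize and check the FFT size rules."""
    if hop_size == -1:
        hop_size = window_size // 2            # overlap of 2
    if fft_size == -1:
        fft_size = window_size                 # default to the window size
    is_power_of_2 = fft_size > 0 and (fft_size & (fft_size - 1)) == 0
    if fft_size < 4 or fft_size < window_size or not is_power_of_2:
        raise ValueError(f"invalid fftSize {fft_size} for windowSize {window_size}")
    return hop_size, fft_size


def analysis_timing(window_size, hop_size, sample_rate=44100.0, block_size=64):
    """Latency, analysis frame rate, and how many control periods each frame is held."""
    latency_s = window_size / sample_rate          # latency is windowSize samples
    frames_per_s = sample_rate / hop_size          # one new set of 7 values per hop
    held_blocks = max(1.0, hop_size / block_size)  # values repeat while hop > block
    return latency_s, frames_per_s, held_blocks
```

With a windowSize of 1024 and the default hop of 512 at 44.1 kHz, each set of values is held for 8 control periods of 64 samples, and the output lags the input by 1024 samples (about 23 ms).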
logarithmic scale
The spectral centroid can also be computed on a logarithmic pitch scale using the power of the magnitudes. This yields values that are generally considered more in line with perception, since spectral shape is often drawn and described in logarithmic terms, for instance in dB per octave.
Compare the values of the centroid and the spread in both scales. The lower the frequency, the more the bias of the linear scale shows, and the same applies to the spread. In the logarithmic unit, the spread is in semitones. To convert it, either divide by 12 to express one standard deviation in octaves, or divide by 6 to get the width of the filter in octaves. One clear observation is that the width is now in a range that scales with what we hear, growing fourfold as the filter goes from resonant to more broadband.
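The logarithmic computation and the unit conversions can be sketched in Python as follows. This is an illustration under stated assumptions, not FluidSpectralShape's code: bin frequencies are mapped to MIDI note numbers with A4 = 440 Hz = 69, magnitudes are squared when the hypothetical `power` argument is true, and the DC bin is skipped because it has no pitch (how the object itself treats DC in this mode is not covered here).

```python
import math

def hz_to_midi(f):
    """MIDI note number of a frequency in Hz, assuming A4 = 440 Hz = note 69."""
    return 69.0 + 12.0 * math.log2(f / 440.0)


def log_centroid_spread(mags, freqs, power=True):
    """Centroid (MIDI note) and spread (semitones) on a logarithmic pitch axis."""
    # Skip DC (0 Hz has no pitch); weight by power or amplitude.
    pairs = [(hz_to_midi(f), m * m if power else m)
             for f, m in zip(freqs, mags) if f > 0]
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("no energy above DC")
    centroid = sum(p * w for p, w in pairs) / total
    spread = math.sqrt(sum((p - centroid) ** 2 * w for p, w in pairs) / total)
    return centroid, spread


def spread_in_octaves(spread_semitones):
    """Convert a spread in semitones to octaves."""
    one_sd = spread_semitones / 12.0   # one standard deviation, in octaves
    width = spread_semitones / 6.0     # +/- one standard deviation, in octaves
    return one_sd, width
```

Three equal bins at 220, 440 and 880 Hz give a logarithmic centroid of MIDI note 69 (440 Hz), whereas the linear centroid of the same bins is about 513 Hz, which illustrates the pull of the linear scale toward higher frequencies.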