Stats

Stats#

thebeat.stats.acf_df(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#

Perform autocorrelation analysis on a Sequence object, and return a Pandas.DataFrame object containing the results.

Parameters:

sequence (Sequence) – A Sequence object.
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (float | None, default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns:

DataFrame containing two columns: the timestamps, and the autocorrelation factor.

Return type:

pandas.DataFrame

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

Examples

>>> rng = np.random.default_rng(seed=123)  # for reproducability
>>> seq = thebeat.core.Sequence.generate_random_uniform(n_events=10,a=400,b=600,rng=rng)
>>> df = acf_df(seq, smoothing_window=50, smoothing_sd=20, resolution=10)
>>> print(df.head(3))
   timestamp  correlation
0          0     1.000000
1         10     0.824706
2         20     0.547863

thebeat.stats.acf_plot(sequence, resolution, min_lag=None, max_lag=None, smoothing_window=None, smoothing_sd=None, title='Autocorrelation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None)[source]#

This function can be used for plotting an autocorrelation plot from a Sequence.

Parameters:

sequence (Sequence) – A Sequence object.
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration_in_ms / resolution.
min_lag (float | None, default: None) – The minimum lag to be plotted. Defaults to 0.
max_lag (float | None, default: None) – The maximum lag to be plotted. Defaults to the sequence duration.
smoothing_window (float | None, default: None) – The window (in milliseconds) within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
title (str, default: 'Autocorrelation') – If desired, one can provide a title for the plot. This takes precedence over using the Sequence or SoundSequence name attribute as the title of the plot (if the object has one).
x_axis_label (str, default: 'Lag') – A label for the x axis.
y_axis_label (str, default: 'Correlation') – A label for the y axis.
figsize (tuple | None, default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The desired output resolution of the plot in dots per inch (DPI). This refers to the dpi parameter in matplotlib.pyplot.figure().
ax (Axes | None, default: None) – If desired, one can provide an existing matplotlib.axes.Axes object to plot the autocorrelation plot on. This is for instance useful if you want to plot multiple autocorrelation plots on the same figure.

Return type:

tuple[Figure, Axes]

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.acf_values(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#

Perform autocorrelation. This function takes a Sequence object, and returns an array with steps of resolution of unstandardized correlation factors.

Parameters:

sequence (Sequence) – A Sequence object.
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (float | None, default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Return type:

ndarray

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

This function uses the numpy.correlate() to calculate the correlations.

thebeat.stats.ccf_df(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#

Perform autocorrelation analysis on a Sequence object, and return a Pandas.DataFrame object containing the results.

Parameters:

test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as n_lags = sequence_duration / resolution.
smoothing_window (float | None, default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns:

DataFrame containing two columns: the timestamps, and the cross-correlation factor.

Return type:

pandas.DataFrame

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.ccf_plot(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None, title='Cross-correlation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None)[source]#

Calculate and plot the cross-correlation function (CCF) between two Sequence objects. The test sequence is compared to the reference sequence.

Parameters:

test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (float | None, default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
title (str, default: 'Cross-correlation') – The title of the plot.
x_axis_label (str, default: 'Lag') – The label of the x axis.
y_axis_label (str, default: 'Correlation') – The label of the y axis.
figsize (tuple | None, default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The resolution of the plot in dots per inch. This refers to the dpi parameter in matplotlib.pyplot.figure().
ax (Axes | None, default: None) – A matplotlib.axes.Axes object. If None, a new Figure and Axes is created.

Return type:

tuple[Figure, Axes]

Returns:

fig – The matplotlib.figure.Figure object.
ax – The matplotlib.axes.Axes object.

Notes

This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.

thebeat.stats.ccf_values(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#

Returns the unstandardized cross-correlation function (CCF) for two Sequence objects. The test sequence is compared to the reference sequence.

Parameters:

test_sequence (Sequence) – The test sequence.
reference_sequence (Sequence) – The reference sequence.
resolution (float) – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (float | None, default: None) – The window within which a normal probability density function is used for smoothing out the analysis.
smoothing_sd (float | None, default: None) – The standard deviation of the normal probability density function used for smoothing out the analysis.

Returns:

The unstandardized cross-correlation function.

Return type:

correlation

thebeat.stats.edit_distance_rhythm(test_rhythm, reference_rhythm, smallest_note_value=Fraction(1, 16))[source]#

Caculates edit/Levenshtein distance between two rhythms. The smallest_note_value determines the underlying grid that is used. If e.g. 16, the underlying grid is composed of 1/16th notes.

Note

Based on the procedure described in Post and Toussaint [PT11].

Parameters:

test_rhythm (Rhythm) – The rhythm to be tested.
reference_rhythm (Rhythm) – The rhythm to which test_rhythm will be compared.
smallest_note_value (Fraction | float, default: Fraction(1, 16)) – The smallest note value that is used in the underlying grid. 16 means 1/16th notes, 4 means 1/4th notes, etc.

Return type:

float

Examples

>>> from thebeat.music import Rhythm
>>> test_rhythm = Rhythm.from_note_values([1/4, 1/4, 1/4, 1/4])
>>> reference_rhythm = Rhythm.from_note_values([1/4, 1/8, 1/8, 1/4, 1/4])
>>> print(edit_distance_rhythm(test_rhythm, reference_rhythm))
1

thebeat.stats.edit_distance_sequence(test_sequence, reference_sequence, resolution)[source]#

Calculates the edit/Levenshtein distance between two sequences.

Requires for all the IOIs in a Sequence to be multiples of ‘resolution’. If needed, quantize the Sequence first, e.g. using Sequence.quantize_iois().

Note

The resolution also represents the underlying grid. If, for example, the resolution is 50, that means that a grid will be created with steps of 50. The onsets of the sequence are then placed on the grid for both sequences. The resulting sequences consist of ones and zeros, where ones represent the event onsets. This string for test_sequence is compared to the string of the reference_sequence. Note that test_sequence and reference_sequence can be interchanged without an effect on the results.

Parameters:

test_sequence (Sequence) – The sequence to be tested.
reference_sequence (Sequence) – The sequence to which test_sequence will be compared.
resolution (int) – The used resolution (i.e. bin size).

Return type:

float

thebeat.stats.fft_plot(sequence, unit_size, x_min=None, x_max=None, remove_dc=True, title='Fourier transform', x_axis_label='Cycles per unit', y_axis_label='Absolute power', figsize=None, dpi=100, ax=None)[source]#

Plots the Fourier transform of a Sequence object. The unit_size parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.

On the x axis is plotted the number of cycles per unit, and on the y axis the absolute power. The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.

Note

In most beat-finding applications you will want to set the x_max argument to something reasonable. The Fourier transform is plotted for all possible frequencies until the Nyquist frequency (half the value of unit_size). However, in most cases you will not be interested in frequencies that are higher than the beat frequency. For instance, if you have a sequence with a beat frequency of 2 Hz, you will not be interested in frequencies higher than, say, 20 Hz. In that case, you can set x_max to 20.

Parameters:

sequence (Sequence) – The sequence.
unit_size (float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.
x_min (float | None, default: None) – The minimum number of cycles per unit to be plotted.
x_max (float | None, default: None) – The maximum number of cycles per unit to be plotted.
remove_dc (bool, default: True) – If True, the DC component is removed from the Fourier transform by subtracting the mean from the signal. The DC component is the first value of the Fourier transform and represents the average of the sequence.
title (str, default: 'Fourier transform') – The title of the plot.
x_axis_label (str, default: 'Cycles per unit') – The label of the x axis.
y_axis_label (str, default: 'Absolute power') – The label of the y axis.
figsize (tuple | None, default: None) – A tuple containing the desired output size of the plot in inches, e.g. (4, 1). This refers to the figsize parameter in matplotlib.pyplot.figure().
dpi (int, default: 100) – The resolution of the plot in dots per inch.
ax (Axes | None, default: None) – A matplotlib Axes object to plot on. If not provided, a new figure and axes will be created.

Return type:

tuple[Figure, Axes]

Returns:

fig – The matplotlib Figure object.
ax – The matplotlib Axes object.

Examples

>>> from thebeat import Sequence
>>> from thebeat.stats import fft_plot
>>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25)  # milliseconds
>>> fft_plot(seq, unit_size=1000)
(<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)

>>> seq = Sequence.generate_random_normal(n_events=100, mu=0.5, sigma=0.025)  # seconds
>>> fft_plot(seq, unit_size=1, x_max=5)
(<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)

thebeat.stats.fft_values(sequence, unit_size, x_min=None, x_max=None, remove_dc=True)[source]#

Gets the x and y values for a Fourier transform of a Sequence object. The unit_size parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.

The x values indicate the number of cycles per unit, and y values indicate the absolute power (amplitude squared). The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.

Note

If a sequence ends with an event, the last event is not included in the calculation of the Fourier transform. This is because the Fourier transform assumes that the sequence can be repeated indefinitely. In other words, we estimate the Fourier transform on the basis of the IOIs, not on the basis of the onsets.

Parameters:

sequence (Sequence) – The sequence.
unit_size (float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.
x_min (float | None, default: None) – The minimum number of cycles per unit to be plotted.
x_max (float | None, default: None) – The maximum number of cycles per unit to be plotted.
remove_dc (bool, default: True) – If True, the DC component is removed from the Fourier transform by subtracting the mean from the signal. The DC component is the first value of the Fourier transform and represents the average of the sequence.

Return type:

tuple[ndarray, ndarray]

Returns:

xf – The x values.
yf – The y values. Note that absolute values are returned.

Examples

>>> from thebeat import Sequence
>>> from thebeat.stats import fft_values
>>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25)  # milliseconds
>>> xf, yf = fft_values(seq, unit_size=1000, x_max=10)

thebeat.stats.get_cov(sequence)[source]#

Calculate the coefficient of variantion of the inter-onset intervals (IOIS) in a thebeat.core.Sequence object.

Parameters:: sequence (Sequence) – A thebeat.core.Sequence object.
Returns:: The covariance of the sequence.
Return type:: float

thebeat.stats.get_interval_ratios_from_dyads(sequence)[source]#

Return sequential interval ratios, calculated as:

\(\textrm{ratio}_k = \frac{\textrm{IOI}_k}{\textrm{IOI}_k + \textrm{IOI}_{k+1}}\).

Note that for n IOIs this property returns n-1 ratios.

Parameters:: sequence (np.array | thebeat.core.Sequence | list) – The sequence from which to calculate the interval ratios. Can be a Sequence object, or a list or array of IOIs.

Notes

The used method is based on the methodology from Roeske et al. [RTPJ20].

thebeat.stats.get_npvi(sequence)[source]#

This function calculates the normalized pairwise variability index (nPVI) for a provided Sequence or SoundSequence object, or for an interable of inter-onset intervals (IOIs).

Parameters:: sequence (Sequence) – Either a Sequence or SoundSequence object, or an iterable containing inter-onset intervals (IOIs).
Returns:: The nPVI for the provided sequence.
Return type:: numpy.float64

Notes

The normalied pairwise variability index (nPVI) is a measure of the variability of adjacent temporal intervals. The nPVI is zero for sequences that are perfectly isochronous. See Jadoul et al. [JRT+16] and Ravignani and Norton [RN17] for more information on its use in rhythm research.

Examples

>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=500)
>>> print(get_npvi(seq))
0.0

>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=10,mu=500,sigma=50,rng=rng)
>>> print(get_npvi(seq))
9.40657936323865

thebeat.stats.get_phase_differences(test_events, reference_sequence, reference_ioi='preceding', window_size=None, unit='degrees', modulo=True)[source]#

Get the phase differences for test_events compared to reference_sequence.

Phase differences are a (circular) measure of temporal alignment. They are calculated as the difference between the onset of a test event and the onset of the corresponding reference event, divided by the IOI of the reference IOI:

\(\textrm{phase difference} = \frac{\textrm{test onset} - \textrm{reference onset}}{\textrm{reference IOI}}\)

The resulting phase differences are expressed in the unit specified by unit, where the default is in degrees.

The reference IOI can either be the IOI in the reference sequence that ‘contains’ an onset in the test sequence (reference_ioi='containing'), or the IOI in the reference sequences that precedes this onset (reference_ioi='preceding'). The default is for the reference IOI to be the one that precedes the test onset. Let us consider a reference sequence with onsets:

\(t = 0\), \(t = 250\), \(t = 1000\)

If an onset in the test sequence is at \(t = 750\), and reference_ioi='containing', the reference IOI that will be used is the one from \(t = 250\) to \(t = 1000\). If reference_ioi='preceding', the reference IOI that will be used is the one from \(t = 0\) to \(t = 250\).

In addition, one can specify a window size to get a (moving) average of the preceding IOIs. This is only used if reference_ioi='preceding'. The window size determines the number of reference IOIs (up to and including the immediately ‘preceding’ one) that are used to calculate a mean reference IOI. This can be useful to smoothe out the reference IOI when using irregular (reference) sequences.

Finally, one can specify whether modular arithmetic should be used. For degrees, if modulo=True, the phase differences will be expressed in the range \([0, 360)\) degrees, \([0, 2\pi)\) radians, or \([0, 1)\) (as a plain ‘fraction’). If modulo=False, the phase differences will be expressed in the range \([0, \infty)\), with the event at the start of the containing interval corresponding to 0 for both the ‘containing’ and ‘preceding’ values.

Note

In cases where it is not possible to calculate a phase difference, np.nan is returned. This can happen in the following cases: - The test event is before the first onset in the reference sequence. - The test event is after the last onset in the reference sequence, and reference_ioi='containing'. - The test event is in the nth interval of the reference sequence, and reference_ioi='preceding' and a window_size is larger than n.

Examples

>>> from thebeat.core import Sequence
>>> reference = Sequence.from_onsets([0, 250, 1000])
>>> test = Sequence.from_onsets([250, 1250])
>>> get_phase_differences(test, reference, reference_ioi='preceding', unit='fraction')
array([0.        , 0.33333333])

Parameters:

test_events (Sequence | list[Real] | ndarray | Real) – A sequence or a single time point to be compared with the reference sequence. Can either be a single event time, or a Sequence, list, or NumPy array containing multiple events.
reference_sequence (Sequence) – The reference sequence. Can be a Sequence object, a list or array of Sequence objects, or a number. In the latter case, the reference sequence will be an isochronous sequence with a constant IOI of that number and the same length as sequence_1.
reference_ioi (str, default: 'preceding') – The IOI in the reference sequence that is used as the reference IOI. Can be either “containing” or “preceding”.
window_size (int | None, default: None) – The window size used for calculating the mean reference IOI. Only used if reference_ioi='preceding'.
circular_unit – The unit of the circular unit. Can be “degrees” (the default), “radians”, or “fraction”.
modulo (bool, default: True) – Return the phase differences modulo 360 degrees or not. Only has an effect if reference_ioi='preceding'.

Return type:

ndarray | float

Returns:

numpy.ndarray – An array containing the phase differences if a Sequence, list, or NumPy array was passed.
float – The phase difference if a single event time was passed.

thebeat.stats.get_rhythmic_entropy(sequence, smallest_unit)[source]#

Calculate Shannon entropy from bins. This is a measure of rhythmic complexity. If many different ‘note durations’ are present, entropy is high. If only a few are present, entropy is low. A sequence that is completely isochronous has a Shannon entropy of 0.

The smallest_unit determines the size of the bins/the underlying grid. In musical terms, this for instance represents a 1/16th note. Bins will then be made such that each bin has a width of one 1/16th note. A 1/4th note will then be contained in one of those bins. Sequence needs to be quantized to multiples of ‘smallest_unit’. If needed, quantize the Sequence first, e.g. using ‘Sequence.quantize_iois’.

Parameters:

sequence (Sequence | Rhythm) – The thebeat.core.Sequence object for which Shannon entropy is calculated.
smallest_unit (float) – The size of the bins/the underlying grid.

Example

>>> seq = thebeat.Sequence.generate_isochronous(n_events=10, ioi=500)
>>> print(get_rhythmic_entropy(seq, smallest_unit=250))
0.0

thebeat.stats.get_ugof_isochronous(test_sequence, reference_ioi, output_statistic='mean')[source]#

This function calculates the universal goodness of fit (ugof) measure. The ugof statistic quantifies how well a theoretical sequence describes a sequence at hand (the test_sequence). This function can only calculate ugof using a theoretical sequence that is isochronous.

The reference_ioi is the IOI of an isochronous theoretical sequence.

Parameters:

test_sequence (Sequence) – A Sequence or SoundSequence object that will be compared to reference_sequence.
reference_ioi (float) – A number (float or int) representing the IOI of an isochronous sequence.
output_statistic (str, default: 'mean') – Either ‘mean’ (the default) or ‘median’. This determines whether for the individual ugof values we take the mean or the median as the output statistic.

Returns:

The ugof statistic.

Return type:

numpy.float64

Notes

This measure is described in Burchardt et al. [BBKnornschild21]. Please also refer to this Github page for an R implementation of the ugof measure.

Examples

>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=1000)
>>> ugof = get_ugof_isochronous(seq, reference_ioi=68.21)
>>> print(ugof)
0.4646435

thebeat.stats.ks_test(sequence, reference_distribution='normal', alternative='two-sided')[source]#

This function returns the D statistic and p value of a one-sample Kolmogorov-Smirnov test. It calculates how different the distribution of inter-onset intervals (IOIs) is compared to the provided reference distribution.

If p is significant it means that the IOIs are not distributed according to the provided reference distribution.

Parameters:

sequence (Sequence) – A Sequence object.
reference_distribution (str, default: 'normal') – Either ‘normal’ or ‘uniform’. The distribution against which the distribution of inter-onset intervals (IOIs) is compared.
alternative (str, default: 'two-sided') – Either ‘two-sided’, ‘less’, or ‘greater’. See scipy.stats.kstest() for more information.

Returns:

A SciPy named tuple containing the D statistic and the p value.

Return type:

scipy.stats._stats_py.KstestResult

Notes

This function uses scipy.stats.kstest(). For more information about the use of the Kolmogorov-Smirnov test in rhythm research, see Jadoul et al. [JRT+16] and Ravignani and Norton [RN17].

Examples

>>> rng = np.random.default_rng(seed=123)
>>> seq = thebeat.core.Sequence.generate_random_normal(n_events=100,mu=500,sigma=25,rng=rng)
>>> ks_result = ks_test(seq)
>>> print(round(ks_result.pvalue, 5))
0.6608

Stats

Contents

Stats#