Stats#
- thebeat.stats.acf_df(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation analysis on a
Sequenceobject, and return aPandas.DataFrameobject containing the results.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution.smoothing_window (
float|None, default:None) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
DataFrame containing two columns: the timestamps, and the autocorrelation factor.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
Examples
>>> rng = np.random.default_rng(seed=123) # for reproducability >>> seq = thebeat.core.Sequence.generate_random_uniform(n_events=10,a=400,b=600,rng=rng) >>> df = acf_df(seq, smoothing_window=50, smoothing_sd=20, resolution=10) >>> print(df.head(3)) timestamp correlation 0 0 1.000000 1 10 0.824706 2 20 0.547863
- thebeat.stats.acf_plot(sequence, resolution, min_lag=None, max_lag=None, smoothing_window=None, smoothing_sd=None, title='Autocorrelation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None)[source]#
This function can be used for plotting an autocorrelation plot from a
Sequence.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration_in_ms / resolution.min_lag (
float|None, default:None) – The minimum lag to be plotted. Defaults to 0.max_lag (
float|None, default:None) – The maximum lag to be plotted. Defaults to the sequence duration.smoothing_window (
float|None, default:None) – The window (in milliseconds) within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.title (
str, default:'Autocorrelation') – If desired, one can provide a title for the plot. This takes precedence over using theSequenceorSoundSequencenameattribute as the title of the plot (if the object has one).x_axis_label (
str, default:'Lag') – A label for the x axis.y_axis_label (
str, default:'Correlation') – A label for the y axis.figsize (
tuple|None, default:None) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1). This refers to thefigsizeparameter inmatplotlib.pyplot.figure().dpi (
int, default:100) – The desired output resolution of the plot in dots per inch (DPI). This refers to thedpiparameter inmatplotlib.pyplot.figure().ax (
Axes|None, default:None) – If desired, one can provide an existingmatplotlib.axes.Axesobject to plot the autocorrelation plot on. This is for instance useful if you want to plot multiple autocorrelation plots on the same figure.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.acf_values(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation. This function takes a
Sequenceobject, and returns an array with steps ofresolutionof unstandardized correlation factors.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution.smoothing_window (
float|None, default:None) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
This function uses the
numpy.correlate()to calculate the correlations.
- thebeat.stats.ccf_df(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation analysis on a
Sequenceobject, and return aPandas.DataFrameobject containing the results.- Parameters:
test_sequence (
Sequence) – The test sequence.reference_sequence (
Sequence) – The reference sequence.resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution.smoothing_window (
float|None, default:None) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
DataFrame containing two columns: the timestamps, and the cross-correlation factor.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_plot(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None, title='Cross-correlation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None)[source]#
Calculate and plot the cross-correlation function (CCF) between two
Sequenceobjects. The test sequence is compared to the reference sequence.- Parameters:
test_sequence (
Sequence) – The test sequence.reference_sequence (
Sequence) – The reference sequence.resolution – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (
float|None, default:None) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.title (
str, default:'Cross-correlation') – The title of the plot.x_axis_label (
str, default:'Lag') – The label of the x axis.y_axis_label (
str, default:'Correlation') – The label of the y axis.figsize (
tuple|None, default:None) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1). This refers to thefigsizeparameter inmatplotlib.pyplot.figure().dpi (
int, default:100) – The resolution of the plot in dots per inch. This refers to thedpiparameter inmatplotlib.pyplot.figure().ax (
Axes|None, default:None) – Amatplotlib.axes.Axesobject. IfNone, a new Figure and Axes is created.
- Return type:
- Returns:
fig– Thematplotlib.figure.Figureobject.ax– Thematplotlib.axes.Axesobject.
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_values(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Returns the unstandardized cross-correlation function (CCF) for two
Sequenceobjects. The test sequence is compared to the reference sequence.- Parameters:
test_sequence (
Sequence) – The test sequence.reference_sequence (
Sequence) – The reference sequence.resolution (
float) – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.smoothing_window (
float|None, default:None) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
float|None, default:None) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
The unstandardized cross-correlation function.
- Return type:
correlation
- thebeat.stats.edit_distance_rhythm(test_rhythm, reference_rhythm, smallest_note_value=Fraction(1, 16))[source]#
Caculates edit/Levenshtein distance between two rhythms. The
smallest_note_valuedetermines the underlying grid that is used. If e.g. 16, the underlying grid is composed of 1/16th notes.Note
Based on the procedure described in Post and Toussaint [PT11].
- Parameters:
test_rhythm (
Rhythm) – The rhythm to be tested.reference_rhythm (
Rhythm) – The rhythm to whichtest_rhythmwill be compared.smallest_note_value (
Fraction|float, default:Fraction(1, 16)) – The smallest note value that is used in the underlying grid. 16 means 1/16th notes, 4 means 1/4th notes, etc.
- Return type:
Examples
>>> from thebeat.music import Rhythm >>> test_rhythm = Rhythm.from_note_values([1/4, 1/4, 1/4, 1/4]) >>> reference_rhythm = Rhythm.from_note_values([1/4, 1/8, 1/8, 1/4, 1/4]) >>> print(edit_distance_rhythm(test_rhythm, reference_rhythm)) 1
- thebeat.stats.edit_distance_sequence(test_sequence, reference_sequence, resolution)[source]#
Calculates the edit/Levenshtein distance between two sequences.
Requires for all the IOIs in a Sequence to be multiples of ‘resolution’. If needed, quantize the Sequence first, e.g. using Sequence.quantize_iois().
Note
The resolution also represents the underlying grid. If, for example, the resolution is 50, that means that a grid will be created with steps of 50. The onsets of the sequence are then placed on the grid for both sequences. The resulting sequences consist of ones and zeros, where ones represent the event onsets. This string for
test_sequenceis compared to the string of thereference_sequence. Note thattest_sequenceandreference_sequencecan be interchanged without an effect on the results.
- thebeat.stats.fft_plot(sequence, unit_size, x_min=None, x_max=None, remove_dc=True, title='Fourier transform', x_axis_label='Cycles per unit', y_axis_label='Absolute power', figsize=None, dpi=100, ax=None)[source]#
Plots the Fourier transform of a
Sequenceobject. Theunit_sizeparameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.On the x axis is plotted the number of cycles per unit, and on the y axis the absolute power. The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.
Note
In most beat-finding applications you will want to set the
x_maxargument to something reasonable. The Fourier transform is plotted for all possible frequencies until the Nyquist frequency (half the value ofunit_size). However, in most cases you will not be interested in frequencies that are higher than the beat frequency. For instance, if you have a sequence with a beat frequency of 2 Hz, you will not be interested in frequencies higher than, say, 20 Hz. In that case, you can setx_maxto 20.- Parameters:
sequence (
Sequence) – The sequence.unit_size (
float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.x_min (
float|None, default:None) – The minimum number of cycles per unit to be plotted.x_max (
float|None, default:None) – The maximum number of cycles per unit to be plotted.remove_dc (
bool, default:True) – If True, the DC component is removed from the Fourier transform by subtracting the mean from the signal. The DC component is the first value of the Fourier transform and represents the average of the sequence.title (
str, default:'Fourier transform') – The title of the plot.x_axis_label (
str, default:'Cycles per unit') – The label of the x axis.y_axis_label (
str, default:'Absolute power') – The label of the y axis.figsize (
tuple|None, default:None) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1). This refers to thefigsizeparameter inmatplotlib.pyplot.figure().dpi (
int, default:100) – The resolution of the plot in dots per inch.ax (
Axes|None, default:None) – A matplotlib Axes object to plot on. If not provided, a new figure and axes will be created.
- Return type:
- Returns:
fig– The matplotlib Figure object.ax– The matplotlib Axes object.
Examples
>>> from thebeat import Sequence >>> from thebeat.stats import fft_plot >>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25) # milliseconds >>> fft_plot(seq, unit_size=1000) (<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
>>> seq = Sequence.generate_random_normal(n_events=100, mu=0.5, sigma=0.025) # seconds >>> fft_plot(seq, unit_size=1, x_max=5) (<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
- thebeat.stats.fft_values(sequence, unit_size, x_min=None, x_max=None, remove_dc=True)[source]#
Gets the x and y values for a Fourier transform of a
Sequenceobject. Theunit_sizeparameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.The x values indicate the number of cycles per unit, and y values indicate the absolute power (amplitude squared). The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.
Note
If a sequence ends with an event, the last event is not included in the calculation of the Fourier transform. This is because the Fourier transform assumes that the sequence can be repeated indefinitely. In other words, we estimate the Fourier transform on the basis of the IOIs, not on the basis of the onsets.
- Parameters:
sequence (
Sequence) – The sequence.unit_size (
float) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.x_min (
float|None, default:None) – The minimum number of cycles per unit to be plotted.x_max (
float|None, default:None) – The maximum number of cycles per unit to be plotted.remove_dc (
bool, default:True) – If True, the DC component is removed from the Fourier transform by subtracting the mean from the signal. The DC component is the first value of the Fourier transform and represents the average of the sequence.
- Return type:
- Returns:
xf– The x values.yf– The y values. Note that absolute values are returned.
Examples
>>> from thebeat import Sequence >>> from thebeat.stats import fft_values >>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25) # milliseconds >>> xf, yf = fft_values(seq, unit_size=1000, x_max=10)
- thebeat.stats.get_cov(sequence)[source]#
Calculate the coefficient of variantion of the inter-onset intervals (IOIS) in a
thebeat.core.Sequenceobject.- Parameters:
sequence (
Sequence) – Athebeat.core.Sequenceobject.- Returns:
The covariance of the sequence.
- Return type:
- thebeat.stats.get_interval_ratios_from_dyads(sequence)[source]#
Return sequential interval ratios, calculated as:
\(\textrm{ratio}_k = \frac{\textrm{IOI}_k}{\textrm{IOI}_k + \textrm{IOI}_{k+1}}\).
Note that for n IOIs this property returns n-1 ratios.
- Parameters:
sequence (np.array | thebeat.core.Sequence | list) – The sequence from which to calculate the interval ratios. Can be a Sequence object, or a list or array of IOIs.
Notes
The used method is based on the methodology from Roeske et al. [RTPJ20].
- thebeat.stats.get_npvi(sequence)[source]#
This function calculates the normalized pairwise variability index (nPVI) for a provided
SequenceorSoundSequenceobject, or for an interable of inter-onset intervals (IOIs).- Parameters:
sequence (
Sequence) – Either aSequenceorSoundSequenceobject, or an iterable containing inter-onset intervals (IOIs).- Returns:
The nPVI for the provided sequence.
- Return type:
numpy.float64
Notes
The normalied pairwise variability index (nPVI) is a measure of the variability of adjacent temporal intervals. The nPVI is zero for sequences that are perfectly isochronous. See Jadoul et al. [JRT+16] and Ravignani and Norton [RN17] for more information on its use in rhythm research.
Examples
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=500) >>> print(get_npvi(seq)) 0.0
>>> rng = np.random.default_rng(seed=123) >>> seq = thebeat.core.Sequence.generate_random_normal(n_events=10,mu=500,sigma=50,rng=rng) >>> print(get_npvi(seq)) 9.40657936323865
- thebeat.stats.get_phase_differences(test_events, reference_sequence, reference_ioi='preceding', window_size=None, unit='degrees', modulo=True)[source]#
Get the phase differences for
test_eventscompared toreference_sequence.Phase differences are a (circular) measure of temporal alignment. They are calculated as the difference between the onset of a test event and the onset of the corresponding reference event, divided by the IOI of the reference IOI:
\(\textrm{phase difference} = \frac{\textrm{test onset} - \textrm{reference onset}}{\textrm{reference IOI}}\)
The resulting phase differences are expressed in the unit specified by
unit, where the default is in degrees.The reference IOI can either be the IOI in the reference sequence that ‘contains’ an onset in the test sequence (
reference_ioi='containing'), or the IOI in the reference sequences that precedes this onset (reference_ioi='preceding'). The default is for the reference IOI to be the one that precedes the test onset. Let us consider a reference sequence with onsets:\(t = 0\), \(t = 250\), \(t = 1000\)
If an onset in the test sequence is at \(t = 750\), and
reference_ioi='containing', the reference IOI that will be used is the one from \(t = 250\) to \(t = 1000\). Ifreference_ioi='preceding', the reference IOI that will be used is the one from \(t = 0\) to \(t = 250\).In addition, one can specify a window size to get a (moving) average of the preceding IOIs. This is only used if
reference_ioi='preceding'. The window size determines the number of reference IOIs (up to and including the immediately ‘preceding’ one) that are used to calculate a mean reference IOI. This can be useful to smoothe out the reference IOI when using irregular (reference) sequences.Finally, one can specify whether modular arithmetic should be used. For degrees, if
modulo=True, the phase differences will be expressed in the range \([0, 360)\) degrees, \([0, 2\pi)\) radians, or \([0, 1)\) (as a plain ‘fraction’). Ifmodulo=False, the phase differences will be expressed in the range \([0, \infty)\), with the event at the start of the containing interval corresponding to 0 for both the ‘containing’ and ‘preceding’ values.Note
In cases where it is not possible to calculate a phase difference,
np.nanis returned. This can happen in the following cases: - The test event is before the first onset in the reference sequence. - The test event is after the last onset in the reference sequence, andreference_ioi='containing'. - The test event is in the nth interval of the reference sequence, andreference_ioi='preceding'and awindow_sizeis larger than n.Examples
>>> from thebeat.core import Sequence >>> reference = Sequence.from_onsets([0, 250, 1000]) >>> test = Sequence.from_onsets([250, 1250]) >>> get_phase_differences(test, reference, reference_ioi='preceding', unit='fraction') array([0. , 0.33333333])
- Parameters:
test_events (
Sequence|list[Real] |ndarray|Real) – A sequence or a single time point to be compared with the reference sequence. Can either be a single event time, or a Sequence, list, or NumPy array containing multiple events.reference_sequence (
Sequence) – The reference sequence. Can be a Sequence object, a list or array of Sequence objects, or a number. In the latter case, the reference sequence will be an isochronous sequence with a constant IOI of that number and the same length assequence_1.reference_ioi (
str, default:'preceding') – The IOI in the reference sequence that is used as the reference IOI. Can be either “containing” or “preceding”.window_size (
int|None, default:None) – The window size used for calculating the mean reference IOI. Only used ifreference_ioi='preceding'.circular_unit – The unit of the circular unit. Can be “degrees” (the default), “radians”, or “fraction”.
modulo (
bool, default:True) – Return the phase differences modulo 360 degrees or not. Only has an effect ifreference_ioi='preceding'.
- Return type:
- Returns:
numpy.ndarray– An array containing the phase differences if a Sequence, list, or NumPy array was passed.float– The phase difference if a single event time was passed.
- thebeat.stats.get_rhythmic_entropy(sequence, smallest_unit)[source]#
Calculate Shannon entropy from bins. This is a measure of rhythmic complexity. If many different ‘note durations’ are present, entropy is high. If only a few are present, entropy is low. A sequence that is completely isochronous has a Shannon entropy of 0.
The smallest_unit determines the size of the bins/the underlying grid. In musical terms, this for instance represents a 1/16th note. Bins will then be made such that each bin has a width of one 1/16th note. A 1/4th note will then be contained in one of those bins. Sequence needs to be quantized to multiples of ‘smallest_unit’. If needed, quantize the Sequence first, e.g. using ‘Sequence.quantize_iois’.
- Parameters:
sequence (
Sequence|Rhythm) – Thethebeat.core.Sequenceobject for which Shannon entropy is calculated.smallest_unit (
float) – The size of the bins/the underlying grid.
Example
>>> seq = thebeat.Sequence.generate_isochronous(n_events=10, ioi=500) >>> print(get_rhythmic_entropy(seq, smallest_unit=250)) 0.0
- thebeat.stats.get_ugof_isochronous(test_sequence, reference_ioi, output_statistic='mean')[source]#
This function calculates the universal goodness of fit (
ugof) measure. Theugofstatistic quantifies how well a theoretical sequence describes a sequence at hand (thetest_sequence). This function can only calculateugofusing a theoretical sequence that is isochronous.The
reference_ioiis the IOI of an isochronous theoretical sequence.- Parameters:
test_sequence (
Sequence) – ASequenceorSoundSequenceobject that will be compared toreference_sequence.reference_ioi (
float) – A number (float or int) representing the IOI of an isochronous sequence.output_statistic (
str, default:'mean') – Either ‘mean’ (the default) or ‘median’. This determines whether for the individual ugof values we take the mean or the median as the output statistic.
- Returns:
The ugof statistic.
- Return type:
numpy.float64
Notes
This measure is described in Burchardt et al. [BBKnornschild21]. Please also refer to this Github page for an R implementation of the ugof measure.
Examples
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=1000) >>> ugof = get_ugof_isochronous(seq, reference_ioi=68.21) >>> print(ugof) 0.4646435
- thebeat.stats.ks_test(sequence, reference_distribution='normal', alternative='two-sided')[source]#
This function returns the D statistic and p value of a one-sample Kolmogorov-Smirnov test. It calculates how different the distribution of inter-onset intervals (IOIs) is compared to the provided reference distribution.
If p is significant it means that the IOIs are not distributed according to the provided reference distribution.
- Parameters:
reference_distribution (
str, default:'normal') – Either ‘normal’ or ‘uniform’. The distribution against which the distribution of inter-onset intervals (IOIs) is compared.alternative (
str, default:'two-sided') – Either ‘two-sided’, ‘less’, or ‘greater’. Seescipy.stats.kstest()for more information.
- Returns:
A SciPy named tuple containing the D statistic and the p value.
- Return type:
scipy.stats._stats_py.KstestResult
Notes
This function uses
scipy.stats.kstest(). For more information about the use of the Kolmogorov-Smirnov test in rhythm research, see Jadoul et al. [JRT+16] and Ravignani and Norton [RN17].Examples
>>> rng = np.random.default_rng(seed=123) >>> seq = thebeat.core.Sequence.generate_random_normal(n_events=100,mu=500,sigma=25,rng=rng) >>> ks_result = ks_test(seq) >>> print(round(ks_result.pvalue, 5)) 0.6608