Stats#
- thebeat.stats.acf_df(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation analysis on a
Sequence
object, and return aPandas.DataFrame
object containing the results.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution
.smoothing_window (
Optional
[float
], default:None
) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
DataFrame containing two columns: the timestamps, and the autocorrelation factor.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
Examples
>>> rng = np.random.default_rng(seed=123) # for reproducability >>> seq = thebeat.core.Sequence.generate_random_uniform(n_events=10,a=400,b=600,rng=rng) >>> df = acf_df(seq, smoothing_window=50, smoothing_sd=20, resolution=10) >>> print(df.head(3)) timestamp correlation 0 0 1.000000 1 10 0.824706 2 20 0.547863
- thebeat.stats.acf_plot(sequence, resolution, min_lag=None, max_lag=None, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Autocorrelation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)[source]#
This function can be used for plotting an autocorrelation plot from a
Sequence
.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration_in_ms / resolution
.min_lag (
Optional
[float
], default:None
) – The minimum lag to be plotted. Defaults to 0.max_lag (
Optional
[float
], default:None
) – The maximum lag to be plotted. Defaults to the sequence duration.smoothing_window (
Optional
[float
], default:None
) – The window (in milliseconds) within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.style (
str
, default:'seaborn-v0_8'
) – Style used by matplotlib. See matplotlib style sheets reference.title (
str
, default:'Autocorrelation'
) – If desired, one can provide a title for the plot. This takes precedence over using theSequence
orSoundSequence
name
attribute as the title of the plot (if the object has one).x_axis_label (
str
, default:'Lag'
) – A label for the x axis.y_axis_label (
str
, default:'Correlation'
) – A label for the y axis.figsize (
Optional
[tuple
], default:None
) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1)
. This refers to thefigsize
parameter inmatplotlib.pyplot.figure()
.dpi (
int
, default:100
) – The desired output resolution of the plot in dots per inch (DPI). This refers to thedpi
parameter inmatplotlib.pyplot.figure()
.ax (
Optional
[Axes
], default:None
) – If desired, one can provide an existingmatplotlib.axes.Axes
object to plot the autocorrelation plot on. This is for instance useful if you want to plot multiple autocorrelation plots on the same figure.suppress_display (
bool
, default:False
) – IfTrue
,matplotlib.pyplot.show()
is not run.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.acf_values(sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation. This function takes a
Sequence
object, and returns an array with steps ofresolution
of unstandardized correlation factors.- Parameters:
resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution
.smoothing_window (
Optional
[float
], default:None
) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
This function uses the
numpy.correlate()
to calculate the correlations.
- thebeat.stats.ccf_df(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Perform autocorrelation analysis on a
Sequence
object, and return aPandas.DataFrame
object containing the results.- Parameters:
test_sequence (
Sequence
) – The test sequence.reference_sequence (
Sequence
) – The reference sequence.resolution – The temporal resolution. If the used Sequence is in seconds, you might want to use 0.001. If the Sequence is in milliseconds, try using 1. Incidentally, the number of lags for the autocorrelation function is calculated as
n_lags = sequence_duration / resolution
.smoothing_window (
Optional
[float
], default:None
) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
DataFrame containing two columns: the timestamps, and the cross-correlation factor.
- Return type:
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_plot(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None, style='seaborn-v0_8', title='Cross-correlation', x_axis_label='Lag', y_axis_label='Correlation', figsize=None, dpi=100, ax=None, suppress_display=False)[source]#
Calculate and plot the cross-correlation function (CCF) between two
Sequence
objects. The test sequence is compared to the reference sequence.- Parameters:
test_sequence (
Sequence
) – The test sequence.reference_sequence (
Sequence
) – The reference sequence.resolution – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.
smoothing_window (
Optional
[float
], default:None
) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.style (
str
, default:'seaborn-v0_8'
) – The matplotlib style to use. See matplotlib style reference.title (
str
, default:'Cross-correlation'
) – The title of the plot.x_axis_label (
str
, default:'Lag'
) – The label of the x axis.y_axis_label (
str
, default:'Correlation'
) – The label of the y axis.figsize (
Optional
[tuple
], default:None
) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1)
. This refers to thefigsize
parameter inmatplotlib.pyplot.figure()
.dpi (
int
, default:100
) – The resolution of the plot in dots per inch. This refers to thedpi
parameter inmatplotlib.pyplot.figure()
.ax (
Optional
[Axes
], default:None
) – Amatplotlib.axes.Axes
object. IfNone
, a new Figure and Axes is created.suppress_display (
bool
, default:False
) – IfTrue
, the plot is not displayed. This is useful e.g. if you only want to save the plot to a file
- Return type:
- Returns:
fig
– Thematplotlib.figure.Figure
object.ax
– Thematplotlib.axes.Axes
object.
Notes
This function is based on the procedure described in Ravignani and Norton [RN17]. There, one can also find a more detailed description of the smoothing procedure.
- thebeat.stats.ccf_values(test_sequence, reference_sequence, resolution, smoothing_window=None, smoothing_sd=None)[source]#
Returns the unstandardized cross-correlation function (CCF) for two
Sequence
objects. The test sequence is compared to the reference sequence.- Parameters:
test_sequence (
Sequence
) – The test sequence.reference_sequence (
Sequence
) – The reference sequence.resolution (
float
) – The temporal resolution. If the used Sequence is in milliseconds, you probably want 1. If the Sequence is in seconds, try using 0.001.smoothing_window (
Optional
[float
], default:None
) – The window within which a normal probability density function is used for smoothing out the analysis.smoothing_sd (
Optional
[float
], default:None
) – The standard deviation of the normal probability density function used for smoothing out the analysis.
- Returns:
The unstandardized cross-correlation function.
- Return type:
correlation
- thebeat.stats.edit_distance_rhythm(test_rhythm, reference_rhythm, smallest_note_value=16)[source]#
Caculates edit/Levenshtein distance between two rhythms. The
smallest_note_value
determines the underlying grid that is used. If e.g. 16, the underlying grid is composed of 1/16th notes.Note
Based on the procedure described in Post and Toussaint [PT11].
- Parameters:
- Return type:
Examples
>>> from thebeat.music import Rhythm >>> test_rhythm = Rhythm.from_fractions([1/4, 1/4, 1/4, 1/4]) >>> reference_rhythm = Rhythm.from_fractions([1/4, 1/8, 1/8, 1/4, 1/4]) >>> print(edit_distance_rhythm(test_rhythm, reference_rhythm)) 1
- thebeat.stats.edit_distance_sequence(test_sequence, reference_sequence, resolution)[source]#
Calculates the edit/Levenshtein distance between two sequences.
Requires for all the IOIs in a Sequence to be multiples of ‘resolution’. If needed, quantize the Sequence first, e.g. using Sequence.quantize_iois().
Note
The resolution also represents the underlying grid. If, for example, the resolution is 50, that means that a grid will be created with steps of 50. The onsets of the sequence are then placed on the grid for both sequences. The resulting sequences consist of ones and zeros, where ones represent the event onsets. This string for
test_sequence
is compared to the string of thereference_sequence
. Note thattest_sequence
andreference_sequence
can be interchanged without an effect on the results.
- thebeat.stats.fft_plot(sequence, unit_size, x_min=None, x_max=None, style='seaborn-v0_8', title='Fourier transform', x_axis_label='Cycles per unit', y_axis_label='Absolute power', figsize=None, dpi=100, ax=None, suppress_display=False)[source]#
Plots the Fourier transform of a
Sequence
object. Theunit_size
parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.On the x axis is plotted the number of cycles per unit, and on the y axis the absolute power. The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.
Note
In most beat-finding applications you will want to set the
x_max
argument to something reasonable. The Fourier transform is plotted for all possible frequencies until the Nyquist frequency (half the value ofunit_size
). However, in most cases you will not be interested in frequencies that are higher than the beat frequency. For instance, if you have a sequence with a beat frequency of 2 Hz, you will not be interested in frequencies higher than, say, 20 Hz. In that case, you can setx_max
to 20.- Parameters:
sequence (
Sequence
) – The sequence.unit_size (
float
) – The size of the unit in which the sequence is measured. If the sequence is in milliseconds, you probably want 1000. If the sequence is in seconds, you probably want 1.x_min (
Optional
[float
], default:None
) – The minimum number of cycles per unit to be plotted.x_max (
Optional
[float
], default:None
) – The maximum number of cycles per unit to be plotted.style (
str
, default:'seaborn-v0_8'
) –The matplotlib style to use. See matplotlib style reference.
title (
str
, default:'Fourier transform'
) – The title of the plot.x_axis_label (
str
, default:'Cycles per unit'
) – The label of the x axis.y_axis_label (
str
, default:'Absolute power'
) – The label of the y axis.figsize (
Optional
[tuple
], default:None
) – A tuple containing the desired output size of the plot in inches, e.g.(4, 1)
. This refers to thefigsize
parameter inmatplotlib.pyplot.figure()
.dpi (
int
, default:100
) – The resolution of the plot in dots per inch.ax (
Optional
[Axes
], default:None
) – A matplotlib Axes object to plot on. If not provided, a new figure and axes will be created.suppress_display (
bool
, default:False
) – If True, the plot will not be displayed.
- Return type:
- Returns:
fig
– The matplotlib Figure object.ax
– The matplotlib Axes object.
Examples
>>> from thebeat import Sequence >>> from thebeat.stats import fft_plot >>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25) # milliseconds >>> fft_plot(seq, unit_size=1000) (<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
>>> seq = Sequence.generate_random_normal(n_events=100, mu=0.5, sigma=0.025) # seconds >>> fft_plot(seq, unit_size=1, x_max=5) (<Figure size 800x550 with 1 Axes>, <Axes: title={'center': 'Fourier transform'}, xlabel='Cycles per unit', ylabel='Absolute power'>)
- thebeat.stats.fft_values(sequence, unit_size, x_min=None, x_max=None)[source]#
Gets the x and y values for a Fourier transform of a
Sequence
object. Theunit_size
parameter is required, because Sequence objects are agnostic about the used time unit. You can use 1000 if the Sequence is in milliseconds, and 1 if the Sequence is in seconds.The x values indicate the number of cycles per unit, and y values the absolute power. The number of cycles per unit can be interpreted as the beat frequency. For instance, 2 cycles for a unit size of 1000 ms means a beat frequency of 2 Hz.
- Parameters:
- Return type:
- Returns:
xf
– The x values.yf
– The y values. Note that absolute values are returned.
Examples
>>> from thebeat import Sequence >>> from thebeat.stats import fft_values >>> seq = Sequence.generate_random_normal(n_events=100, mu=500, sigma=25) # milliseconds >>> xf, yf = fft_values(seq, unit_size=1000, x_max=10)
- thebeat.stats.get_cov(sequence)[source]#
Calculate the coefficient of variantion of the inter-onset intervals (IOIS) in a
thebeat.core.Sequence
object.- Parameters:
sequence (
Sequence
) – Athebeat.core.Sequence
object.- Returns:
The covariance of the sequence.
- Return type:
- thebeat.stats.get_interval_ratios_from_dyads(sequence)[source]#
Return sequential interval ratios, calculated as:
\(\textrm{ratio}_k = \frac{\textrm{IOI}_k}{\textrm{IOI}_k + \textrm{IOI}_{k+1}}\).
Note that for n IOIs this property returns n-1 ratios.
- Parameters:
sequence (np.array | thebeat.core.Sequence | list) – The sequence from which to calculate the interval ratios. Can be a Sequence object, or a list or array of IOIs.
Notes
The used method is based on the methodology from Roeske et al. [RTPJ20].
- thebeat.stats.get_npvi(sequence)[source]#
This function calculates the normalized pairwise variability index (nPVI) for a provided
Sequence
orSoundSequence
object, or for an interable of inter-onset intervals (IOIs).- Parameters:
sequence (
Sequence
) – Either aSequence
orSoundSequence
object, or an iterable containing inter-onset intervals (IOIs).- Returns:
The nPVI for the provided sequence.
- Return type:
numpy.float64
Notes
The normalied pairwise variability index (nPVI) is a measure of the variability of adjacent temporal intervals. The nPVI is zero for sequences that are perfectly isochronous. See Jadoul et al. [JRT+16] and Ravignani and Norton [RN17] for more information on its use in rhythm research.
Examples
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=500) >>> print(get_npvi(seq)) 0.0
>>> rng = np.random.default_rng(seed=123) >>> seq = thebeat.core.Sequence.generate_random_normal(n_events=10,mu=500,sigma=50,rng=rng) >>> print(get_npvi(seq)) 4.703289681619325
- thebeat.stats.get_phase_differences(test_events, reference_sequence, reference_ioi='preceding', window_size=None, unit='degrees', modulo=True)[source]#
Get the phase differences for
test_events
compared toreference_sequence
.Phase differences are a (circular) measure of temporal alignment. They are calculated as the difference between the onset of a test event and the onset of the corresponding reference event, divided by the IOI of the reference IOI:
\(\textrm{phase difference} = \frac{\textrm{test onset} - \textrm{reference onset}}{\textrm{reference IOI}}\)
The resulting phase differences are expressed in the unit specified by
unit
, where the default is in degrees.The reference IOI can either be the IOI in the reference sequence that ‘contains’ an onset in the test sequence (
reference_ioi='containing'
), or the IOI in the reference sequences that precedes this onset (reference_ioi='preceding'
). The default is for the reference IOI to be the one that precedes the test onset. Let us consider a reference sequence with onsets:\(t = 0\), \(t = 250\), \(t = 1000\)
If an onset in the test sequence is at \(t = 750\), and
reference_ioi='containing'
, the reference IOI that will be used is the one from \(t = 250\) to \(t = 1000\). Ifreference_ioi='preceding'
, the reference IOI that will be used is the one from \(t = 0\) to \(t = 250\).In addition, one can specify a window size to get a (moving) average of the preceeding IOIs. This is only used if
reference_ioi='preceding'
. The window size determines the number of reference IOIs (up to and including the immediately ‘preceding’ one) that are used to calculate a mean reference IOI. This can be useful to smoothe out the reference IOI when using irregular (reference) sequences.Finally, one can specify whether modular arithmetic should be used. For degrees, if
modulo=True
, the phase differences will be expressed in the range \([0, 360)\) degrees, \([0, 2\pi)\) radians, or \([0, 1)\) (as a plain ‘fraction’). Ifmodulo=False
, the phase differences will be expressed in the range \([0, \infty)\), with the event at the start of the containing interval corresponding to 0 for both the ‘containing’ and ‘preceeding’ values.Note
In cases where it is not possible to calculate a phase difference,
np.nan
is returned. This can happen in the following cases: - The test event is before the first onset in the reference sequence. - The test event is after the last onset in the reference sequence, andreference_ioi='containing'
. - The test event is in the nth interval of the reference sequence, andreference_ioi='preceding'
and awindow_size
is larger than n.Examples
>>> from thebeat.core import Sequence >>> reference = Sequence.from_onsets([0, 250, 1000]) >>> test = Sequence.from_onsets([250, 1250]) >>> get_phase_differences(test, reference, reference_ioi='preceding', unit='fraction') array([0. , 0.33333333])
- Parameters:
test_events (
Sequence
|list
[Real
] |ndarray
|Real
) – A sequence or a single time point to be compared with the reference sequence. Can either be a single event time, or a Sequence, list, or NumPy array containing multiple events.reference_sequence (
Sequence
) – The reference sequence. Can be a Sequence object, a list or array of Sequence objects, or a number. In the latter case, the reference sequence will be an isochronous sequence with a constant IOI of that number and the same length assequence_1
.reference_ioi (
str
, default:'preceding'
) – The IOI in the reference sequence that is used as the reference IOI. Can be either “containing” or “preceding”.window_size (
Optional
[int
], default:None
) – The window size used for calculating the mean reference IOI. Only used ifreference_ioi='preceding'
.circular_unit – The unit of the circular unit. Can be “degrees” (the default), “radians”, or “fraction”.
modulo (
bool
, default:True
) – Return the phase differences modulo 360 degrees or not. Only has an effect ifreference_ioi='preceding'
.
- Return type:
- Returns:
numpy.ndarray
– An array containing the phase differences if a Sequence, list, or NumPy array was passed.float
– The phase difference if a single event time was passed.
- thebeat.stats.get_rhythmic_entropy(sequence, smallest_unit)[source]#
Calculate Shannon entropy from bins. This is a measure of rhythmic complexity. If many different ‘note durations’ are present, entropy is high. If only a few are present, entropy is low. A sequence that is completely isochronous has a Shannon entropy of 0.
The smallest_unit determines the size of the bins/the underlying grid. In musical terms, this for instance represents a 1/16th note. Bins will then be made such that each bin has a width of one 1/16th note. A 1/4th note will then be contained in one of those bins. Sequence needs to be quantized to multiples of ‘smallest_unit’. If needed, quantize the Sequence first, e.g. using ‘Sequence.quantize_iois’.
- Parameters:
sequence (
Sequence
|Rhythm
) – Thethebeat.core.Sequence
object for which Shannon entropy is calculated.smallest_unit (
float
) – The size of the bins/the underlying grid.
Example
>>> seq = thebeat.Sequence.generate_isochronous(n_events=10, ioi=500) >>> print(get_rhythmic_entropy(seq, smallest_unit=250)) 0.0
- thebeat.stats.get_ugof_isochronous(test_sequence, reference_ioi, output_statistic='mean')[source]#
This function calculates the universal goodness of fit (
ugof
) measure. Theugof
statistic quantifies how well a theoretical sequence describes a sequence at hand (thetest_sequence
). This function can only calculateugof
using a theoretical sequence that is isochronous.The
reference_ioi
is the IOI of an isochronous theoretical sequence.- Parameters:
test_sequence (
Sequence
) – ASequence
orSoundSequence
object that will be compared toreference_sequence
.reference_ioi (
float
) – A number (float or int) representing the IOI of an isochronous sequence.output_statistic (
str
, default:'mean'
) – Either ‘mean’ (the default) or ‘median’. This determines whether for the individual ugof values we take the mean or the median as the output statistic.
- Returns:
The ugof statistic.
- Return type:
numpy.float64
Notes
This measure is described in Burchardt et al. [BBKnornschild21]. Please also refer to this Github page for an R implementation of the ugof measure.
Examples
>>> seq = thebeat.core.Sequence.generate_isochronous(n_events=10, ioi=1000) >>> ugof = get_ugof_isochronous(seq, reference_ioi=68.21) >>> print(ugof) 0.4646435
- thebeat.stats.ks_test(sequence, reference_distribution='normal', alternative='two-sided')[source]#
This function returns the D statistic and p value of a one-sample Kolmogorov-Smirnov test. It calculates how different the distribution of inter-onset intervals (IOIs) is compared to the provided reference distribution.
If p is significant it means that the IOIs are not distributed according to the provided reference distribution.
- Parameters:
reference_distribution (
str
, default:'normal'
) – Either ‘normal’ or ‘uniform’. The distribution against which the distribution of inter-onset intervals (IOIs) is compared.alternative (
str
, default:'two-sided'
) – Either ‘two-sided’, ‘less’ or ‘greater’. Seescipy.stats.kstest()
for more information.
- Returns:
A SciPy named tuple containing the D statistic and the p value.
- Return type:
scipy.stats._stats_py.KstestResult
Notes
This function uses
scipy.stats.kstest()
. For more information about the use of the Kolmogorov-Smirnov test in rhythm research, see Jadoul et al. [JRT+16] and Ravignani and Norton [RN17].Examples
>>> rng = np.random.default_rng(seed=123) >>> seq = thebeat.core.Sequence.generate_random_normal(n_events=100,mu=500,sigma=25,rng=rng) >>> print(ks_test(seq)) KstestResult(statistic=0.07176677141846549, pvalue=0.6608009345687911, statistic_location=496.3505526721194, statistic_sign=1)