Beat-finding using autocorrelations#
In this snippet we learn how to do some beat-finding using autocorrelations.
First, let’s import the necessary functions and classes:
[1]:
from thebeat.core import Sequence
from thebeat.stats import acf_df, acf_plot, acf_values
import numpy as np
import scipy.signal
Plotting ACF#
The autocorrelation function (ACF) is best represented in a plot, where on the x axis we have the autocorrelation lags (which for us correspond to timestamps), and on the y axis we have the correlation coefficient.
We can use such a function to find an underlying inter-onset interval (IOI) that describes our sequence well.
First let’s create a Sequence
object with inter-onset intervals (IOIs) sampled from a normal distribution
(so we have some noise in there), and then plot the sequence using thebeat.stats.acf_plot()
. We supply a resolution; for sequences that use milliseconds as the time unit, this will likely be 1. For seconds, it will be 0.01.
[3]:
# We use a Generator object with a seed so you will get the same results as we:
rng = np.random.default_rng(seed=123)
seq = Sequence.generate_random_normal(n_events=20, mu=500, sigma=40, rng=rng)
acf_plot(seq, resolution=1);
Now, that’s a bit disappointing. We don’t see any clear peaks, even though we might expect one at the mean of the distribution (500 ms).
To improve the plot, we use the smoothing parameters, which under the hood convolve tiny normal distributions
with the acf function at each step (where the steps are of size resolution_ms
).
So let’s try that:
[4]:
acf_plot(seq, resolution=1, smoothing_window=100, smoothing_sd=20);
Now that looks way better, and now we do see clear peaks at 500 ms, 1000 ms, etc.
Of course, the more we smoothe, the less accurately that IOI represents the sequence. So be careful with that, and
afterwards use e.g. the thebeat.stats.get_ugof()
function to calculate how well the sequence is described by the found IOI.
Finding the IOIs#
From the plot we cannot say exactly where the peaks lie. If we want to find the exact values we can use
scipy.signal.find_peaks()
.
For this method, we need the actual values from the ACF function, which we can get using thebeat.stats.acf_values()
.
Then, we use scipy.signal.find_peaks()
to find the peaks.
[5]:
# We use a Generator object with a seed so you will get the same results as we:
correlation_factors = acf_values(seq, resolution=1, smoothing_window=100, smoothing_sd=20)
peaks = scipy.signal.find_peaks(correlation_factors)
print(peaks)
(array([ 1, 501, 1021, 1518, 1998, 2062, 2492, 2547, 3040, 3475, 3540,
4042, 4527, 5018, 5523, 6046, 6535, 7089, 7577, 7656, 8061, 8139,
8615, 9093, 9553]), {})
As such, we can see where the peaks lie.
An important thing to note is that if you use a different resolution
than 1 (which is however the default), the peaks will correspond to the indices of acf_values
, where the indices are not timestamps.
Multiply the array that thebeat.stats.acf_values()
returns by resolution
to get the original timestamps again.
Also note that scipy.signal.find_peaks()
returns a tuple
containing an array with the peak values, and a properties
dictionary, which is empty here. To get the array
we must therefore use peaks[0]
.
[6]:
resolution = 10
correlation_factors = acf_values(seq,
resolution=resolution,
smoothing_window=100,
smoothing_sd=20)
peaks = scipy.signal.find_peaks(correlation_factors)
peaks = peaks[0] * resolution
print(peaks)
[ 490 1020 1520 2070 2520 3050 3570 4030 4530]
Plotting#
We can then easily plot a Sequence and its isochronous description using thebeat.visualization.plot_multiple_sequences
:
[7]:
# Plot
from thebeat.visualization import plot_multiple_sequences
from thebeat.stats import get_ugof_isochronous
# Make isochronous sequence
seq_isoc = Sequence.generate_isochronous(n_events=20, ioi=peaks[0])
fig, ax = plot_multiple_sequences([seq_isoc, seq],
figsize=(10, 4),
y_axis_labels=['Theoretical beat', 'Random sequence'],
suppress_display=True)
# Add box with ugof
ugof_round = str(round(get_ugof_isochronous(seq, peaks[1]), 2))
box_properties = dict(boxstyle='round', facecolor='white', alpha=0.7)
ax.text(8400, 1.25, s=f"ugof = {ugof_round}", bbox=box_properties, fontsize=14);
fig.show()
Getting a Pandas dictionary of correlation factors#
The thebeat.stats
module contains an additional function, thebeat.stats.acf_df()
, which returns a pandas.DataFrame
containing the timestamps and the correlation factors.
[8]:
seq = Sequence.generate_random_normal(n_events=20, mu=500, sigma=25, rng=rng)
df = acf_df(seq, resolution=1, smoothing_window=10, smoothing_sd=2)
print(df)
timestamps correlation
0 0 1.000000
1 1 0.931255
2 2 0.763098
3 3 0.548497
4 4 0.343360
... ... ...
9621 9621 0.023837
9622 9622 0.014014
9623 9623 0.006382
9624 9624 0.001945
9625 9625 0.000000
[9626 rows x 2 columns]
Let’s sort this dataframe by the correlation factor:
[9]:
df = df.sort_values(by="correlation", ascending=False)