Beat-finding using autocorrelations#

In this snippet we learn how to do some beat-finding using autocorrelations.

First, let’s import the necessary functions and classes:

[1]:
from thebeat.core import Sequence
from thebeat.stats import acf_df, acf_plot, acf_values
import numpy as np
import scipy.signal

Plotting ACF#

The autocorrelation function (ACF) is best represented in a plot, where on the x axis we have the autocorrelation lags (which for us correspond to timestamps), and on the y axis we have the correlation coefficient.

We can use such a function to find an underlying inter-onset interval (IOI) that describes our sequence well.

First let’s create a Sequence object with inter-onset intervals (IOIs) sampled from a normal distribution (so we have some noise in there), and then plot the sequence using thebeat.stats.acf_plot(). We supply a resolution; for sequences that use milliseconds as the time unit, this will likely be 1. For seconds, it will be 0.01.

[3]:
# We use a Generator object with a seed so you will get the same results as we:
rng = np.random.default_rng(seed=123)

seq = Sequence.generate_random_normal(n_events=20, mu=500, sigma=40, rng=rng)
acf_plot(seq, resolution=1);
../../_images/examples_snippets_autocorrelation_4_0.png

Now, that’s a bit disappointing. We don’t see any clear peaks, even though we might expect one at the mean of the distribution (500 ms).

To improve the plot, we use the smoothing parameters, which under the hood convolve tiny normal distributions with the acf function at each step (where the steps are of size resolution_ms).

So let’s try that:

[4]:
acf_plot(seq, resolution=1, smoothing_window=100, smoothing_sd=20);
../../_images/examples_snippets_autocorrelation_6_0.png

Now that looks way better, and now we do see clear peaks at 500 ms, 1000 ms, etc.

Of course, the more we smoothe, the less accurately that IOI represents the sequence. So be careful with that, and afterwards use e.g. the thebeat.stats.get_ugof() function to calculate how well the sequence is described by the found IOI.

Finding the IOIs#

From the plot we cannot say exactly where the peaks lie. If we want to find the exact values we can use scipy.signal.find_peaks().

For this method, we need the actual values from the ACF function, which we can get using thebeat.stats.acf_values(). Then, we use scipy.signal.find_peaks() to find the peaks.

[5]:
# We use a Generator object with a seed so you will get the same results as we:
correlation_factors = acf_values(seq, resolution=1, smoothing_window=100, smoothing_sd=20)

peaks = scipy.signal.find_peaks(correlation_factors)
print(peaks)
(array([   1,  501, 1021, 1518, 1998, 2062, 2492, 2547, 3040, 3475, 3540,
       4042, 4527, 5018, 5523, 6046, 6535, 7089, 7577, 7656, 8061, 8139,
       8615, 9093, 9553]), {})

As such, we can see where the peaks lie.


An important thing to note is that if you use a different resolution than 1 (which is however the default), the peaks will correspond to the indices of acf_values, where the indices are not timestamps. Multiply the array that thebeat.stats.acf_values() returns by resolution to get the original timestamps again.

Also note that scipy.signal.find_peaks() returns a tuple containing an array with the peak values, and a properties dictionary, which is empty here. To get the array we must therefore use peaks[0].

[6]:
resolution = 10

correlation_factors = acf_values(seq,
                                 resolution=resolution,
                                 smoothing_window=100,
                                 smoothing_sd=20)

peaks = scipy.signal.find_peaks(correlation_factors)
peaks = peaks[0] * resolution
print(peaks)
[ 490 1020 1520 2070 2520 3050 3570 4030 4530]

Plotting#

We can then easily plot a Sequence and its isochronous description using thebeat.visualization.plot_multiple_sequences:

[7]:
# Plot
from thebeat.visualization import plot_multiple_sequences
from thebeat.stats import get_ugof_isochronous

# Make isochronous sequence
seq_isoc = Sequence.generate_isochronous(n_events=20, ioi=peaks[0])

fig, ax = plot_multiple_sequences([seq_isoc, seq],
                                  figsize=(10, 4),
                                  y_axis_labels=['Theoretical beat', 'Random sequence'],
                                  suppress_display=True)


# Add box with ugof
ugof_round = str(round(get_ugof_isochronous(seq, peaks[1]), 2))
box_properties = dict(boxstyle='round', facecolor='white', alpha=0.7)
ax.text(8400, 1.25, s=f"ugof = {ugof_round}", bbox=box_properties, fontsize=14);
fig.show()
../../_images/examples_snippets_autocorrelation_13_0.png

Getting a Pandas dictionary of correlation factors#

The thebeat.stats module contains an additional function, thebeat.stats.acf_df(), which returns a pandas.DataFrame containing the timestamps and the correlation factors.

[8]:
seq = Sequence.generate_random_normal(n_events=20, mu=500, sigma=25, rng=rng)

df = acf_df(seq, resolution=1, smoothing_window=10, smoothing_sd=2)
print(df)
      timestamps  correlation
0              0     1.000000
1              1     0.931255
2              2     0.763098
3              3     0.548497
4              4     0.343360
...          ...          ...
9621        9621     0.023837
9622        9622     0.014014
9623        9623     0.006382
9624        9624     0.001945
9625        9625     0.000000

[9626 rows x 2 columns]

Let’s sort this dataframe by the correlation factor:

[9]:
df = df.sort_values(by="correlation", ascending=False)