Root mean square energy & Zero crossing rate¶
In the previous article we talked about the amplitude envelope of a digital signal as our first audio feature. Now let's look at two more common features: root mean square (RMS) energy and zero crossing rate (ZCR).
Both metrics are important for sound analysis.
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display
import IPython.display as ipd
melo_file = "./files/94_Melo.wav"
melo, sr = librosa.load(melo_file)
FRAME_SIZE = 1024
HOP_LENGTH = 512
Root mean square energy¶
The Root Mean Square (RMS) energy of an audio signal is a measure of its average power, which makes it a good proxy for loudness. In other words, RMS tells us how loud a sound is on average over a frame (and whether, on average, it will break your ears like at a creepy tech festival 😂), instead of looking only at its peaks.
$$RMS_t=\sqrt{\frac{1}{K}\sum_{k=tK}^{(t+1)K-1}s(k)^2}$$
Where:
- $K$ is the frame size (in samples) and $t$ is the frame index
- $s(k)^2$ is the energy of the kth sample
- $\sum_{k=tK}^{(t+1)K-1}s(k)^2$ is the sum of the energy of all samples in frame t
- the $\frac{1}{K}$ factor turns that sum into the mean energy of the frame
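To make the formula less abstract, here is a small worked example on a toy frame. The sample values and variable names are made up for illustration (they do not come from the Melo track), and the sine check simply relies on the fact that a pure sine of amplitude 1 has an RMS close to $1/\sqrt{2}$.

```python
import numpy as np

# Toy frame of K = 4 samples (made-up values, not taken from the Melo track)
frame = np.array([0.5, -0.5, 0.5, -0.5])

energy_per_sample = frame ** 2           # s(k)^2 for every sample
mean_energy = energy_per_sample.mean()   # (1/K) * sum of s(k)^2
rms_toy = np.sqrt(mean_energy)           # square root of the mean energy
print(rms_toy)  # 0.5

# Sanity check on a pure sine of amplitude 1: its RMS should be close to 1/sqrt(2)
n = np.arange(1000)
sine = np.sin(2 * np.pi * 5 * n / 1000)  # exactly 5 full cycles
rms_sine = np.sqrt(np.mean(sine ** 2))
print(rms_sine)
```

Note that every sample of the toy frame has a magnitude of 0.5, so the RMS is 0.5 as well, even though the plain mean of the samples is 0.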
If you want to dig deeper into RMS, you can read this good article here. For now we can rely on the librosa.feature.rms() function 😇
A wink to the people who have read my ML article: yes, it is the same math as our good old RMSE metric that we serve at every meal 😂, only applied to the samples of a frame instead of prediction errors.
RMSE Applications¶
RMS Energy is a foundational concept in audio processing with several important applications:
Loudness Measurement: RMS energy provides a more accurate representation of perceived loudness than peak amplitude levels or the amplitude envelope (AE).
Dynamic Range Processing: In dynamics processing, such as compression and limiting, RMS levels are often used to make more musically pleasing adjustments. Since RMS reflects energy over time rather than instantaneous peaks, it allows for processing that more closely aligns with human hearing.
Sound Normalization: Audio normalization adjusts the volume of audio tracks to a standard level. Using RMS energy for normalization gives a consistent perceived loudness across different audio sources, which is especially important in broadcasting, playlists, and albums. That's why all of Taylor's songs sound the same 😂 (it is not entirely true btw)
Noise Analysis: RMS energy can help distinguish between signal and noise. In environments with variable noise levels, measuring the RMS energy can indicate when the actual signal is present versus when noise dominates, aiding noise reduction and signal enhancement.
Audio Feature Extraction: In machine learning and audio analysis, RMS energy is a feature that can help classify and differentiate between audio signals, contributing to genre classification, mood detection, and activity recognition.
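As an illustration of the normalization use case, here is a minimal sketch that scales a signal so its overall RMS reaches a target value. The helper name `normalize_rms`, the target of 0.1 and the synthetic test signals are all assumptions made for this example; real loudness normalization standards are more involved than a single gain.

```python
import numpy as np

def normalize_rms(signal, target_rms=0.1, eps=1e-12):
    """Scale `signal` so that its overall RMS equals `target_rms`.

    Hypothetical helper for illustration; target_rms=0.1 is an arbitrary choice.
    """
    current_rms = np.sqrt(np.mean(signal ** 2))
    if current_rms < eps:
        # Silent input: there is nothing meaningful to scale
        return signal.copy()
    gain = target_rms / current_rms
    return signal * gain

# Two synthetic "tracks" with very different levels (made-up data)
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(22050)
loud = 0.5 * rng.standard_normal(22050)

quiet_norm = normalize_rms(quiet)
loud_norm = normalize_rms(loud)

for name, x in [("quiet", quiet_norm), ("loud", loud_norm)]:
    print(name, np.sqrt(np.mean(x ** 2)))
```

This sketch applies one gain to the whole signal and does nothing to prevent clipping, so a peak check would still be needed before writing the result to a file.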
rms_melo = librosa.feature.rms(y=melo, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
rms_melo.shape
(6731,)
frames = range(len(rms_melo))
t = librosa.frames_to_time(frames, hop_length=HOP_LENGTH)
plt.figure(figsize=(16, 9))
librosa.display.waveshow(melo, alpha=0.5)
plt.plot(t, rms_melo, color="r")
plt.ylim((-1, 1))
plt.title("94 Melo")
Text(0.5, 1.0, '94 Melo')
RMSE from scratch¶
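def rmse(signal, frame_size, hop_length):
    """Compute the RMS energy of each frame, following the formula above."""
    values = []
    # Slide over the signal in steps of hop_length samples
    for i in range(0, len(signal), hop_length):
        frame = signal[i:i + frame_size]
        # Trailing frames may be shorter than frame_size; dividing by frame_size
        # treats the missing samples as zeros
        rms_current_frame = np.sqrt(np.sum(frame ** 2) / frame_size)
        values.append(rms_current_frame)
    return np.array(values)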
rms_melo_perso = rmse(melo, FRAME_SIZE, HOP_LENGTH)
plt.figure(figsize=(16, 9))
librosa.display.waveshow(melo, alpha=0.5)
plt.plot(t, rms_melo, color="r")
plt.plot(t, rms_melo_perso, color="y")
plt.ylim((-1, 1))
plt.title("94 Melo")
Text(0.5, 1.0, '94 Melo')
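If you zoom in, you can see that the yellow line and the red line sit almost on top of each other, which means we have done a good job 🤓 (the small offset comes from librosa, which pads and centers each frame by default).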
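The Python loop above is easy to read but gets slow on long files. Here is a minimal vectorized sketch, assuming NumPy 1.20 or newer for `sliding_window_view`. The helper name `rms_vectorized` is made up for this example, and unlike the loop it only keeps full frames, so it returns slightly fewer values and does not pad like librosa.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rms_vectorized(signal, frame_size, hop_length):
    """RMS energy of every full frame, without padding (hypothetical helper)."""
    # All windows of length frame_size, then keep one window every hop_length samples
    frames = sliding_window_view(signal, frame_size)[::hop_length]
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Synthetic signal standing in for the audio file (made-up data)
rng = np.random.default_rng(42)
demo_signal = rng.uniform(-1.0, 1.0, size=4096)

rms_demo = rms_vectorized(demo_signal, frame_size=1024, hop_length=512)
print(rms_demo.shape)  # (7,)
```

With 4096 samples, a frame size of 1024 and a hop of 512, only 7 frames fit entirely inside the signal, which is why the output has 7 values.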
Zero crossing rate $ZCR$¶
The zero crossing rate counts the number of times a signal crosses the horizontal axis ($y=0$) within a frame. It can be written with the following formula, where $sgn$ is the sign function:
$$ZCR_t=\frac{1}{2}\sum_{k=tK}^{(t+1)K-1}|sgn(s(k))-sgn(s(k+1))|$$
This measure is simple yet powerful, offering insights into the characteristics of an audio signal:
Timbre Analysis: $ZCR$ is a key feature in timbre analysis, helping to distinguish between harmonic (musical) and noise-like sounds. Instruments or sounds with a lower $ZCR$ tend to be more harmonic, while a higher $ZCR$ is characteristic of noisy or percussive sounds (think violin vs. drums).
Speech Processing: In speech analysis, $ZCR$ can differentiate voiced from unvoiced speech segments. Voiced speech, which includes vowels and certain consonants, typically has a lower $ZCR$ than unvoiced speech.
Genre Classification and Music Analysis: $ZCR$ is used in music information retrieval for genre classification and music analysis, providing clues about the rhythmic content and texture of a piece. For instance, a higher $ZCR$ might indicate genres with more percussive elements.
Beat Detection: Although not as direct as amplitude envelope analysis, $ZCR$ contributes to beat detection algorithms by highlighting sections with potential rhythmic activity, especially in combination with other features.
Signal Segmentation: $ZCR$ is useful for segmenting audio signals into different regions, such as separating silence from audio content or identifying different types of sound events within a recording.
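To give a feel for the segmentation idea, here is a rough sketch that labels each frame as silence, tonal or noisy by combining RMS energy and ZCR. The helper names, the label names and both thresholds (`silence_rms`, `noisy_zcr`) are assumptions picked for this synthetic example; real recordings would need thresholds tuned on real data.

```python
import numpy as np

def frame_signal(signal, frame_size, hop_length):
    """Split a 1-D signal into full frames, without padding (hypothetical helper)."""
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    starts = hop_length * np.arange(n_frames)[:, None]
    return signal[starts + np.arange(frame_size)[None, :]]

def label_frames(signal, frame_size=1024, hop_length=512,
                 silence_rms=0.01, noisy_zcr=0.2):
    """Label frames with arbitrary, illustrative thresholds."""
    frames = frame_signal(signal, frame_size, hop_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Fraction of consecutive sample pairs whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) / 2, axis=1)
    return np.where(rms < silence_rms, "silence",
                    np.where(zcr > noisy_zcr, "noisy", "tonal"))

# Synthetic test signal: 1 s of silence, 1 s of a 220 Hz sine, 1 s of white noise
sr = 22050
rng = np.random.default_rng(0)
silence = np.zeros(sr)
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noise = 0.3 * rng.standard_normal(sr)
signal = np.concatenate([silence, tone, noise])

labels = label_frames(signal)
print(labels[0], labels[60], labels[100])  # silence tonal noisy
```

Frames that straddle two regions can land on either side of a threshold, so the labels near the transitions should be taken with a grain of salt.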
Let's use the zero crossing rate from the librosa library, because we are lazy developers and we have already implemented our math formula of the day in this notebook 😂
zcr_melo = librosa.feature.zero_crossing_rate(melo, frame_length=FRAME_SIZE, hop_length=HOP_LENGTH)[0]
plt.figure(figsize=(15, 10))
plt.plot(t, zcr_melo, color="y")
[<matplotlib.lines.Line2D at 0x1208ac7f0>]
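If you are not feeling that lazy after all, here is a minimal from-scratch sketch of the ZCR formula. The function name `zcr_count` and the toy signal are made up for illustration. It returns the number of crossings per frame, as in the formula; dividing by the frame size gives a rate, which is closer to what librosa returns, although the values will not match exactly because librosa pads and centers the frames by default.

```python
import numpy as np

def zcr_count(signal, frame_size, hop_length):
    """Number of zero crossings in each frame (hypothetical helper)."""
    counts = []
    for start in range(0, len(signal) - 1, hop_length):
        # Take frame_size samples plus one extra so that s(k+1) exists for the last k
        frame = signal[start:start + frame_size + 1]
        signs = np.sign(frame)
        # |sgn(s(k)) - sgn(s(k+1))| is 2 at a sign flip, hence the factor 1/2
        counts.append(0.5 * np.sum(np.abs(np.diff(signs))))
    return np.array(counts)

# Toy signal: four sign flips at the start, then a flat positive stretch
toy = np.array([1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

crossings = zcr_count(toy, frame_size=4, hop_length=4)
print(crossings)      # [4. 0.]
print(crossings / 4)  # [1. 0.]
```

One caveat of this sketch: a sample that is exactly zero has a sign of 0, so it contributes half a crossing on each side instead of a single full one.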
Wrapping It Up with a Beat¶
Think of RMS energy and ZCR as the dynamic duo of the audio world, kind of like how Beyoncé and Jay-Z rock the music scene 😎.
They might seem like just some fancy terms, but they're superstars in making our tunes and podcasts sound just right.
RMS Energy: The Volume Guru¶
RMS energy is all about the vibe — how loud and powerful a song feels. It's like when you're listening to Adele's ballads, and you feel that deep, soulful intensity. That's RMS energy making sure you're getting all the feels by keeping the sound smooth and consistent, whether you're in for a tear-jerker or a power anthem.
Zero Crossing Rate: The Beat Detector¶
ZCR, on the other hand, is your go-to for catching the beat. It's like when you're trying to find the groove in a Billie Eilish track, figuring out where those whispery highs and lows fit into the rhythm. ZCR helps break down the beats and vibes, telling us if we're dealing with a dance floor banger or a chill, laid-back tune.
Why They Rock Together¶
Together, RMS energy and ZCR are the heroes behind the scenes, making sure every track, from Ariana Grande's pop hits to Kendrick Lamar's rap verses, hits just right 🎼. They're about keeping the loudness on point and the rhythm in check, ensuring your playlist transitions feel as smooth as a DJ's mix.
So, What's the Big Deal?¶
Understanding these concepts is like having VIP backstage passes 🤗. You get a better appreciation of what goes into making music sound great. It's not just about hitting play; it's about the maths and art behind those beats and melodies 🧮