Computing Mel Frequency Cepstral Coefficients

Mel Frequency Cepstral Coefficients (MFCCs) are features commonly used in speech and audio processing. They provide a compact representation of the short-term power spectrum of a signal on the perceptually motivated Mel frequency scale, which makes them effective for tasks such as speech recognition, speaker identification, and emotion analysis.

Steps to Compute MFCCs:

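Pre-emphasis: A first-order high-pass filter, y[n] = x[n] - a*x[n-1] with a typically around 0.95-0.97, boosts the high frequencies to balance the spectrum, since speech carries most of its energy at low frequencies.
Framing: The signal is divided into short, usually overlapping frames (e.g., 20-40 ms long with a 10 ms hop), short enough that the signal can be treated as approximately stationary within each frame.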
Windowing: Each frame is multiplied by a window function (e.g., Hamming window) to reduce spectral leakage.
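Fast Fourier Transform (FFT): Converts each windowed frame from the time domain to the frequency domain; the squared magnitude of the result gives the power spectrum.
Mel Filter Bank Processing: The power spectrum is passed through a bank of overlapping triangular filters spaced evenly on the Mel scale, which approximates human pitch perception (finer resolution at low frequencies, coarser at high frequencies). Each filter yields one band energy.
Logarithm and Discrete Cosine Transform (DCT): The logarithm of each filter bank energy is taken, and a DCT is applied across the filters to decorrelate them, producing the cepstral coefficients.
Selecting Coefficients: Typically only the first 12-13 coefficients are kept. The 0th coefficient reflects the overall log energy of the frame and is often discarded (or replaced by a separate energy term), leaving 12.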
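The steps above can be sketched end to end in NumPy. This is a minimal illustration, not the code from the linked gist: it assumes a mono floating-point signal and common default parameters (25 ms frames, 10 ms hop, 512-point FFT, 26 Mel filters, pre-emphasis coefficient 0.97, HTK-style Mel formula), and the helper names mfcc and mel_filterbank are hypothetical. Its output will not match librosa.feature.mfcc exactly, because librosa uses a different Mel formula, filter normalization, and log scaling by default.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumed here): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to sr / 2
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):    # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sr, n_mfcc=13, n_filters=26, frame_ms=25, hop_ms=10,
         n_fft=512, pre_emph=0.97):
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    # 2. Framing: overlapping frames of frame_ms, advanced by hop_ms
    frame_len = int(round(sr * frame_ms / 1000))
    hop_len = int(round(sr * hop_ms / 1000))
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = emphasized[idx]

    # 3. Windowing with a Hamming window to reduce spectral leakage
    frames = frames * np.hamming(frame_len)

    # 4. FFT of each frame, then the power spectrum
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft

    # 5. Mel filter bank energies, floored to avoid log(0)
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    energies = np.maximum(energies, np.finfo(float).eps)

    # 6. Log, then an orthonormal DCT-II across the filter axis
    log_e = np.log(energies)
    n = np.arange(n_filters)
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)) * np.sqrt(2.0 / n_filters)
    dct[0] /= np.sqrt(2.0)

    # 7. Keep the first n_mfcc coefficients: shape (n_frames, n_mfcc)
    return log_e @ dct.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                       # one second of audio
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # synthetic 440 Hz test tone
    feats = mfcc(tone, sr)
    print(feats.shape)  # (98, 13)
```

The result includes the 0th coefficient; slicing feats[:, 1:] drops it, as described in the last step. The DCT is built as an explicit matrix to keep the sketch dependent on NumPy alone; scipy.fft.dct with norm="ortho" computes the same transform.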

Applications of MFCCs:

Speech Recognition (e.g., Google Assistant, Siri)
Speaker Identification
Emotion Recognition
Music Classification
Environmental Sound Classification

Python example

Compute MFCCs in Python

https://gist.github.com/viadean/17f05f66656b4b57fe82cea3ddb871c3

Explanation of the Code:

Load Audio: Uses Librosa to load an example speech file.