Mel Frequency Cepstral Coefficients (MFCCs) are features commonly used in speech and audio processing. They provide a compact representation of the power spectrum of a signal, capturing essential characteristics for tasks such as speech recognition, speaker identification, and emotion analysis.

Steps to Compute MFCCs:

  1. Pre-emphasis: A first-order high-pass filter is applied to the signal to boost high frequencies, compensating for the natural drop in speech energy at higher frequencies.
  2. Framing: The signal is divided into short, usually overlapping frames (e.g., 20-40 ms), over which it can be treated as approximately stationary.
  3. Windowing: Each frame is multiplied by a window function (e.g., Hamming window) to reduce spectral leakage.
  4. Fast Fourier Transform (FFT): Converts each windowed frame from the time domain into the frequency domain; the squared magnitude gives the frame's power spectrum.
  5. Mel Filter Bank Processing: The power spectrum is passed through a bank of overlapping triangular filters spaced on the Mel scale, which approximates human pitch perception.
  6. Logarithm and Discrete Cosine Transform (DCT): The logarithm of each filter bank energy is taken, followed by a DCT to decorrelate the features and produce the cepstral coefficients.
  7. Selecting Coefficients: Typically, only the first 12-13 coefficients are kept as features. The first coefficient, which reflects overall energy, is sometimes excluded.
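
The steps above can be sketched in a few dozen lines of NumPy. This is only an illustrative sketch, not the code from the linked gist and not how Librosa implements it. The function names (hz_to_mel, mel_to_hz, mel_filterbank, mfcc) and all parameter defaults (25 ms frames, 10 ms hop, 512-point FFT, 26 filters, pre-emphasis factor 0.97) are assumptions chosen for the example, and the Mel formula used is one common variant among several. The sketch keeps the first coefficient; drop column 0 of the result to exclude it.

```python
import numpy as np


def hz_to_mel(hz):
    # One common Hz-to-Mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with edges equally spaced on the Mel scale (0 Hz to Nyquist).
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fbank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank


def mfcc(signal, sr, frame_ms=25.0, hop_ms=10.0, n_fft=512,
         n_filters=26, n_coeffs=13, pre_emph=0.97):
    # Assumes the frame length in samples does not exceed n_fft.
    # 1. Pre-emphasis: y[t] = x[t] - a * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Framing: overlapping frames (a trailing partial frame is dropped)
    frame_len = int(round(sr * frame_ms / 1000.0))
    hop_len = int(round(sr * hop_ms / 1000.0))
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # 3. Windowing: Hamming window on every frame
    frames = frames * np.hamming(frame_len)
    # 4. FFT: power spectrum of each frame (zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 5. Mel filter bank: energy collected by each triangular filter
    mel_energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    # 6. Logarithm (small offset avoids log(0)), then an unnormalized DCT-II
    log_mel = np.log(mel_energies + 1e-10)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    # 7. Keep only the first n_coeffs cepstral coefficients
    return log_mel @ dct_basis.T


# Usage: one second of a synthetic 440 Hz tone at 16 kHz (a stand-in for real speech).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2.0 * np.pi * 440.0 * t)
features = mfcc(tone, sr)
print(features.shape)  # (n_frames, n_coeffs)
```

Each row of the result is the feature vector for one frame. The DCT here is left unscaled for brevity, so absolute values will differ from libraries that apply orthonormal scaling or different filter bank normalization.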

Applications of MFCCs:

Python example

Compute MFCCs in Python

https://gist.github.com/viadean/17f05f66656b4b57fe82cea3ddb871c3

Explanation of the Code:

  1. Load Audio: Uses Librosa to load an example speech file.