Pitch Detection - Harmonic Product Spectrum
The harmonic product spectrum (HPS) technique described in [5] was implemented to detect pitches in audio files. First, an audio file is read in, and the stereo channels are combined into a mono channel if necessary. Next, the audio is split into windows to better isolate the pitches in time. The window duration is selected to achieve a fine enough frequency resolution to distinguish between adjacent pitches. Above C3, a 4 Hz resolution is sufficient. In order to have an integer number of windows, the audio file is zero-padded.
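The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, averaging the two channels is an assumed way of combining them (the original code may sum them instead), and the C3 frequency assumes equal temperament with A4 = 440 Hz.

```python
import math

C3_HZ = 130.81  # equal-tempered C3, assuming A4 = 440 Hz


def to_mono(left, right):
    """Combine two channels by averaging them (assumed; summing is also common)."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def semitone_gap_hz(freq_hz):
    """Distance in Hz from freq_hz to the pitch one semitone above it."""
    return freq_hz * (2 ** (1 / 12) - 1)


def split_into_windows(samples, window_len):
    """Zero-pad samples to a multiple of window_len, then slice into windows."""
    n_windows = math.ceil(len(samples) / window_len)
    padding = n_windows * window_len - len(samples)
    padded = list(samples) + [0.0] * padding
    return [padded[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


print(f"C3 to C#3 spacing: {semitone_gap_hz(C3_HZ):.2f} Hz")
print(split_into_windows([1.0] * 10, 4))
```

Adjacent pitches near C3 are a little under 8 Hz apart, and the spacing grows with frequency, so a 4 Hz resolution leaves roughly two frequency bins per semitone at the low end and more above it.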
Each window of the audio input is Fourier transformed, and the transform is downsampled to produce five sampled copies. The magnitudes of these five sampled Fourier transforms are multiplied together to create the harmonic product. The frequency with the largest magnitude in the product is selected as the fundamental frequency for that window, and it is compared against a table of frequencies and their corresponding pitches to determine which pitch is being played. This method is prone to octave errors, but we disregard the octave because we only need to know how frequently each pitch occurs to determine a song’s key. The code used to implement the pitch recognition algorithm is linked below.
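For readers who want to experiment, here is a minimal Python sketch of the harmonic product spectrum idea. It is not the project's MATLAB code. A naive DFT stands in for the FFT to keep the example dependency-free, the function names are hypothetical, the five spectra are assumed to be the original plus copies downsampled by 2 through 5, and the pitch lookup uses the equal-temperament formula with A4 = 440 Hz rather than a frequency table.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def magnitude_spectrum(window):
    """Magnitudes of DFT bins 0 .. N/2 - 1 (naive O(N^2) DFT, stand-in for an FFT)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(window)))
        for k in range(n // 2)
    ]


def harmonic_product_spectrum(mags, n_harmonics=5):
    """Multiply the spectrum by copies of itself downsampled by 2 .. n_harmonics."""
    usable = len(mags) // n_harmonics  # bins for which every harmonic is in range
    hps = []
    for k in range(usable):
        product = 1.0
        for r in range(1, n_harmonics + 1):
            product *= mags[k * r]  # bin k of the spectrum downsampled by r
        hps.append(product)
    return hps


def detect_fundamental(window, sample_rate, n_harmonics=5):
    """Return the frequency (Hz) of the largest harmonic-product bin."""
    hps = harmonic_product_spectrum(magnitude_spectrum(window), n_harmonics)
    peak_bin = max(range(1, len(hps)), key=hps.__getitem__)  # skip the DC bin
    return peak_bin * sample_rate / len(window)


def pitch_class(freq_hz, a4_hz=440.0):
    """Name of the nearest equal-tempered pitch, ignoring the octave."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12]
```

Because `pitch_class` discards the octave, a detection that lands one octave too high or too low still maps to the same pitch name, which is why octave errors do not affect the pitch counts used for key detection.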
The following plots show the detected pitch frequencies for a short segment of Yankee Doodle, played on a synthesizer. The audio file contained 318464 points. The first plot was created using windows of length 15000 points, which corresponds to a frequency resolution of 3.2 Hz. The played notes are correctly detected, but spurious pitches are assigned at the beginning and end of the clip as well as in one of the pauses between notes. The second plot was created using windows of length 31000 points, corresponding to a frequency resolution of about 1.5 Hz. The detected frequencies and their corresponding pitches are tabulated below the plot. Using fewer, longer-duration windows removes most of the extraneous pitches and may work better for the key detection algorithm.
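The window figures quoted above can be checked with a little arithmetic. The sketch below assumes a 48 kHz sample rate. That value is not stated in the text; it is inferred from the statement that 15000 points correspond to 3.2 Hz, since the frequency resolution is the sample rate divided by the window length. The function names are hypothetical.

```python
import math

SAMPLE_RATE = 48_000    # assumption: inferred from 15000 points <-> 3.2 Hz
TOTAL_POINTS = 318_464  # length of the Yankee Doodle clip given in the text


def frequency_resolution_hz(sample_rate, window_len):
    """Spacing between DFT bins for a window of window_len samples."""
    return sample_rate / window_len


def window_count_and_padding(total_len, window_len):
    """Number of windows after zero-padding, and how many zeros were added."""
    n_windows = math.ceil(total_len / window_len)
    return n_windows, n_windows * window_len - total_len


for window_len in (15_000, 31_000):
    res = frequency_resolution_hz(SAMPLE_RATE, window_len)
    n_windows, padding = window_count_and_padding(TOTAL_POINTS, window_len)
    print(f"{window_len} points: {res:.2f} Hz bins, "
          f"{n_windows} windows, {padding} zeros of padding")
```

Under that assumed rate, the longer windows halve the number of pitch estimates (11 instead of 22) while each one covers about twice as much audio.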


Figure 1a: Pitch detection for the Yankee Doodle audio file with windows of length 15000 points.
Figure 1b: Pitch detection for the Yankee Doodle audio file with windows of length 31000 points.

Detected frequencies and their corresponding pitches for the Yankee Doodle audio file. The detected pitches match the notes that were played.
DSP Used in Pitch Detection
To use the Harmonic Product Spectrum algorithm, each song needed to be split into multiple snippets. Each snippet was windowed using a Hamming window, and the FFT of each windowed snippet was then calculated. The effect of the Hamming window can be seen in the figure below.

Figure 2: Effect of the Hamming Window on the time domain signal
The Hamming window was demonstrated in class and proved useful in this project. The fundamental frequency of each snippet determines its pitch, so the data needed to be smooth with little ripple. The Hamming window tapers the ends of each snippet, which reduces the ripple and gives a more accurate estimate of the fundamental frequency.
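The effect of the window can be illustrated with a short Python sketch. It reads the "ripple" as spectral leakage (energy spilling into bins far from the tone), which is an interpretation rather than something the text states. It uses the symmetric Hamming formula and a test tone deliberately placed between two DFT bins; the helper names and the choice of bins 40 to 99 as the "far" region are hypothetical.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(samples)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))


def relative_leakage(samples):
    """Total magnitude in bins 40..99 relative to the peak near bins 10 and 11."""
    peak = max(dft_magnitude(samples, 10), dft_magnitude(samples, 11))
    far = sum(dft_magnitude(samples, k) for k in range(40, 100))
    return far / peak


N = 256
# A tone at 10.5 cycles per window falls between bins 10 and 11.
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]
windowed = [x * w for x, w in zip(tone, hamming(N))]

print("leakage without window:", relative_leakage(tone))
print("leakage with Hamming:  ", relative_leakage(windowed))
```

The window tapers from 0.08 at the edges to 1 in the middle, so the abrupt jump at the snippet boundaries is suppressed and much less energy ends up in bins far from the true frequency.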
Another major concept from class that was used in this project is the Discrete Fourier Transform. It was implemented in MATLAB using the Fast Fourier Transform algorithm, which was also covered in class. The Initial Analysis page shows a graph of an audio signal and the FFT of the signal.
A concept applied in the HPS algorithm that had not been covered in class is downsampling. This was achieved using MATLAB's downsample function. The effect of downsampling can be seen in the graph below. Downsampling helped in obtaining an accurate fundamental frequency.

Figure 3: Downsampling the Original Signal
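To make the role of downsampling concrete, the Python sketch below mimics the behavior of MATLAB's `downsample(x, n)`, which keeps every n-th sample starting with the first. It assumes the downsampling is applied to the magnitude spectrum, as in the usual HPS formulation, and the toy spectrum values are made up for illustration.

```python
def downsample(values, factor):
    """Keep every factor-th element, starting with the first."""
    return values[::factor]


# Toy magnitude spectrum: a fundamental at bin 4 with harmonics at bins 8 and 12.
spectrum = [0.1] * 16
for harmonic_bin, height in ((4, 1.0), (8, 0.8), (12, 0.6)):
    spectrum[harmonic_bin] = height

half = downsample(spectrum, 2)   # the peak from bin 8 now sits at index 4
third = downsample(spectrum, 3)  # the peak from bin 12 now sits at index 4

# Multiplying the aligned spectra reinforces the fundamental's bin.
product = [spectrum[k] * half[k] * third[k] for k in range(len(third))]
fundamental_bin = max(range(len(product)), key=product.__getitem__)
print("fundamental bin:", fundamental_bin)
```

Downsampling by a factor of r compresses the frequency axis so that the r-th harmonic lines up with the fundamental, which is what lets the product pick out the fundamental even when a single harmonic is stronger on its own.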