Challenges

The primary challenge in implementing the key detection algorithm was the tradeoff between frequency resolution and the number of pitch samples we obtain from each song. At low frequencies, where adjacent pitches are closer together in hertz, a resolution of 1 Hz or finer is necessary to distinguish between them. However, a finer frequency resolution requires a longer analysis window (a 1 Hz resolution needs windows about one second long). Each song therefore yields fewer windows and fewer pitch samples, which makes it more difficult to match the song to a key profile. To balance these two requirements, we filtered out frequencies below 75 Hz so that the remaining pitches could be identified accurately at a 1 Hz resolution. With this configuration, the algorithm achieved around 80% key detection accuracy on the songs we tested.
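The pitch sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our exact pipeline: it assumes NumPy, non-overlapping Hann-windowed frames whose length is chosen so that the FFT bin spacing equals the target resolution, and it keeps only the loudest bin at or above the 75 Hz cutoff in each frame, mapping it to a pitch class (0 = C, ..., 11 = B) with the MIDI note formula. The function name `loudest_pitch_classes` and its parameters are hypothetical.

```python
import numpy as np


def loudest_pitch_classes(signal, sample_rate, min_freq=75.0, resolution_hz=1.0):
    """Return the pitch class (0 = C ... 11 = B) of the loudest bin in each window.

    Hypothetical helper: non-overlapping Hann-windowed frames, one pitch per frame.
    """
    # FFT bin spacing is sample_rate / window_len, so this length gives resolution_hz.
    window_len = int(round(sample_rate / resolution_hz))
    # Finer resolution -> longer windows -> fewer pitch samples per song.
    n_windows = len(signal) // window_len
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    keep = freqs >= min_freq  # drop bins below the low-frequency cutoff
    taper = np.hanning(window_len)

    pitch_classes = []
    for i in range(n_windows):
        frame = np.asarray(signal[i * window_len:(i + 1) * window_len], dtype=float)
        magnitude = np.abs(np.fft.rfft(frame * taper))
        peak_freq = freqs[keep][np.argmax(magnitude[keep])]
        # MIDI note number: 69 is A4 = 440 Hz, with 12 semitones per octave.
        midi = 69 + 12 * np.log2(peak_freq / 440.0)
        pitch_classes.append(int(round(midi)) % 12)
    return pitch_classes
```

Because each window is `sample_rate / resolution_hz` samples long, it lasts `1 / resolution_hz` seconds: at 1 Hz a three-minute song yields only 180 pitch samples, while a 2 Hz resolution would double that to 360 at the cost of coarser bins. This is the tradeoff described above.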

Songs were misidentified in three main ways. The first is confusion between relative major and minor keys (for example, C major and A minor). Relative keys share the same pitches but in different distributions, so a song whose pitch distribution does not closely match its ideal key profile can correlate better with its relative key. This type of misclassification is generally not a concern and should have little effect on the playlist output, because relative major and minor keys are harmonic matches for each other. The second type occurs when the most frequently sampled pitch is not the tonic of the key. The Krumhansl-Schmuckler approach determines the key from the pitch distribution rather than from chordal analysis, and we take the loudest pitch in each sample instead of isolating only the melody or only the accompaniment, so the resulting pitch distribution may not match the key profiles. Finally, songs that are not in the Ionian or Aeolian modes (major and natural minor keys) have no matching key profile and will always be misclassified. The other five modes and other complex musical structures often occur in jazz, experimental, and non-Western music, so this key detection algorithm and playlist generator work best for simple songs that follow Western music theory traditions.
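For reference, the Krumhansl-Schmuckler step can be sketched as correlating a 12-bin pitch-class histogram against all 24 rotations of a major and a minor key profile. This sketch assumes the Krumhansl-Kessler probe-tone profile values and Pearson correlation via NumPy; the write-up does not state which profile values we used, so treat them as an assumption. The helper names `rank_keys` and `are_relative` are hypothetical; `are_relative` illustrates the first failure mode, since a major key's relative minor lies nine semitones above (three below) its tonic.

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles, indexed from the tonic (assumed values).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def rank_keys(pitch_classes):
    """Rank all 24 major/minor keys by Pearson correlation with the pitch-class histogram."""
    histogram = np.bincount(pitch_classes, minlength=12).astype(float)
    scores = []
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its tonic weight sits at index `tonic`.
            r = np.corrcoef(histogram, np.roll(profile, tonic))[0, 1]
            scores.append((float(r), tonic, mode))
    scores.sort(reverse=True)  # highest correlation first
    return scores


def are_relative(tonic_a, mode_a, tonic_b, mode_b):
    """True if the two keys are relative major/minor (e.g. C major and A minor)."""
    if mode_a == mode_b:
        return False
    major, minor = (tonic_a, tonic_b) if mode_a == "major" else (tonic_b, tonic_a)
    # The relative minor's tonic is 9 semitones above (3 below) the major tonic.
    return (major + 9) % 12 == minor


# Made-up histogram weighted toward C, G, and E.
ranking = rank_keys([0] * 5 + [2] * 2 + [4] * 3 + [5] * 2 + [7] * 4 + [9] * 2 + [11])
best_r, best_tonic, best_mode = ranking[0]
print(f"{PITCH_NAMES[best_tonic]} {best_mode} (r = {best_r:.2f})")
```

For this made-up histogram, C major ranks first. Because relative keys contain the same pitch classes, their scores can be close on real songs, so checking whether the top two entries satisfy `are_relative` is a quick way to spot the first type of misclassification.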

The primary challenge in tempo detection is that human-perceived and machine-detected tempos can differ. In many of our test cases, the detected tempo differed from the expected tempo by a factor of 0.5 or 2. This happens because the algorithm relies on strong percussive beats to find the pulse: if the drums on a song do not play on every downbeat, or play subdivisions of the beat, the algorithm may not choose the tempo a human listener would. In particular, the algorithm can output tempos far higher than what a human listener would tap their foot to. This could be mitigated by setting an upper tempo threshold and halving any detected tempo above it until it falls within the desired range. The algorithm also has difficulty finding the beat in songs that feature prominent vocals without a strong instrumental pulse. Therefore, tempo detection works best in genres with strong percussion.
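The upper-threshold mitigation suggested above might look like the following minimal sketch. The 160 BPM default and the optional lower bound (which doubles tempos detected at half speed) are illustrative assumptions, not values from our testing, and `fold_tempo` is a hypothetical name.

```python
from typing import Optional


def fold_tempo(bpm: float, upper: float = 160.0, lower: Optional[float] = None) -> float:
    """Halve tempos above `upper` (and optionally double tempos below `lower`)."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    if lower is not None and 2 * lower > upper:
        raise ValueError("need upper >= 2 * lower so folding cannot skip over the range")
    # Undo double-time errors, e.g. when the detector locks onto subdivisions.
    while bpm > upper:
        bpm /= 2
    # Optionally undo half-time errors, e.g. when the drums skip beats.
    if lower is not None:
        while bpm < lower:
            bpm *= 2
    return bpm


for detected in (240.0, 128.0, 45.0):
    print(detected, "->", fold_tempo(detected, lower=60.0))
```

Requiring `upper >= 2 * lower` guarantees that repeated halving or doubling always ends inside the range rather than jumping over it. This only corrects errors by factors of 2; it cannot recover a beat the algorithm never found, as in vocal-heavy songs without a strong pulse.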