Methods - Jackson Filter

We sampled several screams (masks) from the song “Black or White.” After trying various combinations of filtering, signal averaging, etc., to obtain the ‘perfect’ mask, we settled on the one raw sample with the cleanest-sounding scream. A typical scream lasts approximately 250 msec.

Joint time-frequency analysis (JTFA) was performed on the mask with Matlab’s specgram.m using a Hanning window. Given the sample rate of 44.1 kHz, our time windows were 2.3 msec long (100 pts), with a time step of 0.45 msec (20 pts), i.e., an overlap of 80 pts, or 1.8 msec. (We also wrote our own ‘specgram’ and a corresponding inverse function, but they executed too slowly.)
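The window arithmetic above can be checked with a minimal Python sketch. It is not the specgram.m call used here: the function names `hann` and `spectrogram`, the plain DFT, and the synthetic 882 Hz test tone are assumptions made for illustration, and the window formula may differ slightly from Matlab's.

```python
import cmath
import math

FS = 44_100   # sample rate in Hz (from the text)
NWIN = 100    # window length in samples (from the text)
HOP = 20      # time step in samples (from the text)

def hann(n):
    """Symmetric Hann window of length n (illustrative definition)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(x, nwin=NWIN, hop=HOP):
    """Magnitude short-time spectrum: one list of nwin//2 + 1 bins per frame."""
    w = hann(nwin)
    nbins = nwin // 2 + 1
    # Precompute the DFT factors once; a direct DFT is fine for a 100-pt window.
    tw = [[cmath.exp(-2j * math.pi * k * n / nwin) for n in range(nwin)]
          for k in range(nbins)]
    frames = []
    for start in range(0, len(x) - nwin + 1, hop):
        seg = [x[start + n] * w[n] for n in range(nwin)]
        frames.append([abs(sum(s * t for s, t in zip(seg, row))) for row in tw])
    return frames

window_ms = 1000 * NWIN / FS           # window length in msec
step_ms = 1000 * HOP / FS              # time step in msec
overlap_ms = 1000 * (NWIN - HOP) / FS  # overlap in msec
bin_hz = FS / NWIN                     # spacing between bin centres in Hz

# Synthetic 20 msec tone standing in for a real audio clip (assumption).
tone = [math.sin(2 * math.pi * 882 * n / FS) for n in range(882)]
S = spectrogram(tone)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(f"window {window_ms:.2f} msec, step {step_ms:.2f} msec, "
      f"overlap {overlap_ms:.2f} msec, bin spacing {bin_hz:.0f} Hz")
print("strongest bin centre:", peak_bin * bin_hz, "Hz")
```

With a 100-point window the bin centres fall at multiples of 441 Hz, so each bin spans 220.5 Hz on either side of its centre.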

Our program then entered a loop in which it sent a sub-second clip of the song to the sound card, computed the spectral analysis and correlation for that clip, and scanned the correlation for crossings of our pre-defined threshold. It repeated this process until the song finished. The program also recorded the time (in seconds) at which each scream occurred.
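The structure of this loop can be sketched in Python as follows. This is a simplified illustration, not our Matlab program: the names `detect_events` and `scan_song`, the `correlate` callback, and the toy numbers are all hypothetical, and playback through the sound card is left out.

```python
def detect_events(correlation, threshold, step_s, offset_s=0.0):
    """Return the times (in seconds) at which the correlation rises to or
    above the threshold. Only the upward crossing is recorded, so one long
    peak counts as a single event."""
    times = []
    above = False
    for i, c in enumerate(correlation):
        if c >= threshold and not above:
            times.append(offset_s + i * step_s)
        above = c >= threshold
    return times

def scan_song(clips, correlate, threshold, step_s, clip_s):
    """Process the song clip by clip and collect all detection times.

    clips     -- iterable of short clips of the song
    correlate -- hypothetical callback returning one correlation value
                 per time step of the clip
    step_s    -- time between successive correlation values, in seconds
    clip_s    -- duration of one clip, in seconds
    """
    hits = []
    for idx, clip in enumerate(clips):
        # Playback of the clip would be started here.
        corr = correlate(clip)
        hits.extend(detect_events(corr, threshold, step_s, offset_s=idx * clip_s))
    return hits

# Toy data: the "clips" are already correlation traces, so correlate is identity.
toy_clips = [[0.0, 0.2, 0.9, 0.95, 0.1], [0.1, 0.1, 0.1, 0.8, 0.2]]
hits = scan_song(toy_clips, lambda c: c, threshold=0.5, step_s=0.1, clip_s=0.5)
print(hits)
```

Because the crossing state is reset for every clip, an event that straddles a clip boundary would be reported twice in this sketch; carrying the state across clips would avoid that.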

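For the cross-correlation r of the mask m with the signal x, we see that:

$$ r[k] = \sum_{n} m[n]\, x[n+k] = \sum_{n} \tilde{m}[n]\, x[k-n] = (\tilde{m} * x)[k], \qquad \tilde{m}[n] = m[-n] $$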

This implies that we can compute the correlation by convolving the reversed mask with the signal in question. However, we can also do this using the FFT, because convolution in one domain transforms to multiplication in the other. We compared the two methods and found the FFT method to be more than twice as fast, making our ‘real-time’ analysis feasible. To further speed the comparison, we computed the correlation over a small set of frequencies that contained most of the energy in the mask. The most prominent bin was centered at 882 Hz (close to the A at 880 Hz, one octave above concert ‘A’), covering the range 661.5 Hz < f < 1102.5 Hz.
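The equivalence of the two methods can be illustrated with a small, self-contained Python sketch. It is not our Matlab implementation: the function names, the textbook recursive radix-2 FFT, and the toy sequences are assumptions chosen to keep the example short, and it makes no claim about relative speed.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def correlate_direct(x, m):
    """r[k] = sum_n m[n] * x[n + k], for every lag k where the mask fits."""
    return [sum(m[n] * x[n + k] for n in range(len(m)))
            for k in range(len(x) - len(m) + 1)]

def correlate_fft(x, m):
    """Same lags as correlate_direct, computed in the frequency domain."""
    size = 1
    while size < len(x) + len(m) - 1:   # zero-pad to avoid wrap-around
        size *= 2
    X = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    M = fft([complex(v) for v in m] + [0j] * (size - len(m)))
    # Multiplying by the conjugate spectrum corresponds to reversing the
    # (real-valued) mask in time.
    prod = [a * b.conjugate() for a, b in zip(X, M)]
    r = fft(prod, invert=True)
    return [r[k].real / size for k in range(len(x) - len(m) + 1)]

# Toy sequences standing in for one frequency bin of the signal and the mask.
x = [0.0, 1.0, 2.0, 3.0, 0.0, -1.0, 2.0, 1.0]
m = [1.0, 2.0, 1.0]
direct = correlate_direct(x, m)
via_fft = correlate_fft(x, m)
print(direct)
print([round(v, 6) for v in via_fft])
```

Both functions return the same values up to rounding error; the direct sum costs on the order of len(x) * len(m) multiplications, while the FFT route grows as N log N in the padded length N.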

We tested our program on two songs: “Black or White” and “The Way You Make Me Feel.”