In the previous post I described the process of creating an ad-blocker for the polish radio station “Trójka”. It uses cross-correlation and fast Fourier transform to detect the ad jingles in the internet radio stream, silences the commercials and outputs the result. In other words: it plays the internet radio stream without ads. Creating the app wasn’t my final goal, though. I wanted to bring the same feature to my home amplituner (Yamaha R-N301), which receives “Trójka” in a traditional way via the FM broadcast.
So, I already have the algorithm, but I’m missing 3 elements required to run it in the real world of my flat:
a device on which I can run the analysis,
the broadcast, which will be used as a source for the analyzer,
a way to turn the volume up and down on the amplituner.
That’s the easy one. I already have Raspberry Pi 2, which works great as a home media center. It should be able to run the Java analyzer.
the broadcast source
The internet broadcast is delayed by about 30 seconds, so I can’t use it to analyze the signal and silence the FM tuner. No, I should analyze the real-time FM broadcast. How to receive it on a Raspberry Pi? Well, it will cost about $20. There’s a lot of cheap DVB-T USB sticks on the market and apparently (because of the RTL-SDR library) they can be used to receive all kinds of signals, FM radio stations included.
After plugging-in the stick, following command will receive the broadcast on 89.5Mhz:
rtl_fm -f 89.5M -M wbfm
The stream will be redirected to the standard output as a 16-bit, little-endian, 32k, 1-channel PCM data, perfect for the Java analyzer.
controlling the amplituner
Yamaha calls R-N301 a “Network HiFi Receiver”, which means (among other things) it can be controlled from an iPhone app. Apparently, Yamaha network-enabled devices exposes a RESTful interface, which can be called with curl. Following command will set the volume level to 55:
Polish Radio Three (so-called Trójka) is famous for broadcasting good music and having non-offensive speakers. On the other hand it suffers from the number of commercial blocks between auditions. The ads, usually related to drugs or electronics are loud and irritating. Trójka accompanies me in home and at work for most of the time, so I wondered if there’s something than can be done about the ads. It seems there is.
digital signal processing
My aim is to create an app that mutes the ads. The commercial block starts and finishes with a jingle, so the potential software should recognize these specific sounds and turn off the volume between them.
I know that the area of maths/computer science dealing with these kinds of problems is called digital signal processing, but it always seems like a magic for me - so this is a great opportunity to learn something new. I spend a day or two trying to find out what mechanism can be used to analyze an audio stream looking for a jingle. And I found it, eventually - it’s called cross-corellation.
People usually describes the cross-corellation referring to the MATLAB implementation. MATLAB is an expensive application that makes it easy to perform complex mathematical operations, including DSP operations. Fortunatelly, there’s a free alternative to MATLAB, called Octave. It seems it’s quite easy to run cross-corellation on two audio files using Octave. All you have to do is to run following commands:
As you can see, there’s a peek describing the position of the jingle.wav within the audio.wav. What suprised me is the simplicity of the method - the xcorr() makes all the work, the rest of the Octave code is just for reading the files and displaying the result.
I wanted to reimplement the same algorithm in Java, so I’ll have a tool that:
reads the audio stream from standard input (eg. provided by ffmpeg),
analyses it looking for the jingles,
outputs the same stream on stdout and/or muting it.
Using stdin and stdout will allow to connect the new analyzer with other apps, responsible for providing audio stream and playing the result.
reading sound files
The first step of the Java implementation is to read the jingle (saved as a .wav file) into an array. .wav file contains some extra info, like headers, metadata, etc. while we need the raw data. The format I was looking for is PCM - it simply contains the list of numbers representing sounds. Converting a wav to PCM can be done using ffmpeg:
In this case each sample will be saved as 16-bit number, little endian. In Java such number is called short and the ByteBuffer class may be used to automatically transform the input stream into a list of short values:
ByteBuffer buf = ByteBuffer.allocate(4);
short leftChannel = buf.readShort(); // stereo stream
short rightChannel = buf.readShort();
In order to implement the xcorr() function in Java, I looked into the Octave’s source code. Without changing the final result, I was able to replace the xcorr() invocation with the following lines - they need to be rewritten to Java:
N = length(audio);
M = 2 ^ nextpow2(2 * N - 1);
pre = fft(postpad(prepad(jingle(:), length(jingle) + N - 1), M));
post = fft(postpad(audio(:), M));
cor = ifft(pre .* conj(post));
R = real(cor(1:2 * N));
It looks quite scary, but most of the functions are trivial array operations. The heart of the cross-corellation is applying the fast Fourier transform on the sound sample.
fast Fourier transform
As someone who didn’t have earlier experience with DSP, I’ve simply treated the FFT as a function that takes the array describing the sound sample and returns an array containing complex numbers representing the frequencies. This minimalistic approach worked well - I was able to run the FFT implementation from the JTransforms package and got the same results as in Octave. I guess there’s a bit of cargo cult here, but hey - it works!
running xcorr on a stream
As you may see, the algorithm above assumes that the audio is an array, in which we are looking for the jingle. That’s not exactly the case for the radio broadcast, where we have a continous stream of sound. In order to run the analysis, I created a round-robin buffer, slightly longer than the jingle I’m looking for. The incoming stream fills the buffer and once it’s full, I run the cross-corellation test. If there’s nothing found, I discard the oldest part of the buffer and then wait until it’s full again.
I experimented a bit with the buffer length and got the best results with buffer 1.5 times bigger than the jingle size.
putting it all together
Getting the stream in PCM format is easy and can be done using aforementioned ffmpeg - the command below redirects the stream into the java standard input and then plays the result:
I also prepared a simple standalone version of the analyzer, that connects to the Trójka stream on its own (without an external ffmpeg) and plays the result using javax.sound. The whole thing is a single JAR file and contains a basic start/stop UI. It can be downloaded here: radioblock.jar. If you feel uneasy about running a foreign JAR on your machine (like you should do), all the sources can be found on my GitHub.
The final goal is to mute ads on a hardware amplituner, receiving a “real” FM signal rather than some internet streams. This will be covered in the next blog post.