Some time ago I read a blog post describing how the author used Java to implement the basic functionality of Shazam, the famous music search tool. That article also led me to a paper and another blog about Shazam. After reading them I found that the principle is not very complex, but the method is remarkably robust to noise. Out of curiosity, I decided to implement a simple music search tool in Python myself: Song Finder. Its core functionality is encapsulated in SFEngine, and the only third-party dependency is scipy.
Tool Demo
This demo shows the tool in use under IPython. The project is named Song Finder, and its indexing and search functions are encapsulated in SFEngine. First, some simple preparation:
In [1]: from SFEngine import *

In [2]: engine = SFEngine()
Next we index the existing songs. I prepared dozens of songs (.wav files) in the original directory as the music library:
In [3]: engine.index('original') # index all songs in this directory
With the index built, we ask Song Finder to search for a recording made with background noise, in this case a recording of the passage of "Maple" starting at 1 minute 15 seconds:
The tool returns:
In [4]: engine.search('record/record0.wav')
original/Jay - Maple 73
original/Jay - Maple 31
original/Jay - Maple 10
original/Jay - Maple 28
original/I Want to Be Happy - Hui Mei 28
The output shows the song name and the position of the clip within the song (in seconds). You can see that the tool not only retrieved the correct song, but also found the clip's correct position within it.
And for a recording of "Fairy Tale" starting at 1 minute 05 seconds, with much heavier background noise, the tool returns:
In [5]: engine.search('record/record8.wav')
original/Guang Liang - Fairy Tale 67
original/Guang Liang - Fairy Tale 39
original/Guang Liang - Fairy Tale 33
original/Guang Liang - Fairy Tale 135
original/Guang Liang - Fairy Tale 69
Despite the heavy noise, the tool still successfully identifies the song and locates the clip's correct position within it, showing good robustness in noisy environments!
Project home: Github
How Song Finder Works
Retrieving a recorded fragment from a given music library is a search problem through and through, but searching audio is not as straightforward as searching documents or structured data. To search for music, the tool needs to accomplish three tasks:
- Extract features from every song in the music library
- Extract features from the recorded clip in the same way
- Search the music library using the clip's features, returning the most similar song and the clip's position within it
Feature extraction? Discrete Fourier transform!
To extract features from music (audio), a straightforward idea is to capture pitch information, and pitch physically corresponds to the frequency of the sound wave. A direct way to obtain this information is the discrete Fourier transform: slide a window over the sampled audio and apply a DFT to the data in each window, transforming time-domain information into the frequency domain. scipy's interface makes this easy. We then divide the spectrum into bands and extract, within each band, the frequency with the largest amplitude:
```python
def extract_feature(self, scaled, start, interval):
    end = start + interval
    dst = fft(scaled[start:end])
    length = len(dst) // 2
    normalized = abs(dst[:length - 1])
    feature = [normalized[:50].argmax(),
               50 + normalized[50:100].argmax(),
               100 + normalized[100:200].argmax(),
               200 + normalized[200:300].argmax(),
               300 + normalized[300:400].argmax(),
               400 + normalized[400:].argmax()]
    return feature
```
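As a sanity check on the band-peak idea, here is a self-contained sketch of my own (not part of SFEngine) that runs the same band-wise argmax over a synthetic 440 Hz tone using numpy's FFT. With an 8192-sample window at 44100 Hz, bin k corresponds to k * 44100 / 8192 ≈ k * 5.38 Hz, so the peak should land near bin 82:

```python
import numpy as np

def band_peaks(window):
    # Band-wise peak bins over the first half of the spectrum,
    # mirroring the six bands used by the feature extractor above.
    spectrum = np.abs(np.fft.fft(window)[: len(window) // 2 - 1])
    bands = [(0, 50), (50, 100), (100, 200),
             (200, 300), (300, 400), (400, len(spectrum))]
    return [lo + int(spectrum[lo:hi].argmax()) for lo, hi in bands]

rate, n = 44100, 8192                   # sample rate and window size from the post
t = np.arange(n) / rate
window = np.sin(2 * np.pi * 440.0 * t)  # a pure 440 Hz tone
feature = band_peaks(window)
# Bin k maps to k * rate / n ≈ k * 5.38 Hz, so the peak of the
# 50-100 band should sit near bin 82 (≈ 441 Hz).
```

The same feature vector is stable under moderate noise, since noise rarely displaces the strongest bin within a band.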
For each sliding window, then, I extract 6 peak frequencies as its feature. For the whole audio clip, we call this function repeatedly:
```python
def sample(self, filename, start_second, duration=5, callback=None):
    start = start_second * 44100
    if duration == 0:
        end = 1e15
    else:
        end = start + 44100 * duration
    interval = 8192
    scaled = self.read_and_scale(filename)
    length = scaled.size
    while start < min(length, end):
        feature = self.extract_feature(scaled, start, interval)
        if callback is not None:
            callback(filename, start, feature)
        start += interval
```
Here 44100 is the sampling rate of the audio files themselves, and 8192 is the sampling window size I chose (yes, hardcoding like this is bad practice). callback is a function passed in by the caller; this parameter is needed because different scenarios require different follow-up handling of the resulting features.
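To make the role of the callback concrete, here is a tiny sketch with hypothetical names (not SFEngine's actual code): the same windowing loop can feed different consumers, one per scenario:

```python
collected = []

def on_window(filename, start, feature):
    # One possible follow-up: record what each window produced.
    # An indexing callback would store this in the library's hash table;
    # a search callback would gather it for the matching step.
    collected.append((filename, start, feature))

# Stand-in for the windowing loop, advancing by the 8192-sample
# window size, with made-up features:
for start, feature in [(0, [30, 82]), (8192, [31, 83])]:
    on_window('record/clip.wav', start, feature)
```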
Matching Against the Music Library
With a large number of features extracted from both the songs and the recording, the question becomes how to search efficiently. An effective approach is to build a special hash table whose keys are feature frequencies and whose values are lists of (song name, time) tuples. Each entry records that a certain song exhibits a certain feature frequency at a certain time, but indexed by the frequency rather than by song name or time.
The advantage is that whenever a feature frequency is extracted from the recording, we can look up in this hash table every song and time associated with that frequency!
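A minimal sketch of such an inverted index, assuming hypothetical song names and made-up feature frequencies (not SFEngine's actual data structures):

```python
from collections import defaultdict

# feature frequency bin -> list of (song name, time in window units)
index = defaultdict(list)

def add_features(song, time, feature):
    # Register each of the window's peak frequencies under that frequency.
    for freq in feature:
        index[freq].append((song, time))

# Toy library: a few windows from two songs.
add_features('original/Jay - Maple', 10, [30, 82, 150])
add_features('original/Jay - Maple', 11, [31, 82, 151])
add_features('original/Guang Liang - Fairy Tale', 40, [12, 82, 130])

# Everything that ever peaked at bin 82 is one lookup away:
hits = index[82]
```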
Of course, the hash table alone is not enough. We cannot simply pull out every song associated with the recording's feature frequencies and count which one is hit most often, because that would completely ignore the timing information within the songs and introduce false matches.
Our approach is this: for a feature frequency f at time t in the recording, we pull all the (song name, time) tuples associated with f from the library's hash table, for example

[(s1, t1), (s2, t2), (s3, t3)]

We then align them in time, subtracting t to get

[(s1, t1-t), (s2, t2-t), (s3, t3-t)]

which we write as

[(s1, t1'), (s2, t2'), (s3, t3')]
We do this for every feature frequency at every point in time in the recording, producing one large list:

[(s1, t1'), (s2, t2'), (s3, t3'), ..., (sn, tn')]

Counting this list shows which song collects the most hits, and the (song name, time) pairs with the most hits are returned to the user.
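The alignment-and-count step can be sketched as follows (a toy example with a hypothetical index, not the project's code). Because every window of the true song votes for the same (song, offset) pair, the correct match dominates the count:

```python
from collections import Counter

def best_match(record_features, index):
    # record_features: list of (t, [feature frequencies]) from the recording.
    # index: feature frequency -> list of (song, t1) entries from the library.
    votes = Counter()
    for t, feature in record_features:
        for freq in feature:
            for song, t1 in index.get(freq, []):
                votes[(song, t1 - t)] += 1  # align: shift library time by t
    return votes.most_common(1)[0]

# Toy index: the recording is songA starting 30 windows in.
index = {
    100: [('songA', 30), ('songB', 5)],
    200: [('songA', 31)],
    300: [('songA', 32), ('songB', 50)],
}
recording = [(0, [100]), (1, [200]), (2, [300])]
(song, offset), count = best_match(recording, index)
# songA wins with a consistent offset of 30 and 3 votes.
```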
Limitations
This little tool is a hack written over a few hours, and there is plenty of room for improvement, for example:
- Currently only WAV files are supported, for both the music library and recordings.
- All data is kept in memory; as the music library grows, better back-end storage will be needed.
- Indexing and matching should both be parallelized; the matching step is a typical map-reduce.