This article describes how to build a simple music search tool in Python in about 90 lines of code. If you need such a tool, the implementation shared below may serve as a reference.
Some time ago I read a blog post describing how its author used Java to implement the basic features of the famous music search tool Shazam. That post led me to a paper about Shazam and another blog post, and after reading them I found that the underlying principle is not very complex, yet the method is remarkably robust to noise. Out of curiosity, I decided to implement a simple music search tool in Python, called Song Finder. Its core functionality is encapsulated in SFEngine, and its only third-party dependency is scipy.
Tool demo
This demo shows how to use the tool in IPython. The project is called Song Finder, and all of its indexing and searching functionality is encapsulated in SFEngine. First, some simple preparation:
The Code is as follows:
In [1]: from SFEngine import *
In [2]: engine = SFEngine()
Next, we index the songs in a directory:
The Code is as follows:
In [3]: engine.index('original') # index all the songs in this directory
After indexing is complete, we submit a recording with background noise to Song Finder and search for it. First, a clip of Jay Chou's "Feng" recorded at around 1 minute 15 seconds into the song:
The tool returns the following results:
The Code is as follows:
In [4]: engine.search('record/record0.wav')
original/Jay Chou - Feng 73
original/Jay Chou - Feng 31
original/Jay Chou - Feng 10
original/Jay Chou - Feng 28
original/I want to renew quick bi-hui Mei 28
The output shows the song names and the positions (in seconds) where the clip appears in each song. You can see that the tool not only retrieves the correct song but also locates the correct position within it.
Next we try a recording of Guangliang's "Fairy Tale" taken at around 1 minute 5 seconds, where the background noise is even heavier:
The tool returns the following results:
The Code is as follows:
In [5]: engine.search('record/record8.wav')
original/Guangliang - Fairy Tale 67
original/Guangliang - Fairy Tale 39
original/Guangliang - Fairy Tale 33
original/Guangliang - Fairy Tale 135
original/Guangliang - Fairy Tale 69
Although the noise is heavy, the tool still successfully identifies the corresponding song and matches the correct position within it, showing that it is robust in noisy environments!
Project homepage: GitHub
Song Finder principles
Given a music library, finding which song a recording clip comes from is essentially a search problem, but searching audio is not as straightforward as searching documents or structured data. To search for music, the tool must complete the following three tasks:
Extract features from all songs in the music library
Extract features from the recording in the same way
Search the music library using the recording clip's features, and return the most similar song along with the clip's position in that song (a rough skeleton of how these pieces fit together follows this list)
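Here is that skeleton: a minimal outline of how the three tasks could hang together in an engine class. It is for orientation only and is not the actual SFEngine source; index and search are known from the demo above, while sample and insert anticipate the feature extraction and hash table code discussed below.
import os

class EngineOutline(object):
    # Orientation sketch only; not the real SFEngine source.
    def index(self, directory):
        # Task 1: extract features from every song in the library,
        # feeding each feature into a hash table via self.insert
        for name in os.listdir(directory):
            self.sample(os.path.join(directory, name), 0,
                        duration=0, callback=self.insert)

    def search(self, filename):
        # Tasks 2 and 3: extract features from the recording in the
        # same way, then match them against the library and rank hits
        pass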
Feature extraction? Discrete Fourier transform!
To extract features from music (audio), a very natural idea is to capture the music's pitch information, and pitch physically corresponds to the frequency of the sound wave. A straightforward way to obtain this information is to analyze the sound with the discrete Fourier transform: slide a window over the samples, apply a discrete Fourier transform to the data in each window, and convert time-domain information into frequency-domain information. This is easy to do with scipy's interface. We then split the spectrum into bands and take the frequency with the largest amplitude in each band:
The Code is as follows:
def extract_feature(self, scaled, start, interval):
    # transform one window of samples into the frequency domain
    end = start + interval
    dst = fft(scaled[start:end])
    length = len(dst) // 2                 # keep the positive-frequency half
    normalized = abs(dst[:(length - 1)])
    # take the strongest frequency bin inside each band as a feature
    feature = [normalized[:50].argmax(),
               50 + normalized[50:100].argmax(),
               100 + normalized[100:200].argmax(),
               200 + normalized[200:300].argmax(),
               300 + normalized[300:400].argmax(),
               400 + normalized[400:].argmax()]
    return feature
In this way, six frequencies are extracted from each sliding window as its features. For an entire audio segment, we simply call this function repeatedly:
def sample(self, filename, start_second, duration=5, callback=None):
    start = start_second * 44100          # sampling rate is 44100 Hz
    if duration == 0:
        end = 1e15                        # duration 0 means "whole file"
    else:
        end = start + 44100 * duration
    interval = 8192                       # sliding window size in samples
    scaled = self.read_and_scale(filename)
    length = scaled.size
    while start < min(length, end):
        feature = self.extract_feature(scaled, start, interval)
        if callback != None:
            callback(filename, start, feature)
        start += interval
Here 44100 is the sampling rate of the audio file, and 8192 is the window size I chose (yes, hardcoding these is bad practice). The callback parameter is a function supplied by the caller; it is needed because different scenarios do different things with the extracted features in subsequent steps.
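The read_and_scale helper used by sample is not shown in the article. A minimal sketch of what it might look like, assuming wav input read with scipy and stereo collapsed to mono:
def read_and_scale(self, filename):
    # Sketch only: the article does not show the real read_and_scale.
    from scipy.io import wavfile
    rate, data = wavfile.read(filename)   # rate is expected to be 44100
    if data.ndim > 1:
        data = data.mean(axis=1)          # average stereo channels to mono
    return data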
Matching against the music library
How do we search efficiently once we have extracted many features from the songs and the recording? An effective method is to build a special hash table whose keys are frequencies and whose values are lists of (song name, time) tuples. Each entry records that a song exhibits a certain feature frequency at a certain time, but it is keyed by the frequency rather than by the song name or the time.
The advantage is that when a feature frequency is extracted from the recording, we can look up in this hash table all the songs and times associated with that frequency!
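As an illustration, the indexing callback could populate such a table as follows. This is a minimal sketch assuming times are stored in samples; the names table and insert are hypothetical, since the article does not show the actual insertion code.
from collections import defaultdict

# frequency -> list of (song name, time) tuples
table = defaultdict(list)

def insert(filename, start, feature):
    # called once per sliding window during indexing (see sample() above)
    for freq in feature:
        table[freq].append((filename, start))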
Of course, this hash table alone is not enough. We cannot simply pull out all the songs related to each feature frequency and see which one gets the most hits; that would completely ignore the timing information of the song and introduce spurious matches.
In practice, for a feature frequency f at time point t in the recording, we find all (song name, time) tuples related to f in the library. Suppose, for example, we get:
The Code is as follows:
[(s1, t1), (s2, t2), (s3, t3)]
We then align by time, subtracting t from each entry to get the list
The Code is as follows:
[(s1, t1 - t), (s2, t2 - t), (s3, t3 - t)]
which we denote as
The Code is as follows:
[(s1, t1'), (s2, t2'), (s3, t3')]
We perform this operation on all feature frequencies at all time points to obtain a large list:
The Code is as follows:
[(s1, t1'), (s2, t2'), (s3, t3'), ..., (sn, tn')]
We then count this list to see which song gets the most hits at which time offset, and return the top (song name, time) pairs to the user.
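Putting the alignment and counting together, the matching step could be sketched like this. It assumes the table built above and uses collections.Counter for the tally; the function name match and the top-5 cutoff are illustrative choices rather than the article's actual code.
from collections import Counter

def match(recording_features):
    # recording_features: a list of (t, feature) pairs obtained by
    # running sample() over the recording
    counter = Counter()
    for t, feature in recording_features:
        for freq in feature:
            for song, song_time in table.get(freq, []):
                # clips from the same song at a consistent offset all
                # vote for the same (song, aligned time) key
                counter[(song, song_time - t)] += 1
    # times are in samples; dividing by 44100 gives seconds as in the demo
    return counter.most_common(5)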
Shortcomings
This tool is a quick hack written in a few hours, and there is room for improvement in several places, for example:
Currently, only music libraries and recordings in wav format are supported.
All data is kept in memory, so better backend storage will be needed as the music library grows.
Both indexing and matching should be parallelized; the matching step is in fact a typical map-reduce job.