Build a music search tool in 90 lines of Python

Source: Internet
Author: User
This article shows how to build a music search tool in about 90 lines of Python. If you need such a tool, the implementation below may be a useful reference.

A while ago I read a blog post describing how its author used Java to implement the basic functionality of the famous music search tool Shazam. That post led me to a paper about Shazam and to another blog. After reading them, I found that the underlying principle is not very complex, yet the method is remarkably robust to noise. Out of curiosity, I implemented a simple music search tool in Python, called Song Finder. Its core functionality is encapsulated in an SFEngine class, and the only third-party dependency is scipy.

Tool demo
This demo shows how to use the tool in IPython. The project is called Song Finder, and all indexing and searching functionality is encapsulated in its SFEngine class. First, some simple preparation:

The Code is as follows:


In [1]: from SFEngine import *

In [2]: engine = SFEngine()


Next, we index a directory of songs:

The Code is as follows:


In [3]: engine.index('original') # index all the songs in this directory


After indexing completes, we submit a recording made against background noise to Song Finder for search. For a clip of "Feng" recorded at around 1 minute 15 seconds into the song:

The tool returns the following results:

The Code is as follows:


In [4]: engine.search('record/record0.wav')
Original/Jay Chou-Feng 73
Original/Jay Chou-Feng 31
Original/Jay Chou-Feng 10
Original/Jay Chou-Feng 28
Original/I want to renew quick bi-hui Mei 28

The output lists the song names and the positions (in seconds) where the matched clip appears in each song. The tool retrieves the correct song, and the top hit (73 seconds, close to the 1:15 mark) also lands at the correct position.

The next recording, a clip of "Fairy Tale" at around 1 minute 5 seconds into the song, was made against even louder background noise:

The tool returns the following results:

The Code is as follows:


In [5]: engine.search('record/record8.wav')
Original/Guangliang-fairy tale 67
Original/Guangliang-fairy tale 39
Original/Guangliang-fairy tale 33
Original/Guangliang-fairy tale 135
Original/Guangliang-fairy tale 69

Despite the loud noise, the tool still identifies the correct song and matches the correct position (67 seconds, near the 1:05 mark), showing that it is robust in noisy environments!

Project homepage: GitHub

Song Finder principles
Given a music library, finding which song a recording clip comes from is a search problem, but searching audio is not as straightforward as searching documents or structured data. To search music, the tool must complete three tasks:

Extract features from every song in the music library
Extract features from the recording in the same way
Search the library using the recording's features and return the most similar song, along with the clip's position within it
Feature extraction? Discrete Fourier transform!
To extract features from music (audio), a very natural idea is to capture the music's pitch; physically, pitch corresponds to the frequency of the sound wave. A direct way to obtain this information is the discrete Fourier transform: slide a window over the sound samples, apply a discrete Fourier transform to the data in each window, and convert time-domain information into frequency-domain information. This is easy to do through scipy's interface. We then split the spectrum into bands and extract the frequency with the largest amplitude in each band:

The Code is as follows:


def extract_feature(self, scaled, start, interval):
    end = start + interval
    dst = fft(scaled[start:end])
    length = len(dst) // 2
    normalized = abs(dst[:length - 1])
    feature = [normalized[:50].argmax(),
               50 + normalized[50:100].argmax(),
               100 + normalized[100:200].argmax(),
               200 + normalized[200:300].argmax(),
               300 + normalized[300:400].argmax(),
               400 + normalized[400:].argmax()]
    return feature

In this way, each sliding window yields six frequencies as its feature. For an entire audio clip, we call this function repeatedly to extract features:

def sample(self, filename, start_second, duration=5, callback=None):
    start = start_second * 44100
    if duration == 0:
        end = 1e15
    else:
        end = start + 44100 * duration
    interval = 8192
    scaled = self.read_and_scale(filename)
    length = scaled.size
    while start < min(length, end):
        feature = self.extract_feature(scaled, start, interval)
        if callback is not None:
            callback(filename, start, feature)
        start += interval

Here 44100 is the audio file's sampling rate and 8192 is the sampling window size I chose (yes, hardcoding these is bad practice). callback is a function supplied by the caller; it is needed because different scenarios do different things with the extracted features.
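The sliding-window idea above can be sketched end to end with scipy on synthetic data. The 440 Hz test tone below is invented for illustration; the 44100 Hz rate, 8192-sample window, and band boundaries follow the values described in the article:

```python
import numpy as np
from scipy.fft import fft

def extract_feature(scaled, start, interval):
    # FFT over one window; keep the first half of the spectrum (real input)
    dst = fft(scaled[start:start + interval])
    normalized = np.abs(dst[:len(dst) // 2])
    # strongest bin in each of six frequency bands
    bands = [(0, 50), (50, 100), (100, 200), (200, 300), (300, 400), (400, None)]
    return [lo + int(normalized[lo:hi].argmax()) for lo, hi in bands]

# synthetic one-second "recording": a 440 Hz tone sampled at 44100 Hz
rate, interval = 44100, 8192
t = np.arange(rate) / rate
scaled = np.sin(2 * np.pi * 440 * t)

features = []
start = 0
while start + interval <= scaled.size:  # slide the window across the clip
    features.append(extract_feature(scaled, start, interval))
    start += interval
```

Each 8192-sample window gives frequency bins about 44100/8192 ≈ 5.4 Hz wide, so the 440 Hz tone should dominate the 50-100 band at roughly bin 81.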

Matching against the music library
How do we search efficiently once we have many features from the songs and the recording? An effective method is to build a special hash table whose key is a feature frequency and whose value is a list of (song name, time) tuples. Each entry records that a song exhibits a certain feature frequency at a certain time, but the table is keyed by the frequency rather than by song name or time.


The advantage is that when a feature frequency is extracted from the recording, one lookup in this hash table finds every song, and the time within it, associated with that frequency!
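A minimal sketch of such an inverted index, keyed by feature frequency. The song names and frequencies here are made up for illustration; in the real tool, this table would be filled in during indexing:

```python
from collections import defaultdict

# inverted index: feature frequency -> list of (song name, time) tuples
index = defaultdict(list)

def index_song(song, features_by_time):
    # features_by_time maps a time offset to that window's feature frequencies
    for time, freqs in features_by_time.items():
        for f in freqs:
            index[f].append((song, time))

index_song('song_a', {0: [81, 130], 1: [81, 250]})
index_song('song_b', {5: [81, 300]})

# one lookup returns every (song, time) sharing feature frequency 81
matches = index[81]  # [('song_a', 0), ('song_a', 1), ('song_b', 5)]
```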

Of course, this hash table alone is not enough. We cannot simply pull out every song related to each feature frequency and see which one gets the most hits; that would completely ignore the song's timing information and introduce spurious matches.

In our implementation, for a feature frequency f at time point t in the recording, we find all f-related (song name, time) tuples in the library. Suppose we get:

The Code is as follows:


[(s1, t1), (s2, t2), (s3, t3)]


We then align by time, subtracting t, to get this list:

The Code is as follows:


[(s1, t1 - t), (s2, t2 - t), (s3, t3 - t)]


which we write as:

The Code is as follows:


[(s1, t1'), (s2, t2'), (s3, t3')]


We perform this operation for all feature frequencies at all time points, obtaining one large list:

The Code is as follows:


[(s1, t1'), (s2, t2'), (s3, t3'), ..., (sn, tn')]


We count this list to see which song receives the most hits at which aligned time offset, and return the top (song name, time) results to the user.
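The align-and-count step can be sketched with collections.Counter. The toy index and recording features below are invented for illustration:

```python
from collections import Counter

# library index: feature frequency -> [(song name, time), ...]  (toy data)
index = {
    81:  [('song_a', 10), ('song_a', 42), ('song_b', 7)],
    130: [('song_a', 10), ('song_b', 30)],
    250: [('song_a', 11), ('song_b', 99)],
}

# features of the recording: (time in recording, feature frequency)
recording = [(0, 81), (0, 130), (1, 250)]

votes = Counter()
for t, f in recording:
    for song, song_time in index.get(f, []):
        # align by subtracting t: every true match votes for the
        # same (song, offset) pair, so consistent hits pile up
        votes[(song, song_time - t)] += 1

(best_song, offset), hits = votes.most_common(1)[0]
# best_song == 'song_a', offset == 10, hits == 3
```

Three independent features agree on the offset 10 for song_a, while every spurious match votes for a different (song, offset) pair, which is exactly why the alignment defeats noise.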

Limitations
This toolkit is a hack written in a few hours, and several places could be improved. For example:

Currently, only music libraries and recordings in wav format are supported.
All data is stored in the memory, and better backend storage needs to be introduced when the database volume increases.
Both indexing and matching should be parallelized; the matching step in particular fits the classic map-reduce model.

