Resources
- PDF version of this article: How Python parses wav files and draws waveforms
- Code for this article: WAV waveform drawing code
- Sample audio file for this article: night.wav (Shi Jin's "Night Piano Pieces")
Objective
In today's busy life we often listen to music to relax: we hear our favorite songs through all kinds of player software and frequently download them. Audio now comes in many formats, such as WAV, MP3, FLAC and AAC. Recently, needing to analyze WAV audio by its waveform to extract information such as when a speaker finishes recording, I spent this week exploring how to parse WAV files and draw their waveforms with Python.
WAV File Format
Let's take a look at Wikipedia's explanation of the WAV audio format:
"waveform Audio File format (or WAV, which is known for its extension), is a coded format developed by Microsoft and IBM to store audio streams in personal computers, and is widely supported in the application software on the Windows platform. The status is similar to the AIFF in the Macintosh computer. This format is one of the applications of the resource Exchange file format (RIFF), which typically stores audio assets encoded with pulse encoding in chunks. It is also one of the most commonly used specifications in music enthusiasts. Because this audio format is not compressed, there is no distortion in sound quality, but the volume of the archive is larger in many audio formats. "
We can see the above mentioned two keywords RIFF and pulse code modulation. So let's explain what the riff"resource Exchange file format" is.
RIFF Format
Let's also take a look at Wikipedia's description of RIFF:
"The Resource Interchange File Format (RIFF) is a generic file container format that stores data in tagged chunks and is mainly used for multimedia data such as audio and video. Microsoft's AVI, ANI and WAV formats under Windows are all based on RIFF.
RIFF was introduced by Microsoft and IBM in 1991 with Windows 3.1, as its default multimedia file format. RIFF is derived from the earlier Interchange File Format (IFF); the main difference is byte order. On IBM-compatible 80x86 machines RIFF stores integers in little-endian order, whereas the original IFF stored them in big-endian order."
A RIFF file is composed of chunks; the chunk is the basic unit of RIFF, and each chunk can be thought of as storing, for example, one frame of video or one frame of audio data. Let's now look at how a chunk is structured.
Structure of the chunk
A chunk consists of three parts:
- FourCC: a 4-byte ASCII character code that identifies the chunk type
- Size: the size of the data field
- Data: the stored data itself
The structure is as follows:
Structure of the chunk
- Chunks cannot normally be nested, except when a chunk's FourCC is "RIFF" or "LIST", in which case it may contain subchunks.
- The first chunk of a RIFF file must have the FourCC "RIFF"; chunks whose FourCC is "LIST" can only appear as subchunks.
The following is a structure that contains subchunks:
Structure containing subchunks
- The first four bytes of the data area of the "RIFF" chunk are called the "form type", which records the kind of data the file contains; for example, the form type of a WAV file is "WAVE".
The form type structure is as follows:
Structure containing the form type
- Similarly, a chunk whose FourCC is "LIST" stores a "list type" in the first four bytes of its data area, which indicates the format of the data contained in the list. A small sketch of walking this chunk layout follows below.
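As an illustration of this layout, here is a minimal sketch (not from the original article) that walks the top-level chunks of a RIFF file and prints each FourCC and size. It assumes a well-formed little-endian RIFF file such as night.wav; the function name list_chunks is just an example.

    # Walk the top-level chunks of a RIFF file, printing each FourCC and size (sketch)
    import struct

    def list_chunks(path):
        with open(path, "rb") as f:
            riff, riff_size, form_type = struct.unpack("<4sI4s", f.read(12))
            print(riff, riff_size, form_type)   # e.g. b'RIFF', file size - 8, b'WAVE'
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                fourcc, size = struct.unpack("<4sI", header)
                print(fourcc, size)
                f.seek(size + (size & 1), 1)    # chunk data is word-aligned, skip a padding byte if size is odd

    list_chunks("night.wav")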
Pulse code modulation (PCM)
Let's take a look at how the Baidu Encyclopedia explains it:
"PCM is the abbreviation of pulse-code modulation, one of the encoding methods used in digital communication. Its main process is to sample analog signals such as voice or images at regular time intervals so that they become discrete, to round each sampled value to the nearest quantization level, and then to represent the amplitude of the sampled pulses with a group of binary codes."
From this introduction we can understand:
Converting an analog audio signal into a digital signal involves three processes: sampling, quantization, and encoding.
Sampling
Sampling is needed because an analog signal is continuous: by sampling the analog signal at a certain frequency we obtain an approximation of it. In the figure below, the gray boxes mark the samples taken at a fixed frequency:
Sampling process
- Sampling extracts values from the analog signal at a rate of more than twice its signal bandwidth, producing a signal that is *discrete* on the time axis yet can stand in for the original continuous audio signal. The samples taken from a sinusoidal signal form a pulse-amplitude-modulated (PAM) signal, which can later be detected and smoothed to restore the original analog signal.
Quantization
The sampled signal is a discretized version of the analog signal, but its sample values can still take on infinitely many possible values within a given range. To represent the samples with digital codes, we use a "rounding" method to assign each sample to the nearest level, so that within that range the samples go from infinitely many possible values to a finite set of values.
Encoding
The quantized signal can take only a finite number of sample values within a given range. Because the positive and negative amplitudes of the signal are distributed symmetrically, the numbers of positive and negative samples are equal and the quantization levels are distributed symmetrically around zero; each level is then represented by a binary code. A short numeric sketch of sampling, quantization and encoding follows below.
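To make these three steps concrete, here is a small sketch (not from the original article) that samples a 440 Hz sine wave at 44.1 kHz and quantizes the samples to 16-bit signed integers, the same representation used by PCM data in WAV files:

    # Sample a 440 Hz sine at 44.1 kHz and quantize/encode it as 16-bit PCM values (sketch)
    import numpy as np

    sample_rate = 44100                                # samples per second, well above twice 440 Hz
    t = np.arange(0, 0.01, 1.0 / sample_rate)          # sampling: 10 ms of sample times
    analog = np.sin(2 * np.pi * 440 * t)               # the "analog" signal evaluated at the sample times
    pcm = np.round(analog * 32767).astype(np.int16)    # quantization + encoding as 16-bit integers
    print(pcm[:10])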
Chunk information of WAV files
A WAV file is made up of several chunks. The order in which they appear in the file is: RIFF chunk, format chunk, fact chunk, data chunk, where the fact chunk is optional. The structure is shown below:
Chunk composition of the WAV file header
The RIFF chunk is the head chunk, and the format chunk contains the various parameters of the WAV file. The detailed fields are as follows:
- FormatTag: the encoding of the audio data; PCM is indicated by the value 1
- Channels: number of channels, 1 for mono, 2 for stereo
- SamplesPerSec: sample rate (number of samples per second)
- BytesPerSec: audio data transfer rate (bytes per second)
- BlockAlign: the size in bytes of one sample frame across all channels
- BitsPerSample: sampling precision (bits) of each channel
The fact chunk exists because some files do not use the PCM format; in that case a fact chunk is required to record the size of the data after decompression.
The last chunk, the data chunk, holds the actual sound data, generally stored in the WAVE_FORMAT_PCM format, i.e. pulse-code modulation (PCM).
The data part of a WAV file
How the information in the data chunk is laid out is determined by the information in the format chunk. Depending on the number of quantization bits, the number of channels and the sample rate, the data area stores its information in one of four layouts:
Format of DATA Chunk
Reading WAV file information with Python
In Python we can work with WAV files through a number of audio libraries, such as the standard library's wave module, as well as eyeD3, PyAudio, Audacity and so on. But before turning to those, let's first read the WAV file in binary form with ordinary file operations and analyze its header to verify what we learned about chunks above. The first four bytes of the audio file are read with the following code (our test audio is night.wav, which has been placed on GitHub; you can reach my GitHub page via the green icon in the upper right corner of my blog, find the W8 directory under lab102 to get the resource, or use the resource links at the top of this post):
    # Read the first four bytes of the WAV file - xlxw
    file = open("night.wav", "rb")
    s = file.read(4)
    print(s)
    file.close()
Program run:
We can see that the first four bytes are indeed "RIFF", as expected. Next, let's read the first 44 bytes to see what information they contain:
    # Read the first 44 bytes of the WAV file - xlxw
    file = open("night.wav", "rb")
    s = file.read(44)
    print(s)
    file.close()
Program run:
We can see that the string right after "RIFF" is the form type "WAVE", followed by the FourCC codes "fmt " and "data"; the other bytes, shown in hexadecimal, are the size and data fields. So the information we read from the WAV file in binary is consistent with the chunk layout we learned about earlier.
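To turn those raw header bytes into numbers, we can unpack them with the struct module. This is only a sketch: it assumes night.wav has the canonical 44-byte PCM header (RIFF chunk, then fmt chunk, then data chunk, with no extra chunks in between), and the variable names simply mirror the format chunk fields listed earlier.

    # Decode a canonical 44-byte PCM WAV header (sketch; assumes RIFF/fmt/data layout)
    import struct

    with open("night.wav", "rb") as f:
        header = f.read(44)

    riff, riff_size, wave_id = struct.unpack("<4sI4s", header[:12])
    (fmt_id, fmt_size, format_tag, channels, samples_per_sec,
     bytes_per_sec, block_align, bits_per_sample) = struct.unpack("<4sIHHIIHH", header[12:36])
    data_id, data_size = struct.unpack("<4sI", header[36:44])

    print(riff, wave_id, fmt_id, data_id)        # b'RIFF' b'WAVE' b'fmt ' b'data'
    print(channels, samples_per_sec, bits_per_sample, data_size)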
Using the wave library to extract WAV file information
From the introduction above we know how a WAV file is stored and can extract its information fairly easily. The most important content of a WAV file is of course the sound data itself, which we could also obtain by parsing the data chunk, but Python offers a simpler way: its built-in wave library. Below we introduce the wave methods we will need, as preparation for the rest of the article; a short sketch that puts them together follows after the list.
- First, import the wave library:
import wave
- Open a sound file with the method wave.open(path, mode).
Here path is the location of the WAV file, and mode works much like the modes used for ordinary file reading and writing, e.g. "wb" for write-only and "rb" for read-only, where b means the file is opened in binary mode.
- Close a sound file with the close() method.
- getparams() returns the parameters of the WAV file as a tuple: (number of channels, sample width, sample rate, number of frames, ...).
For example, getparams() for this article's night.wav returns:
- readframes(n) returns the sound data of n frames; the returned value is binary data, which Python represents as a bytes string, so we will have to convert it later.
The first 10 frames of night.wav look like this:
These are the wave methods we commonly need; they are applied below.
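Putting these calls together, here is a minimal sketch (assuming night.wav sits in the current directory) that opens the file, prints its parameters, and reads the first 10 frames:

    # Open night.wav with the wave module, print its parameters and the first 10 frames (sketch)
    import wave

    wavfile = wave.open("night.wav", "rb")
    print(wavfile.getparams())     # (nchannels, sampwidth, framerate, nframes, comptype, compname)
    print(wavfile.readframes(10))  # raw bytes of the first 10 frames
    wavfile.close()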
Drawing the waveform of a WAV file
We often see the waveforms of sound files in software such as Cool Edit or Audition, so let's use Python with the wave library described above, supplemented by NumPy and Matplotlib, to try to draw the waveform of the night.wav file.
Let's briefly go through the steps for drawing the waveform (taking night.wav as the example):
- Get the information in the night.wav header via the wave library, such as the sample rate and the number of channels
- Extract the information from the data area and convert the byte-string data to an array using NumPy
- Process the data according to the number of channels (reshaping the array)
- Compute the time of each plotted point (the x coordinate)
- Draw the waveform using the methods provided by the Matplotlib library
Let's take a look at some of these steps in detail:
Processing the data area
Because night.wav is a two-channel WAV file, we know from the data-area formats above that the samples are stored interleaved, left channel then right channel, so we have to rearrange the extracted data. The NumPy library gives us a convenient solution: we change the shape of the array and then transpose it with T. Here is an example:
We create an array [1,2,3,4,5,6,7,8] with 8 elements. According to this interleaved layout it should split into a left channel [1,3,5,7] and a right channel [2,4,6,8]. We first change the shape of the array so that the data forms two columns (left channel, right channel), and then transpose it to obtain the final data. The example can be pictured as follows:
Examples of dividing data into left and right channels
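In code, the same toy example looks like this (a sketch using NumPy; the array is the example data above, not real audio):

    # Split an interleaved sample array into left/right channels with reshape and transpose (sketch)
    import numpy as np

    data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    data = data.reshape(-1, 2)   # two columns: left channel, right channel
    data = data.T                # row 0 is the left channel, row 1 is the right channel
    print(data[0])               # [1 3 5 7]
    print(data[1])               # [2 4 6 8]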
Plotting the waveform with the Matplotlib library
Since we want to draw a waveform, the Matplotlib library greatly reduces the difficulty of plotting. We mainly use two methods, plt.subplot and plt.plot, so let's explain them.
plt.subplot is Matplotlib's method for drawing multiple sub-plots. Because our audio file is divided into two parts (left and right channel), we split the figure into a 2x1 layout, so the first and second plots are drawn separately with plt.subplot(211) and plt.subplot(212), as shown in the figure:
Matplotlib Sub-chart
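A minimal sketch of this 2x1 layout (with made-up data, just to show where the two sub-plots land):

    # Two stacked sub-plots in a 2x1 layout (sketch with dummy data)
    import matplotlib.pyplot as plt

    plt.subplot(211)                                # upper plot
    plt.plot([0, 1, 2], [0, 1, 0], color='green')
    plt.subplot(212)                                # lower plot
    plt.plot([0, 1, 2], [0, -1, 0])
    plt.show()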
plt.plot() is the method used to draw the line itself; we use three of its parameters:
(x coordinates, y coordinates, color)
With plot we can draw the final waveform.
- Other methods are not introduced here, but Matplotlib's drawing capabilities are quite powerful.
Code for drawing the night.wav waveform
The code used to draw the WAV file's waveform is as follows (still taking night.wav as the example):
    # Wave data - xlxw
    # imports
    import wave as we
    import numpy as np
    import matplotlib.pyplot as plt

    def wavread(path):
        wavfile = we.open(path, "rb")
        params = wavfile.getparams()
        framesra, frameswav = params[2], params[3]        # sample rate and number of frames
        datawav = wavfile.readframes(frameswav)
        wavfile.close()
        datause = np.frombuffer(datawav, dtype=np.short)  # the original used np.fromstring, now deprecated
        datause.shape = -1, 2                             # interleaved stereo samples -> two columns
        datause = datause.T                               # row 0: left channel, row 1: right channel
        time = np.arange(0, frameswav) * (1.0 / framesra) # time (x coordinate) of each frame
        return datause, time

    def main():
        path = input("The Path is:")
        wavdata, wavtime = wavread(path)
        plt.subplot(211)
        plt.title("night.wav's Frames")
        plt.plot(wavtime, wavdata[0], color='green')
        plt.subplot(212)
        plt.plot(wavtime, wavdata[1])
        plt.show()

    main()
The program draws the following waveform:
Two-channel waveform of the sound file night.wav
Summary & Development
This week I learned the storage format of WAV files, how to read WAV file information in Python, and how to draw a waveform. After plotting the waveform, we can dig further into the information it reveals, for example:
- Extracting the characteristics of a sound
- Analyzing a recording to determine whether the speaker has stopped talking
Uses like these apply to many areas, such as speech recognition and audio segmentation. I will continue to explore from here.