What to do with Python audio processing

Last Update:2017-05-04 Source: Internet

Author: User


Original author: Gui

Time: 2017-05-03-12:18:46

Links: http://www.cnblogs.com/xingshansi/p/6799994.html

Objective

This article records common audio operations in Python, using .wav files as the example. Many ready-made audio toolkits are available online; if all you need is to call existing functions, a toolkit is more convenient.

1. Batch-reading .wav file names

    import os

    filepath = "./data/"              # directory to scan
    filename = os.listdir(filepath)   # names of all files in that directory
    for file in filename:
        print(filepath + file)
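os.listdir returns every entry in the directory, in arbitrary order, so indexing the result (filename[0], filename[1]) depends on what else is in the folder. A minimal sketch of keeping only .wav files in a stable order; the temporary folder and the file names in it are made up for the demonstration:

```python
import os
import tempfile

# Build a throwaway folder with hypothetical file names so the example is self-contained.
folder = tempfile.mkdtemp()
for name in ("a.wav", "b.WAV", "notes.txt"):
    open(os.path.join(folder, name), "w").close()

# Keep only .wav files (case-insensitive) and sort them for a reproducible order.
wav_paths = sorted(
    os.path.join(folder, name)
    for name in os.listdir(folder)
    if name.lower().endswith(".wav")
)

for path in wav_paths:
    print(path)
```

os.path.join also avoids having to remember the trailing separator in the directory string.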

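The path here is a string. Python has three kinds of string literal:

1. Ordinary strings (str).
2. Raw strings, which start with an uppercase R or a lowercase r (r''); special characters are not escaped.
3. Unicode strings (u''). In Python 2, both str and unicode are subclasses of basestring.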

For example:

    path = './file/n'
    path = r'.\file\n'
    path = '.\\file\\n'

On Windows, all three name the same path. The backslash \ is the escape character, and an r prefix before the quotation mark marks a raw string, which is not escaped (r: raw string).
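The effect of the r prefix can be checked directly. A small sketch; the variable names are chosen here for illustration only:

```python
forward = './file/n'       # forward slashes: nothing to escape
raw = r'.\file\n'          # raw string: backslashes are kept literally
escaped = '.\\file\\n'     # ordinary string: each backslash is escaped
unescaped = '.\file\n'     # ordinary string: \f and \n become control characters

print(raw == escaped)      # the raw and escaped spellings are the same string
print(len(raw), len(unescaped))
```

The last line shows why the unescaped form is not a usable path: '\f' (form feed) and '\n' (newline) each collapse into a single character.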

Common ways to get help:

    >>> help(str)
    >>> dir(str)
    >>> help(str.replace)

2. Reading .wav files

wave.open usage:

    wave.open(file, mode)

mode can be:

'rb', to read a file;

'wb', to write a file.

Reading and writing at the same time is not supported.

Wave_read.getparams usage:

    f = wave.open(file, 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]

The last line unpacks the most commonly used audio parameters:

nchannels: number of channels

sampwidth: sample width (quantization depth) in bytes

framerate: sampling frequency

nframes: number of sample frames
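To see these four values without needing an existing recording, the sketch below first writes one second of silence to a temporary file and then reads the parameters back. The file name and the chosen parameters are assumptions made for the demonstration:

```python
import os
import tempfile
import wave

# Write one second of 16-bit mono silence at 8 kHz (hypothetical demo data).
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes per sample = 16 bit
    out.setframerate(8000)    # 8 kHz
    out.writeframes(b"\x00\x00" * 8000)

# Read the parameters back, as in the snippet above.
with wave.open(path, "rb") as f:
    nchannels, sampwidth, framerate, nframes = f.getparams()[:4]

duration = nframes / framerate   # length of the recording in seconds
print(nchannels, sampwidth, framerate, nframes, duration)
```

Opening the file in a with block closes it automatically, which replaces the explicit f.close() call.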

Single channel

Corresponding code:

    import wave
    import matplotlib.pyplot as plt
    import numpy as np
    import os

    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[1], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    strData = f.readframes(nframes)                        # read the audio as raw bytes
    waveData = np.frombuffer(strData, dtype=np.int16)      # convert the bytes to int16
    waveData = waveData * 1.0 / np.max(np.abs(waveData))   # normalize the amplitude
    f.close()
    # plot the wave
    time = np.arange(0, nframes) * (1.0 / framerate)
    plt.plot(time, waveData)
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.title("Single channel wavedata")
    plt.grid(True)  # grid lines: True shows them, False hides them
    plt.show()

[Figure: single-channel waveform, amplitude versus time (s)]

Multi-channel

Here the file has 3 channels. The only new step is np.reshape; everything else is the same as in the single-channel case. Corresponding code:

    # -*- coding: utf-8 -*-
    """
    Created on Wed May 3 12:15:34 2017

    @author: nobleding
    """
    import wave
    import matplotlib.pyplot as plt
    import numpy as np
    import os

    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[0], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    strData = f.readframes(nframes)                        # read the audio as raw bytes
    waveData = np.frombuffer(strData, dtype=np.int16)      # convert the bytes to int16
    waveData = waveData * 1.0 / np.max(np.abs(waveData))   # normalize the amplitude
    waveData = np.reshape(waveData, [nframes, nchannels])  # one column per channel
    f.close()
    # plot the wave
    time = np.arange(0, nframes) * (1.0 / framerate)
    plt.figure()
    plt.subplot(5, 1, 1)
    plt.plot(time, waveData[:, 0])
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-1 wavedata")
    plt.grid(True)  # grid lines: True shows them, False hides them
    plt.subplot(5, 1, 3)
    plt.plot(time, waveData[:, 1])
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-2 wavedata")
    plt.grid(True)
    plt.subplot(5, 1, 5)
    plt.plot(time, waveData[:, 2])
    plt.xlabel("Time (s)")
    plt.ylabel("Amplitude")
    plt.title("Ch-3 wavedata")
    plt.grid(True)
    plt.show()


A single channel is a special case of multi-channel, so the multi-channel reading approach works for WAV files with any number of channels. Note that after the reshape, waveData has a different structure from before: it is two-dimensional, and for a single-channel file the data from before the reshape corresponds to one channel of the reshaped array (waveData[0] once the array is transposed). This does not affect plotting, but it has to be taken into account when analyzing the spectrum.
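The reshape works because a WAV file stores its samples interleaved, frame by frame: channel 1, channel 2, channel 3, then the next frame. A minimal standard-library sketch of the same de-interleaving, using made-up sample values; it is equivalent in effect to taking column ch of np.reshape(waveData, [nframes, nchannels]):

```python
import struct

nchannels = 3
# Hypothetical interleaved data: 4 frames of (ch1, ch2, ch3), packed as little-endian int16.
raw = struct.pack('<12h', 1, 10, 100, 2, 20, 200, 3, 30, 300, 4, 40, 400)

# Unpack the bytes into integers, then take every nchannels-th sample per channel.
samples = struct.unpack('<%dh' % (len(raw) // 2), raw)
channels = [list(samples[ch::nchannels]) for ch in range(nchannels)]

for ch, data in enumerate(channels, start=1):
    print("channel", ch, data)
```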

3. Writing .wav files

Three main steps are involved:

Parameter settings:

    nchannels = 1  # single channel as the example
    sampwidth = 2
    fs = 8000
    data_size = len(outData)
    framerate = int(fs)
    nframes = data_size
    comptype = "NONE"
    compname = "not compressed"
    outwave.setparams((nchannels, sampwidth, framerate, nframes, comptype, compname))

The storage path and file name of the WAV file to write:

    outfile = filepath + 'out1.wav'
    outwave = wave.open(outfile, 'wb')  # open the output path and file name for writing

Writing the data:

    for v in outData:
        outwave.writeframes(struct.pack('h', int(v * 64000 / 2)))  # outData: 16-bit, -32767~32767; take care not to overflow
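struct.pack('h', ...) raises struct.error when the value does not fit in 16 bits, which is what the overflow warning above is about. One way to guard against it is to clamp after scaling. A minimal sketch; to_int16 is a hypothetical helper, and scaling by 32767 instead of the 64000/2 used above is an assumption made here:

```python
import struct

def to_int16(v, scale=32767):
    """Scale a normalized sample (nominally in [-1.0, 1.0]) to int16, clamping overflow."""
    n = int(round(v * scale))
    return max(-32768, min(32767, n))

# A sample slightly above full scale would overflow without the clamp.
packed = struct.pack('<h', to_int16(1.2))
print(to_int16(1.2), to_int16(-2.0), to_int16(0.0))
```

Clamping distorts the signal where it is applied, so normalizing the data first, as the reading code does, remains the better safeguard.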

Single-channel data write:

    import wave
    # import matplotlib.pyplot as plt
    import numpy as np
    import os
    import struct

    # read the wav file
    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[1], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    strData = f.readframes(nframes)                        # read the audio as raw bytes
    waveData = np.frombuffer(strData, dtype=np.int16)      # convert the bytes to int16
    waveData = waveData * 1.0 / np.max(np.abs(waveData))   # normalize the amplitude
    f.close()
    # write the wav file
    outData = waveData  # data to write to the WAV file; waveData is reused here
    outfile = filepath + 'out1.wav'
    outwave = wave.open(outfile, 'wb')  # open the output path and file name for writing
    nchannels = 1
    sampwidth = 2
    fs = 8000
    data_size = len(outData)
    framerate = int(fs)
    nframes = data_size
    comptype = "NONE"
    compname = "not compressed"
    outwave.setparams((nchannels, sampwidth, framerate, nframes, comptype, compname))
    for v in outData:
        outwave.writeframes(struct.pack('h', int(v * 64000 / 2)))  # outData: 16-bit, -32767~32767; take care not to overflow
    outwave.close()

Multi-channel data write:

Multi-channel writing mirrors multi-channel reading. Reading reshapes the one-dimensional data into two dimensions; writing reshapes the two-dimensional data back into one dimension. It is simply the reverse process:

    import wave
    # import matplotlib.pyplot as plt
    import numpy as np
    import os
    import struct

    # read the wav file
    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[0], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    strData = f.readframes(nframes)                        # read the audio as raw bytes
    waveData = np.frombuffer(strData, dtype=np.int16)      # convert the bytes to int16
    waveData = waveData * 1.0 / np.max(np.abs(waveData))   # normalize the amplitude
    waveData = np.reshape(waveData, [nframes, nchannels])
    f.close()
    # write the wav file
    outData = waveData  # data to write to the WAV file; waveData is reused here
    outData = np.reshape(outData, [nframes * nchannels])  # flatten back to one dimension
    outfile = filepath + 'out2.wav'
    outwave = wave.open(outfile, 'wb')  # open the output path and file name for writing
    nchannels = 3
    sampwidth = 2
    fs = 8000
    data_size = len(outData)
    framerate = int(fs)
    nframes = data_size // nchannels  # frames, not individual samples
    comptype = "NONE"
    compname = "not compressed"
    outwave.setparams((nchannels, sampwidth, framerate, nframes, comptype, compname))
    for v in outData:
        outwave.writeframes(struct.pack('h', int(v * 64000 / 2)))  # outData: 16-bit, -32767~32767; take care not to overflow
    outwave.close()
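The interleaving step can also be done without NumPy, and the samples can be packed in a single call instead of one writeframes call per sample. A minimal two-channel round-trip sketch; the sample values, the file name, and the choice of two channels are assumptions made for the demonstration:

```python
import os
import struct
import tempfile
import wave

# Hypothetical channel data, already scaled to the int16 range.
left = [0, 1000, 2000, 3000]
right = [0, -1000, -2000, -3000]

# Interleave frame by frame: L0 R0 L1 R1 ... (the reverse of the reshape used when reading).
interleaved = [s for frame in zip(left, right) for s in frame]
data = struct.pack('<%dh' % len(interleaved), *interleaved)

path = os.path.join(tempfile.mkdtemp(), "stereo.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(data)      # one call for the whole buffer

# Read the file back to confirm the frame count and the payload.
with wave.open(path, "rb") as f:
    nframes = f.getnframes()
    readback = f.readframes(nframes)

print(nframes, readback == data)
```

Note that nframes counts frames (one sample per channel), so four frames of stereo data hold eight samples.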

struct.pack(...) is used here to convert each value to its binary representation:

For example:
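A short illustration of what struct.pack produces for the 'h' (16-bit signed integer) format used above. The explicit '<' (little-endian) prefix is an addition made here, since WAV data is little-endian; the bare 'h' in the listings uses the machine's native byte order:

```python
import struct

# 1000 is 0x03E8; little-endian stores the low byte first.
packed = struct.pack('<h', 1000)
print(packed)

# unpack reverses the conversion and always returns a tuple.
value, = struct.unpack('<h', packed)
print(value)

# Several samples can be packed at once by giving a count in the format string.
frames = struct.pack('<3h', 0, 16000, -16000)
print(len(frames))
```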

4. Audio playback

Playing a WAV file requires PyAudio. Download the installation package (a .whl file), place it in the \Scripts folder, open cmd, switch to that directory, and run:

    pip install PyAudio-0.2.9-cp35-none-win_amd64.whl

PyAudio is now installed.

Main PyAudio usage:

Main parameters of the open() method of the PyAudio object:

- rate: sampling rate
- channels: number of channels
- format: sample format; it can be paFloat32, paInt32, paInt24, paInt16, paInt8, and so on. The example below uses get_format_from_width() to convert a sampwidth of 2 to paInt16.
- input: input stream flag; True opens an input stream
- output: output stream flag

The corresponding code:

    import wave
    import pyaudio
    import os

    # read the wav file
    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[0], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    # instantiate PyAudio
    p = pyaudio.PyAudio()
    # define the stream chunk size
    chunk = 1024
    # open the audio output stream
    stream = p.open(format=p.get_format_from_width(sampwidth),
                    channels=nchannels,
                    rate=framerate,
                    output=True)
    # write the audio to the sound card for playback
    while True:
        data = f.readframes(chunk)
        if data == b'':
            break
        stream.write(data)
    f.close()
    # stop the stream
    stream.stop_stream()
    stream.close()
    # close PyAudio
    p.terminate()

Because this is Python 3.5, the b prefix in the test if data == b'': break cannot be omitted, since readframes returns bytes.

5. Windowing the signal

A signal is usually truncated and split into frames, and each frame needs a window applied: truncation leaks energy in the frequency domain, and a window function reduces the effect of truncation.

Window functions are in the scipy.signal signal processing toolbox, for example the Hanning window:

    import pylab as pl
    import scipy.signal as signal
    pl.plot(signal.windows.hann(512))  # signal.hanning(512) in older SciPy releases

Using the function above, plot the Hanning window:

    import pylab as pl
    import scipy.signal as signal

    pl.figure(figsize=(6, 2))
    pl.plot(signal.windows.hann(512))  # signal.hanning(512) in older SciPy releases
    pl.show()
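For reference, a symmetric Hanning (Hann) window can also be computed directly from its definition, w[n] = 0.5 - 0.5 cos(2 pi n / (N - 1)). A minimal standard-library sketch; hann is a hypothetical helper name, and N must be greater than 1:

```python
import math

def hann(N):
    """Symmetric Hann window of length N: w[n] = 0.5 - 0.5*cos(2*pi*n/(N-1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

w = hann(5)
print([round(v, 3) for v in w])
```

The window rises from zero at both ends to one in the middle, which is what tapers the edges of each truncated frame.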

6. Framing the signal

The theoretical basis of framing is windowed truncation, where x is the speech signal and w is the window function.

Windowed truncation is similar to sampling. To keep adjacent frames from differing too much, consecutive frames usually overlap by means of a frame shift, which in effect acts as interpolation smoothing.

An example is given below:
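A minimal sketch of framing with a frame shift, under stated assumptions: enframe is a hypothetical helper (not necessarily the author's implementation), the window is a Hann window, and trailing samples that do not fill a whole frame are dropped:

```python
import math

def enframe(x, frame_len, hop):
    """Split x into frames of frame_len samples taken every hop samples,
    multiplying each frame by a Hann window."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# A constant test signal: 10 samples, frames of 4 with a shift of 2 (50% overlap).
x = [1.0] * 10
frames = enframe(x, frame_len=4, hop=2)
print(len(frames), [round(v, 2) for v in frames[0]])
```

With hop smaller than frame_len, each sample away from the edges appears in more than one frame, which is the overlap described above.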

7. Spectrogram

Framing the signal, transforming each frame to the frequency domain, and taking the magnitude gives the spectrogram. If the goal is only to look at it, matplotlib.pyplot provides the specgram function:

    import wave
    import matplotlib.pyplot as plt
    import numpy as np
    import os

    filepath = "./data/"             # directory containing the audio
    filename = os.listdir(filepath)  # names of all files in that directory
    f = wave.open(filepath + filename[0], 'rb')
    params = f.getparams()
    nchannels, sampwidth, framerate, nframes = params[:4]
    strData = f.readframes(nframes)                        # read the audio as raw bytes
    waveData = np.frombuffer(strData, dtype=np.int16)      # convert the bytes to int16
    waveData = waveData * 1.0 / np.max(np.abs(waveData))   # normalize the amplitude
    waveData = np.reshape(waveData, [nframes, nchannels]).T  # one row per channel
    f.close()
    # plot the spectrogram of the first channel
    plt.specgram(waveData[0], Fs=framerate, scale_by_freq=True, sides='default')
    plt.ylabel('Frequency (Hz)')
    plt.xlabel('Time (s)')
    plt.show()
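The frame, transform, magnitude steps behind a spectrogram can be sketched in plain Python. This is an illustration only: magnitude_spectrogram is a hypothetical helper, the test tone and frame sizes are made up, and the direct DFT used here is far slower than an FFT such as np.fft.rfft:

```python
import cmath
import math

def magnitude_spectrogram(x, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len//2) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    spec = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        spec.append(spectrum)
    return spec

# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
fs = 8000
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
spec = magnitude_spectrogram(x, frame_len=64, hop=32)

# Locate the strongest bin of the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
print(len(spec), peak_bin, peak_hz)
```

Each row of spec is one time slice and each column one frequency bin, which is the grid that specgram draws as an image.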


