Speech Signal Short-Time Domain Analysis: Preprocessing (III)


Labels: pre-emphasis, windowing, framing, rectangular window, Hamming window

A speech signal is a time-varying signal that carries various kinds of information. Speech processing generally has two purposes: analyzing the speech signal and extracting feature parameters for subsequent processing; and processing the speech signal itself, for example suppressing the noise in noisy speech (speech enhancement) to obtain relatively clean speech.

Depending on the analysis parameters, speech analysis can be divided into time-domain analysis and transform-domain (frequency-domain, cepstral-domain) analysis. Time-domain analysis is the simplest and most intuitive method: it analyzes and extracts parameters directly from the time-domain waveform of the speech signal, including the short-time energy and average magnitude, the short-time average zero-crossing rate, the short-time autocorrelation function, and the short-time average magnitude difference function.

The actual speech signal is an analog signal. Before digital processing, the analog speech signal s(t) must therefore be sampled with sampling period T to obtain the discrete signal s(n). The sampling period should be chosen according to the bandwidth of the analog speech signal so that frequency-domain aliasing distortion is avoided.
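For example, telephone-band speech occupies roughly 300 to 3400 Hz, so a sampling rate of 8 kHz satisfies the sampling theorem. A minimal MATLAB sketch of the sampling relationship s(n) = s(nT) follows; the 1 kHz test tone and the 8 kHz rate are illustrative values, not taken from the original text.

fs = 8000;                % sampling frequency in Hz (at least twice the signal bandwidth)
T = 1/fs;                 % sampling period
n = 0:fs-1;               % one second of sample indices
t = n*T;                  % sampling instants t = n*T
s = sin(2*pi*1000*t);     % stand-in for the analog signal s(t): a 1 kHz tone
% s now holds the discrete sequence s(n) = s(nT) used in all later processing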

Speech signal preprocessing generally includes pre-emphasis and windowing (framing).

1. Pre-emphasis

The input digital speech is pre-emphasized to boost the high-frequency part of the spectrum, remove the effect of lip radiation, and increase the high-frequency resolution of the speech. Pre-emphasis is usually implemented with a first-order high-pass digital filter with transfer function H(z) = 1 - a*z^(-1), where a is the pre-emphasis coefficient, generally 0.9 < a < 1. If x(n) is the speech sample at time n, the pre-emphasized result is y(n) = x(n) - a*x(n-1). Here a = 0.98. The MATLAB code is as follows.

e = wavread('beijing.wav');          % read the original speech file (use audioread in newer MATLAB versions)
ee = e(1:1024);                      % select a segment of the original signal e (the exact sample range was lost in the source; any segment can be used)
r = fft(ee, 1024);                   % 1024-point Fourier transform of the segment ee
r1 = abs(r);                         % magnitude of r, i.e. the amplitude spectrum
pinlv = (0:255)*8000/512;            % mapping between bin index and frequency in Hz
yuanlai = 20*log10(r1);              % amplitude spectrum in dB
signal(1:256) = yuanlai(1:256);      % keep 256 points so that all plotted vectors have the same length
[h1, f1] = freqz([1, -0.98], 1, 256, 8000);   % high-pass (pre-emphasis) filter response (point count and rate reconstructed)
pha = angle(h1);                     % phase response of the high-pass filter
h1 = abs(h1);                        % amplitude response of the high-pass filter
r2(1:256) = r(1:256);                % first 256 bins of the signal spectrum
u = r2 .* h1';                       % multiplication in the frequency domain is equivalent to convolution in the time domain
u2 = abs(u);                         % magnitude of the filtered spectrum
u3 = 20*log10(u2);                   % filtered amplitude spectrum in dB
% un = filter([1, -0.98], 1, ee);    % un is the time-domain signal after high-frequency emphasis
figure(1);
subplot(211); plot(f1, h1);
title('Amplitude response of the high-pass filter'); xlabel('Frequency/Hz'); ylabel('Amplitude');
subplot(212); plot(pha);
title('Phase response of the high-pass filter'); xlabel('Frequency/Hz'); ylabel('Angle/radians');
figure(2);
subplot(211); plot(pinlv, signal);
title('Spectrum of the original speech signal'); xlabel('Frequency/Hz'); ylabel('Amplitude/dB');
subplot(212); plot(pinlv, u3);
title('Spectrum of the speech signal after high-pass filtering'); xlabel('Frequency/Hz'); ylabel('Amplitude/dB');
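Equivalently, the pre-emphasis can be applied directly in the time domain with the difference equation y(n) = x(n) - a*x(n-1), as in the commented-out filter line above. A minimal sketch, assuming the segment ee from the code above and a = 0.98:

a = 0.98;                          % pre-emphasis coefficient
un = filter([1, -a], 1, ee);       % y(n) = x(n) - a*x(n-1)
% if needed, de-emphasis with the inverse filter recovers the original signal:
xr = filter(1, [1, -a], un);       % xr equals ee up to numerical error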

The following figure shows the response of the high-pass filter and the spectrum before and after pre-emphasis:

2. Windowing and Framing

After pre-emphasis filtering, the next step is windowing and framing. The speech signal changes over time and can be divided into voiced and unvoiced sounds. Because the articulators move with a certain inertia, the speech signal can be regarded as unchanged over a short period (generally 10 ms to 30 ms); that is, the speech signal is short-time stationary. The speech signal can therefore be divided into short segments for processing. Framing is achieved by weighting the signal with a movable finite-length window, and the number of frames per second is typically about 33 to 100. Although framing could use contiguous, non-overlapping segments, overlapping segmentation is generally used so that consecutive frames transition smoothly and remain continuous. The overlapping part of the previous frame and the next frame is called the frame shift, and the ratio of frame shift to frame length is generally between 0 and 1/2.
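As a concrete illustration of overlapped framing, the sketch below splits a signal into Hamming-windowed frames. The 25 ms frame length and 10 ms frame shift are representative values chosen here, not taken from the text.

fs = 8000;                              % sampling rate in Hz
frameLen = round(0.025*fs);             % frame length: 25 ms, i.e. 200 samples
frameInc = round(0.010*fs);             % frame shift: 10 ms, i.e. 80 samples
x = randn(1, fs);                       % stand-in for one second of speech
win = hamming(frameLen)';               % window sequence w(n)
nFrames = fix((length(x) - frameLen)/frameInc) + 1;
frames = zeros(nFrames, frameLen);      % one windowed frame per row
for k = 1:nFrames
    idx = (k-1)*frameInc + (1:frameLen);   % indices of the k-th overlapping segment
    frames(k, :) = x(idx) .* win;          % weight the segment by the window
end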

Two window types are commonly used: the rectangular window and the Hamming window. Their window functions are as follows:

(1) Rectangular window: w(n) = 1 for 0 <= n <= N-1, and w(n) = 0 otherwise.
(2) Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for 0 <= n <= N-1, and w(n) = 0 otherwise.

The time-domain and frequency-domain waveforms of the rectangular window, with window length N = 61, are plotted by the following MATLAB code:

% Program 3.2: juxing.m
x = linspace(0, 100, 10001);         % x-axis values between 0 and 100
h = zeros(1, 10001);                 % initialize the plotted window to 0
h(1:2002) = 0;                       % first values are 0
h(2003:8003) = 1;                    % over the window length the window value is 1
h(8004:10001) = 0;                   % last values are 0
figure(1);                           % define the figure number
subplot(2, 1, 1);                    % draw the first subplot
plot(x, h, 'k');                     % plot the waveform; 'k' means black
title('Rectangular window time-domain waveform');
xlabel('Sample point'); ylabel('Amplitude');
axis([0, 100, -0.5, 1.5]);           % limit the axis ranges
line([0, 100], [0, 0]);              % draw the x axis
w1 = linspace(0, 60, 61);            % 61 points over the window length
w1(1:61) = 1;                        % value 1, i.e. a rectangular window
w2 = fft(w1, 1024);                  % 1024-point Fourier transform of the time-domain window
w3 = w2/w2(1);                       % amplitude normalization
w4 = 20*log10(abs(w3));              % normalized amplitude in dB
w = 2*[0:1023]/1024;                 % frequency normalization
subplot(2, 1, 2);                    % draw the second subplot
plot(w, w4, 'k');
axis([0, 1, -100, 0]);               % axis ranges (the limits were lost in the source; these are representative)
title('Rectangular window amplitude characteristic');
xlabel('Normalized frequency f/fs'); ylabel('Amplitude/dB');

The MATLAB code for the Hamming window is as follows:

x = linspace(20, 80, 61);            % 61 abscissa points between 20 and 80
h = hamming(61);                     % 61-point Hamming window values as ordinates
figure(1);
subplot(2, 1, 1);                    % first subplot
plot(x, h, 'k');                     % 'k' means black
title('Hamming window time-domain waveform');
xlabel('Sample point'); ylabel('Amplitude');
w1 = linspace(0, 60, 61);            % 61 points over the window length
w1(1:61) = hamming(61);              % apply the Hamming window values
w2 = fft(w1, 1024);                  % 1024-point Fourier transform of the time-domain window
w3 = w2/w2(1);                       % amplitude normalization
w4 = 20*log10(abs(w3));              % normalized amplitude in dB
w = 2*[0:1023]/1024;                 % frequency normalization
subplot(2, 1, 2);                    % second subplot
plot(w, w4, 'k');                    % plot the amplitude characteristic
axis([0, 1, -100, 0]);               % axis ranges (the limits were lost in the source; these are representative)
title('Hamming window amplitude characteristic');
xlabel('Normalized frequency f/fs'); ylabel('Amplitude/dB');

Figure: (1) rectangular window; (2) Hamming window.

The figures show that the Hamming window has a smoother low-pass characteristic, so it reflects the frequency characteristics of the short-time signal better than the rectangular window.

The windowing procedure is as follows. The window sequence slides from left to right along the speech samples, and the window w(n) has length N. Once the window function is determined, processing the framed speech signal amounts to applying a transformation or operation to each frame. Denote this transformation by T[.], let x(n) be the input speech signal, w(n) the window sequence, and h(n) a filter related to w(n). The output for each frame can then be expressed as

Q(n) = Σ_m T[x(m)] · h(n - m)
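As one concrete instance of this framework, the short-time energy mentioned at the beginning of this article takes T[x(m)] = x(m)^2 and h(n) = w(n)^2. A minimal sketch follows; the test signal and the window length are illustrative.

fs = 8000;
x = randn(1, fs);            % stand-in for a speech signal x(n)
N = 240;                     % window length: 30 ms at 8 kHz
w = hamming(N)';             % window sequence w(n)
h = w.^2;                    % filter related to w(n): h(n) = w(n)^2
E = conv(x.^2, h);           % short-time energy E(n) = sum_m x(m)^2 * h(n-m)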



