Speech Signal Processing: Vector Quantization


zouxy09@qq.com

http://blog.csdn.net/zouxy09

 

This semester I am taking a speech signal processing course, and the exam is coming up, so I need to get familiar with the relevant material. Haha, I rarely attended lectures, but now I have to buckle down. Along the way I would like to make my own knowledge architecture clearer and share it with you. This third topic is VQ. Since it was written in a bit of a hurry, there may well be mistakes in it; I hope you will point them out. Thank you.

 

Vector Quantization (VQ) is an extremely important method for signal compression, and it plays an important role in speech signal processing. It is widely used in speech coding, speech recognition, speech synthesis, and other fields.

 

I. Overview

Vector quantization (VQ) is a lossy data compression method based on a block coding rule. In fact, multimedia compression formats such as JPEG and MPEG-4 all contain a VQ step. The basic idea is to group several scalar values into a vector and then quantize it as a whole in the vector space, compressing the data without losing much information.

In the past, one difficulty in using VQ was that it required solving a multi-dimensional integration problem. Then, in 1980, Linde, Buzo, and Gray (LBG) proposed a VQ design algorithm based on a training sequence. Using a training sequence bypasses the multi-dimensional integral, and so the world gained the classic algorithm known as LBG-VQ. It is still in use today; a classic never fades.

 

II. Knowledge Preparation

VQ is really just an approximation. The idea is similar to rounding: a number is represented by the integer closest to it. Let's look at a one-dimensional VQ example:


Here, every number smaller than -2 is approximated by -3, every number between -2 and 0 by -1, every number between 0 and 2 by 1, and every number greater than 2 by 3. In this way, any number is approximated by one of the four values -3, -1, 1, or 3, and we need only two binary bits to encode these four values. So this is a 1-dimensional, 2-bit VQ, and its rate is 2 bits/dimension.
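The 1-D quantizer above can be sketched directly in code. This is a minimal sketch: the four code values (-3, -1, 1, 3) and the decision boundaries (-2, 0, 2) are exactly those of the example, and the function name `quantize_1d` is my own.

```python
def quantize_1d(x):
    """Map a real number to the nearest of the four code values -3, -1, 1, 3."""
    if x < -2:
        return -3   # everything below -2 is approximated by -3
    elif x < 0:
        return -1   # [-2, 0) is approximated by -1
    elif x < 2:
        return 1    # [0, 2) is approximated by 1
    else:
        return 3    # everything at or above 2 is approximated by 3

print(quantize_1d(-2.7))  # -3
print(quantize_1d(1.4))   # 1
```

Since there are only four output values, each quantized number can be stored in 2 bits, which is exactly the 2 bits/dimension rate mentioned above.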

Let's look at another two-dimensional example:

Here, the blue solid lines divide the plane into 16 regions. Any pair of numbers (that is, any coordinate point (x, y)) falls into one specific region and is then approximated by the red star in that region. There are 16 regions and thus 16 red stars, so the 16 values can be encoded with a 4-bit binary code (2^4 = 16). Therefore, this is a 2-dimensional, 4-bit VQ, and its rate is again 2 bits/dimension. The red stars are the quantization vectors: any point in the plane is quantized to one of these 16 vectors.

We can also describe two-dimensional data in terms of image compression. Treat each pixel of an image as a data point and run k-means clustering. If we cluster the image into K classes, we get K class centroids, and then every pixel in a class is replaced by its class's centroid value. This compresses the image, because we only need to encode K pixel values (plus, for each pixel, an index into these K values) to represent the whole image. Of course this introduces distortion, and the degree of distortion depends on K. In the most extreme case, each pixel of the original image is its own class; then there is no distortion, but of course no compression either.
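The k-means palette idea above can be sketched as follows. This is a toy example under stated assumptions: the "image" is a random array of grayscale values rather than a real file (a real image would be loaded with a library such as PIL), K = 4 is chosen by hand, and the k-means loop is a plain NumPy implementation with a fixed iteration count.

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=1000).astype(float)  # fake grayscale "image"

K = 4
# initialize centroids from K randomly chosen pixels
centroids = pixels[rng.choice(len(pixels), K, replace=False)]
for _ in range(20):
    # assignment step: each pixel joins its nearest centroid's class
    labels = np.argmin(np.abs(pixels[:, None] - centroids[None, :]), axis=1)
    # update step: move each centroid to the mean of its assigned pixels
    for k in range(K):
        if np.any(labels == k):
            centroids[k] = pixels[labels == k].mean()

# compression: every pixel is replaced by its class centroid,
# so at most K distinct values remain
compressed = centroids[labels]
print(len(np.unique(compressed)))
```

To store the result we only need the K centroid values plus a log2(K)-bit index per pixel, which is exactly the compression described above.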

In the two preceding examples, the red stars are called code vectors, and the regions bounded by the blue lines are called encoding regions. The set of all code vectors is called the codebook, and the set of all encoding regions is called the partition of the space.

 

III. The VQ Design Problem

The VQ design problem can be stated as follows: given a vector source with known statistical properties (that is, a training sample set in which each sample is a vector), given a distortion measure, and given the number of code vectors (that is, into how many regions the vector space is to be divided, or to how many values we quantize), find the codebook (the set of all code vectors, i.e., all the red stars above) and the partition of the space (the set of all blue lines in the figure) that minimize the average distortion (this is data compression, so the smaller the distortion the better).

Suppose we have a training sequence (training set) containing M source vectors (training samples): T = {x_1, x_2, ..., x_M}.

This training sequence can be obtained from some large database. For example, if the source is speech, we can crop segments from telephone recordings. We assume M is large enough (there are enough training samples) that the training sequence captures all the statistical properties of the source. We assume the source vectors are k-dimensional:

x_m = (x_{m,1}, x_{m,2}, ..., x_{m,k}),  m = 1, 2, ..., M

Suppose the number of code vectors is N (that is, we divide the vector space into N regions, or quantize to N values); then the codebook (the set of all code vectors) is C = {c_1, c_2, ..., c_N}.

Each code vector is a k-dimensional vector: c_n = (c_{n,1}, c_{n,2}, ..., c_{n,k}),  n = 1, 2, ..., N.

The encoding region corresponding to code vector c_n is denoted S_n, and the partition of the space is P = {S_1, S_2, ..., S_N}.

If the source vector x_m lies in S_n, its approximation (denoted Q(x_m)) is c_n:

Q(x_m) = c_n,  if x_m ∈ S_n

Assuming we use the squared-error distortion measure, the average distortion is:

D_ave = (1 / (M·k)) · Σ_{m=1}^{M} ||x_m − Q(x_m)||²

Here ||e||² is the squared Euclidean norm.

The design problem can then be stated simply: given T (the training set) and N (the number of code vectors), find the C (codebook) and P (partition of the space) that minimize D_ave (the average distortion).
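Given a codebook and the nearest-neighbor partition, the average distortion formula above is straightforward to compute. This is a small sketch with made-up data: the four training vectors and two code vectors below are my own illustrative values, not from the text.

```python
import numpy as np

T = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0]])  # M = 4, k = 2
C = np.array([[0.5, 0.5], [4.5, 4.5]])                          # N = 2 code vectors

# nearest-neighbor quantization: Q(x_m) is the closest code vector
d2 = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # M x N squared distances
Q = C[np.argmin(d2, axis=1)]

# average distortion D_ave = (1 / (M*k)) * sum ||x_m - Q(x_m)||^2
M, k = T.shape
D_ave = ((T - Q) ** 2).sum() / (M * k)
print(D_ave)  # 0.25
```

Each training vector here lies at squared distance 0.5 from its code vector, so the sum is 2.0 and D_ave = 2.0 / (4 · 2) = 0.25.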

 

IV. Optimality Criteria

If C and P are a solution of the above minimization problem, the solution must satisfy the following two conditions:

1) Nearest neighbor condition:

S_n ⊆ { x : ||x − c_n||² ≤ ||x − c_{n'}||²  for all n' = 1, 2, ..., N }

This condition means that the encoding region S_n should contain all vectors that are closer to c_n than to any other code vector. For vectors on a boundary (a blue line), some tie-breaking procedure is needed.

2) Centroid condition:

c_n = ( Σ_{x_m ∈ S_n} x_m ) / |S_n|,  n = 1, 2, ..., N

This condition requires the code vector c_n to be the mean of all training sample vectors in the encoding region S_n. In an implementation, make sure that each encoding region contains at least one training sample vector, so that the denominator of the formula above is not zero.

 

V. LBG algorithm

The LBG-VQ algorithm is an iterative algorithm that alternately adjusts P and C to satisfy the two optimality criteria above, so that the distortion keeps moving toward a local minimum (somewhat in the spirit of EM). The algorithm requires an initial codebook C^(0), which is obtained by a splitting method: first set an initial code vector to the mean of all training samples, then split this code vector into two (for the splitting rule, see the formula in step 3 of the LBG algorithm below; it simply multiplies by perturbation coefficients). These two code vectors form the initial codebook; the iterative algorithm is run on it, and then each code vector is again split into two. This process repeats until the required number of code vectors is reached: one splits into two, two into four, four into eight, and so on.

 

LBG algorithm:

1. Given the training set T, fix the distortion threshold ε to be a very small positive number.

2. Let N = 1 (one code vector), and set this code vector to the mean of all training samples:

c_1* = (1/M) Σ_{m=1}^{M} x_m

Compute the total distortion (which is obviously largest at this stage):

D_ave* = (1 / (M·k)) Σ_{m=1}^{M} ||x_m − c_1*||²

3. Splitting: for i = 1, 2, ..., N, split each code vector into two:

c_i^(0) = (1 + ε) c_i*
c_{N+i}^(0) = (1 − ε) c_i*

Let N = 2N; that is, each code vector is split (multiplied by the perturbation coefficients 1 + ε and 1 − ε) into two, so the number of code vectors doubles with every split.

4. Iteration: let the initial distortion be D_ave^(0) = D_ave*, and set the iteration counter to zero: i = 0.

1) For each training sample x_m, m = 1, 2, ..., M, find the minimum over all code vectors of the distance between the training sample and the code vector, and let n* record the index of this minimum:

n* = argmin_n ||x_m − c_n^(i)||²

Then approximate the training sample with that code vector:

Q(x_m) = c_{n*}^(i)

2) For n = 1, 2, ..., N, update each code vector as follows:

c_n^(i+1) = ( Σ_{x_m ∈ S_n^(i)} x_m ) / |S_n^(i)|

That is, all training samples in the encoding region S_n of c_n are averaged to give the new code vector for that region.

3) Increase the iteration counter: i = i + 1.

4) Compute the total distortion of the current C and P:

D_ave^(i) = (1 / (M·k)) Σ_{m=1}^{M} ||x_m − Q(x_m)||²

5) If the relative distortion improvement over the previous iteration, (D_ave^(i−1) − D_ave^(i)) / D_ave^(i−1), is still greater than the acceptable distortion threshold ε, continue iterating and return to step 1). (If it is smaller, further iterations would reduce the distortion only marginally, so iteration stops.)

6) Otherwise, take the final distortion to be D_ave* = D_ave^(i), and for n = 1, 2, ..., N, take the final code vectors to be c_n* = c_n^(i).

5. Repeat steps 3 and 4 until the required number of code vectors is reached.
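The steps above can be sketched compactly in NumPy. This is a minimal sketch, not a reference implementation: as in the text, a single `eps` serves both as the splitting perturbation and as the relative-distortion stopping threshold, empty encoding regions are simply left unchanged to keep the centroid denominator non-zero, and the target codebook size is assumed to be a power of two (each split doubles N).

```python
import numpy as np

def lbg(train, n_codevectors, eps=0.001):
    """Design a codebook of n_codevectors vectors for train (an M x k array).
    n_codevectors should be a power of two, since each split doubles N."""
    M, k = train.shape
    # step 2: N = 1, the single code vector is the mean of all samples
    codebook = train.mean(axis=0, keepdims=True)
    d_ave = ((train - codebook) ** 2).sum() / (M * k)

    while len(codebook) < n_codevectors:
        # step 3: split every code vector into (1+eps)c and (1-eps)c
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        # step 4: alternate nearest-neighbor assignment and centroid update
        while True:
            # 1) assign each sample to its nearest code vector
            d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = np.argmin(d2, axis=1)
            # 2) move each code vector to the mean of its region
            for n in range(len(codebook)):
                if np.any(labels == n):       # keep the denominator non-zero
                    codebook[n] = train[labels == n].mean(axis=0)
            # 4) total distortion of the current codebook and partition
            new_d = ((train - codebook[labels]) ** 2).sum() / (M * k)
            # 5)/6) stop when the relative improvement falls below eps
            if (d_ave - new_d) / d_ave <= eps:
                d_ave = new_d
                break
            d_ave = new_d
    return codebook

rng = np.random.default_rng(0)
cb = lbg(rng.normal(size=(1000, 2)), 4)
print(cb.shape)  # (4, 2)
```

Because each outer pass doubles the codebook and then re-runs the Lloyd-style inner iteration, the distortion decreases monotonically toward a local minimum, exactly as the text describes.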

 

VI. Two-Dimensional Simulation

Click here for the two-dimensional simulation animation: http://www.data-compression.com/vqanim.shtml (A complaint: why doesn't CSDN support inserting GIF animations?)


1) The training samples are generated from a Gaussian distribution with zero mean and unit variance.

2) The small green points are the training samples; there are 4096 of them.

3) The threshold is set to ε = 0.001.

4) The algorithm guarantees only a locally optimal solution.

5) The training sequence must be large enough; M ≥ 1000N is recommended.

 

VII. References:

[1] http://www.data-compression.com/vq.html

[2] Talking About Clustering (extra): Vector Quantization: http://blog.pluskid.org/?p=57

[3] lbgvq.c --- C program for LBG VQ design

 
