Introduction to CELP Coding

Last Update:2018-12-07 Source: Internet

Author: User


Do not meddle in the affairs of poles, for they are subtle and quick to leave the unit circle.

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 8. The CELP technique is based on three ideas:

The use of a linear prediction (LP) model to model the vocal tract
The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
The search performed in closed loop in a "perceptually weighted domain"

This section describes the basic ideas behind CELP. This is still a work in progress.

Source-filter model of speech Prediction

The source-filter model of speech production assumes that the vocal cords are the source of a spectrally flat sound (the excitation signal), and that the vocal tract acts as a filter to spectrally shape the various sounds of speech. While still an approximation, the model is widely used in speech coding because of its simplicity. Its use is also the reason why most speech codecs (Speex included) perform badly on music signals. The different phonemes can be distinguished by their excitation (source) and spectral shape (filter). Voiced sounds (e.g. vowels) have an excitation signal that is periodic and that can be approximated by an impulse train in the time domain or by regularly spaced harmonics in the frequency domain. On the other hand, fricatives (such as the "s", "sh" and "f" sounds) have an excitation signal that is similar to white Gaussian noise. So-called voiced fricatives (such as "z" and "v") have an excitation signal composed of a harmonic part and a noisy part.

The source-filter model is usually tied with the use of linear prediction. The CELP model is based on the source-filter model, as can be seen from the CELP decoder illustrated in Figure 1.

**Figure 1:** The CELP model of speech synthesis (decoder)

Linear Prediction (LPC)

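Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal $x[n]$ using a linear combination of its past samples:

$$y[n] = \sum_{i=1}^{N} a_i x[n-i]$$

where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:

$$e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i]$$

The goal of the LPC analysis is to find the best prediction coefficients $a_i$, which minimize the quadratic error function over a frame of $L$ samples:

$$E = \sum_{n=0}^{L-1} \left[e[n]\right]^2 = \sum_{n=0}^{L-1} \left[x[n] - \sum_{i=1}^{N} a_i x[n-i]\right]^2$$

That can be done by making all derivatives $\partial E / \partial a_i$ equal to zero:

$$\frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left[x[n] - \sum_{j=1}^{N} a_j x[n-j]\right]^2 = 0$$

For an order-$N$ filter, the filter coefficients $a_i$ are found by solving the $N \times N$ linear system $\mathbf{R}\mathbf{a} = \mathbf{r}$, where

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}$$

with $R(m)$, the auto-correlation of the signal $x[n]$, computed as:

$$R(m) = \sum_{n=0}^{L-1} x[n]\, x[n-m]$$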

Because $\mathbf{R}$ is Toeplitz Hermitian, the Levinson-Durbin algorithm can be used, making the solution to the problem $O(N^2)$ instead of $O(N^3)$. Also, it can be proven that all the roots of $A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}$ are within the unit circle, which means that $1/A(z)$ is always stable. This is in theory; in practice, because of finite precision, there are two commonly used techniques to make sure we have a stable filter. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Also, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain, reducing sharp resonances.
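The LPC analysis described above can be sketched in a few lines of Python. This is a minimal illustration rather than the Speex implementation: the function names (`autocorrelation`, `condition`, `levinson_durbin`) are hypothetical, samples before the frame are assumed to be zero, and the Gaussian shape of the lag window and its `lag_width` parameter are assumptions chosen only to show the idea.

```python
import math


def autocorrelation(x, order):
    """R(m) = sum_n x[n] * x[n-m] for m = 0..order (samples before the frame are zero)."""
    return [sum(x[n] * x[n - m] for n in range(m, len(x))) for m in range(order + 1)]


def condition(R, noise_floor=1.0001, lag_width=None):
    """Scale R(0) slightly above one, then optionally taper R(m) with a lag window.

    The Gaussian taper is an assumed window shape, used here for illustration only.
    """
    out = list(R)
    out[0] *= noise_floor
    if lag_width is not None:
        out = [r * math.exp(-0.5 * (m / lag_width) ** 2) for m, r in enumerate(out)]
    return out


def levinson_durbin(R, order):
    """Solve the Toeplitz system R a = r in O(N^2).

    Returns (a, err) where a[i-1] multiplies x[n-i] and err is the residual energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused so that indices match the equations
    err = R[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: keep remaining coefficients at zero
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err
```

For example, feeding the impulse response of a one-pole filter with its pole at 0.9 through `autocorrelation` and `levinson_durbin` recovers a first coefficient very close to 0.9, with the remaining coefficients close to zero.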

Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal $e[n]$ by a gain times the past of the excitation:

$$e[n] \simeq p[n] = \beta e[n-T]$$

where $T$ is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.
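One common way to choose the pitch period and gain is an open-loop search that tries every candidate lag and keeps the one that best predicts the current frame from the past excitation. The sketch below assumes that approach; the function name `pitch_search` and its parameters are hypothetical, and real codecs typically refine this with a closed-loop search in the weighted domain.

```python
def pitch_search(e, start, length, t_min, t_max):
    """Find the lag T and gain beta minimising sum (e[n] - beta * e[n-T])^2 over a frame.

    e      : excitation samples, including enough history before `start`
    start  : index of the first sample of the frame being coded
    length : number of samples in the frame
    """
    if start < t_max:
        raise ValueError("need at least t_max samples of history before the frame")
    best_T, best_beta, best_score = t_min, 0.0, 0.0
    frame = range(start, start + length)
    for T in range(t_min, t_max + 1):
        corr = sum(e[n] * e[n - T] for n in frame)
        energy = sum(e[n - T] ** 2 for n in frame)
        if energy <= 0.0 or corr <= 0.0:
            continue  # nothing to predict from, or prediction would need a negative gain
        score = corr * corr / energy  # error reduction achieved with the optimal gain
        if score > best_score:
            best_T, best_beta, best_score = T, corr / energy, score
    return best_T, best_beta
```

For a given lag, the gain that minimises the squared error is the correlation divided by the energy of the delayed signal, which is why the search only needs those two sums per candidate.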

Innovation codebook

The final excitation $e[n]$ will be the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

$$e[n] = p[n] + c[n] = \beta e[n-T] + c[n]$$

The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as

$$X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)}$$
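Read as a time-domain recipe, the z-domain expression above is what a decoder does: build the excitation from the innovation and the delayed excitation, then run it through the synthesis filter. The following is a minimal sketch with a hypothetical function name (`celp_decode_frame`), assuming zero filter memory at the start of the frame and constant parameters over the frame, which a real decoder would not assume.

```python
def celp_decode_frame(c, a, beta, T):
    """Synthesise one frame from innovation c, LPC coefficients a, pitch gain and period.

    e[n] = beta * e[n-T] + c[n]          (long-term predictor plus innovation)
    x[n] = e[n] + sum_i a[i-1] * x[n-i]  (synthesis filter 1/A(z))
    Samples before the start of the frame are assumed to be zero.
    """
    e, x = [], []
    for n in range(len(c)):
        past_e = e[n - T] if n >= T else 0.0
        e_n = beta * past_e + c[n]
        e.append(e_n)
        x_n = e_n + sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        x.append(x_n)
    return e, x
```

With a single unit pulse as innovation, the excitation shows the pulse repeating every `T` samples scaled by `beta`, and the output shows each of those pulses ringing through the synthesis filter.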

Noise weighting

Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. In order to maximize speech quality, CELP codecs minimize the mean square of the error (noise) in the perceptually weighted domain. This means that a perceptual noise weighting filter $W(z)$ is applied to the error signal in the encoder. In most CELP codecs, $W(z)$ is a pole-zero weighting filter derived from the linear prediction coefficients (LPC), generally using bandwidth expansion. Let the spectral envelope be represented by the synthesis filter $1/A(z)$; CELP codecs typically derive the noise weighting filter as:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1)$$

where $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ in the Speex reference implementation. If a filter $A(z)$ has (complex) poles at $p_i$ in the z-plane, the filter $A(z/\gamma)$ will have its poles at $p'_i = \gamma p_i$, making it a flatter version of $A(z)$.
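In terms of coefficients, bandwidth expansion amounts to scaling the i-th LPC coefficient by the i-th power of the expansion factor, so the weighting filter can be built directly from the LPC coefficients. The sketch below assumes the convention $A(z) = 1 - \sum_i a_i z^{-i}$ and zero filter memory; the function names `bandwidth_expand` and `weighting_filter` are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the i-th coefficient a_i becomes a_i * gamma**i."""
    return [a_i * gamma ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(signal, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to a signal, starting from zero memory."""
    num = bandwidth_expand(a, gamma1)  # zeros of W(z)
    den = bandwidth_expand(a, gamma2)  # poles of W(z)
    out = []
    for n in range(len(signal)):
        acc = signal[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= num[i - 1] * signal[n - i]
                acc += den[i - 1] * out[n - i]
        out.append(acc)
    return out
```

Setting both factors to the same value makes the numerator and denominator cancel, so the filter passes the signal through unchanged; this is a convenient sanity check when experimenting with other values.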

The weighting filter is applied to the error signal used to optimize the codebook search through analysis-by-synthesis (AbS). This results in a spectral shape of the noise that tends towards $1/W(z)$. While the simplicity of the model has been an important reason for the success of CELP, it remains that $W(z)$ is a very rough approximation for the perceptually optimal noise weighting function. Fig. 2 illustrates the noise shaping that results from Eq. 1. Throughout this paper, we refer to $W(z)$ as the noise weighting filter and to $1/W(z)$ as the noise shaping filter (or curve).

**Figure 2:** Standard noise shaping in CELP. Arbitrary y-axis offset.

Analysis-by-Synthesis

One of the main principles behind CELP is called analysis-by-synthesis (AbS), meaning that the encoding (analysis) is performed by perceptually optimising the decoded (synthesis) signal in a closed loop. In theory, the best CELP stream would be produced by trying all possible bit combinations and selecting the one that produces the best-sounding decoded signal. This is obviously not possible in practice for two reasons: the required complexity is beyond any currently available hardware, and the "best sounding" selection criterion implies a human listener.

In order to achieve real-time encoding using limited computing resources, the CELP optimisation is broken down into smaller, more manageable, sequential searches using the perceptual weighting function described earlier.
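One of those sequential searches, the fixed codebook search, can be illustrated with a toy example. The sketch below assumes a common formulation that the text does not spell out: each candidate vector is passed through the impulse response `h` of the weighted synthesis filter, and the entry that best matches a weighted target (with its optimal gain) is selected. The names `convolve_truncated` and `search_codebook` are hypothetical, and real codecs use structured codebooks and much faster search procedures.

```python
def convolve_truncated(c, h):
    """Filter codevector c with impulse response h, keeping only len(c) output samples."""
    return [
        sum(c[j] * h[n - j] for j in range(n + 1) if n - j < len(h))
        for n in range(len(c))
    ]


def search_codebook(target, codebook, h):
    """Return (index, gain) of the entry whose filtered version best matches the target.

    For each entry the optimal gain is <target, y> / <y, y>, and the squared error
    drops by <target, y>^2 / <y, y>, so the entry maximising that ratio wins.
    """
    best_index, best_gain, best_score = -1, 0.0, 0.0
    for index, c in enumerate(codebook):
        y = convolve_truncated(c, h)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue  # an all-zero entry cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain
```

The loop synthesises every candidate and measures the result, which is the closed-loop idea in miniature: the encoder picks parameters by looking at what the decoder would produce rather than by analysing the input alone.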

Linking: http://www.speex.org/docs/manual/speex-manual/node9.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
