WebRTC Audio-related NetEQ (II)

Last Update:2018-08-01 Source: Internet

Author: User

The previous article, WebRTC Audio-related NetEQ (I), gave an overview of NetEQ. NetEQ mainly deals with network delay, jitter, and packet loss in order to improve voice quality, and it consists of two large units, the MCU and the DSP. The MCU puts the voice RTP packets received from the network into the packet buffer. Based on the calculated network delay, the jitter buffer delay, and the feedback from the DSP unit, it also decides which control command to issue (mainly normal playout, accelerate, decelerate, packet loss concealment, merge, and so on), and it takes voice packets out of the packet buffer for the DSP unit to process. The DSP decodes those packets and performs signal processing according to the control command given by the MCU. This article continues with NetEQ and covers its main data structures.

To study a module, we first have to understand its data structures. The topmost structure in NetEQ is MainInst_t (that is, the NetEQ instance structure). It has two main member variables, one of type DSPInst_t and one of type MCUInst_t, which correspond to NetEQ's two units, the DSP and the MCU. See Figure 1 (secondary member variables are omitted here and in the figures below). An instance of NetEQ is created at initialization, and it contains the two sub-instances for the DSP and the MCU.

Figure 1
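A minimal sketch in C of this layout: one parent instance holding the two sub-instances, each with a back-pointer to the parent, plus the cross-wired read and write addresses explained in the next paragraph. The member names DSPinst and MCUinst, the two mailbox arrays, SHARED_MEM_SIZE, and main_inst_init are assumptions made for illustration, not code taken from NetEQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARED_MEM_SIZE 16  /* hypothetical mailbox size, in 16-bit words */

typedef struct {
    int16_t *pw16_readAddress;   /* where the DSP reads control commands */
    int16_t *pw16_writeAddress;  /* where the DSP writes feedback */
    void    *main_inst;          /* back-pointer to the parent instance */
} DSPInst_t;

typedef struct {
    int16_t *pw16_readAddress;   /* where the MCU reads feedback */
    int16_t *pw16_writeAddress;  /* where the MCU writes control commands */
    void    *main_inst;          /* back-pointer to the parent instance */
} MCUInst_t;

typedef struct {
    DSPInst_t DSPinst;
    MCUInst_t MCUinst;
    int16_t   mcu_to_dsp[SHARED_MEM_SIZE];  /* hypothetical command mailbox */
    int16_t   dsp_to_mcu[SHARED_MEM_SIZE];  /* hypothetical feedback mailbox */
} MainInst_t;

/* Create the two sub-instances and cross-wire their addresses. */
void main_inst_init(MainInst_t *inst) {
    assert(inst != NULL);
    memset(inst, 0, sizeof(*inst));

    inst->MCUinst.pw16_writeAddress = inst->mcu_to_dsp;
    inst->DSPinst.pw16_readAddress  = inst->mcu_to_dsp;

    inst->DSPinst.pw16_writeAddress = inst->dsp_to_mcu;
    inst->MCUinst.pw16_readAddress  = inst->dsp_to_mcu;

    inst->DSPinst.main_inst = inst;
    inst->MCUinst.main_inst = inst;
}
```

With this wiring, anything the MCU writes through its pw16_writeAddress is visible to the DSP through its pw16_readAddress, and the reverse holds for the feedback direction.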

First look at the DSP structure in Figure 1. pw16_readAddress and pw16_writeAddress are used to exchange data with the MCU, and the MCU structure has the same two member variables, so let's first see how the MCU and the DSP interact. The MCU gives the DSP a control command that tells it which signal processing algorithm to run. The MCU writes the command-related data to its own pw16_writeAddress, and the DSP fetches the data from that address. In other words, the DSP's pw16_readAddress is the MCU's pw16_writeAddress. After processing a frame, the DSP gives feedback data to the MCU. It writes the feedback to its own pw16_writeAddress and the MCU reads it from there, so the MCU's pw16_readAddress is the DSP's pw16_writeAddress.

main_inst points to the parent structure (MainInst_t), which is a common technique for finding the parent instance from a child. speechBuffer (the speech buffer) stores voice data that has been decoded and signal-processed. It is divided into two parts: data that has already been played, and data that has not been played yet and is waiting to be played. The member variable curPosition marks the boundary between the two. The member variable endPosition represents the size of the speech buffer, which depends on the sample rate. The relationship among these three member variables is shown in Figure 2:

Figure 2
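The split of the speech buffer can be sketched as follows. This is an illustration under assumptions, not the NetEQ implementation: SPEECH_BUF_SIZE, the SpeechBuf type, and both helper functions are hypothetical, and only the three member names come from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SPEECH_BUF_SIZE 1024  /* hypothetical capacity for this sketch */

typedef struct {
    int16_t speechBuffer[SPEECH_BUF_SIZE];
    int     curPosition;  /* boundary: index of the first unplayed sample */
    int     endPosition;  /* end of the data held in the buffer */
} SpeechBuf;

/* Number of samples that are decoded and processed but not yet played. */
int speech_samples_left(const SpeechBuf *b) {
    assert(b->curPosition >= 0 && b->curPosition <= b->endPosition);
    assert(b->endPosition <= SPEECH_BUF_SIZE);
    return b->endPosition - b->curPosition;
}

/* Copy up to n unplayed samples to out and move the boundary forward.
 * Returns the number of samples actually copied. */
int speech_play(SpeechBuf *b, int16_t *out, int n) {
    int avail = speech_samples_left(b);
    assert(n >= 0);
    if (n > avail) {
        n = avail;
    }
    memcpy(out, &b->speechBuffer[b->curPosition], (size_t)n * sizeof(int16_t));
    b->curPosition += n;
    return n;
}
```

Everything before curPosition is the already-played part, and everything from curPosition up to endPosition is what remains to be played.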

endTimestamp records the timestamp at the end of the not-yet-played voice data in the speech buffer (the MCU passes the timestamp of the current frame to the DSP in the control command, and endTimestamp can be derived from it after decoding). fs is the sample rate. w16_frameLen is the number of samples per frame. w16_mode is the processing mode of the current frame (accelerate, decelerate, and so on). This value is fed back to the MCU, which uses it together with the network delay, the jitter buffer delay, and other inputs to decide the processing command for the next frame. pw16_speechHistory and w16_speechHistoryLen are used for packet loss concealment (PLC, whose control command is expand). pw16_speechHistory holds the most recently played voice data, and during PLC this history serves as the reference data for generating the concealment audio. w16_speechHistoryLen is the length of that history buffer, a fixed value that depends on the sample rate. The DSP structure also contains several sub-instances, mainly the decoder instance (CodecFuncInst_t), the packet loss concealment instance (ExpandInst_t), and the background noise generation instance (BGNInst_t).
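One way to keep "the most recently played voice data" in a fixed-length buffer is to shift out the oldest samples and append the newest ones. The sketch below shows that idea only. How NetEQ actually updates pw16_speechHistory is not described in this article, so history_push and its update strategy are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep the newest hist_len played samples in hist, oldest first.
 * hist_len plays the role of w16_speechHistoryLen (fixed per sample rate). */
void history_push(int16_t *hist, int hist_len, const int16_t *played, int n) {
    assert(hist_len > 0 && n >= 0);
    if (n >= hist_len) {
        /* The new data alone fills the history: keep only its tail. */
        memcpy(hist, played + (n - hist_len),
               (size_t)hist_len * sizeof(int16_t));
        return;
    }
    /* Drop the n oldest samples, then append the n new ones at the end. */
    memmove(hist, hist + n, (size_t)(hist_len - n) * sizeof(int16_t));
    memcpy(hist + (hist_len - n), played, (size_t)n * sizeof(int16_t));
}
```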

Now for the data exchanged between the MCU and the DSP. The MCU sends the DSP a control command. The control command data occupies three shorts: the first short holds the command itself, and the second and third hold the high 16 bits and the low 16 bits of the timestamp. The DSP sends the MCU feedback data, whose structure is shown in Figure 3:
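The three-short layout of the control command can be sketched as below. The function names and the command value used in the usage note are hypothetical. Only the word order (command, timestamp high half, timestamp low half) comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_WORDS 3  /* command + timestamp high 16 bits + low 16 bits */

/* Write a control command into the shared memory read by the DSP. */
void cmd_pack(int16_t *mem, int16_t command, uint32_t timestamp) {
    assert(mem != NULL);
    mem[0] = command;
    /* The casts keep the raw 16-bit pattern of each half of the timestamp. */
    mem[1] = (int16_t)(uint16_t)(timestamp >> 16);
    mem[2] = (int16_t)(uint16_t)(timestamp & 0xFFFFu);
}

/* Rebuild the 32-bit timestamp on the DSP side. */
uint32_t cmd_timestamp(const int16_t *mem) {
    assert(mem != NULL);
    return ((uint32_t)(uint16_t)mem[1] << 16) | (uint32_t)(uint16_t)mem[2];
}
```

For example, packing a made-up command value 5 with timestamp 0x89ABCDEF stores 0x89AB in the second short and 0xCDEF in the third, and cmd_timestamp returns the original 32-bit value.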

Figure 3

playedOutTS is the timestamp of the last data in the speech buffer and equals endTimestamp in the DSP structure. samplesLeft is the amount of data in the speech buffer that has not been played yet. lastMode is the processing mode of the previous frame and equals w16_mode in the DSP structure. frameLen is the length of the previous frame after decoding.
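How the four feedback fields relate to the DSP state can be sketched like this. DspState, DspFeedback, and dsp_make_feedback are hypothetical names, the field widths are assumptions, and the samplesLeft computation simply follows the speech buffer description (unplayed data lies between curPosition and endPosition).

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the DSP state described in this article. */
typedef struct {
    int      curPosition;
    int      endPosition;
    uint32_t endTimestamp;
    int16_t  w16_mode;
    int16_t  w16_frameLen;
} DspState;

/* Feedback sent from the DSP to the MCU after each frame. */
typedef struct {
    uint32_t playedOutTS;
    uint16_t samplesLeft;
    int16_t  lastMode;
    int16_t  frameLen;
} DspFeedback;

DspFeedback dsp_make_feedback(const DspState *dsp) {
    DspFeedback fb;
    assert(dsp->endPosition >= dsp->curPosition);
    fb.playedOutTS = dsp->endTimestamp;
    fb.samplesLeft = (uint16_t)(dsp->endPosition - dsp->curPosition);
    fb.lastMode    = dsp->w16_mode;
    fb.frameLen    = dsp->w16_frameLen;
    return fb;
}
```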

Now look at the MCU structure, again in Figure 1. first_packet is set to 1 at initialization and to 0 once a packet has been received. It is mainly used to assign values to some MCU member variables (such as the SSRC) when the first packet arrives. pw16_readAddress and pw16_writeAddress are used to exchange data with the DSP, as described for the DSP structure, and main_inst has the same meaning as in the DSP. The MCU also has two main sub-instances. One is PacketBuffer_inst, which stores the voice packets received from the network. The other is BufferStat_inst, which collects statistics such as the network delay. Both are very important structures. First PacketBuffer_inst, whose definition is shown in Figure 4:

Figure 4

At initialization a block of memory is allocated that can hold the maximum number of voice packets (defined in advance). What is stored for each packet is timeStamp, payloadLocation (the address of the packet's payload), seqNumber, payloadType, payloadLengthBytes, rcuPlCntr, waitingTime, the payload itself, and so on (see the part inside the red box). The fields are not stored packet by packet. Instead, the timestamps of all packets are stored together, the sequence numbers of all packets are stored together, and likewise for the other fields, which gives the buffer layout in Figure 5:

Figure 5

Figure 5 is not very intuitive. NetEQ has the concept of a slot: the timestamp, payload location, and other fields of one packet all belong to the same slot, so Figure 5 can be redrawn as Figure 6, which is easier to read. (The buffers in the figure are contiguous: the end of one buffer is the start of the next. For example, the end of the timeStamp array is the first entry of the payloadLocation array.) To get a property or the payload of a packet, index by slot_index. For example, the timestamp of the packet in slot 0 is timeStamp[0].

Once it is clear how packets are stored, the other member variables in the structure are easy to understand. packSizeSamples is the number of samples in the last decoded packet. startPayloadMemory is the start address of the area where payloads are stored. memorySizeW16 is the size of the memory that remains for payloads in the allocated buffer. currentMemoryPos is the address where the payload of the next packet will be stored. When that packet is inserted, this address is written to its payloadLocation entry. After the packet is stored, currentMemoryPos is advanced by the packet's payload length and becomes the start address for the payload of the following packet. numPacketsInBuffer is the number of packets currently in the packet buffer. insertPosition is the slot where the next packet will be placed. maxInsertPositions is the maximum number of packets the packet buffer can hold. discardedPackets is the number of packets that have been actively discarded.

Figure 6
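The slot idea can be sketched as a structure of arrays in which one index selects all the properties of a packet. This is a simplified illustration, not the NetEQ implementation: the arrays are fixed-size members instead of regions carved out of one allocated block, there is no packet removal or wrap-around, and MAX_PACKETS, PAYLOAD_WORDS, payloadMemory, and the two functions are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PACKETS   4   /* hypothetical number of slots */
#define PAYLOAD_WORDS 64  /* hypothetical payload area, in 16-bit words */

typedef struct {
    /* One array per property; entry i of each array is slot i. */
    uint32_t timeStamp[MAX_PACKETS];
    int16_t *payloadLocation[MAX_PACKETS];
    uint16_t seqNumber[MAX_PACKETS];
    int16_t  payloadLengthBytes[MAX_PACKETS];

    int16_t  payloadMemory[PAYLOAD_WORDS];  /* payload storage */
    int16_t *startPayloadMemory;            /* start of payload storage */
    int16_t *currentMemoryPos;              /* where the next payload goes */
    int      numPacketsInBuffer;
    int      insertPosition;                /* slot for the next packet */
    int      maxInsertPositions;
} PacketBuf;

void packet_buf_init(PacketBuf *pb) {
    memset(pb, 0, sizeof(*pb));
    pb->startPayloadMemory = pb->payloadMemory;
    pb->currentMemoryPos   = pb->startPayloadMemory;
    pb->maxInsertPositions = MAX_PACKETS;
}

/* Store one packet. Returns the slot index used, or -1 if there is no room. */
int packet_buf_insert(PacketBuf *pb, uint32_t ts, uint16_t seq,
                      const void *payload, int16_t len_bytes) {
    assert(len_bytes >= 0);
    int slot      = pb->insertPosition;
    int len_words = (len_bytes + 1) / 2;  /* round up to whole 16-bit words */
    int used      = (int)(pb->currentMemoryPos - pb->startPayloadMemory);

    if (pb->numPacketsInBuffer >= pb->maxInsertPositions ||
        used + len_words > PAYLOAD_WORDS) {
        return -1;
    }
    memcpy(pb->currentMemoryPos, payload, (size_t)len_bytes);
    pb->timeStamp[slot]          = ts;
    pb->seqNumber[slot]          = seq;
    pb->payloadLengthBytes[slot] = len_bytes;
    pb->payloadLocation[slot]    = pb->currentMemoryPos;

    pb->currentMemoryPos += len_words;  /* start of the next payload */
    pb->numPacketsInBuffer++;
    pb->insertPosition = slot + 1;
    return slot;
}
```

After two inserts, timeStamp[1], seqNumber[1], and payloadLocation[1] all describe the second packet, and payloadLocation[1] points just past the first packet's payload.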

Next is BufferStat_inst, whose definition is shown in Figure 7:

Figure 7

w16_noExpand indicates that the previous packet was not processed with expand. avgDelayMsQ8 is the average buffering delay. maxDelayMs is the maximum buffering delay. AutomodeInst_t is a sub-instance of BufferStat_inst and is mainly used to calculate the network delay and the jitter buffer delay. Its definition is shown in Figure 8:

Figure 8

The member variables fall into three groups. The first group is related to IAT (inter-arrival time, the interval between the arrivals of adjacent packets) statistics. The IAT is measured in packets: assuming one packet carries 20 ms of audio, if two adjacent packets arrive 40 ms apart the IAT is 2. An array of size 65 holds the count for each IAT value (the IAT ranges from 0 to 64), and the network delay is computed from these statistics. The second group is related to IAT peak statistics. Two arrays of length 8 hold the IAT peaks, one for the peak heights and one for the peak intervals. The peak interval comes from another member of the automode structure, peakIatCountSamp, which counts the time from the previous peak to the currently detected peak, in samples. The third group is packet related, such as lastSeqNo (the sequence number of the last received packet) and lastTimeStamp (the timestamp of the last received packet). All of these serve to calculate optBufLevel (the network delay) and buffLevelFilt (the jitter buffer delay). The control command that the MCU gives the DSP is decided from the network delay, the jitter buffer delay, the previous processing mode, and so on. Note that some of these variables are in Q format (for background on Q format, see my earlier article on coping with the low clock frequency and small memory of audio DSPs on Android phones). The network delay and the jitter buffer delay are calculated in Q format, which makes the code harder to follow.
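The IAT bookkeeping can be sketched as below. The sketch uses plain integer counts and truncating division to stay close to the description above. The names IatStats, iatCount, iat_in_packets, and iat_record are hypothetical, and the Q-format arithmetic and any weighting that NetEQ applies are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IAT 64  /* IAT values 0..64, so the array has 65 entries */

typedef struct {
    uint32_t iatCount[MAX_IAT + 1];  /* how often each IAT value was seen */
} IatStats;

/* Inter-arrival time expressed in packets, clamped to MAX_IAT.
 * interval_ms: time since the previous packet arrived.
 * packet_ms:   audio duration carried by one packet. */
int iat_in_packets(int interval_ms, int packet_ms) {
    assert(interval_ms >= 0 && packet_ms > 0);
    int iat = interval_ms / packet_ms;
    return iat > MAX_IAT ? MAX_IAT : iat;
}

/* Record one arrival in the statistics. */
void iat_record(IatStats *s, int interval_ms, int packet_ms) {
    s->iatCount[iat_in_packets(interval_ms, packet_ms)]++;
}
```

With 20 ms packets, an arrival gap of 40 ms increments iatCount[2], matching the example in the text, and any gap of 1280 ms or more lands in the last entry.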

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
