Author: Karthik Krishnan
Learn how Intel Integrated Performance Primitives (Intel IPP) provides building blocks for developing VoIP applications with advanced features, and how to use those building blocks to assemble a complete softphone application.
By combining voice and data on a single IP network, IP telephony (VoIP) is fundamentally changing the telecom industry. Intel provides a variety of products, services, and building blocks for implementing VoIP solutions across many domains. Intel Integrated Performance Primitives (Intel IPP) is a software library that provides a wide range of highly optimized functions, including multimedia and audio codecs. This article serves as a reference for using the Intel IPP speech codecs in a complete VoIP softphone implementation. A sample application was built using Windows Sockets* for network communication, DirectSound* for audio capture and playback, and the Intel IPP wideband codec GSM AMR-WB (adaptive multi-rate wideband).
Intel IPP is a highly optimized cross-platform library that covers a wide range of multimedia and communication functionality. G.168, G.167, G.711, G.722, G.722.1, G.722.2 (AMR-WB), G.723.1, G.726, G.728, G.729, GSM-AMR, and GSM-FR are international standards released by the International Telecommunication Union (ITU)*, the European Telecommunications Standards Institute (ETSI)*, 3GPP*, and other organizations. The following speech coding examples are built using Intel IPP as fully standard-compliant building blocks.
The speech coding examples built on Intel IPP are available for both Windows* and Linux*: G.722.1, GSM AMR-WB / G.722.2, G.723.1, G.726, G.728, G.729, GSM-AMR, and GSM-FR.
Note that implementations of these standards, or standards-compliant platforms built on them, may require licenses from various entities, including Intel. This article uses GSM AMR-WB (adaptive multi-rate wideband, also standardized as ITU-T G.722.2) as the codec for VoIP calls.
The link model
Intel IPP provides several mechanisms for linking application code to the libraries: static linking, dynamic linking, and automatic dispatching. For more information, see the link model white paper (PDF, 231 KB). The accompanying softphone application (see the links in "Other Resources") uses dynamic linking with automatic dispatching.
The GSM AMR-WB codec
GSM AMR-WB operates on 16-bit samples at a 16 kHz sampling rate and supports a range of output bit rates (6.6 kbps, 8.85 kbps, and so on). The following table lists the bit rates supported by Intel IPP and the corresponding output size per frame (one frame is 20 ms of audio input, i.e., 640 bytes of 16-bit PCM at 16 kHz).
| Frame type | GSM AMR-WB bit rate (kbps) | Output bits per frame |
|---|---|---|
| 0 | 6.6 | 132 |
| 1 | 8.85 | 177 |
| 2 | 12.65 | 253 |
| 3 | 14.25 | 285 |
| 4 | 15.85 | 317 |
| 5 | 18.25 | 365 |
| 6 | 19.85 | 397 |
| 7 | 23.05 | 461 |
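A small helper makes the table concrete. The decode path later in this article needs the encoded frame size in bytes for a given bit rate (the EvaluateEncodedByteSize function mentioned there); that size is the bit count from the table rounded up to the nearest whole byte. The sketch below is illustrative only; the table and function names are assumptions, not part of the Intel IPP API.

```c
/* Output bits per frame for AMR-WB frame types 0..7 (from the table above).
   Hypothetical helper, not part of Intel IPP. */
static const int kAmrWbBitsPerFrame[8] = {132, 177, 253, 285, 317, 365, 397, 461};

int BytesPerFrame(int frameType)
{
    if (frameType < 0 || frameType > 7)
        return -1;                                   /* unknown frame type */
    return (kAmrWbBitsPerFrame[frameType] + 7) / 8;  /* round up to whole bytes */
}
```

For example, frame type 0 (6.6 kbps) yields 132 bits, i.e. 17 bytes per 20 ms frame, while the 20 ms PCM input is always 16000 / 50 samples * 2 bytes = 640 bytes.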
The unified speech codec API
The speech codec samples use Intel IPP as building blocks and contain complete, fully standard-compliant implementations of all supported codecs. The sample code in Intel IPP 5.0 also provides a unified approach that simplifies integrating any of the codecs. The following sections illustrate how to integrate the encode and decode functions of the GSM AMR-WB codec using the unified speech codec (USC) interface.

USC initialization API

    #ifdef __cplusplus
    extern "C" {
    #endif
    extern USC_Fxns USC_AMRWB_Fxns;
    #ifdef __cplusplus
    }
    #endif

    /* USC_XXX_Fxns is the function-table template common to all codecs */
    static USC_Fxns     *USC_CODEC_Fxn = &USC_AMRWB_Fxns;
    static int           nBanksEnc = 0, nBanksDec = 0;
    static USC_MemBank  *pBanksEnc = NULL;
    static USC_MemBank  *pBanksDec = NULL;
    static USC_Handle    hUSCEncoder;
    static USC_Handle    hUSCDecoder;
    static USC_CodecInfo pInfo;
    /* Allocate memory for and initialize the AMR-WB encoder/decoder handles. */
    int InitializeCodec(int bitrate)
    {
        int i;
        FreeCodecMemory();
        ippStaticInitBest();   /* select the optimal code path for this CPU */

        /* Obtain the codec information */
        if (USC_NoError != USC_CODEC_Fxn->std.GetInfo((USC_Handle)NULL, &pInfo))
            return -1;

        /*
           Encoder instance creation
        */
        pInfo.params.direction = 0;        /* direction: encode */
        pInfo.params.modes.vad = 0;        /* disable silence compression */
        pInfo.params.law = 0;              /* linear PCM input */
        pInfo.params.modes.bitrate = bitrate;

        /* Find out how many memory banks the encoder needs */
        if (USC_NoError != USC_CODEC_Fxn->std.NumAlloc(&pInfo.params, &nBanksEnc))
            return -1;
        /* Allocate the memory bank table */
        pBanksEnc = (USC_MemBank*)malloc(sizeof(USC_MemBank) * nBanksEnc);
        /* Query the required size of each memory bank */
        if (USC_NoError != USC_CODEC_Fxn->std.MemAlloc(&pInfo.params, pBanksEnc))
            return -1;
        /* Allocate memory for each bank */
        for (i = 0; i < nBanksEnc; i++)
        {
            pBanksEnc[i].pMem = (char*)malloc(pBanksEnc[i].nbytes);
        }
        /* Create the encoder instance */
        if (USC_NoError != USC_CODEC_Fxn->std.Init(&pInfo.params, pBanksEnc, &hUSCEncoder))
            return -1;

        /*
           Decoder instance creation
        */
        pInfo.params.direction = 1;        /* direction: decode */
        /* Find out how many memory banks the decoder needs */
        if (USC_NoError != USC_CODEC_Fxn->std.NumAlloc(&pInfo.params, &nBanksDec))
            return -1;
        /* Allocate the memory bank table */
        pBanksDec = (USC_MemBank*)malloc(sizeof(USC_MemBank) * nBanksDec);
        /* Query the required size of each memory bank */
        if (USC_NoError != USC_CODEC_Fxn->std.MemAlloc(&pInfo.params, pBanksDec))
            return -1;
        /* Allocate memory for each bank */
        for (i = 0; i < nBanksDec; i++)
        {
            pBanksDec[i].pMem = (char*)malloc(pBanksDec[i].nbytes);
        }
        /* Create the decoder instance */
        if (USC_NoError != USC_CODEC_Fxn->std.Init(&pInfo.params, pBanksDec, &hUSCDecoder))
            return -1;

        return 1;
    }

USC encoding API
The following function assumes that initialization is complete. In the sample softphone the bit rate cannot be changed once a VoIP call has started; modifying the code to support a variable bit rate per frame is straightforward.
    /* Encode one 20 ms frame; the bit rate was fixed at initialization. */
    int EncodeOneFrame(char *src, char *dst)
    {
        USC_PCMStream in;
        USC_Bitstream out;
        in.pBuffer = src;
        out.pBuffer = dst;
        in.bitrate = pInfo.params.modes.bitrate;
        in.nbytes = pInfo.framesize;
        in.pcmType.bitPerSample = pInfo.pcmType.bitPerSample;
        in.pcmType.sample_frequency = pInfo.pcmType.sample_frequency;
        /* Encode the frame */
        if (USC_NoError != USC_CODEC_Fxn->std.Encode(hUSCEncoder, &in, &out))
        {
            DebugBreak();   /* should not happen */
            return -1;
        }
        return out.frametype;
    }

USC decoding API

    int DecodeOneFrame(char *src, char *dst, int frametype)
    {
        USC_Bitstream in;
        USC_PCMStream out;
        in.pBuffer = src;
        in.frametype = frametype;   /* e.g., RX_SPEECH_GOOD */
        in.bitrate = pInfo.params.modes.bitrate;
        /* EvaluateEncodedByteSize should return the output byte size for the
           supported bit rate; note that the bit count must be rounded up to
           the nearest whole byte. */
        in.nbytes = EvaluateEncodedByteSize(in.bitrate);
        out.pBuffer = dst;
        out.pcmType.bitPerSample = pInfo.pcmType.bitPerSample;
        out.pcmType.sample_frequency = pInfo.pcmType.sample_frequency;
        out.bitrate = pInfo.params.modes.bitrate;
        if (USC_NoError != USC_CODEC_Fxn->std.Decode(hUSCDecoder, &in, &out))
            return -1;
        return out.nbytes;
    }

Audio capture and playback
The encoder takes 16-bit linear PCM input, the uncompressed binary representation of the digitized analog (voice) signal. The decoder takes the encoder's compressed output as input and produces raw PCM. This section describes how to use DirectSound* to capture and play raw PCM at the desired sampling frequency (16 kHz for GSM AMR-WB).
Microsoft DirectSound provides APIs for capturing and playing audio content. The accompanying softphone application uses the sample code provided with the Microsoft Platform SDK for audio capture and playback. This section describes the implementation in more detail.
Audio capture
The approach to audio capture is to create a circular buffer that stores the captured audio data in raw PCM format. The sampling rate, bits per sample (16 kHz, 16 bits), and total capture buffer size are set and allocated during initialization. DirectSound can also signal event objects whenever a given amount of audio data has been captured into the buffer. Audio capture is typically handled in a separate thread: an extraction thread waits on these event objects and periodically pulls the captured audio data out of the buffer. The following sections describe the control flow for capturing audio with DirectSound.
DirectSound* audio capture thread control flow (figure)
Audio data extraction thread control flow
The captured audio data must be extracted periodically (typically once per frame) so it can be passed to the encoder. The timeSetEvent() API can be used to trigger extraction every 20 ms. The extraction function is outlined below.
Note that because the buffer is circular, the raw PCM data is extracted in two steps: the captured data may have wrapped around the end of the allocated memory. The capture and extraction functions run in separate threads, so some notification events may be missed. In that case, the corresponding PCM data is simply extracted on the next signal.
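The two-step extraction can be sketched with plain memcpy calls. The function and parameter names below are illustrative, not DirectSound API; in the real application, locking the DirectSound capture buffer returns the two wrapped segments directly.

```c
#include <string.h>

/* Extract frameBytes of PCM from a circular capture buffer starting at
   readPos, handling wrap-around at the end of the allocation.
   (Illustrative sketch; not the DirectSound API.) */
void ExtractFrame(const char *ring, int ringSize, int readPos,
                  int frameBytes, char *dst)
{
    int firstChunk = ringSize - readPos;   /* bytes before the wrap point */
    if (firstChunk >= frameBytes) {
        memcpy(dst, ring + readPos, frameBytes);                  /* contiguous */
    } else {
        memcpy(dst, ring + readPos, firstChunk);                  /* tail */
        memcpy(dst + firstChunk, ring, frameBytes - firstChunk);  /* head */
    }
}
```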
Audio playback control flow
The playback code works much like audio capture. The playback buffer must be filled with raw PCM audio from the other speakers. When an encoded packet is received, it is passed to the decoder to recover the source PCM data; the playback buffer is then locked, the PCM data is copied in, and the audio is played. The effect of network jitter should also be considered; one approach is to buffer PCM data up to a threshold before starting playback.
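The threshold idea can be sketched as a small gate in front of the playback buffer. The struct and function names below are assumptions for illustration, not code from the sample application: playback starts only after a threshold number of frames has accumulated, which absorbs early packet-arrival jitter.

```c
/* Hypothetical jitter gate: hold back playback until `threshold` frames
   have been buffered, then release one frame per pop. */
typedef struct {
    int queued;     /* frames currently buffered */
    int threshold;  /* frames required before playback may start */
    int playing;    /* set to 1 once playback has begun */
} JitterGate;

void JitterGatePush(JitterGate *g)
{
    g->queued++;
    if (!g->playing && g->queued >= g->threshold)
        g->playing = 1;   /* enough data buffered: start playback */
}

int JitterGatePop(JitterGate *g)
{
    /* Returns 1 when a frame may be copied into the playback buffer. */
    if (!g->playing || g->queued == 0)
        return 0;         /* still priming, or underrun */
    g->queued--;
    return 1;
}
```

With 20 ms frames, a threshold of 3 corresponds to roughly 60 ms of buffering before playback begins.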
The network layer
The sample application uses Windows Sockets over TCP/IP to transmit voice packets between nodes. The softphone allows multi-user conferencing and supports the GSM AMR-WB codec at various bit rates. The initiator (or host) acts as a server and listens for incoming calls on a designated port; the other VoIP participants connect to the host on that port. The host waits for incoming connections until a timeout expires, then broadcasts the IP addresses of all connected nodes. A star network is then created that connects all VoIP participants to one another.
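Since the encoder returns a frame type for every 20 ms frame and the decoder needs that value back, each voice packet must carry the frame type alongside the encoded payload. The one-byte header below is an illustrative wire format, an assumption rather than the exact format used by the sample application:

```c
#include <string.h>

/* Pack one encoded frame into a voice packet: a 1-byte frame type followed
   by the encoded payload. Returns the total packet size in bytes.
   (Hypothetical wire format for illustration.) */
int PackVoicePacket(int frameType, const char *enc, int encBytes, char *pkt)
{
    pkt[0] = (char)frameType;
    memcpy(pkt + 1, enc, encBytes);
    return encBytes + 1;       /* bytes to send over the TCP connection */
}

/* Unpack a received voice packet; the recovered frame type can be passed
   straight to the decode function. Returns the payload size in bytes. */
int UnpackVoicePacket(const char *pkt, int pktBytes, char *enc, int *frameType)
{
    *frameType = (unsigned char)pkt[0];
    memcpy(enc, pkt + 1, pktBytes - 1);
    return pktBytes - 1;
}
```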