Android source code analysis: VoIP


Overview

The VoIP functionality of Android lives in the directory frameworks/base/voip. It includes a package that supports RTP.

RTP support

The RTP support package is located in the directory frameworks/base/voip/java/android/net/rtp. It mainly contains four Java classes: RtpStream, which represents an RTP-based stream; AudioStream, an RTP-based voice stream; AudioCodec, which describes the voice codec information; and AudioGroup, the voice session group.

RTP stream: RtpStream

It is a data stream based on the RTP (Real-time Transport Protocol) protocol. The Java-layer API class is android.net.rtp.RtpStream, representing a stream that sends and receives network multimedia data packets over RTP. A stream mainly includes the local network address and port number, the remote host's network address and port number, the socket number, and the stream mode.

RtpStream supports three stream modes, which can be set with the setMode function (see the sketch after this list):

MODE_NORMAL: receives and sends data packets normally.

MODE_SEND_ONLY: only sends data packets.

MODE_RECEIVE_ONLY: only receives data packets.
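
As a minimal sketch of how the mode is chosen from application code (API level 12+; the local address below is a placeholder and would normally come from the device's active network interface), a receive-only stream could be set up like this:

import android.net.rtp.AudioStream;
import android.net.rtp.RtpStream;

import java.net.InetAddress;
import java.net.SocketException;
import java.net.UnknownHostException;

AudioStream openReceiveOnlyStream() throws SocketException, UnknownHostException {
    // Assumption: 192.168.1.10 is one of this device's own addresses;
    // the constructor binds to it and picks an even local port (RFC 3550).
    InetAddress local = InetAddress.getByName("192.168.1.10");
    AudioStream stream = new AudioStream(local); // AudioStream is the concrete subclass of RtpStream
    stream.setMode(RtpStream.MODE_RECEIVE_ONLY); // listen only, never send
    return stream;
}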

The local host IP address (InetAddress, supporting both IPv4 and IPv6) is passed to the constructor. The constructor calls the native-layer create function to obtain a local port number (allocated according to RFC 3550). At the same time, the native create function also obtains a socket number, which it writes back into the Java class instance from the native layer.

The remote host address and port number are specified by the associate function:

public void associate(InetAddress address, int port)
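
Continuing the sketch above, and for illustration only (the peer address and port are placeholders that would normally come from SDP signaling), pairing the stream with its remote endpoint might look like:

// Hypothetical peer endpoint negotiated out of band (e.g. via SIP/SDP).
InetAddress peer = InetAddress.getByName("203.0.113.5");
stream.associate(peer, 5004); // remote RTP port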

The process of obtaining the socket number is as follows:

A private integer member variable of android.net.rtp.RtpStream stores the socket number:

private int mNative;

It is filled in from the JNI layer once the socket call returns a socket number. The field ID cached at the JNI layer (frameworks/base/voip/jni/rtp/RtpStream.cpp) is:

jfieldID gNative;

Its value is obtained in the registerRtpStream function:

(gNative = env->GetFieldID(clazz, "mNative", "I")) == NULL ||

In the create function of the JNI layer, the socket function is called to obtain a socket number:

int socket = ::socket(ss.ss_family, SOCK_DGRAM, 0);

This socket number is then assigned to the mNative variable of the Java layer:

env->SetIntField(thiz, gNative, socket);

The port number is returned directly by the create function. The create function supports IPv6.
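
From application code, the locally bound address and the port chosen by create can be read back, typically so they can be advertised in SDP; a small sketch, continuing the example above:

// Local endpoint bound by the native create function.
InetAddress localAddress = stream.getLocalAddress();
int localPort = stream.getLocalPort(); // even port number, per RFC 3550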

Voice Stream: AudioStream

android.net.rtp.AudioStream inherits from RtpStream and represents a voice stream built on RTP for communicating with the other party. An AudioCodec describes the codec information associated with the voice stream. Before a call is established, the voice stream must be added to a session group, android.net.rtp.AudioGroup. The stream therefore holds the voice group, the voice codec, and the DTMF (Dual-Tone Multi-Frequency) payload type (RFC 2833).
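
A brief, hedged example of attaching a codec and an RFC 2833 DTMF payload type to the stream before it joins a group (the payload type 101 is a common but arbitrary choice, not fixed by the API):

import android.net.rtp.AudioCodec;

stream.setCodec(AudioCodec.PCMU); // G.711 u-law, one of the predefined codecs
stream.setDtmfType(101);          // assumed dynamic payload type for RFC 2833 telephone-event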

Speech Codec: AudioCodec

An android.net.rtp.AudioStream requires an android.net.rtp.AudioCodec for its encoding and decoding. The Java-layer AudioCodec is only a class describing codec information, mainly containing three fields:

public final int type;      // the RTP payload type of the encoding
public final String rtpmap; // the encoding parameters used in the corresponding SDP attribute
public final String fmtp;   // the format parameters used in the corresponding SDP attribute

You can use AudioCodec.getCodec to conveniently obtain a codec:

public static AudioCodec getCodec(int type, String rtpmap, String fmtp)

For convenience, Android defines several commonly used codecs in AudioCodec: PCMU, PCMA, GSM, GSM_EFR, and AMR.
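
A hedged sketch of both ways to obtain a codec (the dynamic payload type 97 and the AMR parameters are illustrative values, not mandated by the API):

import android.net.rtp.AudioCodec;

// Use a predefined codec...
AudioCodec ulaw = AudioCodec.PCMU;

// ...or describe one with SDP-style parameters; getCodec may return null
// if the combination is not supported on the device.
AudioCodec amr = AudioCodec.getCodec(97, "AMR/8000", "octet-align=1");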

Voice group: AudioGroup

android.net.rtp.AudioGroup represents a session, which may be a two-party conversation or a conference call among more participants. There can be multiple groups at the same time, but because the microphone and speaker can only be used exclusively, only one group can be active; the others must be on hold.
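
As a small sketch of this hold/active constraint at the API level (the two group variables are hypothetical), switching the active call might look like this:

import android.net.rtp.AudioGroup;

// Hypothetical helper: park one call and make another one active.
void switchActiveCall(AudioGroup heldGroup, AudioGroup activeGroup) {
    heldGroup.setMode(AudioGroup.MODE_ON_HOLD);  // releases the microphone and speaker
    activeGroup.setMode(AudioGroup.MODE_NORMAL); // this group now owns the audio device
}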

The voice group keeps track of the voice streams added to it through a mapping table:

private final Map mStreams;

The process for adding an AudioStream to an AudioGroup is as follows:

First, the AudioStream joins an AudioGroup by calling join:

public void join(AudioGroup group)

join then calls AudioGroup.add, which in turn calls:

private native void nativeAdd(int mode, int socket, String remoteAddress, int remotePort, String codecSpec, int dtmfType);

Among the parameters, the first four (mode, socket number, remote address, and remote port) come from the voice stream's RtpStream parent class; codecSpec comes from the three pieces of codec information associated with the voice stream; and the last parameter, the DTMF type, also comes from the voice stream.
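
Putting the Java-layer pieces together, a minimal sketch of setting up one call leg could look like the following (addresses, port, and DTMF payload type are placeholders that would come from SIP/SDP signaling in a real app, which also needs the INTERNET, RECORD_AUDIO, and MODIFY_AUDIO_SETTINGS permissions):

import android.net.rtp.AudioCodec;
import android.net.rtp.AudioGroup;
import android.net.rtp.AudioStream;
import android.net.rtp.RtpStream;

import java.net.InetAddress;

AudioGroup startCallLeg() throws Exception {
    // Bind to a local interface address (placeholder); create() picks the local port.
    AudioStream stream = new AudioStream(InetAddress.getByName("192.168.1.10"));

    // Configure the stream before joining a group.
    stream.setCodec(AudioCodec.PCMU);                               // G.711 u-law
    stream.setDtmfType(101);                                        // assumed RFC 2833 payload type
    stream.associate(InetAddress.getByName("203.0.113.5"), 5004);   // hypothetical peer
    stream.setMode(RtpStream.MODE_NORMAL);

    // join() calls AudioGroup.add(), which in turn calls nativeAdd() as described above.
    AudioGroup group = new AudioGroup();
    group.setMode(AudioGroup.MODE_NORMAL);
    stream.join(group);
    return group;
}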

At the JNI layer (in the add function of frameworks/base/voip/jni/rtp/AudioGroup.cpp), the remote network address and port number are first stored in a sockaddr_storage structure (TODO: see UNIX socket programming):

sockaddr_storage remote;
if (parse(env, jRemoteAddress, remotePort, &remote) < 0) { // parse the address and store it in sockaddr_storage
    // Exception already thrown.
    return;
}

Then obtain the codec information and create a native-layer AudioCodec:

sscanf(codecSpec, "%d %15[^/]%*c%d", &codecType, codecName, &sampleRate);
codec = newAudioCodec(codecName); // create the corresponding native-layer codec based on its name

Then create a native-layer AudioStream:

// Create audio stream.
stream = new AudioStream; // create a voice stream
if (!stream->set(mode, socket, &remote, codec, sampleRate,
    sampleCount, codecType, dtmfType)) { // fill the voice stream with the relevant information
    jniThrowException(env, "java/lang/IllegalStateException",
        "cannot initialize audio stream");
    goto error;
}

Finally, obtain or create the native-layer AudioGroup and add the native AudioStream to it:

// Create audio group.
group = (AudioGroup *)env->GetIntField(thiz, gNative);
if (!group) { // the Java-layer AudioGroup has no native-layer counterpart yet.
              // Note: when the Java layer calls add multiple times, this block only runs the first time.
    int mode = env->GetIntField(thiz, gMode);
    group = new AudioGroup; // create a native AudioGroup
    if (!group->set(8000, 256) || !group->setMode(mode)) { // see the explanations of these two functions below
        jniThrowException(env, "java/lang/IllegalStateException",
            "cannot initialize audio group");
        goto error;
    }
}

// Add audio stream into audio group.
if (!group->add(stream)) { // add the native stream to the group
    jniThrowException(env, "java/lang/IllegalStateException",
        "cannot add audio stream");
    goto error;
}

// Succeed.
env->SetIntField(thiz, gNative, (int)group); // store the native group pointer as an integer in the Java instance's member variable

Let's take a closer look at these steps. When creating the native-layer AudioCodec, the corresponding creation function is looked up by name in a preset array, and that function is called to obtain the AudioCodec. The default array in AudioCodec.cpp is as follows:

struct AudioCodecType {      // struct definition
    const char *name;        // codec name
    AudioCodec *(*create)(); // corresponding creation function
} gAudioCodecTypes[] = {     // global array
    {"PCMA", newAlawCodec},      // G.711 A-law speech encoding
    {"PCMU", newUlawCodec},      // G.711 u-law speech encoding
    {"GSM", newGsmCodec},        // GSM full-rate speech encoding, also known as GSM-FR, GSM 06.10, GSM, or FR
    {"AMR", newAmrCodec},        // adaptive multi-rate narrowband speech encoding (AMR or AMR-NB); CRC checking, robust sorting, and interleaving are currently not supported; see RFC 4867 for more features
    {"GSM-EFR", newGsmEfrCodec}, // enhanced full-rate GSM speech encoding, also known as GSM-EFR, GSM 06.60, or EFR
    {NULL, NULL},
};

These C++ codecs all inherit from AudioCodec and implement its set, encode, and decode functions. The encode and decode functions perform encoding and decoding, while the set function configures the codec with the relevant information.

After creating the native-layer voice stream, the native implementation of add calls its set function to configure the AudioStream. The information includes the mode, socket, remote address, sampling rate, number of samples, codec type, and DTMF type. Another important operation in set is allocating memory for the jitter buffer. Due to network congestion, timing drift, or route changes, network packets mostly do not arrive at an even pace. To keep the voice from being distorted, the jitter buffer collects the packets and feeds them evenly to the speech processor, so the other party's voice can be played back smoothly.
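
The AOSP jitter buffer itself lives in the native AudioStream, but the core indexing trick it relies on, a power-of-two ring buffer addressed with a bit mask, which reappears later in the mBuffer[tail & mBufferMask] lines, can be sketched conceptually like this (an illustration only, not the actual implementation):

// Conceptual ring buffer indexed with a power-of-two mask (illustration only).
class JitterRing {
    private final short[] ring = new short[1 << 12]; // capacity must be a power of two
    private final int mask = ring.length - 1;
    private int tail = 0;

    void append(short[] samples, int count) {
        for (int i = 0; i < count; ++i) {
            ring[tail & mask] = samples[i]; // the mask makes the index wrap around
            ++tail;
        }
    }
}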

The native implementation of add finally creates the AudioGroup object if one does not exist yet. When the AudioGroup object is created, two threads, NetworkThread and DeviceThread, are created, and then the group's set and setMode functions are called; finally, the AudioStream object is added to the mapping table maintained by the AudioGroup.

In AudioGroup's set function, epoll_create is called to create a polling descriptor, and socketpair is called to obtain a connected socket pair. The first socket, pair[0], is assigned to the member variable mDeviceSocket; the second socket, pair[1], is used by a newly created device AudioStream:

bool AudioGroup::set(int sampleRate, int sampleCount)
{
    mEventQueue = epoll_create(2); // create a polling descriptor to monitor I/O events on the sockets
    if (mEventQueue == -1) {       // creation failed
        LOGE("epoll_create: %s", strerror(errno));
        return false;
    }

    mSampleRate = sampleRate;
    mSampleCount = sampleCount;

    // Create device socket.
    int pair[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, pair)) { // obtain the connected socket pair
        LOGE("socketpair: %s", strerror(errno));
        return false;
    }
    mDeviceSocket = pair[0];

    // Create device stream.
    mChain = new AudioStream; // create the device stream
    if (!mChain->set(AudioStream::NORMAL, pair[1], NULL, NULL,
        sampleRate, sampleCount, -1, -1)) { // the device AudioStream uses the second socket of the pair, with no remote address and no codec
        close(pair[1]); // close the second socket on failure
        LOGE("cannot initialize device stream");
        return false;
    }

    // Give device socket a reasonable timeout.
    timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 1000 * sampleCount / sampleRate * 500; // compute the timeout value
    if (setsockopt(pair[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv))) { // set the timeout on the first socket
        LOGE("setsockopt: %s", strerror(errno));
        return false;
    }

    // Add device stream into event queue.
    epoll_event event;          // polling event
    event.events = EPOLLIN;     // monitor read events
    event.data.ptr = mChain;
    if (epoll_ctl(mEventQueue, EPOLL_CTL_ADD, pair[1], &event)) { // register the second socket so that epoll can monitor its read events
        LOGE("epoll_ctl: %s", strerror(errno));
        return false;
    }

    // Anything else?
    LOGD("stream[%d] joins group[%d]", pair[1], pair[0]);
    return true;
}

Echo suppression: EchoSuppressor

Echo suppression is implemented in the C++ class EchoSuppressor. In its run function it reduces the call echo based on a certain algorithm, although the echo is not completely eliminated. The algorithm is described in the code comments of EchoSuppressor.

DeviceThread thread

The DeviceThread thread handles the device's audio I/O. It uses AudioTrack to play the other party's sound (output) and AudioRecord to record the local sound (input). After declaring the local AudioRecord and AudioTrack variables, it configures their parameters and sets up the socket mDeviceSocket (that is, pair[0] from AudioGroup's set function). It then checks whether the platform supports the AEC audio effect: if so, it creates the effect; if not, it falls back to echo suppression. Next it starts AudioRecord and AudioTrack (by calling their start functions) and finally enters a while loop. In the loop, it uses recv to receive data and copies (memcpy) it into AudioTrack, while the audio data captured by AudioRecord is sent out with send.

[Figure: the socket pair connecting DeviceThread and the device AudioStream. AudioGroup's mDeviceSocket is pair[0], used by DeviceThread: data recorded by AudioRecord is sent on it, and data received on it is played back through AudioTrack. The device AudioStream's mSocket is pair[1], the other end of the socket pair.]

The following code snippet is taken from the threadLoop function of DeviceThread. It first receives data from the other end of the socket pair into the output buffer, then obtains a writable buffer from AudioTrack, copies the data from the output buffer into AudioTrack's buffer, and hands it to AudioFlinger for playback:

int16_t output[sampleCount];
if (recv(deviceSocket, output, sizeof(output), 0) <= 0) { // receive data from the other end of the socket pair
    memset(output, 0, sizeof(output));
}
... // part of the code omitted

status_t status = track.obtainBuffer(&buffer, 1);
if (status == NO_ERROR) {
    int offset = sampleCount - toWrite;
    memcpy(buffer.i8, &output[offset], buffer.size); // copy the received data into the AudioTrack buffer; AudioFlinger plays it back
    toWrite -= buffer.frameCount;
    track.releaseBuffer(&buffer);
}

Correspondingly, in the same thread loop, it uses AudioRecord to obtain PCM data from the input device (via AudioFlinger); after echo suppression (when the AEC effect is not used), it sends the local audio data out. The main code snippets are:

status_t status = record.obtainBuffer(&buffer, 1);
if (status == NO_ERROR) {
    int offset = sampleCount - toRead;
    memcpy(&input[offset], buffer.i8, buffer.size); // copy data from AudioRecord (the AudioFlinger side) into the input buffer
    toRead -= buffer.frameCount;
    record.releaseBuffer(&buffer);
}

... // part of the code omitted

if (mode != MUTED) {              // not muted
    if (echo != NULL) {           // if echo suppression is used
        LOGV("echo->run()");
        echo->run(output, input); // suppress the peer's output contained in the new recording input
    }
    send(deviceSocket, input, sizeof(input), MSG_DONTWAIT); // send the recorded sound to the other end of the socket pair
}

NetworkThread

NetworkThread is mainly responsible for calling each AudioStream's encode function and sending the data, sending DTMF events, and calling the decode function to receive and decode data. Its threadLoop function is as follows:

bool AudioGroup::NetworkThread::threadLoop()
{
    AudioStream *chain = mGroup->mChain;
    int tick = elapsedRealtime();
    int deadline = tick + 10;
    int count = 0;

    for (AudioStream *stream = chain; stream; stream = stream->mNext) {
        if (tick - stream->mTick >= 0) {
            stream->encode(tick, chain); // encode and send data
        }
        if (deadline - stream->mTick > 0) {
            deadline = stream->mTick;
        }
        ++count;
    }

    // Ask each AudioStream to send DTMF.
    int event = mGroup->mDtmfEvent;
    if (event != -1) {
        for (AudioStream *stream = chain; stream; stream = stream->mNext) {
            stream->sendDtmf(event);
        }
        mGroup->mDtmfEvent = -1;
    }

    deadline -= tick;
    if (deadline < 1) {
        deadline = 1;
    }

    epoll_event events[count];
    count = epoll_wait(mGroup->mEventQueue, events, count, deadline); // wait for read events; each AudioStream has registered its own socket for listening
    if (count == -1) {
        LOGE("epoll_wait: %s", strerror(errno));
        return false;
    }
    for (int i = 0; i < count; ++i) {
        ((AudioStream *)events[i].data.ptr)->decode(tick); // receive and decode data
    }
    return true;
}

encode: the encode-and-send function of AudioStream

void AudioStream::encode(int tick, AudioStream *chain)
{
    if (tick - mTick >= mInterval) {
        // We just missed the train. Pretend that packets in between are lost.
        int skipped = (tick - mTick) / mInterval;
        mTick += skipped * mInterval;
        mSequence += skipped;
        mTimestamp += skipped * mSampleCount;
        LOGV("stream[%d] skips %d packets", mSocket, skipped);
    }
    tick = mTick;
    mTick += mInterval;
    ++mSequence;
    mTimestamp += mSampleCount;

    // If there is an ongoing DTMF event, send it now.
    if (mMode != RECEIVE_ONLY && mDtmfEvent != -1) { // the following code sends DTMF
        int duration = mTimestamp - mDtmfStart;
        // Make sure duration is reasonable.
        if (duration >= 0 && duration < mSampleRate * DTMF_PERIOD) {
            duration += mSampleCount;
            int32_t buffer[4] = { // fill in 16 bytes of data
                htonl(mDtmfMagic | mSequence),
                htonl(mDtmfStart),
                mSsrc,
                htonl(mDtmfEvent | duration),
            };
            if (duration >= mSampleRate * DTMF_PERIOD) {
                buffer[3] |= htonl(1 << 23);
                mDtmfEvent = -1;
            }
            sendto(mSocket, buffer, sizeof(buffer), MSG_DONTWAIT,
                (sockaddr *)&mRemote, sizeof(mRemote)); // send the data above to the remote host
            return;
        }
        mDtmfEvent = -1;
    }

    int32_t buffer[mSampleCount + 3]; // buffer for the data to send: the samples plus three 32-bit words for the 12-byte RTP header
    bool data = false;
    if (mMode != RECEIVE_ONLY) {
        // Mix all other streams.
        memset(buffer, 0, sizeof(buffer));
        while (chain) {
            if (chain != this) {
                data |= chain->mix(buffer, tick - mInterval, tick, mSampleRate); // mix data within one sampling interval, taken from each stream's jitter buffer
            }
            chain = chain->mNext;
        }
    }

    int16_t samples[mSampleCount];
    if (data) { // convert the 32-bit buffer into a 16-bit buffer
        // Saturate into 16 bits.
        for (int i = 0; i < mSampleCount; ++i) {
            int32_t sample = buffer[i];
            if (sample < -32768) {
                sample = -32768;
            }
            if (sample > 32767) {
                sample = 32767;
            }
            samples[i] = sample;
        }
    } else {
        if ((mTick ^ mKeepAlive) >> 10 == 0) {
            return;
        }
        mKeepAlive = mTick;
        memset(samples, 0, sizeof(samples));

        if (mMode != RECEIVE_ONLY) {
            LOGV("stream[%d] no data", mSocket);
        }
    }

    if (!mCodec) { // the device AudioStream has no codec
        // Special case for device stream.
        send(mSocket, samples, sizeof(samples), MSG_DONTWAIT); // send the samples buffer to the other socket of the socket pair
        return;
    }

    // Cook the packet and send it out.
    // Fill in the header.
    buffer[0] = htonl(mCodecMagic | mSequence);
    buffer[1] = htonl(mTimestamp);
    buffer[2] = mSsrc;
    int length = mCodec->encode(&buffer[3], samples); // call the encoder to encode the audio data (stored after the first three words)
    if (length <= 0) {
        LOGV("stream[%d] encoder error", mSocket);
        return;
    }
    sendto(mSocket, buffer, length + 12, MSG_DONTWAIT,
        (sockaddr *)&mRemote, sizeof(mRemote)); // send to the remote host
}

Normal AudioStream

"Normal" is a Device Stream that interacts with a remote host.

For normal AudioStream, In the encode function, first add the header, then encode it, and then send it to the remote host:

// Cook the packet and send it out.
buffer[0] = htonl(mCodecMagic | mSequence);
buffer[1] = htonl(mTimestamp);
buffer[2] = mSsrc;                                // fill in the header
int length = mCodec->encode(&buffer[3], samples); // encode with the codec
if (length <= 0) {
    LOGV("stream[%d] encoder error", mSocket);
    return;
}
sendto(mSocket, buffer, length + 12, MSG_DONTWAIT,
    (sockaddr *)&mRemote, sizeof(mRemote)); // send to the remote host

For a normal AudioStream, the decode function first receives data into the buffer array, then strips off the packet header, and finally decodes the payload with the codec:

__attribute__((aligned(4))) uint8_t buffer[2048]; // buffer array
sockaddr_storage remote; // what is the relation to mRemote?
socklen_t addrlen = sizeof(remote);
int length = recvfrom(mSocket, buffer, sizeof(buffer),
    MSG_TRUNC | MSG_DONTWAIT, (sockaddr *)&remote, &addrlen); // receive remote host data into the buffer array

// Do we need to check SSRC, sequence, and timestamp? They are not
// reliable but at least they can be used to identify duplicates?

// The following part of the code parses the header.
if (length < 12 || length > (int)sizeof(buffer) ||
    (ntohl(*(uint32_t *)buffer) & 0xC07F0000) != mCodecMagic) {
    LOGV("stream[%d] malformed packet", mSocket);
    return;
}
int offset = 12 + ((buffer[0] & 0x0F) << 2);
if ((buffer[0] & 0x10) != 0) {
    offset += 4 + (ntohs(*(uint16_t *)&buffer[offset + 2]) << 2);
}
if ((buffer[0] & 0x20) != 0) {
    length -= buffer[length - 1];
}
length -= offset;
if (length >= 0) {
    length = mCodec->decode(samples, count, &buffer[offset], length); // decode with the codec and store the decoded samples in the samples array
}
if (length > 0 && mFixRemote) {
    mRemote = remote;
    mFixRemote = false;
}
count = length;

Finally, the decoded data is stored in the mBuffer:

for (int i = 0; i < count; ++i) {
    mBuffer[tail & mBufferMask] = samples[i]; // write into the mBuffer array
    ++tail;
}

Device stream AudioStream

For the device AudioStream, the decode function receives data on the pair[1] socket, that is, the data sent from pair[0], which is the locally recorded audio; the samples are stored in the samples buffer.

int16_t samples[count];
if (!mCodec) {
    // Special case for device stream.
    count = recv(mSocket, samples, sizeof(samples),
        MSG_TRUNC | MSG_DONTWAIT) >> 1; // convert received bytes to 16-bit samples
}

Finally, the data is stored in the mBuffer buffer:

// Append to the jitter buffer.
int tail = mBufferTail * mSampleRate;
for (int i = 0; i < count; ++i) {
    mBuffer[tail & mBufferMask] = samples[i];
    ++tail;
}

For the device AudioStream, the encode function sends data on socket pair[1]; the peer socket pair[0] (in DeviceThread) receives it and hands it to AudioTrack for playback.

if (!mCodec) {
    // Special case for device stream.
    send(mSocket, samples, sizeof(samples), MSG_DONTWAIT);
    return;
}

TODO: the network-side stream is received and decoded by the normal AudioStream (decode function) and stored in its jitter buffer; the device stream's encode function then mixes it and sends it to pair[0], that is, to the DeviceThread thread, which hands it to AudioTrack for playback. In the opposite direction, the sound data from AudioRecord in the DeviceThread thread is received by the device stream's decode function on socket pair[1], stored in its mBuffer, and then sent out by the normal stream's encode function. TODO: How is mBuffer used?
