VoIP in-depth: An Introduction to the SIP protocol, part 2

Last Update:2018-12-05 Source: Internet

Author: User

Tags rfc


In Part 1 of our SIP primer, I covered the SIP foundation layers, starting from the message structure and ending with SIP transactions. We saw how phone registrations and proxies could work using these layers. This second part completes the discussion by covering the way SIP defines calls and, in general, any type of communication. Naturally, this installment builds on the previous part, so you should read Part 1, or at least have some prior knowledge, before proceeding with Part 2. As in the previous installment, I will also refer to the latest specs that influenced the basic SIP scenarios.

SIP dialogs

The following INVITE message does not just start a new transaction, it is also a request to start a new dialog.

INVITE sip:hannibal@arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f
To: Jon Stokes <sip:hannibal@arstechnica.com>
From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs
Call-ID: nbo34tsggvsqap@home.mynetwork.org
CSeq: 59164 INVITE
Contact: sip:gilad@voxisoft.com
Max-Forwards: 70
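To make the dialog-related fields of this message easier to spot, here is a minimal Python sketch that splits the request into its request line and headers. The helper names (parse_request, header_param) are made up for this illustration, and the parser assumes one header per line, no body, no repeated headers, and no semicolons inside the URIs.

```python
# Minimal sketch: split a simple SIP request into request line and headers.
# Assumptions: one header per line, no body, no repeated or folded headers.
RAW_INVITE = (
    "INVITE sip:hannibal@arstechnica.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f\r\n"
    "To: Jon Stokes <sip:hannibal@arstechnica.com>\r\n"
    "From: Gilad <sip:gilad@voxisoft.com>;tag=n23ycs\r\n"
    "Call-ID: nbo34tsggvsqap@home.mynetwork.org\r\n"
    "CSeq: 59164 INVITE\r\n"
    "Contact: sip:gilad@voxisoft.com\r\n"
    "Max-Forwards: 70\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, request_uri, headers) for a simple SIP request."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, request_uri, _version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, request_uri, headers


def header_param(value, param):
    """Return a ';name=value' parameter of a header value, or None."""
    for part in value.split(";")[1:]:
        key, _, val = part.partition("=")
        if key.strip().lower() == param:
            return val.strip()
    return None


method, request_uri, headers = parse_request(RAW_INVITE)
from_tag = header_param(headers["from"], "tag")  # set by the initiator
to_tag = header_param(headers["to"], "tag")      # absent in the first request
```

Note that the "To" header carries no tag yet; the other side supplies it in its response.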

A 2XX response opens the dialog. The client that originally sent the INVITE would send an ACK request to confirm that it has received the 2XX response. However, we'll see that this is specific to INVITE and not to dialogs in general. SIP dialogs are not limited to the INVITE method; extensions may also define methods that initiate a dialog.

Clients are expected to maintain the state of the dialogs. (As we saw in the first part, proxies along the signaling path do not maintain dialog state.) Each dialog holds the following information:

Call-ID
Local tag
Remote tag
Local URI
Remote URI
Remote target
Route-Set
Local CSeq
Remote CSeq
A boolean flag called "secure"

The first three values identify the dialog. The dialog initiator chooses a Call-ID and places the value in the "Call-ID" header. The initiator also chooses a random local tag and places it as a parameter of the "From" header (the "To" header remains without a tag). The device that accepts this request treats the tag in the "From" header of the request as the dialog's remote tag. The recipient then creates a tag of its own and places it as a parameter in the "To" header of the response. The initiator sees the tag value in the "To" header and treats it as the dialog's remote tag.
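The way both sides arrive at the same three identifying values, just stored in mirrored fields, can be sketched as follows. DialogId and the two helper functions are hypothetical names for this illustration, and the "To" tag value is invented.

```python
from collections import namedtuple

# Hypothetical container for the three values that identify a dialog.
DialogId = namedtuple("DialogId", ["call_id", "local_tag", "remote_tag"])


def initiator_dialog_id(call_id, from_tag, to_tag):
    """Dialog ID as stored by the side that sent the initial request."""
    return DialogId(call_id, local_tag=from_tag, remote_tag=to_tag)


def recipient_dialog_id(call_id, from_tag, to_tag):
    """Dialog ID as stored by the side that answered the initial request."""
    return DialogId(call_id, local_tag=to_tag, remote_tag=from_tag)


CALL_ID = "nbo34tsggvsqap@home.mynetwork.org"
FROM_TAG = "n23ycs"   # chosen by the initiator, sent in the request
TO_TAG = "a6c85cf"    # invented value; chosen by the recipient in its response

caller_view = initiator_dialog_id(CALL_ID, FROM_TAG, TO_TAG)
callee_view = recipient_dialog_id(CALL_ID, FROM_TAG, TO_TAG)
```

Each side's local tag is the other side's remote tag, while the Call-ID is shared.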

When one party sends a dialog request, several different 2XX responses may arrive. This multiple-response situation occurs when a proxy forks the request and several devices answer. Proxies cannot interfere with 2XX responses, as they are not aware of dialogs. Hence, all these responses propagate back to the one who sent the request. When you receive several of these responses, several dialogs were effectively created based on that single request. These dialogs each have a different identifier, even at the source, as the remote tag is unique for each of them. Any subsequent requests on a specific dialog contain the same identifiers as the ones established in the handshake process.
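As an illustration of the forking case, the sketch below groups incoming 2XX responses by their "To" tag, so that every distinct tag yields a separate dialog. The function name, the tuple layout, and the tag values are assumptions made for this example only.

```python
def dialogs_from_responses(call_id, local_tag, responses):
    """Create one dialog entry per distinct 2XX response.

    responses is a list of (status_code, to_tag) tuples.
    """
    dialogs = {}
    for status, to_tag in responses:
        if 200 <= status < 300:
            # The remote tag makes the identifier unique per answering device.
            dialog_id = (call_id, local_tag, to_tag)
            dialogs[dialog_id] = {"remote_tag": to_tag}
    return dialogs


forked = dialogs_from_responses(
    "nbo34tsggvsqap@home.mynetwork.org",
    "n23ycs",
    [(200, "desk-phone"), (200, "softphone"), (200, "desk-phone")],
)
```

The repeated "desk-phone" response maps to the dialog that already exists, so only two dialogs result.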

The contact of the request becomes the other end's "remote target." However, the initial request URI is not necessarily the initiator's remote target. When the initiator receives the 2XX response, it also receives the actual remote target via the response's "Contact" header. Thus, if one sent a request to an AOR, the response may come back with a contact address that is specific to the device, at least for the lifetime of the dialog. The ACK, and all subsequent requests, would place the dialog's remote target in the request URI. Therefore, if a proxy previously forked a request to an AOR, it would not do so for subsequent requests, as this time the request URI is different.

Dialogs also hold a route set. The route set is a list of SIP URIs, and its goal is to contain all the proxies that route requests on the dialog. The proxies themselves build the route set, but do not store it internally. Each proxy that routes the first request and wants to remain on the signaling path adds not only an additional "Via" header, but also a "Record-Route" header. When the request reaches its destination, it carries a list of URIs in its "Record-Route" headers. Before sending a positive response to the request, the device stores the list in its internal dialog state and sends the same headers, in the same order, in the response.

Responses are routed based on their "Via" headers (which are also copied as-is from the request to the response), and thus proxies do not add or remove any "Record-Route" header on the response. The initiator, which maintains its own internal dialog state, also stores this list of URIs, but in reverse order (since the header added by the first proxy ends up as the last of the headers in the response). Subsequent requests have this route set copied to "Route" headers. "Route" is different from "Record-Route": it tells proxies to route the request to a specific destination rather than rely on their own internal routing rules (as they did the first time).
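The ordering rule for the route set can be sketched like this. The proxy URIs are invented, and proxy_forward and route_set_from_record_route are hypothetical helpers; the sketch assumes each proxy inserts its "Record-Route" value above the existing ones.

```python
def proxy_forward(record_route, proxy_uri):
    """A proxy inserts its own URI above any existing Record-Route values."""
    return [proxy_uri] + list(record_route)


def route_set_from_record_route(record_route, is_initiator):
    """Build a dialog's route set from Record-Route values (top to bottom).

    The recipient keeps the order it saw in the request; the initiator
    reverses the order it saw in the response.
    """
    uris = list(record_route)
    return list(reversed(uris)) if is_initiator else uris


P1 = "sip:p1.example.com;lr"  # first proxy after the initiator (invented)
P2 = "sip:p2.example.com;lr"  # second proxy, closest to the recipient

record_route = proxy_forward(proxy_forward([], P1), P2)  # as seen at the far end

callee_routes = route_set_from_record_route(record_route, is_initiator=False)
caller_routes = route_set_from_record_route(record_route, is_initiator=True)
```

Each side ends up with the proxy nearest to itself at the head of its list, which is the hop its next request has to visit first.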

A proxy that finds its own address in the top "Route" header removes that entry from the request it sends out and resolves the IP address for the outgoing request from the next "Route" header. If there is no additional "Route" header, it sends the request based on the request URI.
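A minimal sketch of that proxy behavior, under the assumption that URIs can be compared as plain strings (real proxies compare them more carefully). The function name and all addresses are invented for the illustration.

```python
def proxy_next_hop(own_uri, request_uri, routes):
    """Return (next_hop_uri, remaining_routes) for a loose-routing proxy."""
    remaining = list(routes)
    if remaining and remaining[0] == own_uri:
        remaining.pop(0)  # remove our own entry from the top
    # Use the next Route value if there is one, else the request URI.
    next_hop = remaining[0] if remaining else request_uri
    return next_hop, remaining


hop, left = proxy_next_hop(
    "sip:p1.example.com;lr",
    "sip:hannibal@192.0.2.4",
    ["sip:p1.example.com;lr", "sip:p2.example.com;lr"],
)
```

When the last proxy on the path runs the same logic, no "Route" value remains and the request goes to the request URI, which is the dialog's remote target.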

If you look at a route set, you will notice that the routes have an "lr" parameter. This parameter states that this is a loose route, effectively meaning that the proxy is RFC 3261 compliant. Proxies that comply with the previous RFC (RFC 2543) follow strict routing rules and must have their own address in the request URI. Thus, for backwards compatibility, one must change the request URI if the proxy does not indicate that it supports loose routing.
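The difference between the two routing styles shows up when an endpoint builds a request inside a dialog. The sketch below follows the RFC 3261 rules as I understand them: with a loose first hop the request URI stays the remote target, while with a strict first hop that hop's URI goes into the request URI and the remote target moves to the end of the "Route" list. Function names and addresses are invented, and the "lr" check is a simplified string test.

```python
def has_lr(uri):
    """Crude check for an 'lr' parameter on a bare URI (no angle brackets)."""
    params = uri.split(";")[1:]
    return any(p.split("=")[0].lower() == "lr" for p in params)


def build_routing(remote_target, route_set):
    """Return (request_uri, route_headers) for a request inside a dialog."""
    if not route_set:
        return remote_target, []
    first = route_set[0]
    if has_lr(first):
        # Loose routing: the request URI stays the remote target.
        return remote_target, list(route_set)
    # Strict routing: the first hop becomes the request URI and the
    # remote target is appended as the last Route value.
    return first, list(route_set[1:]) + [remote_target]


TARGET = "sip:hannibal@192.0.2.4"  # invented remote target
loose = build_routing(TARGET, ["sip:p1.example.com;lr", "sip:p2.example.com;lr"])
strict = build_routing(TARGET, ["sip:old.example.com", "sip:p2.example.com;lr"])
```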

Whenever one of the two dialog participants sends a request, it places its local tag in the "From" header of the request and the remote tag in the "To" header. When a response is sent, this is reversed: the local tag is placed in the "To" header and the remote tag goes in the "From" header. Because one endpoint's local tag is the other's remote tag, the "From" and "To" tag parameters in a response look the same as in the request. The same idea applies to the URIs in the "From" and "To" headers, which are mapped to the local URI and remote URI in a similar way.

The party creating a dialog chooses its first CSeq value, which becomes its local CSeq value and the other side's remote CSeq value. As previously discussed, the response carries the same CSeq value as the request, and therefore the other participant's CSeq value only becomes known when it sends its first request. Thus, when a dialog has just been established, only one of the CSeq fields has a value. Someone sending a request on a dialog should first increment its local CSeq value by one and then send the request using this new value. This makes it possible to know the order of the requests on a given dialog.
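A small sketch of this bookkeeping follows. The class is hypothetical, the starting value of 1 for a side that has not yet sent anything is an arbitrary choice for the example, and rejecting a request whose number goes backwards is only one simple way to use the ordering information.

```python
class DialogCSeq:
    """Tracks the local and remote CSeq numbers of one dialog."""

    def __init__(self, local_cseq=None, remote_cseq=None):
        self.local_cseq = local_cseq
        self.remote_cseq = remote_cseq

    def next_request(self, method, initial=1):
        """Return the CSeq header value for a new request we send."""
        if self.local_cseq is None:
            # Nothing sent on this dialog yet; 'initial' is an arbitrary
            # starting value chosen for this sketch.
            self.local_cseq = initial
        else:
            self.local_cseq += 1
        return f"{self.local_cseq} {method}"

    def accept_request(self, cseq_number):
        """Record the peer's CSeq; return False if it goes backwards."""
        if self.remote_cseq is not None and cseq_number < self.remote_cseq:
            return False
        self.remote_cseq = cseq_number
        return True


# State right after the INVITE with "CSeq: 59164 INVITE" was answered:
caller = DialogCSeq(local_cseq=59164)   # remote CSeq still unknown
callee = DialogCSeq(remote_cseq=59164)  # local CSeq still unset
```

Calling caller.next_request("BYE") at this point would produce the header value "59165 BYE".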

The last item in the dialog's internal state is the secure flag. This flag simply indicates whether a device should generate requests with encryption (for SIP, that means TLS). In that case, the addresses of the remote target and the contact start with "sips:" and not "sip:".

As you can see, both endpoints hold the same information in different fields. The following illustration shows the relationship between the fields on both ends:

[Figure: relationship between the dialog fields on both ends]

There must be a way to close the dialog, not just open it. For INVITE dialogs, the way you close a dialog is to send a BYE request. Obviously, either of the two dialog participants may send a BYE, which corresponds to one party hanging up its phone. As with any request, you must respond to the BYE, but in this case, even an error response (or a timeout of the transaction) would prompt the sender of the BYE to close the dialog. This prevents the remote end from forcing a device to keep a dialog, along with its state, open.

Another scenario that SIP must support is canceling a call attempt. This happens when one party does not pick up the phone. The CANCEL method is used in this situation, but this method has quirks. CANCEL is unique in that it has the same branch value in the "Via" header as the INVITE transaction, but the method in its CSeq header is CANCEL and not INVITE. This differentiates the two transactions, as both values identify a transaction. Oddly, CANCEL also does not carry multiple "Via" headers: each proxy receiving a CANCEL request issues its own CANCEL request without the previous "Via" headers. Most importantly, CANCEL does not guarantee that the request has been canceled, even though most of the time you would get a 200 response for it. A successful response to CANCEL only means that the CANCEL itself was received. Only when you receive a 487 (Request Terminated) response to the INVITE does it mean that the CANCEL request was honored. In particular, it's quite possible that a 2XX response to the INVITE request was sent before the CANCEL request arrived, and thus one would receive this response to the INVITE even though the CANCEL was sent. If this is the case, the only way to close the dialog at that point is to ACK the 2XX response and send a BYE request.
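To show how a CANCEL relates to the INVITE it cancels, here is a rough sketch that derives one from the other. The dictionary layout and the build_cancel name are assumptions for this example; the header values are taken from the INVITE shown earlier in this article.

```python
def build_cancel(request_uri, invite_headers):
    """Derive a CANCEL from the INVITE it is meant to cancel."""
    cseq_number, _method = invite_headers["CSeq"].split()
    return {
        "request_line": f"CANCEL {request_uri} SIP/2.0",
        # A single Via, identical to the INVITE's top Via (same branch).
        "Via": [invite_headers["Via"][0]],
        "To": invite_headers["To"],
        "From": invite_headers["From"],
        "Call-ID": invite_headers["Call-ID"],
        # Same CSeq number, but the method part is CANCEL.
        "CSeq": f"{cseq_number} CANCEL",
    }


INVITE_HEADERS = {
    "Via": ["SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bK8uf35f"],
    "To": "Jon Stokes <sip:hannibal@arstechnica.com>",
    "From": "Gilad <sip:gilad@voxisoft.com>;tag=n23ycs",
    "Call-ID": "nbo34tsggvsqap@home.mynetwork.org",
    "CSeq": "59164 INVITE",
}

cancel = build_cancel("sip:hannibal@arstechnica.com", INVITE_HEADERS)
```

The branch and the CSeq method together identify the transaction, which is why the CANCEL forms a separate transaction even though it reuses the INVITE's branch.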

[Figure: Cancel scenarios]

Please note that not all requests are part of a dialog. A common misconception is that REGISTER creates a new dialog. REGISTER is just a transaction that does not initiate a dialog or take place within any dialog. This is despite the fact that you would usually find subsequent registrations using the same Call-ID with the CSeq value incremented by one. An outcome of this is that a REGISTER request does not create bidirectional communication between the registering device and the registrar. Therefore, based on the REGISTER message alone, the registrar cannot notify the registering device that a registration has expired prior to the original expiration time.

Signaling and media

So far, we've focused most of our attention on the signaling part, but having a signaling protocol just to control the session is rather useless without the means to send the contents of the call. In a voice call, the content is speech, and that's the media part. Before being able to send and receive media, the parties must negotiate the media properties. Why do you need media negotiation? For voice, the reason is that there are several different ways to represent and compress the content. This is similar to having several formats to play sound on your desktop (WAV, MP3, and Ogg), but in this case, the devices choose the format for the conversation. Furthermore, VoIP does not necessarily involve actual voice.

From day one, VoIP has aimed not just to replace the standard telephone system but also to enhance it, so it's possible to negotiate a media type that has nothing to do with sound. Naturally, there is a protocol to negotiate the media type and representation, but you would usually find this negotiation encapsulated within the signaling messages and not sent separately. The reason for this is simple: as you create or modify a session, the media properties are also negotiated. Thus, we have the signaling divided into two parts: session and media negotiation.

The media is also separated into two parts: contents and control. Once the parties have agreed on the media attributes, they can start sending the contents. Because VoIP operates on an IP network, the media is divided into packets. For voice, it's common that each packet represents 20 ms of sound. This means that in a conversation with two participants, each one would send a packet every 20 ms, or 50 packets per second. There are ways to improve on that rate by detecting voice activity and refraining from sending packets when the conversation contains only silence. This usually reduces the traffic by 50 percent. 20 ms is not a necessity, and some choose to use 30 ms or even 10 ms, but 20 ms is by far the most common.
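The arithmetic behind these numbers can be worked through in a few lines. The sketch assumes an uncompressed 8 kHz codec with one byte per sample (such as G.711) and 40 bytes of IPv4, UDP, and RTP headers per packet; none of these figures come from the text above, and the function names are made up.

```python
def packets_per_second(ptime_ms):
    """Packets sent each second when every packet carries ptime_ms of sound."""
    return 1000 / ptime_ms


def payload_bytes(ptime_ms, sample_rate_hz=8000, bytes_per_sample=1):
    """Payload size per packet for a simple fixed-rate codec (assumed G.711)."""
    return sample_rate_hz * ptime_ms * bytes_per_sample // 1000


def bandwidth_bps(ptime_ms, header_bytes=40, active_fraction=1.0):
    """One-way bit rate including assumed IPv4 + UDP + RTP headers.

    active_fraction models silence suppression: 0.5 means packets are
    only sent half of the time.
    """
    packet_bits = (payload_bytes(ptime_ms) + header_bytes) * 8
    return packet_bits * packets_per_second(ptime_ms) * active_fraction


rate_20ms = packets_per_second(20)
full_rate = bandwidth_bps(20)
with_silence_suppression = bandwidth_bps(20, active_fraction=0.5)
```

Shorter packets lower the delay but raise the share of bandwidth spent on headers, which is one reason 20 ms is a popular middle ground.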

As media packets are sent and received, the parties would like to get some feedback. Since media contents are sent at a high rate, it makes no sense to acknowledge each packet that was received. Instead, each party sends a report detailing some key statistics, such as how many media packets it has sent and how many it has received. This helps the devices verify that the network is actually transferring all the packets and that the quality is acceptable.

There are several ways to calculate quality; most of these methods compare the time packets arrived to the time they should have arrived, and they also detect and count the number of packets that were lost.
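One widely used calculation of this kind is the interarrival jitter estimate from the RTP specification (RFC 3550), which smooths the change in transit time between consecutive packets. The sketch below implements that formula together with a naive loss count; the function names and the sample data are invented, and sequence-number wraparound is ignored.

```python
def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550.

    packets is a list of (send_timestamp, arrival_time) pairs expressed
    in the same time unit.
    """
    jitter = 0.0
    previous_transit = None
    for sent, arrived in packets:
        transit = arrived - sent
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return jitter


def packets_lost(sequence_numbers):
    """Expected minus received packets (no wraparound handling)."""
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    return expected - len(set(sequence_numbers))


on_time = interarrival_jitter([(0, 5), (20, 25), (40, 45)])
one_late = interarrival_jitter([(0, 5), (20, 25), (40, 61)])
lost = packets_lost([1, 2, 4, 5])
```

Packets that all take the same time to arrive produce zero jitter, whatever that transit time is; only variation between packets counts.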

[Figure: VoIP elements]

Offer-answer

The default media negotiation protocol is the Session Description Protocol, or SDP, which is defined by RFC 4566. SDP is not used solely by SIP and thus has some fields that are irrelevant in SIP's case.

v=0
o=me 634962690 634962690 IN IP4 home.mynetwork.org
s=-
c=IN IP4 home.mynetwork.org
t=0 0
m=audio 28534 RTP/AVP 0
a=rtpmap:0 PCMU/8000

We'll cover only those headers important to this discussion:

'm'-line: This defines a media stream, and each media stream has a type; in this case, it's an audio stream. The number that follows the type is the listening port. The port number is followed by the protocol; in this case, it is RTP, probably the most common protocol for audio and video. The list that follows the "RTP/AVP" text signifies which codecs are supported. We mentioned that the media has the content, sent over RTP in this case, as well as the control that provides information such as statistics. RTCP provides the media control and, by default, you receive RTCP on the RTP port + 1. For that reason, RTP usually uses even port numbers and RTCP uses odd port numbers.
'a'-line: This is an attribute. Attributes can be anything that describes either the whole SDP (appearing before the first 'm'-line) or the 'm'-line they follow. The 'a'-line in this example provides an attribute called rtpmap that maps a numeric codec value in the 'm'-line to the actual registered codec name. Another attribute, for example, may set the RTCP port value explicitly.
'c'-line: This is the IP address where one wishes to receive media. It does not have to be the same as the IP address handling the signaling. It is also possible to have a specific IP address for a media stream by adding a 'c'-line after the corresponding 'm'-line.
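The three line types described above can be pulled out of the example SDP with a short parser. This is only a sketch: parse_sdp and the dictionary layout are invented for the illustration, and it assumes plain port numbers and well-formed lines.

```python
SDP = (
    "v=0\r\n"
    "o=me 634962690 634962690 IN IP4 home.mynetwork.org\r\n"
    "s=-\r\n"
    "c=IN IP4 home.mynetwork.org\r\n"
    "t=0 0\r\n"
    "m=audio 28534 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)


def parse_sdp(text):
    """Extract connection addresses, media streams, and rtpmap attributes."""
    session = {"connection": None, "media": []}
    current = None  # the media stream that the following lines belong to
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "m":
            media_type, port, protocol, *formats = value.split()
            current = {
                "type": media_type,
                "port": int(port),
                "rtcp_port": int(port) + 1,  # default when not set explicitly
                "protocol": protocol,
                "formats": formats,
                "rtpmap": {},
                "connection": None,
            }
            session["media"].append(current)
        elif kind == "c":
            address = value.split()[2]
            if current is None:
                session["connection"] = address  # applies to the whole SDP
            else:
                current["connection"] = address  # overrides for one stream
        elif kind == "a" and current is not None and value.startswith("rtpmap:"):
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            current["rtpmap"][payload_type] = encoding
    return session


session = parse_sdp(SDP)
audio = session["media"][0]
```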

RFC 3264 defines the SDP offer-answer model for SIP. The media properties are negotiated as the call is set up, during the call's three-way handshake. The simplest scenario is to have the offer in the INVITE request and the answer in the 200 OK to the INVITE. The offer includes all the media streams to set up, and each stream lists the offered codecs and other attributes. The answer must have the same number of 'm'-lines as the offer. If one of the media streams is not accepted, the answer sets its port number to '0', thus disabling it. The answer also chooses only the codecs it supports out of the proposed codecs. Naturally, if the device receiving the offer cannot establish any communication, it will reject the INVITE request.
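The answering rules in this paragraph can be sketched as a small function. The data layout, the build_answer name, and the port numbers are assumptions for the example; payload types 0 and 8 are the static RTP numbers for PCMU and PCMA, and 96 stands for some dynamically assigned video codec.

```python
def build_answer(offer_streams, supported):
    """Build an answer with exactly one stream per offered stream.

    offer_streams: list of dicts with 'type', 'port' and 'formats'.
    supported: maps a media type to (local_port, set of supported formats).
    """
    answer = []
    for stream in offer_streams:
        local_port, codecs = supported.get(stream["type"], (0, set()))
        common = [fmt for fmt in stream["formats"] if fmt in codecs]
        if common:
            answer.append(
                {"type": stream["type"], "port": local_port, "formats": common}
            )
        else:
            # Rejected stream: keep the line so the counts match, port 0.
            answer.append(
                {"type": stream["type"], "port": 0, "formats": stream["formats"]}
            )
    return answer


offer = [
    {"type": "audio", "port": 28534, "formats": ["0", "8"]},
    {"type": "video", "port": 28536, "formats": ["96"]},
]
answer = build_answer(offer, {"audio": (40000, {"0"})})
all_rejected = all(stream["port"] == 0 for stream in answer)
```

If all_rejected were true, no stream could be set up at all, which is the case where the device would reject the INVITE instead of answering.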

A different scenario is to send the INVITE request without any SDP. In this case, the offer is expected in the 2XX response to the INVITE, and the answer to this offer would be in the ACK. This time, if media communication cannot be established, the device cannot simply ignore the 2XX response, nor can it respond to it negatively. It should ACK the 2XX response with all media lines disabled and then immediately send a BYE.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
