[SIP Protocol] learning notes for beginners

Last Update:2018-12-06 Source: Internet

Author: User

Author: gnuhpc
Source: http://www.cnblogs.com/gnuhpc/

1. Where did SIP come from, and how is it constructed?

In general, SIP is a lightweight signaling protocol that can carry the signaling for audio, video, and instant messaging.

To explain where SIP came from, we need to mention H.323, and to talk about that standard we need to mention the ITU-T. So let us first look at some interesting differences between the IETF (Internet Engineering Task Force), which specifies SIP, and the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), which specifies H.323. The ITU-T and the IETF tend to approach problems from two different angles. As an international standards body under the United Nations, the ITU-T inherits the UN's consistent, consensus-seeking style and its role of keeping the peace: its standards go through many rounds of review and drafts, and it can take years to arrive at a final, widely accepted result. The IETF, by contrast, is closer to pragmatism and believes in "rough consensus and running code": get something usable first and refine it as you go. As a result, the IETF's standard-setting cycle is usually shorter than the ITU-T's, which may also be helped by its three open meetings each year. These spring, summer, and autumn meetings are held around the world and are open to interested organizations (see http://www.ietf.org/meetings/meetings.html). Multiple IETF working groups gather at these meetings to settle technical details of their current work; if a problem is still unsolved at the end of a meeting, the study continues at the next meeting a few months later. In short, the IETF improves system architecture and protocol design from a practical perspective.

You may wonder why I bring all this up, but keep it in mind: these ideas are at the core of the Session Initiation Protocol's architecture and extensions. The structure of SIP is based on two common protocols: the Simple Mail Transfer Protocol (SMTP) defined in RFC 2821, which covers email transfer, and the Hypertext Transfer Protocol (HTTP) defined in RFC 2616, which defines the messages used for Web communication. In addition, SIP uses RTP/RTCP (Real-time Transport Protocol / RTP Control Protocol), defined in RFC 3550, which defines the packet format for multimedia on IP networks, and the Session Description Protocol (SDP), defined in RFC 2327, which describes the parameters and features of a multimedia session. SIP is therefore built on protocols from other IETF work, somewhat as H.323 is built on the ITU-T's H.225.0 and H.245. The two basic SIP RFCs are RFC 2543, the original specification, and RFC 3261, which replaces it. SIP also runs over other IETF-defined protocols such as TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and IP (Internet Protocol). Building on so many well-known and widely used protocols gives SIP a simplicity and clarity that H.323 lacks.

I saw three mindmaps on the Internet and thought they could express the basic core idea of SIP:

Regarding the SIP architecture, note that it is nothing new: given what it is built on, how novel could the architecture be? My rough understanding is that SIP is a typical client-server architecture. RFC 3261 defines a client as a network element that sends SIP requests and receives SIP responses; it may or may not interact with a person. Correspondingly, a server is a network element that receives SIP requests and sends back responses. For example, a typical SIP request is INVITE, which invites a user or server to participate in a session; if the invitation is accepted, a success response is returned. Looking closer, this simple classification can be subdivided:

There are two types of clients:

User Agent Client (UAC): a logical function that creates a request and sends it using the client-side functions of the entity it belongs to.

Proxy: an intermediate device that acts as both a client and a server. It interprets a request, rewrites it if necessary, and forwards it to other servers, thereby performing routing. Proxies can be stateful or stateless. A stateful proxy handles a message differently depending on the situation, and these processing steps are related to each other. For example, suppose three people are passing numbers, and A passes a number to C through B. If B simply repeats to C whatever A says, B is a stateless proxy. If instead B passes 2 to C when A says 1, and 3 when A says 2, B is a stateful proxy. Some materials also mention the B2BUA (back-to-back user agent), which is essentially similar to a proxy but more flexible.

There are three types of servers:

User Agent Server (UAS): a logical function that responds to a request.

Redirect Server: a server that redirects the client's request to another server to complete it.

Registrar: a server that accepts REGISTER requests and stores the registration information they carry.

Note: books usually describe the UA as a whole. The UAS and UAC are distinguished logically; they are not necessarily separate physical entities.

2. Where does SIP sit, and what can it do?

To learn SIP, we need to know where it sits and have a big-picture understanding of how the many protocols relate to each other:

Since RFC 3261 bases SIP on SMTP and HTTP, and IP, UDP, and TCP are used underneath, SIP is an application-layer protocol: it provides services to end users that you can see and touch. The unit that SIP serves is the session, meaning an ordered exchange of information between two or more participants. SIP must first establish the session and then manage it while communication is in progress. Three points deserve attention:

A. Two or more participants means that a call may be multipoint, not just point-to-point.

B. An end user may not always initiate a call from the same location, so we need a way to track the end user.

C. End users may mix media types such as text, audio, and video. These types place different requirements and limits on network bandwidth and maximum transmission latency, and SIP needs to handle this effectively.

To address these concerns, RFC 3261 defines SIP's multimedia session management capabilities in five aspects:

· User location management: determines which end system is used for this communication.

· User availability: determines whether the called end is willing to participate in the communication.

· User capabilities: determines the media used for this communication and its parameters.

· Session setup: establishes session parameters at both the called and the calling party.

· Session management: includes transferring and terminating sessions, modifying session parameters, and invoking services.

3. How does SIP work?

Since SIP is based on SMTP and HTTP, its message format is also very similar to theirs. Note, however, that the resources of a SIP session are communication resources, not web pages, which is where it differs from HTTP. An identity, or addressing, system must be in place before requests and responses can be defined. The identifier is the SIP URI (SIP Uniform Resource Identifier), which contains enough information to initiate a session. Examples of resources named by such an identifier (given in RFC 3261) include a user of an online service, a mailbox in a messaging system, a group of logical users in an organization (such as the sales department), a PSTN phone number, and so on. A SIP URI looks like an email address, again following SMTP. It consists of two parts, a user name followed by a host name, as in sip:huangpc@bupt.cn. This is the most common format and was introduced in RFC 2543. There are other formats too, such as the secure SIP URI introduced in RFC 3261, sips:huangpc@bupt.cn, which indicates that TLS over TCP is used as a secure transport layer.
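To make the two-part structure of the identifier concrete, here is a minimal Python sketch that splits a simple SIP URI into scheme, user, and host. The names SipUri and parse_sip_uri are made up for this illustration, and the parser deliberately ignores ports, passwords, URI parameters, and headers, all of which a real SIP URI may carry.

```python
from typing import NamedTuple


class SipUri(NamedTuple):
    scheme: str  # "sip" or "sips"
    user: str    # the part before "@"
    host: str    # the part after "@"


def parse_sip_uri(text: str) -> SipUri:
    """Split a simple 'scheme:user@host' SIP URI (hypothetical helper).

    Illustrative only: ports, passwords, URI parameters and headers
    are not handled.
    """
    scheme, colon, rest = text.partition(":")
    scheme = scheme.lower()
    if not colon or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {text!r}")
    user, at, host = rest.partition("@")
    if not at or not user or not host:
        raise ValueError(f"expected user@host in {text!r}")
    return SipUri(scheme, user, host)


plain = parse_sip_uri("sip:huangpc@bupt.cn")
secure = parse_sip_uri("sips:huangpc@bupt.cn")
```

Both example URIs from the text parse to the same user and host; only the scheme differs, and that is what signals the request for a secure transport.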

With user identifiers defined, we can define the request types, which SIP calls methods. The basic methods are listed below; other extension methods are defined in other RFCs.

· REGISTER: used to register with the SIP server.

· INVITE: invites a user or server to participate in a session. The message body contains a description of the session the called party is invited to.

· ACK: used only with INVITE requests, to confirm that the final response has been received.

· CANCEL: used to cancel a pending request.

· BYE: sent by a user agent client to tell the server that it wants to end the call.

· OPTIONS: queries a server about its capabilities.

With requests defined, we naturally need to specify responses, each of which carries a status code and a descriptive reason phrase. There are six classes:

· 1xx: provisional response. The request has been received and is being processed.

· 2xx: success response. The action was received, understood, and accepted.

· 3xx: redirection response. Further action is needed to complete the request.

· 4xx: client error response. The request has bad syntax or cannot be fulfilled at this server.

· 5xx: server error response. The server failed to fulfill an apparently valid request.

· 6xx: global failure response. The request cannot be fulfilled at any server.
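The six response classes above depend only on the first digit of the status code, which a short Python sketch can show. The function names response_class and is_final are invented for this example; the class labels are paraphrases of the list above.

```python
# First digit of the status code -> response class (labels paraphrased).
RESPONSE_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def response_class(status_code: int) -> str:
    """Return the class of a SIP status code (hypothetical helper)."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """Every class except 1xx ends the wait for an answer to a request."""
    return response_class(status_code) != "provisional"


ringing = response_class(180)
ok = response_class(200)
```

For instance, 180 (Ringing) falls in the provisional class, while 200 (OK) is a success response and therefore final.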

You now know how SIP sessions are set up and torn down, but how do the parties learn the text, audio, and video formats to be carried in the session? This information travels in the INVITE. Its format brings in another RFC: RFC 2327, the Session Description Protocol (SDP).

Like other protocols, SIP requires both ends of a session to exchange sufficient information at the start. Two protocols are relevant here: SAP (Session Announcement Protocol), defined in RFC 2974, and SDP (Session Description Protocol), defined in RFC 2327. Put simply, SAP provides a mechanism for periodically announcing multimedia sessions and delivering the relevant session information to interested participants. It was designed to support the MBone (the Internet multicast backbone), so that interested parties can learn about ongoing sessions. SDP defines the format used to describe a session. It can be carried by different protocols, such as SAP, SIP, or HTTP; here the carrier of SDP is SIP. RFC 2327 lists the key information that SDP can provide:

Session name and purpose.

The time when the session is activated.

Media that constitutes a session.

How to receive these media (addresses, port numbers, formats, etc.).

Other information is optional, such as the bandwidth used by the conference and the contact information of the person responsible for the session.

This may sound abstract, so let's look at the example in RFC 2327:

v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
a=orient:portrait

As we can see, an SDP description consists of multiple lines of text, each of the form <type>=<value>. In the RFC, an asterisk (*) marks optional items. There are three kinds of description: session-level, time, and media. For details, see RFC 2327. Note that in this example the lines starting with m= define the audio and video streams; their profiles are covered in section 13 of RFC 3550 (the Real-time Transport Protocol, RTP) and section 6 of RFC 3551 (RTP Profile for Audio and Video Conferences with Minimal Control). The trailing 0 and 31 are payload type values, which appear in the subsequent RTP packets to identify the media and encoding. 49170 and 51372 are the ports on which the media is to be received. By RTP convention the RTCP stream for each uses the next higher port, so 49171 and 51373 in this example are the corresponding RTCP ports.
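As a rough illustration of how the media lines in this example can be read by a program, the Python sketch below pulls out the m= lines and splits each into media type, port, transport, and format list. The function name parse_media and the dictionary keys are made up for this example; the two payload type names (0 is PCMU, 31 is H261) come from the static table in RFC 3551, not from the text above.

```python
from typing import Dict, List

# The RFC 2327 example, with one <type>=<value> pair per line.
SDP_EXAMPLE = """\
v=0
o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
a=orient:portrait
"""

# Two static payload types from RFC 3551 (an assumption added here).
STATIC_PAYLOAD_NAMES = {"0": "PCMU", "31": "H261"}


def parse_media(sdp: str) -> List[Dict[str, object]]:
    """Return one dict per m= line (hypothetical helper, not a full parser)."""
    media = []
    for line in sdp.splitlines():
        kind, sep, value = line.partition("=")
        if kind != "m" or not sep:
            continue  # only media description lines are of interest here
        name, port, proto, *formats = value.split()
        media.append(
            {"media": name, "port": int(port), "proto": proto, "formats": formats}
        )
    return media


streams = parse_media(SDP_EXAMPLE)
audio_codec = STATIC_PAYLOAD_NAMES[streams[0]["formats"][0]]
```

A real implementation would also track which a= attribute lines belong to which media section; this sketch skips that to stay short.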

4. How does a SIP call work?

The simplest example is a direct end-to-end SIP call between two endpoints. The initiator sends an INVITE message to the peer to start a session and then receives the Ringing and OK messages. The initiator answers with an ACK, which indicates that the connection is complete and media can be exchanged. When the connection is no longer needed, either end sends a BYE message to its peer, and the peer returns OK to end the call.
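The message order in this simplest call can be written down as a tiny state machine. The Python sketch below is only a teaching aid: the state names (idle, inviting, and so on) and the function run_call are invented for this example and are not the transaction states defined in RFC 3261.

```python
from typing import List

# (current state, message seen) -> next state, for the simplest direct call.
# State names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("idle", "INVITE"): "inviting",
    ("inviting", "180 Ringing"): "ringing",
    ("ringing", "200 OK"): "answered",
    ("answered", "ACK"): "established",
    ("established", "BYE"): "terminating",
    ("terminating", "200 OK"): "terminated",
}


def run_call(messages: List[str]) -> str:
    """Walk the message sequence and return the state the call ends in."""
    state = "idle"
    for message in messages:
        try:
            state = TRANSITIONS[(state, message)]
        except KeyError:
            raise ValueError(
                f"unexpected {message!r} in state {state!r}"
            ) from None
    return state


BASIC_CALL = ["INVITE", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK"]
final_state = run_call(BASIC_CALL)
```

Feeding it only the first four messages leaves the call in the established state, which is the point at which media can flow.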

Note that SIP messages and the media streams themselves do not work at the same level. In a VoIP call, for example, the SIP signaling exchange completes first, and only then does the actual media stream start to flow. SIP's fundamental role is to do the groundwork for point-to-point (or multipoint) media stream transmission.

A more complex example is described in RFC 3261, in which proxy servers form the communication path. A SIP proxy server issues requests on behalf of other clients. In many cases it acts as a router, forwarding a SIP request to another device that is closer to the final destination (that is, the called party). A SIP proxy server therefore plays two roles: server when it receives a request and client when it sends one. Note that a proxy server must be able to interpret a SIP message and, where necessary, rewrite it before forwarding. A large network may have multiple proxy servers. Section 4 of RFC 3261 gives an interesting example of how two SIP terminals establish a call through two proxy servers. In this example the two terminals are located in two different cities, Atlanta and Biloxi, and so sit in two separate networks. Each network has its own proxy server, called atlanta.com and biloxi.com. If Alice in Atlanta wants to call Bob in Biloxi, Alice's phone sends the following INVITE message to her proxy server, atlanta.com:

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142

After the message has travelled across the network to Bob, Bob returns an OK message to the biloxi.com proxy if he is willing to accept the call:

SIP/2.0 200 OK
Via: SIP/2.0/UDP server10.biloxi.com
   ;branch=z9hG4bKnashds8;received=192.0.2.3
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com
   ;branch=z9hG4bK77ef4c2312983.1;received=192.0.2.2
Via: SIP/2.0/UDP pc33.atlanta.com
   ;branch=z9hG4bK776asdhds;received=192.0.2.1
To: Bob <sip:bob@biloxi.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:bob@192.0.2.4>
Content-Type: application/sdp
Content-Length: 131

Notice that the Call-ID is the same in both messages, which ties them to one unique session. For more details, see the full explanation of this example in RFC 3261.
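To see how a program might check that the request and the response belong together, here is a small Python sketch that parses the header lines of two abridged messages and compares their Call-ID values. The messages are shortened versions of the RFC 3261 example (each Via kept on one line, bodies omitted), plain "\n" line endings are used for readability, and the helper names parse_message and header_values are made up for this illustration.

```python
from typing import List, Tuple

INVITE = (
    "INVITE sip:bob@biloxi.com SIP/2.0\n"
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds\n"
    "Call-ID: a84b4c76e66710@pc33.atlanta.com\n"
    "CSeq: 314159 INVITE\n"
)

OK_200 = (
    "SIP/2.0 200 OK\n"
    "Via: SIP/2.0/UDP server10.biloxi.com;branch=z9hG4bKnashds8\n"
    "Via: SIP/2.0/UDP bigbox3.site3.atlanta.com;branch=z9hG4bK77ef4c2312983.1\n"
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds\n"
    "Call-ID: a84b4c76e66710@pc33.atlanta.com\n"
    "CSeq: 314159 INVITE\n"
)


def parse_message(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (start line, [(lower-cased header name, value), ...])."""
    lines = text.splitlines()
    headers = []
    for line in lines[1:]:
        if not line:
            break  # a blank line would end the header section
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return lines[0], headers


def header_values(headers: List[Tuple[str, str]], name: str) -> List[str]:
    """All values of one header, in order (Via may appear several times)."""
    return [value for key, value in headers if key == name.lower()]


request_line, request_headers = parse_message(INVITE)
status_line, response_headers = parse_message(OK_200)
same_call = header_values(request_headers, "Call-ID") == header_values(
    response_headers, "Call-ID"
)
```

The sketch does not handle folded header lines or compact header names, both of which a real parser has to support.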

5. Relationship between signaling and media in SIP:

SIP's job is the communication needed to set up the media. As a metaphor: before two people can talk, each must know that the other is there and wants to speak, and they must find a language in which both can communicate smoothly; only then can they communicate effectively.

Once the two actually start talking, the conversation itself has nothing to do with the introductions and negotiation that came before it.

6. Several important concepts:

Call: an informal term for a multimedia session. It is identified by the Call-ID, and the same Call-ID is used by every UA in the call.

Transaction: a request (from the UAC) plus the final response (from the adjacent UAS); SIP is a transaction-based protocol. "Adjacent" means that a transaction exists between neighboring SIP entities rather than between the two end UAs. A transaction is labeled by its CSeq. It consists of one request, zero or more provisional responses, and one or more final responses (2xx to 6xx). Transactions are distinguished by the branch value in the topmost Via header field. Every time a request passes through a stateful proxy, the proxy creates a server transaction and a client transaction for it, adds its own address as a new Via at the top of the Via stack, and generates a globally unique ID as the branch value; this value identifies the corresponding transaction. SIP defines state machines and timers at the transaction level to implement retransmission.
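The role of the branch value can be sketched in a few lines of Python: a client matches an incoming response to the request it sent by comparing the branch parameter of the topmost Via together with the method in CSeq. The helper names via_branch and transaction_key are invented for this example, and the header values are taken from the Alice and Bob exchange in RFC 3261.

```python
from typing import Tuple


def via_branch(via_value: str) -> str:
    """Extract the branch parameter from one Via header value."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            return value.strip()
    raise ValueError("Via header has no branch parameter")


def transaction_key(top_via: str, cseq: str) -> Tuple[str, str]:
    """Key used here to pair a response with its request (hypothetical)."""
    _number, method = cseq.split()
    return via_branch(top_via), method.upper()


# What Alice's phone sent ...
sent = transaction_key(
    "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds", "314159 INVITE"
)
# ... and what it sees on the 200 OK once the proxies have removed their Vias.
received = transaction_key(
    "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds;received=192.0.2.1",
    "314159 INVITE",
)
matches = sent == received
```

Because each stateful proxy on the path puts its own Via and branch on top, every hop ends up with its own key for the same end-to-end request.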

Consider an INVITE transaction that is answered with 200 OK. The difference from a non-INVITE transaction is that for an INVITE the UAC must generate an ACK request for the final response (2xx to 6xx), while other requests (INFO, OPTIONS, and so on) need none. Because INVITE is so important, this three-way handshake is needed so that both parties in the session can be sure the transaction completed, much like the three-way handshake that establishes a TCP connection.

Note that between the two UAs, each proxy server adds its own address to the Via header field of the returned ACK, but not in the case of a successful transaction. For details, see RFC 3261 (p. 24). The CSeq number of the ACK must equal that of the INVITE, and the CSeq method must be ACK. Provisional 1xx responses are designed to save network overhead. Once the UAC receives any provisional response, it must stop the retransmission timer and stop retransmitting the request; otherwise it keeps retransmitting until a final response arrives or the timer expires. In terms of the state machine, when a client transaction in the Calling state receives any 1xx response, it moves to the Proceeding state and stops retransmitting the request. The provisional response must also be passed up to the TU (Transaction User); in a call service, the TU and the upper-layer application can use it to prompt the user in the user interface. Once the transaction is in the Proceeding state, any further provisional responses must likewise be passed to the TU.

A non-INVITE transaction works as follows:

When a UAC sends a non-INVITE request, the transaction layer starts timer F, the transaction timeout, and, over an unreliable transport such as UDP, also timer E, which triggers retransmission. This applies to all non-INVITE requests except ACK. The timer E interval doubles after each retransmission until it reaches the maximum of 4 seconds. When timer F expires, the UAC treats the transaction as timed out and deletes it.
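The doubling behaviour of timer E can be worked through with a short Python sketch. It assumes the RFC 3261 default values, which the text above does not state: an initial interval T1 of 0.5 seconds, a cap T2 of 4 seconds, and a timer F of 64*T1 (32 seconds). The function name retransmit_times is made up, and the sketch only covers the case where no response at all arrives.

```python
from typing import List

T1 = 0.5           # assumed RFC 3261 default: initial timer E interval (s)
T2 = 4.0           # cap on the timer E interval (s)
TIMER_F = 64 * T1  # assumed RFC 3261 default: transaction timeout (s)


def retransmit_times(
    t1: float = T1, t2: float = T2, timer_f: float = TIMER_F
) -> List[float]:
    """Seconds after the first send at which the request is sent again,
    assuming no response arrives before timer F fires."""
    times = []
    interval = t1
    elapsed = interval
    while elapsed < timer_f:
        times.append(elapsed)
        interval = min(2 * interval, t2)  # double, but never beyond T2
        elapsed += interval
    return times


schedule = retransmit_times()
gaps = [later - earlier for earlier, later in zip(schedule, schedule[1:])]
```

With these defaults the gaps grow 1, 2, 4 seconds and then stay at 4 seconds until the 32-second timeout ends the transaction.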

Dialog (also called a leg): an end-to-end relationship (for example, a call) between two SIP UAs. It exists only in the end-to-end signaling relationship; there is no dialog between a UA and a SIP proxy server. A dialog is created when the UAS sends a non-failure final response to an INVITE (or REFER), which is also the start of the session, and it lasts until the 200 OK that answers the BYE. In SIP, a call contains one or more dialogs (more than one only in multi-party calls). A dialog ends when either end sends BYE. An early dialog can be ended by a CANCEL from the UAC; more precisely, all early dialogs are terminated when a non-2xx final response is received. A dialog is identified by the Call-ID value together with the To and From tags. Forking is a clear example.
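A dialog identifier can be sketched in Python as a small tuple built from the Call-ID and the two tags. The names DialogId, tag_of, and uac_dialog_id are invented for this illustration; the first set of header values comes from the Alice and Bob example in RFC 3261, while the second To tag (b7d96e0) is a made-up value standing in for a second device that answered the same forked call.

```python
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: str              # for the caller (UAC): the From tag
    remote_tag: Optional[str]   # for the caller (UAC): the To tag


def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From/To header value, if present."""
    params = header_value.rsplit(">", 1)[-1]  # ignore anything inside <...>
    for param in params.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip()
    return None


def uac_dialog_id(call_id: str, from_value: str, to_value: str) -> DialogId:
    """Dialog ID as seen by the caller (hypothetical helper)."""
    local = tag_of(from_value)
    if local is None:
        raise ValueError("From header has no tag")
    return DialogId(call_id, local, tag_of(to_value))


CALL_ID = "a84b4c76e66710@pc33.atlanta.com"
FROM = "Alice <sip:alice@atlanta.com>;tag=1928301774"

first = uac_dialog_id(CALL_ID, FROM, "Bob <sip:bob@biloxi.com>;tag=a6c85cf")
# Hypothetical second answer to the same forked INVITE, with its own To tag.
second = uac_dialog_id(CALL_ID, FROM, "Bob <sip:bob@biloxi.com>;tag=b7d96e0")
```

The two identifiers share the Call-ID and From tag but differ in the To tag, which is how one call can contain more than one dialog.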

In this forking example, the user has registered three devices. When the user is called, the INVITE is turned into three INVITEs, one for each registered Contact, and sent to the three devices. The q value indicates priority: the higher the q, the higher the priority. The SIP registrar here also acts as a forking proxy. Although this entity receives two ACK messages, apart from these ACKs its signaling with the caller forms one transaction, and it sets up a separate transaction with each called device. The two ACKs received on the called side each form their own transaction. Note that when device 3 returns a failure response such as 488, the SIP registrar (the forking proxy server) does not pass the response back to the caller. This is an important feature of a SIP proxy. A SIP proxy can also originate one request on its own: CANCEL.

After receiving an INVITE that requests a new dialog, the UAS dialog layer copies all Record-Route header fields from the request into the 2xx response that establishes the session. It must also add a Contact header field, so that subsequent requests (including the ACK for the 2xx response to the INVITE) can be addressed directly to the UA. When the UAC receives the 2xx response to its INVITE, if the response contains no Record-Route header fields, the UAC can send the ACK directly to the address and port in the Contact.

Session: the multi-party media relationship established under the control of dialogs.

The following shows how the early dialog, session, dialog, and transaction appear in a UA-to-UA call:

In this example, the dialog created by the INVITE transaction must be confirmed with an ACK. This is the start of a second transaction: although the ACK receives no reply, it carries a new branch value and therefore marks the beginning of a new transaction. Note that the transaction number (CSeq) is not incremented relative to the INVITE. If the final response is not 2xx (that is, 3xx to 6xx), the ACK is part of the INVITE transaction. If the final response is 2xx, the ACK belongs to a new transaction. (On this point I suspect that only some foreign materials treat it as a new transaction; in RFC 3261 the ACK neither belongs to the INVITE transaction nor creates a new transaction, but it does get a freshly computed transaction parameter, the branch ID.) An early dialog is established when the UAS sends a 1xx response. The advantage is that the UAC can send SIP requests such as UPDATE within the early dialog.

7. Online status (Presence)

Some people translate "presence" as "presentation"; I personally think "online status" is more appropriate. This is an exciting SIP application: it lets you determine where a user is and whether they can be reached by phone, email, text, or video. Both people and applications can use presence information, which lets enterprises integrate communications into their business processes. The IETF has specified many documents and SIP extensions to support presence, including:

· RFC 2778: A Model for Presence and Instant Messaging

· RFC 2779: Instant Messaging / Presence Protocol Requirements

· RFC 3261: SIP: Session Initiation Protocol

· RFC 3856: A Presence Event Package for the Session Initiation Protocol (SIP)

· RFC 3859: Common Profile for Presence (CPP)

The main components include:

Presence agent (PA): an agent acting for the SIP user that receives and processes presence messages. It can also respond to and consolidate other messages (such as PUBLISH messages or any non-SIP messages). When a user's status changes, it sends notifications to subscribers. It can be implemented together with the SIP proxy server or as an independent entity.

Presence user agent (PUA): queries and updates the PA.

8. A network-wide view of SIP

In a non-IMS network, the topology is as follows:

In this structure, there are three main parts:

Public network and service subscriber nodes: the PSTN network and user-level devices.

DMZ: network elements that connect to the Internet, placed in this zone for security purposes.

Core network: the core area of network message processing.

Note:

1. The media server in the core network can sometimes act as a UA. For example, if you subscribe to a voicemail service, the media server is responsible for playing prompts and recording. A gateway also acts as a UA at a certain level.

2. An IP PBX is a type of B2BUA, and so is an SBC, which is responsible for hiding the intranet topology. An application server is also a type of B2BUA, used to modify service parameters and perform other operations.

9. The NAT problem

As shown in the following figure, NAT problems may occur with SIP:

The current commercial approach is to use a session border controller (SBC), that is, to provide NAT services at the application layer. The SBC listens for SIP requests. When it receives one, it not only examines the IP header to route the packet but also inspects the Contact address in the SIP message and rewrites it to a routable address before routing the message to the next hop. This routable address need not be a public address, as long as the next node can route to it.
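The address rewriting step can be illustrated with a deliberately simplified Python sketch. It is not how any particular SBC product works: the function name rewrite_contact_host, the regular expression, and both IP addresses (a private 192.168.1.10 and a documentation-range 203.0.113.5) are assumptions made for this example, and a real SBC also has to deal with the SDP body, Via headers, and media ports.

```python
import re

# Matches a Contact header of the form "Contact: <sip:user@host...>".
# Group 2 is the host part that the (hypothetical) SBC replaces.
CONTACT_RE = re.compile(
    r"^(Contact:[ \t]*<sips?:[^@>]+@)([^>;:]+)(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def rewrite_contact_host(message: str, routable_host: str) -> str:
    """Replace the host in the Contact URI with an address the next hop
    can route to, leaving every other line untouched."""
    return CONTACT_RE.sub(
        lambda m: m.group(1) + routable_host + m.group(3), message
    )


ORIGINAL = (
    "INVITE sip:bob@biloxi.com SIP/2.0\n"
    "Call-ID: a84b4c76e66710@pc33.atlanta.com\n"
    "Contact: <sip:alice@192.168.1.10:5060>\n"
)

rewritten = rewrite_contact_host(ORIGINAL, "203.0.113.5")
```

After the call, the Contact line reads <sip:alice@203.0.113.5:5060>, while the request line and Call-ID are unchanged.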

This article is an English version of an article that was originally written in Chinese on aliyun.com.
