SIP protocol (Chinese)-1

Last Update:2018-12-04 Source: Internet

Author: User

1. Introduction to the SIP protocol
Many Internet applications need to create and manage a session, where a session is an exchange of data among a group of participants. The behavior of participants complicates the implementation of these applications: users may move between endpoints, they may be addressable by multiple names, and they may communicate in several different media (such as text, video, and audio), sometimes simultaneously. Numerous protocols have been written to carry real-time multimedia session data such as voice, video, or text messages. The Session Initiation Protocol (SIP) works together with these protocols by enabling Internet endpoints (called user agents) to discover one another and to agree on a description of the session they would like to share. To locate prospective session participants, and for other purposes, SIP enables the creation of an infrastructure of network hosts (called proxy servers) to which user agents can send registrations, session invitations, and other requests. SIP is a lightweight, general-purpose tool for creating, modifying, and terminating sessions. It works independently of the underlying transport protocols and does not depend on the type of session being established.

2. Overview of SIP protocol functions
SIP is an application-layer control protocol that can establish, modify, and terminate multimedia sessions (conferences), such as Internet telephony calls. SIP can also invite participants to existing sessions, such as multicast conferences. Media can be added to (and removed from) an existing session. SIP transparently supports name mapping and redirection services, which supports personal mobility: users can keep a single externally visible identifier regardless of their network location. SIP supports five facets of establishing and terminating multimedia communications:
User location: determining the end system to be used for communication.
User availability: determining the willingness of the called party to engage in communication.
User capabilities: determining the media and media parameters to be used.
Session setup: "ringing", and establishing session parameters at both the calling and the called party.
Session management: transferring and terminating sessions, modifying session parameters, and invoking services.
SIP is not a vertically integrated communication system. It is better described as a component that can be combined with other IETF protocols to build a complete multimedia architecture. Typically, such an architecture also includes the Real-time Transport Protocol (RTP) (RFC 1889) for transporting real-time data and providing QoS feedback, the Real-Time Streaming Protocol (RTSP) (RFC 2326) for controlling the delivery of streaming media, the Media Gateway Control Protocol (MEGACO) (RFC 3015) for controlling gateways to the Public Switched Telephone Network (PSTN), and the Session Description Protocol (SDP) (RFC 2327) for describing multimedia sessions. SIP should therefore be used together with other protocols to provide complete services to end users, although the basic functionality and operation of SIP do not depend on any of these protocols.

SIP itself does not provide services. Rather, SIP provides primitives that can be used to implement different services. For example, SIP can locate a user and deliver an opaque object to the user's current location. If this primitive is used to deliver a session description written in SDP, the other party's user agent can learn the parameters of the session immediately. If the same primitive is used to deliver a photo of the caller along with the session description, a "caller ID" service can easily be implemented. As this simple example shows, a single primitive is typically used to provide several different services.

SIP does not provide conference control services (such as floor control or voting), and it does not prescribe how a conference is to be managed. SIP can be used to initiate a session that uses some other conference control protocol. Because SIP messages and the sessions they establish can pass through entirely different networks, SIP cannot, and does not, provide any kind of network resource reservation capability.

The nature of the services provided makes security especially important. To that end, SIP provides a suite of security services, including denial-of-service prevention, authentication (both user to user and proxy to user), integrity protection, and encryption and privacy services.

SIP works over both IPv4 and IPv6.

3. Terms
In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 [2], and indicate requirement levels for compliant SIP implementations.

4. Implementation Overview
This section introduces the basic operation of SIP through simple examples. It is tutorial in nature and does not contain any normative statements.

The first example illustrates the basic functions of SIP: locating an endpoint, signaling a desire to communicate, negotiating session parameters to establish the session, and tearing down the session once it has been established.
Figure 1 shows a typical example of a SIP message exchange between two users, Alice and Bob. (Each message is labeled with the letter "F" and a number so that the text can refer to it.) In this example, Alice uses a SIP application on her PC (a softphone) to call Bob's SIP phone over the Internet. The figure also shows two SIP proxy servers that act on behalf of Alice and Bob to help establish the session. This typical arrangement is often referred to as the "SIP trapezoid", after the geometric shape of the dotted lines in Figure 1.

Alice "calls" Bob using his SIP identity, a type of Uniform Resource Identifier (URI) called a SIP URI. SIP URIs are defined in Section 19.1. A SIP URI has a form similar to an email address and typically contains a user name and a host name. In this example, Bob's SIP URI is sip:bob@biloxi.com, where biloxi.com is the domain of Bob's SIP service provider. Alice has the SIP URI sip:alice@atlanta.com. Alice might have typed in Bob's URI, or clicked on a hyperlink or an entry in an address book. SIP also provides a secure URI, called a SIPS URI, for example sips:bob@biloxi.com. A call made to a SIPS URI guarantees that secure, encrypted transport (namely TLS) carries all SIP messages from the caller to the domain of the callee. From there, the request is sent securely to the callee, but with security mechanisms that depend on the policy of the callee's domain.
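
As a rough illustration of the URI shape described above, the following Python sketch splits a SIP or SIPS URI into its scheme, user, and host parts. The name parse_sip_uri is hypothetical, and the sketch deliberately ignores ports, passwords, URI parameters, and headers, all of which the full grammar in Section 19.1 permits.

```python
from typing import NamedTuple


class SipUri(NamedTuple):
    scheme: str  # "sip" or "sips"
    user: str    # empty when the URI names only a host
    host: str


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple SIP/SIPS URI of the form scheme:user@host."""
    scheme, sep, rest = uri.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP or SIPS URI: {uri!r}")
    user, at, host = rest.rpartition("@")
    if not host:
        raise ValueError(f"missing host in {uri!r}")
    return SipUri(scheme, user if at else "", host.lower())


bob = parse_sip_uri("sip:bob@biloxi.com")
print(bob.user, bob.host)
```

A caller could check bob.scheme == "sips" to decide whether TLS transport is required for the request.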

SIP is based on an HTTP-like request/response transaction model. Each transaction consists of a request that invokes a particular method, or function, on the server, and at least one response. In this example, the transaction begins when Alice's softphone sends an INVITE request addressed to Bob's SIP URI. INVITE is an example of a SIP method; it specifies the action that the requester (Alice) wants the server (Bob) to take. The INVITE request contains a number of header fields. Header fields are named attributes that provide additional information about a message. The ones present in an INVITE include a unique identifier for the call, the destination address, Alice's address, and information about the type of session that Alice wishes to establish with Bob. The INVITE (message F1 in Figure 1) might look like this:

INVITE sip:bob@biloxi.com SIP/2.0
Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:bob@biloxi.com>
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:alice@pc33.atlanta.com>
Content-Type: application/sdp
Content-Length: 142

(Alice's SDP not shown)

                     atlanta.com  . . . biloxi.com
                 .      proxy              proxy     .
               .                                       .
       Alice's  . . . . . . . . . . . . . . . . . . . .  Bob's
      softphone                                        SIP Phone
         |                |                |                |
         |    INVITE F1   |                |                |
         |--------------->|    INVITE F2   |                |
         |  100 Trying F3 |--------------->|    INVITE F4   |
         |<---------------|  100 Trying F5 |--------------->|
         |                |<---------------| 180 Ringing F6 |
         |                | 180 Ringing F7 |<---------------|
         | 180 Ringing F8 |<---------------|     200 OK F9  |
         |<---------------|    200 OK F10  |<---------------|
         |    200 OK F11  |<---------------|                |
         |<---------------|                |                |
         |                       ACK F12                    |
         |------------------------------------------------->|
         |                   Media Session                  |
         |<================================================>|
         |                       BYE F13                    |
         |<-------------------------------------------------|
         |                     200 OK F14                   |
         |------------------------------------------------->|
         |                                                  |

Figure 1: SIP session setup example with the SIP trapezoid.

The first line of the text-encoded message contains the method name (INVITE). The lines that follow are a list of header fields. This example contains a minimum required set. The header fields are briefly described below:

Via contains the address (pc33.atlanta.com) at which Alice expects to receive responses to this request. It also contains a branch parameter that identifies this transaction.
To contains a display name (Bob) and the SIP or SIPS URI (sip:bob@biloxi.com) toward which the request was originally directed. Display names are described in RFC 2822.
From also contains a display name (Alice) and a SIP or SIPS URI (sip:alice@atlanta.com) that identify the originator of the request. This header field also has a tag parameter containing a random string (1928301774) that the softphone added to the URI. It is used for identification purposes.
Call-ID contains a globally unique identifier for this call, generated by combining a random string with the softphone's host name or IP address. Together, the To tag, the From tag, and the Call-ID completely define a peer-to-peer SIP relationship between Alice and Bob, which is referred to as a dialog.
CSeq, or Command Sequence, contains an integer and a method name. The CSeq number is incremented for each new request within a dialog and acts as a traditional sequence number.
Contact contains a SIP or SIPS URI that represents a direct route to Alice, usually composed of a user name at a fully qualified domain name (FQDN). Although an FQDN is preferred, many end systems do not have registered domain names, so IP addresses are permitted. While the Via header field tells other elements where to send the response, the Contact header field tells other elements where to send future requests to Alice (for example, a later BYE from Bob).
Max-Forwards limits the number of hops a request can make on the way to its destination. It consists of an integer that is decremented by one at each hop.
Content-Type contains a description of the message body (not shown in this example).
Content-Length contains the length of the message body in octets (bytes).
The complete set of SIP header fields is defined in Section 20. The details of the session, such as the media type, codec, or sampling rate, are not described using SIP. Instead, the body of a SIP message contains a description of the session, encoded in some other protocol format. One such format is the Session Description Protocol (SDP) (RFC 2327 [1]). This SDP message (not shown in the example) is carried by the SIP message in the same way that an attachment is carried by an email message, or a web page is carried by an HTTP message.
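
To make the message layout concrete, here is a minimal Python sketch that splits a text-encoded SIP message into its start line, header fields, and body. The helper names parse_sip_message and header are hypothetical, header line folding is not handled, and the sample message carries an empty body (Content-Length: 0) because Alice's SDP is not shown in the example above.

```python
def parse_sip_message(raw: str):
    """Return (start_line, headers, body) for a text-encoded SIP message.

    headers is a list of (name, value) pairs in message order, because
    fields such as Via can legitimately appear more than once.
    """
    raw = raw.replace("\r\n", "\n")
    head, _, body = raw.partition("\n\n")  # a blank line ends the headers
    lines = head.split("\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def header(headers, name):
    """Return the first value of the named header field (case-insensitive)."""
    for field, value in headers:
        if field.lower() == name.lower():
            return value
    return None


INVITE = (
    "INVITE sip:bob@biloxi.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "To: Bob <sip:bob@biloxi.com>\r\n"
    "From: Alice <sip:alice@atlanta.com>;tag=1928301774\r\n"
    "Call-ID: a84b4c76e66710@pc33.atlanta.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@pc33.atlanta.com>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line, headers, body = parse_sip_message(INVITE)
print(start_line)
print(header(headers, "call-id"))
```

Keeping the header fields as an ordered list rather than a dictionary is a deliberate choice here: the order of repeated fields such as Via matters when responses are routed back.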

Because the softphone does not know the location of Bob or of the SIP server in the biloxi.com domain, it sends the INVITE to the SIP server that serves Alice's domain, atlanta.com. The address of the atlanta.com SIP server could have been configured in Alice's softphone, or it could have been discovered through DHCP, for example. The atlanta.com SIP server is a type of SIP server known as a proxy server. A proxy server receives SIP requests and forwards them on behalf of the requester. In this example, the proxy server receives the INVITE request and sends a 100 (Trying) response back to Alice's softphone. The 100 (Trying) response indicates that the INVITE has been received and that the proxy is working on Alice's behalf to route the INVITE to its destination. Responses in SIP use a three-digit code followed by a descriptive phrase. This response contains the same To, From, Call-ID, and CSeq header fields, and the same branch parameter in the Via, as the INVITE, which allows Alice's softphone to correlate the response with the INVITE it sent.

The atlanta.com proxy server then locates the proxy server at biloxi.com, possibly by performing a particular type of DNS lookup to find the SIP server that serves the biloxi.com domain. This is described in [4]. As a result, it obtains the IP address of the biloxi.com proxy server and forwards, or proxies, the INVITE request there. Before forwarding the request, the atlanta.com proxy server adds another Via header field value that contains its own address (the INVITE already contains Alice's address in the first Via). The biloxi.com proxy server receives the INVITE and returns a 100 (Trying) response to the atlanta.com proxy server to indicate that it has received the INVITE and is processing the request. The proxy server consults a database, generically called a location service, that contains Bob's current IP address. (The next section shows how this database can be populated.) The biloxi.com proxy server adds another Via header field value with its own address to the INVITE and proxies it to Bob's SIP phone.
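
The header changes a proxy makes before forwarding can be sketched as follows. This is a simplified illustration, not a full proxy: the function name forward_request is hypothetical, UDP is assumed as the transport, and the branch value is passed in by the caller rather than generated.

```python
def forward_request(headers, proxy_host, branch):
    """Return a new header list as a proxy might produce before forwarding.

    headers: list of (name, value) pairs in message order.
    A new Via value is placed above the existing ones and Max-Forwards is
    decremented. A request whose Max-Forwards has reached 0 is not forwarded.
    """
    out = []
    for name, value in headers:
        if name.lower() == "max-forwards":
            hops = int(value)
            if hops <= 0:
                raise ValueError("Max-Forwards exhausted; do not forward")
            value = str(hops - 1)
        out.append((name, value))
    via = ("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")
    return [via] + out


original = [
    ("Via", "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
]
forwarded = forward_request(
    original, "bigbox3.site3.atlanta.com", "z9hG4bK77ef4c2312983.1"
)
for name, value in forwarded:
    print(f"{name}: {value}")
```

The function returns a new list instead of modifying its input, so the proxy still holds the request exactly as it was received.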

Bob's SIP phone receives the INVITE and alerts Bob to the incoming call from Alice so that Bob can decide whether to answer; that is, Bob's phone rings. Bob's SIP phone indicates this with a 180 (Ringing) response, which is routed back through the two proxies in the reverse direction. Each proxy uses the Via header field to determine where to send the response and removes its own address from the top. As a result, although DNS and location service lookups were required to route the initial INVITE, the 180 (Ringing) response can be returned to the caller without any lookups and without the proxies maintaining state. This also has the desirable property that every proxy that sees the INVITE also sees all responses to the INVITE.
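
The way a response retraces the request path can be sketched in a few lines of Python. The function name route_response is hypothetical, and the sketch uses only the host in each Via value; a real implementation would also honor parameters such as received, which this illustration ignores.

```python
def route_response(via_values, own_host):
    """Strip this proxy's Via value and return (remaining_vias, next_hop).

    via_values: Via header field values of the response, topmost first.
    """
    if not via_values:
        raise ValueError("response has no Via header field values")
    top = via_values[0]
    sent_by = top.split(";")[0].split()[-1]  # "SIP/2.0/UDP host" -> "host"
    if sent_by != own_host:
        raise ValueError("topmost Via does not belong to this proxy")
    remaining = via_values[1:]
    if not remaining:
        raise ValueError("no Via left; nowhere to send the response")
    next_hop = remaining[0].split(";")[0].split()[-1]
    return remaining, next_hop


vias = [
    "SIP/2.0/UDP server10.biloxi.com;branch=z9hG4bKnashds8",
    "SIP/2.0/UDP bigbox3.site3.atlanta.com;branch=z9hG4bK77ef4c2312983.1",
    "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds",
]
vias, hop = route_response(vias, "server10.biloxi.com")
print(hop)
vias, hop = route_response(vias, "bigbox3.site3.atlanta.com")
print(hop)
```

No lookup table appears anywhere in the function: everything needed to return the response is carried in the message itself, which is why the proxies need not keep state for this step.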

When Alice's softphone receives the 180 (Ringing) response, it passes this information on to Alice, perhaps by playing a ringback tone or by displaying a message on the screen.

In this example, Bob decides to answer the call. When he picks up the handset, his SIP phone sends a 200 (OK) response to indicate that the call has been answered. The 200 (OK) contains a message body with the SDP media description of the type of session that Bob is willing to establish with Alice. As a result, there is a two-phase exchange of SDP messages: Alice sent one to Bob, and Bob sent one back to Alice. This two-phase exchange provides basic negotiation capabilities and is based on a simple offer/answer model of SDP exchange. If Bob did not wish to answer the call, or was busy on another call, an error response would have been sent instead of the 200 (OK), and no media session would have been established. The complete list of SIP response codes is in Section 21. The 200 (OK) (message F9 in Figure 1) might look like this as Bob sends it:

SIP/2.0 200 OK
Via: SIP/2.0/UDP server10.biloxi.com
   ;branch=z9hG4bKnashds8;received=192.0.2.3
Via: SIP/2.0/UDP bigbox3.site3.atlanta.com
   ;branch=z9hG4bK77ef4c2312983.1;received=192.0.2.2
Via: SIP/2.0/UDP pc33.atlanta.com
   ;branch=z9hG4bK776asdhds;received=192.0.2.1
To: Bob <sip:bob@biloxi.com>;tag=a6c85cf
From: Alice <sip:alice@atlanta.com>;tag=1928301774
Call-ID: a84b4c76e66710@pc33.atlanta.com
CSeq: 314159 INVITE
Contact: <sip:bob@192.0.2.4>
Content-Type: application/sdp
Content-Length: 131

(Bob's SDP not shown)

The first line of the response contains the response code (200) and the reason phrase (OK). The remaining lines contain header fields. The Via, To, From, Call-ID, and CSeq header fields are copied from the INVITE request. (There are three Via header field values: one added by Alice's softphone, one added by the atlanta.com proxy, and one added by the biloxi.com proxy.) Bob's SIP phone has added a tag parameter to the To header field. Both endpoints incorporate this tag into the dialog, and it is included in all future requests and responses in this call. The Contact header field contains a URI at which Bob can be reached directly at his SIP phone. The Content-Type and Content-Length header fields refer to the message body (not shown), which contains Bob's SDP media information.
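
Once Bob's tag arrives in the 200 (OK), both endpoints hold the three values that identify the dialog. A minimal sketch of extracting them, assuming the To and From values use the angle-bracket form shown in the messages above (the names tag_of and dialog_id are hypothetical):

```python
import re

_TAG = re.compile(r";\s*tag\s*=\s*([^;>\s]+)", re.IGNORECASE)


def tag_of(header_value):
    """Return the tag parameter of a To or From value, or None if absent."""
    # Look only after the closing '>' so URI parameters are not mistaken
    # for header field parameters.
    params = header_value.rpartition(">")[2]
    match = _TAG.search(params)
    return match.group(1) if match else None


def dialog_id(call_id, local_value, remote_value):
    """Combine Call-ID, local tag, and remote tag into a dialog identifier."""
    return (call_id, tag_of(local_value), tag_of(remote_value))


# From Alice's point of view, From is the local side and To is the remote.
alice_view = dialog_id(
    "a84b4c76e66710@pc33.atlanta.com",
    "Alice <sip:alice@atlanta.com>;tag=1928301774",
    "Bob <sip:bob@biloxi.com>;tag=a6c85cf",
)
print(alice_view)
```

Bob's phone would build the same identifier with the local and remote tags swapped, since his local tag is the one in the To header field.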

In addition to the DNS and location service lookups shown in this example, proxy servers can make flexible "routing decisions" about where to send a request. For example, if Bob's SIP phone returned a 486 (Busy Here) response, the biloxi.com proxy server could proxy the INVITE to Bob's voicemail server. A proxy server can also send an INVITE to a number of locations at the same time. This type of parallel search is known as forking.
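
The following sketch illustrates forking and the voicemail fallback just described. It is a toy model under several assumptions: the send callable stands in for real network I/O and simply returns a final status code, the voicemail URI is made up for illustration, and threads are used only to show that the targets are tried at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_invite(targets, send):
    """Offer an INVITE to every target in parallel and collect the results.

    targets: contact URIs to try.
    send: callable taking a URI and returning a final status code.
    Returns (winner, statuses), where winner is the first listed URI that
    answered 200, or None if no target did.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        codes = list(pool.map(send, targets))  # results keep input order
    statuses = dict(zip(targets, codes))
    winner = next((uri for uri, code in statuses.items() if code == 200), None)
    return winner, statuses


def route_with_fallback(targets, voicemail_uri, send):
    """Try the user's devices first; if all are busy (486), try voicemail."""
    winner, statuses = fork_invite(targets, send)
    if winner is None and statuses and all(c == 486 for c in statuses.values()):
        winner, _ = fork_invite([voicemail_uri], send)
    return winner


# Hypothetical outcomes: Bob's phone is busy, the voicemail server answers.
outcomes = {"sip:bob@192.0.2.4": 486, "sip:voicemail@biloxi.com": 200}
print(route_with_fallback(["sip:bob@192.0.2.4"],
                          "sip:voicemail@biloxi.com", outcomes.get))
```

A real forking proxy would also cancel the branches that are still ringing once one target answers; that step is left out here.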

In this example, the 200 (OK) is routed back through the two proxies and is received by Alice's softphone, which then stops the ringback tone and indicates that the call has been answered. Finally, Alice's softphone sends an acknowledgement message, ACK, to Bob's SIP phone to confirm receipt of the final response (200 (OK)). In this example, the ACK is sent directly from Alice's softphone to Bob's SIP phone, bypassing the two proxies. This is possible because the endpoints have learned each other's addresses from the Contact header fields during the INVITE/200 (OK) exchange; these addresses were not known when the initial INVITE was sent. The lookups performed by the two proxies are no longer needed, so the proxies drop out of the call flow. This completes the INVITE/200/ACK three-way handshake used to establish SIP sessions. Full details on session setup are in Section 13.
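
As a sketch of the last step of the handshake, the function below assembles the start line and the main header fields of the ACK from the INVITE that was sent and the 200 (OK) that came back. The name build_ack is hypothetical, both inputs are simplified to plain dictionaries, and the Via header field a real ACK also carries is omitted.

```python
def build_ack(invite_headers, ok_headers):
    """Build the start line and core headers of the ACK for a 200 (OK).

    Both arguments are dicts of header name -> value.
    """
    # The ACK is addressed to the URI Bob advertised in Contact, which is
    # why it can travel directly between the endpoints.
    target = ok_headers["Contact"].strip().lstrip("<").rstrip(">")
    cseq_number = invite_headers["CSeq"].split()[0]
    start_line = f"ACK {target} SIP/2.0"
    headers = {
        "To": ok_headers["To"],        # now carries Bob's tag
        "From": invite_headers["From"],
        "Call-ID": invite_headers["Call-ID"],
        "CSeq": f"{cseq_number} ACK",  # same number as the INVITE
        "Max-Forwards": "70",
    }
    return start_line, headers


invite = {
    "From": "Alice <sip:alice@atlanta.com>;tag=1928301774",
    "Call-ID": "a84b4c76e66710@pc33.atlanta.com",
    "CSeq": "314159 INVITE",
}
ok = {
    "To": "Bob <sip:bob@biloxi.com>;tag=a6c85cf",
    "Contact": "<sip:bob@192.0.2.4>",
}
ack_line, ack_headers = build_ack(invite, ok)
print(ack_line)
```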

Alice and Bob's media session has now begun, and they send media packets in the format they agreed on in the SDP exchange. In general, the end-to-end media packets take a different path from the SIP signaling messages.

During the session, either Alice or Bob may decide to change the characteristics of the media session. This is done by sending a re-INVITE containing a new media description. The re-INVITE references the existing dialog, so the other party knows that it is meant to modify an existing session rather than to establish a new one. The other party sends a 200 (OK) to accept the change, and the requester acknowledges the 200 (OK) with an ACK. If the other party does not accept the change, it sends an error response such as 488 (Not Acceptable Here), which also receives an ACK. In either case, the failure of a re-INVITE does not affect the existing call: the session continues with the previously negotiated characteristics. Full details on session modification are in Section 14.

At the end of the call, Bob disconnects (hangs up) first and generates a BYE message. This BYE is routed directly to Alice's softphone, again bypassing the proxies. Alice confirms receipt of the BYE with a 200 (OK) response, which terminates the session and the BYE transaction. No ACK is sent here: an ACK is sent only in response to a response to an INVITE request. The reasons for this special handling of INVITE are discussed later, but they relate to the reliability mechanisms in SIP, the length of time it can take for a ringing phone to be answered, and forking. For this reason, request handling in SIP is often classified as either INVITE or non-INVITE, the latter referring to all methods other than INVITE. Full details on session termination are in Section 15.
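
The INVITE versus non-INVITE distinction can be reduced to a one-line rule. In this sketch (the function name needs_ack is hypothetical), only final responses to an INVITE, meaning status codes of 200 and above, are acknowledged; provisional responses such as 180 (Ringing) and responses to any other method are not.

```python
def needs_ack(request_method: str, status_code: int) -> bool:
    """Return True when the client should send an ACK for this response."""
    return request_method.upper() == "INVITE" and status_code >= 200


print(needs_ack("INVITE", 200))  # final response to INVITE
print(needs_ack("INVITE", 180))  # provisional response
print(needs_ack("BYE", 200))     # non-INVITE transaction
```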

Section 24.2 gives the full details of all the messages shown in Figure 1. In some cases, it is useful for the proxies on the SIP signaling path to see all messages exchanged between the endpoints for the duration of the session. For example, if the biloxi.com proxy server wished to remain on the SIP messaging path beyond the initial INVITE, it would add to the INVITE a routing header field known as Record-Route, containing a URI that resolves to the host name or IP address of the proxy. This information would be received by both Bob's SIP phone and Alice's softphone (because the Record-Route header field is passed back in the 200 (OK)) and stored for the duration of the dialog. The biloxi.com proxy server would then receive and proxy the ACK, the BYE, and the 200 (OK) to the BYE. Each proxy can independently decide whether to receive subsequent messages, and those messages pass through all proxies that elect to receive them. This capability is frequently used by proxies that provide mid-call features.
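
A minimal sketch of the Record-Route idea follows. The function names are hypothetical, and the ;lr parameter (the loose-routing marker that RFC 3261 proxies put on such URIs) is included as an assumption about how the proxy would advertise itself. The first function shows what a proxy adds to the INVITE; the second shows how an endpoint later chooses where to send ACK or BYE.

```python
def add_record_route(headers, proxy_uri):
    """Return headers with this proxy's URI recorded above earlier ones.

    headers: list of (name, value) pairs in message order.
    """
    return [("Record-Route", f"<{proxy_uri};lr>")] + list(headers)


def next_hop(route_set, remote_contact):
    """Pick where an in-dialog request (ACK, BYE, ...) is sent first.

    route_set: proxy URIs learned from Record-Route, nearest proxy first.
    remote_contact: the peer's Contact URI.
    """
    return route_set[0] if route_set else remote_contact


recorded = add_record_route([("Max-Forwards", "69")], "sip:server10.biloxi.com")
print(recorded[0])

# Without Record-Route the BYE goes straight to the peer's Contact.
print(next_hop([], "sip:alice@pc33.atlanta.com"))
# With it, the request passes through the proxy that asked to stay in the path.
print(next_hop(["sip:server10.biloxi.com;lr"], "sip:alice@pc33.atlanta.com"))
```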

Registration is another common operation in SIP. Registration is one way for the biloxi.com server to learn Bob's current location. On initialization, and at periodic intervals, Bob's SIP phone sends REGISTER messages to a server in the biloxi.com domain known as a SIP registrar. The REGISTER messages associate Bob's SIP or SIPS URI (sip:bob@biloxi.com) with the machine he is currently logged in to (conveyed as a SIP or SIPS URI in the Contact header field). The registrar writes this association, also called a binding, to a database called the location service, where it can be used by the proxy in the biloxi.com domain. Often, the registrar for a domain is co-located with the proxy for that domain. It is an important concept that the distinction between types of SIP servers is logical, not physical.

Bob is not limited to registering from a single device. For example, both his SIP phone at home and the one in his office could send registrations. This information is stored together in the location service and allows a proxy to perform various types of searches to locate Bob. Similarly, more than one user can be registered on a single device at the same time.

The location service is an abstract concept. It generally contains information that allows a proxy to input a URI and receive a set of zero or more URIs that tell the proxy where to send the request. Registration is one way to create this information, but not the only way. Arbitrary mapping functions can be configured at the discretion of the administrator.
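
The registrar and location service can be pictured as a small in-memory table. The sketch below is one possible shape, not a prescribed design: the class and method names are hypothetical, aor stands for the user's public URI (such as sip:bob@biloxi.com), and the expiry handling and its default of 3600 seconds are assumptions added so that stale bindings disappear when a phone stops re-registering.

```python
import time


class LocationService:
    """In-memory bindings from a public SIP URI to contact URIs."""

    def __init__(self):
        self._bindings = {}  # aor -> {contact_uri: expiry timestamp}

    def register(self, aor, contact, expires=3600, now=None):
        """Record a binding, as a registrar would on receiving REGISTER."""
        now = time.time() if now is None else now
        self._bindings.setdefault(aor, {})[contact] = now + expires

    def lookup(self, aor, now=None):
        """Return the zero or more contacts currently bound to aor."""
        now = time.time() if now is None else now
        contacts = self._bindings.get(aor, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


service = LocationService()
service.register("sip:bob@biloxi.com", "sip:bob@192.0.2.4")
service.register("sip:bob@biloxi.com", "sip:bob@office.example.com")
print(service.lookup("sip:bob@biloxi.com"))
print(service.lookup("sip:carol@chicago.com"))
```

Because lookup returns a list, the proxy can fork to every device Bob has registered, or get an empty list for a user with no bindings.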

Finally, it is important to note that in SIP, registration is used only for routing incoming SIP requests and plays no role in authorizing outgoing requests. Authorization and authentication are handled in SIP either on a request-by-request basis with a challenge/response mechanism, or by using a lower-layer scheme, as discussed in Section 26.

The complete set of SIP message details for this registration example is in Section 24.1.

Additional SIP operations, such as querying the capabilities of a SIP server or client with OPTIONS, or canceling a pending request with CANCEL, are introduced in later sections.

5. Protocol Structure
SIP is structured as a layered protocol, which means that its behavior is described in terms of a set of fairly independent processing stages with only loose coupling between them. The protocol is described in layers for the purpose of presentation, so that functions common to several elements can be described in a single section. This structure does not dictate any particular implementation. When we say that an element "contains" a layer, we mean that the element complies with the set of rules defined by that layer.

Not every element specified by the protocol contains every layer. Furthermore, the elements specified by SIP are logical elements, not physical ones. A physical realization can choose to act as different logical elements, perhaps even on a transaction-by-transaction basis. The lowest layer of SIP is its syntax and encoding. Its encoding is specified using an augmented Backus-Naur Form (BNF) grammar. The complete BNF is given in Section 25; Section 7 gives an overview of the structure of a SIP message.

The second layer is the transport layer. It defines how a client sends requests and receives responses, and how a server receives requests and sends responses, over the network. All SIP elements contain a transport layer. The transport layer is described in Section 18.

The third layer is the transaction layer. Transactions are a fundamental component of SIP. A transaction is a request sent by a client transaction (using the transport layer) to a server transaction, together with all responses to that request sent from the server transaction back to the client. The transaction layer handles application-layer retransmissions, the matching of responses to requests, and application-layer timeouts. Any task that a user agent client (UAC) accomplishes takes place through a series of transactions. Transactions are discussed in Section 17. User agents contain a transaction layer, as do stateful proxies. Stateless proxies do not contain a transaction layer. The transaction layer has a client component (referred to as a client transaction) and a server component (referred to as a server transaction), each of which is represented by a finite state machine constructed to process a particular request.
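
One of the transaction layer's jobs, matching responses to requests, can be sketched with a small lookup table keyed on the branch parameter of the topmost Via and the method in CSeq. The class name ClientTransactions is hypothetical, and retransmissions, timers, and the state machines themselves are left out.

```python
class ClientTransactions:
    """Match incoming responses to the requests that caused them."""

    def __init__(self):
        self._pending = {}  # (branch, method) -> request description

    def sent(self, branch, method, request):
        """Remember a request that has just been sent."""
        self._pending[(branch, method.upper())] = request

    def match(self, top_via, cseq):
        """Return the stored request for a response, or None if unknown.

        top_via: the response's topmost Via value.
        cseq: the response's CSeq value, e.g. "314159 INVITE".
        """
        params = [p.strip() for p in top_via.split(";")[1:]]
        branch = next(
            (p.split("=", 1)[1].strip() for p in params
             if p.lower().startswith("branch") and "=" in p),
            None,
        )
        method = cseq.split()[-1].upper()
        return self._pending.get((branch, method))


table = ClientTransactions()
table.sent("z9hG4bK776asdhds", "INVITE", "INVITE F1 to sip:bob@biloxi.com")
print(table.match(
    "SIP/2.0/UDP pc33.atlanta.com;branch=z9hG4bK776asdhds;received=192.0.2.1",
    "314159 INVITE",
))
```

Including the method in the key keeps a response to one request from being mistaken for a response to a different request that happens to share the same branch value.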

The layer above the transaction layer is called the transaction user (TU). Every SIP entity except the stateless proxy is a transaction user. When a TU wishes to send a request, it creates a client transaction instance and passes it the request along with the destination IP address, port, and transport to which the request is to be sent. A TU that creates a client transaction can also cancel it. When a client cancels a transaction, it requests that the server stop further processing, revert to the state that existed before the transaction was initiated, and generate a specific error response to that transaction. This is done with a CANCEL request, which constitutes its own transaction but references the transaction to be canceled (Section 9).

The SIP elements, that is, user agent clients and servers, stateless and stateful proxies, and registrars, each contain a core that distinguishes them from one another. Cores, except for that of the stateless proxy, are transaction users. Although the behavior of the user agent client (UAC) and user agent server (UAS) cores depends on the method, some rules are common to all methods (Section 8). For a UAC, these rules govern the construction of a request; for a UAS, they govern the processing of a request and the generation of a response. Because registration plays an important role in SIP, a UAS that handles REGISTER requests is given the special name registrar. Section 10 describes UAC and UAS core behavior for the REGISTER method. Section 11 describes UAC and UAS core behavior for the OPTIONS method, which is used to determine the capabilities of a user agent (UA).

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
