SOAP Protocol Primer (Part 1)


SOAP (Simple Object Access Protocol) helps achieve interoperability among a large number of heterogeneous programs and platforms, so that existing applications can be reached by a much wider range of users. SOAP combines the flexibility and extensibility of XML with mature, widely deployed HTTP-based Web technology.

This article reviews the history of object remote procedure call (ORPC) technology to help you understand the basics of SOAP and how it overcomes many of the pitfalls of existing technologies such as CORBA and DCOM. It then describes the SOAP encoding rules in detail, focusing on how SOAP maps onto existing ORPC concepts.

Introduction:

When I started my career in computing in 1984, most programmers didn't care about network protocols. But in the 1990s the Web became ubiquitous, and today it is hard to imagine anyone using a computer without some form of network connectivity. Today's average programmer is more interested in building scalable, distributed applications than in using MFC to build custom floating, translucent, non-rectangular coolbars.

Programmers usually prefer to think about problems in terms of programming models and rarely think about network protocols. Although this is usually a good thing, the SOAP I discuss in this article is a network protocol with no explicit programming model. This does not mean that SOAP fundamentally changes the way you program. On the contrary, one of SOAP's main goals is to make existing applications available to a wider range of users. To that end, there is no SOAP API and no SOAP object request broker (SOAP ORB); SOAP assumes that you will use as much existing technology as possible. Several major CORBA vendors have pledged to support the SOAP protocol in their ORB products, and Microsoft has also committed to supporting SOAP in future versions of COM.

DevelopMentor has developed a reference implementation that makes SOAP available to any Java or Perl programmer on any platform.

The idea behind SOAP is that it is the first technology that invents no new technology. SOAP builds on two protocols that are already widely used: HTTP and XML. HTTP provides the RPC-style transport for SOAP, and XML is its encoding scheme. With a few lines of code and an XML parser, an HTTP server (such as Microsoft IIS or Apache) immediately becomes a SOAP ORB. Because more than half of all Web servers currently run IIS or Apache, SOAP benefits from the widespread, proven use of both products. This does not mean that every SOAP request must be routed through a Web server; traditional Web servers are just one way of dispatching SOAP requests. Web servers such as IIS or Apache are therefore sufficient for building SOAP-enabled applications, but they are by no means necessary.

As this article will describe, SOAP simply uses XML to encode the payload carried by HTTP. The most common application of SOAP is as an RPC protocol. To understand how SOAP works, it is useful to briefly review the history of RPC protocols.

The history of RPCs

The two main communication models for building distributed applications are message passing (often combined with queuing) and request/response. A messaging system allows either party to send messages at any time. A request/response protocol restricts communication to request/response pairs. Message-based applications are acutely aware that they are communicating with external concurrent processes and require an explicit design style. Request/response-based applications behave more like a single process, because the application sending a request is more or less blocked until it receives a response from the other process. This makes request/response communication a natural fit for RPC applications.

Although messaging and request/response each have their merits, each can be implemented in terms of the other. A messaging system can be built on a lower-level request/response protocol; Microsoft Message Queue Server (MSMQ), for example, uses DCE RPC internally to implement most of its control logic. An RPC system can likewise be built on a lower-level messaging system; the correlation ID provided by MSMQ exists for this purpose. Whatever the merits of each model, most applications still tend to use RPC protocols because they are widely available, simpler to design for, and map more naturally onto traditional programming techniques.

In the 1980s, the two dominant RPC protocols were Sun RPC and DCE RPC. The most popular Sun RPC application is the Network File System (NFS) used by most UNIX systems. The most popular DCE RPC application is Windows NT, which uses the DCE RPC protocol to implement many system services. Both protocols proved suitable for a wide range of applications. However, in the late 1980s the growing popularity of object-oriented technology made the software industry eager to bridge object-oriented languages and RPC-based communication.

The object RPC (ORPC) protocols that emerged in the 1990s were an attempt to link object orientation with network protocols. The main difference between ORPC and RPC protocols is that ORPC codifies the mapping from a communication endpoint to a language-level object. The header of each ORPC request contains a cookie that the server-side code uses to locate the target object within the server process. Usually this cookie is just an index into an array, but other techniques are also common, such as a symbolic name used as a key into a hash table.
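The cookie lookup described above can be sketched in C. This is only an illustration under stated assumptions: the `Object` and `OrpcHeader` types, the two-entry object table, and the method numbering are all hypothetical and do not reflect the actual wire format of DCOM or IIOP. It shows the simplest case, where the cookie is an index into an array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical server-side object: a name and a call counter. */
typedef struct {
    const char *name;
    int calls;
} Object;

/* Hypothetical ORPC request header: the cookie selects the target
   object, the method ID selects the operation to invoke on it. */
typedef struct {
    unsigned cookie;   /* here: an index into the object table */
    unsigned method;   /* 0 = ping, 1 = name_length */
} OrpcHeader;

#define OBJECT_COUNT 2
static Object object_table[OBJECT_COUNT] = {
    { "account", 0 },
    { "printer", 0 },
};

/* Locates the target object from the cookie and invokes the method.
   Returns the method result, or -1 for an unknown object or method. */
int orpc_dispatch(const OrpcHeader *hdr)
{
    assert(hdr != NULL);
    if (hdr->cookie >= OBJECT_COUNT)
        return -1;                      /* no such object */

    Object *target = &object_table[hdr->cookie];
    switch (hdr->method) {
    case 0:                             /* ping: count the call */
        return ++target->calls;
    case 1:                             /* name_length */
        return (int)strlen(target->name);
    default:
        return -1;                      /* no such method */
    }
}
```

A plain RPC protocol would stop at the method lookup; the extra step from cookie to `target` is what makes the call object-oriented.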

Currently, the two dominant ORPC protocols are DCOM and CORBA's Internet Inter-ORB Protocol (IIOP), or more generally the General Inter-ORB Protocol (GIOP). The request formats of DCOM and IIOP/GIOP are very similar. Both protocols use an object endpoint ID to identify the target object and a method identifier to determine which method to invoke.

There are two main differences between the protocols. The primary one is that in IIOP/GIOP the interface identifier is implicit, because a given CORBA object implements only one interface (although the OMG is currently working on standardizing support for multiple interfaces per object). A more subtle difference between DCOM and IIOP/GIOP requests is the format of the parameter values in the payload. In DCOM, the payload is written in Network Data Representation (NDR) format; in IIOP/GIOP, it is written in Common Data Representation (CDR) format. NDR and CDR each handle the differing data representations used on various platforms, but there are enough small differences between the two formats to make them incompatible with each other.

Another important difference between ORPC and RPC protocols is the way communication endpoints are named. An ORPC protocol needs some transferable representation of an ORPC endpoint in order to pass object references across the network. In CORBA/IIOP, this representation is called an Interoperable Object Reference (IOR). An IOR contains addressing information in a compact format that any CORBA product can use to locate an object endpoint. In DCOM, this representation is called an OBJREF, which combines distributed reference counting with endpoint/object identity. Both CORBA and DCOM provide higher-level mechanisms for finding object endpoints on the network, but ultimately these mechanisms map back to IORs or OBJREFs.

What are the problems with the current technology?

Although DCOM and IIOP are both solid protocols, the industry has not fully converged on either of them. This lack of convergence is partly the result of cultural issues. In addition, as organizations have tried to standardize on one protocol or the other, the technical suitability of both has been called into question. DCOM and CORBA are traditionally considered reasonable protocols for server-to-server communication. Both, however, have obvious weaknesses for client-to-server communication, especially when the clients are spread across the Internet.

Both DCOM and CORBA/IIOP rely on single-vendor solutions to get the most out of the protocol. Although both protocols have been implemented on a variety of platforms and products, the reality is that a given deployment needs to use a single vendor's implementation. In the case of DCOM, this means that every machine runs Windows NT. (Although DCOM has been ported to other platforms, it has achieved broad reach only on Windows.) In the case of CORBA, this means that every machine runs the same ORB product. It is possible for two CORBA products to call each other using IIOP, but many higher-level services, such as security and transactions, are generally not interoperable at this time. Furthermore, any vendor-specific optimizations for same-machine communication are hard to take advantage of unless all applications are built on the same ORB product.

Both DCOM and CORBA/IIOP depend on a carefully managed environment. The probability that two arbitrary computers can successfully make DCOM or IIOP calls out of the box is low. This is especially true once security is considered. Although it is possible to write a shrink-wrapped application that successfully uses DCOM or IIOP, doing so demands much more attention to detail than a socket-based application does, particularly for the tedious but necessary tasks of configuration and installation management.

Both DCOM and CORBA/IIOP rely on a fairly sophisticated runtime environment. Although in-process COM seems particularly simple, the COM/DCOM remoting layer is definitely not something that can be implemented in a few days. IIOP is easier to implement than DCOM, but both protocols have quite a few esoteric rules governing data alignment, type information, and bit twiddling. This makes it difficult for a typical programmer to construct even a simple CORBA or DCOM call without the help of an ORB product or OLE32.DLL.

Perhaps the most damning problem for DCOM and CORBA/IIOP is that they do not work well over the Internet. For DCOM, it is almost impossible for an average user's iMac or inexpensive PC running Windows 95 to perform domain-based authentication against your server. Worse, if a firewall or proxy server separates the client and server machines, the likelihood that any IIOP or DCOM packets will get through is low, largely because most Internet connectivity technology favors HTTP. Although vendors such as Microsoft, Iona, and Visigenic have built tunneling technologies, these products are sensitive to configuration errors and do not interoperate.

Within a server farm, these problems do not hinder the use of DCOM or IIOP. Because the number of hosts in a server farm is small (typically hundreds rather than tens of thousands), the cost of DCOM's ping-based lifecycle management is kept in check. All hosts in a server farm are very likely managed under a common administrative domain, which makes uniform configuration possible. The relatively small number of machines also keeps the cost of commercial ORB products under control, since only a small number of ORB licenses are needed. Finally, all hosts in a server farm are likely to have direct IP connectivity, which eliminates the firewall-related problems of DCOM and IIOP.

HTTP as a better RPC

Using DCOM and CORBA within the server farm is common practice, but clients typically use HTTP to get into the server farm. HTTP is an RPC-like protocol that is simple, widely deployed, and more likely than other protocols to work through firewalls. HTTP requests are typically handled by Web server software such as IIS and Apache, but a growing number of application server products support HTTP as a protocol in addition to DCOM and IIOP.

Like DCOM and IIOP, HTTP layers request/response communication over TCP/IP. An HTTP client connects to an HTTP server using TCP. The standard port number for HTTP is 80, but any other port can be used. Once the TCP connection is established, the client can send a request message to the server. After processing the request, the server sends an HTTP response message back to the client. Both request and response messages can carry an arbitrary payload, which is usually described by the Content-Length and Content-Type HTTP headers. The following is a valid HTTP request message:

POST /foobar HTTP/1.1
Host: 209.110.197.12
Content-Type: text/plain
Content-Length: 12

Hello, World

You may have noticed that HTTP headers are just plain text. This makes it easy to diagnose HTTP problems with a packet sniffer or with text-based Internet tools such as telnet. The text-based nature of HTTP also makes it easy to use from the low-tech programming environments that are popular in Web development.
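Because the message is plain text, a client can assemble it with ordinary string formatting. The following is a minimal sketch, not a complete HTTP client: the function name `build_post_request` and its parameters are hypothetical, the content type is fixed at text/plain, and the body is assumed to be a NUL-terminated string. The Content-Length value is computed from the body rather than hard-coded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats an HTTP/1.1 POST request carrying a text/plain body into out.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if out is too small. */
int build_post_request(char *out, size_t out_size,
                       const char *host, const char *uri,
                       const char *body)
{
    assert(out != NULL && host != NULL && uri != NULL && body != NULL);

    int n = snprintf(out, out_size,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"              /* empty line ends the headers */
                     "%s",
                     uri, host, strlen(body), body);

    if (n < 0 || (size_t)n >= out_size)
        return -1;                       /* encoding error or truncated */
    return n;
}
```

Calling `build_post_request(buf, sizeof buf, "209.110.197.12", "/foobar", "Hello, World")` yields the request shown earlier, with each header line terminated by CR LF; the resulting bytes could then be written to a TCP socket connected to port 80.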

The first line of an HTTP request contains three components: the HTTP method, the request-URI, and the protocol version. In the previous example, these are POST, /foobar, and HTTP/1.1, respectively. The Internet Engineering Task Force (IETF) has standardized a fixed number of HTTP methods. GET is the method used to browse the Web. POST is the most common HTTP method for building applications. Unlike GET, POST allows arbitrary data to be sent from the client to the server. The request-URI (Uniform Resource Identifier) is a simple identifier that the HTTP server software uses to identify the target of the request (it is analogous to an IIOP/GIOP object_key or a DCOM IPID). For more information on URIs, please refer to "URIs, URLs, and URNs". In this example, the protocol version is HTTP/1.1, which indicates compliance with RFC 2616. HTTP/1.1 adds several features to HTTP/1.0, including support for chunked data transfer and for keeping a TCP connection open across several HTTP requests.
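Splitting the request line into its three components takes only a few lines of C. This is a hedged sketch: the `RequestLine` type, the buffer sizes, and the helper names are assumptions made for illustration, and a real server would also need to validate each field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three request-line components. */
typedef struct {
    char method[16];
    char uri[256];
    char version[16];
} RequestLine;

/* Splits "METHOD request-URI HTTP-version" into its three components.
   Returns 1 on success, 0 if the line is malformed. */
int parse_request_line(const char *line, RequestLine *out)
{
    assert(line != NULL && out != NULL);
    /* The field widths leave room for the terminating NUL; %s stops at
       whitespace, so a trailing CR LF is not copied. */
    int fields = sscanf(line, "%15s %255s %15s",
                        out->method, out->uri, out->version);
    return fields == 3;
}

/* Returns 1 if the request uses POST, the method most applications use. */
int is_post(const RequestLine *rl)
{
    assert(rl != NULL);
    return strcmp(rl->method, "POST") == 0;
}
```

For the line `POST /foobar HTTP/1.1`, the parsed fields are `POST`, `/foobar`, and `HTTP/1.1`; the server would then use the URI to pick the target of the request.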

The third and fourth lines of the request specify the type and size of the request body. The Content-Length header gives the number of bytes in the body. The Content-Type header gives a MIME type that identifies the syntax of the body. HTTP (like DCE) allows the server and client to negotiate the transfer syntax used to represent information. Most DCE applications use NDR. Most Web applications use text/html or another text-based syntax.

Notice the empty line between the Content-Length header and the request body in the sample above. Individual HTTP headers are delimited by a carriage-return/line-feed (CR LF) sequence, and the headers are separated from the body by an additional CR LF sequence. The rest of the request consists of raw bytes whose syntax and length are given by the Content-Type and Content-Length headers. In this example, the content is the 12-byte plain-text string "Hello, World".
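The delimiting rules above translate directly into code. The following sketch assumes the whole message is already in memory as a NUL-terminated string; the function names `find_body` and `content_length` are hypothetical. It locates the empty line that ends the headers and reads the Content-Length value, matching the header name case-insensitively, as HTTP header names are not case-sensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a pointer to the first byte of the body, just past the empty
   line (CR LF CR LF) that ends the headers, or NULL if the header
   section is incomplete. */
const char *find_body(const char *message)
{
    assert(message != NULL);
    const char *end = strstr(message, "\r\n\r\n");
    return end ? end + 4 : NULL;
}

/* Case-insensitive prefix test. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Returns the Content-Length value, or -1 if the header is absent.
   Only the header section is scanned, one CR LF-delimited line at a time. */
long content_length(const char *message)
{
    static const char name[] = "Content-Length:";
    const char *body = find_body(message);
    const char *line = message;

    while (line != NULL && (body == NULL || line < body)) {
        if (starts_with_nocase(line, name))
            return strtol(line + sizeof name - 1, NULL, 10);
        line = strstr(line, "\r\n");     /* advance to the next line */
        if (line != NULL)
            line += 2;
    }
    return -1;
}
```

For the sample request, `content_length` returns 12 and `find_body` points at `Hello, World`, so a receiver knows exactly how many bytes to read after the empty line.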

After processing the request, the HTTP server is expected to send an HTTP response back to the client. The response must include a status code indicating the outcome of the request, and it may also carry an arbitrary body. The following is an HTTP response message:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12

dlroW ,olleH

In this example, the server returns status code 200, which is the standard success code in HTTP. If the server cannot decode the request, it returns the following response:

HTTP/1.1 400 Bad Request
Content-Length: 0

If the HTTP server decides that requests for the target URI should be temporarily redirected to a different URI, it returns the following response:

HTTP/1.1 307 Temporary Redirect
Location: http://209.110.197.44/foobar
Content-Length: 0

This response tells the client that the request can be satisfied by resubmitting it to the address given in the Location header.
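A client can act on these responses by reading the status code and, for a redirect, the Location header. The sketch below is illustrative only: the names `parse_status_code`, `classify_status`, and `redirect_target` are hypothetical, the response is assumed to begin with a full status line of the form `HTTP/1.1 307 ...`, and the Location header is matched case-sensitively for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    RESP_SUCCESS,        /* 2xx: request succeeded */
    RESP_REDIRECT,       /* 3xx: resubmit elsewhere */
    RESP_CLIENT_ERROR,   /* 4xx: request was at fault */
    RESP_OTHER
} ResponseKind;

/* Extracts the numeric status code from a status line such as
   "HTTP/1.1 200 OK". Returns -1 if the line is malformed. */
int parse_status_code(const char *response)
{
    assert(response != NULL);
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Maps a status code to the broad class a client reacts to. */
ResponseKind classify_status(int code)
{
    if (code >= 200 && code <= 299) return RESP_SUCCESS;
    if (code >= 300 && code <= 399) return RESP_REDIRECT;
    if (code >= 400 && code <= 499) return RESP_CLIENT_ERROR;
    return RESP_OTHER;
}

/* Copies the Location header value into out.
   Returns 1 if found and it fits, 0 otherwise. */
int redirect_target(const char *response, char *out, size_t out_size)
{
    assert(response != NULL && out != NULL && out_size > 0);
    static const char name[] = "\r\nLocation:";
    const char *p = strstr(response, name);
    if (p == NULL)
        return 0;
    p += sizeof name - 1;
    while (*p == ' ' || *p == '\t')      /* skip optional whitespace */
        p++;
    size_t len = strcspn(p, "\r\n");     /* value runs to end of line */
    if (len >= out_size)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

On a `RESP_REDIRECT` result, the client would resend the same request to the URI returned by `redirect_target`.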

All of the standard status codes and headers are described in RFC 2616. Few of them matter directly to SOAP users, with one notable exception. In HTTP/1.1, the underlying TCP connection is reused across multiple request/response pairs by default. The HTTP Connection header allows either the client or the server to close the underlying connection. By adding the following HTTP header to a request or response, either party asks that the TCP connection be closed after the request is processed:

Connection: close

To keep the TCP connection open when interacting with HTTP/1.0 software, it is recommended that the sender add the following HTTP header to each request or response:

Connection: Keep-Alive

This header disables the default HTTP/1.0 behavior of closing the TCP connection after each response.
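The connection rules for the two protocol versions can be summarized in a small decision function. This is a sketch under simplifying assumptions: the function name `should_keep_open` is hypothetical, and the Connection header value is treated as a single token, whereas a real header may carry a comma-separated list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two whole strings. */
static int equals_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                     /* both must end together */
}

/* Decides whether the TCP connection stays open after the current
   response.
   minor_version: 0 for HTTP/1.0, 1 for HTTP/1.1.
   connection:    value of the Connection header, or NULL if absent. */
int should_keep_open(int minor_version, const char *connection)
{
    assert(minor_version == 0 || minor_version == 1);
    if (connection != NULL) {
        if (equals_nocase(connection, "close"))
            return 0;                    /* explicit request to close */
        if (equals_nocase(connection, "keep-alive"))
            return 1;                    /* explicit request to persist */
    }
    /* No explicit request: HTTP/1.1 persists by default, HTTP/1.0 closes. */
    return minor_version == 1;
}
```

An HTTP/1.0 exchange with no Connection header therefore returns 0, while the same exchange with `Keep-Alive` returns 1.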

One advantage of HTTP is that it is so widely used and accepted. Figure 4 shows a simple Java program that sends the request shown earlier and parses the result string out of the response.

The following is a simple C program that uses CGI to read a string from an HTTP request and return it in reverse order via an HTTP response.

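#include <stdio.h>
#include <unistd.h>
int main(int argc, char **argv) {
    char buf[4096];
    /* CGI delivers the request body on standard input (fd 0). */
    int cb = (int)read(0, buf, sizeof(buf) - 1);
    if (cb < 0) cb = 0;
    buf[cb] = 0;
    /* Reverse in place; strrev is not part of standard C. */
    for (int i = 0, j = cb - 1; i < j; i++, j--) {
        char tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
    }
    /* CGI reports the status code through the Status header. */
    printf("Status: 200 OK\r\n");
    printf("Content-Type: text/plain\r\n");
    printf("Content-Length: %d\r\n", cb);
    printf("\r\n");
    fwrite(buf, 1, (size_t)cb, stdout); /* not printf(buf): input is not a format string */
    return 0;
}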

The same server can also be implemented as a Java servlet, which avoids CGI's overhead of one process per request.

CGI is generally the lowest-common-denominator way to write HTTP server-side code. In practice, nearly every HTTP server product provides a more efficient mechanism for having your code process an HTTP request. IIS provides ASP and ISAPI as mechanisms for writing HTTP code. Apache lets you write modules in C or Perl that run inside the Apache daemon. Most application server software lets you write Java servlets, COM components, EJB session beans, or CORBA servants based on the Portable Object Adapter (POA) interface.


