A Closer Look at Web Servers: Understanding the Basics

Last Update: 2018-12-04 Source: Internet

Author: User


Author: Cao Wenlong, Li Weisen

Many users believe that a website succeeds or fails mainly on the content and functions it provides. But the Web server that supports that content and those functions is the real hero behind the scenes. According to statistics, there are more than 5 million websites around the world, and a Web server runs behind each one. What is a Web server, and how does it work?

From C/S to the Web
The earliest network systems were simple host/terminal systems. All applications ran on the host, and terminals merely accessed the programs running there. The arrival of the PC era greatly advanced computer networks and applications, and fewer and fewer applications were built for terminal-oriented mainframes. In particular, the rise of network operating systems such as NetWare and Windows NT, together with networked database systems, opened up a new application model: the C/S (Client/Server) mode. C/S is a two-tier system. The first tier, on the client, handles presentation and business logic; the second tier consists of server systems, such as databases, accessed over the network. In the C/S mode, transaction processing is split between the two sides, which achieves distributed computing across the network. For a long time, this model helped enterprises build LANs, improve internal business management, and raise work efficiency. However, the C/S mode has obvious limitations in system integration and maintenance, user-interface consistency, and scalability. Just as host/terminal networks were replaced by C/S systems, a newer model is emerging in the Internet/Intranet environment.

Internet/Intranet systems based on Web technology have been widely adopted in recent years. An intranet is an enterprise network built on the TCP/IP protocol suite with the Web at its core. With a low-cost, easy-to-use browser as the client, users can access the data they need on the enterprise's Web site anytime and anywhere. Because every client uses the same browser interface, the proliferation of different client programs seen in the C/S mode is avoided. Open, standard connection solutions on the server side let enterprises communicate easily with the outside world over the Internet. At the same time, dynamic and interactive publishing of Web information has fundamentally improved the quality of enterprise services and created new business opportunities.

In a three-tier Web architecture, the database does not serve each client directly. Instead, it communicates with the Web server, which delivers dynamic, real-time, interactive information services to clients. This is implemented through server applications built with CGI, ISAPI, NSAPI, or Java (Figure 1).
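The three-tier flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CGI/ISAPI/NSAPI mechanisms named above: an in-memory SQLite database stands in for the data tier, and the `products` table and the `make_database` and `render_products` functions are hypothetical. The point is that the browser receives only HTML, while the database is reached solely through the server application.

```python
import sqlite3
from html import escape

def make_database() -> sqlite3.Connection:
    """Data tier: an in-memory SQLite database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("Keyboard", 25.0), ("Monitor", 180.0), ("Mouse", 12.5)],
    )
    return conn

def render_products(conn: sqlite3.Connection, max_price: float) -> str:
    """Middle tier: query the database for the client and build an HTML page."""
    rows = conn.execute(
        "SELECT name, price FROM products WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    items = "".join(f"<li>{escape(name)}: {price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

# The browser (first tier) only ever sees this HTML, never the database.
conn = make_database()
page = render_products(conn, 50)
print(page)
```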

What is a Web server?
The distinctive features of Web technology are hyperlinks and multimedia information. Web servers use HTML (HyperText Markup Language) to describe network resources and to create Web pages for browsers to read. HTML documents are interactive: both ordinary text and graphics can link, through hyperlinks in the document, to other documents on the server, so users can quickly find the information they want. HTML pages also provide forms that users fill in and submit, through a server application, to the database. Such databases generally support multimedia data.
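As a sketch of the form-submission path, the following assumes a hypothetical feedback form with `name` and `comment` fields, posted with the standard `application/x-www-form-urlencoded` encoding. The hypothetical `store_submission` function plays the role of the server application; Python's `urllib.parse.parse_qs` decodes the request body, and SQLite stands in for the database.

```python
import sqlite3
from urllib.parse import parse_qs

# Body a browser would send for a form such as:
#   <form method="post" action="/feedback">
#     <input name="name"> <textarea name="comment"></textarea>
#   </form>
body = "name=Li+Wei&comment=Great+site%21"

def store_submission(conn: sqlite3.Connection, raw_body: str) -> None:
    """Hypothetical server application: decode the form and save it."""
    fields = parse_qs(raw_body)
    # parse_qs returns a list of values per field; keep the first of each.
    name = fields.get("name", [""])[0]
    comment = fields.get("comment", [""])[0]
    conn.execute("INSERT INTO feedback VALUES (?, ?)", (name, comment))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (name TEXT, comment TEXT)")
store_submission(conn, body)
saved = conn.execute("SELECT name, comment FROM feedback").fetchall()
print(saved)
```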

A Web browser is a client application for retrieving and displaying documents. It connects to the Web server using the Hypertext Transfer Protocol (HTTP). A general-purpose, low-cost browser reduces the development and maintenance costs of client software in the two-tier C/S mode. Currently popular browsers such as Internet Explorer and Netscape Navigator not only provide basic document retrieval, display, and navigation, but also support advanced HTML display (such as tables and frames) and features such as ActiveX, Java, and JavaScript.

How does a web server work?
A few years ago, when Web servers first appeared, they supported only the browsing of simple HTML files and images. When a Web server received a request for a page such as http://www.ccidnet.com/index.html, the URL (Uniform Resource Locator) was used to locate the corresponding server and find the file index.html. The file was then read from the host's file system and transmitted to the browser over HTTP. Of course, this is only the basic function; the relationship between Web servers and browsers is far more complex. One of the most important extensions of Web applications is dynamic content: a Web server can create a page, directly or indirectly, based on the user's request and return it to the browser. The earliest way to implement dynamic content was CGI (Common Gateway Interface), which defines how programs run on the Web server and how dynamic content passes between the Web server and the browser (Figure 2).
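The static case, where the URL locates a file, can be sketched as a path-mapping function. `DOCUMENT_ROOT` and `url_to_file` are hypothetical names, and the rule that a trailing slash maps to `index.html` is a common server convention assumed here rather than taken from the text. The check for `..` segments matters because a naive mapping would let a URL escape the document root.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical directory from which the server publishes files.
DOCUMENT_ROOT = PurePosixPath("/var/www/html")

def url_to_file(url: str) -> PurePosixPath:
    """Map a requested URL to a file path under DOCUMENT_ROOT."""
    path = unquote(urlsplit(url).path)
    if not path or path.endswith("/"):
        path += "index.html"  # assumed default document for directories
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if ".." in parts:  # refuse paths that would climb out of the root
        raise PermissionError(f"path escapes document root: {url}")
    return DOCUMENT_ROOT.joinpath(*parts)

print(url_to_file("http://www.ccidnet.com/index.html"))
print(url_to_file("http://www.ccidnet.com/"))
```

Once the path is known, the server reads the file and sends its bytes back in the HTTP response.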

Another development of Web applications is HTTPS (Hypertext Transfer Protocol Secure), which secures communication between Web servers and browsers and thereby makes electronic transactions possible.

Communication between Web servers and browsers takes place over HTTP. What is HTTP? Simply put, it is an application-layer protocol between a Web browser and a Web server. It runs on top of TCP/IP and is a generic, stateless, object-oriented protocol. Its operation consists of four steps:

Connection: the Web browser establishes a connection with the Web server by opening a socket, a virtual file-like endpoint. Once the socket is established, the connection has succeeded.

Request: the Web browser sends a request to the Web server through the socket.

Response: the request travels to the Web server over HTTP. The server processes it and returns the result to the browser over HTTP, and the browser displays the requested page.

Close connection: after the response is complete, the browser and the server disconnect so that the server can accept connections from other browsers. (HTTP/1.1 and later keep the connection open by default so that several requests can reuse it.)
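The four steps can be traced with raw sockets. This is a minimal sketch assuming HTTP/1.0-style behavior with one request per connection; `serve_once` and `fetch` are hypothetical helpers, and the server runs in a thread on a local port chosen by the operating system so that the example is self-contained.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Minimal server side: accept one connection, answer it, close it."""
    conn, _ = listener.accept()                  # 1. connection accepted
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:        # 2. read the request headers
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"<html><body>Hello</body></html>"
        header = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
        )
        conn.sendall(header.encode("ascii") + body)  # 3. send the response
    # 4. leaving the with-block closes the connection

def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Minimal browser side of the same four steps."""
    with socket.create_connection((host, port), timeout=5) as sock:  # 1. connect
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))                        # 2. request
        response = b""
        while True:                                                  # 3. response
            chunk = sock.recv(1024)
            if not chunk:  # the server has closed its end
                break
            response += chunk
    return response                                                  # 4. closed

listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
port = listener.getsockname()[1]
server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()
reply = fetch("127.0.0.1", port)
server_thread.join()
listener.close()
print(reply.split(b"\r\n")[0])
```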

The Web server's processing therefore follows a complete cycle: accept a connection, generate static or dynamic content and send it back to the browser, close the connection, then accept the next connection, and so on. As you can imagine, when there are many visitors, a server that handles one connection at a time will inevitably be overwhelmed. Two techniques address this problem: multithreading and multiprocessing. Web servers may use the UNIX port-listening approach (a multi-process mode), multithreading, multiprocessing, or a mix of the two.
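For the multithreaded model, Python's standard library provides `http.server.ThreadingHTTPServer`, which handles each connection in its own thread. The sketch below is illustrative only (the `HelloHandler` class is hypothetical): it sends five requests concurrently, and each one is answered by a worker thread.

```python
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that reports which thread served the request."""

    def do_GET(self) -> None:
        body = f"served by {threading.current_thread().name}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request logging to stderr

# Port 0 asks the OS for a free port; each connection gets its own thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def get(_: int) -> str:
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read().decode()
    finally:
        conn.close()

# Five simultaneous "visitors".
with ThreadPoolExecutor(max_workers=5) as pool:
    replies = list(pool.map(get, range(5)))

server.shutdown()
server.server_close()
print(replies)
```

On Unix systems, `socketserver.ForkingTCPServer` provides the corresponding multi-process variant, in which each connection is handled by a forked child process.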

Once the connection is established, how does the Web server deliver content to the browser? The key is that the browser must be able to recognize and display the content. The primary mechanism for deciding how to display it is the MIME (Multipurpose Internet Mail Extensions) type. The MIME type tells the browser what kind of document is being sent, and this identification is not limited to simple images and HTML documents. For example, the mime.types configuration file of the Apache Web server lists 370 default MIME types, and even these are not all of them. A MIME type is written in type/subtype syntax and is associated with file suffixes; for example, a file containing MPEG video, with the suffix .mpeg, .mpg, or .mpe, has the type video/mpeg.
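A server maps a file suffix to a MIME type with a lookup table. The sketch below parses a few lines in the style of Apache's `mime.types` file (`type/subtype` followed by suffixes); the excerpt is illustrative rather than a copy of Apache's file, `load_mime_table` and `content_type_for` are hypothetical names, and the `application/octet-stream` fallback for unknown suffixes is an assumed, though common, choice.

```python
# Illustrative lines in the style of Apache's mime.types file.
MIME_TYPES_EXCERPT = """
# MIME type            Extensions
text/html              html htm
image/gif              gif
image/jpeg             jpeg jpg jpe
video/mpeg             mpeg mpg mpe
"""

def load_mime_table(text: str) -> dict[str, str]:
    """Build a suffix -> type/subtype table from mime.types-style text."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *suffixes = line.split()
        for suffix in suffixes:
            table[suffix.lower()] = mime_type
    return table

def content_type_for(filename: str, table: dict[str, str]) -> str:
    """Look up the Content-Type for a file name by its suffix."""
    suffix = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return table.get(suffix, "application/octet-stream")

table = load_mime_table(MIME_TYPES_EXCERPT)
print(content_type_for("trailer.MPG", table))  # video/mpeg
print(content_type_for("index.html", table))   # text/html
print(content_type_for("data.xyz", table))     # application/octet-stream
```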

Ultimately, a Web server's role is to deliver content, especially dynamic content, and this is also what fundamentally distinguishes it from an application server. When interacting with browsers, the Web server is mainly responsible for serving dynamically generated HTML documents. Beyond HTML, it can also serve application data such as XML; that is, a Web server does more than deliver HTML documents: it can connect to a wider range of data sources to give browsers richer content.

There are many technologies for implementing dynamic Web content. The first is CGI, which generates HTML dynamically based on the user's request. CGI is not a programming language; it is only a protocol, an interface through which a Web server runs programs written for it. Because every request for dynamic content starts a new CGI program, CGI increases the load on the Web server, and its major drawback is that it can easily slow the server down.
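The process-per-request cost described above shows up in a sketch of how a server might invoke a CGI program. `CGI_SCRIPT` and `run_cgi` are hypothetical. The program receives its input through environment variables such as `REQUEST_METHOD` and `QUERY_STRING` and writes a header block, a blank line, and the body to standard output. Each call launches a fresh Python interpreter, which is the overhead the text points to.

```python
import os
import subprocess
import sys

# Hypothetical CGI program: reads input from environment variables and
# prints headers, a blank line, then the HTML body.
CGI_SCRIPT = r"""
import os
from html import escape
from urllib.parse import parse_qs
name = parse_qs(os.environ.get("QUERY_STRING", "")).get("name", ["world"])[0]
print("Content-Type: text/html")
print()
print(f"<p>Hello, {escape(name)}!</p>")
"""

def run_cgi(script: str, query_string: str) -> tuple[dict[str, str], str]:
    """Run the script as a CGI-capable server would: one new process per request."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", script],
        env=env, capture_output=True, text=True, check=True,
    )
    # Headers and body are separated by the first blank line.
    head, _, body = result.stdout.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in head.splitlines())
    return headers, body

headers, body = run_cgi(CGI_SCRIPT, "name=Alice")
print(headers["Content-Type"], body.strip())
```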

Microsoft ASP (Active Server Pages) technology is built around a VBScript interpreter embedded in IIS. It supports multiple scripting languages, including JavaScript, PerlScript, and VBScript, and because it is based on COM, it can easily access software components on other servers.

Like JSP and ASP, PHP consists of a set of additional code tags placed in HTML documents. The difference is that PHP was designed specifically for developing Web pages, so applications written in it are more concise than those written in VBScript or JSP.

Today, Web servers commonly support Perl acceleration solutions. Apache's free mod_perl module embeds a Perl interpreter into the Apache server. This not only speeds up the interpretation of Perl code but, because mod_perl caches compiled code, also greatly improves execution efficiency. mod_perl is also tightly integrated with Apache, so Perl developers can control the Web server's behavior much as C developers do when writing programs against the low-level Apache API.

In operation, Web servers often have to handle large volumes of intensive user clicks and dynamic content requests. Even with high-end server hardware, the number of requests that can be served per unit of time is limited as the user base grows. This is especially true when there is a lot of dynamic content, because dynamic pages frequently call databases and applications and consume substantial server resources. In this case, the load needs to be distributed across multiple servers or multiple sites.

There are many load-balancing methods. The simplest is to split the website's content across different servers: for example, store static HTML pages on one server, image files on another, and run all CGI programs on a third. Clearly, though, this method is not very efficient, because it cannot allocate content automatically between hosts. If one category of content receives too much traffic, it still becomes a bottleneck.
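Content-based distribution amounts to a fixed routing rule. In this sketch the host names, the `/cgi-bin/` prefix, and the image suffixes are hypothetical; `pick_backend` simply chooses a server by the kind of content the URL asks for.

```python
from urllib.parse import urlsplit

# Hypothetical back-end servers, split by content type as in the text.
BACKENDS = {
    "static": "html-server.internal",
    "images": "image-server.internal",
    "cgi": "cgi-server.internal",
}
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

def pick_backend(url: str) -> str:
    """Route a request by content category, ignoring current load."""
    path = urlsplit(url).path.lower()
    if path.startswith("/cgi-bin/"):
        return BACKENDS["cgi"]
    if path.endswith(IMAGE_SUFFIXES):
        return BACKENDS["images"]
    return BACKENDS["static"]

print(pick_backend("http://example.com/cgi-bin/search?q=web"))
print(pick_backend("http://example.com/logo.GIF"))
print(pick_backend("http://example.com/about.html"))
```

Because the rule never looks at load, a surge in image requests still lands entirely on one server, which is the bottleneck the text describes.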

The basic method of DNS load balancing is to place copies of the same site on different physical servers. The DNS server can then return all of the domain name's IP addresses, or a different IP address for each request for the same name. Because the DNS server has little control over which address a given client ends up using, DNS can provide only basic load balancing. In addition, because DNS answers are cached by clients and by other DNS servers, the same client keeps accessing the same Web server. As a result, a large number of frequent visitors may all use one IP address while a few other users use another, leading to uneven distribution. Another problem is that cached DNS entries expire: if an entry expires while a client is using the site, the client's subsequent requests may go to a different IP address. This causes problems for dynamic websites, especially when client data needs to be received and stored.
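The two DNS behaviors described above, rotating answers and client-side caching, can be simulated with two toy classes. This is a simplified model under stated assumptions: `RoundRobinDNS` and `CachingClient` are hypothetical, the addresses come from a private range, and the cache lifetime (TTL) is counted in requests rather than in seconds as real DNS TTLs are.

```python
import itertools

class RoundRobinDNS:
    """Toy DNS server that rotates the order of a domain's A records."""

    def __init__(self, addresses: list[str]) -> None:
        self._addresses = addresses
        self._offsets = itertools.cycle(range(len(addresses)))

    def resolve(self) -> list[str]:
        k = next(self._offsets)
        return self._addresses[k:] + self._addresses[:k]

class CachingClient:
    """Toy client that reuses its first answer until the cache expires.

    The TTL is counted in requests, a simplification of real DNS TTLs.
    """

    def __init__(self, dns: RoundRobinDNS, ttl: int) -> None:
        self._dns = dns
        self._ttl = ttl
        self._cached = ""
        self._remaining = 0

    def server_for_next_request(self) -> str:
        if self._remaining == 0:  # cache empty or expired: ask DNS again
            self._cached = self._dns.resolve()[0]
            self._remaining = self._ttl
        self._remaining -= 1
        return self._cached

dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
heavy = CachingClient(dns, ttl=3)  # a frequent visitor
light = CachingClient(dns, ttl=3)  # an occasional visitor

heavy_hits = [heavy.server_for_next_request() for _ in range(3)]
light_hit = light.server_for_next_request()
after_expiry = heavy.server_for_next_request()
print(heavy_hits, light_hit, after_expiry)
```

The frequent visitor stays on 10.0.0.1 until its cache expires and then moves to 10.0.0.3, which is how both uneven distribution and mid-session address changes arise.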

Software and hardware load balancing is similar to DNS load balancing, but the website publishes only one IP address. A dedicated machine accepts HTTP requests for that address and distributes them among the site's servers. This distribution usually happens at the TCP/IP routing layer and transparently maps the single source/destination IP address to a specific server. It can be implemented in software or hardware; hardware solutions are more efficient but more costly. Because the balancer can spread requests evenly across the Web servers, this method is better than DNS load balancing. In addition, it can continuously monitor the Web servers: if one fails or has a problem, requests can be dynamically redirected to servers that provide the same functions.
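The selection logic of a single-IP load balancer with health monitoring can be sketched as round-robin over the servers that pass their checks. `LoadBalancer` and the server names are hypothetical, and real balancers perform this at the TCP/IP layer in software or hardware; only the decision of where to send each request is modeled here.

```python
import itertools

class LoadBalancer:
    """Toy front end that spreads requests across healthy servers."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.healthy = set(servers)
        self._ring = itertools.cycle(servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)  # e.g. after a failed health check

    def mark_up(self, server: str) -> None:
        self.healthy.add(server)

    def route(self) -> str:
        # One full pass over the ring visits every server once,
        # skipping those that failed their checks.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

lb = LoadBalancer(["web1", "web2", "web3"])
first_round = [lb.route() for _ in range(3)]
lb.mark_down("web2")  # web2 fails; its traffic moves to the others
after_failure = [lb.route() for _ in range(4)]
print(first_round, after_failure)
```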

Reverse proxying is another simple method. The proxy intercepts client requests, forwards them to the Web server, returns the server's replies to the client, and stores the content in its own cache. Later requests for the same content can then be served from the cache without reaching the server, which can greatly reduce the server's load.
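A reverse proxy's caching behavior reduces to "serve from the cache, fall back to the origin server on a miss". The `CachingReverseProxy` class and `origin_server` function below are hypothetical, and the cache is deliberately naive: it never expires entries, whereas a real proxy would honor HTTP caching headers such as `Cache-Control`.

```python
from typing import Callable

class CachingReverseProxy:
    """Toy reverse proxy: forwards misses to the origin and caches replies."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self._origin = origin
        self._cache: dict[str, bytes] = {}
        self.origin_requests = 0  # how often the Web server did real work

    def handle(self, path: str) -> bytes:
        if path not in self._cache:  # cache miss: ask the Web server
            self.origin_requests += 1
            self._cache[path] = self._origin(path)
        return self._cache[path]  # cache hit: no work for the origin

def origin_server(path: str) -> bytes:
    """Stand-in for the Web server behind the proxy."""
    return f"<html><body>page {path}</body></html>".encode()

proxy = CachingReverseProxy(origin_server)
for _ in range(100):
    proxy.handle("/index.html")
proxy.handle("/about.html")
print(proxy.origin_requests)  # 2
```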

Load balancing scales server performance horizontally. We can also improve the performance of each Web server itself, that is, scale vertically. The most obvious approach is to add server resources: faster hard disks, more memory, and more CPU processing power. CPU capacity matters for content delivery mainly on dynamic websites, because they run programs to build their pages and consume substantial system resources. Increasing disk speed or memory is also straightforward. The proxy method was described earlier. Note that data on a Web server is organized differently from data on a database server or file server: database structures are designed for convenient retrieval, while a Web server's directory structure is organized for convenient browsing by users. One function of a proxy is to convert between these structures.

Finally, many websites require SSL encryption for data transmission. However, establishing an SSL connection consumes a lot of system resources, so SSL acceleration is also needed. The SSL accelerator cards developed by many third-party vendors are a good choice. They are not expensive, and because the Web server's SSL keys are usually stored on the card, they also prevent intruders from stealing the site's SSL keys.

Because of the nature of Internet/Intranet applications, Web server security is also a key issue. It has two levels. One is the security of data streams, which prevents them from being viewed or maliciously modified by third parties; the other is the security of content, which ensures that only authorized and authenticated users can view and modify information.

As mentioned earlier, URLs beginning with "https" use SSL (now called Transport Layer Security, TLS), whose basic principle is to establish a secure, encrypted connection between the Web server and the browser. SSL protects two kinds of data: data sent to the Web server, such as the user's name and credit card details, and confidential data retrieved from the server, such as the price information sent to users of an auction website. Authorization and authentication are also common security techniques on Web servers. In practice, the Web server sends the browser a message asking for the user's name and password; the user fills them in, and the server uses them to confirm the user's identity.
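The name-and-password exchange described above matches HTTP Basic authentication, one common scheme; the article does not name a specific one, so this choice is an assumption. The server answers with status 401 and a `WWW-Authenticate: Basic` header, and the browser prompts the user and resends the request with an `Authorization` header containing the base64-encoded `user:password`. Because base64 is an encoding, not encryption, Basic authentication should only be used over HTTPS. `USERS`, `challenge`, and `check_basic_auth` are hypothetical.

```python
import base64
import hmac
from typing import Optional

# Hypothetical user store; a real server would keep salted password hashes.
USERS = {"alice": "s3cret"}

def challenge() -> tuple[int, dict[str, str]]:
    """401 response that makes the browser prompt for a name and password."""
    return 401, {"WWW-Authenticate": 'Basic realm="members"'}

def check_basic_auth(authorization: Optional[str]) -> bool:
    """Validate an 'Authorization: Basic <base64(user:password)>' value."""
    if not authorization or not authorization.startswith("Basic "):
        return False
    try:
        encoded = authorization[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # invalid base64 (binascii.Error) or invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    expected = USERS.get(user)
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(password.encode(), expected.encode())

# What a browser sends after the user types alice / s3cret:
header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(challenge()[0], check_basic_auth(header), check_basic_auth("Basic Zm9vOmJhcg=="))
```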

From the preceding introduction, we have learned a lot about Web servers. However, a Web server alone is not enough for a Web application; it must work with an application server to deliver a website's functions. What is the difference between a Web server and an application server? Simply put, a Web server provides HTML documents and image data to browsers, and the applications running on a Web server also exist to generate HTML documents and image data. Applications on an application server are different: the application server contains only the application's business logic and processes business transactions; it does not include the database or the user-interface programs.

In most cases, the application server is the middle tier of a three-tier architecture; the other two tiers are the user interface and the database/data store. Note that this distinction is only functional. With the development of data standards, especially the emergence of XML, the boundaries between the Internet's data exchange protocols and development languages have been broken down: Web servers and application servers can each process the other's data and take on the other's functions. This can make choosing a server difficult. Should we use a Web server or an application server? Can one server do both jobs? In practice, the two should be kept separate so that each can focus on its own function. For example, although an application server can easily serve Web pages, it is hard to configure it with every Web feature. Separating the two also improves the performance of both servers and reduces maintenance complexity. A Web server frequently transfers large amounts of HTML and image data and therefore needs high I/O throughput, whereas an application server processes large amounts of data and needs substantial CPU power. Separation also helps system stability: because the two have different performance profiles, they require different tuning and configuration, and mixing them makes maintenance harder.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
