An Analysis of the Technical Principles of Large Website System Architecture

Source: Internet
Author: User

Chatting with friends, I found that many people are very interested in large-scale website architecture. So am I: I often run experiments at home on a LAN made up of two laptops and one server. I entered the IT industry around 1997 or 1998, when PC client software was at its most popular and software development was a respectable and well-liked job. I started with VC 1.5 on Win 3.1, moved on to VC 6.0 and then to .NET, and was basically doing client software development the whole time. I have always had a strong sense of crisis, and I became convinced that the Internet would prevail and traditional software would decline, so I turned to web development. I remember applying for a job at a portal site, where the interviewer asked me: what do you think is the difference between client development and Internet development? My answer was: Internet development is much simpler than client software development. I no longer have to consider so many user-environment factors; I deploy once, and the application is available anytime and anywhere.

Many years later, I still think that answer was correct. At the product development level, Internet development is indeed much simpler. Let me first clarify a concept: by Internet development I do not mean all B/S (browser/server) applications, such as a bank's internal B/S business system. What I mean by Internet applications is applications that use the Internet to serve the general public. Enterprise business systems are characterized by relatively complex business logic and a user base that is generally not very large. Internet applications are the opposite: the business logic is simple, but they face massive numbers of users.

Since the business logic of Internet applications is not complex, why do large-scale websites need so many technical staff? The complexity comes mainly from the operating environment:

1. Openness

A website's service is public and anyone can visit it, so it directly faces many malicious users. The security of system data is at great risk, and once the system is hacked the consequences can be disastrous.

2. Large number of visits

A large number of visits means the site must withstand high concurrency and heavy traffic. If the site's service capacity and robustness are not up to it, the system will be overwhelmed.

3. User Experience

A good user experience requires, in addition to good product design, that the site be fast to access and highly available, rather than going down every so often.

How should the site's subsystems be deployed? How can robustness and high availability be improved? How can the site be secured, access made faster, and load balanced? What hardware should be used? In addition, a site may adopt different architectures at different stages of its growth, so how can the architecture transition smoothly, and how can the current architecture be kept flexible and able to scale? These are problems every large website must solve, and problems every small site will meet sooner or later as it grows. The articles that follow will gradually expand on these topics.

Website architecture includes both the software architecture and the system architecture of the site. Software architecture mainly refers to the division into subsystems and logical layers; system architecture refers to the deployment structure of the system.

The knowledge a system architect needs is fairly broad. Most system architects grew up from operations and maintenance engineers: their development skills are not strong and they do not write much code, but they are very hands-on, very fluent in scripting, frequently run all kinds of experiments, and closely follow news about the latest technologies and products. Of course, a large website requires a team of architects, each responsible for the architecture of one domain, such as security architecture or storage architecture.

I think it is still hard for ordinary developers to take this path, because the job requires experience and continual practice. But once a developer does take it, there is great room to grow, mainly thanks to a developer's habits of thinking and technical depth. Developers can use this series of articles as a reference, for example on how to develop a system intended for distributed deployment. Many readers also hold several roles at once, handling everything from development to implementation to deployment. My own energy is limited, so I have invited a few friends to explore these topics from the perspectives of Unix/Linux systems and Windows systems, to benefit those who are still feeling their way. Interested readers are welcome to take part.

In fact, I have been writing about this all along. For example, the PHP Deep Exploration series covered a lot about Apache. I have roughly read through the Apache source code; it is time-consuming and progress is slow, but I think it helps in understanding how to configure and tune Apache.

Before introducing website architecture, let's start with some of the most basic and common concepts, to make the later techniques for load balancing and distributed storage easier to understand. The first is the CDN.

1. What is CDN?

A CDN (Content Delivery Network), also called a content publishing network or content distribution network, adds a new layer of network architecture on top of the existing Internet and publishes a site's content to the network edge closest to users, so that users can fetch the content they need from nearby. This improves response speed and the user experience, and it also spreads out the access pressure by dispersing what was centralized user access across many locations. Website content providers (such as Sina, Sohu and NetEase) use CDNs to address, at the macro level, part of the headaches of heavy traffic and massive concurrent users.

2. The composition of a CDN

A content delivery network (CDN) is an overall system that is deployed strategically. It comprises four elements: distributed storage, load balancing, network request redirection, and content management. Content management and global network traffic management are its core. By judging user proximity and server load, the CDN ensures that user requests for content are served efficiently, and that the service a user needs is only "one hop" away.
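The idea of choosing a node by user proximity and server load can be sketched as follows. This is only an illustration under stated assumptions: the `EdgeNode` type, the distance metric, the node names, and the 0.9 load threshold are all hypothetical and do not describe how any real CDN makes this decision.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    distance_km: float  # hypothetical proximity metric between user and node
    load: float         # 0.0 = idle, 1.0 = saturated


def pick_edge_node(nodes, max_load=0.9):
    """Pick the nearest node whose load is below max_load.

    If every node is overloaded, fall back to the least loaded one.
    """
    usable = [n for n in nodes if n.load < max_load]
    if usable:
        return min(usable, key=lambda n: n.distance_km)
    return min(nodes, key=lambda n: n.load)


# Hypothetical nodes as seen by a user in South China.
nodes = [
    EdgeNode("guangzhou", 120.0, 0.95),  # nearest, but overloaded
    EdgeNode("shanghai", 1200.0, 0.40),
    EdgeNode("beijing", 1900.0, 0.10),
]
chosen = pick_edge_node(nodes)
print(chosen.name)
```

Here the nearest node is skipped because it is over the load threshold, so the request goes to the next nearest node instead.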

Our usual content publishing model is to put the site's data in one place and respond from there to visits from all over the world. Most of us think about the software deployment architecture and rarely consider the network hardware architecture. In contrast, the CDN emphasizes the importance of the network in content publishing. By introducing an active content management layer and global load balancing, the CDN differs fundamentally from the traditional content publishing model.

In the traditional model, content providers take on content publishing work that they should not have to do and cannot do well.

3. The Internet service industry chain

Across the value chain of broadband services, content providers and users sit at the two ends, and network service providers connect them. As the Internet industry matures and business models change, the roles in this chain are becoming more finely divided: content operators, hosting service providers, backbone network service providers, access service providers, and so on. Each role has to do its own part and cooperate with the others to give customers good service, which produces a win for all parties. In terms of how content and network are combined, content publishing has gone through two phases: the ICP's own content (application) servers, and then the IDC. The IDC boom also gave rise to the role of hosting service provider. However, the IDC does not solve the problem of publishing content effectively. Content located at the center of the network does nothing to reduce backbone bandwidth consumption or to bring order to traffic on the IP network. Pushing content to the network edge and serving users from nearby, so as to guarantee service quality and orderly access across the whole network, is therefore the obvious choice. This is the CDN service model. The CDN resolves the content operator's dilemma between centralization and decentralization, which is undoubtedly valuable for building a healthy Internet value chain, and it is an indispensable and highly effective way to accelerate a website.

4. CDN Service Provider

ChinaCache is China's largest CDN service provider; whether it is the only one, I do not know. To become a CDN service provider, one presumably has to come to terms with China Telecom, Netcom, CTT and other carriers, and what kind of capability and background that takes is anyone's guess. ChinaCache has more than 130 service nodes worldwide, over 80 of them in China, covering the country's six major networks (the so-called 6-line data center) and the main provinces. Major portals such as Sina and NetEase lease its services. So whether you are in the south, the north, or North America, visiting these portals feels fast, and one of the main reasons is the CDN. Small sites generally cannot afford this service and have to accept being slower. They can rent space in an interconnected 6-line data center; if the bandwidth is sufficient, users can tolerate it. To improve the user experience further, you need to mirror the site in several representative cities, for example Guangzhou for South China, Shanghai for East China, and Beijing for North China, but you then have to deploy and maintain the content mirroring process yourself. Some sites instead split their content, for example by building a sub-site for each region. Different business situations may call for different deployment strategies. A CDN can be regarded as a strategy at the level of basic network construction.

The sections above described some of the principles and concepts of the CDN and how CDN basic network services are provided. A CDN seems very well suited to static content such as HTML, JS and images: once the CDN is deployed, users can reach the site's content in a single hop. What about dynamic content? My answer:

Dynamic content can be divided into three categories according to how it exists.

First category: content that does not need to change for a long time. Such content is generally converted from dynamic to static with web staticization techniques so that it can be deployed on a CDN. Typical examples are the content sites of Sina, Sohu, NetEase and others, whose content management systems (CMS) publish content this way; adding, deleting and modifying content takes effect in near real time.

Second category: content that may change over a short period but eventually stabilizes, as in forums, blogs and similar applications. Such services staticize their content in batches at fixed intervals, and some staticize in real time. Mop's Hodgepodge forum and the NetEase community use strategies of this kind.

Third category: content that changes in real time and is highly personalized, as in mailbox applications. Such content cannot be staticized; it can only be optimized through regional deployment and load balancing.

For a vendor that provides CDN services, delivering static content is naturally no problem. For the third category, optimization is only possible at the communication link level.
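The batch staticization described for the second category can be sketched as follows. This is a minimal illustration, not how any of the sites named above actually work: the page template, the `staticize` function, the post fields, and the output layout are all assumptions. In practice a scheduler would call such a function at a fixed interval, and the generated files would then be pushed to the CDN.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template; real sites use far richer templates.
PAGE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def staticize(posts, out_dir):
    """Render each post (a dict with id, title, body) to a static HTML file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in posts:
        path = out_dir / f"{post['id']}.html"
        html = PAGE.substitute(title=post["title"], body=post["body"])
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written


# One batch run over hypothetical forum posts, written to a temp directory.
demo_dir = tempfile.mkdtemp()
pages = staticize(
    [
        {"id": 1, "title": "Hello", "body": "First post"},
        {"id": 2, "title": "CDN", "body": "Second post"},
    ],
    demo_dir,
)
print([p.name for p in pages])
```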

Many sites use pseudo-static pages, some for SEO reasons and some for security reasons; the technique is basically URL rewriting. It is only an outward form and is quite different from HTML staticization: the content is still dynamic.
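URL rewriting for pseudo-static pages can be illustrated with a small sketch. The rule below, which maps `/article/123.html` to `/article.php?id=123`, is a hypothetical example; real sites normally configure rules like this in the web server rather than in application code.

```python
import re

# Hypothetical rewrite rule: a static-looking path maps to a dynamic script.
REWRITE_RULES = [
    (re.compile(r"^/article/(\d+)\.html$"), r"/article.php?id=\1"),
]


def rewrite(path):
    """Return the internal dynamic URL for a pseudo-static path.

    Paths that match no rule are returned unchanged.
    """
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


print(rewrite("/article/123.html"))
```

The visitor and the search engine see a URL that looks like a static file, but every request is still handled by the dynamic script, so such pages do not gain the CDN benefits of true static content.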

1. Classification of Load Balancing

Load balancing is widely used in running websites, and the technology is very mature. By form, it is divided into software balancing and hardware balancing, implemented in software and in hardware respectively.

By network protocol layer, it is divided into layer-4 balancing and layer-7 balancing. Layer-4 balancing distributes traffic based on the OSI transport layer (layer 4); layer-7 balancing distributes traffic based on the OSI application layer (layer 7).
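The difference between the two can be sketched in a few lines: a layer-4 balancer decides using only connection-level information such as the client's IP address and port, while a layer-7 balancer can look inside the request, for example at the HTTP path. The backend addresses, the pools, and the hashing scheme below are assumptions made for illustration only.

```python
import zlib

# Hypothetical backend pools.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
STATIC_POOL = ["10.0.1.1"]
DYNAMIC_POOL = ["10.0.2.1", "10.0.2.2"]

STATIC_SUFFIXES = (".html", ".js", ".css", ".png", ".jpg")


def layer4_pick(client_ip, client_port):
    """Layer 4: choose a backend from connection information only."""
    key = f"{client_ip}:{client_port}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]


def layer7_pick(http_path):
    """Layer 7: choose a pool by inspecting the application-level request."""
    pool = STATIC_POOL if http_path.endswith(STATIC_SUFFIXES) else DYNAMIC_POOL
    return pool[zlib.crc32(http_path.encode()) % len(pool)]


print(layer4_pick("203.0.113.7", 50123))
print(layer7_pick("/logo.png"))
print(layer7_pick("/mail/inbox"))
```

The layer-7 function needs the parsed request, which costs more work per request but allows routing by content type; the layer-4 function is cheaper but blind to what is being requested.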

Large websites use a variety of balancing methods, in most cases in combination.

2. DNS Polling Balancing

This method stands somewhat apart and does not fall within the classification above, but it is very widely used, generally at the very front of a site. You can try an experiment by running the nslookup command on the command line, for example: nslookup www.163.com. The command returns a list of resolved IP addresses; these are the multiple A records bound to the domain name www.163.com. When the browser requests http://www.163.com/, the domain name first has to be resolved by a DNS server, which hands out IP addresses following the order of the A records. DNS polling (also known as DNS round-robin) achieves balancing on this principle: n IP addresses are bound to one domain name, and requests are spread across the different devices. In a large website, the addresses handed out are often those of clusters, each fronted by a load-balancing switch or a load-balancing server. For a small site, pointing the records directly at several servers works fine.
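The rotation of A records can be simulated in a few lines. This is a simplified model, not a real DNS server: the `RoundRobinDNS` class is hypothetical, the addresses come from a documentation range, and it assumes each client simply uses the first address in the answer.

```python
from collections import deque


class RoundRobinDNS:
    """Toy model of a DNS server that rotates its A records per query."""

    def __init__(self, a_records):
        self._records = deque(a_records)

    def resolve(self):
        """Return the current record order, then rotate for the next query."""
        answer = list(self._records)
        self._records.rotate(-1)
        return answer


dns = RoundRobinDNS(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

# Six successive clients, each taking the first address it is given.
first_choices = [dns.resolve()[0] for _ in range(6)]
print(first_choices)
```

In this idealized model the six clients are spread evenly over the three addresses. On a live network, caching along the resolution path makes the distribution less even.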

The benefits of DNS polling balancing:

1. Zero cost: you only need to bind a few A records on the DNS server, a feature that domain name registrars generally provide.

2. Simple deployment: expand the devices in the network topology, then add the corresponding records on the DNS server.

The disadvantages of DNS polling balancing:

1. Uneven traffic distribution: DNS resolution passes through many steps, with layer upon layer of caching. Even after your DNS server is updated, clients and other DNS servers on the network are not updated in real time, so it is hard to guarantee a perfectly even distribution. DNS servers now offer various ways to adjust the polling allocation policy, but none guarantees perfect balance.

2. Health checks: if a server behind one of the A records goes down, the DNS server does not know and will keep directing visitors to it. People or tools therefore have to monitor in real time, and when a machine goes down a backup machine is brought into production. If you instead want to remove an address from the A records, the change can take several hours or more to propagate to all clients.
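Because the DNS server itself does not check server health, an external tool has to do it. The sketch below shows one possible form of such a check, assuming a plain TCP connection attempt is a good enough signal; the function names, the port, and the timeout are hypothetical choices. The probe is passed in as a parameter so the filtering logic can be exercised without touching the network.

```python
import socket


def tcp_probe(ip, port=80, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_records(a_records, probe=tcp_probe):
    """Keep only the addresses for which the probe reports success."""
    return [ip for ip in a_records if probe(ip)]


# Demonstration with a fake probe that pretends 192.0.2.2 is down.
def fake_probe(ip):
    return ip != "192.0.2.2"


records = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
alive = healthy_records(records, probe=fake_probe)
print(alive)
```

A monitoring job could run such a check periodically and bring a backup machine online at the failed address. Editing the A records instead runs into the slow propagation described above.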

Placing DNS polling at the very front of the service is still very effective: it maps visiting users onto different service clusters in the most primitive way. For a large website, the externally visible IP addresses are unlikely to change often, and if a backend cluster goes down, a redundant cluster can be brought up quickly. In addition, such sites are generally deployed behind a CDN with their services split into parts, so these drawbacks do not have much impact in operation.

3. OSI seven-layer model

Next we will talk about layer-4 and layer-7 balancing. To understand how they work, first recall the seven-layer network model (OSI) from university textbooks.

OSI is the Open Systems Interconnection reference model, a well-defined protocol specification. The OSI model has seven layers, each of which may have several sublayers.

The OSI seven-layer model is a good theoretical model, but it has been trimmed down in practice. In particular, with the prevalence of TCP/IP, the seven layers were compressed into four, so many people have criticized the OSI model as overly complex. As a complete and comprehensive network model, however, it is still widely recognized. From top to bottom, the seven OSI layers are the application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer.
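The compression of seven layers into four can be written down as a simple mapping. The sketch below uses the commonly cited four-layer TCP/IP naming (application, transport, internet, link); the grouping is the usual textbook one and the variable names are only illustrative.

```python
# OSI layers from top (7) to bottom (1).
OSI_LAYERS = [
    (7, "application"),
    (6, "presentation"),
    (5, "session"),
    (4, "transport"),
    (3, "network"),
    (2, "data link"),
    (1, "physical"),
]

# Usual textbook grouping of OSI layers into the four TCP/IP layers.
OSI_TO_TCPIP = {
    "application": "application",
    "presentation": "application",
    "session": "application",
    "transport": "transport",
    "network": "internet",
    "data link": "link",
    "physical": "link",
}

# Collapse the seven layers, keeping top-to-bottom order without duplicates.
tcpip_layers = list(dict.fromkeys(OSI_TO_TCPIP[name] for _, name in OSI_LAYERS))
print(tcpip_layers)
```

In these terms, layer-4 balancing works at the transport layer in both models, while layer-7 balancing works at what TCP/IP simply calls the application layer.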

Functions of the seven OSI layers:

(1) Application layer: provides communication services to applications that talk to other computers. For example, a word processor without any communication function has no communication code to execute, and the programmers working on it do not care about layer 7 of the OSI model. If an option to transfer files is added, however, the programmers of the word processor need to implement OSI layer 7. Examples: Telnet, HTTP, FTP, WWW, NFS, SMTP.

(2) Presentation layer: the primary function of this layer is to define data formats and encryption. For example, FTP lets you choose to transfer in either binary or ASCII format. If binary is selected, the sender and receiver do not change the contents of the file. If ASCII is selected, the sender converts the text from its own character set to standard ASCII before sending the data, and the receiver converts standard ASCII to the character set of the receiving computer. Examples: encryption, ASCII.

(3) Session layer: defines how to start, control, and end a session, including the control and management of multiple two-way messages, so that the application can be notified when only part of a series of messages has completed. This lets the presentation layer see a continuous stream of data; in some cases, data is handed to the presentation layer only once all of it has been received. Examples: RPC, SQL.

(4) Transport layer: the functions of this layer include choosing between a protocol that provides error recovery and one that does not, multiplexing the incoming data streams of different applications on the same host, and reordering packets that arrive out of order. Examples: TCP, UDP, SPX.

(5) Network layer: defines end-to-end packet delivery. To do so, it defines logical addresses that can identify every node, and it defines how routing works and how routes are learned. To accommodate transmission media whose maximum transmission unit is smaller than the packet length, the network layer also defines how to fragment a packet into smaller packets. Examples: IP, IPX.

(6) Data link layer: defines how data is transferred over a single link. These protocols are tied to the particular medium in question. Examples: ATM, FDDI.

(7) Physical layer: the OSI physical layer specifications are standards for the characteristics of the transmission medium, and they often reference standards developed by other organizations. Connectors, pins, pin usage, electrical current, encoding, and light modulation all belong to the various physical layer specifications. The physical layer often uses several specifications together to define all the details.
