Website Architecture Evolution

Source: Internet
Author: User
Tags failover website performance

Http://www.ha97.com/5095.html

Large websites share a set of defining characteristics: high concurrency, heavy traffic, high availability, massive data volumes, and more. This article walks through how the architecture of a large website evolves.

1. Initial phase of the site architecture


The initial phase is relatively simple: a single server is usually enough to run the whole website, with the application and its data on the same machine.

2. Separating application services from data services


As the website's business grows, a single server can no longer meet demand. At this point the application and the data need to be separated onto different servers.

3. Improve site performance with caching


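Virtually every site today uses caching. It works because website access follows the 80/20 rule: 80% of business requests concentrate on 20% of the data, so keeping that small, hot subset in a cache takes much of the read load off the database.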
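As an illustration of the idea, here is a minimal sketch of a cache-aside read in front of a database, using a small least-recently-used cache. The names `LRUCache`, `load_from_database`, and `read` are hypothetical and exist only for this example; a real site would more likely use a dedicated cache server.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny in-memory cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used key


def load_from_database(key):
    # Hypothetical stand-in for a slow database query.
    return f"row-for-{key}"


db_reads = 0


def read(cache, key):
    """Cache-aside read: try the cache first, fall back to the database."""
    global db_reads
    value = cache.get(key)
    if value is None:
        db_reads += 1
        value = load_from_database(key)
        cache.put(key, value)
    return value


cache = LRUCache(capacity=2)
for key in ["a", "a", "b", "a", "c", "a"]:
    read(cache, key)
```

In this run the hot key "a" reaches the database only once, while the rarely used key "b" is the one evicted when the cache fills up.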

4. Improve the concurrency of your website with application server clusters


Because a single application server can handle only a limited number of requests, it becomes the bottleneck for the entire site during peak traffic. Using a load balancer therefore becomes inevitable. The load-balancing dispatch server distributes requests from browsers to any server in the application server cluster.
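One simple way to distribute requests is round robin. The sketch below assumes a fixed list of servers with the hypothetical names `app-1` to `app-3`; production load balancers typically add health checks and weighting on top of this.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
```

Six consecutive requests are spread evenly, two per server.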

5. Database read/write separation


Once the user base reaches a certain scale, the database becomes the bottleneck of the website because its load is too high. Most mainstream databases today provide a master-slave hot-standby feature: by configuring a master-slave relationship between two database servers, data updates on one server are synchronized to the other. The website uses this feature to separate database reads from writes, which relieves the load on the database.
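A minimal sketch of the routing decision behind read/write separation follows. The class `ReadWriteRouter` and the server names are hypothetical, and the check on the SQL verb is deliberately naive; it only illustrates that writes go to the master and reads are spread over the slaves.

```python
import itertools


class ReadWriteRouter:
    """Sends SELECT statements to a slave and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)  # reads rotate across the slaves
        return self.master  # writes always go to the master


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
routes = [
    router.route(statement)
    for statement in [
        "SELECT * FROM users",
        "UPDATE users SET name = 'a'",
        "select 1",
        "INSERT INTO users VALUES (1)",
    ]
]
```

Because synchronization from master to slave takes time, a read that must see a write made just before it may need to be routed to the master instead.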

6. Use a reverse proxy and a CDN to speed up website response


The main ways to improve website access speed are a CDN and a reverse proxy.


Both the CDN and the reverse proxy work by caching. The difference is that the CDN is deployed in the network provider's data centers, while the reverse proxy is deployed in the website's own central data center. When a user request reaches the central data center, it hits the reverse proxy first; if the reverse proxy has cached the requested resource, it returns the resource directly to the user.

7. Use distributed file systems and distributed database systems


No single server, however powerful, can keep up with the continually growing business needs of a large website.


A distributed database is the last resort for splitting a database and is used only when a single table grows very large. Short of that, the more common splitting method is business splitting, which deploys the data of different businesses on different physical servers.
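The two levels of splitting can be sketched as two lookup functions. Everything here is an assumption made for illustration: the business names, the server names, and the choice of a CRC32 hash to spread one very large table over four shards.

```python
import zlib

# Business splitting: each business keeps its data on its own server.
BUSINESS_DATABASES = {"users": "db-users", "orders": "db-orders"}

# Table splitting: one very large table spread over several shards.
ORDER_SHARDS = ["db-orders-0", "db-orders-1", "db-orders-2", "db-orders-3"]


def database_for(business):
    """Pick the physical server that holds a given business's data."""
    return BUSINESS_DATABASES[business]


def shard_for(order_id):
    """Pick a shard for one row using a hash that is stable across runs."""
    digest = zlib.crc32(str(order_id).encode("utf-8"))
    return ORDER_SHARDS[digest % len(ORDER_SHARDS)]


users_db = database_for("users")
order_shard = shard_for(42)
```

A plain modulo scheme like this forces most rows to move when the number of shards changes, which is one reason table splitting is treated as a last resort.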

8. Using NoSQL and search engines


Search has become a feature that nearly every large website must provide, so websites adopt non-relational database technologies such as NoSQL and non-database query technologies such as search engines.

9. Business splitting


To cope with increasingly complex business scenarios, large websites use a divide-and-conquer approach and split the site into different product lines.


On the technical side, the site is likewise split along product lines into many separate applications, each deployed and maintained independently. The applications can be linked through hyperlinks, exchange data through message queues, or, most commonly, form one connected system by accessing the same data storage system.

10. Distributed Services


Because every application needs to perform many of the same business operations, such as user management and session management, these shared operations can be extracted into common services and deployed independently.

Each pattern describes a problem that recurs around us, together with the core of the solution to that problem. This way, you can reuse the solution again and again without repeating the work.


Website architecture patterns address the problems and challenges that large websites face, such as highly concurrent access, massive data, and high reliability requirements. Many solutions have emerged in practice to achieve the architectural goals of high performance, high reliability, scalability, extensibility, and security.

1. Layering


Layering is the most common architectural pattern in enterprise applications. It divides the system into several parts along the horizontal dimension; each part has a relatively simple, single responsibility, and the upper layers depend on and call the lower layers to form a complete system.


In the layered architecture of a website, the three common layers are the application layer, the service layer, and the data layer. The application layer is responsible for business logic and view presentation; the service layer provides service support for the application layer; the data layer provides data storage and access services such as databases, caches, files, and search engines.


Layering is a logical division. Physically, the three layers can be deployed on the same machine, but as the website's business grows, the layered modules need to be deployed separately, with the three layers on different servers, so that the site has more computing resources to handle growing user traffic.


So although the original goal of layering is a clear logical structure that makes software easier to develop and maintain, as the website grows, the layered architecture also becomes critical for evolving toward a distributed system that supports high concurrency.

2. Separating


If layering slices the software horizontally, separation slices it vertically.


The larger the website, the more complex its functions and the more kinds of services and data it handles. Separating these functions and services and packaging them into modules with high cohesion and low coupling not only helps software development and maintenance but also makes it easier to deploy the modules in a distributed fashion, improving the site's concurrent processing capacity and its ability to add features.


Large websites may separate at a fine granularity. For example, at the application layer, different businesses such as shopping, forums, search, and advertising are split into separate applications, each owned by an independent team and deployed on different servers.

3. Distributed


For large websites, one of the main purposes of layering and separation is to make it easy to deploy the resulting modules in a distributed fashion: different modules run on different servers and cooperate through remote calls. Distribution means more computers can work on the same task. The more computers there are, the more CPU, memory, and storage are available, the more concurrent access and data can be handled, and the more users can be served.


In website applications, there are several common distributed schemes.


Distributed applications and services: deploying the layered and separated application and service modules in a distributed fashion can improve site performance and concurrency, speed up development and release, and reduce database connection consumption.


Distributed static resources: static resources such as JS, CSS, and logo images are deployed separately under an independent domain name, which is what people call the separation of static and dynamic content. Deploying static resources separately reduces the load on the application servers, and the separate domain name speeds up concurrent loading in the browser.


Distributed data and storage: large websites need to process massive amounts of data, measured in petabytes. A single computer cannot provide that much storage, so the data must be stored in a distributed fashion.


Distributed computing: websites commonly use distributed computing frameworks such as Hadoop and its MapReduce framework for batch computation. Their defining characteristic is moving the computation rather than the data: the program is distributed to where the data lives, which speeds up the computation.

4. Cluster


For modules where user access is concentrated, the independently deployed servers also need to be clustered: multiple servers deploy the same application to form a cluster and jointly provide service behind a load-balancing device.


A server cluster provides more concurrency for the same service, so when more users arrive, new machines simply need to be added to the cluster. When one of the servers fails, the load balancer's failover mechanism forwards requests to the other servers in the cluster, which improves system availability.
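Both properties, scaling out by adding machines and failing over around a dead machine, can be sketched in a few lines. The `Cluster` class and the server names are hypothetical; how a real load balancer detects a failed server is outside this sketch and is represented by an explicit `mark_down` call.

```python
class Cluster:
    """Round-robin dispatch that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._next = 0

    def add_server(self, server):
        # Scaling out: a new machine starts receiving traffic immediately.
        self.servers.append(server)
        self.healthy.add(server)

    def mark_down(self, server):
        # In practice a health check would call this; here it is manual.
        self.healthy.discard(server)

    def pick(self):
        # Try each server at most once per request, skipping failed ones.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers in the cluster")


cluster = Cluster(["app-1", "app-2"])
cluster.mark_down("app-1")
after_failure = [cluster.pick() for _ in range(2)]

cluster.add_server("app-3")
after_scale_out = {cluster.pick() for _ in range(4)}
```

After `app-1` fails, every request lands on `app-2`; once `app-3` joins, traffic is shared between the two healthy machines.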

5. Caching


The purpose of caching is to reduce server computation and return data to the user directly. Caching is everywhere in today's software design. Concrete implementations include CDNs, reverse proxies, local caches, and distributed caches.


There are two preconditions for using a cache. First, access to the data must be unevenly distributed: some data is accessed far more often than the rest, and that hot data belongs in the cache. Second, the data must stay valid for some period of time and not expire quickly; otherwise the cached copy becomes stale and causes dirty reads, which affects the correctness of the results.
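The second precondition is usually enforced with an expiry time on each entry. The sketch below is one possible way to do that; `TTLCache` is a hypothetical name, and the injectable `clock` parameter exists only so the example can simulate the passage of time.

```python
import time


class TTLCache:
    """Cache whose entries stop being served after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self.ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: force a reload from the source
            return None
        return value


now = [0.0]  # simulated clock, in seconds
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.put("price:42", 9.99)
fresh = cache.get("price:42")

now[0] = 31.0  # 31 simulated seconds later
expired = cache.get("price:42")
```

The shorter the period for which data stays valid, the shorter the expiry time has to be, and the less benefit the cache provides.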

6. Asynchronous


With asynchrony, businesses do not pass messages through synchronous calls. Instead, a business operation is divided into multiple stages, and the stages cooperate asynchronously by sharing data.


On a single server, this can be implemented with multiple threads that share memory; in a distributed system, it can be implemented with a distributed message queue.


The typical asynchronous architecture is the producer-consumer pattern, in which the producer and the consumer never call each other directly.
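On a single server, the producer-consumer pattern can be sketched with Python's standard `queue` and `threading` modules. The order numbers and the `shipped:` prefix are made up for the example; in a distributed system a message queue product would take the place of `queue.Queue`.

```python
import queue
import threading

orders = queue.Queue()  # the only thing producer and consumer share
processed = []


def consumer():
    """Take orders off the queue until the stop signal arrives."""
    while True:
        order = orders.get()
        if order is None:  # sentinel value meaning "no more work"
            break
        processed.append(f"shipped:{order}")


worker = threading.Thread(target=consumer)
worker.start()

# The producer only enqueues work; it never calls the consumer directly.
for order_id in [1, 2, 3]:
    orders.put(order_id)
orders.put(None)

worker.join()
```

The producer returns as soon as the order is queued, so a slow consumer delays processing but does not block the code that accepts requests.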

7. Redundancy


A website needs to run continuously, 24 hours a day and 7 days a week, so it needs redundancy to stay accessible when a machine goes down. Deploying at least two servers as a cluster provides this redundancy and makes the service highly available. In addition to regular backups, databases need hot and cold backups. Disaster recovery data centers can even be deployed on a global scale.

8. Automation


Common targets for automation include the release process, code management, testing, security detection, deployment, monitoring, alarms, failover, and failure recovery.

9. Security


Websites use many patterns in their security architecture: authenticating identity with passwords and mobile phone verification codes; encrypting network communication for login and transactions; using verification codes (CAPTCHAs) to keep bots from abusing resources; encoding or escaping input to counter common attacks such as XSS and SQL injection; filtering spam; and so on.
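Two of these measures can be shown with the Python standard library alone: a parameterized query against SQL injection and HTML escaping against XSS. The table, the user name, and the attack strings are invented for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# SQL injection: the ? placeholder makes the driver treat the input as a
# plain value, so the quote characters cannot change the query structure.
malicious_name = "alice' OR '1'='1"
injected_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious_name,)
).fetchall()
honest_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", ("alice",)
).fetchall()

# XSS: escape user-supplied text before embedding it in an HTML page.
comment = "<script>alert('xss')</script>"
safe_comment = html.escape(comment)

conn.close()
```

The injected input matches no rows, and the escaped comment no longer contains a literal `<script>` tag, so a browser displays it as text instead of running it.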

A popular definition of architecture is "the highest-level planning and the decisions that are hard to change"; these plans and decisions set the direction of future development and the final blueprint.


Software architecture is "an abstract description of the overall structure and components of the software, used to guide the design of all aspects of a large software system". In general, software architecture needs to focus on five elements: performance, availability, scalability, extensibility, and security.

1. Performance


Performance is an important aspect of website architecture design, and any software architecture must consider possible performance problems. Because performance problems can appear almost anywhere, there are many ways to optimize website performance:


Browser side: use browser caching, compressed page transfer, sensible page layout, and fewer cookies in transit; a CDN can also accelerate delivery.


Application server side: use a server-local cache and a distributed cache, and use asynchronous operations to speed up responses. Under high concurrency, multiple application servers can form a cluster that jointly serves requests, raising overall processing capacity and improving performance.


Database server side: use indexes, caching, and SQL optimization; NoSQL databases can also be used to optimize the data model and storage structure.


Website performance is measured with a set of indicators, the most important being response time, TPS (transactions per second), and system performance counters. These indicators show whether the system design meets its goals.
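As a rough illustration of how such indicators are computed, the sketch below derives TPS and response-time percentiles from a handful of made-up samples. The nearest-rank percentile method and the sample values are choices made for this example.

```python
import math


def throughput_tps(request_count, window_seconds):
    """Transactions per second over a measurement window."""
    return request_count / window_seconds


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Ten requests observed during a two-second window (invented numbers).
response_times_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 15]

tps = throughput_tps(len(response_times_ms), window_seconds=2)
p50 = percentile(response_times_ms, 50)
p99 = percentile(response_times_ms, 99)
```

Here the median response time is 14 ms while the 99th percentile is 240 ms, which is why a single average can hide the slow requests that users actually notice.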

2. Availability


Availability is the proportion of time during which a site provides service without interruption. Almost all sites promise 24x7 service, but no site actually achieves it; there is always some downtime, and subtracting that downtime gives the site's available time. Some large sites reach four nines of availability or better, that is, 99.99%.
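To make the figure concrete, the short calculation below converts an availability target into the downtime it permits, assuming a 365-day year. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


four_nines = round(allowed_downtime_minutes(99.99), 2)
three_nines = round(allowed_downtime_minutes(99.9), 2)
```

Four nines leaves about 52.56 minutes of downtime per year, while three nines leaves about 525.6 minutes, roughly 8.8 hours.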


The main means of achieving high availability is redundancy: applications are deployed on multiple servers that serve at the same time, and data is stored on multiple servers that back each other up, so that the failure of any one server does not affect the application as a whole. The usual implementation is to group multiple servers into a cluster behind a load-balancing device.


The test of whether an architecture meets its high-availability goal is to assume that any one or more servers go down, or that various unforeseen problems occur, and ask whether the system as a whole remains available.

3. Scalability


Large websites face highly concurrent access from a large number of users and must store massive amounts of data, so the site provides its services through clusters in which many servers act as a whole. Scalability means being able to keep adding servers to a cluster to relieve rising access pressure from users and growing demand for data storage.


The main criteria for measuring the scalability of an architecture are whether a cluster can be built from multiple servers, whether it is easy to add new servers to the cluster, whether a newly added server provides service indistinguishable from that of the original servers, and whether there is a limit on the total number of servers the cluster can hold.

4. Extensibility


Unlike the other architectural elements, which focus on non-functional requirements, extensibility directly concerns the functional requirements of the site. Websites develop quickly and their functions keep expanding; designing the architecture so that it can respond quickly to changing requirements is the main goal of an extensible architecture.


The main criterion for measuring the extensibility of a website architecture is whether adding a new business product is transparent to existing products and has no effect on them, with very little coupling between different products.


The main means of building an extensible website architecture are event-driven architecture and distributed services.


Event-driven architecture is usually implemented with message queues, which separate the logic that produces messages from the logic that processes them.
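The decoupling can be illustrated with a small in-process event bus. This is a simplification: the handlers here run synchronously inside `publish`, whereas a message queue would deliver the event to separate consumers. The class name, event name, and handlers are all hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Maps event types to the handlers that have subscribed to them."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know which handlers exist or what they do.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
log = []

# An existing product reacts to new orders by sending an email.
bus.subscribe("order_created", lambda order: log.append(f"email:{order}"))

# A new product is added later without touching the publishing code.
bus.subscribe("order_created", lambda order: log.append(f"points:{order}"))

bus.publish("order_created", "order-1001")
```

Adding the second subscriber required no change to the code that publishes the event, which is the property the extensibility criterion asks for.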


Distributed services separate business logic from reusable services, which are called through a distributed service framework. New products can implement their own business logic by calling the reusable services, without any impact on existing products.

5. Security


The internet is open, and anyone can access a site from anywhere. The security architecture of a website protects it from malicious access and attacks and keeps its important data from being stolen.


The criterion for measuring a website's security architecture is whether there is a credible response strategy for the various attacks and data theft techniques, both existing and potential.
