Original address: http://www.cnblogs.com/jiekzou/p/4677994.html
Server division
For high-traffic sites, the different parts of the site need to be split onto separate servers, for example by serving images separately from the main web site. A site-wide deployment generally uses the following types of server:
File server: stores the system's images and files and gives every subsystem a single place to fetch them from.
Proxy server: generally Linux + Nginx acting as a reverse proxy.
Web server: on .NET the most common web server is IIS; Mono deployments generally use Nginx.
Application server: provides the system's business logic, such as the user center, settlement center, and payment center.
Cache server: provides the memcached cache service.
Database server: stores and serves the site's data, generally SQL Server, MySQL, Oracle, and so on.
Bandwidth calculation
Assume the site receives 1 million page views (PV) per day. Calculating bandwidth involves two metrics, peak traffic and average page size, and bandwidth is expressed in bps (bit/s).
1. Assume peak traffic is 5 times the average traffic.
2. Assume the average page size per visit is about 100 KB.
1 B = 8 b, so 1 B/s = 8 b/s (8 bps)
1 KB = 1024 B, so 1 KB/s = 1024 B/s
1 MB = 1024 KB, so 1 MB/s = 1024 KB/s
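
The estimate in this section can be sketched in a few lines of Python. The function name and parameters are illustrative only; the inputs are the assumptions stated above (1 million PV per day, about 100 KB per page, peak at 5 times the average).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_bandwidth_mbps(pv_per_day, avg_page_kb, peak_factor):
    """Return (average, peak) bandwidth in Mbps for the given daily page views."""
    requests_per_second = pv_per_day / SECONDS_PER_DAY       # about 12 for 1M PV
    kilobytes_per_second = requests_per_second * avg_page_kb
    kilobits_per_second = kilobytes_per_second * 8           # 1 byte = 8 bits
    average_mbps = kilobits_per_second / 1024                # 1 Mb = 1024 Kb here
    return average_mbps, average_mbps * peak_factor


average, peak = required_bandwidth_mbps(1_000_000, 100, 5)
print(f"average: {average:.1f} Mbps, peak: {peak:.1f} Mbps")
```

With these inputs the result is roughly 9 Mbps on average and 45 Mbps at peak. Changing any of the three assumptions changes the answer proportionally.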
Spread evenly over a day (86,400 seconds), 1 million PV is about 12 requests per second. Page size is measured in bytes, so the total is 12 * 100 KB = 1200 KB per second. Since 1 byte = 8 bits, 1200 KB/s = 9600 Kb/s, and 9600 Kb/s / 1024 is about 9 Mb/s (9 Mbps). The site must stay reachable at peak traffic, so the real bandwidth should be around 9 Mbps * 5 = 45 Mbps.
Architecture evolution, stage one
When a company has just started and business volume is small, it is usually enough to rent virtual hosting space and a database from a hosting provider to build the most basic site.
Architecture evolution, stage two: adding caches
As business volume grows and more users visit, the site frequently becomes slow to open, and the database may even hit its connection limit. At this point the site needs some optimization:
- Reduce HTTP requests and compress CSS, JS, and images
- Integrate Microsoft Ajax Minifier into VS2010 to compress JS and CSS at compile time
- Add page caching and data caching
- Full analysis of caching on Cnblogs
- Buy your own servers and host them in an IDC
- Your own servers let you raise the hardware level and control bandwidth freely. The bandwidth is generally dedicated, which can support more traffic than shared bandwidth
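
As a rough illustration of the data caching item above, the following is a minimal in-process cache with expiry. It is a sketch only: `TTLCache`, `load_article_from_db`, and the 60-second lifetime are hypothetical names and values, not something the original article specifies.

```python
import time


class TTLCache:
    """Tiny in-process cache; each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = []


def load_article_from_db(article_id):
    """Hypothetical stand-in for a slow database query."""
    db_calls.append(article_id)
    return {"id": article_id, "title": f"Article {article_id}"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load(42, load_article_from_db)
second = cache.get_or_load(42, load_article_from_db)  # served from the cache
print(f"database queries issued: {len(db_calls)}")
```

The second lookup never reaches the database, which is the point of the cache: repeated reads of the same data stop consuming database connections.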
Architecture evolution, stage three: adding web servers
When traffic grows again and the load on the web server climbs at peak times, it is time to consider adding a second web server. Running the same site on two servers raises the following questions:
How is traffic distributed across the two machines?
Nginx.
How is state such as user sessions kept in sync?
The usual options are writing it to the database, using a state server, cookies, or a cache.
How is cached data kept in sync?
A cache server.
How do features such as file upload keep working normally?
A file server that manages files centrally.
Architecture evolution, stage four: splitting databases, splitting tables, and distributed caching
After enjoying the faster access that more web servers bring for a while, the system starts to slow down again. Investigation shows fierce contention for database connections among write and update operations, which slows the whole system. What can be done?
Split the database (sharding)
Split tables
Distributed cache with Memcache or Redis
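
One common way to split databases and tables is to route each record by a stable hash of its key. The sketch below assumes 4 databases with 8 order tables each; the counts and the `orders_db_N` / `orders_N` names are made up for illustration and do not come from the original article.

```python
import zlib

DB_COUNT = 4        # assumed number of databases
TABLES_PER_DB = 8   # assumed number of order tables in each database


def route_order(user_id):
    """Map a user id to a (database, table) pair using a stable hash."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    db_index, table_index = divmod(bucket, TABLES_PER_DB)
    return f"orders_db_{db_index}", f"orders_{table_index}"


for user in ("alice", "bob", "carol"):
    database, table = route_order(user)
    print(f"{user} -> {database}.{table}")
```

`zlib.crc32` is used rather than the built-in `hash` because Python randomizes string hashes per process, and a routing function must return the same location for the same key on every server.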
Horizontal partitioning vs. vertical partitioning
| | Horizontal | Vertical |
| --- | --- | --- |
| Storage dependencies | Can span databases; can span physical machines | Can span table spaces with different physical properties; cannot be stored across databases |
| Storage mode | Distributed | Centralized |
| Scalability | Scale out (add inexpensive machines) | Scale up (upgrade the machine) |
| Availability | No single point of failure | Single point of failure exists (the DB data itself) |
| Price | Low | Moderate, even expensive |
| Application scenarios | Web 2.0 | |
Architecture evolution, stage five: web gardens or more web servers
After the databases have been split, pressure on the database drops to a relatively low level, and the next bottleneck may appear. If Windows performance counters show a large number of blocked requests, set up a web garden or add more web servers. Adding web servers can raise several issues:
A single Nginx software load balancer can no longer handle the volume of web traffic. This can be solved with a hardware load balancer such as F5, or by partitioning the application logically and spreading it across separate software load-balancing clusters.
The existing approaches to state synchronization, file sharing, and so on may become bottlenecks and need to be improved. At this point a distributed file system tailored to the site's needs may be written.
After this work, the site enters a seemingly perfect era of unlimited scaling: when traffic increases, the solution is simply to keep adding web servers.
Architecture evolution, stage six: read-write separation and inexpensive storage
After enjoying the faster access that more web servers bring for a while, the system starts to slow down again. Investigation again shows fierce contention for database connections among write and update operations. The answer this time is read-write separation, with publishing and subscription (replication) between the databases.
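
Read-write separation can be pictured as a small router in front of the databases: writes go to the primary, reads are spread across replicas. This is a simplified sketch; `ReadWriteRouter` and the server names are hypothetical, and a real setup also depends on how replication is configured.

```python
import itertools


class ReadWriteRouter:
    """Send SELECT statements to replicas in turn and everything else to the primary."""

    def __init__(self, primary, replicas):
        self._primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)   # round-robin over read replicas
        return self._primary              # INSERT, UPDATE, DELETE, and so on


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
for statement in (
    "SELECT * FROM orders WHERE id = 1",
    "UPDATE orders SET paid = 1 WHERE id = 1",
    "SELECT * FROM users WHERE id = 7",
):
    print(f"{router.route(statement)} <- {statement}")
```

Because replicas lag slightly behind the primary, a read that must see a write made moments earlier is usually sent to the primary instead.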
Inexpensive storage: NoSQL
NoSQL (Not Only SQL) refers to non-relational databases. With the rise of Web 2.0 sites, traditional relational databases have struggled to cope, especially with very large, highly concurrent, purely dynamic SNS-type Web 2.0 sites, and have exposed many problems that are hard to overcome. Non-relational databases have developed rapidly because their characteristics suit these workloads.
NoSQL databases are widely used in non-transactional systems such as microblogging systems. Examples:
BigTable
MongoDB
http://tech.it168.com/topic/2011/10-1/nosqlapp/index.html
Architecture evolution: entering the era of large-scale distributed applications and inexpensive server clusters
After this long and painful process, the perfect era finally arrives: adding web servers keeps up with ever-growing traffic. But the web application deployed on those servers has become very large. When several teams start changing it, the work is inconvenient and reuse is poor; each team ends up duplicating work to some degree. Deployment and maintenance are also troublesome, because copying and starting a huge application package on N machines takes a long time, and problems are hard to track down. Worse, a bug in one part of the application can make the whole site unavailable. Tuning is also difficult, because the application on each machine does everything and cannot be tuned for a specific workload. After this analysis comes a painful decision: split the system by responsibility. A large distributed application is born. This step usually takes quite a long time, because there are many challenges:
1. Once split, the distributed system needs a high-performance, stable communication framework that supports several different communication and remote call modes.
2. Splitting a huge application takes a long time and requires sorting out the business and controlling dependencies between systems.
3. The huge distributed application has to be operated well: dependency management, health management, error tracking, tuning, monitoring and alerting, and so on.
After this step, the system architecture enters a relatively stable phase. Large numbers of inexpensive machines can now support the huge volume of traffic and data, and the experience gained through the earlier stages can be combined with this architecture to support continued growth.
CDN: content delivery network
What is a CDN?
CDN stands for Content Delivery Network. The goal is to add a new layer to the existing Internet that publishes a site's content to the network "edge" closest to the user, so that users can fetch content from nearby, which relieves Internet congestion and improves the site's response time. Technically, it addresses the root causes of slow responses: limited network bandwidth, heavy user traffic, and unevenly distributed points of presence.
In the narrow sense, a content delivery network is a new way of building a network: an overlay on the traditional IP network that is specially optimized for publishing rich media. In the broad sense, a CDN represents a network service model based on quality and order. Simply put, a CDN is a strategically deployed overall system made up of four elements: distributed storage, load balancing, network request redirection, and content management. Content management and global network traffic management are the core of the CDN. By judging user proximity and server load, the CDN ensures that content serves user requests very efficiently. In general, content is served by cache servers, also called surrogates (proxy caches), located at the edge of the network only "one hop" away from the user. The surrogate is a transparent mirror of the content provider's origin server, which is typically located in the CDN service provider's data center. This architecture lets CDN service providers, acting on behalf of their customers (the content providers), give end users the best possible experience, since these users cannot tolerate any delay in response time. According to statistics, CDN technology can handle 70% to 95% of a site's page content requests, reducing pressure on the origin servers and improving the site's performance and scalability.
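
The idea of choosing a cache server by user proximity and server load can be sketched as below. This is only an illustration of the principle: the `EdgeNode` fields, the 90% load threshold, and the node names are assumptions, and real CDNs use far richer signals.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    name: str
    distance_km: float  # assumed measure of proximity to the user
    load: float         # fraction of capacity in use, 0.0 to 1.0


def pick_edge(nodes, max_load=0.9):
    """Pick the nearest node that is not overloaded; otherwise the least loaded one."""
    available = [node for node in nodes if node.load < max_load]
    if available:
        return min(available, key=lambda node: node.distance_km)
    return min(nodes, key=lambda node: node.load)


# Hypothetical edge nodes as seen from one user.
nodes = [
    EdgeNode("edge-near", distance_km=20, load=0.95),
    EdgeNode("edge-mid", distance_km=150, load=0.40),
    EdgeNode("edge-far", distance_km=900, load=0.10),
]
chosen = pick_edge(nodes)
print(f"serve from {chosen.name}")
```

Here the nearest node is skipped because it is over the load threshold, so the request goes to the next closest node that still has capacity.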
How the CDN works
Before describing how a CDN is implemented, first look at the traditional, non-cached access process, to understand how CDN-cached access differs from it.
A user accesses a site that does not use a CDN cache as follows:
1) The user enters the domain name to visit into the browser.
2) The browser calls the domain name resolution library to resolve the domain name and obtain its IP address.
3) The browser uses that IP address to send a data request to the host serving the domain.
4) The browser renders the web page from the data the host returns.
Put simply, a CDN is website acceleration. It addresses slow page loads caused by crossing carriers or regions, low server load capacity, insufficient bandwidth, and similar problems. Providers include Wangsu, Ruijiang, and ChinaCache.
Consistent hash algorithm
In a distributed architecture, node failures are unavoidable. With simple modulo placement, adding or removing a node invalidates a large share of the hashed data, which must be rehashed. Each missing item is requested from the database once and then placed on a server again using hash(key) % number of servers = server index. For high-traffic systems the impact can be severe.
Consistent hashing is used to solve this kind of problem.
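
The difference can be seen by counting how many keys change servers when a fifth cache server is added. The following is a simplified hash ring with virtual nodes, not the Ketama implementation mentioned in this article; the server names, the 100 virtual nodes per server, and the key format are assumptions chosen for the demonstration.

```python
import bisect
import hashlib


def stable_hash(text):
    """Hash a string to a 32-bit integer that is the same on every run."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, servers, replicas=100):
        self._points = []  # sorted positions of virtual nodes on the ring
        self._owner = {}   # position -> server name
        for server in servers:
            for i in range(replicas):
                point = stable_hash(f"{server}#{i}")
                self._owner[point] = server
                bisect.insort(self._points, point)

    def server_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[index]]


def moved_fraction(before, after, keys):
    """Fraction of keys whose server differs between two placement functions."""
    return sum(before(key) != after(key) for key in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
four = ["cache-a", "cache-b", "cache-c", "cache-d"]
five = four + ["cache-e"]

modulo_moved = moved_fraction(
    lambda key: four[stable_hash(key) % len(four)],
    lambda key: five[stable_hash(key) % len(five)],
    keys,
)
ring4, ring5 = HashRing(four), HashRing(five)
ring_moved = moved_fraction(ring4.server_for, ring5.server_for, keys)
print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%} of keys moved")
```

With modulo placement most keys land on a different server after the change, while on the ring only the keys taken over by the new server move, which in expectation is about one fifth of them here.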
More: a C# implementation of the consistent hash algorithm (KetamaHash)
Reference:
http://www.cnblogs.com/genson/archive/2009/10/22/1587836.html
Reprint: Web Services Architecture