How does a website grow from its early days, with only a handful of users, little traffic, and low concurrency, to tens of millions of users and high concurrency in the tens of thousands of requests? How does a small website's architecture evolve along the way? This article briefly discusses the topic, drawing mainly on the book "Large Website Architecture Design", whose coverage of these points is fairly comprehensive.
Source: http://www.cnblogs.com/pflee/p/4507579.html
1. Initial stage
When a website starts out it does not get much traffic, and a single server is more than enough: the application, the database, static resources, and so on all live on one machine. A typical LAMP/LNMP stack (Linux + Apache/Nginx + MySQL + PHP/Python, etc.) is enough to build the site.
The specific architecture is as follows:
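To make the single-server picture concrete, here is a minimal sketch, assuming a Python-based LNMP-style setup in which the web application, the database file, and the static resources all sit on the same machine (the framework-free WSGI app and the file names are illustrative, not from the book):

```python
import sqlite3
from wsgiref.simple_server import make_server

# Everything lives on one machine: the application code, the SQLite
# database file, and the static resources (all names here are made up).
DB_PATH = "site.db"

def app(environ, start_response):
    # Application logic and data access happen on the same box.
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS visits (ts TEXT DEFAULT CURRENT_TIMESTAMP)")
    conn.execute("INSERT INTO visits DEFAULT VALUES")
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM visits").fetchone()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"visits so far: {count}\n".encode()]

if __name__ == "__main__":
    make_server("0.0.0.0", 8000, app).serve_forever()
```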
2. Separation of application services from data Services
As the website's business grows, user traffic increases and stored data accumulates, so a single server can no longer meet demand; application services and data services need to be separated onto different servers.
As shown in the figure below:
Because they provide different services, the servers have different hardware requirements, as follows:
Resource requirements for different server types:
- Application server: handles all the business logic; needs faster and more CPUs.
- File server: stores files uploaded by users or required by the service itself; needs more disk space.
- Database server: performs data caching and data retrieval; needs more memory and faster disks.
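After the split, the change mostly shows up in configuration: the application server no longer talks to localhost but to dedicated database and file servers. A minimal sketch, with made-up host names:

```python
# The application server keeps no data itself; it only knows the addresses
# of the dedicated data-tier machines (host names are hypothetical).
CONFIG = {
    "db":    {"host": "db.internal", "port": 3306, "name": "site"},
    "files": {"base_url": "http://files.internal/uploads/"},
}

def db_dsn():
    """Connection string pointing at the separate database server."""
    c = CONFIG["db"]
    return f"mysql://{c['host']}:{c['port']}/{c['name']}"

def upload_url(filename):
    """User uploads live on the file server, not on the app server's disk."""
    return CONFIG["files"]["base_url"] + filename

print(db_dsn())                  # mysql://db.internal:3306/site
print(upload_url("avatar.png"))
```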
3. Caching
As the number of users keeps growing, the pressure on the database becomes too high, causing access delays and hurting the user experience; the highest-priority optimization for site performance is caching.
Website access follows the 80/20 rule: 80% of business accesses are concentrated on 20% of the data.
The caches a website uses can be divided into the application server's local cache and a remote distributed cache; the remote distributed cache is generally deployed as a cluster and requires servers with large memory.
As shown in the figure below:
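As a rough illustration of the two cache tiers, the sketch below assumes a Redis instance as the remote distributed cache (accessed through the redis-py client) and a plain in-process dict as the application server's local cache; the key format, TTL, and host name are assumptions:

```python
import json
import redis  # assumed dependency: pip install redis

# Remote distributed cache (hypothetical host) shared by all app servers.
remote_cache = redis.Redis(host="cache.internal", port=6379)
# Local cache that lives inside each application server process.
local_cache = {}

def get_product(product_id, load_from_db):
    """Read-through lookup: local cache -> remote cache -> database."""
    key = f"product:{product_id}"
    if key in local_cache:                     # 1. hit the in-process cache
        return local_cache[key]
    cached = remote_cache.get(key)             # 2. hit the distributed cache
    if cached is not None:
        value = json.loads(cached)
    else:
        value = load_from_db(product_id)       # 3. fall back to the database
        remote_cache.setex(key, 300, json.dumps(value))  # cache for 5 minutes
    local_cache[key] = value
    return value
```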
4. Application Server cluster deployment
As traffic grows, a single application server can no longer cope with the ever-increasing number of requests; no matter how strong a single server's hardware is, it will not keep up with the load at business peaks.
The most common way for a website to handle high concurrency and massive data is clustering, i.e. scaling out horizontally; a cluster meets the scalability requirement well.
The load balancer can be implemented in many ways, such as LVS, Nginx, or F5, and can be combined with HA software such as Heartbeat or Keepalived.
With the application servers deployed as a cluster behind a load-balancing scheduler, user requests can be distributed to any machine in the cluster; servers can easily be added or removed as user traffic changes, keeping each server's load within an acceptable range.
As shown in the figure below:
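As a toy illustration of how a load-balancing scheduler spreads requests over the application server cluster (not how LVS, Nginx, or F5 work internally), here is a minimal round-robin selector with a hypothetical server list:

```python
import itertools

# Hypothetical pool of application servers behind the load balancer.
app_servers = ["app1.internal:8080", "app2.internal:8080", "app3.internal:8080"]

# Round-robin cycle over the pool; adding or removing a server just means
# rebuilding the cycle with the new list.
_rr = itertools.cycle(app_servers)

def pick_backend():
    """Return the next application server that should receive a request."""
    return next(_rr)

# Example: the scheduler forwards ten incoming requests.
for request_id in range(10):
    print(f"request {request_id} -> {pick_backend()}")
```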
5. Database read/write separation
Cache misses and expired cache entries still have to be read from the database, and all write operations must also hit the database, so database pressure keeps growing as traffic increases.
A master-slave hot-standby scheme can be adopted to implement read/write separation, for example MySQL's master-slave replication; when reads dominate, a one-master, multiple-slave setup can be used.
The application's data access module should make the database's read/write separation transparent to the rest of the application.
As shown in the figure below:
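To make the idea of a transparent data access module concrete, here is a rough sketch that sends writes to the master and reads to a randomly chosen slave; SQLite stands in for MySQL, and the replication itself is assumed to be handled by the database layer:

```python
import random
import sqlite3

class ReadWriteRouter:
    """Routes writes to the master and reads to a replica, so the rest of
    the application does not need to know about replication."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def execute(self, sql, params=()):
        # Writes (INSERT/UPDATE/DELETE) always go to the master.
        cur = self.master.execute(sql, params)
        self.master.commit()
        return cur

    def query(self, sql, params=()):
        # Reads are spread across the slaves (one master, many slaves).
        return random.choice(self.replicas).execute(sql, params).fetchall()

# Demo with throwaway in-memory databases; real master-slave replication
# would be configured in MySQL itself, not in application code.
router = ReadWriteRouter(sqlite3.connect(":memory:"), [sqlite3.connect(":memory:")])
router.execute("CREATE TABLE users (id INTEGER, name TEXT)")
```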
6. Accelerating your website with CDN and reverse proxy
China's network environment is complex: users in different regions visiting the same website experience very different speeds, and access latency is positively correlated with user churn.
The main means of speeding up website access and reducing the load on back-end servers are CDNs and reverse proxies.
The basic principle behind both CDNs and reverse proxies is caching:
A CDN is deployed in the network providers' data centers and caches the website's hot static resources; when users request the site, they fetch data such as videos and images from the network provider's data center nearest to them.
A reverse proxy is deployed in the website's central data center and belongs to the site's front-end architecture; when a user request reaches the central data center, it first hits the reverse proxy server, and if the requested (static) resource is cached there, it is returned directly.
Mature open-source reverse proxy software includes Squid and Varnish; Varnish is recommended, as it is stronger in terms of stability, access speed, and number of concurrent connections.
As shown in the figure below:
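The caching idea behind a reverse proxy boils down to "serve from cache if possible, otherwise fetch from the origin application server and remember the response". The snippet below is only a conceptual sketch (Squid and Varnish are far more sophisticated); the origin address and TTL are made up:

```python
import time
import urllib.request

# path -> (body, expiry); a real reverse proxy also honours Cache-Control
# headers, cache purges, and much more.
_cache = {}
ORIGIN = "http://app.internal:8080"   # hypothetical back-end application server
TTL = 60                              # keep static responses for 60 seconds

def handle(path):
    entry = _cache.get(path)
    if entry and entry[1] > time.time():
        return entry[0]                    # cache hit: never touches the back end
    with urllib.request.urlopen(ORIGIN + path) as resp:
        body = resp.read()                 # cache miss: go to the origin server
    _cache[path] = (body, time.time() + TTL)
    return body
```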
7. Distributed file system and distributed database system
As business volume grows, the most common way to split the database is by business (one database per business line), with the different business databases deployed on different servers.
A fully distributed database is generally the last resort of database splitting, used only when a single table grows extremely large.
As shown in the figure below:
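Splitting by business (and, as a last resort, sharding one huge table) can be pictured as routing logic inside the data access layer; the database names and the hash-style sharding rule below are illustrative assumptions, not the book's scheme:

```python
# One database per business line, each deployed on its own server.
BUSINESS_DBS = {
    "user":  "mysql://db-user.internal/user_db",
    "trade": "mysql://db-trade.internal/trade_db",
    "item":  "mysql://db-item.internal/item_db",
}

def db_for_business(business):
    """Split by business: each product line talks to its own database."""
    return BUSINESS_DBS[business]

def shard_for_order(order_id, shard_count=4):
    """Last resort for a single huge table: route each row to a shard by key."""
    return f"mysql://db-trade-{order_id % shard_count}.internal/trade_db"

print(db_for_business("trade"))   # mysql://db-trade.internal/trade_db
print(shard_for_order(12345))     # mysql://db-trade-1.internal/trade_db
```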
8. Using NoSQL and search engines
Full-text search engines such as Lucene and Solr have become an integral part of large websites.
NoSQL storage is more convenient for unstructured data and better suited to big data computation; popular NoSQL databases include HBase, MongoDB, CouchDB, Redis, and Cassandra.
Different NoSQL databases use different storage models: Redis and Memcache store key/value pairs; MongoDB, CouchDB, and the like are document stores, keeping all of a record's data in one document; HBase, Cassandra, and similar databases are column stores.
As shown in the figure below:
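To show the difference between the key/value and document models in code, here is a small sketch assuming the redis-py and pymongo client libraries and made-up host, database, and collection names:

```python
import json
import redis                       # assumed dependency: pip install redis
from pymongo import MongoClient    # assumed dependency: pip install pymongo

# Key/value model (Redis): one opaque value per key.
kv = redis.Redis(host="redis.internal", port=6379)
kv.set("session:42", json.dumps({"user_id": 42, "cart": [1001, 1002]}))
session = json.loads(kv.get("session:42"))

# Document model (MongoDB): the whole record lives in one document,
# and individual fields can be queried directly.
mongo = MongoClient("mongodb://mongo.internal:27017")
products = mongo["shop"]["products"]
products.insert_one({"sku": "A-1", "name": "keyboard", "tags": ["hot", "sale"]})
hot_items = list(products.find({"tags": "hot"}))
```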
9. Split by Business
As the website grows, it often ends up covering many complex business scenarios. Using divide and conquer, the whole site's business is split into different product lines, turning the site into a number of separate applications, each deployed and maintained independently; the applications can be connected through hyperlinks, message queues, and so on.
As shown in the figure below:
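One way the separate applications stay loosely coupled is through a message queue: one application publishes an event and another consumes it asynchronously. The sketch below fakes the queue with a Redis list via redis-py (real deployments usually use a dedicated message broker); the application names, queue name, and event shape are all illustrative:

```python
import json
import redis  # assumed dependency: pip install redis

broker = redis.Redis(host="mq.internal", port=6379)

# Producer side, inside the (hypothetical) trading application:
def publish_order_paid(order_id, user_id):
    event = {"type": "order_paid", "order_id": order_id, "user_id": user_id}
    broker.lpush("events:order", json.dumps(event))   # enqueue and return at once

# Consumer side, inside the (hypothetical) notification application:
def consume_one():
    _, raw = broker.brpop("events:order")             # blocks until an event arrives
    event = json.loads(raw)
    print(f"send mail to user {event['user_id']} for order {event['order_id']}")
```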
10. Distributed Services
Building on the business split above, common services are extracted and deployed independently, such as user management and product management; these reusable services connect to the databases and provide shared business services, while the application systems only need to manage their own user interfaces and call the common services.
As shown in the figure below:
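A distributed service call ultimately travels over the network. As a rough sketch, a front-end application might call an independently deployed user-management service through a small HTTP client stub like the one below; the service address and response shape are assumptions, and real systems often use an RPC framework instead of plain HTTP:

```python
import json
import urllib.request

# Hypothetical address of the independently deployed user-management service.
USER_SERVICE = "http://user-service.internal:8000"

def get_user(user_id):
    """Remote call to the shared user service; the application itself
    no longer touches the user database directly."""
    with urllib.request.urlopen(f"{USER_SERVICE}/users/{user_id}") as resp:
        return json.load(resp)

def require_login(user_id):
    user = get_user(user_id)          # network hop: one source of the latency
    return user.get("active", False)  # and failure modes discussed below
```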
Distribution mainly solves the high-concurrency problem, but it also introduces some problems of its own:
1. Service calls must go over the network, which can have a significant impact on performance;
2. The more servers there are, the higher the probability of failure; one server going down can trigger a chain reaction (snowball effect) that makes many applications unreachable and lowers the site's availability, so the design should avoid this as much as possible;
3. Keeping data consistent in a distributed environment is also difficult, and distributed transactions are hard to guarantee, which can affect the correctness of business logic and business processes;
4. The website's dependencies become tangled and complex, making development, management, and maintenance difficult.
11. Brief summary
The main force driving a website's technical evolution is always the growth of its business;
Websites evolve gradually; responding flexibly to actual needs is what matters most;
Technology exists to serve the business; never pursue technology for technology's sake.