0x01. Evolution of Large Websites
Put simply, a distributed system improves efficiency by shortening the execution time of a single task, while a cluster improves efficiency by increasing the number of tasks executed per unit of time.
Clusters fall into three kinds: high-availability (HA) clusters, load-balancing clusters (e.g. built with nginx), and high-performance computing (HPC) clusters for scientific computing.
Distributed means placing different services in different places; a cluster means several servers working together to provide the same service. Each node in a distributed system can itself be a cluster, whereas a cluster is not necessarily distributed.
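The difference between "shorter single task" and "more tasks per unit of time" can be shown with a little arithmetic. This is a minimal sketch under an idealized assumption (perfect parallelism, no network or coordination overhead); the function names are hypothetical.

```python
# Idealized model (assumption): perfect parallelism, no network or coordination overhead.

def distributed_latency(task_seconds: float, nodes: int) -> float:
    """Distributed: one task is split into `nodes` equal parts that run in
    parallel, so the time to finish that single task shrinks."""
    return task_seconds / nodes


def cluster_throughput(task_seconds: float, servers: int) -> float:
    """Cluster: each server runs whole tasks independently, so tasks finished
    per second grow with `servers`, while each task still takes `task_seconds`."""
    return servers / task_seconds


if __name__ == "__main__":
    print(distributed_latency(8.0, 4))  # 2.0 seconds for one task
    print(cluster_throughput(8.0, 4))   # 0.5 tasks completed per second
```

On a single server the 8 s task gives a throughput of 0.125 tasks/s. Distributing it over 4 nodes cuts the time for one task to 2 s, while a 4-server cluster quadruples throughput without making any single task faster.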
I previously read a blog post about the evolution of large websites: http://www.cnblogs.com/leefreeman/p/3993449.html
Every large site has its own architecture. These architectures all deal with load balancing, caching, databases, file systems, and so on; they differ only because environments and constraints differ, and the goal is always to improve the site's performance.
The initial architecture consists only of the application, the database, and file services.
Later, it evolves into distributed services and clustered deployments.
0x02. Load Balancing Scheme
The previous article, "Load Balancing with an Nginx Reverse Proxy", discussed a load-balancing scheme built on Nginx. Here I choose a different approach: a dual-machine, high-availability load-balancing setup built from HAProxy + keepalived.
HAProxy is a free, fast, and reliable solution that provides high availability, load balancing, and proxying for TCP- and HTTP-based applications. It is especially suited to high-traffic websites that need session persistence or layer-7 processing.
In this design, HAProxy, keepalived, and the upstream servers are all deployed redundantly, both to increase throughput and to increase availability: if any single HAProxy, keepalived, or httpd server in the architecture below fails, the service keeps running normally.
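keepalived keeps the HAProxy pair available using VRRP: both machines share a virtual IP (VIP), and the live machine with the highest priority holds it. The minimal Python sketch below models only that election rule. Assumptions: node names and priorities are hypothetical, and real keepalived exchanges VRRP advertisements over the network instead of reading a `healthy` flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int          # keepalived-style priority: higher wins
    healthy: bool = True   # stands in for "is this node still sending VRRP adverts?"


def vip_owner(nodes: List[Node]) -> Optional[Node]:
    """Return the node that should hold the virtual IP: the healthy node with
    the highest priority, or None if every node is down."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)


if __name__ == "__main__":
    lb1 = Node("lb1", priority=101)    # hypothetical master HAProxy host
    lb2 = Node("lb2", priority=100)    # hypothetical backup HAProxy host
    print(vip_owner([lb1, lb2]).name)  # lb1
    lb1.healthy = False                # master goes down
    print(vip_owner([lb1, lb2]).name)  # lb2 takes over the VIP
```

Because clients only ever talk to the VIP, a failover from lb1 to lb2 is invisible to them apart from the brief switchover.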
Advantages of HAProxy:
1. HAProxy supports virtual hosts and can work at layer 4 or layer 7 (and supports multiple network segments);
2. It makes up for some of Nginx's shortcomings, such as session persistence and cookie-based routing;
3. It supports checking the health of backend servers via a URL;
4. It is purely load-balancing software; in terms of raw efficiency, HAProxy balances load faster than Nginx and handles concurrency better;
5. HAProxy can load-balance MySQL reads, checking the health of backend MySQL nodes and balancing load across them.
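The combination of URL health checks and load balancing listed above can be pictured as round-robin that skips backends failing their health check. This is a minimal Python sketch, not HAProxy's implementation; the class name and backend names are hypothetical, and the caller sets health state directly instead of probing a check URL.

```python
import itertools


class RoundRobinBalancer:
    """Round-robin over backends, skipping any that failed their last health check."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = {b: True for b in self.backends}
        self._cycle = itertools.cycle(self.backends)

    def mark(self, backend, is_healthy):
        # In HAProxy this state would come from periodic checks (e.g. an HTTP
        # request to a check URL); here the caller sets it directly.
        self.healthy[backend] = is_healthy

    def next_backend(self):
        # Try each backend at most once per call so we stop if all are down.
        for _ in range(len(self.backends)):
            b = next(self._cycle)
            if self.healthy[b]:
                return b
        raise RuntimeError("no healthy backend")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2", "web3"])
    print([lb.next_backend() for _ in range(3)])  # ['web1', 'web2', 'web3']
    lb.mark("web2", False)
    print([lb.next_backend() for _ in range(3)])  # ['web1', 'web3', 'web1']
```

In an actual HAProxy configuration this roughly corresponds to `balance roundrobin` together with `option httpchk` in a backend section.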
0x03. Redis Caching Scheme
Caching is divided into server-side caching and in-application caching.
In-application caching is already handled by a module inside the Jue backend framework.
Server-side caching mainly caches files on the server, reducing the interaction between the web server and PHP, as well as between the load-balancing servers and the application servers.
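Server-side caching of this kind usually follows the cache-aside pattern: look in the cache first and only fall through to PHP on a miss. The sketch below is a minimal illustration under assumptions: `TTLCache` and `cached_render` are hypothetical names, and the in-memory dict stands in for Redis (a real setup would issue Redis `GET` and `SETEX` commands instead).

```python
import time


class TTLCache:
    """In-memory stand-in for a cache server such as Redis."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cached_render(cache, path, render, ttl_seconds=60):
    """Cache-aside: serve from the cache when possible, otherwise call the
    backend (e.g. PHP) once and store the result for `ttl_seconds`."""
    page = cache.get(path)
    if page is None:
        page = render(path)
        cache.set(path, page, ttl_seconds)
    return page
```

Within the TTL, repeated requests for the same path never reach `render`, which is exactly the reduced web-server-to-PHP traffic described above.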
Memcached is the classic caching choice, but here I use Redis as a lightweight caching solution.
For a comparison of Memcached and Redis, see the article "Memcached vs Redis?".
Redis stores data in several structures, such as lists, hashes, sets, and sorted sets. It can accept multiple commands at once, and it supports blocking reads, where a client waits until another process writes data into the cache.
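Redis's blocking reads can be pictured with Python's standard `queue.Queue`. This is only an analogy, assuming a single process: the queue stands in for a Redis list, `get(timeout=...)` plays the role of `BLPOP` with a timeout, and `put` plays the role of `RPUSH` from another client. The function name is hypothetical.

```python
import queue
import threading
import time


def blocking_pop_demo():
    """Mimic Redis BLPOP semantics: the consumer blocks until a producer
    pushes an item onto the list."""
    jobs = queue.Queue()  # stands in for a Redis list
    received = []

    def consumer():
        # Blocks here (up to 5 s) until the producer writes, like BLPOP with a timeout.
        received.append(jobs.get(timeout=5))

    t = threading.Thread(target=consumer)
    t.start()
    time.sleep(0.1)       # the consumer is now waiting on an empty list
    jobs.put("task-1")    # the producer writes; the consumer wakes up
    t.join()
    return received


if __name__ == "__main__":
    print(blocking_pop_demo())  # ['task-1']
```

With a real Redis server the consumer and producer would be separate processes or machines, which is what makes the blocking read useful as a lightweight work queue.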
An article on Redis caching: "High-availability, open-source Redis cache cluster solution".
0x04. Sphinx Search Engine Scheme
(Not part of the first phase; to be considered later when the need arises.)
Sphinx was developed in Russia and is billed as very powerful: full-text retrieval over tens of millions of records and indexing at around 10 MB/s.
Sphinx works together with a database such as MySQL as a full-text search engine: it reads the data from the database and builds its own inverted full-text index over it.
In principle, it is similar to querying MySQL from low-level C code. Everything is driven by a sphinx.conf configuration file, and both indexing and searching are based on it. For full-text search, you first configure sphinx.conf to tell Sphinx which fields need to be full-text indexed and which fields are used in WHERE, ORDER BY, and GROUP BY.
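To make the split between full-text fields and WHERE/ORDER BY attributes concrete, here is a toy inverted index in Python. It is not how Sphinx is implemented internally; the `CONFIG` dict only mimics the role of sphinx.conf, the field names (`title`, `body`, `author_id`, `created_at`) are hypothetical, and GROUP BY is left out for brevity.

```python
from collections import defaultdict

# Hypothetical config mirroring the split a sphinx.conf makes:
# full-text fields are tokenized and indexed; attributes are only
# used for filtering (WHERE) and sorting (ORDER BY).
CONFIG = {"fields": ["title", "body"], "attrs": ["author_id", "created_at"]}


def build_index(rows, config=CONFIG):
    """Build an inverted index (word -> set of doc ids) plus a table of attributes."""
    index = defaultdict(set)
    attrs = {}
    for row in rows:
        doc_id = row["id"]
        for field in config["fields"]:
            for word in row[field].lower().split():
                index[word].add(doc_id)
        attrs[doc_id] = {a: row[a] for a in config["attrs"]}
    return index, attrs


def search(index, attrs, query, where=None, order_by=None):
    """AND all query words together, then filter and sort on attributes."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    if where:
        hits = {d for d in hits if all(attrs[d][k] == v for k, v in where.items())}
    return sorted(hits, key=(lambda d: attrs[d][order_by]) if order_by else None)
```

Only the words in `fields` are searchable, while `attrs` values are never tokenized; that is the same distinction sphinx.conf asks you to declare up front.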
Further reading: "Sphinx Chinese".
0x05. NoSQL Fast Storage Scheme
Here NoSQL is used for small, miscellaneous data, such as the CSS values of a user's personal site (height, width, color, and so on). This data is small but highly varied, so keeping it in NoSQL speeds up access and reduces SELECT queries against MySQL.
There are many NoSQL options; I chose MongoDB for its simplicity.
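The "small but varied" data fits a document model: each user's record can carry different keys without a schema change. The sketch below uses an in-memory dict as a stand-in for a MongoDB collection (assumption: with pymongo this would be `find_one({"_id": user_id})` and `update_one(..., {"$set": ...}, upsert=True)`); `StyleStore` and the default values are hypothetical.

```python
# Hypothetical site-wide defaults; each user document only stores overrides.
DEFAULT_STYLE = {"width": "960px", "color": "#333"}


class StyleStore:
    """In-memory stand-in for a MongoDB collection of per-user style documents."""

    def __init__(self):
        self._docs = {}  # user_id -> free-form document; fields vary per user

    def set_style(self, user_id, **props):
        # Like an upsert with $set: create the document if needed, update fields.
        self._docs.setdefault(user_id, {}).update(props)

    def get_style(self, user_id):
        # Merge per-user overrides onto the defaults; no MySQL SELECT involved.
        return {**DEFAULT_STYLE, **self._docs.get(user_id, {})}


if __name__ == "__main__":
    store = StyleStore()
    store.set_style(1, height="200px", color="red")
    print(store.get_style(1))  # {'width': '960px', 'color': 'red', 'height': '200px'}
    print(store.get_style(2))  # {'width': '960px', 'color': '#333'}
```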
0x06. Distributed MySQL Scheme
(I have not tried distributed MySQL yet, and early on it is unclear how much load MySQL will face, so the first phase does not plan to use distributed MySQL.)
Further reading: "5 open-source, MySQL-compatible alternatives to the standard MySQL database".
0x07. Distributed Cluster Scheme
Putting it all together, the architecture is basically the model below. This is a preliminary discussion of the distributed architecture; many modules will be adjusted as circumstances change, and this post will keep being updated. To be continued...
This article is by Summer Grass; if you repost it, please credit the source: http://homeway.me/2014/12/10/think-about-distributed-clusters/
Figure: A distributed server cluster architecture scheme