Big data and high concurrency solutions roundup

1.3 Massive data solutions

1. Use caching. Two common approaches: (1) keep the data in memory inside the application itself, usually in a map, especially ConcurrentHashMap; (2) use a caching framework such as Ehcache, Memcached, or Redis. The key questions are when a cache entry is created and how it is invalidated. For empty results, it is best to cache a specific sentinel value so that "no data exists" can be distinguished from "not cached yet" (a small sketch of this follows the list).
2. Database optimization: (1) table structure optimization; (2) SQL optimization, covering both syntax and processing logic; recording the execution time of each statement allows targeted analysis; (3) partitioning; (4) splitting tables; (5) index optimization; (6) using stored procedures instead of direct statements.
3. Separate active data. For example, users can be split into active and inactive users.
4. Batch reads and deferred writes. Under high concurrency, multiple query requests can be merged into one, and frequent modifications can be staged in the cache before being written to the database.
5. Read/write separation. Configure multiple database servers in a master-slave arrangement: writes go to the master, reads go to the slaves (a routing sketch also follows the list).
6. Distributed databases. Different tables are stored in different databases and therefore on different servers. This brings complex problems such as transaction handling and multi-table queries.
7. NoSQL and Hadoop. NoSQL ("not only SQL") databases drop many of the restrictions of relational databases and are more flexible and efficient. In Hadoop, the data of a table is split into multiple blocks stored on multiple nodes (distributed), and each block is replicated on several nodes (clustered). The cluster can process the same data in parallel and guarantees data integrity.
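To make point 1 concrete, here is a minimal sketch (in Java, matching the ConcurrentHashMap mentioned above) of caching empty results with a sentinel value so that "no data exists" and "not cached yet" can be told apart. The database lookup is a hypothetical placeholder.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal sketch of caching empty results: a sentinel object is stored for
 * keys that genuinely have no data, so a repeated miss does not hit the
 * database again. ConcurrentHashMap does not allow null values, which is
 * another reason a sentinel is needed.
 */
public class NullAwareCache {

    // Sentinel stored for keys that were looked up and found to have no data.
    private static final Object EMPTY = new Object();

    private final Map<Long, Object> cache = new ConcurrentHashMap<>();

    public String getUserName(long id) {
        Object cached = cache.get(id);
        if (cached != null) {
            // Cache hit: either real data or the "known to be empty" sentinel.
            return cached == EMPTY ? null : (String) cached;
        }
        // Cache miss: query the database (placeholder call), then cache the
        // result, storing the sentinel instead of null.
        String name = findUserNameInDatabase(id);
        cache.put(id, name == null ? EMPTY : name);
        return name;
    }

    // Hypothetical stand-in for a real database query.
    private String findUserNameInDatabase(long id) {
        return id % 2 == 0 ? "user-" + id : null;
    }
}
```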
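And for point 5, a minimal sketch of routing reads and writes to different data sources. The master and slave DataSource instances are assumed to be configured elsewhere; only the routing decision is shown.

```java
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

/**
 * Sketch of read/write separation: writes go to the master DataSource,
 * reads are spread round-robin across the slaves.
 */
public class ReadWriteRouter {

    private final DataSource master;
    private final DataSource[] slaves;
    private int next = 0;

    public ReadWriteRouter(DataSource master, DataSource... slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    /** All INSERT/UPDATE/DELETE statements should use this connection. */
    public Connection writeConnection() throws SQLException {
        return master.getConnection();
    }

    /** SELECT statements are distributed round-robin across the slaves. */
    public synchronized Connection readConnection() throws SQLException {
        if (slaves.length == 0) {
            return master.getConnection(); // fall back to master if no slaves
        }
        DataSource slave = slaves[next];
        next = (next + 1) % slaves.length;
        return slave.getConnection();
    }
}
```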
1.4 High concurrency solutions

1. Separate the application from static resources. Place static resources (JS, CSS, images, etc.) on dedicated servers.
2. Page caching. Caching the pages generated by the application saves a lot of CPU. Parts of a page whose data changes frequently can be filled in with AJAX.
3. Clusters and distribution. In a cluster, multiple servers provide the same function; their main role is to spread the load. In a distributed setup, different services run on different servers, so handling one request may involve several servers, which speeds up the processing of that request. Clusters further divide into static-resource clusters and application clusters; the latter are more complex and usually have to deal with problems such as session synchronization.
4. Reverse proxy. The server the client accesses directly does not provide the service itself; it fetches resources from other servers and returns the result to the user. Proxy server versus reverse proxy server: a forward proxy fetches resources on our behalf and returns the results, for example a proxy used to reach an external network. A reverse proxy is a server that, when we access it normally, itself calls other servers behind the scenes. We use a forward proxy actively, it serves us, and it needs no domain name of its own; a reverse proxy belongs to the server side, we are unaware of it, and it has its own domain name (a toy sketch follows this list).
5. CDN. A CDN is a special kind of page-cache cluster. Compared with an ordinary cluster of page cache servers, the main differences are where the servers are located and how requests are assigned to them. CDN servers are distributed across the country; when a request arrives it is routed to the most suitable CDN node to fetch the data. Each CDN node is itself a page cache server. Request assignment is not ordinary load balancing but happens during domain name resolution by a dedicated CDN DNS server: the usual practice is that the site's DNS uses a CNAME to map the domain to a CDN-specific domain, which the CDN's own DNS server then resolves, returning to the browser the address of the appropriate CDN node. Each node may in turn be a cluster of several servers.
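As a toy illustration of point 4, the sketch below builds a tiny reverse proxy with the JDK's built-in com.sun.net.httpserver and java.net.http APIs: the client talks only to this process, which fetches the real content from a backend and relays it. The backend address and ports are assumptions, only GET requests are forwarded, and real deployments would normally use software such as Nginx instead of hand-rolled code.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Toy reverse proxy: clients only ever see port 8080; the upstream is hidden. */
public class TinyReverseProxy {

    private static final String BACKEND = "http://localhost:9000"; // hypothetical upstream

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        server.createContext("/", exchange -> {
            try {
                // Forward the incoming path and query string to the backend.
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(BACKEND + exchange.getRequestURI()))
                        .GET()
                        .build();
                HttpResponse<byte[]> upstream =
                        client.send(request, HttpResponse.BodyHandlers.ofByteArray());

                // Return the backend's response to the client as our own.
                byte[] body = upstream.body();
                exchange.sendResponseHeaders(upstream.statusCode(),
                        body.length == 0 ? -1 : body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } catch (InterruptedException e) {
                exchange.sendResponseHeaders(502, -1); // upstream call was interrupted
            }
        });
        server.start();
    }
}
```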
You can see that the business logic for handling high concurrency is layered roughly as follows:

- Front end: asynchronous requests + static resources + CDN
- Back end: request queue + poll distribution + load balancing + shared cache (see the sketch after this list)
- Data tier: Redis cache + table sharding + write queue
- Storage: RAID arrays + hot standby
- Network: DNS round-robin + DDoS protection
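A minimal sketch of the "request queue + poll distribution" idea from the back-end bullet: incoming requests are buffered in a bounded queue, a fixed pool of workers polls them off, and the producer fails fast when the queue is full (back-pressure instead of unbounded pile-up). The Request record and handle() method are hypothetical placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestQueueDemo {

    record Request(int id) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Request> queue = new ArrayBlockingQueue<>(1000); // bounded buffer
        ExecutorService workers = Executors.newFixedThreadPool(4);     // poll distribution

        // Workers poll the queue and process requests one by one.
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        handle(queue.take());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // exit the loop
                    }
                }
            });
        }

        // Producer side: offer() fails fast when the queue is full.
        for (int id = 1; id <= 10; id++) {
            if (!queue.offer(new Request(id))) {
                System.out.println("rejected request " + id);
            }
        }

        Thread.sleep(500);     // give workers time to drain the queue
        workers.shutdownNow(); // interrupts workers blocked in take()
    }

    private static void handle(Request request) {
        System.out.println("handled request " + request.id());
    }
}
```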
The entire evolution of a website's architecture revolves around big data and high concurrency. The solutions boil down to two things: caching and using more resources, where "more resources" means more storage, more CPUs, and more network capacity, so that a single request can be handled by one resource or spread across several. Before reaching for complex frameworks, optimize the project's own business logic first; that is the foundation of everything else and the most important step.