In fact, any simple problem becomes a hard one once the scale is large enough, just as China's huge population turns many minor issues into major ones. The methods for dealing with massive data boil down to divide and conquer plus the "human wave" tactic. The precondition for the human-wave tactic is that the problem can actually be divided to support it, and the means are essentially partitioning (vertical and horizontal) and load balancing. Vertical partitioning mainly divides by business (function), that is, a service-oriented architecture. There are many ways to partition horizontally, mostly based on attributes of the data being processed, for example a time attribute or a specific business attribute (say, the train number in railway ticketing, since each train's operation is basically independent). Load balancing can be replica distribution (deploying several copies of the same function) or computation distribution (splitting one problem into sub-problems that run on different machines and then merging the results). Of course, these methods can be combined, and in the end you arrive at a multi-pipeline distributed computing model. On the other hand, when facing an ocean of data, generic processing methods struggle badly; efficient methods are basically tailored to the specific business and data.
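To make the train-number example concrete, here is a minimal sketch of horizontal partitioning by a business attribute; the shard count and database naming are assumptions for illustration only, not how any real ticketing system is laid out:

```python
import hashlib

NUM_SHARDS = 8  # assumption: a fixed number of horizontal shards

def shard_for_train(train_no: str) -> str:
    """Route all data for one train to a single shard, since each
    train's operation is basically independent of the others."""
    digest = hashlib.md5(train_no.encode("utf-8")).hexdigest()
    return f"ticket_db_{int(digest, 16) % NUM_SHARDS}"

print(shard_for_train("G1234"))  # which shard you get depends on the hash
```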
1) The basic idea of massive data processing: divide and conquer (this idea is everywhere in daily life; even an ant knows that when one trip is not enough, it makes several trips)
2) The basic means of massive data processing: partitioning and load balancing (partitioning reduces the problem size; load balancing is the human-wave tactic: many people, and likewise many machines, add up to strong computing power; see the sketch after this list)
3) Reliability assurance for massive data processing: keep a few extra copies (even good machines break down, and eggs should not all be placed in one basket)
4) The highest form of massive data processing: multi-pipeline parallel jobs (many factories work this way, and it is fine to use the same idea on computers)
5) The best method for massive data processing: there is no best, only most suitable (if you try to do everything well, you will basically do nothing well)
....
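As a rough illustration of computation distribution (split one problem into sub-problems, run them on different workers, then merge the results), here is a toy sketch that uses a local process pool as a stand-in for separate machines; the workload and worker count are made up:

```python
from multiprocessing import Pool

def count_matches(chunk):
    """Sub-problem: count the records in one chunk that satisfy some condition."""
    return sum(1 for record in chunk if record % 2 == 0)

def scatter_gather(records, workers=4):
    """Split the data, run the sub-problems in parallel, then merge the results."""
    chunk_size = max(1, len(records) // workers)
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with Pool(workers) as pool:
        partial_results = pool.map(count_matches, chunks)
    return sum(partial_results)  # the merge step

if __name__ == "__main__":
    print(scatter_gather(list(range(1_000_000))))
```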
As for high-concurrency processing, the best approach is to use specific methods for specific needs. The basic means are locking and queuing, and another key is to simplify and shrink transactions as much as possible.
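A toy sketch of the queuing idea: all orders for one piece of inventory go through a single queue and a single worker, so updates are serialized without fine-grained locking, and the critical "transaction" stays as small as possible. The train number and seat counts here are invented:

```python
import queue
import threading

# assumption: one worker serializes all stock updates, so no row locks are needed
orders = queue.Queue()
stock = {"G1234": 100}          # remaining seats per train (toy data)
results = []

def seller():
    while True:
        train_no = orders.get()
        # keep the critical "transaction" as small as possible: check and decrement only
        if stock.get(train_no, 0) > 0:
            stock[train_no] -= 1
            results.append((train_no, "sold"))
        else:
            results.append((train_no, "sold out"))
        orders.task_done()

threading.Thread(target=seller, daemon=True).start()

for _ in range(3):
    orders.put("G1234")
orders.join()                    # wait until the queue is drained
print(results, stock["G1234"])   # three 'sold' entries, 97 seats left
```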
With this kind of awareness, as long as you think it through, you can always work out a solution, and there is no need to be intimidated by these technologies. Technically, the ideas and algorithms involved in massive data processing are not that difficult.
PS: Many people have been mocking the railway online ticketing system lately, and many are pushing their own designs. I don't think that is necessary. Honestly, these ideas and techniques are not that hard; if even I can think of them, the brothers and sisters who built the online ticket sales system can surely think of them too. As for why the result is what it is, that has more to do with "you know what" than with technology. The railway is a place where politics comes first; if the emperor himself is not in a hurry, why should anyone else be?
A supplement on data division: there are two common cases, splitting into separate databases (in the early days many enterprise business systems, especially financial systems, did this) and table sharding (this is generally done only for specific business tables). When dividing by time, pay attention to single business transactions that cross time periods (much software handles this by closing the books and carrying such data over into the new period).
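A minimal sketch of time-based table sharding, including the carry-over of still-open business at period close; the per-month table naming convention is an assumption:

```python
from datetime import date

def table_for(business_date: date, base: str = "orders") -> str:
    """Route a record to a per-month table, e.g. orders_2023_01 (naming is assumed)."""
    return f"{base}_{business_date.year}_{business_date.month:02d}"

def carry_over(open_items, closing_date: date):
    """At period close, assign still-open business to the next period's table
    so a single transaction never has to span two time partitions."""
    next_month = date(closing_date.year + (closing_date.month == 12),
                      closing_date.month % 12 + 1, 1)
    return [(item, table_for(next_month)) for item in open_items]

print(table_for(date(2023, 1, 15)))              # orders_2023_01
print(carry_over(["order-42"], date(2023, 1, 31)))
```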
Another supplement on data division: partitioning by a specific attribute. The most common approach is to partition by data ownership, for example the original separate sets of books, or the tenant ID (enterprise user ID) under cloud computing. This can be implemented at three levels: the table level, the database level (e.g. Oracle sub-users/schemas), and the physical level (multiple database instances). Combined with caching and load balancing, it can be scaled out almost without limit. Under the existing database model, reliability can only be guaranteed by the database itself; software can also keep the same data in several places, but that is complicated. In addition, database links can be used so that vertical database splits remain transparent to applications, but this approach is hard to maintain and is often unnecessary. (Both Oracle and SQL Server support it, and joins can be performed across different databases. It looks convenient, but it is not recommended: closely related business data should be kept together, joins over links between different databases should be avoided, and referencing the data directly in memory is faster.)
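A small sketch of routing by data ownership (tenant ID), showing how the split can sit at the table level, the schema level, or the instance level; the routing table and names here are invented for illustration:

```python
# assumption: a static routing map; in practice this would come from configuration
TENANT_ROUTES = {
    "tenant_a": {"instance": "db-node-1", "schema": "tenant_a", "table_suffix": ""},
    "tenant_b": {"instance": "db-node-1", "schema": "tenant_b", "table_suffix": ""},
    "tenant_c": {"instance": "db-node-2", "schema": "shared",   "table_suffix": "_c"},
}

def resolve(tenant_id: str, table: str) -> str:
    """Return a fully qualified table name for a tenant: the split can happen at the
    physical level (instance), the database/schema level, or the table level."""
    route = TENANT_ROUTES[tenant_id]
    return f"{route['instance']}.{route['schema']}.{table}{route['table_suffix']}"

print(resolve("tenant_c", "invoice"))   # db-node-2.shared.invoice_c
```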
As mentioned above, in a couple of days I will release a demo of my architecture. Of course, I cannot release the official version (it is not finished anyway), and it is also the company's property.
Two more figures:
By dynamically handling the configuration files at the data-access scheduling layer and the database access layer, some databases can be placed in a different data center and data can be accessed across data centers.
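Along the lines of those figures, here is a hedged sketch of what a configuration-driven data-access scheduling layer might look like; the logical database names, data centers, and connection strings are all made up:

```python
import json

# assumption: this JSON stands in for the configuration file the scheduling layer reads
CONFIG = json.loads("""
{
  "ledger_db":  {"datacenter": "dc-east", "dsn": "host=10.0.1.5 dbname=ledger"},
  "archive_db": {"datacenter": "dc-west", "dsn": "host=10.1.7.9 dbname=archive"}
}
""")

def connection_for(logical_db: str, local_dc: str = "dc-east"):
    """Pick the connection for a logical database and flag cross-data-center access."""
    entry = CONFIG[logical_db]
    cross_dc = entry["datacenter"] != local_dc
    return entry["dsn"], cross_dc

print(connection_for("archive_db"))   # ('host=10.1.7.9 dbname=archive', True)
```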