The deification of massive data processing and high-concurrency processing

Source: Internet
Author: User

http://blog.csdn.net/hawksoft/article/details/7192207

Any simple problem becomes a hard one once its scale is large enough; China's population is an example, where many small problems turn into big ones. The way to handle a huge volume of data is divide and conquer, the "human sea" tactic of throwing many workers at the job. That tactic only works if the problem can be partitioned to support it, and partitioning comes down to two things: splitting (vertical and horizontal) and load balancing. Vertical splitting is mainly by business function, which is what a service-oriented architecture amounts to. Horizontal splitting has more variants and mostly relies on attributes of the objects being processed, such as a time attribute or a specific business attribute (for train tickets, the train number, since each train operates essentially independently). Load balancing can be mirror distribution (the same functionality deployed on several machines) or compute distribution (one problem divided into sub-problems that run on different machines, with the results merged afterwards). These techniques can be combined and ultimately built into a multi-pipeline distributed computing model. On the other hand, in the face of massive data, general-purpose processing methods struggle; the efficient methods are basically specific to the business and to the data.
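As a rough illustration of the mirror distribution described above, the sketch below hands requests to identical deployments in turn. The class name RoundRobinBalancer and the mirror names are hypothetical, and round-robin is only one possible policy; the original article does not prescribe any.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical balancer: every mirror runs the same functionality."""

    def __init__(self, mirrors):
        mirrors = list(mirrors)
        if not mirrors:
            raise ValueError("at least one mirror is required")
        # cycle() repeats the mirrors endlessly in their original order.
        self._cycle = itertools.cycle(mirrors)

    def pick(self):
        """Return the mirror that should serve the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
targets = [balancer.pick() for _ in range(5)]
```

Because every mirror is interchangeable, adding capacity only means adding another entry to the list.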

1) The basic idea of massive data processing: divide and conquer (the idea is everywhere in daily life; even ants know that what cannot be carried in one trip is carried in several)
2) The basic means of massive data processing: splitting and load balancing (splitting reduces the scale; load balancing is the human-sea tactic: more people means more strength, and likewise more machines means more computing power)
3) The reliability of massive data processing: keep several copies (even a good machine will fail; do not put all your eggs in one basket)
4) The highest level of massive data processing: multi-pipeline parallel operation (many factories already work this way, and doing it with computers is no problem)
5) The best method of massive data processing: there is no best, only the most suitable (trying to do everything well basically amounts to doing nothing well)
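Points 1) and 2) can be sketched in a few lines: split the records horizontally by a business attribute, process each part independently, then merge the partial results. This is only a minimal sketch under assumptions not in the original: the record shape (train number, seats sold), the function names, and the use of threads in one process as stand-ins for separate machines.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_parts):
    """Horizontal split: route each record to a part by its train number."""
    parts = [[] for _ in range(num_parts)]
    for train_no, seats in records:
        # crc32 gives a stable hash, so a train always lands in the same part.
        index = zlib.crc32(train_no.encode("utf-8")) % num_parts
        parts[index].append((train_no, seats))
    return parts


def count_sold(part):
    """Sub-problem: total seats sold per train within one part."""
    totals = Counter()
    for train_no, seats in part:
        totals[train_no] += seats
    return totals


def sold_per_train(records, num_parts=4):
    """Split, process the parts in parallel, then merge the partial results."""
    parts = partition(records, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        partials = list(pool.map(count_sold, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


records = [("G101", 2), ("D22", 1), ("G101", 3), ("K5", 4)]
totals = sold_per_train(records)
```

Since all records for one train go to the same part, the merge step never has to reconcile conflicting counts; that independence is what makes the split cheap.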

....

As for high concurrency, the best solution is likewise to use specific methods for specific requirements; the basic means include locking, queuing, and so on. Another key is to keep transactions as simple as possible and to reduce their number as much as possible.
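The two means named above, locking and queuing, might look like this for a single train's seat inventory. It is a hedged sketch: SeatInventory, sell_with_lock and sell_with_queue are hypothetical names, and an in-process lock and queue stand in for whatever a real system would use across machines.

```python
import queue
import threading


class SeatInventory:
    """Locking: the critical section is kept as short as possible."""

    def __init__(self, seats):
        self._seats = seats
        self._lock = threading.Lock()

    def reserve(self):
        with self._lock:
            if self._seats > 0:
                self._seats -= 1
                return True
            return False


def sell_with_lock(inventory, buyers):
    """Many buyers compete at once; the lock prevents overselling."""
    results = [False] * buyers

    def buy(i):
        results[i] = inventory.reserve()

    threads = [threading.Thread(target=buy, args=(i,)) for i in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(True)


def sell_with_queue(seats, buyers):
    """Queuing: one worker handles requests in arrival order, so no lock is needed."""
    requests = queue.Queue()
    sold_to = []

    def worker():
        remaining = seats
        while True:
            buyer = requests.get()
            if buyer is None:  # sentinel: no more requests
                break
            if remaining > 0:
                remaining -= 1
                sold_to.append(buyer)

    w = threading.Thread(target=worker)
    w.start()
    for buyer in range(buyers):
        requests.put(buyer)
    requests.put(None)
    w.join()
    return sold_to
```

Both variants keep the unit of work tiny (check and decrement), which is the "simplify transactions" point in practice.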

With this understanding, these problems can always be solved if you put some thought into them; there is no need to make these technologies out to be godlike. Technically speaking, the ideas and algorithms behind massive data processing are not difficult.

PS: These days many people are scorning the railway's online ticketing system, and many are offering it advice. I don't think that is necessary. Really, these ideas and techniques are not very difficult; if I can think of them, the brothers and sisters who built the online ticketing system can too. As for why it turned out this way, they have simply "been made" to look technically incompetent. The railway is a place where politics comes first, so why should the eunuch fret when the emperor is in no hurry?

A supplement on data partitioning: partitioning by time comes in two forms, separate databases (common in early enterprise business systems, especially financial systems) and separate tables (generally used only for specific business tables). Partitioning by time requires attention to business that spans periods (much software handles this by closing the accounts and carrying the data forward into the new period).
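A minimal sketch of the separate-tables variant, assuming one table per month. The table naming scheme (ledger_YYYY_MM) and all function names are assumptions made for illustration, and close_period reduces account closing to a single carried-forward balance.

```python
from datetime import date


def table_for(day, base="ledger"):
    """Route a record to its monthly table, e.g. ledger_2012_01."""
    return f"{base}_{day.year:04d}_{day.month:02d}"


def tables_for_range(start, end, base="ledger"):
    """A query that spans periods has to touch every monthly table in the range."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{base}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def close_period(opening_balance, entries):
    """Closing: fold a period's entries into the balance carried into the next period."""
    return opening_balance + sum(entries)


january_opening = close_period(100, [50, -30])
```

With the closing balance carried forward, most day-to-day queries only need the current period's table, and tables_for_range is reserved for the rarer cross-period cases.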

2012-1-11: A further supplement on data partitioning. When partitioning by a specific attribute, the most common choice is data ownership, such as the account set in older accounting software or the tenant ID (enterprise user ID) in today's multi-tenant cloud systems. This can be implemented at three levels: table level, database level (separate users in Oracle), and physical level (multiple database instances). With some attention to caching and the use of load balancing, it can in principle be extended without limit. Because this model is built on an existing database, reliability can only be guaranteed by the database itself; software could also store the same data in multiple places, but that is more complex. Database links can also be used to split databases vertically in a way that is transparent to the application, but this is more troublesome to maintain and is often unnecessary. (Oracle and SQL Server both support it, and tables in different databases can be joined, which looks very convenient, but it is not recommended: keep closely related business data together, and between different databases avoid joining over links; combining the data in memory is faster anyway.)
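The three levels can be pictured as a small routing function that maps a tenant ID to where its data lives. Everything named here (Location, ROUTES, the instance names db-01 and db-02, the tenant names) is hypothetical, and the lru_cache is just one simple way to cache the lookup.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Location:
    instance: str  # physical level: which database instance
    schema: str    # database level: which schema (an Oracle user)
    table: str     # table level: which table


# Hypothetical routing configuration; tenants not listed share one schema.
ROUTES = {
    "acme": {"level": "instance", "instance": "db-02"},
    "globex": {"level": "schema"},
}
DEFAULT_INSTANCE = "db-01"
SHARED_SCHEMA = "shared"


@lru_cache(maxsize=1024)
def locate(tenant_id, base_table="orders"):
    """Resolve a tenant's data location; results are cached per tenant."""
    route = ROUTES.get(tenant_id, {"level": "table"})
    if route["level"] == "instance":
        return Location(route["instance"], tenant_id, base_table)
    if route["level"] == "schema":
        return Location(DEFAULT_INSTANCE, tenant_id, base_table)
    # Table level: one table per tenant inside the shared schema.
    return Location(DEFAULT_INSTANCE, SHARED_SCHEMA, f"{base_table}_{tenant_id}")


where = locate("initech")
```

A large tenant can be moved from table level to its own schema or instance by changing its ROUTES entry (and clearing the cache with locate.cache_clear()), without touching application queries.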

That is all for now. When I have time in a couple of days I will post my architecture demo. The production version, of course, cannot be posted (and will not be); its copyright belongs to the company.

Two diagrams added:

Simply by doing dynamic processing through configuration files in the data access scheduling layer and the database access layer, you can store part of the data in a data-center database and access data across data centers.
