Ever since the concept of big data emerged, it has spread like a virus: anyone in IT who admits to not understanding it seems almost embarrassed, and more and more people at major software companies are getting drawn into the field. I have been doing data processing work for a long time, and the workflow has always been the same: the operators provide an interface (such as FTP); from that interface we fetch various kinds of files (CSV, XML, even binary formats); we parse the files and load the required information into the database; after loading, some of the data is aggregated by time granularity or spatial granularity; and then, as far as I am concerned, there is no "and then". Another group of people develops the upper-layer application, taking the raw or aggregated data in the database and building a slick interface to present that information to customers.
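To make the workflow above concrete, here is a minimal sketch in Python, assuming a hypothetical FTP endpoint, a made-up CSV layout (ts, region, value), and a local SQLite database; all names and fields are illustrative, not the actual production setup.

```python
# A rough sketch of the fetch -> parse -> load -> aggregate workflow described above.
# The FTP host, CSV columns, and table names are assumptions for illustration only.
import csv
import io
import sqlite3
from ftplib import FTP

def fetch_csv(host: str, user: str, password: str, remote_path: str) -> str:
    """Download one CSV file from the operator-provided FTP interface."""
    buf = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrbinary(f"RETR {remote_path}", buf.write)
    return buf.getvalue().decode("utf-8")

def load_and_aggregate(csv_text: str, db_path: str = "metrics.db") -> None:
    """Parse the file, store the raw rows, then aggregate by hour and region."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS raw (ts TEXT, region TEXT, value REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS hourly (hour TEXT, region TEXT, total REAL)")
    rows = [(r["ts"], r["region"], float(r["value"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO raw VALUES (?, ?, ?)", rows)
    # The "time/spatial granularity" step: sum values per hour and per region.
    # Assumes ISO timestamps like "2014-06-01T08:15:00", so the hour is the first 13 chars.
    conn.execute("""INSERT INTO hourly
                    SELECT substr(ts, 1, 13), region, SUM(value)
                    FROM raw GROUP BY substr(ts, 1, 13), region""")
    conn.commit()
    conn.close()
```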
The reason for pushing big-data technologies is that the amount of data to be processed keeps growing, and the trend will continue; in addition, because the data arrives continuously, there are real requirements on processing efficiency (a new batch is generally produced every hour, and if one batch cannot be processed within that hour, the consequences are easy to imagine). For these reasons, big-data technologies such as Hadoop were introduced. However, that was the only difference: the data processing part of the original program was simply swapped for the "high-tech" approach, and the aggregated results were obtained directly and imported into the database. Once processed, the data had fulfilled its historical mission, and after a while it was "reduced to ashes" and discarded.
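For illustration, the same hourly aggregation could be expressed as a Hadoop Streaming job, with a mapper and a reducer written as small Python scripts. This is only a sketch of that substitution, and the input column layout (ts, region, value) is again an assumption.

```python
#!/usr/bin/env python3
# mapper.py -- emits (hour, region) as the key and the metric value as the value.
# The comma-separated input layout is assumed for illustration.
import sys

for line in sys.stdin:
    ts, region, value = line.rstrip("\n").split(",")
    hour = ts[:13]                      # e.g. "2014-06-01T08"
    print(f"{hour},{region}\t{value}")  # key\tvalue pairs for the shuffle phase
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the values for each (hour, region) key emitted by mapper.py.
# Hadoop Streaming delivers the mapper output sorted by key, one line per record.
import sys

current_key, total = None, 0.0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{total}")
        current_key, total = key, 0.0
    total += float(value)
if current_key is not None:
    print(f"{current_key}\t{total}")
```

The two scripts would be submitted with the standard hadoop-streaming jar, and the reducer output would then be imported into the database, which is exactly the "only difference" described above: the surrounding workflow stays the same.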
Is this the "big data" of legend? I deeply doubt it; at best, it is merely data of a somewhat larger volume.
In fact, after work I studied the legendary "big data" and found that it is nothing like this. It is a new mode of thinking, and it may have nothing whatsoever to do with the size of the data itself. At the same time, I also came to feel that, after so long in such a comfortable working environment, I really had fallen somewhat behind. Maybe it is time to recharge and thoroughly upgrade myself.
However, out of long-standing habit, I always have a tendency to resist authority: I like to understand things in my own way rather than copying a theory wholesale, and I always like to express my understanding in a completely alternative fashion, hence the text below. Let me state up front that this piece is purely for entertainment. If some part of it helps you even a little and becomes part of your "intangible cultural heritage", that counts as one good deed; if you feel it has no real value, do not waste your precious time, just disregard it. Of course, since I am a beginner, there will be some deviations or shortcomings in my understanding, so you are welcome to conduct an academic discussion with me in a strictly businesslike manner.
As business volume grows, the amount of data to be processed keeps increasing, and of course the processing capacity of the corresponding servers improves as well. For now, if the data does not reach tens of thousands of records, the processing time is largely negligible. For tens of thousands of records, processing can be finished in seconds (admittedly, that is not a very large amount). At the hundred-thousand scale, as long as the processing flow is reasonable, finishing within minutes is still possible. At the million scale, as long as the flow is reasonable enough and the server powerful enough, the time consumed is still within a tolerable range. At the ten-million scale, what is needed most is patience, again provided that the flow is reasonable and the server powerful enough. As long as it remains tolerable, the order of magnitude keeps climbing, until it simply can no longer be handled.
Perhaps some would argue that the legendary "big data" is precisely what traditional technology cannot handle, because the amount of data exceeds a certain threshold. But is that actually the case?
How much data does it take to be called "big data"? This seems to be a pseudo-proposition, like "how few hairs count as bald": a question with no definitive answer. Of course, we could decree that fewer than 100 hairs counts as bald, but what about 101 hairs, or 102? How was that number chosen? That is already a bit of a struggle, but more importantly, even if there were a number that could serve as the watershed between bald and not bald, who would have the leisure to count how many hairs a person has, even a person without much hair? The same is true for big data: first, there is no exact value or order of magnitude that can serve as the dividing line for big data; and second, with such a huge volume, who would be free to count it record by record?
In fact, "big data" is not an exact name, it has a certain deceptive nature. The so-called "big data" does not lie in its large amount of data, but in its data-"all". By analyzing all the data to find the appropriate rules to predict the future, this is the main idea of large data.
Viktor Mayer-Schönberger, in his book "Big Data: A Revolution That Will Transform How We Live, Work, and Think", summarized three changes that the big data era requires us to make to the traditional way of thinking; they can be regarded as the three core ideas of big data thinking. I am not going to be unconventional here, and will follow that line of thought. However, we still need to look at this viewpoint critically, neither dismissing it as worthless, as some comments on the internet do, nor accepting all of it without hesitation. In the Marxist philosophical view, only by refining it, discarding the false and keeping the true, taking the essence and casting off the dross, can we really understand its substance and thereby inherit and carry it forward.