Automatic big data mining is the true significance of big data.

Source: Internet
Author: User

Http://www.cognoschina.net/club/thread-66425-1-1.html for reference only


Big data is all the rage these days, and almost everyone is talking about it. But what is big data? I'm afraid not many people really know, and too many are just bluffing.

Big data does not mean a lot of data.

So storing a lot of data does not mean you are doing big data. "Big data" is only shorthand; the full term should be "big data mining". Big data that has not been mined is like crude oil still in the ground: it is of no use at all.

Big data does not mean data mining in the general sense.

Many people used to do data analysis or data mining. When the book "Big Data Age" was published and big data caught fire, they turned into big data experts overnight. If that were all there is to it, there would be no need to talk about big data, because it would already exist under another name. It would be like saying "drinking H2O" instead of "drinking water". That is just playing with concepts.

"Big data mining" is still not the complete term. The complete term should be "automatic big data mining".

Traditional data analysis or mining means that people analyze the data themselves and extract regularities for later use.

With big data, however, the volume is too large and the data often has many dimensions. People cannot process data on that scale, and may not even know where to start, so a computer must process it automatically to mine the rules hidden in the data.

At present, however, computers cannot reason with the rigorous, complex logic that people use, so they cannot analyze data with human thinking models. People can often work out the rules from a small amount of data, but are helpless when there is too much, which is why humans resort to sampling.

Computers are just the opposite. They cannot infer rules from a small amount of data, but they compute very quickly, so they can find the rules by processing massive amounts of data.

Because a computer cannot carry out complex logical reasoning, its method is very simple: plain statistics, or brute-force computation. It counts which results occur in which situations, and when a similar situation appears again, it tells us which results are likely.
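The counting idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `build_counts` and `predict` and the toy weather history are hypothetical and do not come from any real system.

```python
from collections import Counter, defaultdict

def build_counts(observations):
    """Count how often each outcome was observed in each situation."""
    counts = defaultdict(Counter)
    for situation, outcome in observations:
        counts[situation][outcome] += 1
    return counts

def predict(counts, situation):
    """Return (most frequent outcome, its observed share), or None if unseen."""
    seen = counts.get(situation)
    if not seen:
        return None
    outcome, n = seen.most_common(1)[0]
    return outcome, n / sum(seen.values())

# Hypothetical toy history of (situation, outcome) pairs.
history = [
    ("cloudy", "rain"), ("cloudy", "rain"), ("cloudy", "dry"),
    ("sunny", "dry"), ("sunny", "dry"), ("cloudy", "rain"),
]
counts = build_counts(history)
prediction = predict(counts, "cloudy")
print(prediction)
```

With only six observations the estimate is of course unreliable; the point of the article is that the same counting becomes useful once the history is very large.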

Here we can see another feature of big data: it is mainly used for prediction, telling you directly what is likely to happen, rather than describing past trends and the status quo and leaving people to judge the future.

Why is such a simple method effective? This goes back to the word "big": because the data volume is very large, the statistical results are usually correct.

Everyone knows this example. Suppose you toss a coin to estimate the probability of heads and tails. If you toss it only 10 times, heads may come up nine times, and a conclusion drawn from that would certainly be wrong. But if you toss it 100,000 times, 1 million times, or more, the statistics come out essentially right: heads and tails each appear about 50% of the time.

Yes, automatic Big Data Mining is based on this principle.
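The coin example can be tried directly with a small simulation. The sketch below assumes a fair coin modeled by Python's standard `random` module; the helper name `heads_share` and the fixed seed are illustrative choices, made only so that repeated runs behave the same way.

```python
import random

def heads_share(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(42)  # fixed seed: an assumption, for repeatability only
shares = {n: heads_share(n, rng) for n in (10, 1_000, 100_000)}
for n, share in shares.items():
    print(f"{n:>7} tosses: share of heads = {share:.4f}")
```

With 10 tosses the share can land far from 0.5; with 100,000 tosses it should sit very close to it, which is the effect the example relies on.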

There is no rigorous cause-and-effect analysis here; the data is not used to determine a cause from which a result is then derived. Instead, statistics tell you that a given situation usually produces a given result, that is, the correlation between phenomena and results. This is a notable feature of big data: it cares only about correlation, not causation. Put plainly, it "knows that something is so, but not why".
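One standard way to put a number on "correlation without cause" is the Pearson correlation coefficient. The sketch below is an illustration only: the article does not name a specific measure, and the temperature and sales figures are made up for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: daily temperature and ice cream sales.
temperature = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [110, 135, 160, 190, 205, 240]
r = pearson(temperature, ice_cream_sales)
print(f"correlation = {r:.3f}")
```

A value near 1 says the two series rise together. It says nothing about why, which is exactly the limitation, and the appeal, that the paragraph above describes.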

This is in fact a new method of data analysis and mining built on the strengths of computers. It is completely different from the traditional method, so experts in traditional data analysis or mining cannot be called big data experts.

Be careful, though: you may well run into such "experts", including famous professors from famous universities. Walk into a bookstore and you will also see many books with "Big Data" in large type on the cover, yet they all discuss traditional, manual data analysis and never touch on big data at all. Of course, the book "Big Data Age" is not one of them.

In addition, artificial intelligence techniques such as neural networks and deep learning are, for the most part, not big data either, because they still involve many human factors, including modeling and training, and still require thorough familiarity with the business logic being analyzed. In my view, this approach has not yet achieved practical results. Big data simply lets computers run statistics over a large amount of data, using simple but clever algorithms, to find patterns that people would never think of. It is largely independent of business logic: someone analyzing data from, say, the mobile Internet industry does not need to know the industry's history or current state. He only needs to run statistics over a large amount of historical data to find the future trend.

At this point you may be wondering: are there no real big data practitioners to be found, then?

Let's start with a little story:

In the 1980s, two computer geeks were working on translation systems at IBM. At the time, the experts were studying the internal structure of languages, such as grammar and syntax. The two nerds took a different route and worked with numbers: they turned every set of parallel documents they could find in different languages into data. Others criticized them, saying "this computer brute force is not science". Later they were hired away by a hedge fund boss. The two geeks went on to become co-CEOs of Renaissance Technologies, and the boss is Jim Simons.

Each Renaissance Technologies co-CEO has an annual income of about $100 million, somewhat more than the CEOs of the major Wall Street banks, yet the two are almost unknown. Their boss, James Simons, is a famous mathematician. He developed a theory with Shiing-Shen Chern (Chern-Simons theory) and was a colleague of Yang Zhenning. He has earned over $1 billion a year and is now retired and devoted to philanthropy. Tsinghua University has a Chern-Simons building, which Yang Zhenning persuaded Simons to fund.

In financial investing, hedge funds that look only at correlation and not at causation (Renaissance Technologies, D. E. Shaw) are doing well, while firms with deep financial theory but weak big data analysis have nothing comparable to show. MIT finance expert Andrew Lo (Luo Wenquan) has admitted that he does not understand what Renaissance Technologies is doing.

Hey, I am talking to you: stop staring at the $100 million annual income.

The key points here are that many people criticized the approach as "computer brute force, not science" (and these critics must have been recognized experts, or they would not have been in a position to criticize), and that finance scholars do not understand what these funds are doing.

What does this mean? It shows that even in developed countries very few people endorse this approach, and fewer still know how to use it. You can imagine how few people in China do.

In China, anyone who takes this non-mainstream, unorthodox path can forget about being recognized as an expert or professor, let alone earning hundreds of millions of yuan. Chances are you would be doing well just to avoid starving.

Anyway, I know a guy. Since 2000, just like the two nerds in the United States, he has been using this "unscientific", brute-force computation for semantic relevance analysis. What he does is much the same as the translation systems those two geeks worked on: both are about language. It is fair to say he has made a breakthrough in this area. However, the write-ups of his results went unread, even by PhDs and experts. He now does an ordinary IT job at a small company and barely makes ends meet. At one point he could not find a suitable job for a long time and almost ended up washing dishes or working as a security guard.

Someone may ask: is it reliable to treat language as numbers? With the big data mindset, you do not need to worry about the reason. The two Renaissance Technologies geeks have already shown you the result.

If you insist on knowing the reason, I can tell you that too:

In fact, language is far more complicated than numbers. Take a simple example: 1 and 2. A computer naturally knows how they relate: which is larger, which is smaller, and by how much. But how would a computer know the relationship between the words "people" and "big"? The traditional approach is extensive manual annotation (part-of-speech tagging by specialists). Getting a computer to learn the meaning and relevance of text through data mining alone, or even to build a basic dictionary by itself, is extremely hard. It sounds incredible, but that guy did it.

That is to say, for big data, language is harder to process than numbers by far more than one or two orders of magnitude. So anyone who can handle language will find numerical data easy. Big data, whatever the type of data, is about finding relevance, and in that respect there is no essential difference between words and numbers.
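To make "finding relevance" in text concrete, here is a minimal co-occurrence count, one of the simplest purely statistical ways to relate words without any manual annotation. It is an assumption chosen for illustration and is not the method of the person described above, which the article does not disclose; the function name `cooccurrence` and the tiny corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count how often each unordered word pair appears in the same sentence."""
    pairs = Counter()
    for sentence in sentences:
        # Sort the distinct words so each pair has one canonical order.
        words = sorted(set(sentence.lower().split()))
        pairs.update(combinations(words, 2))
    return pairs

# Hypothetical toy corpus.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "a cat likes milk",
]
pairs = cooccurrence(corpus)
print(pairs[("cat", "milk")], pairs[("dog", "milk")])
```

Pairs that appear together more often than chance would suggest are treated as related. On a real corpus the counts would have to be normalized by word frequency, which this sketch leaves out.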

Some time ago, when someone raised the problem of industry trend analysis, he said he had come up with an algorithm in just an hour and could produce results given a large amount of data, but no one in China would trust him.

Okay, I digress. But now you know what real "big data" is. First, remember that big data is for prediction, that is, it tells you the likely future result directly. Second, remember the phrase "automatic big data mining", and no one will be able to fool you.
