First of all, don't think that reading this article will make you a big data master; otherwise I would not have used the word "cultivation". Cultivating yourself into a big data master is not easy; in fact, it is very hard. Even in the United States, the birthplace of big data, fewer than 10 people (perhaps 5 or 6) have reached this level. As for China... forget it, I won't say.
This article is really meant to point you toward a road that is unusually difficult but has a very bright future. If you lack perseverance, just skim it and don't take it seriously. (To be honest, even reading through this article is not easy.)
Anyway, to become a big data master you must first transform your thinking completely: understand big data thinking thoroughly and let it sink into your blood and bone marrow, otherwise you cannot become a master. In other words, your worldview must be completely transformed! I know what you're muttering to yourself: is it really that serious?
To achieve this change, you must go through three stages of learning: primary, intermediate and advanced.
How should you study in these three stages? Below I will tell you which textbook to use at each stage. Read these books thoroughly and you will achieve the transformation described above.
Primary stage: The Big Data Age
Authors: [UK] Viktor Mayer-Schönberger, [UK] Kenneth Cukier
Translators: Sheng Yangyan, Zhou
Zhejiang Publishing House
It goes without saying that this is the book to start with. After reading it, you should have formed the concept of big data, namely:
1. Having a lot of data does not by itself make it big data;
2. Big data is a way of analyzing data, and it differs essentially from traditional data analysis;
3. The defining trait of big data is "focus on correlation, not on causation". This is the core of big data. You must truly understand it and remember it firmly, otherwise others will easily fool you;
4. Big data uses statistical methods;
5. Big data is mainly combined with artificial intelligence to let machines mine data automatically;
6. Big data is mainly used for prediction. Ordinary data analysis examines history and the status quo and still leaves predicting the future to people; big data tells you the future result directly.
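To make points 3, 4 and 6 concrete, here is a minimal Python sketch of correlation-based prediction. The numbers are hypothetical, and a least-squares line stands in for "a statistical method"; real big data systems use far more data and richer models. The code only measures how two series move together and then extrapolates. At no point does it ask why they move together.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical observations: an indicator x and an outcome y that move together.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.9]

r = pearson(x, y)
slope, intercept = fit_line(x, y)
forecast = slope * 6 + intercept  # predict the next value; no causal model involved
print(f"r = {r:.4f}, forecast at x=6: {forecast:.2f}")
```

The strong correlation alone justifies the forecast here. Whether that is safe depends on the correlation continuing to hold, which is exactly the kind of judgment the books below train.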
Intermediate stage: Out of Control
Author: Kevin Kelly (many people affectionately call him KK)
Translator: East-West Library
New Star Press
Why this book? What did you need to remember from the primary stage? Right: use statistical methods, not causal methods, to predict the future. OK, let's see what the book says:
Chapter 22: Prophecy Machine
......
When analyzing the mechanics of prediction, Farmer likes to illustrate it with this example. "Here, catch!" he says, tossing a baseball at you. You catch it. "Do you know how you caught that ball?" he asks. "By prediction."
Farmer believes you carry a model in your head of how a baseball flies. You could use Newton's classical mechanics formula F=ma to predict the trajectory of a flying object, but your brain does not store such elementary physics formulas. Instead, it builds a model directly from empirical data. A baseball player has watched the bat hit the ball thousands of times, raised his gloved hand thousands of times, and corrected his predictions thousands of times with that gloved hand. Somehow his brain gradually developed a model of the baseball's flight. This model is almost as good as F=ma but not as general. It is built from the hand/eye data produced during past catches. In logic, such a process is called induction, which is very different from the deduction that gives us F=ma.
......
A fielder's experience-based "theory" of flying objects is much like the late-stage Ptolemaic model of the planets. If we parsed the fielder's "theory", we would find it incoherent, ad hoc, convoluted and approximate. But it can evolve. It is a messy theory, yet it not only works but can also be improved. If everyone had to wait until they understood F=ma before catching (and a half-understood F=ma is no better than none at all), nobody would ever catch anything. Even if you know the formula now, it does not help. "You can solve the flying-baseball problem with F=ma, but you can't solve it in real time," says Farmer.
......
It is almost a given that "living systems" (lions, stock markets, evolving populations, intelligences) are unpredictable. The chaotic, recursive causal relationships among their parts make it hard to infer the future by ordinary linear extrapolation. Yet the system as a whole can act as a distributed device that makes approximate guesses about the future.
......
Most of the world's complex systems, including all markets, are nonlinear.
......
In reality, the factors that shape the two-dimensional chart of a stock price are not a few but thousands.
......
Just 100 variables can create an enormous number of possibilities. Because each variable's behavior interacts with the behavior of the other 99, you cannot study any single parameter without examining the group as a whole. Even a simple climate model with only three variables loops back on itself in strange ways and churns out chaos, making any linear prediction impossible.
-- Excerpted from Out of Control
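The three-variable claim can be checked with the Lorenz system, a classic three-variable convection model from meteorology. This example is our choice; the excerpt does not name a specific model. The sketch below assumes the textbook parameters σ=10, ρ=28, β=8/3 and integrates with a simple forward-Euler step. It runs two starting points that differ by one part in a billion side by side and records how far apart they drift. That drift is exactly what defeats linear extrapolation.

```python
def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one forward-Euler step of size dt."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dx * dt, y + dy * dt, z + dz * dt)

def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return sum((u - v) ** 2 for u, v in zip(p, q)) ** 0.5

def max_separation(a, b, steps=40_000, dt=0.001):
    """Integrate two starting points side by side; return the largest gap observed."""
    largest = distance(a, b)
    for _ in range(steps):
        a, b = lorenz_step(a, dt), lorenz_step(b, dt)
        largest = max(largest, distance(a, b))
    return largest

a0 = (1.0, 1.0, 1.0)
b0 = (1.0 + 1e-9, 1.0, 1.0)  # differs only in the ninth decimal place

gap = max_separation(a0, b0)
print(f"initial gap: {distance(a0, b0):.1e}, largest gap by t = 40: {gap:.1f}")
```

The two runs stay together for a while and then separate by a distance comparable to the size of the whole attractor. A prediction made from the first run says almost nothing about the second.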
Predicting with F=ma (a formula), that is, linear prediction, means predicting through causal reasoning: from the ball's mass, acceleration and so on, you work out why the ball flew from one place to another;
"induction", on the other hand, means "statistics", or rather rough statistics. Induction does not ask for reasons: you catch the ball and that's that, whatever the reason.
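The contrast between deduction and induction can be sketched in a few lines of Python. Every modelling choice here is ours: a projectile on flat ground with no air drag, and a k-nearest-neighbours average standing in for the "rough statistics". The physics formula is used only to simulate the throws the fielder has already seen. The inductive predictor never calls it; it simply averages where the most similar past throws came down.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2

def range_by_formula(speed, angle_deg):
    """Deduction: landing distance from Newtonian mechanics (flat ground, no drag)."""
    return speed ** 2 * math.sin(2 * math.radians(angle_deg)) / G

# Simulated "experience": 5,000 past throws and where each ball came down.
rng = random.Random(42)
experience = []
for _ in range(5000):
    speed, angle = rng.uniform(10, 40), rng.uniform(10, 80)
    experience.append(((speed, angle), range_by_formula(speed, angle)))

def range_by_induction(speed, angle_deg, k=5):
    """Induction: average the landing spots of the k most similar past throws.

    It never uses the formula; it only consults remembered (throw, outcome) pairs.
    """
    nearest = sorted(
        experience,
        key=lambda item: (item[0][0] - speed) ** 2 + (item[0][1] - angle_deg) ** 2,
    )[:k]
    return sum(outcome for _, outcome in nearest) / k

for throw in [(25, 45), (30, 30)]:
    print(throw, round(range_by_formula(*throw), 1), round(range_by_induction(*throw), 1))
```

The inductive guesses land close to the formula's answers without knowing the formula. Like the fielder's theory, they get better as more experience accumulates.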
You want to become a big data master and use statistical methods to predict something? (Let me predict what you're thinking: stocks! Oh, don't flatter me; I'm just summarizing what many people think.)
So tell me, is there any reason not to read this book properly? (Of course, being kind-hearted, I must remind you to hold this book steady. Dropping it on your foot is no fun, because it is as thick and heavy as a brick.)
Advanced stage: Complexity
Author: [US] Nicholas Rescher
Translator: Tong
After the intermediate stage you have encountered the idea of "complexity": once things become complex enough, you cannot predict them by finding cause and effect.
So what is complexity, and what are its essence and principles? If you want to be a big data master you cannot be ignorant of it, because you will be dealing with complex, even extremely complex, things all your life.
If you have read Out of Control, you may be thinking: wow, Out of Control was already hard enough to read. (Yes it was; otherwise WeChat's creator Zhang Xiaolong would not have said that anyone who can read this book could come straight to work at his company. Note that he said "read through", not merely "read".) That was only the intermediate stage, so won't the advanced stage burn out my brain? (You predicted right: compared with this one, Out of Control counts as light reading.) So can I skip this book? What does it have to do with what I want to learn? (You won't listen to your teacher? Do your parents know?)
For you disobedient students, the teacher will reveal some of its content:
Instead of trying to solve problems by working out how things must develop according to the general principles of theory, we should ask how things can best be handled in the circumstances we can actually determine. Rather than seeking general principles of abstract necessity, we do better to seek guidance in an empirical spirit, in experience, with all its characteristic contingency and potential incompleteness.
......
Those we might call fanatics for the scientific laws of the Newtonian world order held the worldview of Newton, Laplace and Darwin, which regards the world as an orderly framework of natural laws. The Kantian principle of causality is central to their thinking. The world, both the natural world and the human world, is seen as a cosmos in which everything is orderly, regular, rational and explicable. It is seen as a methodical system, like a formal garden, neatly arranged and with tidy boundaries.
...... Einstein, Planck, Schrödinger and their companions destroyed the old order of physics. Cantor, Gödel, Heyting and others broke the old order of mathematics. Quantum theory brought about the collapse of causality. The theory of evolution now emphasizes not "survival of the fittest" but a thoroughly random stage on which natural selection plays its part.
......
A universe of chance and chaos is not anarchy. It is a complex whole that, through its natural workings, reveals the emergence of higher-order laws. As formal logic gave up its classical invariance, new nonclassical, many-valued (or "fuzzy") logics were born and took its place. Certainties have likewise been effectively replaced by probabilities and plausibilities.
......
Given how difficult it is to make rational choices about courses of action in a complex world... if we are scholars who trust statistical conclusions and use probability and statistics to judge whether an action is right, things become much easier to handle.
-- Excerpted from Complexity
Well, whether to read it or not is up to you.
Yes, there is another reason to recommend this book. We all know that big data is about the correlation of data, that is, finding the relationships among data. I spent 15 years on statistical research into data correlation for artificial intelligence. I felt I had studied it nearly to the end, yet I also felt there was still quite a distance to go before all the problems were solved. I was very confused and did not know where to take my research next. The road seemed to be narrowing, and I had the illusion of having reached the peak. Then I saw these words in this book:
"You can consider the relationships between them, then consider the relationships among those relationships, and so on."
When I saw this sentence, words could not describe how I felt: I was thunderstruck. It was as if I had believed the world consisted only of my own small plot of land. This sentence, like a bright bolt of lightning piercing the night sky, suddenly showed me the infinite universe, pointed out my direction and opened up a golden road. (Forgive all these somewhat jumbled descriptions; whenever I think of this sentence I cannot contain my excitement. Looking back now, my illusion of having reached the peak was so childish and ridiculous. What is ridiculous is not that I hadn't reached the peak, but that there is no peak in this world. That sounds rather Buddhist, and I can't help recalling the words of the Sixth Patriarch, Huineng: "Bodhi is originally no tree, the bright mirror is no stand; originally there is not a single thing, where could dust alight?")
Do not underestimate the words "and so on" in this sentence. They point to an infinite iteration, "relationships among relationships among ... relationships", and intelligence emerges from it. The key to predicting complex things probably lies here. It opens up a prospect as boundless as the universe.
Once I had calmed down a little after reading this sentence, I immediately wrote it as a function:
X = f(f(a, b), f(c, d))
Then I told myself: this is what you will study for the rest of your life!
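Here is one hypothetical way to make X = f(f(a, b), f(c, d)) executable. Let f turn two time series into a new series describing their relationship; here we use a rolling Pearson correlation over a sliding window. That choice is ours, since the author does not specify f. Because f returns another series, it can be applied to its own outputs. The result is a "relationship between relationships", and the nesting can go as deep as you like.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # clamp tiny floating-point overshoot

def f(x, y, window=5):
    """Hypothetical 'relationship' operator: rolling correlation of x and y.

    Returns a new series, one value per window position, so f can be nested.
    """
    return [pearson(x[i:i + window], y[i:i + window]) for i in range(len(x) - window + 1)]

n = 40
a = [math.sin(t / 3) for t in range(n)]
b = [math.sin(t / 3) + 0.5 * math.cos(t) for t in range(n)]
c = [math.cos(t / 4) for t in range(n)]
d = [t / n for t in range(n)]

# X = f(f(a, b), f(c, d)): the relationship between two relationships.
X = f(f(a, b), f(c, d))
print(len(X), [round(v, 2) for v in X[:5]])
```

Each level of nesting shortens the series by window - 1 points. Deeper iterations therefore need longer histories, which is one practical reason such "relationships of relationships" call for big data.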
Now let's continue. Remember I said at the beginning that you must change your worldview? After reading these three books you are ready for that change. Now it is time for the final blow that completes the transformation!
In other words, you must read one more book at the advanced stage. (Oh my god, do you want me to live or not? Yes, I said it for you.) You have learned the mechanics of complexity, but you may not yet have an intuitive sense of how complex the world is. You may feel you have been through many very complicated things, such as trading stocks, managing hundreds or thousands of people, or studying sociological problems. Compared with the most complex things in the world, these are child's play.
You may have guessed which book it is: yes, quantum theory. This theory is so difficult and complex that Einstein died without ever accepting it, and scientists still have not fully figured it out, so we needn't bother trying to figure it out ourselves. But since we are looking at the complexity of things, and even at the nature of things, we must have some understanding of it. So the book I'm giving you is at the popular-science level. I don't have the heart to scare you any more.
Advanced stage (2): The History of Quantum Physics
Author: Cao Tianyuan (yes, he's Chinese)
Liaoning Education Press
You must be wondering: "Can this book really change my worldview?"
So let's see what the book says:
The essence of the quantum world is randomness. The strict causality of traditional thinking does not exist in the quantum world; it must be replaced by a statistical interpretation. The wave function ψ is statistical: the square of its magnitude, |ψ|², gives the probability of finding the particle at a given place. When we say "the electron appears at X", we do not know the "cause" of this event. It is a completely random process with no causal relationship.
......
Causality must die, because physics needs to live!
Stop arguing: God really does play dice! Randomness is the cornerstone of the world. That the electron appears here is a random process, and nobody needs to impose unbearable rules on it. ...... Through statistical law, lawlessness at the microscopic level becomes order at the macroscopic level.
-- Excerpted from The History of Quantum Physics
"Through statistical law, lawlessness at the microscopic level becomes order at the macroscopic level." This shows that statistical methods can make extremely complex random events predictable. I think societies, markets, stocks and the like have much in common with the quantum world. After all, quantum theory studies the very world we live in, so it is as the book says:
"The nature of the world: it is statistical!"
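A toy simulation illustrates how each event is random while the statistics are orderly. The modelling choices are ours: a one-dimensional Gaussian wave packet on a discrete grid, with detection probabilities proportional to |ψ|² as the excerpt describes. A single simulated detection is unpredictable. The average over many detections, however, settles close to the packet's centre.

```python
import math
import random

# Hypothetical 1-D Gaussian wave packet centred at x0 with width sigma, sampled on a grid.
x0, sigma = 2.0, 1.0
grid = [x0 + i * 0.01 for i in range(-500, 501)]  # symmetric around x0, spans +/- 5 sigma
psi = [math.exp(-((x - x0) ** 2) / (4 * sigma ** 2)) for x in grid]

# Detection probability proportional to |psi|^2, normalised to sum to 1.
weights = [abs(p) ** 2 for p in psi]
total = sum(weights)
probs = [w / total for w in weights]

rng = random.Random(0)
one_detection = rng.choices(grid, weights=probs, k=1)[0]  # unpredictable on its own
many = rng.choices(grid, weights=probs, k=100_000)        # a large ensemble
mean = sum(many) / len(many)
print(f"single detection: {one_detection:.2f}, average of 100000: {mean:.3f}")
```

Change the seed and the single detection jumps around, but the ensemble average barely moves. This is the "micro-level lawlessness, macro-level order" pattern in miniature.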
If you approach quantum theory from entrenched materialism (which is absolutely wrong), you may find it hard to understand. Looking at it partly from an idealist standpoint (without necessarily agreeing with it completely) can help a great deal. So I suggest you also learn some Buddhism, such as the lectures of Master Jingkong, which will help you take an idealist point of view.
It is neither the wind nor the banner that moves; it is the mind that moves.
The Buddha said: "With one extremely subtle movement of the mind, the universe appears, the myriad phenomena appear before us, and the self appears as well. A movement of the mind unfolds in three stages. First it moves; once it moves, it transforms; once it transforms, it becomes the seeing, and as soon as there is seeing, the seen appears. The seeing is consciousness. Once consciousness appears, there is immediately something seen, that is, matter appears. So matter is illusion, and what you take to be real is mere appearance. Matter is the universe, so the universe appears instantaneously; it did not evolve."
-- From Master Jingkong's lectures on the Huayan Sutra
Quantum physicists say: "consciousness" takes everything out of quantum superposition and makes it concrete reality. The appearance of the first conscious being instantly turned the universe, from its creation up to that moment, into reality. The participation of "consciousness" at that moment can change the past, and that "past" even includes the evolutionary history of the conscious being itself.
-- Excerpted from The History of Quantum Physics
Can you feel how complicated quantum theory is? If the two passages above aren't clear to you, here is a lite version:
The Buddha said: "When the mind moves, the universe appears, the myriad phenomena appear before us, and the self appears too."
Quantum physicists say: consciousness takes everything out of quantum superposition, turning the history of the universe into reality in an instant, a history that includes the conscious being itself.
You probably still feel you can't understand these words. It doesn't matter. You only need to understand one thing: more than 2,000 years ago, the Buddha predicted what quantum physicists would say.
Well, if your worldview hasn't changed after reading this book, come find me and I'll treat you to dinner.
In addition, there is a book you can use as a reference (don't scold me; it's a reference, so you don't necessarily have to read it): Douglas R. Hofstadter's Gödel, Escher, Bach. It is thicker than a brick; Out of Control is merely as thick as one.
About the book: it is a popular-science work held in very high regard in the English-speaking world, and it won the Pulitzer Prize for General Nonfiction. Through a comprehensive treatment of Gödel's mathematical logic, Escher's prints and Bach's music, it engagingly introduces mathematical logic, the theory of computation, artificial intelligence, linguistics, genetics, music and painting. It is ingenious in conception, profound in meaning, broad in vision and rich in philosophical charm.
Mathematical logic, the theory of computation, artificial intelligence and linguistics will all help with what you learn later.
Another reason to recommend this book is that Complexity draws on its content.
Apart from The Big Data Age, these books share one more reason for recommendation: their authors can all be called masters of prediction (you have already seen what the Buddha could do), and they command genuine admiration.
Out of Control was written 20 years ago. It is said to be the only book in those 20-plus years whose sales keep growing, because people found that the things it described 20 years ago are coming true one by one. People wonder how KK knew.
Complexity was also written nearly 20 years ago.
Gödel, Escher, Bach was written 30 years ago.
Although The History of Quantum Physics was written in 2008, quantum theory was born more than 100 years ago.
The sutra was born more than 2,000 years ago.
Yet all of them are remarkably instructive for today's big data. What more can I say?
Nicholas Rescher, the author of Complexity, has written another book. Don't worry, you'll want to read this one, because it is titled Predicting the Future. Unfortunately, it has no Chinese edition. Students with good English can look for it abroad, and if you can get me a copy too, I'll be very grateful!
Now that you have changed your worldview and are set on becoming a big data master, you can begin learning concrete methods, that is, artificial intelligence.
You must be thinking: do I have to read a lot of books? In theory, yes: you need the basics of artificial intelligence, natural language processing, machine learning, statistics and so on. But seeing how long you've been tormented, I'll take pity on you and give you a shortcut: just one book (happy?).
Foundations of Statistical Natural Language Processing
Authors: [US] Christopher D. Manning, [Germany] Hinrich Schütze
Translators: Yuan Chunfa, Li Qingzhong, Yun, Li Wei, Cao Defang
Publishing House of Electronics Industry
You may wonder why the book is about "language". There are two reasons:
First, in the computing profession, data does not mean only numbers; text, images, sound, video and so on are all called data;
Second, language is much harder than numbers, so if you can handle language, handling pure numbers is much easier. (See my other article, "What Is Real Big Data", http://www.36dsj.com/archives/7828)
You may also wonder what the essential difference is between statistical natural language processing and ordinary natural language processing.
Here's an interesting story. My nephew, about 4 years old, saw a moving walkway at the airport for the first time. It is like an escalator, only flat, and you stand on it and it carries you forward. He blurted out "flat elevator". Vivid, isn't it? He could coin this word only because people have intelligence.
Here's a small question: why did he put "flat" (平, píng) in front of "elevator"? Ordinary natural language processing in AI would tackle this problem from parts of speech, grammar and syntax. (In middle school you surely learned about modifier-head, subject-predicate and verb-object structures, right?) After such analysis, it finds the cause and deduces the result, and may conclude that "flat" should go in front;
Statistical natural language processing works differently. Statistics show that 平 is placed in front most of the time: the Chinese words for average, equality, platform, flat, peacetime, level, normal, balance, translation, tablet computer... all begin with it. So it puts 平 in front. It's that simple.
Of course, this is only an analogy; it's not really that simple.
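Here is a toy version of the statistical approach. It assumes a small hypothetical word list; real systems count over corpora of millions of words. The code counts how often 平 appears at the front versus the back of known words, then places it wherever it usually goes. No grammar is consulted.

```python
from collections import Counter

# Hypothetical mini-corpus of Chinese words containing 平 ("flat").
CORPUS = ["平均", "平等", "平台", "平衡", "平板", "平安", "公平", "和平"]

def position_counts(char, words):
    """Count how often `char` starts or ends a word in the corpus."""
    counts = Counter()
    for word in words:
        if word.startswith(char):
            counts["front"] += 1
        if word.endswith(char):
            counts["back"] += 1
    return counts

def attach(char, head, words):
    """Put `char` before `head` if that is where it usually appears, else after."""
    counts = position_counts(char, words)
    return char + head if counts["front"] >= counts["back"] else head + char

print(position_counts("平", CORPUS))
print(attach("平", "电梯", CORPUS))  # 电梯 = "elevator"
```

With six front occurrences against two back occurrences, the rule produces 平电梯, "flat elevator", purely from counts.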
But this raises a question. Think about how the 4-year-old handled it: did he know about grammar, syntax or modifier-head structure? Certainly not. So he must have used inductive statistics. Adults usually put 平 in front in similar words, so he put it in front too, without knowing why. It's the same as with the baseball: his brain holds no physics formula.
So statistical natural language processing is closer to how natural language is naturally processed (a tongue twister?), that is, closer to how we humans, and even nature, process it. The brain's processing is more complex, but the essence is the same.
There are different views on this, and I'll present them so you can think and judge for yourself. The famous linguist Chomsky argues that "children are assumed to possess innate knowledge of the basic grammatical structure of all human languages, and this innate knowledge is often called universal grammar." (Excerpted from Baidu Baike)
Naturally, I can hardly agree with this view. Quite a few linguists also oppose it, "arguing that it is too aggressive to assume, before studying all human languages, that they share a common 'underlying grammar'. Applying universal grammar to unknown languages requires assuming many 'empty categories'. For languages whose basic word order is verb-subject-object, such as Irish Gaelic, one has to assume that their 'underlying structure' is subject-verb-object, a practice that may itself violate the principle of description. Other linguists (such as Nicholas Evans and Stephen Levinson) argue that universal grammar rests on ethnocentric assumptions, which can harm cognitive science."
(Excerpted from Baidu Baike)
Reading this book does not make you a big data master; it only lays down the foundational knowledge. Truly mastering data methods requires you to explore and gain insight on this basis yourself.
So far, the master has led you to the door; the cultivation is up to you. Good luck!
I actually wrote this article for two reasons. One is to point out a path for those who want to become big data masters;
The other is to hit back at people who oppose big data, doubt big data, or pass off small data as big data. Note that I have absolutely no objection to small data here. Big data is not omnipotent; traditional data analysis and sampling-based analysis are still very useful, at least for quite a long time. I only object to people passing off traditional data analysis as big data to confuse the public. If I condense this rebuttal into one sentence, it is:
The essence of this world is: Sta! Tis! Ti! Cal!