Causal inference and big data

Source: Internet
Author: User

One common claim about big data is that it is good at analyzing correlation, not causation. But this may be a false proposition. The real problem of big data is how to infer causal relationships from correlations. This problem, known as causal inference, underlies Apple's iPhone 6 speech recognition and Google's driverless car technology. The leading figure in the field is Judea Pearl, a member of the US National Academy of Engineering, who received the 2011 Turing Award for this work. Pearl developed a calculus for probabilistic and causal reasoning that radically changed the direction of artificial intelligence, which had originally been based on rules and logic.
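The gap between correlation and causation can be made concrete with a small simulation. The sketch below is a hypothetical toy model, not Pearl's own algorithm: a hidden common cause `z` drives both `x` and `y`, while `x` has no effect on `y` at all. The function names, coefficients and noise levels are assumptions chosen for illustration. Observational data show a strong correlation, but setting `x` by intervention (what Pearl writes as do(X = x)) leaves `y` unchanged.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Draw n samples from a hypothetical model: Z -> X and Z -> Y, with no arrow X -> Y.

    If do_x is given, X is fixed by intervention instead of being generated from Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden common cause
        if do_x is None:
            x = z + rng.gauss(0, 0.5)  # X merely tracks Z
        else:
            x = do_x  # intervention: cut the arrow Z -> X
        y = 2 * z + rng.gauss(0, 0.5)  # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return xs, ys


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


if __name__ == "__main__":
    xs, ys = simulate(10_000)
    print("observational correlation:", round(correlation(xs, ys), 3))
    _, y_high = simulate(10_000, do_x=1.0, seed=1)
    _, y_low = simulate(10_000, do_x=-1.0, seed=1)
    print("effect of do(X=1) vs do(X=-1):", mean(y_high) - mean(y_low))
```

With these settings the observational correlation comes out near 0.87, while the two interventional runs share a seed and produce identical outcomes, so the estimated effect of X on Y is zero. A purely correlational analysis would report a strong relationship that does not survive intervention.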

The ideas of Pearl, who approaches Turing's question from the height of top-level design, changed my understanding of big data. My encounter with the depth of his thinking happened by accident. The San Francisco stop of the "Masters of the United States" tour had scheduled a meeting with Pearl on the afternoon of September 3. It was meant to be a courtesy visit at which he would briefly introduce his research. But he evidently misunderstood, took it for a professional exchange, and prepared 64 pages of mathematical handouts. When he learned that the audience came from media, law, economics and other humanities backgrounds, he was dumbfounded. "Sorry, I didn't know you were ...", he said. It was too late to change the handouts, so he had to bite the bullet and cast pearls before swine. Sure enough, over the next two hours he warmed to his subject and forgot whom he was addressing: his mathematical ideas poured out unrestrained, graph theory, probability theory and nonlinear formulas leaping like kangaroos, ten-odd steps at a bound, surging like the Yellow River and impossible to hold back. When time was up, the organizers' repeated reminders had no effect, and he went on for another hour.


The man beside me shook me awake; still half in a dream, I had nearly dozed off in my seat. I propped myself up with a cup of coffee and listened with half an ear. After that, I was gradually drawn in, and in the end I listened with delight, because I realized that what Pearl was talking about was exactly the question I had been thinking about regarding big data.


In recent years, while introducing big data to readers, I have been puzzled by the notions of correlation and causation. Even after I introduced American big data theory, such as Barabási's arguments, this doubt was not resolved. Correlation corresponds to empirical induction; causation corresponds to rational deduction. Is big data purely inductive, with no deduction? Or, to put it another way, how can big data make the transition between induction and deduction? I was stuck at exactly this bottleneck in my thinking when Pearl broke it open for me.


After Pearl left, we looked at one another and asked what exactly we had been fed during those 3.5 hours. When we compared notes, a mathematics expert said he felt Pearl was using nonlinear methods to solve linear problems. Traditional statistics could not handle causation, only correlation; Pearl's contribution was to bring causal relationships into statistical and probabilistic analysis, making unstructured things semi-structured. Charlie, our interpreter, works in Tencent's big data division, and his research is in the same field as Pearl's. He used the metaphor "a Xi'an model can be used in Chengdu" to explain it again from a specialist's point of view. I, treated as the representative of the humanities, was pushed onto the stage unprepared to share my impressions. Only when Charlie, who had studied nonlinear physics, confirmed my account did I know that what I had heard and thought matched what Pearl meant.


I began by saying that the core of Turing's question is the relationship between humans and nature (machines), and that artificial intelligence aims to unify the two. This maps onto today's topic: how to unify the qualitative (unstructured) and the quantitative, induction and deduction, sensibility and rationality, that is, correlation and causation. In Pearl's words, it is the move from Babylonian thinking to Athenian thinking ("the causal revolution: from associations to counterfactuals, from Babylon to Athens"). The current problem with big data is that it has drifted from the trajectory of Turing's question and become a world of rational computation, represented by Google's mathematical algorithms, while neglecting Facebook-style algorithms (perceptual algorithms based on human associations). In statistics, the latter corresponds to correlational data analysis. Pearl is not satisfied with the latter either; his criticism was, roughly, "don't always think about the data first; model reality first", meaning that unstructured, qualitative problems should be given structure.
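Pearl's phrase "from associations to counterfactuals" refers to questions of the form "what would Y have been had X been different?". A minimal sketch of his three-step recipe (abduction, action, prediction) on a hypothetical linear model follows; the equations X := U_x and Y := 2X + U_y and the class and method names are assumptions for illustration, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class LinearSCM:
    """Hypothetical structural model: X := U_x, Y := slope * X + U_y."""

    slope: float = 2.0

    def outcome(self, x: float, u_y: float) -> float:
        """Structural equation for Y."""
        return self.slope * x + u_y

    def counterfactual_y(self, x_obs: float, y_obs: float, x_new: float) -> float:
        """Given that we saw (x_obs, y_obs), what would Y have been had X been x_new?"""
        # 1. Abduction: infer this unit's background noise from the observed evidence.
        u_y = y_obs - self.slope * x_obs
        # 2. Action: replace X's equation with the constant x_new, i.e. do(X = x_new).
        # 3. Prediction: recompute Y with the same background noise.
        return self.outcome(x_new, u_y)


if __name__ == "__main__":
    model = LinearSCM()
    # Observed X = 1, Y = 3; had X been 0, Y would have been 2*0 + (3 - 2) = 1.
    print(model.counterfactual_y(x_obs=1.0, y_obs=3.0, x_new=0.0))
```

An associational analysis can only report how X and Y co-vary across many units; the counterfactual answer is specific to the observed unit because it reuses that unit's inferred background noise.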


Charlie had said earlier that Pearl was raising Hume's problem. I said that the way Pearl poses and solves the problem reminds me of Kant: what he presented today is, in my view, a mathematical version of the Critique of Pure Reason, and his method of thinking reminds me of Newton and Leibniz. When I got home and consulted specialist material, I found someone who characterized the motivation behind Pearl's work this way: "Someone mentioned Hume's problem in (the history of) philosophy: can humans derive causal laws from limited experience? This is indeed a problem, and it eventually prompted the German philosopher Kant to reconcile the British empiricists (Hume) and the Continental rationalists (Leibniz and Wolff) in writing the Critique of Pure Reason. The two seem alike."


The original problem of Kant's Critique of Pure Reason is the relationship between experience and reason, which is equivalent to the relationship between correlation and causation in big data. I would say that Kant's way of solving this same problem resembles Pearl's. Kant introduced a concept called the "schema" as an intermediate framework connecting experience and reason. A schema has both the specificity of experience and the universality of reason, yet it is neither experience nor reason. Pearl's "schema" is the causal diagram, the core of his structural theory. This structure is not purely rational; it can be adjusted flexibly. The only difference I see between Pearl's structure and Kant's schema is that Pearl's provides replaceable component modules that can be adjusted as conditions require, so it is not a mechanical structure but a living, loosely coupled one. In Charlie's metaphor, the Xi'an "universal truth" model can be used in Chengdu as long as some of its sub-modules are replaced with Chengdu's "concrete practice".
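The "replaceable module" idea corresponds to the modularity of structural causal models: each variable is produced by its own mechanism, and an intervention or a change of setting swaps one mechanism while leaving the rest intact. The sketch below is a hypothetical illustration of Charlie's Xi'an/Chengdu metaphor; the function `evaluate`, the variable names and the equations are all invented for the example.

```python
from typing import Callable, Dict, Optional, Sequence

Values = Dict[str, float]
Mechanism = Callable[[Values], float]


def evaluate(order: Sequence[str],
             mechanisms: Dict[str, Mechanism],
             interventions: Optional[Values] = None) -> Values:
    """Compute every variable in causal (topological) order.

    Each mechanism reads only values already computed. An intervention
    do(var = value) replaces that one mechanism with a constant.
    """
    interventions = interventions or {}
    values: Values = {}
    for var in order:
        if var in interventions:
            values[var] = interventions[var]
        else:
            values[var] = mechanisms[var](values)
    return values


# Hypothetical variables: local conditions -> practice -> outcome.
ORDER = ["conditions", "practice", "outcome"]

# The "Xi'an" model: general mechanisms plus one local module.
XIAN: Dict[str, Mechanism] = {
    "conditions": lambda v: 1.0,
    "practice": lambda v: 2.0 * v["conditions"],
    "outcome": lambda v: v["practice"] + 3.0,
}

# The "Chengdu" model reuses everything except the local-conditions module.
CHENGDU: Dict[str, Mechanism] = {**XIAN, "conditions": lambda v: 0.5}

if __name__ == "__main__":
    print(evaluate(ORDER, XIAN))
    print(evaluate(ORDER, CHENGDU))
    print(evaluate(ORDER, XIAN, {"practice": 10.0}))
```

Only the `conditions` entry differs between the two dictionaries; the `practice` and `outcome` mechanisms are shared unchanged, which is the loose coupling described above. The third call shows an intervention swapping out a single module in the same way.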


Methodologically, Pearl used Plato's famous allegory of the cave to illustrate the mapping between causation (the truth), structure (the people in the cave) and correlation (the shadows). I would say this is closer to the methodology of Newton and Leibniz: treat reason as the limit and experience as a sequence, and set up a structured function in between (the equivalent of the people in the cave). Experience (correlation) can approach reason (causation) arbitrarily closely without ever reaching it, as a sequence approaches its limit, yet it can be treated as equal to it. What is distinctive about Pearl is that he turns the function (the schema) into a functional, realizing the transition from structured to unstructured and from linear to nonlinear. That is why the elaborate mathematics developed within the structural model has become the focus of his theory. His model is called a "graphical model" or "Bayesian network", and it describes the joint distribution of the variables, or the mechanism that generates the data. This was the part he was explaining in detail while the audience dozed. As for his theory of causal structure, I have suggested in my own lectures that it could just as well be described as calculus on manifolds, or "topological geometry on a rubber membrane".
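A Bayesian network describes a joint distribution by giving each variable a conditional probability table given its parents in the graph; the joint probability is the product of these local factors. The sketch below uses the textbook rain/sprinkler/wet-grass structure with made-up probabilities (all numbers and names are assumptions, not from the talk) to show the factorization and a simple query by enumeration.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.
# All probabilities are illustrative, not measured.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_ON = {True: 0.01, False: 0.4}        # P(Sprinkler = on | Rain)
P_WET = {                                         # P(WetGrass = wet | Sprinkler, Rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}


def bernoulli(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S, R): one factor per node given its parents."""
    return (P_RAIN[rain]
            * bernoulli(P_SPRINKLER_ON[rain], sprinkler)
            * bernoulli(P_WET[(sprinkler, rain)], wet))


def prob_rain_given_wet():
    """P(Rain | WetGrass) by summing the joint over the unobserved sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / denominator


if __name__ == "__main__":
    total = sum(joint(r, s, w) for r, s, w in product((True, False), repeat=3))
    print("joint sums to", total)
    print("P(rain | wet grass) =", round(prob_rain_given_wet(), 4))
```

Storing three small tables instead of the full eight-entry joint is the point of the graphical structure: each factor involves only a node and its parents, and any query can still be answered from their product.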


At present, discussion of big data shows a bad tendency: the one-sided pursuit of so-called unstructured data before a sound structural foundation has been laid. This is the "data-first" thinking Pearl criticized; it is like trying to find an extremum by differentiation without first writing down the function, hoping to read the extremum straight off a sequence of numbers. The problem is especially serious in China, and it risks turning big data into a kind of groundless pseudo-Zen. In business, one need not rule out pragmatic uses of big data that find superficial links to sales, but that approach suits street vendors better; after all, knowing that something works without knowing why does not last.


This is not surprising: statistics and probability theory as a whole are still largely at this level, consisting mostly of theories of correlation, with very little theory of causation. Karl Pearson explicitly opposed statistical research on causation. The fundamental puzzle of statistics, Simpson's paradox (the Yule-Simpson paradox), is also a fundamental problem that plagues big data.
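Simpson's paradox, and Pearl's resolution of it, can be shown in a few lines. The counts below follow the widely cited kidney-stone treatment example and serve only as an illustration: treatment A does better within each stone-size group, yet B looks better when the groups are pooled. Assuming stone size is the only confounder (an assumption of this sketch, corresponding to a causal diagram in which size affects both treatment choice and recovery), the back-door adjustment P(recovery | do(t)) = Σ_z P(recovery | t, z) P(z) recovers the within-group verdict.

```python
# (recovered, total) counts, patterned on the widely cited kidney-stone example.
DATA = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}
SIZES = ("small", "large")
TREATMENTS = ("A", "B")


def rate(treatment, size=None):
    """Observed recovery rate, within one stone-size stratum or pooled over both."""
    sizes = SIZES if size is None else (size,)
    recovered = sum(DATA[(treatment, s)][0] for s in sizes)
    total = sum(DATA[(treatment, s)][1] for s in sizes)
    return recovered / total


def adjusted_rate(treatment):
    """Back-door adjustment over stone size: sum_z P(recover | t, z) * P(z)."""
    n = sum(total for _, total in DATA.values())
    result = 0.0
    for s in SIZES:
        p_size = sum(DATA[(t, s)][1] for t in TREATMENTS) / n
        result += rate(treatment, s) * p_size
    return result


if __name__ == "__main__":
    for t in TREATMENTS:
        print(t, "small:", round(rate(t, "small"), 3),
              "large:", round(rate(t, "large"), 3),
              "pooled:", round(rate(t), 3),
              "adjusted:", round(adjusted_rate(t), 3))
```

The data alone cannot say whether to trust the pooled or the stratified figures; that choice comes from the assumed causal diagram, which is the structural knowledge Pearl argues correlational analysis lacks.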


In logic as a whole, inductive theory can only express correlations between things; it cannot identify true causal relationships. This problem has been with humanity for a long time. The Babylonians had mastered applications of the Pythagorean theorem 1,000 years before Pythagoras and had already begun astronomical observation, but it was the Athenians who distilled a speculative theory of astronomy from experience. In big data, we are still only Babylonians.


Conversely, causal inference can also be taken too far. If relationships were completely structured, that too would be a problem, because it would leave no room for human free will. It seems Pearl has not yet thought through the Gödel paradox. As Chiong commented on Pearl: "The unknowable and the knowable must each contain the other. What we want to solve is a set of certainties and uncertainties; the method is realized in a single moment of thought."


Pearl's path has not been easy. A graduate of what the author would call a "pheasant" (little-known) university, he laid the foundations for big data decades ahead of his time, yet few people understand him. His son Daniel Pearl, a foreign correspondent for the Wall Street Journal, was kidnapped by terrorists in Pakistan after 9/11 and beheaded a few days later. Pearl left without taking questions, saying he needed to be with his wife: news had come that morning that a second American journalist had been beheaded by terrorists, as his son had been, and his wife would surely be thinking of their son again.
