By Cloud Computing and Big Data Things (WeChat public account Ccdcnewtrend)
Big data has become a buzzword. These reflections have no strict logic and no system; they are fragments, and their purpose is to raise questions and provoke thought. I have no "master" in mind and start from phenomena. Privately, I think the word "master" suits only ghosts and spirits, not people. I hope these reflections spark a collision of ideas. All views, whether encouragement, criticism, or even attacks, are welcome as long as they are sincere, because they promote thinking.
Drawing on everyone's criticisms and suggestions, I have added supplementary notes to some of the reflections, focused on the issues raised. Fellow practitioners are welcome to join the discussion.
Big Data Thinking One
The data of any single website is a very small subset of people's Internet behavior data. No matter how comprehensive the subset or how deep the analysis, it remains a subset, not the complete set. For an enterprise, the value of competitors' data far exceeds the value of its own site data. In terms of scale, this holds for every company: its own data is far smaller than the complete data. Even data that looks complete is in fact incomplete.
Addendum: Some friends judged the claim that "the value of competitors' data is far greater than the value of one's own site data" to be wrong. I humbly accept the criticism. Still, knowing the enemy is very important. The practical meaning is: "The key to an enterprise's survival lies not in itself; whatever its competitors do, it must do as well. In that context, the value of competitors' data far exceeds the value of its own website data."
Big Data Thinking Two
A large increase in data volume can produce inaccurate results, and mixing information from different sources adds to the confusion in the data. Studies have found that huge data sets and fine-grained measurement increase the risk of "false discoveries". The argument that "the scientific method of hypothesis, testing, and validation is outdated" reflects the confusion and bewilderment of the big data age, in which people simply embrace what Kelly called chaos.
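The "false discovery" risk can be illustrated with a small simulation. This is only a sketch under stated assumptions: every metric compares two groups drawn from the same distribution, so any "significant" difference is pure noise. The function name, group size, and the 5% threshold are illustrative choices, not anything from the studies mentioned above.

```python
import random
import statistics


def count_false_discoveries(num_metrics, samples_per_group=50, seed=0):
    """Count metrics that look 'significant' although no real effect exists."""
    rng = random.Random(seed)
    z_threshold = 1.96  # two-sided 5% level for a standard normal statistic
    # Standard error of the difference between two means (unit variance assumed).
    standard_error = (2.0 / samples_per_group) ** 0.5
    hits = 0
    for _ in range(num_metrics):
        # Both groups come from the same distribution: there is nothing to find.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(samples_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(samples_per_group)]
        z = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
        if abs(z) > z_threshold:
            hits += 1
    return hits


for num_metrics in (20, 200, 2000):
    print(num_metrics, "metrics tested ->", count_false_discoveries(num_metrics), "false discoveries")
```

About 5% of the tests are expected to fire by chance, so the more things are measured, the more spurious "findings" appear unless the threshold is corrected for the number of tests.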
Addendum: The widely accepted view from Schönberger's "Big Data Age", that big data is "messy rather than precise, correlational rather than causal", is wrong. Messy data has to be sorted out before it can be analyzed for value. Newton, Einstein, and Weber with his ideal types were all searching for analytical methods amid messy phenomena. In many cases the causal relationship has simply not been found yet. Understanding causes and processes is the core of research.
Big Data Thinking Three
Users' basic characteristics, consumption behavior, Internet behavior, channel preferences, behavioral preferences, life trajectory and location, and so on together reflect users' basic behavior. Building a complete system is the first step in all analytical work, and a complete framework matters even more than advanced models. The greatest danger in human understanding is applying partial knowledge regardless of the consequences. If a company looks only at its own site data, the basis of its analysis is bound to be fragmented data.
Addendum: The harm of fragmented data will become more and more obvious as competition intensifies. Many Internet companies treat their CRM system as a data mining and data analysis system. That idea is wrong: CRM aims at standardized reporting, whereas data analysis and data mining aim at exploratory induction.
Big Data Thinking Four
When people talk about big data today, there are basically four confused notions, including: first, that big data means full data, so sampling can be ignored or even scorned; second, that continuous data is big data; third, that data is big data simply because its volume is large. The corresponding answers are: sampled data yields accurate conclusions as long as the sampling is reasonable; continuity is only a data structure; and a large amount of noise leads to wrong conclusions.
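The point about reasonable sampling can be checked with a toy example. This is a minimal sketch, assuming a made-up "full data" set of 200,000 users in which exactly 30% performed some action; the function name and all numbers are hypothetical.

```python
import random


def estimate_rate_from_sample(population, sample_size, seed=0):
    """Estimate the share of 1s in the population from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    return sum(sample) / sample_size


# Hypothetical full data: 1 means the user performed the action, 0 means not.
population = [1] * 60_000 + [0] * 140_000
true_rate = sum(population) / len(population)

# A 5% simple random sample of the full data.
estimate = estimate_rate_from_sample(population, sample_size=10_000)
print(f"true rate: {true_rate:.3f}, estimate from sample: {estimate:.3f}")
```

With a sample of this size the standard error of the estimate is roughly half a percentage point, which is why a properly drawn sample can stand in for the full data in many analyses.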
Addendum: In reality, the Internet field is weighed down by the basic books on big data, and the concepts are very confused. In fact, the experience with data that people have accumulated is the basis of all analysis, including analysis of so-called massive data. The methods in those books are half-baked and not grounded in practice; they rest on no accumulated operational experience and are highly misleading.
Big Data Thinking Five
Big data is not new. Meteorology, seismology, quantum physics, genetics, medicine, and so on all offer useful references, and they use sample surveys. The same goes for the methodology of Internet data mining, which is harder because of human complexity. Since this is research about people, we need to draw on all the methods for studying people to sort through big data. The claim that anyone who understands programming and knows how to move data around can do big data mining is false.
Addendum: Big data is not new; it is just a new, fast way of collecting data. Applying all of humanity's research methods to big data is the core of data mining. The ability to move data around is only the technical part; the relationship is like that between a director and a film editor.
Big Data Thinking Six
In big data analysis the analytical framework comes first, but algorithms are also extremely critical. In recent big data processing we found that classifying URLs is difficult in several respects. The network behavior data of tens of thousands of people generates about 50,000 domain names in a single day. Some algorithms exist, but the domains are confusing and hard to identify. Continuous updating and discrimination is an important step in the analysis. Coarse classification is easy; fine-grained classification is hard.
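A coarse, rule-based version of URL classification might look like the following. It is only a sketch: the domain-to-category table and the function names are hypothetical, and a real table would need the continuous updating described above. Domains that match no rule are tallied so that the most frequent ones can be reviewed first.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical rule table mapping a domain suffix to a coarse category.
CATEGORY_BY_DOMAIN = {
    "shop.example": "shopping",
    "news.example": "news",
    "video.example": "video",
}


def classify_url(url, rules=CATEGORY_BY_DOMAIN):
    """Return the category of the longest matching domain suffix."""
    # URLs without a scheme have no parsed hostname and end up unclassified.
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])  # longest suffix is tried first
        if candidate in rules:
            return rules[candidate]
    return "unclassified"


def unclassified_hosts(urls, rules=CATEGORY_BY_DOMAIN):
    """Count hosts that no rule covers, as input for updating the rule table."""
    counts = Counter()
    for url in urls:
        if classify_url(url, rules) == "unclassified":
            counts[(urlsplit(url).hostname or "").lower()] += 1
    return counts


print(classify_url("https://m.shop.example/item?id=1"))
print(unclassified_hosts(["https://a.unknown.test/x", "https://a.unknown.test/y"]))
```

The easy part is the lookup; the hard part, as noted, is keeping the rule table current and splitting coarse categories into fine ones.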
Addendum: Algorithms depend on the architecture of the data, and algorithms need to truly understand human behavior.
Big Data Thinking Seven
Any algorithm that involves text must rest on two key basic technologies: keywords (dictionaries) and semantic analysis. Keyword technology is mature; semantic technology is the bottleneck. Chinese semantics is so difficult that a team able to handle 50% of cases is doing well, especially with social-media language such as "really can!" or "What's the solution?", which requires context. I hope VCs will encourage research and development in such basic technology. Breaking through this bottleneck is one of the keys to big data mining.
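The gap between the two technologies shows up even in a toy example. The sketch below implements only the mature half, keyword (dictionary) matching; the dictionary, the topic labels, and the function name are hypothetical, and English words stand in for Chinese ones.

```python
import re

# Hypothetical keyword dictionary mapping a word to a topic.
TOPIC_BY_KEYWORD = {
    "phone": "electronics",
    "camera": "electronics",
    "loan": "finance",
}


def tag_topics(text, dictionary=TOPIC_BY_KEYWORD):
    """Return the set of topics whose keywords appear in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {dictionary[word] for word in words if word in dictionary}


print(tag_topics("My new phone has a great camera"))
# A context-dependent remark contains no dictionary keyword at all.
print(tag_topics("Really can!"))
```

The second call returns an empty set: a dictionary has nothing to say about a remark whose meaning lives entirely in its context, which is where semantic analysis would have to take over.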
Big Data Thinking Eight
In social data mining, many teams concentrate on Twitter-style waterfall ideas, that is, on visualization technology. The beautiful pictures are commendable. The problem is that the underlying theory is still that of more than 30 years ago: sociometric concepts limited to nodes, bridges, opinion leaders, and other small-group analysis, which do not suit giant networks. Social analysis that breaks out of the visualization framework requires theoretical exploration and practical effort.
Addendum: Understanding social meaning is more important than presenting structure.
Big Data Thinking Nine
The impact of the mobile Internet on social life is the deconstruction of time and space, and analysis of this kind of big data needs to grasp those two points. If it only analyzes app and network usage behavior, the analysis loses the meaning of "mobile". Simple figures such as traffic and CTR cannot solve complex marketing problems. Carrying on the old mode of thinking without innovation is the inertia of human thought.
Addendum: The Internet and the mobile Internet are related, but they are two different things.