1. Background
A huge leap in infrastructure and the rapid development of data storage and network technology have laid the material foundation for the arrival of the big data era.
The Internet of Things essentially adds more entry points and nodes for data acquisition, while cloud computing promotes a service-based business model and centralized construction that lower the unit cost of computing and storage. The mobile internet is even more interesting. Its first feature is identity: to deliver the right information to the right person at the right time and place, you need to know who the other party is, what he likes and what state he is in now; without accurate identity information, nothing else is possible. The second is connection: a timely, two-way, interactive connection. With the earlier network, you went to a computer to fetch whatever information you wanted. Now, besides taking information, we also contribute the other half of it, and the information flows in a more timely way. The third is location, the defining characteristic of the mobile phone: information carries a location attribute. The last is sensing: computers have few sensors, while mobile phones and wearable products will carry more and more of them; in the future a phone may be able to smell and taste, detect excessive formaldehyde and sense electromagnetic radiation. The combination of these three (the Internet of Things, cloud computing and the mobile internet) is essentially the production, processing and application of big data: using a variety of new technologies to help us solve a variety of problems and to reconstruct information flow, capital flow and logistics.
2. How big is the data?
The shift to a big-data-driven methodology comes down to the fact that human behavior is increasingly virtualized. In the past, nobody on the Internet knew whether you were a person or a dog; now traces are left everywhere, and language analysis, natural-language semantic processing, image processing, signal processing and relationship prediction make accurate prediction possible. As a result, the global volume of data doubles every two years. With the popularity of the iPhone and all kinds of Android devices, everyone has a cloud drive of several gigabytes or even several terabytes holding all kinds of information. When data grows so large that it can no longer be handled, we call it big data in the narrow sense. Many new computers are used to process, store and mathematically model this data, which is divided into hot and cold data according to how often it is accessed. Hot data used to carry the more significant information, so attention focused on it, and the mathematical approach was based on statistical sampling. Then came the fascinating correlations: as the cost of computing and storage fell, it became possible to process the full data set, and many remarkable phenomena have emerged from the accumulated full data. Some have a major impact on current science, and models based on statistics and sampling may well turn out to be wrong, as with the "Cape Town law" and "Pluto's mistake". For details, see my book on the historic opportunity of the big data era. In the narrow definition, then, there are the 4 Vs defined by IBM and IDC: Volume (data scale), Velocity (speed), Variety (multiple types) and Value.
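The split into hot and cold data by access frequency can be illustrated with a minimal Python sketch. The access log, the key names and the `hot_threshold` cutoff are all assumptions made for illustration; real systems choose their own thresholds and time windows.

```python
from collections import Counter


def split_hot_cold(access_log, hot_threshold=3):
    """Split record keys into hot and cold sets by access count.

    access_log: iterable of record keys, one entry per access.
    hot_threshold: hypothetical cutoff; keys accessed at least this
    many times are treated as hot, the rest as cold.
    """
    counts = Counter(access_log)
    hot = {key for key, n in counts.items() if n >= hot_threshold}
    cold = set(counts) - hot
    return hot, cold


# Hypothetical access log: each entry is one read of a record.
log = ["order:1", "order:2", "order:1", "user:9", "order:1", "order:2"]
hot, cold = split_hot_cold(log)
print("hot:", sorted(hot))
print("cold:", sorted(cold))
```

In this toy log only `order:1` reaches the threshold, so it is the only hot key; everything else would be a candidate for cheaper, slower storage.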
3. Why is big data causing such a sensation? There is a profound social background, but more important is data thinking
The first point is the data thinking I keep mentioning: paying attention to the completeness of the data rather than to random sampling. The second is to focus on the complexity of the data and relax the demand for precision. In the past we insisted on exact figures down to the smallest unit; with big data we no longer require that level of refinement, but ask for a broad framework and an approximately accurate judgment of the trend. The third is that big data is a new way to evaluate business models: data becomes the core asset, and it will profoundly affect an enterprise's business model and even reshape its culture and organization.
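The preference for complete data over random sampling can be made concrete with a small Python sketch. The data here is synthetic and purely an assumption for illustration: 10,000 records, of which five are marked as rare events. A 1% random sample will usually contain none of them, while a pass over the full data always finds all five.

```python
import random


def count_rare(records):
    """Count the records flagged as rare events."""
    return sum(1 for r in records if r["rare"])


# Synthetic data (an assumption for illustration): 10,000 records, 5 rare.
records = [{"id": i, "rare": False} for i in range(10_000)]
for i in (17, 2503, 4999, 7321, 9876):
    records[i]["rare"] = True

rng = random.Random(42)            # fixed seed so the run is repeatable
sample = rng.sample(records, 100)  # a 1% random sample

full_count = count_rare(records)
sample_count = count_rare(sample)
print("full data:", full_count, "rare records")
print("1% sample:", sample_count, "rare records")
```

Scaling the sample count up by 100 gives an estimate of the full count, but with so few rare records that estimate is very unstable, which is one reason the full data set is preferred once it is affordable to process.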
I have defined five dimensions: activity, granularity, dimension, space-time and emotion. The first is activity. With an internet company, for example Alibaba's services, you may use the service three to five times a day, whereas you may visit a bank branch once a month or even less often. The second is granularity. On an e-commerce platform, every step is fully recorded, from entering the shop and browsing, to placing the order, to logistics, transportation and delivery, to the final review and sharing. This matters a great deal, and I call it granularity. The information or financial data a bank sees is your water, electricity and gas payments plus your payday, which is very coarse. The third is dimension. With a payment service such as Yeepay, the data you generate when using it stays with the provider, so there are more dimensions available for processing and analysis. The fourth is distance. When someone needs a loan, a financial institution or internet company that is online with the customer is likely to be the first to know, while by the time the bank knows, many steps have already passed. That is the difference between near and far. The last is emotion. Every message you post on Weibo carries emotion; once the emotion is known, your state is known, and so is whether a given piece of marketing will have any effect.
4. What happens next? Pan-Internet
Software and hardware will become free, serving as entry points for data collection. Vertical integration of industries: first, software companies make hardware and internet companies make both hardware and software; next, e-commerce companies move into finance, financial firms move into e-commerce, and software companies provide value-added services. Why? Once you need to get as close as possible to the customer, everything revolves around serving the customer's needs, and the boundaries between industries keep being broken. Data as an asset: data becomes more and more important and is the foundation on which every business model originates and is rebuilt.
5. This change is taking place in China and worldwide
The internet industry is first, followed by business intelligence and consulting services and the retail industry, and then medicine, health, transportation, logistics and even biotechnology and astronomy. The awareness of, and capability for, data services that big data has spawned are affecting every aspect of society, from business and technology to health care, government, education, economics and the humanities, and have unleashed transformative forces in every sector. This is what we call cross-border disruption. I divide big data technology into the traditional enterprise market and the innovation market. In the enterprise market, IBM, EMC, HP and Oracle are largely putting old wine in new bottles: repackaging older business intelligence and data processing products and, to put it bluntly, talking customers into doing data analysis. Meanwhile, companies such as Google and Facebook, and BAT (Baidu, Alibaba and Tencent) in China, are the ones seriously working on big data. The "de-IOE" trend led by Alibaba (moving away from IBM, Oracle and EMC) also shows that, in the coming waves of mobile and big data, foreign vendors' products cannot meet domestic needs for fast, open-source and convenient growth. In the innovation market, big data technology is mainly open source. Even industry giants such as IBM and Oracle integrate open-source technology and combine it with their own existing products. In the emerging field of big data processing, Chinese and foreign companies stand almost at the same starting line. In big data processing technology in the narrow sense (such as Hadoop, MapReduce, pattern recognition and machine learning), the gap between China and other countries is very small. If you consider the scale of digital assets and the technology in use, the gap lies more in awareness.
Alibaba, for example, has completely replaced its IOE products; it not only uses the replacement itself but also offers it externally through Aliyun, saving 2 billion in IT spending, much as Amazon has turned EC2 and S3 into major profit centers. Alibaba now has a processing capacity of 100 million operations per second, more than the four major banks combined (as Jack Ma boasted at the People's Bank of China a few days ago). On the other hand, China's population and economic scale mean that the size of its data assets leads the world, which objectively provides a training ground for the development of big data technology. For example, when I was at Oracle as a consulting manager for the three major telecom operators, every customer I met mentioned that they had the largest volume of data in the world. A sharpshooter is made with bullets, and a good product is made with data. At Alibaba, Jingdong, Baidu and similar companies, the replacement process has already begun, whether driven by customer demand or by cost, and I think this trend will spread further to finance, telecommunications, government and other industries with heavy IT investment.
6. Big data at several typical companies
Baidu has China's largest consumer behavior database, covering 95% of Chinese internet users. It responds to 5 billion search requests a day and holds 80% of the search market, and the 600,000 partners of the Baidu Alliance generate another 5 billion behavioral events a day. Together these form the basis of its huge data set. How the data is put to use: Baidu promotes the Baidu Index and, building on it, has established the Baidu Cloud list, the Baidu Data Center and online search consulting reports produced in the manner of a research institution. For advertisers, webmasters and development teams it provides Baidu (Mobile) Statistics and related developer service tools.
Tencent has more than 783.6 million active QQ accounts, 469 million microblog users, more than 100 million video users and 597.6 million QQ Space users; mobile products such as WeChat and Mobile Manager also have more than 400 million users, and overseas users are quickly passing 100 million. Apart from its mass of users, Tencent's huge service matrix of "N products × N platforms × N terminals × N user relationships" makes its data unstructured, fragmented and massive. Its only tools so far are Tencent Analysis and Tencent Compass.
Jack Ma has said that platform, finance and data are Alibaba's three strategic directions for the future. Alibaba will essentially be a data company: e-commerce is increasingly inseparable from data, and the core of finance is also data. Its deals with Sina Weibo, Umeng, AutoNavi (Gaode), Dingding and others are all about staking out data. I have talked about this hundreds of times, so I will not go into detail. Person in charge: Che Pinjue. Notable internal products: Tao Data, the KPI system, the data portal, live broadcast, Sellers Cloud, page click and Huang Jin Ce. For customers it provides Data Cube, Infinite God Needle, Class 360 and the Taobao Index. Most significant of all, in 2012 Alibaba launched "Jushita" (Poly Stone Tower), a product that provides two kinds of service: data storage and data computation. On the 2012 "Double 11" promotion day, with 19.1 billion yuan in sales, Jushita handled more than 20% of Tmall's total orders, 20 times its usual volume. Ali Finance is an example of a product derived from big data. Because of its e-commerce nature, Alibaba has gone furthest in applying big data.
7. Classification and scale of the industry chain
There are many related basic industries. The first is the data technology industry, which includes hardware (smart pipes, IoT, servers, storage, transmission and smart mobile devices), software (languages, data platforms, tools, structured and unstructured databases, and application software) and services (IDC, cloud computing, web applications and so on). The second is the data acquisition industry, including positioning, payment, SNS, mail and other sectors. The third is the information processing industry, including data mining, data analysis and data consulting. The fourth is the data application industry, for example internet finance built on data.
8. Summary of big data:
One way of thinking: data thinking
Two major drivers: ultimate experience, long-tail effect
Three major trends: pan-Internet, vertical integration, data as an asset
Four steps: entry point, traffic, data, monetization
Five standards: activity, granularity, dimension, space-time, emotion
Six major models: data, information, consulting, media, data enabling, technology
Seven characters (in the original Chinese): focus, the ultimate, word of mouth, speed
(Responsible editor: Mengyishan)