Big Data Is Valuable Only When It Is Mined

Source: Internet
Author: User
Keywords: big data

Since 2012, big data has gone from a concept to a household term, drawing more attention with each passing year. By 2014 it was clearly the star of the IT world.

The "4 Vs" summarized by the research firm IDC define the concept of big data well: volume, variety, velocity, and value. Big data is a new technical architecture for extracting value from large volumes of data through high-speed capture, discovery, and analysis.

Every industry needs data mining and analysis, and everyone wants to find business directions and new opportunities in massive data. What has changed is that, with the development of information technology and especially the spread of smartphones, the total amount of data generated by users across all kinds of services keeps growing, and the types of data that can be analyzed and processed keep multiplying.

A big data platform does not negate the existing information system infrastructure. The production systems already in place will remain, customers' requirements for the reliability and vertical scalability of key business systems will not diminish, and the need for reliable, centralized data management will persist. A big data platform adds more computing power, more storage, and more storage tiers to that infrastructure, and every big data application requires a solid, reliable, flexible, and efficient platform underneath it.

Data by itself is just data. Its value is hidden inside it and emerges only through mining, collation, and analysis, which turn it into valuable big data. From this point of view, whoever simply has more data in a database does not thereby have big data. If the data is never analyzed and applied, it can only be archived, and it has no value. Analyzing and organizing data efficiently, quickly, and accurately is the hard problem in big data applications: the data has to be sorted and analyzed so that the valuable part surfaces from the database.

For example, during the 2014 Spring Festival, Tencent used changes in QQ users' login locations to analyze where people traveled over the holiday. Baidu likewise used changes in mobile phone users' locations during the Spring Festival to offer a population migration map for a given time period. Conclusions drawn from such large volumes of data are not only newsworthy; they also provide suggestions and reference points for allocating railway, highway, civil aviation, and other transport resources during the Spring Festival. As the analysis of big data's value matures, big data can help governments manage more scientifically. For companies, big data can make marketing and communication more precise. For example, Weibo and Taobao cooperate to deliver targeted product advertising based on users' query history.
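The login-location analysis described above can be illustrated with a minimal sketch. It is not Tencent's or Baidu's actual method: the record layout `(user_id, day, city)`, the function name `migration_flows`, and the sample data are all assumptions made for illustration. The idea is simply to order each user's logins by day and count every change of city as one move.

```python
from collections import Counter


def migration_flows(logins):
    """Count city-to-city moves from (user_id, day, city) login records.

    Hypothetical record layout; real login data would be far richer.
    """
    by_user = {}
    for user_id, day, city in logins:
        by_user.setdefault(user_id, []).append((day, city))

    flows = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order, since day is the first field
        # Compare each login with the next one for the same user.
        for (_, origin), (_, dest) in zip(visits, visits[1:]):
            if origin != dest:
                flows[(origin, dest)] += 1
    return flows


# Made-up sample records, for illustration only.
sample_logins = [
    ("u1", 1, "Beijing"), ("u1", 2, "Beijing"), ("u1", 3, "Chengdu"),
    ("u2", 1, "Beijing"), ("u2", 4, "Chengdu"),
    ("u3", 2, "Shanghai"), ("u3", 5, "Wuhan"),
]

flows = migration_flows(sample_logins)
for (origin, dest), count in flows.most_common():
    print(origin, "->", dest, count)
```

Aggregating the counts per route, rather than keeping per-user paths, is what makes the result usable as a migration map: only the totals between city pairs are needed to judge where transport capacity is under pressure.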

Google and Facebook were the first companies to explore and implement big data, and they remain far ahead in analyzing and exploiting it. Google, for example, has hundreds of thousands of servers around the world, backed by the world's largest database system, and analyzing that data has opened up a new world for the company.

In fact, big data technology is still predominantly open source, and to date no single vendor has formed an absolute technology monopoly. Even industry giants such as IBM, Oracle, SAP, and EMC have simply integrated open-source big data technology with their existing products to build big data platforms with their own product characteristics.

Although commercial big data platforms are largely concentrated in the hands of international giants, that does not mean China's big data lags behind the times. The most typical big data applications in China come from BAT: Baidu, Alibaba, and Tencent. Baidu, which handles more than 80% of domestic users' searches, has launched Baidu Index, Box Computing, and other features, all of which are typical big data applications. Alibaba's Taobao set off a shopping frenzy among Internet users with last year's "Double 11", keeping every courier company busy for the following month; analyzing the huge volume of transaction data and regional shopping patterns has also given Alibaba an important position in e-commerce big data. Tencent, with its veteran QQ plus the popular WeChat, has built a big data base of more than 1 billion active users, and analysis of the behavior data accumulated from these users has become a huge source of wealth for Tencent.

Sina Weibo and 360, as emerging big data companies, also have their own distinctive paths. After renaming itself Weibo, Sina Weibo has clearly taken the top spot in social media. As the first source of news and information, it has become an important venue for almost all institutions, companies, media outlets, and social interaction, and it is clearly an important user of big data. 360 holds an absolute advantage as a security gateway on domestic PCs and mobile phones, so it naturally benefits from the resulting user behavior data and deserves to be counted among the typical domestic big data companies.

Does the fact that the Internet giants already have their own big data mean that the domestic big data industry is mature? No. These giants are far ahead of China's other industries in IT adoption, and their big data applications are also built on open-source systems, developed by their own strong technical teams to meet their own business needs and gradually shaped into company-specific big data applications.

Compared with these Internet giants, ordinary industry users clearly lack both the development strength and the ability to connect open-source big data systems to their own business. But they are already users of products from IBM, Oracle, SAP, EMC, and similar vendors, and they can obtain from these established vendors a big data application platform that connects directly to their existing business data. Such deployments will of course also rely on system integrators (SIs) and other channel partners; the difference is that, at present, most of the vendors able to deploy big data platforms in China are international ones.

In fact, many industry users still size their big data at under 100 TB. Unlike Internet companies, whose big data has no upper limit, industry users treat 100 TB of memory as the upper limit for real-time analysis. Integrated hardware-and-software products such as SAP's HANA and Oracle's Exadata cover exactly this range, and they make real-time big data analysis more feasible. Unlike traditional databases that sit on a disk array, these newer products either compress the data stored on disk arrays to enable real-time retrieval, or place data in memory and flash and access it tier by tier to avoid the lag caused by disk I/O. In the past, a query over terabytes of data on a disk array could take minutes or longer, which could not serve large numbers of concurrent users; databases running in memory solve this real-time query problem.
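The tiered access idea (memory first, then flash, then disk) can be sketched in a few lines. This is a toy model under stated assumptions, not how HANA or Exadata work internally: the class name `TieredStore`, the promotion and demotion rules, and the use of plain dictionaries in place of real storage devices are all hypothetical.

```python
class TieredStore:
    """Toy three-tier lookup: memory, then flash, then disk.

    Each tier is a plain dict standing in for a real device (assumption).
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = {}  # fastest tier, holds the hottest keys
        self.flash = {}   # middle tier, receives keys demoted from memory
        self.disk = {}    # slowest tier, always holds every key
        self.hits = {"memory": 0, "flash": 0, "disk": 0}

    def put(self, key, value):
        # New data lands on disk; it moves up only when it is read.
        self.disk[key] = value

    def get(self, key):
        tiers = (("memory", self.memory), ("flash", self.flash), ("disk", self.disk))
        for name, tier in tiers:
            if key in tier:
                self.hits[name] += 1
                value = tier[key]
                if name != "memory":
                    self._promote(key, value)
                return value
        raise KeyError(key)

    def _promote(self, key, value):
        # When memory is full, demote its oldest entry to flash first.
        if len(self.memory) >= self.memory_capacity:
            old_key = next(iter(self.memory))
            self.flash[old_key] = self.memory.pop(old_key)
        self.memory[key] = value


store = TieredStore(memory_capacity=2)
for k, v in (("a", 1), ("b", 2), ("c", 3)):
    store.put(k, v)

for k in ("a", "a", "b", "c", "a"):
    store.get(k)

print(store.hits)
```

In this run, three reads go all the way to disk, one is served from memory, and one from flash after the key was demoted. The point of the layering is that repeated reads of hot data stop paying the disk I/O cost.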

As the figure above shows, data is growing rapidly while the system latency users will tolerate has barely grown at all, so the ratio of data processed to response time is an important metric. From the early gigabyte-scale databases to today's databases of terabytes or even hundreds of terabytes, data growth has outpaced the Moore's law growth of hardware. With data now growing explosively, newer database technologies are needed to classify it and extract what is needed. This places new demands on big data analytics vendors.

Chinese enterprises currently lack big data implementation capabilities and the related talent, and big data analysis is no longer a matter for software or hardware vendors alone. Traditional database vendors are taking full advantage of the latest server technology: Oracle and SAP, for example, have launched all-in-one products (big data software + custom-optimized servers + storage), and server and storage hardware vendors have launched matching integrated big data appliances. These integrated appliances will be a trend in the future big data market and a shortcut to big data for Chinese enterprises.
