View: The vast amount of data on the internet contains huge "gold deposits"

Source: Internet
Author: User
Keywords: big data, massive data, gold deposits, data mining

According to an IDC survey, global data volume reached 1.2ZB at the end of 2010. By 2020, the data stored on electronic devices worldwide will have grown 30-fold, to 35ZB (equivalent to 35 billion 1TB hard disks). For a prepared enterprise, this is an information gold mine: as data mining technology advances, valuable information will become easy to obtain.

With the advent of the big data era, technologies for storing, mining, processing, and analyzing big data are drawing more attention than ever. Big data is becoming a cornerstone of enterprise development and is gradually changing the business models of many industries. Using non-traditional data-processing tools such as Hadoop to handle massive amounts of structured and unstructured data is becoming a trend.

Widespread misunderstandings about big data

The IDC report attributes the rise of big data in the enterprise partly to falling computer hardware costs and to the fact that today's computing systems can handle multiple workloads. At the same time, as the cost of main memory falls, businesses can process more data in memory than ever before. More importantly, connecting servers into a cluster is far simpler today than it used to be. These three factors together contributed to the emergence of big data.

There are currently three common misunderstandings of big data technology. The first is that relational databases cannot scale to very large capacities and therefore do not count as big data technology. The second is that, regardless of workload or business circumstances, Hadoop or its derivatives are the best way to handle big data. The third is that the era of the database management system is over.

In the big data era, enterprises must not only handle the data their business generates but also plan costs sensibly. In the past, supercomputers offered enormous processing power, but they were typically large clusters built on proprietary hardware and cost hundreds of thousands of dollars or more. Today an enterprise can assemble a machine of equivalent performance from ordinary commodity hardware. That is why businesses can now process large amounts of data faster and more cheaply.

At the same time, before big data technology can spread and win acceptance, the cost question must be answered. Not every enterprise with a large data warehouse needs big data technology. An enterprise is a genuine candidate only when all three factors are present: data in multiple formats (structured, semi-structured, and unstructured), massive volume (large amounts of data to store or analyze), and demanding processing speed.

Hadoop is favored, but it is not everything

The current situation is that Hadoop has become the first choice for handling massive amounts of data. Hadoop's open-source model attracts a large number of people to develop and innovate on it, which is one of the main reasons it leads in mass data processing. Many vendors, including Microsoft, IBM, Oracle, Cloudera, and MapR, have introduced products that integrate with Hadoop.
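Hadoop's core programming model is MapReduce. The classic word-count example below is a minimal sketch of that model in plain Python (it is not Hadoop itself; the function names and sample documents are illustrative only): a map phase emits key-value pairs, then a shuffle-and-reduce phase groups the pairs by key and aggregates them.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum each group's counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data is big", "data mining finds gold"]
word_counts = reduce_phase(map_phase(docs))
```

In a real Hadoop cluster the same two phases run in parallel across many machines, with the framework handling the shuffle, fault tolerance, and data distribution.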

At the same time, to complement Hadoop, software developers have created a variety of new technologies, many of them from the open-source community. Take NoSQL: the term first appeared in 1998 as the name of a lightweight, open-source relational database developed by Carlo Strozzi that did not use SQL as its query interface. The NoSQL approach has great advantages over SQL databases because it allows a new level of application scaling. New data services can be built in the cloud on truly scalable, distributed architectures, which is very appealing for application development: no DBA, no complex SQL queries.

At present, Google's BigTable and Amazon's Dynamo are both NoSQL databases, while traditional relational databases have proven powerless against ultra-large-scale, highly concurrent SNS and Web 2.0 sites. But NoSQL is not omnipotent either: choosing a data model, standardizing interfaces, and handling new workloads such as mobile business data are all problems NoSQL cannot avoid.
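The schema-less flexibility that distinguishes NoSQL stores from relational tables can be sketched with a toy in-memory key-value store (the keys, field names, and `put`/`get` helpers below are invented for illustration, not the API of any real database):

```python
# A toy in-memory key-value store illustrating the schema-less NoSQL model:
# documents with different fields coexist under the same "table".
store = {}

def put(key, document):
    """Store a document under a key, with no schema enforced."""
    store[key] = document

def get(key):
    """Retrieve a document by key, or None if absent."""
    return store.get(key)

# Unlike rows in a relational table, each document carries its own fields.
put("user:1", {"name": "Alice", "email": "alice@example.com"})
put("user:2", {"name": "Bob", "followers": 1024})  # no email column needed
```

A relational database would force both records into one fixed schema (or NULL-filled columns); the key-value model simply stores whatever each document contains, which is what makes horizontal scaling and rapid application change easier.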

But the idea that Hadoop equals big data is clearly wrong. Besides Hadoop, technologies such as Teradata and HPCC can also handle big data; using Hadoop is not a requirement.

The data gold mine remains to be excavated

The challenges of big data can translate into big opportunities. Through the interaction and integration of big data, a scalable, low-cost path opens up: a new data integration platform can turn big data into trustworthy, authoritative data that companies can use to gain competitive insight and improve their business. Big data integration will help data-driven enterprises release their full business potential.

Big data is not just about the amount of data. Successful companies must meet the growing demand to integrate diverse, complex data across the enterprise, combining transaction, customer, and financial perspectives. Collecting and storing huge amounts of new data is a challenge, but new approaches to analyzing that data are the powerful tools that help the most successful companies pull away from their rivals.

For massive amounts of data, how to put such complex data to use has become a research hotspot in data warehousing, business intelligence, and data analysis. Data mining discovers the hidden regularities in large volumes of data in order to solve practical application problems; its most important task is to make full use of the useful data and discard the false and useless. Data in a traditional database is fully structured, whereas much of today's data is semi-structured, so mining it is far more complicated than mining a single, structured data warehouse.
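The cleaning step described above, keeping the useful records and discarding the false and useless, can be sketched as a small filter over a semi-structured feed (the sample records, field names, and `mine_purchases` helper are hypothetical, invented only to illustrate the idea):

```python
import json

# Hypothetical raw feed: semi-structured records, some malformed,
# some missing the field we want to mine.
raw_lines = [
    '{"user": "alice", "purchase": 29.9}',
    'not json at all',
    '{"user": "bob"}',
    '{"user": "carol", "purchase": 15.5}',
]

def mine_purchases(lines):
    """Keep records that parse and carry a numeric 'purchase' field;
    discard the rest, as the cleaning step of a mining pipeline would."""
    useful = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # discard records that are not valid JSON
        if isinstance(record.get("purchase"), (int, float)):
            useful.append(record)
    return useful

clean = mine_purchases(raw_lines)
```

Unlike a query against a fully structured warehouse table, the code must handle parse failures and missing fields explicitly, which is exactly why mining semi-structured data is the more complicated case.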

(Responsible editor: Lu Guang)
