The relationship between big data and Hadoop

Source: Internet
Author: User
Keywords: big data

Personal Summary:

Hadoop: Hadoop is a software framework for the distributed processing of large amounts of data; it is one concrete technology implementation.

Big data:

Information:

We have all heard the prediction: by 2020, the volume of stored electronic data will grow to 44 times its 2009 level, reaching 35 trillion GB. According to IDC, by 2010 the figure had already reached 1.2 million PB, or 1.2 ZB. If all of this data were burned onto DVDs, the stack of discs would reach from the Earth to the moon and back, roughly 480,000 miles.

To the worriers, this looks like an ominous sign that data storage is headed for a breaking point. To the opportunists, it is an information gold mine, and advancing technology keeps making the gold easier to dig out.

Enter big data, a burgeoning data-mining technology that is making data processing and analysis cheaper and faster. Big data technology, once confined to the supercomputing world, will soon be within reach of ordinary enterprises, and as it spreads it will change how many industries do business.

In the computing world, big data is defined as the process of mining very large ordered or unordered data sets with non-traditional data-filtering tools, including but not limited to distributed computing frameworks such as Hadoop.

Big data is riding a wave of storage-industry hype, and it carries a great deal of uncertainty, much like "the cloud" before it. We consulted analysts and big data enthusiasts and asked them to explain what big data is and what it means for the future of data storage.

Big data takes the stage

Big data has emerged in the enterprise partly because the cost of computing power has fallen and systems are now capable of multiprocessing. As the cost of main memory keeps dropping, companies can hold more data in memory than ever before. And it has become easier to link computers into server clusters. These three changes add up to big data, said Carl Olofson, a database management analyst at IDC.

"We're not just doing these things right, we're going to be able to afford the expenses," he said. "Some supercomputers in the past also had the ability to perform multiple processing of the system, which is tightly connected and formed a cluster, but it costs as much as hundreds of thousands of dollars or more because it uses specialized hardware." "Now we can complete the same configuration using normal hardware." Because of this, we can quickly and save more data. "

Big data technology has not yet been widely adopted, even in companies with large data warehouses. IDC believes that for a technology to be recognized as big data, it must first be affordable, and it must then meet at least two of the three criteria of what IBM calls the "3V" standard: variety, volume and velocity.

Variety means the data to be stored spans both structured and unstructured types. Volume means the amount of data stored and analyzed can be very large. "The volume does not have to be hundreds of terabytes," Olofson said. "Depending on the situation, because of the relationship between speed and time, even a few hundred gigabytes can be a lot. If I can finish in a second an analysis of 300 GB of data that used to take an hour, the result is worth something entirely different. Big data is any technology that meets at least two of these three requirements and that an ordinary enterprise can afford to deploy."

Three major misconceptions about big data

There are many misunderstandings about what big data is and what it can do. Here are three common misconceptions about big data:

1. Relational databases cannot scale to very large volumes, and therefore cannot be considered big data technology. (Wrong)

2. Regardless of workload or use case, Hadoop, or any other MapReduce implementation, is the best choice for big data. (Also wrong)

3. The era of the schema-based management system is over; schema development will only be a stumbling block to big data applications. (A ridiculous mistake)

The relationship between big data and open source

"Many people think Hadoop and big data are basically a meaning. That's wrong, "Olofson said. And explained: Teradata, MySQL and "Smart aggregation technology" some of the installation is not used to enable Hadoop, but they can also be considered large data.

Hadoop is one big data technology, and it has attracted so much attention because it is built around MapReduce. (MapReduce is a general approach long used in supercomputing that was later streamlined and refined, most notably by Google.) A Hadoop deployment typically combines several closely related Apache projects, including HBase, a database that runs in the MapReduce environment.
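To make the MapReduce idea concrete, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (the org.apache.hadoop.mapreduce packages). It follows the structure of the standard Hadoop tutorial example; the class names and the command-line input/output paths are illustrative choices, not anything prescribed by the article.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework groups values by word; sum them into a count.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation cuts shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The mapper runs in parallel over splits of the input, the framework shuffles and groups the (word, 1) pairs by key, and the reducer sums each group. That division into a map step and a reduce step is exactly the MapReduce pattern Hadoop inherits.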

To take full advantage of Hadoop and similar advanced technologies, software developers have devised a wide range of supporting tools, many of them built in the open source community.

"They have developed a large number of so-called NoSQL databases, which are dazzling, most of which are key-value pairing databases that can optimize performance, variety or capacity using a variety of technologies," Olofson said. ”

Much of this open source technology does not yet have commercial support. "So it will take some time to develop and mature, probably years. For that reason, big data will need some time to mature in the market," he added.

According to IDC, at least three commercial vendors will offer some form of Hadoop support within the year. At the same time, several companies, Datameer among them, will release analytics tools built on Hadoop components that help enterprises develop their own applications. Hadoop already appears in the product lists of Cloudera and Tableau.

Source: http://os.51cto.com/art/201205/339932.htm
