The development background of big data technology

Source: Internet
Author: User
Keywords: big data technology, big data, cost, background

For large companies, the rise of big data is due partly to computing power becoming available at lower cost, and to systems now being capable of multitasking. Second, the cost of memory is plummeting, so businesses can process more data in memory than ever before. And it is getting easier and easier to aggregate computers into server clusters. Carl Olofson, a database management analyst at IDC, believes the combination of these three factors has spawned big data.


"Not only do we do these things well, but we can do them at a lower cost," he said. "In the past, some large supercomputers have been involved in heavy processing systems, built together into tightly aggregated clusters, but because they are specially designed hardware, it costs hundreds of thousands of or even millions of of dollars." And now we can get the same computing power with ordinary merchandising hardware. This helps us to process more data more quickly and cheaply. ”


Of course, not every company with a large data warehouse can claim to be using big data technology. IDC argues that for a technology to qualify as big data, it must first be affordable, and it must then meet two of the three "V" criteria described by IBM: variety, volume, and velocity.


Variety means the data should include both structured and unstructured data. Volume means the amount of data aggregated for analysis must be very large. Velocity means the data must be processed quickly. Big data, Olofson says, "is not always hundreds of terabytes. Depending on the actual usage, sometimes a few hundred gigabytes can also count as big data; it depends mainly on the third dimension, velocity, or the time dimension. If I can analyze 300GB of data in 1 second when it usually takes 1 hour, that huge change adds great value. Big data technology is an affordable application that meets at least two of these three criteria."


Relationship to open source


"Many people think Hadoop is synonymous with big data," he said. But it was a mistake, "Olofson explained. Examples of Teradata, MySQL, and some of the "Smart Cluster Technologies" implementations do not use Hadoop, but are also considered to be implementation cases of large data.


As an application environment for big data, Hadoop attracts attention because it is built on MapReduce, a simplified programming environment created primarily at Google and widely used in the high-performance computing world. Hadoop is closely tied to a family of Apache projects, including the HBase database, built in the MapReduce environment.
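A concrete illustration may help here (this is the canonical textbook example, not code from the article): in MapReduce, a map step emits key-value pairs, the framework groups all values that share a key, and a reduce step aggregates each group. A minimal, single-process Python sketch of the word-count pattern:

```python
# A local, single-process sketch of the MapReduce word-count pattern.
# Illustrative only: a real Hadoop job distributes the map and reduce
# steps across a cluster, with the framework handling the shuffle/sort.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data is big", "data about data"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because the map and reduce functions are independent per key, the same logic can be spread across hundreds of commodity machines, which is the economic point Olofson makes above.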


Software developers have responded with everything from Hadoop to similar advanced technologies, many of them developed in the open-source community. "They have created a dizzying array of so-called NoSQL databases, most of them key-value databases optimized for speed, variety, or sheer database size," Olofson said.
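To make "key-value" concrete (an illustrative sketch, not any particular product's API): such stores trade SQL's ad-hoc querying for a minimal get/put interface on opaque keys, which is what makes them easy to optimize for speed and to partition across machines:

```python
# Minimal in-memory sketch of the key-value interface most NoSQL
# stores expose. Illustrative only: real systems add persistence,
# replication, and partitioning of the key space across a cluster.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # O(1) write; there is no schema, the value is an opaque blob.
        self._data[key] = value

    def get(self, key, default=None):
        # O(1) read by key; there is no ad-hoc query language.
        return self._data.get(key, default)

store = KeyValueStore()
store.put("user:42:name", "Carl")
print(store.get("user:42:name"))  # Carl
```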


Open-source technologies generally lack commercial support, "so these things have to evolve over time and gradually shed their flaws, which typically takes years." In other words, fledgling big data technologies are not yet ready for the general market. At the same time, IDC expects at least three commercial vendors to provide some form of support for Hadoop by the end of the year. Other vendors, such as Datameer, provide analysis tools with Hadoop components that let businesses develop their own applications; Cloudera and Tableau, for example, use Hadoop in their products.


Upgrading a relational database


Industry watchers generally agree that big data technologies should also be considered when upgrading a relational database management system (RDBMS). "Big data technology applies where you need things faster, bigger, and cheaper," Olofson said. "Teradata, for example, makes its systems cheaper, more scalable, and clustered."


Others, however, disagree. Marcus Collins, a data management analyst at Gartner, said, "BI tools are commonly used on top of an RDBMS, but that process is not really big data; it has a long history."
