Big Data Hits the Limits: How Does Hadoop Go Further?

Source: Internet
Author: User

Storage technology has grown and matured and is becoming close to a commodity in many data centers. However, today's businesses face new challenges as storage technology keeps changing. One example is the push for big data analytics: the move to bring business intelligence (BI) capabilities to very large datasets.

Big data analytics requires capabilities beyond the typical storage paradigm. In short, traditional storage technologies such as SANs and NAS cannot natively handle the challenge of big data: terabytes and petabytes of unstructured information. Successful big data analysis needs something more: a new way to handle high volumes of data, in other words, a new storage platform.

Hadoop is an open source project that provides a platform for working with big data. Although Hadoop has been around for some time, many companies are only now starting to use it.

The Hadoop platform is designed to address the problems caused by massive amounts of data, especially data that mixes complex structured and unstructured information and does not fit neatly into tables. Hadoop works well when you need to support deep, compute-intensive analytics such as clustering and targeting. So what does Hadoop mean for IT professionals who want to make the most of big data? The simple answer is that Hadoop solves the most common problem associated with big data: storing and accessing massive amounts of data efficiently.

Hadoop's design lets it run across a large number of machines that share no memory or disks. With this in mind, it is easy to see how Hadoop delivers added value: network administrators can simply buy a batch of commodity servers, put them in racks, and run the Hadoop software on each one.

Hadoop also helps reduce the administrative overhead associated with large datasets. In operation, once an enterprise's data is loaded into the Hadoop platform, the software breaks the data into manageable pieces and automatically distributes them across different servers. Because the data is spread out this way, there is no single server through which all of it must be accessed. Hadoop keeps track of where each piece resides and further protects the data by storing multiple replicas. This makes the system more resilient: if a server goes offline or fails, the data can be automatically re-replicated from a known good copy.
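To make the split-and-replicate step concrete, here is a small in-memory Python sketch. It is a toy model, not HDFS code: the function names (`split_into_blocks`, `place_replicas`, `handle_failure`), the node names, and the simple round-robin placement are all hypothetical, and the block size is shrunk to a few bytes so the result is easy to inspect. Real HDFS uses far larger blocks (128 MB by default in current releases), keeps 3 replicas by default, and places them with a rack-aware policy.

```python
# Toy model of how a distributed file system splits data into blocks,
# spreads replicas over servers, and re-replicates after a failure.
# All names here are hypothetical; this is not HDFS code.

BLOCK_SIZE = 4    # bytes per block, tiny so the example is readable (HDFS: 128 MB)
REPLICATION = 3   # copies kept of each block (also HDFS's default)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the input into fixed-size fragments; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int = REPLICATION) -> dict[int, set[str]]:
    """Put each block on `replication` distinct servers, round-robin.

    Returns a block-id -> servers map, i.e. the "where does the data live"
    metadata the text says Hadoop tracks. Real HDFS placement is rack-aware.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(servers)
        placement[block_id] = {servers[(start + k) % len(servers)]
                               for k in range(replication)}
    return placement


def handle_failure(placement: dict[int, set[str]], failed: str,
                   servers: list[str], replication: int = REPLICATION) -> list[str]:
    """Remove a failed server and top each block back up to `replication`
    copies by (notionally) copying a surviving replica to another live server."""
    live = [s for s in servers if s != failed]
    for block_id, holders in placement.items():
        holders.discard(failed)
        if not holders:
            raise RuntimeError(f"block {block_id} lost: no surviving replica")
        for candidate in live:
            if len(holders) >= replication:
                break
            holders.add(candidate)  # no-op if this server already holds the block
    return live


servers = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"hello hadoop world!")   # 19 bytes -> 5 blocks
placement = place_replicas(len(blocks), servers)
servers = handle_failure(placement, "node2", servers)
# With 3 live nodes and 3 replicas, every block now lives on node1, node3 and node4.
print({block_id: sorted(holders) for block_id, holders in placement.items()})
```

Every block that had a copy on `node2` drops to two surviving replicas and is topped back up to three on the remaining nodes, which mirrors the failure behavior described above. The round-robin policy was chosen only for readability; it ignores rack topology and disk usage.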

How does Hadoop go further?

Hadoop also goes further in how it processes data. Consider a traditional, centralized database system, which might consist of large disk drives attached to a server-class machine with multiple processors. In that setup, analysis is limited by disk performance and, ultimately, by the number of processors you can buy.

With Hadoop, every server in the cluster can take part in processing the data, because Hadoop spreads the data across the cluster. In other words, an indexing job sends its code to each server in the cluster, each server operates on its own piece of the data, and the results are then combined into a whole. In Hadoop this process is known as MapReduce: the code and processing are mapped out to all the servers, and the results are reduced into a single dataset.
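The flow described above (send the same code to every server's local piece of data, then combine the partial results) can be simulated in a few lines of Python. This is a single-process sketch, not the Hadoop MapReduce API, which is Java-based and built around `Mapper` and `Reducer` classes; the function names and sample documents are hypothetical. Since the paragraph mentions an indexing job, the example builds an inverted index: map emits (word, document) pairs, a shuffle groups them by word, and reduce collapses each group into a sorted list of documents.

```python
# Single-process simulation of a MapReduce indexing job (an inverted index).
# Function names and the sample documents are hypothetical.
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Runs where the document is stored: emit (word, doc_id) once per distinct word."""
    for word in dict.fromkeys(text.lower().split()):  # dedupe, keep order
        yield word, doc_id


def shuffle(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word, doc_id in pairs:
        groups[word].append(doc_id)
    return groups


def reduce_phase(word: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Collapse one key's values into a single result row."""
    return word, sorted(doc_ids)


def build_index(splits: list[tuple[str, str]]) -> dict[str, list[str]]:
    # Map: the same code is applied to every server's local piece of data.
    intermediate = [pair for doc_id, text in splits
                    for pair in map_phase(doc_id, text)]
    # Reduce: the per-word groups are combined into a single dataset.
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, grouped[word]) for word in sorted(grouped))


docs = [("doc1", "big data on Hadoop"),
        ("doc2", "Hadoop stores big data"),
        ("doc3", "data at rest")]
print(build_index(docs))
```

In the resulting index, `data` maps to all three documents and `hadoop` to `doc1` and `doc2`. In a real cluster each `map_phase` call would run on the server that already stores that document, so only the much smaller intermediate pairs have to cross the network.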

This process is what allows Hadoop to handle massive amounts of data. Because Hadoop spreads the data out, it can put all of the processors available in the cluster to work in parallel on complex computing problems.
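A rough back-of-the-envelope estimate shows why spreading the data matters. Assume a dataset of size D is split evenly across N servers, each reading its local disks at bandwidth B, and ignore job startup, network, and shuffle overhead. The figures used (1 TB, taken as 10^6 MB, and 100 MB/s per server) are illustrative assumptions, not measurements.

```latex
T_{\text{scan}} \approx \frac{D}{N \, B}
\qquad
\begin{aligned}
N = 1:&\quad T \approx \frac{10^{6}\ \text{MB}}{100\ \text{MB/s}} = 10^{4}\ \text{s} \approx 2.8\ \text{h}\\
N = 100:&\quad T \approx \frac{10^{6}\ \text{MB}}{100 \times 100\ \text{MB/s}} = 100\ \text{s}
\end{aligned}
```

The single-server case is bounded by one machine's disk throughput, which is the limit described for centralized systems; adding servers divides the scan time, provided the data really is local to each one and the ignored overheads stay small.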

(Responsible editor: The good of the Legacy)
