Hadoop 2: A big leap in the evolution of big data

The new Hadoop not only opens the door to wider adoption of Hadoop, it also enables new ways of processing data within Hadoop that were impossible under the previous architectural constraints. In short, this is a good thing.

What has been restricting the development of Hadoop? More importantly, what is the future of Hadoop?


Criticisms of Hadoop revolve around its scaling limitations, and the biggest problem here is how it handles jobs. All jobs in Hadoop are run as batch processes through a single daemon called JobTracker, which creates a bottleneck for both scalability and processing speed.


In Hadoop 2, the JobTracker approach is gone. Hadoop now uses an entirely new job-processing framework built on two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps ResourceManager informed of what is happening on that node. Each running application also has its own manager, ApplicationMaster.
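The division of labor among the three roles can be illustrated with a toy model. The sketch below is plain Python, not the YARN API: the class names mirror the daemons named above, but every method, field, and the idea of counting "free containers" per node are simplifying assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Runs on one node and reports its free capacity (hypothetical model)."""
    name: str
    free_containers: int

    def heartbeat(self):
        # Tell the ResourceManager what is currently available on this node.
        return {"node": self.name, "free": self.free_containers}


@dataclass
class ResourceManager:
    """Tracks cluster-wide capacity and hands out containers (hypothetical model)."""
    nodes: dict = field(default_factory=dict)

    def register(self, node):
        self.nodes[node.name] = node

    def allocate(self, wanted):
        """Grant up to `wanted` containers; return the node name for each one."""
        granted = []
        for node in self.nodes.values():
            free = node.heartbeat()["free"]
            take = min(free, wanted - len(granted))
            node.free_containers -= take
            granted.extend([node.name] * take)
        return granted


@dataclass
class ApplicationMaster:
    """One per application: asks for containers and places its own tasks."""
    app_name: str

    def run(self, rm, tasks):
        containers = rm.allocate(len(tasks))
        # Pair each task with the node whose container was granted for it.
        return list(zip(tasks, containers))


rm = ResourceManager()
rm.register(NodeManager("node-a", free_containers=2))
rm.register(NodeManager("node-b", free_containers=1))

placement = ApplicationMaster("wordcount").run(rm, ["map-0", "map-1", "reduce-0"])
print(placement)
```

The point of the sketch is that the resource manager only tracks capacity and grants containers. Deciding which task goes where is left to each application's own master, which is what removes the single scheduling bottleneck described earlier.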


The framework is so different from what came before that Apache gave it a brand new name: YARN, or Yet Another Resource Negotiator, with the new MapReduce running as just one of its many possible components. In fact, Apache claims that any distributed application can run on YARN, although some porting is needed. To that end, Apache maintains a list of YARN-compatible applications, such as the social graph analysis system Apache Giraph (which Facebook uses).


Apache wisely decided not to break backward compatibility, so MapReduce 2 still uses the same API as its predecessor, and existing jobs only need to be recompiled to work properly.


YARN gives Hadoop greater cross-compatibility with other Apache projects for handling big data. If you already use one of these projects, it becomes easier to adopt the others. This improvement in Hadoop will therefore help drive other Apache projects forward as well.


The biggest improvement here is that MapReduce itself becomes just one of many ways to mine data through Hadoop. Apache's own Spark (another candidate for porting to YARN) may be better suited to some types of work than MapReduce, and Hadoop 2 gives you more freedom to choose the right engine.
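For readers who have not seen the MapReduce model itself, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions that only mirror the map, group-by-key, and reduce stages of the model.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all counts for one word into a single total.
    return key, sum(values)


lines = ["Hadoop runs MapReduce", "YARN runs Hadoop jobs"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

Workloads that fit this shape poorly, such as iterative or interactive ones, are where the choice of a different engine starts to matter.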


Two major vendors, Cloudera and Hortonworks, agree on the importance of YARN, although they are taking Hadoop in quite different directions. Cloudera's Impala allows low-latency SQL queries to be run against data stored in HDFS, which makes it well suited to real-time analytics, while Hortonworks has chosen to use Apache's native Hive technology, which is better suited to large data warehouse operations (for example, long-running queries with many join operations).


Porting applications to YARN is not a trivial task, so the payoff from reworking Hadoop this way will depend on how much actually gets deployed on the new framework. Still, Cloudera and Hortonworks are both solid supporters of Hadoop 2, and neither is turning to other technologies or sticking with the previous generation. From this point of view, Hadoop 2 is not just smoke and mirrors.