Jeff Markham: 100% Open Source Is the Core of Hadoop


On November 22-23, 2013, the 2013 Hadoop China Technology Summit (Chinese Hadoop Summit 2013), the only large-scale industry event dedicated to sharing Hadoop technology and applications, will be held at the Four Points by Sheraton hotel in Beijing. Nearly a thousand CIOs, CTOs, architects, IT managers, consultants, engineers, and Hadoop enthusiasts, along with IT vendors and technologists engaged in Hadoop research and promotion, are expected to attend.

The Hadoop China Technology Summit is hosted by the Chinese Hadoop Summit expert committee and organized by IT168, ITPUB, and ChinaUnix, which also handle media promotion. The conference upholds the theme of "effectiveness, application, and innovation", aiming to improve Chinese enterprise users' ability to apply Hadoop, lower both the technical and budgetary barriers to adopting Hadoop, and popularize the application value of big data through open and extensive sharing and exchange.

In the run-up to the conference, IT168 reporters had the privilege of interviewing one of its guest speakers, Mr. Jeff Markham, Asia-Pacific Technology Director at Hortonworks. Prior to joining Hortonworks, Mr. Markham helped companies such as VMware, Red Hat, and IBM build distributed applications with distributed data. He has years of experience in Hadoop technology and data analysis, and is one of the authors of Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2.

YARN: The Next-Generation Hadoop Platform

Born in 2006, Hadoop is now seven years old. Any mention of Hadoop has to touch on its origins and on three companies: Yahoo, Hortonworks, and Cloudera. Hadoop originated in the 2002 Apache Nutch project, a subproject of Apache Lucene, and by 2006 had grown into a complete, independent piece of software named Hadoop. At the beginning of 2008, Hadoop became an Apache top-level project and was adopted by many Internet companies beyond Yahoo!. Since 2010, two Hadoop technology startups, Hortonworks and Cloudera, have spun out of Yahoo's Hadoop team. Hortonworks alone has 58 Hadoop committers and PMC members, about a 35% share of the community. According to Jeff, code-line statistics show that Hortonworks and Yahoo together contributed more than 80% of the source code in the Hadoop trunk.

▲ Jeff Markham, Asia-Pacific Technology Director, Hortonworks

As a company born out of Yahoo and focused on Hadoop, Hortonworks can be seen as a continuation of Yahoo's contribution to the Hadoop ecosystem. In the interview, Jeff said that Hortonworks not only employs a large number of Hadoop experts but is also a major contributor to Hadoop 2.0 and Apache YARN, which the industry regards as the next-generation Hadoop platform. Speaking of YARN's birth, Jeff explained that the JobTracker/TaskTracker mechanism of the older MapReduce had flaws in scalability, memory consumption, threading model, reliability, and performance. The Hadoop development team patched these over the years, but the cost of those fixes kept rising, a sign that it was becoming harder and harder to change the original framework. To address the old MapReduce framework's performance bottlenecks at the root, the framework was completely refactored starting with version 0.23, and the new MapReduce 2.0 was named YARN, or MRv2.

Compared with Hadoop 1.0, Hadoop 2.0 shows significant improvement in both stability and architectural soundness, making Hadoop more than a single batch-processing platform and its ecosystem more prosperous and cohesive. In Jeff's view, YARN plays the most important role in Hadoop 2.0. Hortonworks, he says, wanted to radically redesign Hadoop's architecture when building Hadoop 2.0, so that multiple applications could run on Hadoop and work with shared datasets. This lets multiple types of applications run efficiently and controllably on the same cluster, and it is the real reason Apache YARN, the foundation of Hadoop 2.0, was born. With YARN managing cluster resource requests, Hadoop is upgraded from a single-application system to a multi-application operating system.

Essentially, YARN is Hadoop's operating system, breaking through the performance bottlenecks of the old MapReduce framework. It is a true Hadoop resource manager that allows multiple applications to run simultaneously and efficiently on one cluster. With YARN, Hadoop becomes a genuine multi-application platform that can serve the entire enterprise. Jeff also revealed that YARN is already used in the Hortonworks Data Platform, and that the combination of Hadoop and YARN is key to the success of an enterprise big data platform.
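As a rough illustration of how YARN lets several application types share one cluster, Hadoop 2's CapacityScheduler can partition cluster resources into queues, one per workload. The sketch below uses hypothetical queue names and percentages; only the property-name pattern comes from the standard capacity-scheduler.xml configuration file:

```xml
<!-- capacity-scheduler.xml (sketch): split one YARN cluster between
     two workloads. Queue names and percentages are illustrative only. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>batch,interactive</value>
  </property>
  <property>
    <!-- 70% of cluster capacity reserved for batch MapReduce jobs -->
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>70</value>
  </property>
  <property>
    <!-- 30% for other YARN applications sharing the same cluster -->
    <name>yarn.scheduler.capacity.root.interactive.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

With a configuration of this shape, YARN's ResourceManager arbitrates container requests from all applications against the declared queue capacities, which is what allows one cluster to host multiple application types at once.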

100% Open Source is the core of Hortonworks Hadoop

With so many Apache Hadoop distributions on the market, how does Hortonworks stand out from the competition? Jeff believes Hortonworks Hadoop differs from other Hadoop distributions (such as Cloudera's) in that Hortonworks products are entirely open source. Hortonworks is a fully open-source company, and all of its code is contributed back to the Apache Hadoop project. In the interview, Jeff boldly predicted that by 2015 half of the world's data would be processed by Apache Hadoop, and that Hadoop would be the platform for future big data. As a company dedicated to creating and promoting open-source Hadoop, Hortonworks's mission is to provide a 100% open-source Hadoop platform: whatever Hortonworks ships is open source, and open source globally, so that any future partner or vendor cooperates on an open platform. In addition, Hortonworks relies less on proprietary code than Cloudera, so users need not worry about vendor lock-in.

Beyond its focus on open source, Hortonworks has invested heavily in making Apache Hadoop an enterprise-grade data platform, while encouraging ecosystem developers to build more tools and products that support the Hadoop platform. Broad partner support is another big success factor for Hortonworks, and the most-watched partner is none other than Microsoft. As a strategic partner, Hortonworks is leveraging its expertise to help integrate Hadoop into Microsoft's offerings and bring Apache Hadoop to the Windows Server and Windows Azure platforms. Last June, Hortonworks and Microsoft formally released HDP (Hortonworks Data Platform), a big data analysis platform based on Hadoop. The Windows version of HDP launched this February, marking that Hortonworks's big data analysis technology can now run on both of the major platforms, Linux and Windows.

Beyond Microsoft, Hortonworks has more than 140 technology partners. Companies such as Teradata and Rackspace have built their own Hadoop product lines on top of the Hortonworks Data Platform, according to Jeff.

For the vast majority of Hadoop enthusiasts, however, quickly learning and mastering Hadoop is not easy. Here Jeff strongly recommends the Hortonworks Sandbox tutorials. He says Hortonworks Sandbox is a good starting course both for software architects looking for solutions to big data problems and for application developers learning the new technology. In Sandbox, Hortonworks offers a number of practical online training courses, including how to process data with Apache Pig, Apache Hive, and the latest HDP distribution. The Sandbox runs in three virtual environments, VirtualBox, VMware, and Hyper-V, and better still, the tutorials are completely free on the Hortonworks Sandbox page.
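To give a flavor of the kind of exercise those Hive tutorials cover, a typical first step is to define a table over a delimited file in HDFS and run an aggregation over it. The table name, columns, and file path below are hypothetical, not taken from the Sandbox material:

```sql
-- HiveQL sketch: names and paths are illustrative only.
CREATE TABLE web_logs (ip STRING, ts STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/user/sandbox/logs.tsv' INTO TABLE web_logs;

-- The classic first aggregation: hits per URL.
SELECT url, COUNT(*) AS hits
FROM web_logs
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```

The point of the Sandbox approach is that such a query compiles down to jobs on the cluster, so newcomers can work in familiar SQL-like syntax without writing MapReduce code by hand.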

As one of the speakers at this Hadoop conference, Jeff Markham has high expectations for it. In his view, more and more companies are starting to focus on data analysis services, which bodes well for explosive growth of Hadoop in the global and Asia-Pacific markets. He hopes the conference will bring attendees the latest Hadoop trends; he himself will share the latest on Hadoop 2.0 and YARN and the future direction and highlights of Hadoop, bringing the Hadoop 2.0 whirlwind to China. Stay tuned!

It is reported that Hadoop China Technology Summit 2013 is the first large-scale industry big data technology summit centered on the Hadoop platform, and will feature comprehensive technology sharing, discussion, and demonstrations around the Hadoop ecosystem. The conference topics will cover seven major areas: Hadoop technology innovation; Hadoop infrastructure deployment and optimization; virtualization and Hadoop; Hadoop applications in the Internet industry; Hadoop applications outside the Internet industry; integration of Hadoop with enterprises' existing IT architecture; and big data startups and investment.
