Dong Xicheng: Hadoop Is Expanding Its Advantages Through Rapid Development and Improvement



On May 20, 2014, CSDN and the ChinaHadoop ("Little Elephant") community will jointly hold "Hadoop Enterprise Applications in Practice", an advanced technical training covering the distributed online storage system HBase, the data warehouse Hive, Hadoop applications at telecom operators, and other topics.

Before the training, I had a brief conversation with one of its lecturers, Dong Xicheng. He said that at the May 20 "Hadoop Enterprise Applications in Practice" session he will share some typical Hadoop application cases, focusing on three systems, HDFS, YARN, and MapReduce, and covering their background, basic architecture and usage, and typical applications.

The following is the original text of the interview:

-What led you to study Hadoop in depth?

I first started studying Hadoop as a graduate student. When I had just joined the lab, my first project was related to Hadoop optimization, and Hadoop has stayed with me ever since. Its openness, clever architectural design, large number of contributors, and rapid updates are what attracted me and what keep me studying it. The vast majority of Internet companies use Hadoop, and it has become a kind of "common identity" or "common language" that lets people with different backgrounds and experience communicate around a shared interest, grow together through these exchanges, and gain a sense of achievement.

-What unique advantages does Hadoop have in solving problems?

Hadoop already covers most Internet applications. Especially since the advent of Hadoop YARN, many systems can work smoothly with Hadoop to accomplish tasks that were previously difficult. Overall, Hadoop now spans data collection, distributed storage, and distributed computing, with distinctive advantages in each area:

Data collection: Hadoop provides distributed collection tools, including Flume and Sqoop, that gather data from distributed, scattered data sources (web services, traditional relational databases, etc.) and import it into centralized storage systems.

Distributed storage: This includes HDFS for unstructured storage and HBase for semi-structured storage, which together meet most offline and online storage requirements. With improvements to HDFS itself (new features such as HDFS Cache and support for heterogeneous storage media) and the emergence of new storage file formats (including ORCFile and Parquet), HDFS will become increasingly powerful.

Distributed computing: In the Hadoop 1.0 era, Hadoop was used mostly for offline batch computing. With the arrival and stabilization of Hadoop 2.0, it has gradually begun to support interactive and real-time computing. In particular, since the advent of Hadoop YARN, multiple types of computing jobs can run in a single cluster, and users can develop their own computing frameworks to suit their needs. In short, Hadoop is developing and maturing rapidly, and it is expanding its advantages.

-What is the biggest difficulty enterprises face in applying Hadoop today?

Different types of enterprises encounter different difficulties. Small and medium-sized Internet companies have a limited number of Hadoop engineers, so they run into greater difficulties when they need to operate and upgrade multiple systems in the Hadoop ecosystem. The main reason is that Hadoop evolves very quickly: new features keep appearing, they often lack documentation, and platform maintainers and developers frequently have to read the source code to understand the implementation details and configuration of these features, which is a challenging and time-consuming task.

For traditional non-Internet companies, the difficulties may include:

Selecting the appropriate Hadoop solution for their type of application.

Migrating existing systems and architectures onto Hadoop.

If commercial software they previously used has been replaced with Hadoop, how to maintain and manage Hadoop.

-Based on your understanding, what is the current state of Hadoop development?

Hadoop is currently developing rapidly. Especially since the arrival of Hadoop 2.0, the HDFS and YARN systems have gained a number of significant features, which in turn have driven the development of the computing systems built on top of them. For example, the emergence of Tez has greatly improved the performance of Hive and Pig, and a variety of new frameworks based on YARN have appeared.

-Please tell us about the topic you will share at this Hadoop training.

At this training, I will mainly share some Hadoop fundamentals and typical application cases, focusing on three systems: HDFS, YARN, and MapReduce. I will introduce the basics of each, including background, basic architecture and usage, and typical application cases. I will also cover their recent developments and trends, which should help attendees understand the direction in which Hadoop technology is heading.

-Who should attend this training, and how will it help them?

The training is aimed at companies preparing to try Hadoop and at Hadoop beginners. It will guide people who want to know what Hadoop is, what it can do, and what success stories exist, and it will also cover Hadoop technology selection, the design characteristics of Hadoop's architecture, and how Hadoop is applied in practice.

Original link: http://www.csdn.net/article/2014-04-28/2819523-Hadoop-ChinaHadoop
