"Now is the best time for enterprises to apply Hadoop ." Jeff Markham, Hortonworks's chief technology officer, said in a speech at the November China Hadoop Technology Summit at the end of 2013. At this summit, Hadoop entered the 2.0 era and became the focus of discussion. Jeff Markham said that Hadoop 2.0 has more powerful and more extensive new features that meet the needs of enterprise users, making up for the shortcomings of Hadoop 1.0 and better meet the needs of enterprise users.
Hadoop 2.0 up close
As Jeff Markham introduced the new features of Hadoop 2.0, this reporter overheard someone whisper: "Look, there are several unfamiliar functional modules in the Hadoop 2.0 framework." Indeed, and YARN is the most important of them. YARN is essentially a resource manager. To some extent, it overturns the central role MapReduce has played in Hadoop data processing, allowing users to work with Hadoop interactively, in a way completely different from batch processing. Hadoop was originally designed for searching and indexing web pages, and MapReduce, which handles the data processing, is good at processing and analyzing unstructured or semi-structured data such as log files. It is not, however, suited to every type of data. As data grows in volume and complexity, users increasingly want to run multiple types of applications on a single cluster. This is the background against which Hadoop 2.0 emerged.
Some people regard YARN as essentially a new operating system for Hadoop, one that breaks through the performance bottleneck of MapReduce. The combination of Hadoop and YARN is better suited to enterprise big data applications. YARN is designed to separate resource management from job scheduling and monitoring. Its architecture combines a global ResourceManager with a number of per-application ApplicationMasters: the ResourceManager is responsible for allocating resources to the various applications, while each ApplicationMaster is responsible for running and monitoring its application's tasks. Jeff Markham said: "Adding the YARN management layer enables Hadoop to better meet the needs of enterprise users for big data platforms. Our company has prepared for Hadoop 2.0's entry into the enterprise in terms of security, management, configuration, and other aspects."
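The division of labor described above can be illustrated with a small sketch. This is a toy in-memory model, not the YARN API: the class names merely borrow YARN's terminology, and the methods (`allocate`, `release`, `run`), the notion of a fixed pool of "containers", and the job names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy global manager: it only hands out capacity and never runs tasks."""
    total_containers: int
    allocated: dict = field(default_factory=dict)  # app name -> containers held

    def free(self) -> int:
        return self.total_containers - sum(self.allocated.values())

    def allocate(self, app: str, requested: int) -> int:
        # Grant as many containers as are free, up to the request.
        granted = min(requested, self.free())
        self.allocated[app] = self.allocated.get(app, 0) + granted
        return granted

    def release(self, app: str) -> None:
        # Return everything the application currently holds.
        self.allocated.pop(app, None)


@dataclass
class ApplicationMaster:
    """Toy per-application master: requests capacity, runs and tracks its own tasks."""
    name: str
    tasks: list
    finished: list = field(default_factory=list)

    def run(self, rm: ResourceManager) -> list:
        while self.tasks:
            granted = rm.allocate(self.name, len(self.tasks))
            if granted == 0:
                break  # no capacity available right now
            batch, self.tasks = self.tasks[:granted], self.tasks[granted:]
            # "Running" a task is simulated by marking it done.
            self.finished.extend(f"{task}:done" for task in batch)
            rm.release(self.name)
        return self.finished


# Two different kinds of application share one (hypothetical) 4-container cluster.
rm = ResourceManager(total_containers=4)
batch_job = ApplicationMaster("batch-index", ["t1", "t2", "t3", "t4", "t5", "t6"])
query = ApplicationMaster("interactive-query", ["q1", "q2"])

print(batch_job.run(rm))
print(query.run(rm))
print(rm.free())
```

In this sketch the resource manager knows nothing about what a task is, which is why a batch job and an interactive query can share the same pool. Real YARN adds scheduling policies, per-node agents, and failure handling that are left out here.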
Hadoop 2.0 is no longer just an idea but a real solution. Xinghuan Information Technology (Shanghai) Co., Ltd. (hereinafter Xinghuan Technology), a Chinese big data company, announced at the summit the official launch of Transwarp Data Hub, a big data platform that integrates Spark and Hadoop 2.0. "What enterprise users commonly want is to process more data, more efficiently and with lower latency," said Sun Yuanhao, co-founder and CTO of Xinghuan Technology. "In the past, people used different processing technologies to deal with data of different orders of magnitude, for example in-memory technology, indexing technology, and various performance optimization technologies. One of the most prominent advantages of Transwarp Data Hub is that it can process data from the GB level to the PB level on a single platform."
Precisely because of these capabilities, Transwarp Data Hub has a wide range of applications, including offline analysis, statistics and mining, online storage, and high-speed in-memory online analysis. Transwarp Data Hub combines data integration/ETL, big data storage and online service systems, an efficient in-memory computing engine, high-performance SQL, statistical analysis, and machine learning, achieving performance breakthroughs. In Sun Yuanhao's words, Transwarp Data Hub is "lightning" fast: 10 to 100 times faster than open-source Hadoop 2.0. In addition, Transwarp Data Hub has powerful analysis capabilities and is fully compatible with the Hadoop ecosystem.
With Transwarp Data Hub at the core, Xinghuan Technology also cooperates with many big data vendors, including Revolution R, Informatica, and Tableau, integrating these vendors' data processing and analysis tools to form a complete big data platform.
Lower application threshold
Because Hadoop is complex and enterprises lack staff with the relevant big data skills, it is not easy for Hadoop to spread quickly among enterprise users. Many IT vendors have therefore extended an "olive branch" to Hadoop. Some provide Hadoop-based hardware solutions, and others have released commercial distributions of the Hadoop software, all with a single purpose: to lower the threshold for Hadoop applications.
At this summit, many well-known IT vendors, including Intel, VMware, and Huawei, along with many telecom operators and Internet companies, turned out to support the promotion of Hadoop in China. He Jingxiang, general manager of Intel Asia Pacific R&D Co., Ltd., said that in addition to its commercial Hadoop distribution, Intel provides comprehensive support for Hadoop through its hardware (including processors and solid-state drives) in areas such as security, management, and optimization, so that Hadoop can better meet the needs of enterprise users.
This article is from the "Guo Tao's storage world" blog. Please be sure to retain this source: http://gtstorageworld.blog.51cto.com/908359/1339080