Components and Key Steps of the Hadoop Reference Design (Part 1)

Source: Internet
Author: User

The material on the components and key steps of the Hadoop reference design is extensive, so it is split into several parts that introduce each topic in detail.

Software

Operating system: Hadoop runs on any operating system that provides a Java runtime environment. In practice, most customers choose the 64-bit version of a Linux distribution. For this reference design we chose CentOS 6.3 x64, a free enterprise-class Linux distribution.
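The requirements above can be checked before installation with a short preflight script. The following is a minimal sketch, not part of the reference design itself: the function name check_host is hypothetical, and the Linux and x86-64 checks reflect the choice made in this design rather than a requirement of Hadoop, which only needs a Java environment.

```python
import platform
import shutil
from typing import List, Optional


def check_host(machine: str, system: str, java_path: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the host matches the design."""
    problems = []
    # The reference design uses 64-bit Linux; Hadoop itself only requires Java.
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if machine not in ("x86_64", "amd64"):
        problems.append(f"expected a 64-bit x86 platform, found {machine}")
    if java_path is None:
        problems.append("no 'java' executable found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_host(platform.machine(), platform.system(), shutil.which("java"))
    print("host looks suitable" if not issues else "\n".join(issues))
```

Passing the detected values in as arguments keeps the check easy to test on a machine that is not itself a cluster node.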

Hadoop system: Hadoop is open-source software released under the Apache License, and customers can choose between the free open-source version and commercially supported editions. The free open-source version still contains a large number of software bugs, so it takes a certain amount of development effort to verify and improve it before use. For big data applications in industry, a more mature and reliable commercially supported version is generally recommended. Intel provides its own release, the Intel distribution of Apache Hadoop*, which includes a number of performance optimizations and improvements aimed at industry application needs, so we use Intel's Hadoop* release in this reference design.

Network

Hadoop can work with different Ethernet technologies. The Intel distribution of Apache Hadoop* also adds support for InfiniBand.

Gigabit and 10 Gigabit Ethernet are the most common choices in industry big data applications, and our reference design uses them as well.
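To get a feel for what the choice of link speed means, the arithmetic below compares 1 Gb/s and 10 Gb/s links, the two port speeds present on the servers in this design. It is a rough sketch under stated assumptions: the 100 GB data size is an arbitrary example, and protocol overhead, disk throughput, and contention are ignored, so real transfers will be slower.

```python
def transfer_seconds(data_gigabytes: float, link_gbps: float) -> float:
    """Ideal time to move data over one link, ignoring overhead and disk limits."""
    bits = data_gigabytes * 8e9       # decimal gigabytes -> bits
    return bits / (link_gbps * 1e9)   # link speed in bits per second


# Example data size of 100 GB is an assumption chosen for illustration.
for speed in (1, 10):
    print(f"{speed:>2} Gb/s: {transfer_seconds(100, speed):.0f} s for 100 GB")
```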

InfiniBand is used when a Hadoop deployment has special low-latency requirements for data storage or processing.

Key steps in reference design implementation

Hardware Device Deployment

Cabinet deployment

As mentioned earlier, a Hadoop solution is typically deployed in cabinets. A cabinet usually contains 1 to 2 switches, multiple servers, and the corresponding power distribution units (PDUs).

In our simulated installation environment we use only 4 servers. The installation process is essentially the same as for a larger deployment (except that the data node installation has to be repeated on more nodes); only the network design differs slightly, because more servers have to be connected.
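Because a larger deployment mainly repeats the data node installation, the node plan can be generated rather than written by hand. The sketch below illustrates this. The role split (one master, the rest data nodes), the hostnames, and the starting address 192.168.1.10 are all assumptions made for the example; the source does not state them.

```python
import ipaddress
from typing import List, Tuple


def plan_hosts(server_count: int, first_ip: str = "192.168.1.10") -> List[Tuple[str, str, str]]:
    """Assign an address, hostname and role to each server.

    Assumption: the first server is the master and all others are data nodes.
    """
    if server_count < 2:
        raise ValueError("need at least one master and one data node")
    start = ipaddress.ip_address(first_ip)
    plan = [(str(start), "master", "master")]
    for i in range(1, server_count):
        # Each additional server is one more data node with the next address.
        plan.append((str(start + i), f"datanode{i}", "datanode"))
    return plan


# The 4-server environment; a larger cluster only changes the count.
for ip, name, role in plan_hosts(4):
    print(f"{ip}\t{name}\t# {role}")
```

Calling plan_hosts with a larger count yields the same layout with more data node entries, which mirrors how the installation scales.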

Network Connections

In the experimental system, each server (an Intel host) has 6 Ethernet ports, numbered eth0 to eth5 in order as seen from the rear. eth0 to eth3 are 1G ports, and eth4 and eth5 are 10G ports. In the pre-configured setup, all servers are connected to the same local area network, and any network port other than the management port can be chosen as needed.
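The port layout described above can be written down as a small table and filtered by required speed. This is an illustrative sketch only. The text does not say which port is the management port, so treating eth0 as the management port is an assumption, and the helper name candidate_ports is hypothetical.

```python
from typing import List

# Port speeds in Gb/s, as described for the test servers.
PORT_SPEED_GBPS = {"eth0": 1, "eth1": 1, "eth2": 1, "eth3": 1, "eth4": 10, "eth5": 10}


def candidate_ports(min_gbps: int, management_port: str) -> List[str]:
    """List ports that meet a minimum speed, skipping the management port."""
    return sorted(
        name for name, speed in PORT_SPEED_GBPS.items()
        if speed >= min_gbps and name != management_port
    )


# Assumption for illustration only: eth0 is the management port.
print(candidate_ports(10, management_port="eth0"))
print(candidate_ports(1, management_port="eth0"))
```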

Screen diagram:

Network environment topology:

Software deployment

The following figure depicts the components of the Intel distribution of Apache Hadoop*:

Distributed File System (HDFS)

Distributed Database (HBase)

Distributed Data Warehouse (Hive)

Distributed Data Analysis (Pig)

Parallel Computing Framework (MapReduce)

Distributed Synchronization Software (ZooKeeper)

Data Mining (Mahout)

Structured Data Connectors (Sqoop)

Log Data Connectors (Flume)
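Of the components listed above, the parallel computing framework is the one whose programming model is easiest to show in a few lines. The sketch below imitates the map, shuffle, and reduce stages of a word count in plain Python. It is an illustration of the model only: it runs in a single process and does not use any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, the step that sits between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map over every line, group by word, then reduce each group."""
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster the map and reduce calls run on different nodes and the input comes from the distributed file system; here everything is kept in memory to show only the data flow.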
