Build an HBase environment on AWS EMR

Source: Internet
Author: User
Tags: aws, emr

0. Overview

AWS's EMR service provides a managed Hadoop framework that lets you quickly and cost-effectively distribute and process large amounts of data across multiple dynamically scalable Amazon EC2 instances. You can also run other common distributed frameworks on Amazon EMR (such as Spark and Presto) and work with other AWS data storage services such as Amazon S3. EMR handles big data use cases including log analytics, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics.

With the EMR service, we no longer have to install the JDK, Hadoop, and related software by hand. Anyone who has built a Hadoop cluster knows how tedious that is: there are many configuration settings, and the problems you run into can even differ from machine to machine. With EMR all of this becomes simple enough that you can stay focused on your own development. That is easy to say, but I still ran into a lot of problems while using EMR. For well-known reasons, AWS is inconvenient to use in some countries, EMR especially, so the material I could find about EMR did not help me much. There is plenty of official documentation, but it is unclear on some issues, and I only got everything working after talking to AWS technical support.

1. Create a key pair

The key pair is used to SSH into the remote host.
1 Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

2 Find Key Pairs in the navigation pane on the left.

3 Create a key pair.

Clicking Create opens a dialog box that lets you save the private key. Save it; you will need it later for SSH login.
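
The console steps above can also be scripted. What follows is only a minimal sketch using boto3 (the AWS SDK for Python), which the original walkthrough does not use. It assumes boto3 is installed and AWS credentials are already configured; the function names, key name, and file path are hypothetical.

```python
import os


def save_private_key(path, key_material):
    """Write PEM text to `path` with owner-only permissions."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)
    return path


def create_key_pair(key_name, pem_path):
    """Create an EC2 key pair and store its private key locally."""
    import boto3  # third-party; imported lazily so the helper above works without it

    ec2 = boto3.client("ec2")
    response = ec2.create_key_pair(KeyName=key_name)
    # AWS returns the private key only once, in the "KeyMaterial" field.
    return save_private_key(pem_path, response["KeyMaterial"])
```

A call such as create_key_pair("emr-hbase-key", "emr-hbase-key.pem") would then leave a .pem file on disk, playing the same role as the file saved from the console dialog.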

2. Create an S3 bucket

If you skip this step, a bucket is created by default later.
1 Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

2 Create a bucket.
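
As a scripted alternative to the console, the bucket could be created roughly as follows. This is a sketch under assumptions: boto3 is installed with credentials configured, and the bucket name and region are placeholders of your choosing. The helper builds the request separately because the us-east-1 region must be requested without a LocationConstraint.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for S3 create_bucket."""
    request = {"Bucket": bucket_name}
    # us-east-1 is the S3 default region and rejects an explicit constraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def create_bucket(bucket_name, region):
    """Create the bucket in the given region (performs a real AWS call)."""
    import boto3  # third-party; imported lazily

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))
```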

3. Create an EMR-hosted Hadoop cluster

1 Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

2 Create a cluster.

Enter the cluster name and set the S3 storage path, which is the S3 bucket created in step 2. If you did not create one, a path is assigned automatically.

Select HBase.

Set the number of instances in the cluster, or keep the default.

Select the key pair created in step 1, then click Create to start the cluster. The cluster has booted successfully once its status is shown as Waiting.

The Hadoop cluster is now created. Click the cluster you just created in the cluster list to open the cluster details.
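
The same choices (name, S3 log path, HBase, instance count, key pair) map onto a single EMR API request. The sketch below uses boto3 and is not taken from the original article. The release label, instance type, and the two default role names are assumptions; the roles must already exist in your account, and every value should be adjusted to your setup.

```python
def hbase_cluster_request(name, log_uri, key_name, instance_count=3):
    """Build run_job_flow arguments for a small HBase cluster."""
    return {
        "Name": name,
        "LogUri": log_uri,                      # e.g. a path in the step 2 bucket
        "ReleaseLabel": "emr-5.36.0",           # assumed release, pick your own
        "Applications": [{"Name": "HBase"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,             # key pair from step 1
            # Keep the cluster alive so it settles in the Waiting state.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumed default roles
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request):
    """Start the cluster, wait until it is up, return (id, master DNS)."""
    import boto3  # third-party; imported lazily

    emr = boto3.client("emr")
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    # This waiter returns once the cluster reaches RUNNING or WAITING.
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
    return cluster_id, cluster["MasterPublicDnsName"]
```

The master public DNS name returned here is the value used for logging in and for client connections later on.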

4. Log in to the host

According to the official documentation, you can log in directly using the master node's public DNS name and the key pair, but it does not mention that you need to configure the security group, so I was stuck at this step for a long time.
1 Modify the security group rules
For security, the default security group blocks SSH port 22 and also blocks ping. In short, any port you need must be opened explicitly. Because this was only a test, I opened all ports for convenience. The steps are as follows:
In the cluster details, locate the security group for the master node and click the link below it to open the security group settings.

Click Inbound. Here all ports are open; you can also restrict access to specific IP addresses. Click Edit and add your security rules.

Opening ICMP lets you ping the host. Opening SSH port 22 allows SSH login; the port being closed is why logging in directly to a freshly created host fails. Because my program connects to HBase through the Thrift server, I also opened the TCP ports. For security, it is recommended to open only the ports you actually need.
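
To follow the recommendation of opening only what is needed, the inbound rules can be limited to SSH, ping, and the Thrift port for a single client address. The sketch below is an illustration, not the author's setup: it assumes boto3, a security group ID copied from the cluster details, and port 9090 for Thrift (the usual HBase Thrift default, which the original text does not state).

```python
import ipaddress


def ingress_rules(client_cidr, thrift_port=9090):
    """Build IpPermissions for SSH, ping, and HBase Thrift from one CIDR."""
    ipaddress.ip_network(client_cidr)  # raises ValueError on a malformed CIDR
    source = [{"CidrIp": client_cidr}]
    return [
        # SSH login
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": source},
        # Thrift server; 9090 is an assumed port
        {"IpProtocol": "tcp", "FromPort": thrift_port, "ToPort": thrift_port,
         "IpRanges": source},
        # ICMP for ping; -1 means all ICMP types and codes
        {"IpProtocol": "icmp", "FromPort": -1, "ToPort": -1, "IpRanges": source},
    ]


def open_ports(group_id, client_cidr):
    """Apply the rules to the master security group (real AWS call)."""
    import boto3  # third-party; imported lazily

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=ingress_rules(client_cidr)
    )
```

Passing your own address as a /32 CIDR, for example "203.0.113.7/32", keeps the ports closed to everyone else.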

2 Log in
PuTTY does not natively support the private key format (.pem) generated by Amazon EC2. PuTTY ships with a tool called PuTTYgen that converts keys to the required PuTTY format (.ppk). You must convert your private key to .ppk before you try to connect to your instance with PuTTY.
Convert your private key
- Start PuTTYgen (for example, on the Start menu, click All Programs > PuTTY > PuTTYgen).
- Under Type of key to generate, select SSH-2 RSA.
- Click Load. By default, PuTTYgen displays only files with the .ppk extension. To find your .pem file, select the option to display files of all types.
- Select the .pem file for the key pair that you specified when you launched the instance, and then click Open. Click OK to close the confirmation dialog box.
- Click Save private key to save the key in a format PuTTY can use. PuTTYgen displays a warning about saving the key without a passphrase. Click Yes.

Then log in with the generated .ppk key.

Successful login:

Start the HBase shell to work with HBase.
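
On systems with OpenSSH, the .pem file can be used directly and no PuTTY conversion is needed. The following sketch only assembles the command; the key path and DNS name are placeholders. The login user on EMR nodes is hadoop.

```python
import shlex


def ssh_command(pem_path, master_dns, user="hadoop"):
    """Return the argument list for an SSH login to the EMR master node."""
    return ["ssh", "-i", pem_path, f"{user}@{master_dns}"]


def as_shell_text(command):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)

# To actually connect, pass the list to subprocess.run(...).
# Once logged in, typing `hbase shell` on the master node starts the HBase shell.
```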

EMR starts the Thrift server by default, so you do not need to start it manually, and your program can reach the host directly.
When connecting from your program, use the host's public DNS name.
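
A client connection through Thrift might look like the sketch below. It uses happybase, a third-party Python client that the original article does not mention, and it assumes port 9090 is open in the security group and that the table with its column family already exists. The table name, row key, and column are hypothetical.

```python
def connect(master_dns, port=9090):
    """Open a Thrift connection to HBase on the EMR master node."""
    import happybase  # third-party; imported lazily

    return happybase.Connection(master_dns, port=port)


def write_and_read(connection, table_name, row_key, data):
    """Store one row, then read it back through the same connection."""
    table = connection.table(table_name)
    table.put(row_key, data)   # data maps b"family:qualifier" to bytes values
    return table.row(row_key)
```

For example, write_and_read(connect(master_dns), "test", b"row1", {b"cf:greeting": b"hello"}) would write a single cell and return the stored row, where master_dns is the public DNS name from the cluster details.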
