As in the previous articles in this series, I am using CDH4 to install and develop Hadoop. When I tried to write a Hadoop program in Eclipse, I found that the Hadoop packages from Cloudera do not ship with the eclipse-plugin. Programmers who have used the plain Apache version of Hadoop know that the Hadoop download on the Apache official website includes this Eclipse plug-in. So I spent a while searching Google and Baidu for a way to c…
I. What is CDH?
CDH is Cloudera's 100% open source Hadoop distribution, built specifically to meet enterprise demands: an open-source distributed storage system.
II. What software and functions does CDH4 contain?
First, HBase, Hadoop, and ZooKeeper are essential, followed by H…
Preface
After a period of Hadoop deployment and management, I am writing this series of blog posts as a record.
To avoid repeating the deployment by hand, I have written the deployment steps as scripts. You only need to execute the scripts as described in this article, and the entire environment is basically deployed. I have put the deployment scripts in the Open Source China git repository (http://git.oschina.net/snake1361222/hadoop_scripts).
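If you want to follow along, a minimal way to fetch the scripts looks like this (the script names inside the repository are not listed in this article, so check them after cloning):
$ git clone http://git.oschina.net/snake1361222/hadoop_scripts
$ cd hadoop_scripts && ls    # review each script before executing it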
All of the deployment in this article is based on 64-bit machines (we recommend forgetting about 32-bit machines entirely). There are also requirements on which Linux versions are supported.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/PDF/Installing-and-Using-Impala.pdf
Install CDH4
http://archive.cloudera.com/cdh4/cdh/4/
Both CDH and Hive can be found here.
Three machines are used. The master runs the namenode, secondarynamenode, ResourceManager, and Impala-s…
Cloudera cdh4 has three installation methods:
1. Automatic installation through Cloudera Manager (only 64-bit Linux operating systems are supported);
2. Use the yum command to manually install the package;
3. Manually install the tarball package;
I personally recommend that you try method 1 or 2. You should first have a clear understanding of the Hadoop architecture, its built-in components, and its configuration. For the specific installation steps, refer to…
…The ApplicationMaster is responsible for managing the application lifecycle, requesting resources from the ResourceManager, and monitoring task status (for example, restarting failed tasks).
Therefore, each datanode runs a NodeManager and a MapReduce…
5. ZooKeeper planning notes:
Considering that a ZooKeeper cluster does not require many resources, it is generally fine to deploy the ZK nodes on the same machines as other services. ZooKeeper must have at least…
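As an illustration only (the hostnames are placeholders, not taken from this article), a three-node ensemble is usually declared in every node's zoo.cfg like this:
server.1=zk-host1:2888:3888
server.2=zk-host2:2888:3888
server.3=zk-host3:2888:3888
# each node also needs a myid file in its dataDir containing its own id, e.g.:
$ echo 1 > /var/lib/zookeeper/myid    # on zk-host1; use 2 and 3 on the other hosts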
…log in without being prompted for a password, which proves that passwordless SSH is configured correctly.)
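For reference, passwordless SSH is typically set up roughly as follows (the user and hostnames are placeholders):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair without a passphrase
$ ssh-copy-id hadoop@slave1                   # repeat for every node, including the master itself
$ ssh hadoop@slave1                           # should log in without asking for a password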
4. Configure Hadoop
Important files:
hadoop-env.sh: sets Hadoop environment variables
core-site.xml: core configuration file
mapred-site.xml: MapReduce configuration file
hdfs-site.xml: HDFS configuration file
log4j.properties: logging configuration
1. Go to the configuration file directory of hadoop and configure it.
$ cd /usr/local/hadoop2.0-cdh/etc/hadoop/
2. Configure hadoop Java environment variables
$ vim hadoop-env.sh    # modify the following:
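The excerpt is cut off here; the usual change at this point is pointing Hadoop at your JDK, for example (the JDK path below is an assumption, adjust it to your system):
# in hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HEAPSIZE=1024    # optional: JVM heap size in MB for the Hadoop daemons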
Linux handle restrictions
Linux --- Process Handle Restrictions Summary
!! This article has only been tested on RHEL 6.4.
Linux handle limits can be divided into (1) system-level limits and (2) user-level limits:
/proc/sys/fs/nr_open > /proc/sys/fs/file-max >= ulimit -Hn >= ulimit -Sn
1. System-level limits
1.1 /proc/sys/fs/nr_open
The maximum number of file handles supported by the system. The default value is 1048576 (1M); the maximum is limited by system memory. This is the upper bound for all of the other limits.
1.2 /proc/sys/fs/file-max
The maximum number of file handles…
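To make the adjustment concrete, a typical sequence for checking and raising these limits is sketched below (the numbers are examples only, not recommendations for any particular workload):
$ cat /proc/sys/fs/file-max    # current system-wide limit
$ ulimit -Hn; ulimit -Sn       # current hard and soft per-user limits
$ echo 'fs.file-max = 1000000' >> /etc/sysctl.conf && sysctl -p
# per-user limits go in /etc/security/limits.conf, e.g. for a hadoop user:
#   hadoop soft nofile 65536
#   hadoop hard nofile 65536
# log in again for the limits.conf change to take effect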
References:
Linux --- Process Handle limits Summary (http://blog.csdn.net/jhcsdb/article/details/32338953)
EXITED_WITH_FAILURE
2014-03-31 19:50:50,496 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType: CLEANUP_CONTAINER
2014-03-31 19:50:50,496 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1396266549856_0001_01_000001
This was not a waste of time, because I eventually found that CDH already provides a ready-made c…
Cloudera Manager then starts CDH and its services on all hosts.
Use the Cloudera Manager Setup Wizard to assign roles to the hosts and configure the cluster; much of the configuration is automatic. You can also use the Cloudera Manager API to mana…
1. Install CDH4 (following the official website)
Step 1a: Optionally add a repository key
$ rpm --import http://archive.cloudera.com/cdh4/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
Step 2: Install CDH4 with MRv1
$ yum -y install hadoop-0.20-mapreduce-jobtracker
Step 3: Install CDH4 with YARN
$ yum -y inst…
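The excerpt above is truncated; as a rough sketch, the per-role YARN installs usually look something like the following (the package names are taken from memory of the CDH4 repository and should be verified there):
$ yum -y install hadoop-yarn-resourcemanager hadoop-hdfs-namenode              # on the master
$ yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce  # on each datanode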
Author: Dong | Sina Weibo: Xicheng Dong | This article may be reprinted, but the original source, author information, and copyright statement must be indicated in the form of a hyperlink. Website: dongxicheng.org/mapreduce/cdh4-jobtracker-ha
Everyone knows that the Hadoop JobTracker is a single point of failure, and for a long time there has been no perfect open source solution. In Hadoop…
…performance is very poor; alternatively, you can write MapReduce programs for query analysis, but that inherits the latency of MapReduce. By integrating Impala with HBase, we obtain the following benefits:
We can use familiar SQL statements; as with a traditional relational database, it is easy to express complex queries and statistical analysis in SQL.
Impala queries and statistical analysis are much faster than native MapReduce or Hive.
To integrate Impala with HBase, you need to map…
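The sentence above is cut off; the mapping is normally done by creating an HBase-backed external table in the Hive metastore, which Impala can then query. A minimal sketch (the table, column family, and column names are made up for illustration):
$ hive -e "CREATE EXTERNAL TABLE hbase_demo (rowkey STRING, val STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val')
  TBLPROPERTIES ('hbase.table.name' = 'demo');"
$ impala-shell -q "INVALIDATE METADATA; SELECT COUNT(*) FROM hbase_demo;"    # older Impala releases used REFRESH instead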
We used distcp on the CDH4 Hadoop cluster to copy data from the CDH5 Hadoop cluster to CDH4; the command is as follows:
hadoop distcp -update -skipcrccheck hftp://cdh5:50070/xxxx hdfs://cdh4/xxx
When a file is very large, the following error occurs:
2017-12-15 10:47:24,506 INFO execute.bulkloadhbase - Caused by: java.io.IOException: Got EOF but currentPos…