By following those developerWorks articles you can get Hadoop configured, so I won't repeat the steps here. What follows are some notes on problems I ran into during my own configuration, for reference.
$ bin/hadoop jar hadoop-0.18.0-examples.jar wordcount test-in test-out
# After the job finishes, view the execution results:
$ cd test-out
$ cat part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
-------------------------20080822
Pseudo-distributed operation mode
This mode also runs on a single machine, but it uses separate Java processes to simulate the various node types of a distributed deployment (NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode). Note the distinction between these node types in a distributed deployment:
From the perspective of distributed storage, the nodes in a cluster consist of one NameNode and several DataNodes, with a Secondary NameNode serving as a backup of the NameNode. From the perspective of distributed computation, the nodes consist of one JobTracker and several TaskTrackers: the JobTracker is responsible for task scheduling, and the TaskTrackers execute the tasks in parallel. A TaskTracker must run on a DataNode so that it can compute on local data, while the JobTracker and the NameNode do not need to be on the same machine.
(1) Modify conf/hadoop-site.xml as shown in Code Listing 2. Note that conf/hadoop-default.xml holds Hadoop's default parameters; you can read this file to see which parameters Hadoop offers, but do not modify it. Instead, change default values by editing conf/hadoop-site.xml: parameter values set in this file override the parameters of the same name in conf/hadoop-default.xml.
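Code Listing 2 itself is not reproduced here, so for reference only: a pseudo-distributed conf/hadoop-site.xml in the Hadoop 0.18 era typically sets just three properties (fs.default.name, mapred.job.tracker and dfs.replication are the standard quick-start parameter names; the host names and port numbers below are merely common example values):

<configuration>
  <property>
    <!-- URI of the HDFS NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <!-- host:port the JobTracker listens on -->
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <!-- a single DataNode, so keep only one replica of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

After editing the configuration, the usual quick-start steps are to format a new HDFS file system and start the daemons:

$ bin/hadoop namenode -format   # initialize a new distributed file system
$ bin/start-all.sh              # start NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker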
$ bin/hadoop dfs -put ./test-in input
# Copy the ./test-in directory from the local file system to the HDFS root directory, renaming it to input.
# Run bin/hadoop dfs -help to learn how to use the various HDFS commands.
$ bin/hadoop jar hadoop-0.18.0-examples.jar wordcount input output
# View the execution results:
# Copy the files from HDFS to the local file system and view them there:
$ bin/hadoop dfs -get output output
$ cat output/*
# You can also view the files directly in HDFS:
$ bin/hadoop dfs -cat output/*
$ bin/stop-all.sh # Stop the Hadoop processes
Fault Diagnosis
(1) After $ bin/start-all.sh starts Hadoop, 5 Java processes are running and five PID files are created in the /tmp directory to record their process IDs. From these five files you can find out which Java process corresponds to the NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker. When Hadoop does not seem to be working properly, first check whether these 5 Java processes are running correctly.
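As a quick check (assuming a JDK is on the PATH), the jps tool lists the running Java processes by class name, and the PID files can be read directly; the file name pattern below assumes the default /tmp location and hadoop-<user>-<daemon>.pid naming, so adjust it to your installation:

$ jps
# NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker should all appear
$ ls /tmp/*.pid
$ cat /tmp/hadoop-*-namenode.pid   # process ID recorded for the NameNode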
(2) Use the web interfaces. http://localhost:50030 shows the running state of the JobTracker, http://localhost:50060 shows the running state of the TaskTracker, and http://localhost:50070 shows the status of the NameNode and of the entire distributed file system; it also lets you browse files in the distributed file system and view logs.
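If you are working on a machine without a browser, the same status pages can be fetched from the command line (curl is assumed to be available; wget works just as well):

$ curl -s http://localhost:50030/ | head   # JobTracker status page
$ curl -s http://localhost:50070/ | head   # NameNode / DFS status page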
(3) Check the log files in the ${HADOOP_HOME}/logs directory. The NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker each have their own log file, and every computation task that is run also produces corresponding application log files. Analyzing these log files helps locate the cause of a failure.
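For example, to glance at the most recent NameNode log entries (the hadoop-<user>-namenode-<host>.log file name pattern is the usual default; adjust it to your installation):

$ ls ${HADOOP_HOME}/logs/
$ tail -n 50 ${HADOOP_HOME}/logs/hadoop-*-namenode-*.log   # last 50 lines of the NameNode log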