I recently tried to set up a Hadoop environment, and honestly I had no idea how to go about it; almost every step produced an error. A lot of the answers people give online are about common pitfalls (the most typical being command case sensitivity: the hadoop command is lower case, yet many people type Hadoop, so when you hit an error such as "Command not found", check the case first). In short, the process was painful, but fortunately the environment finally got set up.
I don't want to repeat the setup steps here; plenty of articles online already cover them (a few are listed below), so I will only record the problems I ran into along the way.
I have almost no experience building things on Linux and am not familiar with Linux in general, so there are bound to be mistakes here; I will correct them later.
- Build a Hadoop environment on Ubuntu 13.04
- Cluster configuration for Ubuntu 12.10 + Hadoop 1.2.1
- Build a Hadoop environment on Ubuntu (standalone mode + pseudo Distribution Mode)
- Configuration of Hadoop environment in Ubuntu
- Detailed tutorial on creating a Hadoop environment for standalone Edition
- Build a Hadoop environment (using virtual machines to build two Ubuntu systems in a Windows environment)
Environment: RedHat 6 (Enterprise Edition)
Hadoop version: 1.0.4
Eclipse: 3.4
Mode: Hadoop has three modes: local (standalone) mode, pseudo-distributed mode, and fully distributed mode. Since this is only for learning (and my setup doesn't allow a fully distributed cluster), I only built the pseudo-distributed environment; a sketch of its configuration follows.
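For concreteness, here is a minimal sketch of the pseudo-distributed configuration as it is usually written for Hadoop 1.x. The port choices (9000 for HDFS, 9001 for the JobTracker), the replication factor of 1, and the $HADOOP_HOME variable are conventional assumptions on my part, not something fixed; adjust them to your own installation.
cd $HADOOP_HOME/conf
# core-site.xml: the HDFS address (port 9000 is just the conventional choice)
cat > core-site.xml <<'EOF'
<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
</configuration>
EOF
# hdfs-site.xml: a single node, so one replica is enough
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
EOF
# mapred-site.xml: the JobTracker address (port 9001 is the usual choice)
cat > mapred-site.xml <<'EOF'
<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
</configuration>
EOF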
For now I will just list the problems and fill in the details below.
1: Hadoop version selection
2: SSH password-free login
3: Mutual ping between Windows and Linux
4: Hadoop service startup (start-all.sh)
5: Eclipse connection to Hadoop (firewall)
Noting these here first. I badly need more Linux knowledge ~~~
Question 1: Version Selection Problems
At first I used Hadoop 0.22.0, but later found that some jar packages were missing (I don't know why), so I went to the Apache official website and switched to version 1.0.4, which the site describes as a stable release.
Question 2: SSH password-free Login
I still have not solved this problem completely. It worked when I tried it on a colleague's machine, but my own machine still asks for a password. Based on the information I found online, the general steps are as follows:
[root@localhost hadoop]# ssh-keygen -t rsa
Press Enter at every prompt so that the key pair is saved to ~/.ssh/id_rsa with the default options. The output looks like this:
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
74:79:98:eb:fa:e0:53:aa:e3:1b:e4:a4:16:7a:6b:31 root@localhost
Run the following command:
[root@localhost .ssh]# cp id_rsa.pub authorized_keys (run this inside ~/.ssh)
Then execute ssh localhost; you should now be able to connect over ssh without being asked for a password.
If a password is still required, it is usually a permissions issue. Set the permissions as follows:
chmod 700 ~/.ssh (adjust the path to wherever your .ssh directory actually is)
chmod 600 ~/.ssh/authorized_keys
In general, the steps above should give you password-free ssh login on the local machine (or maybe not, I honestly don't know, because mine still asks for a password, which is depressing; a few extra checks I found are sketched below).
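For what it's worth, when ssh still asks for a password after the steps above, the usual suspects on RedHat are the permissions of the home directory itself and the SELinux context of ~/.ssh. A small troubleshooting sketch (generic checks, not something I have confirmed fixes my own machine):
chmod go-w ~                      # the home directory must not be group- or world-writable
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
restorecon -R ~/.ssh              # restore the SELinux context on RedHat, if SELinux is enforcing
ssh -v localhost                  # the verbose output shows why the key is being rejected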
My reason for writing about ssh is not the password-free setup itself, but to point out that this step is not strictly necessary in standalone or pseudo-distributed mode: there are so few daemons that even if a password is requested during connection, you only have to type it a few times (in pseudo-distributed mode, about three times). Many people say password-free login must be configured, and I personally find that a bit misleading; at least it misled me, and I spent about a day on the ssh setup. If you are only learning Hadoop and get stuck on this step, leave it alone for now and see whether things work anyway. (Of course, it is best if you do manage to set up password-free ssh.)
Question 3: ping between Windows and Linux
If, like me, you are used to developing on Windows and want to write programs in Eclipse on Windows, this step is necessary: at the very least, Windows must be able to ping the Linux machine. I don't dare go into much detail here for fear of misleading anyone; the steps below were found on the Internet and work for the time being, but I can't guarantee they are the right way.
Please refer to this link:
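To make it a bit more concrete, here is a rough sketch of the checks I would do, assuming a RedHat 6 style system where the firewall runs as the iptables service; the IP address is only a placeholder:
ifconfig                          # on Linux: find the machine's IP address
# on Windows (cmd): ping 192.168.1.100   (replace with the IP shown by ifconfig)
# if ping fails, the Linux firewall may be dropping packets; for a pure learning
# setup the blunt fix is to stop it (not something to do on a real network):
service iptables status
service iptables stop
chkconfig iptables off            # keep it off across reboots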
Question 4: Hadoop service startup (start-all.sh)
Since Hadoop 1.0, the start-all.sh and stop-all.sh commands are no longer recommended; instead, use start-dfs.sh / start-mapred.sh and stop-dfs.sh / stop-mapred.sh to start and stop the services, for example:
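On a pseudo-distributed setup the sequence looks roughly like this (assuming the Hadoop bin directory is on the PATH):
start-dfs.sh       # starts the NameNode, DataNode and SecondaryNameNode
start-mapred.sh    # starts the JobTracker and TaskTracker
jps                # should list the five daemons above, plus Jps itself
stop-mapred.sh     # and the matching shutdown, in reverse order
stop-dfs.sh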
Tip: you need to format the namenode before Hadoop is started for the first time. (Do this only when the environment is installed for the first time. If you format it again later, the datanode's namespace will no longer be consistent with the namenode's; the fix is simple: manually change either one so that they match. You can find the details online, and a rough sketch is given below.)
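A sketch of the first-time format, plus where the namespace IDs live if you ever format again by mistake. The /tmp paths are only the Hadoop 1.x defaults (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir); they will differ if you configured those directories elsewhere:
hadoop namenode -format                            # ONLY the very first time
# if a later re-format leads to "Incompatible namespaceIDs" in the datanode log,
# make the two VERSION files agree (default locations shown):
cat /tmp/hadoop-$USER/dfs/name/current/VERSION     # the namenode's namespaceID
vi  /tmp/hadoop-$USER/dfs/data/current/VERSION     # edit the datanode's to match it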
Question 5: Eclipse connection to Hadoop (firewall)
When connecting from Eclipse (in Windows) to Hadoop on Linux, I ran into two problems that prevented the connection. One was caused by the Linux firewall settings; the firewall change described in Question 3 solves it, but that is obviously not a great solution, so if there is a better way to handle it, use that instead. One option is sketched below.
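If you don't want to turn the firewall off entirely, a gentler option is to open only the ports that Eclipse actually talks to. This is a sketch under the assumption that your conf files use the conventional Hadoop 1.x ports, 9000 for fs.default.name and 9001 for mapred.job.tracker, plus 50010 for DataNode data transfer; use whatever ports your own configuration specifies:
iptables -I INPUT -p tcp --dport 9000  -j ACCEPT   # HDFS (fs.default.name)
iptables -I INPUT -p tcp --dport 9001  -j ACCEPT   # JobTracker (mapred.job.tracker)
iptables -I INPUT -p tcp --dport 50010 -j ACCEPT   # DataNode data transfer
service iptables save                              # persist the rules on RedHat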