In day-to-day work, some problems are very simple, yet you can search for half a day without finding the answer you need; learning and using Hadoop is no different. Here are some common questions about Hadoop cluster setup:
1. In which 3 modes can a Hadoop cluster run?
Standalone (local) mode, pseudo-distributed mode, and fully distributed mode.
2. What should you watch for in standalone (local) mode?
There are no daemons in standalone mode; everything runs in a single JVM. There is also no DFS; the local file system is used instead. Standalone mode is suitable for running MapReduce programs during development, and it is also the least used mode.
3. What should you watch for in pseudo-distributed mode?
Pseudo-distributed mode is suitable for development and test environments; in this mode, all the daemons run on the same machine.
4. Can a VM be called pseudo-distributed?
No, they are two different things; "pseudo" applies only to Hadoop.
5. What are the key points of fully distributed mode?
Fully distributed mode is commonly used in production environments, where we use N hosts to form a Hadoop cluster with Hadoop daemons running on each host. There are hosts running the NameNode, hosts running DataNodes, and hosts running TaskTrackers. In a fully distributed environment, the master nodes and the slave nodes are separated.
6. Does Hadoop follow the UNIX pattern?
Yes; as under UNIX, Hadoop also has a "conf" directory.
7. In which directory is Hadoop installed?
Cloudera and Apache use the same directory structure; Hadoop is installed in /usr/lib/hadoop-0.20/.
8. What are the port numbers of the NameNode, JobTracker, and TaskTracker?
The NameNode web UI is on port 50070, the JobTracker on 50030, and the TaskTracker on 50060.
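As a quick sanity check you can probe these well-known web UI ports from the shell. A minimal sketch, assuming the daemons run on localhost:

```sh
# Probe the default Hadoop 0.20 web UIs; HTTP 200 means the UI is up.
curl -s -o /dev/null -w "NameNode UI:    %{http_code}\n" http://localhost:50070/
curl -s -o /dev/null -w "JobTracker UI:  %{http_code}\n" http://localhost:50030/
curl -s -o /dev/null -w "TaskTracker UI: %{http_code}\n" http://localhost:50060/
```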
9. What is the core configuration of Hadoop?
The core configuration of Hadoop used to be done through two XML files: 1. hadoop-default.xml; 2. hadoop-site.xml. These files are in XML format, and each property in them has a name and a value, but the files no longer exist in current releases.
10. How is it configured now?
Hadoop now has 3 configuration files: 1. core-site.xml; 2. hdfs-site.xml; 3. mapred-site.xml. These files are kept in the conf/ subdirectory.
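For illustration, here is a minimal sketch of what these files might contain for a pseudo-distributed setup; the host names and values are placeholders, not mandated settings:

```sh
# Minimal pseudo-distributed configuration (illustrative values only).
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>   <!-- URI of the NameNode -->
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
EOF

cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>   <!-- single node, so one replica -->
    <value>1</value>
  </property>
</configuration>
EOF

cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>   <!-- host:port of the JobTracker -->
    <value>localhost:8021</value>
  </property>
</configuration>
EOF
```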
11. What is the spill factor for RAM?
The spill factor determines the point at which in-memory data is moved out to temporary files; the Hadoop temp directory is used for this.
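In the MRv1 configuration these thresholds are controlled by the io.sort.mb and io.sort.spill.percent properties; a hedged sketch of the relevant fragment (the values shown are the usual defaults, not tuning advice), which would go inside the configuration element of mapred-site.xml:

```sh
# Spill-related map-side settings for mapred-site.xml (Hadoop 0.20/MRv1).
# These <property> elements belong inside the <configuration> element.
cat <<'EOF'
<property>
  <name>io.sort.mb</name>             <!-- in-memory sort buffer size in MB (default 100) -->
  <value>100</value>
</property>
<property>
  <name>io.sort.spill.percent</name>  <!-- buffer fill fraction that triggers a spill (default 0.80) -->
  <value>0.80</value>
</property>
EOF
```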
12. Is fs.mapr.working.dir a single directory?
Yes, fs.mapr.working.dir is just one directory.
13. What are the 3 main properties of hdfs-site.xml?
1. dfs.name.dir, which determines the path of the metadata storage and how DFS is stored (disk or remote); 2. dfs.data.dir, which determines the path of the data storage; 3. fs.checkpoint.dir, which is used by the secondary NameNode.
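A hedged sketch of how these three properties might look in hdfs-site.xml; the paths are placeholders for illustration:

```sh
# The three main hdfs-site.xml properties (paths are illustrative placeholders).
cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>        <!-- where the NameNode stores metadata -->
    <value>/var/lib/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>        <!-- where DataNodes store block data -->
    <value>/var/lib/hadoop/dfs/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>   <!-- used by the secondary NameNode -->
    <value>/var/lib/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
EOF
```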
14. How do I exit input mode (in vi)?
To exit input mode: 1. press Esc; 2. type :q (if you have not made any changes) or :wq (if you have made changes); 3. press Enter.
15. What does it mean if entering hadoop fsck / produces a "Connection refused" Java exception?
It means that the NameNode is not running on your VM.
16. We use Ubuntu and Cloudera, so where do we download Hadoop from, or is it installed by default with Ubuntu?
Hadoop is not installed by default; you have to download it from Cloudera or from the Edureka dropbox and run it on your system. Of course, you can also configure it yourself, but you need a Linux box, Ubuntu or Red Hat. Installation steps are available on the Cloudera website and in the Edureka dropbox.
17. What is the "jps" command used for?
This command checks whether the NameNode, DataNode, TaskTracker, and JobTracker are working correctly.
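For example, on a healthy pseudo-distributed node you might see output along these lines (the PIDs are made up):

```sh
$ jps
4723 NameNode
4836 DataNode
4951 SecondaryNameNode
5063 JobTracker
5177 TaskTracker
5289 Jps
```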
18. How do I restart the NameNode?
Run stop-all.sh and then start-all.sh. Alternatively, type sudo hdfs (Enter), su - hdfs (Enter), /etc/init.d/ha (Enter), and /etc/init.d/hadoop-0.20-namenode start (Enter).
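Put together, the first variant looks like this; a minimal sketch assuming the Hadoop 0.20 scripts are on the PATH:

```sh
# Restart all Hadoop daemons, including the NameNode (Hadoop 0.20 scripts).
stop-all.sh    # stop the HDFS and MapReduce daemons
start-all.sh   # start them again
jps            # verify that NameNode appears in the list
```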
19. What is the full name of fsck?
The full name is File System Check.
20. How do I check whether the NameNode is working properly?
To check whether the NameNode is working properly, use the command /etc/init.d/hadoop-0.20-namenode status, or simply jps.
21. What is the role of the mapred.job.tracker property?
It tells you which node is the JobTracker.
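The property lives in mapred-site.xml, so you can read it straight out of the config; the host shown is a placeholder:

```sh
# mapred.job.tracker names the JobTracker node; grep it out of the config.
grep -A1 'mapred.job.tracker' conf/mapred-site.xml
# Expected output (host is a placeholder):
#   <name>mapred.job.tracker</name>
#   <value>jobtracker-host:8021</value>
```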
22. What is the function of /etc/init.d?
/etc/init.d specifies where daemons (services) are placed and describes their state; it is really a Linux feature and has nothing to do with Hadoop.
23. How do I find the NameNode in the browser?
If you need to reach the NameNode in a browser, do not use localhost:8021; the NameNode web UI port is 50070.
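On a Linux desktop you can open it directly from the shell; a small sketch, assuming a local pseudo-distributed setup:

```sh
# Open the NameNode web UI (URL assumes the NameNode runs locally).
xdg-open http://localhost:50070/
```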
24. How do I get from su back to the cloudera user?
Just type exit to get from su back to cloudera.
25. What files are used by the startup and shutdown commands?
slaves and masters.
26. What does slaves consist of?
The slaves file consists of a list of hosts, one per line, describing where the DataNodes run.
27. What does masters consist of?
The masters file is also a list of hosts, one per line, specifying the secondary NameNode servers.
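For illustration, the two files might look like this on a small cluster; all host names are placeholders:

```sh
# conf/masters -- hosts for the secondary NameNode (placeholder names).
cat > conf/masters <<'EOF'
secondary-nn-host
EOF

# conf/slaves -- one DataNode/TaskTracker host per line (placeholder names).
cat > conf/slaves <<'EOF'
slave-host-1
slave-host-2
slave-host-3
EOF
```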
28. What is hadoop-env.sh used for?
hadoop-env.sh provides the runtime environment for Hadoop; in particular, it sets JAVA_HOME.
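Typically you only uncomment and adjust the JAVA_HOME line; the JDK path below is a placeholder:

```sh
# In conf/hadoop-env.sh -- point Hadoop at your JDK (path is a placeholder).
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```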
29. Can the masters file have multiple entries?
Yes, the masters file can have multiple entries.
30. Where is the hadoop-env.sh file located?
hadoop-env.sh is located in the conf directory.
31. What does PID stand for in HADOOP_PID_DIR?
PID stands for Process ID.
32. What is /var/hadoop/pids used for?
/var/hadoop/pids is used to store the PIDs.
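The location is set through HADOOP_PID_DIR in hadoop-env.sh; a sketch:

```sh
# In conf/hadoop-env.sh -- directory where the daemon PID files are written.
export HADOOP_PID_DIR=/var/hadoop/pids
```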
33. What is the role of the hadoop-metrics.properties file?
hadoop-metrics.properties controls Hadoop's reporting; the default setting is "not to report".
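In Hadoop 0.20 the default configuration routes the metrics contexts to NullContext, which is exactly the "not to report" state; an illustrative fragment:

```sh
# Default conf/hadoop-metrics.properties: NullContext means "do not report".
cat <<'EOF'
dfs.class=org.apache.hadoop.metrics.spi.NullContext
mapred.class=org.apache.hadoop.metrics.spi.NullContext
jvm.class=org.apache.hadoop.metrics.spi.NullContext
EOF
```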
34. What kind of network does Hadoop need?
Hadoop core uses the secure shell (SSH) to launch the server processes on the slave nodes, and it requires a password-less SSH connection between the master node and the slave nodes.
35. Why is password-less SSH needed in a fully distributed environment?
Mainly because communication within the cluster is very frequent, and the JobTracker needs to dispatch tasks to the TaskTrackers as quickly as possible.
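Setting this up is the standard SSH key exchange; a minimal sketch, assuming a user named hadoop and a slave called slave-host-1 (both placeholders):

```sh
# Generate a key pair with an empty passphrase, then install the public key
# on each slave so the master can start daemons without typing a password.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@slave-host-1
ssh hadoop@slave-host-1 hostname   # should run without a password prompt
```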
36. Doesn't this lead to security issues?
Not at all. A Hadoop cluster is completely isolated and normally cannot be reached from the Internet. Its configuration is unique, so we need not worry about security vulnerabilities at this level, such as intrusion over the Internet. Hadoop provides a relatively secure way to connect machines.
37. On which port does SSH work?
SSH works on port 22, although it can be reconfigured; 22 is the default port.
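If the daemon is moved off the default port, clients must name it explicitly; for example (host name is a placeholder):

```sh
ssh -p 22 hadoop@master-host   # -p selects the port; 22 is the default anyway
```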
38. What else should be noted about SSH?
SSH is just secure shell communication; it is a protocol that runs on port 22, and only a password needs to be configured for secure access.
39. Why does SSH to localhost need a password?
Using a password in SSH mainly increases security; in some cases, password-less communication is not set up at all.
40. If a key has been added in SSH, do you still need to set a password?
Yes, even if you have added a key in SSH, you still need to set a password.
41. What if the NameNode has no data?
A NameNode without data cannot really be called a NameNode; in practice, a NameNode will always have data.
42. What happens to the NameNode when the JobTracker goes down?
When the JobTracker fails, the cluster still works, as long as the NameNode is OK.
43. Does the client or the NameNode decide the input splits?
It is not decided by the client; the split rules are set in the configuration file, which determines how input is split.
44. Can you set up your own Hadoop cluster?
Yes, you can, as long as you are familiar with the Hadoop environment.
45. Can I run Hadoop on Windows?
You'd better not. Red Hat Linux and Ubuntu are the best operating systems for Hadoop. Windows is rarely used for Hadoop installations because it causes all sorts of problems. Windows is definitely not a recommended system for Hadoop.