Hadoop Interview: 45 Questions and Answers


1. What are the three modes in which a Hadoop cluster can run?

Standalone (local) mode, pseudo-distributed mode, and fully distributed mode.

2. What should you note about standalone (local) mode?

There are no daemons in standalone mode; everything runs in a single JVM. There is also no DFS here; the local file system is used instead. Standalone mode is suitable for running MapReduce programs during development, and it is also the least used mode.

3. What should you note about pseudo-distributed mode?

Pseudo-distributed mode is suitable for development and test environments; all daemons run on the same machine.

4. Can a virtual machine (VM) be called pseudo-distributed?

No, they are two different things, and "pseudo" applies only to Hadoop.

5. What should you note about fully distributed mode?

Fully distributed mode is typically used in production environments, where n hosts make up a Hadoop cluster with the Hadoop daemons running on each host. There are hosts running the NameNode, hosts running DataNodes, and hosts running Task Trackers. In a fully distributed environment, the master and slave nodes are separated.

6. Does Hadoop follow UNIX patterns?

Yes, Hadoop closely follows UNIX patterns; for example, it also has a "conf" directory, as UNIX does.

7. What directory is Hadoop installed in?

Cloudera and Apache use the same directory structure; Hadoop is installed in /usr/lib/hadoop-0.20/.

8. What are the port numbers for the NameNode, Job Tracker, and Task Tracker?

NameNode: 50070; Job Tracker: 50030; Task Tracker: 50060.

9. What is the core configuration of Hadoop?

The core configuration of Hadoop used to be done through two XML files: 1. hadoop-default.xml and 2. hadoop-site.xml. These files are written in XML format, so each entry has a name and a value attribute; however, these files are no longer present.

10. Then how is it configured now?

Hadoop now has three configuration files: 1. core-site.xml, 2. hdfs-site.xml, and 3. mapred-site.xml. These files are kept in the conf/ subdirectory.
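
For illustration, a minimal sketch of conf/core-site.xml on a pseudo-distributed setup might look like this; the host and port in fs.default.name are placeholder values, not something the answer above prescribes:

    <?xml version="1.0"?>
    <configuration>
      <!-- fs.default.name tells clients where the NameNode listens;
           hdfs://localhost:9000 is an example value -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>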

11. What is the spill factor for RAM?

The spill factor is the size threshold after which data held in memory is spilled to temporary files on disk; the Hadoop temp directory is used for this.

12. Is fs.mapr.working.dir a single directory?

Yes, fs.mapr.working.dir is just one directory.

13. What are the three main properties of hdfs-site.xml?

dfs.name.dir determines the path where the metadata is stored and how the DFS stores it (disk or remote); dfs.data.dir determines the path where the data is stored; fs.checkpoint.dir is used by the Secondary NameNode.
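
A minimal sketch of conf/hdfs-site.xml using these three properties; the directory paths are placeholders:

    <?xml version="1.0"?>
    <configuration>
      <!-- where the NameNode stores its metadata (example path) -->
      <property>
        <name>dfs.name.dir</name>
        <value>/var/lib/hadoop/name</value>
      </property>
      <!-- where DataNodes store the actual blocks (example path) -->
      <property>
        <name>dfs.data.dir</name>
        <value>/var/lib/hadoop/data</value>
      </property>
      <!-- where the Secondary NameNode writes checkpoints (example path) -->
      <property>
        <name>fs.checkpoint.dir</name>
        <value>/var/lib/hadoop/checkpoint</value>
      </property>
    </configuration>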

14. How do you exit vi's input mode?

To exit input mode: 1. press Esc; 2. type :q (if you have not made changes) or :wq (if you have made changes); then press Enter.

15. What has happened when running "hadoop fsck /" raises a "Connection refused" Java exception?

It means the NameNode is not running on your VM.
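
A quick way to diagnose this, sketched for a 0.20-era install:

    # Run a file system check from the HDFS root
    hadoop fsck /
    # If it throws "Connection refused", verify the NameNode daemon is up
    jps | grep -i namenode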

16. We use Ubuntu and Cloudera. Where do we download Hadoop from, or does it come with Ubuntu by default?

Hadoop does not come with Ubuntu by default; you have to download it from Cloudera or from the Edureka dropbox and run it on your system. Of course, you can also configure it yourself, but you need a Linux box, Ubuntu or Red Hat. The installation steps are on the Cloudera website and in the Edureka dropbox.

The use of the "JPS" command.

This command checks whether the NameNode, DataNode, Task Tracker, and Job Tracker are working properly.
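
For example (the process IDs below are purely illustrative; actual output varies by machine):

    $ jps
    4528 NameNode
    4663 DataNode
    4810 JobTracker
    4952 TaskTracker
    5120 Jps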

18. How do you restart the NameNode?

Run stop-all.sh and then start-all.sh; or type sudo hdfs (Enter), su - hdfs (Enter), /etc/init.d/ha (Enter), and /etc/init.d/hadoop-0.20-namenode start (Enter).
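
As a sketch of the first approach on a 0.20-era install (script locations vary by distribution):

    # Stop all Hadoop daemons, then start them again
    stop-all.sh
    start-all.sh

    # Or restart only the NameNode through its init script
    sudo /etc/init.d/hadoop-0.20-namenode stop
    sudo /etc/init.d/hadoop-0.20-namenode start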

19. What is the full name of fsck?

The full name is: File System Check.

20. How do you check whether the NameNode is running properly?

To check whether the NameNode is working correctly, use the command /etc/init.d/hadoop-0.20-namenode status, or simply run jps.

21. What is the role of mapred.job.tracker?

It lets you know which node is acting as the Job Tracker.
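
A minimal sketch of how this property is set in conf/mapred-site.xml; localhost:9001 is the conventional pseudo-distributed value, used here as a placeholder:

    <?xml version="1.0"?>
    <configuration>
      <!-- host and port of the Job Tracker (example value) -->
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
      </property>
    </configuration>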

22. What is the function of /etc/init.d?

/etc/init.d indicates the location or state of daemons (services); it is really a Linux feature and has little to do with Hadoop.

23. How do you find the NameNode in a browser?

If you need to view the NameNode in a browser, you no longer use localhost:8021; the NameNode's web UI port number is 50070.
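
For example, assuming the NameNode runs locally:

    # Fetch the NameNode web UI page (substitute your NameNode's hostname)
    curl http://localhost:50070/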

24. How do you switch from su back to cloudera?

To go from su back to cloudera, you only need to type exit.

25. Which files are used by the startup and shutdown commands?

The slaves and masters files.

26. What does the slaves file consist of?

The slaves file consists of a list of hosts, one per line, that describes the DataNodes.

27. What does the masters file consist of?

The masters file is also a list of hosts, one per line, that describes the Secondary NameNode servers.
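
A sketch of both files for a small cluster; the hostnames are placeholders:

    conf/masters:
      master1

    conf/slaves:
      slave1
      slave2
      slave3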

28. What is hadoop-env.sh used for?

hadoop-env.sh provides the runtime environment for Hadoop, such as JAVA_HOME.
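
For example, the typical JAVA_HOME line in conf/hadoop-env.sh (the JDK path is a placeholder):

    # Point Hadoop at the JDK it should run on (example path)
    export JAVA_HOME=/usr/lib/jvm/java-6-sun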

29. Can the masters file have multiple entries?

Yes, you can have multiple entries in the masters file.

30. Where is the hadoop-env.sh file located?

hadoop-env.sh is located in the conf directory.

31. In HADOOP_PID_DIR, what does PID stand for?

PID stands for "Process ID".

32. What is /var/hadoop/pids used for?

/var/hadoop/pids is used to store the PIDs.

33. What is the effect of the hadoop-metrics.properties file?

hadoop-metrics.properties controls Hadoop's "reporting"; the initial state is "not to report".
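
A sketch of the default "not to report" state in conf/hadoop-metrics.properties; NullContext is the stock no-op metrics context in 0.20-era Hadoop:

    # route each subsystem's metrics to the no-op context (reporting off)
    dfs.class=org.apache.hadoop.metrics.spi.NullContext
    mapred.class=org.apache.hadoop.metrics.spi.NullContext
    jvm.class=org.apache.hadoop.metrics.spi.NullContext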

34. What kind of network does Hadoop need?

The Hadoop core uses the shell (SSH) to launch the server processes on the slave nodes and uses password-less SSH connections between the master and slave nodes.

35. Why is password-less SSH required in a fully distributed environment?

Mainly because communication in the cluster is very frequent, and the Job Tracker needs to dispatch tasks to the Task Trackers as quickly as possible.
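
A common way to set this up from the master node, sketched assuming OpenSSH ("user@slave1" is a placeholder):

    # Generate a passphrase-less key pair, then install it on each slave
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    ssh-copy-id user@slave1        # repeat for every slave host
    ssh user@slave1 hostname       # verify password-less login works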

36. Doesn't this lead to security issues?

Not at all. Hadoop clusters are completely isolated and generally cannot be reached from the Internet. With this distinct configuration, we need not worry about security vulnerabilities at that level, such as intrusion over the Internet. Hadoop provides a relatively safe way for the machines to connect.

37. What port number does SSH work on?

SSH works on port no. 22, though it can be configured; 22 is the default port number.

38. What else should be noted about SSH?

SSH is just secure shell communication; it is a protocol that works on port no. 22, and all you really need for access is a configured password.

39. Why does SSH to localhost require a password?

Using a password in SSH is mainly to increase security; in some cases password-less communication is not set up at all.

40. If you add a key to SSH, do you still need to set a password?

Yes, you need to set a password even if you add a key to SSH.

41. What happens if the NameNode has no data?

A NameNode without data cannot really be called a NameNode; in practice, the NameNode will have data.

42. What happens to the NameNode when the Job Tracker goes down?

When the Job Tracker fails, the cluster still works, as long as the NameNode has no problem.

43. Does the client or the NameNode determine the input split?

It is not determined by the client; the split details are determined in the configuration file.

44. Can you build your own Hadoop cluster?

Yes, as long as you are familiar enough with the Hadoop environment, you certainly can.

45. Can you run Hadoop on Windows?

You had better not. Red Hat Linux or Ubuntu are the best operating systems for Hadoop. Windows is usually not used for Hadoop installations because of the various problems it causes; Windows is definitely not a recommended system for Hadoop.
