Hadoop Copvin-45 Common Questions and Answers
1. What are the 3 modes in which a Hadoop cluster can run?
- Stand-alone (local) mode
- Pseudo-distributed mode
- Fully distributed mode
2. What should be noted about stand-alone (local) mode?
There are no daemons in stand-alone mode; everything runs in a single JVM. There is also no DFS here; the local file system is used. Stand-alone mode is suitable for running MapReduce programs during development, and it is also the least used mode.
3. What should be noted about pseudo-distributed mode?
Pseudo-distributed mode is suitable for development and test environments, where all daemons run on the same machine.
4. Can a virtual machine be called pseudo-distributed mode?
No, they are two different things; "pseudo-distributed" applies only to Hadoop.
5. What should be noted about fully distributed mode?
Fully distributed mode is typically used in production environments, where we use n hosts to form a Hadoop cluster, with Hadoop daemons running on each host. There are hosts running the NameNode, hosts running DataNodes, and hosts running TaskTrackers. In a distributed environment, the master and slave nodes are separated.
6. Does Hadoop follow the UNIX pattern?
Yes; as in UNIX, Hadoop also has a "conf" directory.
7. Which directory is Hadoop installed in?
Cloudera and Apache use the same directory structure; Hadoop is installed in /usr/lib/hadoop-0.20/.
8. What are the port numbers for the NameNode, JobTracker, and TaskTracker?
NameNode: 50070; JobTracker: 50030; TaskTracker: 50060.
9. What is the core configuration of Hadoop?
The core configuration of Hadoop was done through two XML files: 1. hadoop-default.xml; 2. hadoop-site.xml. These files are in XML format, so each XML contains properties with names and values, but these files are no longer used.
How is it configured now?
Hadoop now has 3 configuration files: 1. core-site.xml; 2. hdfs-site.xml; 3. mapred-site.xml. These files are kept in the conf/ subdirectory.
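As an illustration, a minimal core-site.xml for pseudo-distributed mode might look like the sketch below; fs.default.name is the 0.20-era property for the default file system, and the localhost:9000 URI is a common but not mandatory choice:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: a minimal sketch for pseudo-distributed mode -->
<configuration>
  <property>
    <!-- Default file system URI; hdfs://localhost:9000 is illustrative -->
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

hdfs-site.xml and mapred-site.xml follow the same name/value property format.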
What is the spill factor for RAM?
The spill factor determines how full the in-memory buffer may get before its contents are spilled to temporary files on disk; these temporary files live in Hadoop's temp directory.
Is fs.mapr.working.dir just a single directory?
Yes, fs.mapr.working.dir is just a single directory.
What are the 3 main properties of hdfs-site.xml?
- dfs.name.dir determines the path where the metadata is stored and how DFS stores it (disk or remote)
- dfs.data.dir determines the path where the data is stored
- fs.checkpoint.dir is used by the secondary NameNode
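A sketch of how these three properties might appear in conf/hdfs-site.xml; the paths shown are placeholders, not defaults:

```xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- illustrative path for NameNode metadata -->
    <value>/var/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- illustrative path for DataNode block storage -->
    <value>/var/hadoop/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <!-- illustrative path used by the secondary NameNode for checkpoints -->
    <value>/var/hadoop/checkpoint</value>
  </property>
</configuration>
```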
How do I exit input mode?
Exit input mode as follows: 1. press Esc; 2. type :q (if you have not entered anything) or :wq (if you have entered something), and press Enter.
What does it mean when you enter "hadoop fsck /" and get a "Connection refused" Java exception?
It means the NameNode is not running on your VM.
We use Ubuntu and Cloudera. Where do we download Hadoop from, or is it installed with Ubuntu by default?
Hadoop is not installed by default; you must download it from Cloudera or the Edureka Dropbox and run it on your system. Of course, you can also configure it yourself, but you need a Linux box such as Ubuntu or Red Hat. There are installation steps on the Cloudera website and in the Edureka Dropbox.
What is the "jps" command used for?
This command checks whether the NameNode, DataNode, TaskTracker, and JobTracker are running.
How do I restart the NameNode?
- Run stop-all.sh, and then run start-all.sh.
- Type sudo hdfs (Enter), su - hdfs (Enter), /etc/init.d/ha (Enter), and /etc/init.d/hadoop-0.20-namenode start (Enter).
What is the full name of fsck?
The full name is: File System Check.
How do I check whether the NameNode is working properly?
If you want to check whether the NameNode is working correctly, use the command /etc/init.d/hadoop-0.20-namenode status, or simply jps.
What is the role of the mapred.job.tracker property?
It lets you know which node is the JobTracker.
What is the purpose of /etc/init.d?
/etc/init.d indicates the location or state of daemons (services); it is really a Linux feature and has little to do with Hadoop.
How do I find the NameNode in the browser?
If you need to reach the NameNode in the browser, do not use localhost:8021; the NameNode web UI port number is 50070.
How do I go from su back to the cloudera user?
To go from su back to cloudera, you only need to type exit.
Which files are used by the start and stop commands?
The slaves and masters files.
What does the slaves file contain?
The slaves file consists of a list of hosts, one per line, that describe the DataNode machines.
What does the masters file contain?
The masters file is also a list of hosts, one per line, that describe the secondary NameNode servers.
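For illustration, both files are plain host lists, one hostname per line; the hostnames below are made up:

```text
conf/slaves (hosts that run the DataNode daemons):
slave1.example.com
slave2.example.com

conf/masters (hosts that run the secondary NameNode):
master2.example.com
```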
What is hadoop-env.sh used for?
hadoop-env.sh provides the runtime environment for Hadoop, such as JAVA_HOME.
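For example, the JAVA_HOME line in conf/hadoop-env.sh might be set as below; the JDK path is an assumption, so use the path of your own installation:

```shell
# conf/hadoop-env.sh: the JDK that all Hadoop daemons will use
# (the path /usr/lib/jvm/java-6-sun is illustrative)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```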
Can the masters file have multiple entries?
Yes, you can have multiple entries in the masters file.
Where is the hadoop-env.sh file located?
hadoop-env.sh is located in the conf directory.
In HADOOP_PID_DIR, what does PID stand for?
PID stands for "Process ID".
What is /var/hadoop/pids used for?
/var/hadoop/pids is used to store the PID files.
What is the hadoop-metrics.properties file used for?
hadoop-metrics.properties controls Hadoop's reporting of metrics; the initial state is "do not report".
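A sketch of hadoop-metrics.properties with reporting disabled, which matches the "do not report" initial state; NullContext is the context class that discards all metrics:

```properties
# hadoop-metrics.properties: NullContext discards all metrics (reporting off)
dfs.class=org.apache.hadoop.metrics.spi.NullContext
mapred.class=org.apache.hadoop.metrics.spi.NullContext
jvm.class=org.apache.hadoop.metrics.spi.NullContext
```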
What kind of network does Hadoop need?
The Hadoop core uses SSH (secure shell) to launch the server processes on the slave nodes, and it uses password-less SSH connections between the master and slave nodes.
Why is password-less SSH required in a fully distributed environment?
Mainly because communication in the cluster is very frequent, and the JobTracker needs to dispatch tasks to the TaskTrackers as quickly as possible.
Does this cause security problems?
Not at all. Hadoop clusters are completely isolated and generally cannot be reached from the Internet. With this kind of configuration, we do not need to worry about security vulnerabilities at that level, such as intrusion over the Internet. Hadoop provides a relatively safe way to connect machines.
What is the port number for SSH?
SSH works on port 22; of course it can be reconfigured, but 22 is the default port number.
What should be noted about SSH?
SSH is just secure shell communication; it is a protocol that works on port 22, and only a password needs to be configured for secure access.
Why does SSH to localhost require a password?
Using a password in SSH is mainly to increase security; in some cases no password is set up at all.
If I add a key to SSH, do I still need to set a password?
Yes, you need to set a password even if you add a key to SSH.
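A minimal sketch of generating a key pair for password-less SSH between nodes; the key path is illustrative (id_rsa is the usual default), and the slave hostname in the comment is a placeholder:

```shell
# Remove any leftover demo key so ssh-keygen does not prompt to overwrite
rm -f /tmp/hadoop_demo_key /tmp/hadoop_demo_key.pub

# Generate an RSA key pair with an empty passphrase (-N "")
ssh-keygen -t rsa -N "" -f /tmp/hadoop_demo_key

# To enable password-less login to a slave node, append the public key to that
# node's ~/.ssh/authorized_keys, e.g. (hostname "slave1" is a placeholder):
#   ssh-copy-id -i /tmp/hadoop_demo_key.pub hadoop@slave1
```

In a real cluster you would generate the key as the hadoop user on the master and distribute the public key to every slave.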
What if there is no data in the NameNode?
A NameNode without data cannot be called a NameNode; usually the NameNode will have data.
What happens to the NameNode when the JobTracker is down?
When the JobTracker fails, the cluster still works, as long as there is no problem with the NameNode.
Is it the client or the NameNode that decides the input splits?
This is not determined by the client; the split details are determined in the configuration file.
Can I build a Hadoop cluster on my own?
Yes, as long as you are familiar enough with the Hadoop environment, you can do it completely on your own.
Is it possible to run Hadoop on Windows?
You had better not. Red Hat Linux or Ubuntu are the best operating systems for Hadoop. In Hadoop installations, Windows is usually not used because of various problems, so Windows is definitely not a recommended system for Hadoop.
Original link: Hadoop interview questions–setting up Hadoop cluster!
Hadoop Copvin-45 Frequently Asked Questions (CSDN)