Hadoop Learning: Testing and Verifying Hadoop Cluster Functionality


A few days ago I summarized the Hadoop distributed cluster installation process. Building a Hadoop cluster is only one difficult step in learning Hadoop; much more knowledge is needed afterwards, and I don't know whether I can stick with it or how many difficulties lie ahead. Still, I believe that as long as I keep working at it, the difficulties can always be solved. This article summarizes how to test a Hadoop cluster. For every programmer, practice is the best teacher: while learning we absorb a lot of theory and feel, on the surface, that we understand it, but once we try it in practice we find ourselves in a mess, always sensing that something is missing. So let's test the Hadoop cluster we installed a few days ago and see whether it actually works.

The test consists of two parts, HDFS and MapReduce, and checks whether each of them works correctly:

1. Verify that Hadoop's HDFS file system works correctly.

But how should we verify it? A file system handles file-related operations, including copying, deleting, and viewing files; if these functions work, the file system is working. There is one problem, though: in the freshly installed Hadoop cluster, the HDFS file system is empty, so we first have to put some material into it. In other words, from the Linux side, the most basic operation is to copy files from the Linux file system into Hadoop's HDFS. Hadoop has already thought of this for us. So let's first prepare the material, that is, the content to be uploaded to HDFS. The process is as follows:

First we create two ordinary files, test1.txt and test2.txt, each containing a few words, and store them in an input directory on Linux. Then the two files need to be uploaded to the HDFS file system, which is done with the following command (explained parameter by parameter below):
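
The original screenshot of the command is not preserved in this copy. Based on the parameter breakdown that follows, it was presumably of this form (running it from inside the local input directory is an assumption):

cd input                 # assumption: switch into the Linux input directory that holds test1.txt and test2.txt
hadoop dfs -put ./ in    # upload the files in the current directory to the in directory on HDFS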

hadoop: the program name.

dfs: a parameter of the program, indicating that we are operating on the DFS (HDFS) file system.

-put: a parameter indicating the "upload" action.

./: a parameter giving the path of the source files to upload; in this example it means the files under the current (input) directory.

in: a parameter giving the destination path on HDFS; here the files are uploaded to the in directory under the HDFS home directory.
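
The second command referred to below, which lists what was uploaded, is also not preserved in this copy; it was presumably:

hadoop dfs -ls in    # list the contents of the in directory on HDFS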

 

The second command displays the contents of the in directory of the HDFS file system. These operations are quite similar to Linux; the difference is that Hadoop takes Linux-style shell commands as parameters and executes them against the HDFS file system. I think this is a good design, since it at least makes Hadoop feel familiar to anyone who already knows Linux. From the output of the second command we can see that the two files test1.txt and test2.txt have been uploaded successfully and are listed under the in directory of Hadoop's HDFS file system.

But that is not enough; I also want to see whether file copying works within HDFS, as shown below:
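
The copy command itself is not shown in this copy; it was presumably:

hadoop dfs -cp in/test1.txt in/test1.txt.bak    # copy test1.txt to a backup named test1.txt.bak within the in directory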

This command generates a copy of test1.txt named test1.txt.bak; listing the in directory on HDFS shows that the step succeeded. How do we delete a file in the HDFS file system? That is also very simple:
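
Again the original screenshot is missing; the delete command was presumably:

hadoop dfs -rm in/test1.txt.bak    # delete the backup copy just created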

As you can see, the backup file test1.txt.bak has been deleted successfully. If we can upload files to the Hadoop file system, we should also be able to download files from HDFS. This is just as simple and very similar to uploading: you only need to change the -put parameter to -get, for example:
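
The download command described below is not preserved either; it was presumably of this form:

mkdir dir_from_hdfs                   # create an empty local directory to receive the files
hadoop dfs -get in/* dir_from_hdfs    # download the files in the in directory from HDFS to the local directory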

In the preceding commands, we create an empty local directory named dir_from_hdfs and download the two files just uploaded to HDFS into dir_from_hdfs.

Here, we will briefly summarize the previous operations:

1. Copy files from the Linux file system to a path in the HDFS file system: hadoop dfs -put <Linux source path> <HDFS destination path>

2. Copy files from the HDFS file system to a path in the Linux file system: hadoop dfs -get <HDFS source path> <Linux destination path>

3. List files in the HDFS file system: hadoop dfs -ls <HDFS path>

4. Create a copy of a file within the HDFS file system: hadoop dfs -cp <source file> <target file>

5. Delete a file in the HDFS file system: hadoop dfs -rm <target file>

6. Delete a directory on HDFS: hadoop dfs -rmr <HDFS directory path> (not demonstrated above; see the sketch below)
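
For completeness, a minimal sketch of item 6, which was not demonstrated above (the directory name here is only a hypothetical example, and the command permanently removes the directory and everything in it):

hadoop dfs -rmr some_dir    # recursively remove the hypothetical directory some_dir and its contents from HDFS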

 

These steps confirm that the HDFS file system of the previously installed Hadoop cluster works normally.

2. Verify that Hadoop's MapReduce function is normal.

Hadoop's HDFS handles file management; MapReduce handles job processing. How can we verify that MapReduce works? The idea is simple: we use a test case that ships with Hadoop. We submit a job, let the MapReduce framework process it, and then check whether the result is correct; if it is, MapReduce is working. First we run a MapReduce example program on Hadoop (its purpose is to count the words in the test1.txt and test2.txt files):

A brief explanation of the command is as follows:

hadoop jar ../hadoo/hadoop-0.20.2-examples.jar wordcount in out

Here hadoop is the program name, jar indicates that a Java jar package follows, hadoop-0.20.2-examples.jar is the jar package containing Hadoop's example programs, wordcount is the example function to run, in is the input path, and out is the output path.

In fact, the above operation can be seen as feeding some material into the Hadoop cluster for processing. There are two inputs (here, the files test1.txt and test2.txt); MapReduce then distributes the work to the nodes for processing. If no warnings or errors appear, the job executed successfully.

After the program is executed, view the HDFS file system again:
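
The listing itself is not preserved in this copy; presumably:

hadoop dfs -ls    # list the HDFS home directory; the out directory should now appear alongside in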

We can see that there is now an out directory in the HDFS file system, which stores the output produced by the MapReduce program. Now view the contents of the out directory:
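
Again, the command was presumably:

hadoop dfs -ls out    # list the contents of the job's output directory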

As shown, the out directory contains two main items: a logs directory, which stores log information from the job, and a file that stores the result of the MapReduce run.

The following shows the execution result of MapReduce:
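
The command used to display the result is not shown in this copy; it was presumably a cat of the result file (the wildcard pattern below is an assumption that matches the part file the job writes):

hadoop dfs -cat out/part-*    # print the word-count results: one word and its count per line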

By viewing the file content we can see the program's output, and it is correct. This proves that the MapReduce function is normal.

 

The above shows how to view file data through Hadoop's HDFS file system, which is the natural way. But what does the data on HDFS look like if you view it from the perspective of the Linux file system? For example:

Because HDFS stores its data on the datanodes, you have to look on a slave node. From the Linux point of view, an HDFS file consists mainly of metadata plus data blocks, which together make up the complete file; so when you view the content of HDFS data files directly from Linux, you just see a jumble of bytes that makes no sense on its own.

Finally, a reminder to myself: Hadoop's HDFS and the cluster status can also be viewed through web interfaces, which exist mainly to make monitoring convenient:

http://192.168.1.100:50030 (the JobTracker web interface)

http://192.168.1.100:50070 (the NameNode/HDFS web interface)

 

Supplement:

1. Run the following command to view the basic statistics of HDFS in hadoop:
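
The command itself is not preserved in this copy; it is presumably the standard report command:

hadoop dfsadmin -report    # print capacity, remaining space, and per-datanode statistics for HDFS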

2. How to enter and leave Hadoop's safe mode
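
No commands are shown for this item either; the standard safe-mode commands in this Hadoop version are:

hadoop dfsadmin -safemode enter    # enter safe mode (HDFS becomes read-only)
hadoop dfsadmin -safemode leave    # leave safe mode
hadoop dfsadmin -safemode get      # check whether safe mode is currently on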

From: http://blog.csdn.net/ab198604
