Hadoop pseudo-distributed cluster setup and installation (Ubuntu system)

1: After installing the Ubuntu operating system in the VMware virtualization software, use the ifconfig command to view the machine's IP address;

2: Use the Xshell software to connect remotely to your virtual machine, which makes it easier to operate. After entering your Ubuntu account's password, the connection succeeds;

3: Modify the hostname with vi /etc/hostname, and the hostname-to-IP mapping with vi /etc/hosts. Once the change takes effect you can test it with ping; here the IP is mapped to the hostname master, so ping master succeeds;
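
A minimal sketch of the two edits, assuming an example IP of 192.168.1.100 (substitute the address reported by ifconfig):

vi /etc/hostname      # the file should contain only the hostname, e.g. master
vi /etc/hosts         # add a mapping line such as: 192.168.1.100   master   (example IP)
ping master           # should get replies once the mapping is in effect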

4: After modifying the hostname and the hostname-to-IP mapping, start uploading the JDK; use the FileZilla tool to upload the JDK archive and other files to the Ubuntu system;

In FileZilla, left-click to select the file you want to upload on the left and send it to the right pane, as shown below:

Once the upload finishes you can check it; by default the file is uploaded to the root directory, and the display confirms the upload succeeded;

5: After uploading, create a folder to store the uploaded files and archives;

Remember that the -C option (which sets the extraction directory) is uppercase; a lowercase -c will cause an error, as the following test shows;

After decompression, you can go into the folder you created to confirm that the archive was extracted;
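
A sketch of the whole decompression step, with placeholder names for the archive and the target folder:

mkdir /home/hadoop/java                                    # placeholder folder to hold the extracted JDK
tar -zxvf jdk-7u65-linux-x64.tar.gz -C /home/hadoop/java   # placeholder archive name; note the uppercase -C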

6: After extracting the JDK, add Java to the environment variables (configure the JDK environment variables on the Ubuntu system):

Open the file in vi and press Shift+g (G) to jump to the last line, or press g twice (gg) to jump to the first line; press any one of a, s, or i to enter insert mode, and you can then modify the configuration file;

There are many ways to configure this; the following is just one of them.

After configuring the JDK you can test whether the configuration succeeded; if you have not yet run source /etc/profile to refresh the configuration, the java command will not be found;

After you refresh the configuration with source /etc/profile, you can view the Java version.
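
Putting this step together, a minimal sketch of the lines appended to /etc/profile, assuming the JDK was extracted to the placeholder path /home/hadoop/java/jdk1.7.0_65:

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65   # placeholder path; use your own JDK directory
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile        # refresh the configuration
java -version              # prints the JDK version if the configuration took effect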

A small side note: the first Linux JDK package I downloaded would not work and kept reporting errors. I assumed the configuration was wrong, but it turned out the JDK package itself was bad, so be careful here;

7: Upload and decompress Hadoop; the procedure is the same as for the JDK, so it is not repeated here;

Look at the Hadoop directory: hadoop-2.4.1/share/hadoop contains the core jar packages;

8: After decompression, start configuring Hadoop; the configuration files live under the path shown below;

Modify the following configuration files, one by one, as shown below:

Modify the first configuration file, hadoop-env.sh;

The changes are as follows: the main modification is setting JAVA_HOME to the JDK path; if you have forgotten your JDK directory you can run echo $JAVA_HOME and copy the result;
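
The relevant line in hadoop-env.sh is the JAVA_HOME export; a sketch, using the same placeholder JDK path as above:

export JAVA_HOME=/home/hadoop/java/jdk1.7.0_65   # placeholder; substitute your own JDK directory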

Modify the second configuration file: core-site.xml;

The contents of the modification are as follows: because this is a pseudo-distributed setup, the node address is configured directly with the hostname;
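
A minimal core-site.xml sketch for a pseudo-distributed setup, assuming the hostname master and a placeholder working directory:

<configuration>
    <!-- the tmp path below is a placeholder; point it at your own Hadoop directory -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop-2.4.1/tmp</value>
    </property>
</configuration>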

Modify the third configuration file: hdfs-site.xml

The contents of the modification are as follows:
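
A minimal hdfs-site.xml sketch; with only a single node, a replication factor of 1 is the usual choice:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>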

Modify the fourth configuration file: first rename mapred-site.xml.template to mapred-site.xml, then start modifying it;

The contents of the modification are as follows:
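
A sketch of the rename followed by the minimal mapred-site.xml that tells MapReduce to run on YARN:

mv mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>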

Modify the fifth configuration file: yarn-site.xml;

The contents of the modification are as follows; with this, the configuration is basically complete;
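
A minimal yarn-site.xml sketch, again using the hostname master:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>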

Modify the sixth configuration file: vi slaves

The modified content is your own host name:
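
With the hostname used here, the slaves file contains a single line:

master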

9: Check the status of the firewall under Ubuntu and turn off the firewall:

The steps shown are: turn off the firewall, view the firewall's status, start the firewall, and view its status again;
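
A sketch using Ubuntu's ufw front end (the exact firewall tool on your system may differ):

sudo ufw status       # view the firewall state
sudo ufw disable      # turn the firewall off
sudo ufw enable       # turn the firewall back on if needed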

10: To run Hadoop commands conveniently, also configure the Hadoop environment variables; as before, edit the file with vi /etc/profile, with the following configuration:

After saving the configuration, remember to run source /etc/profile to refresh it;
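
A sketch of the extra lines in /etc/profile, assuming Hadoop was extracted to the placeholder path /home/hadoop/hadoop-2.4.1:

export HADOOP_HOME=/home/hadoop/hadoop-2.4.1     # placeholder; use your own Hadoop directory
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin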

11: Format the NameNode (this initializes the NameNode)
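
One common form of the format command (runnable from anywhere once the Hadoop environment variables above are set):

hdfs namenode -format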

After you execute the format command, a message containing "successfully formatted" indicates that the format succeeded;

12: Start Hadoop: first start HDFS with sbin/start-dfs.sh, then start YARN with sbin/start-yarn.sh;

During the startup process you will be asked to type yes and enter the password about three times (unless passwordless SSH login has been configured);

13: Verify that startup succeeded with the jps command, which lists the Java processes started by start-dfs.sh and start-yarn.sh respectively;
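
A sketch of the check; on a single pseudo-distributed node, jps typically lists the HDFS daemons from start-dfs.sh and the YARN daemons from start-yarn.sh:

jps
# expected processes usually include:
# NameNode, DataNode, SecondaryNameNode   (from start-dfs.sh)
# ResourceManager, NodeManager            (from start-yarn.sh)
# Jps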

14: After the pseudo-distributed cluster is built, you can access the cluster's web interfaces from Windows;
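
With the default Hadoop 2.x ports, the web interfaces are reachable from a Windows browser (use the VM's IP, or add master to the Windows hosts file):

http://master:50070      # NameNode / HDFS web UI
http://master:8088       # YARN ResourceManager web UI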

15: As a simple test, upload a file to HDFS as follows:
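
A sketch of the upload, using a hypothetical local file test.txt:

hadoop fs -put test.txt /      # test.txt is a placeholder file name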

Go to the web interface to see the effect; the file just uploaded is listed there;

16: Download a file from the HDFS cluster:
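
A sketch of the download, again with the hypothetical test.txt:

hadoop fs -get /test.txt ./    # copies the file from HDFS into the current local directory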

The effect is as follows:

17: Test MapReduce using the example programs that ship with Hadoop:

First, the example program that estimates pi;
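
A sketch of the pi job, run from the Hadoop directory; the two numbers are the map-task count and the samples per map (both chosen arbitrarily here):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar pi 5 10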

Next, a simple word count with MapReduce as an example;

Create a count.txt file to test how many times each word inside it is repeated:
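
A hypothetical count.txt with a few repeated words is enough for the test:

vi count.txt      # e.g. fill it with lines like "hello world" and "hello hadoop"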

Because the job runs on the cluster, the data file must also be placed on the cluster (in HDFS);

First, create a folder in HDFS to hold the input file;
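
A sketch, using a hypothetical /wordcount/input directory:

hadoop fs -mkdir -p /wordcount/input    # placeholder path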

The created folder can be viewed in the web interface as follows:

Place the newly created count.txt file in the input folder as follows:
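
Assuming the same hypothetical directory:

hadoop fs -put count.txt /wordcount/input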

Run the word count test with the example program that ships with MapReduce:
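
The wordcount example lives in the same examples jar; the output directory must not exist before the job runs (both paths are placeholders):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output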

You can query the results after the job finishes, or go directly to the web interface to view the output;

You can use commands to view the results of the execution as follows:
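
With a single reducer, the result is written to part-r-00000 under the hypothetical output path:

hadoop fs -cat /wordcount/output/part-r-00000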

The general idea behind HDFS:
1: HDFS stores files on a distributed cluster and presents clients with a convenient, virtual directory structure for access.
2: When files are stored in the HDFS cluster, they are split into blocks.
3: The blocks of a file are stored on several DataNode nodes.
4: The mapping between the files in the HDFS namespace and their actual blocks is maintained by the NameNode.
5: Each block is stored as multiple replicas across the cluster, which improves data reliability and also increases access throughput;

18: Frequently used HDFS commands:

hadoop fs                                  Displays the usage of the hadoop fs commands
hadoop fs -ls /                            Lists the folders under a directory
hadoop fs -lsr /                           Recursively lists the files in a directory and its subdirectories
hadoop fs -mkdir /user/hadoop              Creates a hadoop folder under the user folder
hadoop fs -put a.txt /user/hadoop/         Uploads the a.txt file to the hadoop folder under the user folder
hadoop fs -get /user/hadoop/a.txt /        Downloads the a.txt file from the hadoop folder under the user folder
hadoop fs -cp /original-path /destination-path    Copies files from the original path to the destination path
hadoop fs -mv /original-path /destination-path    Moves files from the original path to the destination path
hadoop fs -cat /user/hadoop/a.txt          Views the contents of the a.txt file
hadoop fs -rm /user/hadoop/a.txt           Deletes the a.txt file under the hadoop folder in the user folder
hadoop fs -rm -r /user/hadoop/a.txt        Recursive deletion, for folders as well as files
hadoop fs -copyFromLocal /local-path /destination-path    Similar to hadoop fs -put
hadoop fs -moveFromLocal localsrc dst      Uploads a local file to HDFS and deletes the local copy
hadoop fs -chown username:groupname /filename    Changes the owning user and group (permission modification)
hadoop fs -chmod 777 /filename             Makes the file readable, writable, and executable (permission modification)
hadoop fs -df -h /                         Views disk space under the root directory, used and free, etc.
hadoop fs -du -s -h /filename              Views the size of a file
hadoop fs -du -s -h hdfs://hostname:9000/*    Views the size of all files under the root directory

To be continued .....
