Big Data (2): HDFS Deployment and File Read/Write (including Eclipse Hadoop configuration)


One: Principles

1. DFS

A distributed file system (DFS) is a file system in which the physical storage resources it manages are not necessarily attached directly to the local node, but are connected to the nodes over a computer network. Because the system is built on top of a network, it inevitably introduces the complexity of network programming, so a distributed file system is more complex than an ordinary local disk file system.

2. HDFS

For the differences and the relationship between GFS and HDFS, see this blog post: http://www.cnblogs.com/liango/p/7136448.html

    HDFS (Hadoop Distributed File System) provides the most basic storage functionality for all other components of the big data platform.

Features: high fault tolerance, high reliability, scalability, and high throughput, providing a solid underlying storage architecture for big data storage and processing.

HDFS uses a master/slave architecture. From the end user's perspective it looks like a traditional file system: files can be created, read, updated, and deleted through directory paths. Because of its distributed storage design, an HDFS cluster consists of a NameNode and a number of DataNodes: the NameNode manages the file system metadata, while the DataNodes store the actual data.

HDFS exposes a file system namespace that lets users store data as files, following the "write once, read many" principle. A client accesses the file system by interacting with the NameNode and the DataNodes: it contacts the NameNode to obtain a file's metadata, while the actual file I/O is performed directly against the DataNodes.

3. Applicable scenarios

HDFS provides high-throughput data access for applications with large data sets. Some common application scenarios are:

Data-intensive parallel computing: the data volume is enormous but the parallel processing itself is relatively simple, such as large-scale web search;

Compute-intensive parallel computing: the data volume is comparatively small but the computation is complex, such as 3D modeling and rendering, weather forecasting, and scientific computing;

Hybrid data-intensive and compute-intensive parallel computing, such as rendering 3D movies.

HDFS has the following limitations:

HDFS is not suitable for storing large numbers of small files, because the NameNode keeps the file system metadata in memory, so the number of files that can be stored is limited by the NameNode's memory size;

HDFS is designed for high throughput rather than low-latency access;

HDFS is built for streaming reads; it does not support multiple clients writing to the same file (a file can have only one writer at a time) or writes at arbitrary positions (no random writes);

HDFS is best suited to write-once, read-many scenarios.

4. Basic commands

Format: hadoop fs -cmd <args>, where cmd is the specific operation and args are its arguments.

Common commands:

        hadoop fs -mkdir /user/trunk          # create the directory /user/trunk
        hadoop fs -ls /user                   # list directories and files under /user
        hadoop fs -lsr /user                  # recursively list directories and files under /user
        hadoop fs -put test.txt /user/trunk   # upload test.txt to /user/trunk
        hadoop fs -get /user/trunk/test.txt   # download the file /user/trunk/test.txt
        hadoop fs -cat /user/trunk/test.txt   # print the contents of /user/trunk/test.txt
        hadoop fs -tail /user/trunk/test.txt  # show the last 1 KB of /user/trunk/test.txt
        hadoop fs -rm /user/trunk/test.txt    # delete /user/trunk/test.txt
        hadoop fs -help ls                    # show help for the ls command

Two: HDFS Deployment

The main steps are as follows:

1. Set up the Hadoop installation environment;

2. Edit the Hadoop configuration files;

3. Start the HDFS service;

4. Verify that the HDFS service is available.

1. Check whether the Hadoop installation directory exists (ls /usr/cstor/hadoop); if it does not, use a file-transfer tool to upload the Hadoop installation files from your local machine.

Check whether the JDK is present in the same way; if it is missing, upload it as well.

2. Confirm passwordless SSH login between the cluster servers

Log on to each server with an SSH tool and run ssh <hostname> to confirm that every server in the cluster can reach the others without a password. For how to set this up, see my earlier post: http://www.cnblogs.com/1996swg/p/7270728.html

3. Edit the hadoop-env.sh file, which needs its JAVA_HOME setting changed

Open the file with vim and change export JAVA_HOME=${JAVA_HOME} to your JDK directory, for example export JAVA_HOME=/usr/local/jdk1.7.0_79/ on my machine.

4. Specify the HDFS master node

This is done in core-site.xml: open the file and modify the configuration between the <configuration></configuration> tags, along the lines of the sketch below.
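A minimal sketch of a typical configuration, assuming the master node's hostname is master and /usr/cstor/hadoop/cloud as the temporary directory (both values are assumptions; adjust them to your cluster):

    <configuration>
      <property>
        <!-- URI of the HDFS master (NameNode); the hostname "master" and the port are assumptions -->
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
      </property>
      <property>
        <!-- base directory for Hadoop temporary files; this path is an assumption -->
        <name>hadoop.tmp.dir</name>
        <value>/usr/cstor/hadoop/cloud</value>
      </property>
    </configuration>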

      

5. Copy this configuration to the other nodes in the cluster. First check the list of the cluster's slave nodes (the file ~/data/2/machines used below), then run the following command to copy the configuration to each of them:

        for x in `cat ~/data/2/machines`; do echo $x; scp -r /usr/cstor/hadoop/etc $x:/usr/cstor/hadoop; done

6. Start the HDFS nodes

First, format the NameNode on the master server: hdfs namenode -format;

Next, edit the slaves file, changing localhost to slave1~slave3;

Finally, start HDFS for the whole cluster from the Hadoop installation directory;

Then run the jps command on each node to verify that the corresponding processes started successfully. A sketch of these steps follows.
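A minimal sketch of these steps, run from the Hadoop installation directory on the master node (the slave hostnames slave1~slave3 come from the text; the rest assumes a standard Hadoop 2.x layout):

    cd /usr/cstor/hadoop
    bin/hdfs namenode -format      # format the NameNode (first start only)
    vim etc/hadoop/slaves          # replace localhost with slave1, slave2, slave3
    sbin/start-dfs.sh              # start the NameNode and the DataNodes
    jps                            # run on each node to check that its daemon started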

        

7. After a successful deployment, you can upload files to HDFS from the client, as shown in the sketch below.
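A small example (the local file name client-data.txt is only a placeholder):

    hadoop fs -mkdir /user/trunk                  # create a target directory if it does not exist yet
    hadoop fs -put client-data.txt /user/trunk    # upload a local file to HDFS
    hadoop fs -ls /user/trunk                     # confirm that the upload succeeded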

        

Three: Reading and Writing HDFS Files

1. Configure the CLASSPATH on the client server

(1) Log in to the client server with an SSH tool and run vi /etc/profile to edit the file. Changes to /etc/profile affect the whole system, i.e. the Linux environment variables.

The purpose of setting CLASSPATH is to tell the Java runtime in which directories it can find the Java programs (.class files) you want to execute.

The following lines are currently at the end of the file:

JAVA_HOME=/usr/local/jdk1.7.0_79/
export JRE_HOME=/usr/local/jdk1.7.0_79/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_HOME=/usr/cstor/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Replace them with the following lines (adjust the paths to match your own installation):

JAVA_HOME=/usr/local/jdk1.7.0_79/
export HADOOP_HOME=/usr/cstor/hadoop
export JRE_HOME=/usr/local/jdk1.7.0_79/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"

(2) Run source /etc/profile so that the environment variable changes take effect.

2. Write the HDFS write program on the client server

(1) On the client server, run vi WriteFile.java and write the HDFS write-file program:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        Path dfs = new Path("/weather.txt");
        // create /weather.txt in HDFS and write one record to it
        FSDataOutputStream outputStream = hdfs.create(dfs);
        outputStream.writeUTF("nj 20161009 23\n");
        outputStream.close();
    }
}

(2) Compile and package the HDFS write program

Compile the code you just wrote with javac and package it into hdpAction.jar using the jar command, as sketched below.
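A sketch of the two commands, assuming the CLASSPATH set in /etc/profile already includes the Hadoop common jars (the jar name hdpAction.jar follows the text):

    javac WriteFile.java                       # compiles against the jars on CLASSPATH
    jar -cvf hdpAction.jar WriteFile.class     # package the compiled class into hdpAction.jar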

        

(3) Run the HDFS write program

Use the hadoop jar command on the client server to run hdpAction.jar:
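For example (WriteFile is the main class defined above; the jar location is an assumption):

    hadoop jar ~/hdpAction.jar WriteFile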

        

Check whether the weather.txt file has been created in HDFS and, if so, whether its contents are correct:
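For example, using the basic commands introduced earlier:

    hadoop fs -ls /               # check that /weather.txt was created
    hadoop fs -cat /weather.txt   # inspect its contents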

        

3. Write the HDFS read program on the client server

(1) On the client server, run vi ReadFile.java and write the program that reads the weather.txt file from HDFS:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path inFile = new Path("/weather.txt");   // the file written by WriteFile
        FileSystem hdfs = FileSystem.get(conf);
        FSDataInputStream inputStream = hdfs.open(inFile);
        System.out.println("myfile: " + inputStream.readUTF());
        inputStream.close();
    }
}

(2) Compile and package the file in the same way, then run it:
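A sketch of the corresponding commands, under the same assumptions as for the write program:

    javac ReadFile.java                        # compile the read program
    jar -cvf hdpAction.jar ReadFile.class      # package it (reusing the same jar name here)
    hadoop jar ~/hdpAction.jar ReadFile        # run it and print the file contents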

       

Four: Configuring the Eclipse Hadoop Plugin and Uploading the Package

1. First download the Eclipse Hadoop plugin, extract the plugin jar from the archive, and place it in the plugins folder of your Eclipse installation, for example D:\eclipse-standard-kepler-SR2-win32\eclipse\plugins;

2. Configure a local Hadoop environment: download the Hadoop distribution (from Apache, http://hadoop.apache.org/) and unzip it to a local directory;

3. Open Eclipse and create a new project to check whether a Map/Reduce Project option now appears. The first time you create a Map/Reduce project you need to specify the directory where Hadoop was unzipped (i.e. the location from step 2); fill in this Hadoop path in the new-project dialog;

4. Write the Java files, such as the ReadFile.java shown above;

5. Package the project into a jar file: right-click the project, choose Export > JAR file, and select the files to include (this step is the key one);


6. Use an SFTP tool such as WinSCP or Xmanager (or another SSH tool) to upload the newly generated hdpAction.jar to the client server, then use the hadoop jar command on the client server to run hdpAction.jar and check the program's output.

    

Run the jar file with: hadoop jar ~/hdpAction.jar ReadFile

Summary:

Reading and writing HDFS files is very basic but also very important; later study of YARN, MapReduce, and other components should build on this foundation step by step.

The only failure is giving up halfway. Keep learning a little every day and the knowledge will accumulate without you even noticing.