Chapter 2 MapReduce Introduction
An ideal input split size is usually the size of an HDFS block. Hadoop performance is optimal when the node that executes a map task is the same node that stores its input data (the "data locality" optimization, which avoids transferring data over the network).
MapReduce process summary: a line of data is read from the file and processed by the map function, which returns key-value pairs; the system then sorts the map output by key. If there are multiple reduce tasks, the sorted map output is partitioned among them, and each reduce function aggregates the values that share a key.
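As a mental model only (this is a plain Unix pipeline, not a Hadoop command), the same map -> sort -> reduce flow can be mimicked locally for a word count; input.txt is an illustrative file name:

# map: split each line into words and emit one "word<TAB>1" pair per word
# sort: group identical keys together (the shuffle phase)
# reduce: sum the values for each key
$ awk '{for (i = 1; i <= NF; i++) print $i "\t1"}' input.txt | sort | \
      awk -F'\t' '{count[$1] += $2} END {for (w in count) print w "\t" count[w]}'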
Building Hadoop from source:
Build protoc first (compile it with a specified install path: ./configure --prefix=/usr/app/protoc), then configure /etc/profile.
Verify the toolchain: mvn -v OK, protoc --version OK.
Check out the source with SVN: svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ (Hadoop trunk; use /common/tags/x.x.x for an older version).
Compile Hadoop: mvn package -DskipTests -Pdist,native,docs -Dtar (-Dtar also generates a .tar installation package). The compiled package is stored under hadoop-dist/target.
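A hedged sketch of the full build sequence described above, assuming protobuf 2.5.0 sources (the version the Hadoop 2.x build requires); the checkout directory name is illustrative:

$ cd protobuf-2.5.0
$ ./configure --prefix=/usr/app/protoc && make && sudo make install
$ export PATH=/usr/app/protoc/bin:$PATH
$ protoc --version                      # should print libprotoc 2.5.0
$ mvn -v                                # confirm Maven is on the PATH
$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk
$ cd hadoop-trunk
$ mvn package -DskipTests -Pdist,native,docs -Dtar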
[hadoop@hadoop1 hadoop]$ bin/hadoop dfs -copyFromLocal /home/large.zip testfile.zip
This copies the local file large.zip into the HDFS home directory /user/hadoop/ under the name testfile.zip. View the existing files:
[hadoop@hadoop1 hadoop]$ bin/hadoop dfs -ls
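The same upload can be done with -put, the more common spelling on Hadoop 2.x (same arguments, assuming the session above):

$ bin/hadoop fs -put /home/large.zip testfile.zip
$ bin/hadoop fs -ls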
9. Hadoop online update
3. Know how to view files in the HDFS file system: hadoop dfs -ls <HDFS file path>
4. Know how to create a file copy in the HDFS file system: hadoop dfs -cp <original file> <target file>
5. Know how to delete a file in the HDFS file system: hadoop dfs -rm <target file>
6. What if I want to delete a file directory on HDFS? hadoop dfs -rmr <the file directory name on HDFS> (see the sample session below)
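A short sample session exercising the commands above; the paths are illustrative, and on Hadoop 2.x the hdfs dfs spelling is preferred over the older hadoop dfs:

$ hadoop dfs -ls /user/hadoop
$ hadoop dfs -cp /user/hadoop/testfile.zip /user/hadoop/testfile2.zip
$ hadoop dfs -rm /user/hadoop/testfile2.zip
$ hadoop dfs -rmr /user/hadoop/tmpdir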
export HADOOP_HOME=/opt/lib64/hadoop-2.5.1
export PATH=$HADOOP_HOME/bin:$PATH
export CLASSPATH=$HADOOP_HOME/lib:$CLASSPATH
Save and exit (ESC, then :wq).
Oh, don't forget to run source /etc/profile in the terminal to make the modified profile take effect immediately.
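A quick sanity check after sourcing the profile (hadoop version is a standard Hadoop 2.x command; the output should name the release installed above):

$ source /etc/profile
$ echo $HADOOP_HOME
$ hadoop version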
Then go to etc/hadoop/ (not the system's /etc, but the one under the Hadoop installation directory) to edit the configuration files. In standalone mode Hadoop reads local data, while in pseudo-distributed mode it reads data on HDFS.
To use HDFS, you must first create a user directory in HDFS:
# ./bin/hdfs dfs -mkdir -p /user/hadoop
# ./bin/hadoop fs -ls /user/hadoop
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 /user/hadoop/input
Next, edit the XML files in etc/hadoop/ to bring up pseudo-distributed mode; a minimal sketch of the configuration follows.
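A minimal sketch of the two HDFS-related files, assuming the standard single-node values from the Hadoop 2.x setup guide (hdfs://localhost:9000 and a replication factor of 1 are conventional defaults, not values preserved from this page):

$ cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
$ cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF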
View the HDFS system:
[hadoop@hadoop1 ~]$ hadoop fs -ls /
The hadoop fs -ls / command lists the Hadoop HDFS file management system the way a Linux file system directory is listed. The results shown above indicate that the Hadoop standalone installation is working.
1. Hadoop Java API
The main programming language for Hadoop is Java, so the Java API is the most basic external programming interface.
2. Hadoop Streaming
1. Overview
Hadoop Streaming is a toolkit designed to facilitate the writing of MapReduce programs for non-Java users. It is a programming tool provided by Hadoop that allows any executable or script file to be used as the mapper and reducer of a MapReduce job.
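For instance, a streaming job can be launched with ordinary Unix tools as the mapper and reducer; the jar path below assumes a Hadoop 2.5.1 layout, and the input/output paths are illustrative:

$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar \
      -input /user/hadoop/input \
      -output /user/hadoop/streaming-out \
      -mapper /bin/cat \
      -reducer /usr/bin/wc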
The JDK from the Debian source list is not easy to use. The simplest way is to download Sun's JDK, decompress it, and modify the JAVA_HOME information.
1. Prepare the JDK file.
This article describes how to copy files to the VM system through SSH.
2. Install the JDK
I installed it under /usr/lib/jvm/jdk1.7.0_21 (this directory should be consistent on all servers, or there will be trouble later).
$ sudo tar xvf ~/Downloads/jdk.tar.gz -C /usr/lib/jvm   (substitute the actual JDK archive name)
$ cd /usr/lib/jvm
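Point JAVA_HOME at the JDK unpacked above; the jdk1.7.0_21 path comes from the text, and appending these lines to /etc/profile makes them permanent:

$ export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_21
$ export PATH=$JAVA_HOME/bin:$PATH
$ java -version      # confirms the install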
mapred-site.xml
Create the file in the same etc/hadoop directory and fill in its configuration, then configure yarn-site.xml the same way; a minimal sketch of both files follows.
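A minimal sketch, assuming the standard Hadoop 2.x properties for running MapReduce on YARN (these property names come from the Hadoop documentation; the page's original values were not preserved):

$ cat > etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
$ cat > etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF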
Start Hadoop
First execute: hadoop namenode -format
Then start HDFS: start-dfs.sh. If a Mac shows "localhost port 22: connect refused", enable Remote Login under System Preferences > Sharing and allow access for the current user.
You will be asked to enter the password three times after executing start-dfs.sh; setting up passwordless SSH avoids this, as in the sketch below.
Then start YARN: start-yarn.sh.
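Putting the steps together; the key generation and the jps check are standard practice from the Hadoop single-node guide, not from this page, and the commands assume you are in the Hadoop install directory:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa       # enables passwordless SSH
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ bin/hadoop namenode -format                    # one-time; re-running wipes HDFS metadata
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
$ jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager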
ls
Usage: hadoop fs -ls <args>
If it is a file, the file information is returned in the following format: filename <number of replicas> size modification_date modification_time permissions userid groupid. If it is a directory, a list of its direct children is returned, as in Unix.
get
Usage: hadoop fs -get <src> <localdst>
Example:
hadoop fs -get /user/hadoop/file localfile
hadoop fs -get hdfs://host:port/user/hadoop/file localfile
Return Value:
0 is returned for success, and -1 is returned for failure.
getmerge
Usage: hadoop fs -getmerge <src> <localdst> [addnl]
Accepts a source directory and a target file as input, and concatenates all the files in the source directory into the local destination file. addnl is optional and specifies that a newline character be added at the end of each file.
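For example, to merge the part files of a job's output directory into one local file, with a newline after each part (the paths are illustrative):

$ hadoop fs -getmerge /user/hadoop/output merged.txt addnl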