Hadoop version: 1.2.1
JDK version: 1.7.0
Example 3-1: Use a URLStreamHandler instance to display files from the Hadoop file system on standard output
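The example itself is not reproduced here; a minimal sketch along the lines of the book's URLCat program (the class name and the 4096 buffer size are assumptions) would be:

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {
    static {
        // may only be called once per JVM, so it is done in a static block
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL(args[0]).openStream();   // e.g. an hdfs:// URL passed on the command line
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}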
hadoop fs -mkdir input
Create two files, file1 and file2, where file1 contains "Hello world" and file2 contains "Hello hadoop", then upload the files to the input directory on HDFS.
The main class name, including its package path, needs to be specified after the jar file when running the hadoop jar command.
For example: hadoop jar /home/hadoop/documents/hadooptest.jar hadoop.test.MaxTemperature /user/hadoop/temperature output
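A minimal sketch of what such a driver class looks like, so that the package path matches the name passed to hadoop jar (the body of main is omitted; the args are the input and output paths from the command line above):

package hadoop.test;   // the package path must match what is given on the hadoop jar command line

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        // job setup would go here; args[0] is the input path, args[1] the output path
    }
}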
4. The data to be analyzed is uploaded to HDFS.
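A minimal sketch of this step with the Java FileSystem API (the class name and local path are placeholders; the HDFS target reuses the /user/hadoop/temperature input path from the example above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // copy a local sample file into the HDFS input directory (both paths are placeholders)
        fs.copyFromLocalFile(new Path("/home/hadoop/temperature.txt"),
                             new Path("/user/hadoop/temperature"));
        fs.close();
    }
}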
Having recently used Hadoop and HBase in web projects, here is a summary of using Spring for Hadoop (SHDP), mainly the encapsulated HBase modules that SHDP provides. This blog will be written in two parts:
Basic understanding of Spring for Hadoop -- working with HBase
Introduction to common HbaseTemplate methods: hbaseTemplate.get(tableName, rowName, rowMapper) is commonly used for queries, as shown in the following example:

Tile t = hbaseTemplate.get("GW_TILES", "0_1_1", new RowMapper<Tile>() {
    @Override
    public Tile mapRow(Result result, int rowNum) throws Exception {
        Tile t = new Tile();
        // the setter name is truncated in the original (setData is a placeholder);
        // it reads the "key" qualifier of column family "T" from the result
        t.setData(result.getValue("T".getBytes(), "key".getBytes()));
        return t;
    }
});
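For writes, SHDP's HbaseTemplate also offers a put method; a minimal sketch, assuming the put(tableName, rowName, familyName, qualifier, value) overload and reusing the table, family, and qualifier from the query above:

// write one cell: row "0_1_1" in table "GW_TILES", column family "T", qualifier "key"
byte[] tileBytes = "serialized tile".getBytes();   // placeholder payload
hbaseTemplate.put("GW_TILES", "0_1_1", "T", "key", tileBytes);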
[Spring tutorial] Creating a simple Spring example
1. First, the main idea of Spring is dependency injection. Simply put, there is no need to manually create new objects; these objects are managed by the Spring container.
Spring Hadoop calls HDFS through the Spring framework; the biggest difference from calling HDFS directly is that Spring creates the Configuration and FileSystem objects needed to operate on HDFS through dependency injection. All other HDFS API calls remain unchanged.
1. Create the resources folder in the project's main directory and add it to the project's resource path. 2. Create beans.xml and declare in it the Hadoop configuration and FileSystem beans.
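A minimal sketch of using the injected FileSystem from Java (the bean name "fileSystem" and the HDFS path are assumptions; beans.xml is assumed to define the Spring for Hadoop configuration and file-system beans):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class HdfsViaSpring {
    public static void main(String[] args) throws Exception {
        // load beans.xml from the resources folder; Spring builds the FileSystem bean for us
        ClassPathXmlApplicationContext ctx = new ClassPathXmlApplicationContext("beans.xml");
        FileSystem fs = (FileSystem) ctx.getBean("fileSystem");   // bean name is an assumption
        System.out.println(fs.exists(new Path("/user/hadoop/input")));
        ctx.close();
    }
}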
hadoop
1) Download the corresponding Hadoop release from http://hadoop.apache.org/common/releases.html#download (I downloaded version 1.0.3)
2) Decompress the file
Command: tar -xzf hadoop-1.0.3.tar.gz
3) Test whether Hadoop is installed successfully (go to the Hadoop installation directory and check, e.g. by running bin/hadoop version)
Hadoop example code:
1. Creating a Configuration object: to be able to read from or write to HDFS, you need to create a Configuration object and pass configuration parameters to it using the Hadoop configuration files.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class Main {
    public static void main(String[] args) {
        Configuration conf = new Configuration();                           // create the configuration object
        conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml")); // load settings from a Hadoop configuration file (placeholder path)
    }
}
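Building on that Configuration, a minimal sketch of the typical next step, reading a file back from HDFS through the FileSystem API (the class name and path are placeholders):

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        InputStream in = fs.open(new Path("/user/hadoop/input/file1"));  // placeholder HDFS path
        IOUtils.copyBytes(in, System.out, 4096, true);                   // print the file contents and close the stream
    }
}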
Document directory
1. Map stage
3. Let's take a general look at the running code of the job
This series of Hadoop learning notes is based on Hadoop: The Definitive Guide, 3rd edition, supplemented with additional information collected from the Internet and the Hadoop APIs, plus my own practice. It is mainly used to learn the features and functions of Hadoop.
Data deduplication:
Data deduplication means each record appears only once in the output, so the record is used as the key in the reduce stage and there is no requirement on the values; that is, the input key is emitted directly as the output key and the value is left empty. The procedure is similar to WordCount:
Tip: Input/Output path configuration.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.h...
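The listing above is cut off after the imports; a minimal sketch of the deduplication job just described (the class names and the use of NullWritable for the empty value are my assumptions; the input/output paths come from the command line, per the tip above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Dedup {
    // map: emit each input line as the key, with an empty value
    public static class DedupMapper extends Mapper<Object, Text, Text, NullWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    // reduce: identical keys arrive grouped, so writing each key once removes duplicates
    public static class DedupReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "dedup");                          // constructor form works on Hadoop 1.x
        job.setJarByClass(Dedup.class);
        job.setMapperClass(DedupMapper.class);
        job.setReducerClass(DedupReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));      // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));    // output path from the command line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}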
Immediately after installing and running Hadoop, it is time to run a relevant example, and the simplest and most straightforward one is the WordCount example, the "Hello World" of Hadoop. Follow this blog to run it: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/ First create a folder.
# Pi estimation example
hadoop jar /app/cdh23502/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.3.0-cdh5.0.2.jar pi - $
# Generate data: the first parameter is the number of rows, the second parameter is the output location
hadoop jar /app/cdh23502/share/hadoop/mapreduce2/
Reduce input groups=131
Reduce shuffle bytes=1836
Reduce input records=131
Reduce output records=131
The warning "WARN - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" is harmless; it is not reported when running on Linux.
8. Finally, try splitting WordCount into separate classes and moving the Mapper out into its own class, because nesting multiple classes sometimes triggers an error. Delete the output directory and rerun: OK, no errors.
I used to write MapReduce programs in Java; here is an example of implementing MapReduce in Python via Hadoop Streaming. Task description: there are two directories on HDFS, /a and /b. The data has 3 columns: the first column is the ID, the second column is the business type (assume /a corresponds to type a and /b to type b), and the third column is a JSON string.
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
The purpose of this is to be able to SSH from Hadoopnamenode to the other three servers without a password. After ssh-copy-id, the public key has in fact been appended to the ~/.ssh/authorized_keys file on each of the other three servers. For example, to log in to Hadoop2ndnamenode from Hadoopnamenode, the process is roughly: Hadoop2ndnamenode sends a random string to Hadoopnamenode, Hadoopnamenode encrypts it with its private key and sends it back, and Hadoop2ndnamenode verifies the result against the stored public key; if it matches, the password-less login is allowed.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Description: WordCount explained by York
 * @author Hadoop Dev Group
 */
public class WordCount {
    /**
     * The Mapper class TokenizerMapper extends the generic Mapper class.
     * Mapper class: implements the map function.
     */
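The listing is truncated at the Mapper; a minimal sketch of the tokenizing mapper it describes, assuming the standard WordCount behavior of emitting each token with a count of one (it additionally needs the java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.IntWritable, and org.apache.hadoop.io.Text imports):

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // split the input line into tokens and emit (token, 1) for each
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }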
Step 4: Configure Hadoop in pseudo-distributed mode and run the WordCount example
The pseudo-distributed mode mainly involves the following configuration:
Modify the Hadoop core configuration file core-site.xml, mainly to configure the HDFS address and port number;
Modify the HDFS configuration file hdfs-site.xml, mainly to set the replication factor (dfs.replication is typically set to 1 in pseudo-distributed mode).
Original by Inkfish; do not reproduce for commercial purposes, and please indicate the source when reposting (http://blog.csdn.net/inkfish).
Pig is a project that Yahoo! donated to Apache and is currently in the Apache Incubator phase; the current version is v0.5.0. Pig is a large-scale data analysis platform based on Hadoop. It provides a SQL-like language called Pig Latin, and its compiler translates these SQL-like data analysis requests into a series of optimized MapReduce operations.