Discover how to write a MapReduce program in Hadoop, including articles, news, trends, analysis, and practical advice about writing MapReduce programs in Hadoop on alibabacloud.com.
Hadoop Reading Notes series: http://blog.csdn.net/caicongyang/article/category/2166855 (the series will be completed gradually, with comments on the expected data file format to be added). 1. Problem: from a given file, find the 100 largest values, given a data file in the following format: 5331656517800292911374982668522067918224212228227533691229525338221001067312284316342740518015 ... 2. The code below uses the TreeMap class, so
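A minimal sketch of that TreeMap approach (class and field names here are illustrative, not the article's code): each mapper keeps a TreeMap of at most 100 entries, evicting the smallest key whenever the map grows past 100, and emits its local top 100 in cleanup(); a single reducer can then merge the per-mapper results the same way.

import java.io.IOException;
import java.util.TreeMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Top100Mapper extends Mapper<LongWritable, Text, LongWritable, NullWritable> {
    // TreeMap keeps keys sorted, so firstKey() is always the current minimum.
    private final TreeMap<Long, Long> top = new TreeMap<Long, Long>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        String s = value.toString().trim();
        if (s.isEmpty()) {
            return; // skip blank lines
        }
        long v = Long.parseLong(s);
        top.put(v, v);
        if (top.size() > 100) {
            top.remove(top.firstKey()); // evict the smallest of the 101 values
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Long v : top.keySet()) {
            context.write(new LongWritable(v), NullWritable.get());
        }
    }
}

Note that duplicate values collapse to a single TreeMap entry; the original post may handle ties differently.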
1. To write MapReduce code in Eclipse, you need to install the Hadoop plug-in for Eclipse: copy contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar from the Hadoop installation directory into the plugins directory of your Eclipse installation.
2. To let MapReduce read from and write to relational databases (MySQL, Oracle) directly, Hadoop provides two classes: DBInputFormat and DBOutputFormat. DBInputFormat reads database table rows as map input (e.g., into HDFS), and DBOutputFormat writes the result set produced by MapReduce back into a database table. Error when executing
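A hedged sketch of the DBInputFormat wiring just described; the JDBC driver, connection URL, credentials, and the my_table/id/name table and column names are all placeholders, and the driver jar must be on the task classpath:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbInputExample {
    // One table row; an input record class must implement Writable and DBWritable.
    public static class MyRecord implements Writable, DBWritable {
        long id;
        String name;
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong(1);
            name = rs.getString(2);
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, name);
        }
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            name = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
        Job job = Job.getInstance(conf, "db-input-example");
        job.setJarByClass(DbInputExample.class);
        job.setInputFormatClass(DBInputFormat.class);
        // Read columns id and name from my_table, ordered by id.
        DBInputFormat.setInput(job, MyRecord.class, "my_table",
                null, "id", "id", "name");
        // ... configure mapper, reducer, and output, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}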
job.setPartitionerClass(FirstPartitioner.class); // partition function
// job.setSortComparatorClass(KeyComparator.class); // this example defines no custom sort comparator; IntPair's own sort is used instead
job.setGroupingComparatorClass(GroupingComparator.class); // grouping function
job.setMapOutputKeyClass(IntPair.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(Text
When developing Hadoop MR programs, you often need to gather statistics on map/reduce runtime state. This can be done with a custom counter, which is updated by checks in the code at runtime rather than read from configuration. 1. Create your own counter enum: enum ProcessCounter { BAD_RECORDS, BAD_GROUPS } 2. Wherever statistics are needed, such as in the map or reduce phase
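A short sketch of step 2 (the tab-separated format check is made up for illustration): increment the counter from the task context whenever a bad record is seen; the totals appear alongside the job's built-in counters when the job finishes.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    // Mirrors the enum defined above.
    enum ProcessCounter { BAD_RECORDS, BAD_GROUPS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.split("\t").length < 2) {
            // Malformed line: count it instead of failing the task.
            context.getCounter(ProcessCounter.BAD_RECORDS).increment(1);
            return;
        }
        context.write(new Text(line), NullWritable.get());
    }
}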
Reference: http://blog.csdn.net/zklth/article/details/11829563. Hadoop output turns garbled when processing GBK text because the encoding in Hadoop's Text type is hard-coded to UTF-8; files in any other encoding (such as GBK) come out as mojibake. When you simply read the text in a Mapper or Reducer, transcode it with TransformTextToUTF8(text, "GBK") to ensure that it is
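The referenced post names a TransformTextToUTF8 helper; a plausible minimal implementation (an assumption, not the post's exact code) decodes the Text object's raw bytes with the real source encoding and lets Text re-encode the resulting String as UTF-8:

import java.io.UnsupportedEncodingException;
import org.apache.hadoop.io.Text;

public class GbkUtil {
    public static Text transformTextToUTF8(Text text, String encoding)
            throws UnsupportedEncodingException {
        // getBytes() returns the backing buffer; only the first
        // getLength() bytes are valid, so decode exactly that range.
        String value = new String(text.getBytes(), 0, text.getLength(), encoding);
        return new Text(value); // Text stores the String as UTF-8
    }
}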
if the remaining file size does not exceed 1.1 times the split size, it becomes a single split, which avoids launching two map tasks where one would process too little data and waste resources. In summary, the split process goes roughly like this: first traverse the target files, filter out the non-conforming ones, and add the rest to a list; then cut each file into splits of the size computed by the formula above (the tail of a file may be merged into the last split). In practice these operations are often customized, and the defaults are not used.
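For reference, a sketch of the stock split-size logic being described (the constants and the 140 MB example file are illustrative; in Hadoop's FileInputFormat the slack factor SPLIT_SLOP is 1.1):

public class SplitSizeSketch {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;      // HDFS block size
        long minSize = 1L;                        // job-configurable lower bound
        long maxSize = Long.MAX_VALUE;            // job-configurable upper bound
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));

        double SPLIT_SLOP = 1.1;                  // the 1.1x slack described above
        long bytesRemaining = 140L * 1024 * 1024; // a 140 MB input file
        int splits = 0;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;                             // emit a full split of splitSize bytes
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++;                             // tail split, up to 1.1x splitSize
        }
        // 140 MB / 128 MB = 1.09 <= 1.1, so the whole file becomes ONE split
        // rather than a 128 MB split plus a tiny 12 MB one.
        System.out.println(splits + " split(s)");
    }
}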
Define KeyPair
A custom key type is carried from the map output into reduce, so it needs to implement Hadoop's WritableComparable interface, with KeyPair as the interface's type parameter, the same pattern as LongWritable (see LongWritable's definition).
To implement the WritableComparable interface, you must override the write/readFields
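A minimal KeyPair under those constraints (two int fields ordered by first then second; the field layout is an assumption, not necessarily the article's): it overrides write/readFields for serialization and compareTo for sorting, plus hashCode/equals, which the default HashPartitioner relies on.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class KeyPair implements WritableComparable<KeyPair> {
    private int first;
    private int second;

    public void set(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException { // serialize
        out.writeInt(first);
        out.writeInt(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException { // deserialize
        first = in.readInt();
        second = in.readInt();
    }

    @Override
    public int compareTo(KeyPair o) { // sort by first, then second
        int cmp = Integer.compare(first, o.first);
        return cmp != 0 ? cmp : Integer.compare(second, o.second);
    }

    @Override
    public int hashCode() {
        return first * 157 + second; // used by the default HashPartitioner
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KeyPair)) return false;
        KeyPair k = (KeyPair) o;
        return first == k.first && second == k.second;
    }
}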
Socket soc = serverSocket.accept();
// Construct a data input stream to receive data
DataInputStream in = new DataInputStream(soc.getInputStream());
// Construct a data output stream to send data
DataOutputStream out = new DataOutputStream(soc.getOutputStream());
// Disconnect
soc.close();
Client Process
// Create a client Socket
Socket soc = new Socket(serverHost, port);
// Construct a data input stream to receive data
DataInputStream in = new DataInputStream(soc.getInputStream());
Writing MapReduce functions in Python, with WordCount as an example. Although the Hadoop framework is written in Java, Hadoop programs are not limited to Java; they can also be written in Python, C++, Ruby, and so on. In this example, wr
Step one: if you have not yet set up the HBase development environment, see my other blog post, HBase Development Environment Building (Eclipse\MyEclipse + Maven). Step one, you need to add the Maven dependencies, as follows: right-click the project name, then edit pom.xml (not repeated here; see HBase Development Environment Building (Eclipse\MyEclipse + Maven)). When that is done, write the code. Step two: some steps after the HB
"the Input Folder you want to pass to the program and the folder you want the program to save the computing result" in program arguments, for example, Java code
HDFS: // localhost: 9000/user/panhuizhi/input01 HDFS: // localhost: 9000/user/panhuizhi/output01
Here input01 is the folder you just uploaded. You can enter the folder address as needed.
4. Click Run
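For context, a minimal driver sketch (class and job names are illustrative) showing how those two Program arguments are consumed as args[0] and args[1]:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Driver <input path> <output path>");
            System.exit(2);
        }
        Job job = Job.getInstance(new Configuration(), "example");
        job.setJarByClass(Driver.class);
        // args[0] / args[1] are the two Program arguments set above, e.g.
        // hdfs://localhost:9000/user/panhuizhi/input01 and .../output01
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // ... set mapper and reducer classes here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}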
, using the Tool class to run the job: public static void main(String[] args) throws Exception { if (args == null || args.length ...
AccessLogWritable.java:
package com.uplooking.bigdata.mr.secondsort;
import org.apache.hadoop.io.WritableComparable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
/** Custom Hadoop data type. To be used as a key, it needs to implement the WritableComparable interface so that map can compare the
the host name that I customized in C:\Windows\System32\drivers\etc\hosts: 218.195.250.80 master
If "DFS Locations" appears in Eclipse as shown below, Eclipse has successfully connected to the remote Hadoop cluster (note: do not forget to switch to the Map/Reduce perspective rather than the default Java perspective):
3. Now let's test the MaxTemperature example program in the
I. Installation and setup of Eclipse
1. Download eclipse-jee-oxygen-3a-linux-gtk-x86_64.tar.gz from the Eclipse official website and copy it to /home/jun/resources, then copy the file to /home/jun and unzip it:
$ cp /home/jun/resources/eclipse-jee-oxygen-3a-linux-gtk-x86_64.tar.gz /home/jun/
$ tar -zxvf /home/jun/eclipse-jee-oxygen-3a-linux-gtk-x86_64.tar.gz
2. Execute the eclipse program to start Eclipse:
$ cd eclipse/
$ ls
artifacts.xml
, for example D:\eclipse-standard-kepler-sr2-win32\eclipse\plugins
2. Configure the local Hadoop environment: download the Hadoop distribution (from Apache: http://hadoop.apache.org/) and unzip it to
3. Open Eclipse and create a new project to check whether a Map/Reduce Project option is now available. The first time you create a Map/Reduce project, you need to specify the location after the
-gtk.tar.gz
Copy it to the home directory and unzip it:
$ cp eclipse-sdk-3.7.2-linux-gtk.tar.gz /home/liuqingjie/
$ tar -zxvf eclipse-sdk-3.7.2-linux-gtk.tar.gz
Start Eclipse (provided you are in the graphical interface):
$ cd eclipse
$ ./eclipse
Step two: configure the MapReduce program development environment
1. Copy the
To help you compile MapReduce programs, I have written a script in bash and awk that compiles and runs MapReduce programs directly. Usage:
1. cd hadoop/ to change into the Hadoop directory
2. If you are using the script for the first time, you need to create a new playground directory and a