Running Java programs natively on Hadoop

Source: Internet
Author: User
Tags: hadoop, fs

Method 1: running a jar package natively
1. Write the map-reduce program in Eclipse, typically adding hadoop-core-1.1.2.jar to the build path. Note that no Hadoop plugin for Eclipse is installed here; only the jar itself is referenced. Eclipse can run on Windows or Linux; if it runs on Windows while Hadoop runs on a Linux virtual machine, files can be exchanged through a shared folder.
2. Prepare the test data, for example a file named TempData.
3. Use Eclipse's Export to package the finished program as a jar. When packaging, select only src (generally only the program files are packaged) and specify the main class. Place the jar at /home/hadoop/docum/Test.jar.
4. Upload the data to be analyzed to HDFS:
hadoop fs -put /home/hadoop/myhadoopdata/tempdata ./testdata/
5. Execute the jar:
hadoop jar /home/hadoop/docum/Test.jar /user/hadoop/temperatur output
This is how a job is run from a jar. There is no need to upload the jar (Test.jar) to HDFS.
The parameters are, in order: the local mapred program (the jar), the HDFS test data as the input file, and the output folder.
hadoop jar /home/hadoop/temperature.jar InputPath OutputPath
Note: the class name may not need to be specified here, and the output folder OutputPath must not already exist.

Method 2: running WordCount in pseudo-distributed mode
1. Copy the source code:
cp /usr/local/hadoop1.1.2/src/examples/org/apache/hadoop/examples/WordCount.java ~/ygch/hadoop/
2. Compile the source code, placing the generated class files in the classes folder:
javac -classpath /usr/local/hadoop1.1.2/hadoop-core-1.1.2.jar:/usr/local/hadoop1.1.2/lib/commons-cli-1.2.jar WordCount.java -d classes
The -classpath option specifies the jar packages WordCount depends on. Jar packages sit in two locations under the Hadoop directory: the root directory and the lib/ directory. You can inspect a jar's contents with jar tvf *.jar to find out which jars WordCount needs.
The -d option specifies where the generated class files go. It must be given explicitly at compile time; the class files should not be generated in the current directory.
3. Package the class files into a jar:
jar cvf wordcount.jar -C classes/ .
Do not forget the dot at the end. It must be preceded by a space, or the jar command reports an error. The dot is the file list for the jar command: it means all files under the specified directory.
4. Generate the input files. Since input paths given to a Hadoop job must be in the HDFS file system, copy the files to be word-counted from the local file system into HDFS:
hadoop fs -mkdir input
hadoop fs -put testfile input
5. Run the jar package:
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input output
Because WordCount.java declares a package, the fully qualified class name, including the package, must be given in the run command.
6. View the results. When the Hadoop job finishes, the results are placed in the output directory, which is generated automatically. View them with:
hadoop fs -cat output/part-r-00000
Either -cat or -text can be used.
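The mapper and reducer inside WordCount.java boil down to a tokenize-and-sum pass. A plain-Java sketch of that core logic, with no Hadoop classes (class and method names here are illustrative, not part of the Hadoop example):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountCore {

    // Map step: tokenize each line into words, each word conceptually emitting a count of 1.
    // Reduce step: sum the counts per word. Here both steps collapse into one
    // in-memory pass; the MapReduce job does the same thing in a distributed way.
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(Arrays.asList("hello hadoop", "hello world"));
        System.out.println(c.get("hello")); // 2
    }
}
```

The part-r-00000 file produced by the real job is just this map written out as word/count lines, sorted by key.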

Method 3: executing class files directly with Hadoop
You can compile the class in Eclipse beforehand and then have Hadoop execute the class file directly. When running a job on a Hadoop cluster, the program must be packaged as a jar file; in local and pseudo-distributed modes, however, Hadoop can run either a jar file or a class file directly. Note that a class file run directly must have no mapper or reducer: it obtains a FileSystem instance and operates on HDFS with it directly.
If the class has a package name, invoke it with the package path, for example:
hadoop whut/structfile/SequenceFileWriteDemo numbers.seq

A class file can be executed with either the hadoop command or the java command. The hadoop command adds the Hadoop library paths to the classpath and also picks up Hadoop's configuration files. You generally need to set the HADOOP_CLASSPATH environment variable in advance to add the path of the application's classes; note that this is a local file path, not a path in HDFS.

The specifics are as follows:
export HADOOP_CLASSPATH=/home/hadoop/hadoopclass
hadoop FileSystemCat hdfs://localhost:9000/user/hadoop/inputdata/iodata.txt
You can also add the Hadoop jars to the classpath directly in the IDE and run the class from there.

Method 4: how Hadoop and the Eclipse environment fit together
The essence of running hadoop jar is:
1. The hadoop script starts a JVM process;
2. The JVM runs the Java class org.apache.hadoop.util.RunJar;
3. RunJar unpacks the jar (e.g. Temperature.jar) into the hadoop.tmp.dir/hadoop-unjar*/ directory;
4. RunJar dynamically loads and runs the class named as the jar's main class or given as a parameter;
5. That class sets the job's properties;
6. and submits the job to the JobTracker, monitoring its progress.
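Step 4 above can be illustrated with plain JDK reflection. A simplified sketch (HelloJob is a made-up stand-in for the job's main class; the real RunJar additionally unpacks the jar and builds a classloader over the unpacked directory, whereas here the class is already on the current classpath):

```java
import java.lang.reflect.Method;

public class RunJarSketch {

    // Stand-in "job" class; in a real run this would be the main class
    // named in the jar's manifest or on the command line.
    public static class HelloJob {
        public static String lastRun = "";
        public static void main(String[] args) {
            lastRun = "args:" + args.length;
        }
    }

    // Look the class up by name and invoke its main(String[]) reflectively,
    // which is the core of what RunJar does after unpacking the jar.
    public static void invokeMain(String className, String[] args) throws Exception {
        Class<?> cls = Class.forName(className);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        invokeMain("RunJarSketch$HelloJob", new String[] {"input", "output"});
        System.out.println(HelloJob.lastRun); // args:2
    }
}
```

Because the arguments after the class name are passed straight through to that main method, this is also why the input and output paths follow the class name on the hadoop jar command line.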

When executing hadoop jar, the jar does not need to be placed in HDFS; Hadoop looks for the jar locally, and if you are already in the jar's directory, the jar name alone is enough.

For both the input and output folders, there is no need to write out Hadoop's built-in path prefix: relative paths in Hadoop commands are resolved under the /user/hadoop/ directory, so no rooted path is required.


