Running MapReduce with hadoop jar **.jar versus java -classpath **.jar

Source: Internet
Author: User

The command to run a MapReduce jar is hadoop jar **.jar.

The command to run a jar with an ordinary main function is java -classpath **.jar.

I did not know the difference between the two commands, so I stubbornly kept using java -classpath **.jar to start MapReduce, until errors showed up today.

java -classpath **.jar runs the jar locally, so the MapReduce job runs only on this one node. That is why it was so slow.
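As a rough sketch, the two launch styles look like this. The jar name `myjob.jar`, the class `com.example.WordCount`, and the paths `/input` and `/output` are placeholders, not names from the original setup. The commands are wrapped in functions so that nothing runs until one of them is called.

```shell
#!/bin/sh
# Placeholder names (assumptions for illustration only).
JOB_JAR="myjob.jar"
MAIN_CLASS="com.example.WordCount"

# Launch through the Hadoop launcher script, which puts the Hadoop
# libraries and the cluster configuration on the classpath first.
run_with_hadoop() {
  hadoop jar "$JOB_JAR" "$MAIN_CLASS" /input /output
}

# Launch as an ordinary Java program. The classpath must name every jar
# the program needs; "./*" picks up all jars in the current directory,
# including the job jar itself.
run_with_java() {
  java -classpath "./*" "$MAIN_CLASS" /input /output
}
```

Calling `run_with_hadoop` corresponds to the hadoop jar **.jar form, and `run_with_java` to the java -classpath **.jar form.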

At the time, trying to work out why it was so slow, I first suspected that there were too many mappers. I added a lot of log statements to the code and then started MapReduce with java -classpath **.jar to watch the log output.

After I switched to hadoop jar **.jar to start MapReduce, I found that the log output no longer appeared immediately: the mappers are sent to many machines to run, so the results do not come back right away.

Because the job had been started locally, all of the intermediate files were generated on this one node (which has only 50 GB of space). The operations staff later noticed this and removed the intermediate files.

One thing to watch when building the jar: with Maven's Run As, the dependency jars all end up in a separate lib directory, and the project jar contains only the project's own classes. So you need to open the project jar with an archive tool, create a new lib directory inside it, and put the dependency jars the job needs (the ones Hadoop does not already provide) in that directory. After that, you can simply copy the one jar to the server and start it.
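The manual repacking step can also be done from the command line instead of an archive tool. This is a minimal sketch, assuming the JDK `jar` tool is on the PATH and the dependency jars sit in a local directory; the function name `add_lib_to_jar` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# add_lib_to_jar JOB_JAR LIB_DIR
# Adds every *.jar found in LIB_DIR to JOB_JAR under a top-level lib/ directory.
add_lib_to_jar() {
  job_jar=$1
  lib_dir=$2
  staging=$(mktemp -d) || return 1
  mkdir "$staging/lib" || return 1
  cp "$lib_dir"/*.jar "$staging/lib/" || return 1
  # "jar uf" updates an existing archive; -C changes directory first, so
  # the entries are stored as lib/<name>.jar without the staging path.
  jar uf "$job_jar" -C "$staging" lib
  status=$?
  rm -rf "$staging"
  return $status
}
```

For example, `add_lib_to_jar myjob.jar ./lib` followed by `jar tf myjob.jar` should list the added `lib/` entries.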

Because java -classpath **.jar runs locally, the jars the project depends on only need to be placed in the same directory.

hadoop jar **.jar, however, runs on the cluster, so the jars the project depends on must be placed inside the project jar itself.

The dependency jars can be packed into the project jar to form one combined jar. For Maven, this can be set up in pom.xml with the following configuration:

	<build>
		<sourceDirectory>src</sourceDirectory>
		<resources>
			<resource>
				<directory>conf</directory>
				<excludes>
					<exclude>**/*.java</exclude>
				</excludes>
			</resource>
		</resources>
		<pluginManagement>
			<plugins>
				<!-- Ignore/Execute plugin execution -->
				<plugin>
					<groupId>org.eclipse.m2e</groupId>
					<artifactId>lifecycle-mapping</artifactId>
					<version>1.0.0</version>
					<configuration>
						<lifecycleMappingMetadata>
							<pluginExecutions>
								<!-- copy-dependency plugin -->
								<pluginExecution>
									<pluginExecutionFilter>
										<groupId>org.apache.maven.plugins</groupId>
										<artifactId>maven-dependency-plugin</artifactId>
										<versionRange>[1.0.0,)</versionRange>
										<goals>
											<goal>copy-dependencies</goal>
										</goals>
									</pluginExecutionFilter>
									<action>
										<ignore/>
									</action>
								</pluginExecution>
							</pluginExecutions>
						</lifecycleMappingMetadata>
					</configuration>
				</plugin>
			</plugins>
		</pluginManagement>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-dependency-plugin</artifactId>
				<executions>
					<execution>
						<id>copy-dependencies</id>
						<phase>test</phase>
						<goals>
							<goal>copy-dependencies</goal>
						</goals>
						<configuration>
							<excludeArtifactIds>hadoop-core</excludeArtifactIds>
							<excludeGroupIds>org.slf4j</excludeGroupIds>
							<!-- Here is the key setting -->
							<outputDirectory>target/classes/lib</outputDirectory>
						</configuration>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>2.3.2</version>
				<configuration>
					<source>1.6</source>
					<target>1.6</target>
					<encoding>UTF-8</encoding>
				</configuration>
			</plugin>
		</plugins>
	</build>
However, I have to run the build with Run As twice before it works, and I do not know why.

If you are simply running MapReduce without using any other jars, you do not need to pack anything extra into the project jar. In particular, the lib directory inside the project jar must not contain the Hadoop jars, because the runtime environment already provides them. Only other jars belong there, such as libraries you have written yourself.
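One way to check the packaged result before uploading it is to list the `lib/` entries in the jar and flag anything that looks like a Hadoop jar. This is only a sketch: the function name `check_job_jar` is made up, it assumes the JDK `jar` tool is on the PATH, and matching on the file-name prefix `hadoop-` is an assumption about how the Hadoop jars are named.

```shell
#!/bin/sh
# check_job_jar JOB_JAR
# Prints the jars bundled under lib/ and returns non-zero if any of them
# looks like a Hadoop jar (file name starting with "hadoop-").
check_job_jar() {
  # "|| true" keeps an empty result from aborting scripts run with "set -e".
  bundled=$(jar tf "$1" | grep '^lib/.*\.jar$' || true)
  printf '%s\n' "$bundled"
  if printf '%s\n' "$bundled" | grep -q '^lib/hadoop-'; then
    echo "warning: Hadoop jar found under lib/" >&2
    return 1
  fi
  return 0
}
```

A call such as `check_job_jar myjob.jar` (with `myjob.jar` standing in for the real file) prints the bundled jars and exits with a non-zero status when a Hadoop jar is among them.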
