Building a Hadoop Development Environment with IntelliJ IDEA

Preface

This is a series of articles about Hadoop.

Basic Hadoop Concepts Guide

A Few Notes on Building a Hadoop Development Environment with Eclipse

Building a Hadoop Development Environment with IntelliJ IDEA

The Hadoop File Storage System (HDFS) in Detail and Its Java Implementation

Preparation

The Hadoop development environment I built earlier is actually good enough. But jobs were always submitted to, and run on, the local machine, which always felt a bit odd. It also relied on environment variables such as HADOOP_HOME, on a pile of dependent jar packages, and on so-called plug-ins. So I wondered whether I could let Maven manage the jars we need and then, with a few extra settings, have our jobs submitted to the remote cluster.

Let me start with what this project depends on: just Maven. Yes, just Maven. All right, no more rambling; let's get to work.

Dependent jars

In fact, we only need to depend on Hadoop's core jar packages. Shown below are all the dependencies in my pom:

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-examples -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-examples</artifactId>
            <version>2.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.16.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.4.8</version>
        </dependency>
    </dependencies>

The reason for depending on the examples artifact is to be able to read the basic examples Hadoop provides for us. Hadoop actually ships with a great many examples: just search for the most famous one, WordCount, and you will find plenty of rich examples in the same package. If you can master these examples thoroughly, you will have a very solid grasp of the basics. After all, the official material is the best.
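For reference, here is a minimal sketch of the classic WordCount. The official version in hadoop-mapreduce-examples uses the newer org.apache.hadoop.mapreduce API; I have written this one against the older org.apache.hadoop.mapred API so it fits the JobConf-based snippets that appear later in this article. The class names are mine, chosen for illustration.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCount {

        // Mapper: split each line into words and emit (word, 1).
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        // Reducer: sum the counts emitted for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }
    }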
OK, let's look at the code. Submitting from Windows to a Hadoop cluster on Linux used to be problematic; if you search the Internet, many posts will tell you to modify the source code and so on. That really was the case in the early 1.x versions, because Hadoop had a bug at the time when submitting from Windows to Linux. In the 2.x versions it has been fixed, but additional settings are still required. So let's go through the code and see how it differs from before. Broadly, there are two areas:
The first aspect is that we need to load our cluster's configuration files here. This is very important: if you debug with a breakpoint, you will find that Hadoop has already loaded default configuration files for us, as shown in the figure below:

As you can see, the jar package does in fact contain the corresponding four configuration files. So if you do nothing, the defaults are used, and the result is that the job is submitted locally. Of course, we can set them ourselves, as follows:

    private static void setProperties(JobConf conf) throws FileNotFoundException {
        conf.addResource("/core-site.xml");
        conf.addResource("/hdfs-site.xml");
        conf.addResource("/mapred-site.xml");
        conf.addResource("/yarn-site.xml");
    }
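One thing the snippet above assumes (this is my reading of the setup, not something spelled out in the original): addResource with a string name looks the file up as a classpath resource, so the four XML files copied from the cluster need to live somewhere Maven places on the classpath, typically src/main/resources. A possible layout:

    src/
      main/
        java/
          ...                  (driver, mapper, reducer classes)
        resources/
          core-site.xml        (copied from the cluster)
          hdfs-site.xml
          mapred-site.xml
          yarn-site.xml
          log4j.properties     (optional, to control client-side logging)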

With the above settings, can we now submit to the remote Linux cluster? Not quite. Next comes the second step: setting a few other parameters. As I mentioned before, cross-platform submission was a bug in the 1.x versions and has since been fixed, but it still has to be configured before submission works normally. The configuration is as follows:

        Conf.set ("Mapreduce.app-submission.cross-platform", "true");
        Conf.set ("Mapreduce.job.ubertask.enable", "true");
        Conf.setuser ("Hadoop");
        Conf.set ("Mapreduce.job.jar", "E:\\github\\hadoop\\target\\fulei-1.0-snapshot.jar");

We can clearly see that the first parameter, cross-platform, means exactly what it says: cross-platform submission. Setting the jar afterwards is just as important: since the task is submitted to the remote side, you have to give Hadoop the location of your local jar. With that, we can submit the task to the remote side. Of course, "remote" here refers to our virtual machines; if you have a cloud server, even better.
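Putting the two steps together, a driver might look roughly like the sketch below. This is only a sketch under my own assumptions: the class name, the input and output paths, and the use of the WordCount classes sketched earlier are made up for illustration; only the configuration calls come from the snippets above.

    import java.io.FileNotFoundException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class RemoteWordCountDriver {

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(RemoteWordCountDriver.class);
            conf.setJobName("wordcount-remote");

            // Step 1: load the cluster configuration files from the classpath.
            setProperties(conf);

            // Step 2: extra settings for a remote, cross-platform submission.
            conf.set("mapreduce.app-submission.cross-platform", "true");
            conf.set("mapreduce.job.ubertask.enable", "true");
            conf.setUser("hadoop");
            conf.set("mapreduce.job.jar", "E:\\github\\hadoop\\target\\fulei-1.0-SNAPSHOT.jar");

            // Wire up the job itself, using the WordCount classes from the earlier sketch.
            conf.setMapperClass(WordCount.Map.class);
            conf.setReducerClass(WordCount.Reduce.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // Hypothetical HDFS paths; replace with your own.
            FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/input"));
            FileOutputFormat.setOutputPath(conf, new Path("/user/hadoop/output"));

            JobClient.runJob(conf);
        }

        private static void setProperties(JobConf conf) throws FileNotFoundException {
            conf.addResource("/core-site.xml");
            conf.addResource("/hdfs-site.xml");
            conf.addResource("/mapred-site.xml");
            conf.addResource("/yarn-site.xml");
        }
    }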
We also need to pay attention to the following configuration in mapred-site.xml, which I pulled from the default configuration file.

    <property>
      <description>If enabled, user can submit an application cross-platform
      i.e. submit an application from a Windows client to a Linux/Unix server
      or vice versa.
      </description>
      <name>mapreduce.app-submission.cross-platform</name>
      <value>false</value>
    </property>
    <property>
      <description>CLASSPATH for MR applications. A comma-separated list of
      CLASSPATH entries. If mapreduce.application.framework is set then this
      must specify the appropriate classpath for that archive, and the name of
      the archive must be present in the classpath.
      If mapreduce.app-submission.cross-platform is false, platform-specific
      environment variable expansion syntax would be used to construct the
      default CLASSPATH entries.
      For Linux:
      $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
      $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
      For Windows:
      %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,
      %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*.
      If mapreduce.app-submission.cross-platform is true, platform-agnostic
      default CLASSPATH for MR applications would be used:
      {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*,
      {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*
      Parameter expansion marker will be replaced by NodeManager on container
      launch based on the underlying OS accordingly.
      </description>
      <name>mapreduce.application.classpath</name>
      <value></value>
    </property>

This makes things clear: once cross-platform submission is enabled, you must also point mapreduce.job.jar at your jar, otherwise Hadoop will run into a ClassNotFound problem after the task is submitted. At this point, we can say the configuration is finished.
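For completeness: if your cluster overrides mapreduce.application.classpath and you need to supply the platform-agnostic form yourself, the default description quoted above suggests something like the snippet below. This is only a sketch based on that description; the correct value depends on how your cluster is laid out.

    <property>
      <name>mapreduce.application.classpath</name>
      <value>{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*,{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*</value>
    </property>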
Let's take a look at how it works:

The task has run successfully. We can now take this project and run it anywhere, without having to worry about environment issues any more. Haha.

Summary

My knowledge is limited, so there may be places I have not explained well enough; I hope you will point out any mistakes.
At this point, our environment is basically ready to use, and the next step is MR processing. The next article will walk you through the MR processing flow.
All right, everybody, good night. If there is anything you don't understand, feel free to ask in the comment section.
