In the previous article we built a Java + Spark + Hive + Maven project with exception handling. The packaged test instance runs fine in a Linux environment, but running it directly on a Windows system produces Hive-related exceptions. This article walks you through setting up an integrated Hadoop + Spark + Hive development environment on Windows.
I. Development Environment
System: Windows 7
jdk:jdk1.7
Eclipse:mars.2 Release (4.5.2)
hadoop:hadoop-2.6.5
spark:spark-1.6.2-bin-hadoop2.6
hive:hive-2.1.1
II. Pre-preparation
1. System Environment Configuration
Configure the system environment variables for the JDK, Hadoop, and Spark.
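Before going further, it helps to confirm that the variables are actually visible to the JVM. The sketch below is a minimal sanity check; it also shows a code-level alternative to the HADOOP_HOME variable via the hadoop.home.dir system property, which must be set before any Spark or Hadoop class is loaded. The install path E:\hadoop-2.6.5 is an assumption; adjust it to your machine.

public class EnvCheck {
    public static void main(String[] args) {
        // Code-level equivalent of the HADOOP_HOME system variable (assumed path).
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.6.5");
        // Print what the JVM actually sees, to catch misconfiguration early.
        System.out.println("JAVA_HOME   = " + System.getenv("JAVA_HOME"));
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
        System.out.println("SPARK_HOME  = " + System.getenv("SPARK_HOME"));
    }
}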
2. Hadoop Related Files
winutils.exe and hadoop.dll, download address: hadoop2.6.5 winutils and hadoop.dll
Place the two files above in the \hadoop-2.6.5\bin directory;
at the same time, place winutils.exe in the C:\Windows\System32 directory.
3. Create the tmp/hive Directory
Create a tmp/hive directory at the root of the drive that holds your application project. Since my project lives on the E: drive, I created the tmp/hive directory on E:.
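The sketch below creates that scratch directory from code; the E:\tmp\hive path is an assumption taken from my setup. If Spark later complains that the scratch directory is not writable, the commonly documented fix is to grant it full permissions with winutils.exe chmod 777.

import java.io.File;

public class PrepareScratchDir {
    public static void main(String[] args) {
        // Hive's scratch directory on Windows; adjust the drive letter to match your project.
        File scratch = new File("E:\\tmp\\hive");
        if (!scratch.exists() && scratch.mkdirs()) {
            System.out.println("Created " + scratch.getAbsolutePath());
        }
    }
}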
III. Hive Configuration
1. Hive Environment
The Hive used by this setup is deployed on a remote Linux cluster; its metastore service listens at 10.32.19.50:9083.
For details on deploying Hive in a Linux environment, please consult the relevant documentation; it is not covered in this article.
2. Windows hive-site.xml Configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Data warehouse storage location -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- The metastore is remote, not local -->
    <property>
        <name>hive.metastore.local</name>
        <value>false</value>
    </property>
    <!-- Metastore service address -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://10.32.19.50:9083</value>
    </property>
</configuration>
[Figure: hive-site.xml configuration in Windows]
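hive-site.xml should sit on the application classpath (for example under src/main/resources). If placing the file there is inconvenient, the same two settings can also be applied in code. This is a hedged sketch, not the article's original approach; it uses the setConf method that HiveContext inherits from SQLContext, and the URI simply mirrors the XML above.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class MetastoreConfig {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkHiveConfig").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);
        // Same settings as hive-site.xml, applied programmatically.
        hiveContext.setConf("hive.metastore.uris", "thrift://10.32.19.50:9083");
        hiveContext.setConf("hive.metastore.warehouse.dir", "/user/hive/warehouse");
        sc.stop();
    }
}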
IV. Instance Test
Requirement: query Hive data and have the results display normally in Eclipse.
1. Project Structure
[Figure: example project structure]
2. The pom.xml File
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.lm.hive</groupId>
    <artifactId>SparkHive</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>SparkHive</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-client</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb.spark</groupId>
            <artifactId>mongo-spark-connector_2.10</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.derby</groupId>
            <artifactId>derby</artifactId>
            <version>10.10.2.0</version>
        </dependency>
        <!-- hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
            <exclusions>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/main/test</testSourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                    <includeProjectDependencies>true</includeProjectDependencies>
                    <includePluginDependencies>false</includePluginDependencies>
                    <classpathScope>compile</classpathScope>
                    <mainClass></mainClass>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <showWarnings>true</showWarnings>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
3. Test Case Implementation
package com.lm.hive.SparkHive;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

/**
 * Spark SQL: fetch Hive data
 */
public class App
{
    public static void main(String[] args)
    {
        SparkConf sparkConf = new SparkConf().setAppName("SparkHive").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Use HiveContext, not SQLContext: with SQLContext the deployment
        // fails with an exception because the database and table cannot be found.
        HiveContext hiveContext = new HiveContext(sc);
        // SQLContext sqlContext = new SQLContext(sc);

        // Query the first 10 rows of the table
        hiveContext.sql("select * from bi_ods.owms_m_locator limit 10").show();

        sc.stop();
    }
}
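As a small, hedged variation on the query above: instead of printing with show(), the rows can be collected into the driver for further processing in Java. This is only sensible for small result sets such as this limit 10 query.

import java.util.List;
import org.apache.spark.sql.Row;

// ... inside main(), after building hiveContext as above:
List<Row> rows = hiveContext.sql("select * from bi_ods.owms_m_locator limit 10").collectAsList();
for (Row row : rows) {
    // mkString joins the row's columns with the given separator.
    System.out.println(row.mkString(", "));
}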
4. Test Results
[Figure: test results]
Code download address: Eclipse integrated Hadoop+Spark+Hive development example code