In the previous article we implemented a Java + Spark + Hive + Maven example with exception handling; that example was packaged and run in a Linux environment. When the same code is run directly on Windows, however, Hive-related exceptions are thrown. This article shows how to set up an integrated Hadoop + Spark + Hive development environment on Windows.

I. Development Environment
OS: Windows 7
JDK: jdk1.7
Eclipse: Mars.2 Release (4.5.2)
Hadoop: hadoop-2.6.5
Spark: spark-1.6.2-bin-hadoop2.6
Hive: hive-2.1.1

II. Preparation

1. System environment variables

Configure system environment variables for the JDK, Hadoop, and Spark in the Windows System Properties dialog.
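For reference, the variables involved look roughly like the sketch below; the install paths are only assumptions based on the versions listed above and should be adjusted to wherever the archives were actually unpacked:

    JAVA_HOME   = C:\Program Files\Java\jdk1.7
    HADOOP_HOME = E:\hadoop-2.6.5
    SPARK_HOME  = E:\spark-1.6.2-bin-hadoop2.6
    Path        = ...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin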
2. Hadoop-related files

winutils.exe and hadoop.dll; download: winutils and hadoop.dll for hadoop-2.6.5.
Place both files in the ..\hadoop-2.6.5\bin directory;
also copy winutils.exe to C:\Windows\System32.
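If modifying the system environment variables is not an option, the location of winutils.exe can also be supplied in code. A minimal sketch, assuming Hadoop was unpacked to E:\hadoop-2.6.5 (adjust the path to your own layout); hadoop.home.dir is the system property Hadoop's Windows shell utilities check when looking for winutils.exe:

    // Put this at the very start of main(), before any SparkConf or SparkContext is created.
    // E:\hadoop-2.6.5 is an assumed path; point it at the directory that contains bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "E:\\hadoop-2.6.5");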
3. Create a tmp/hive directory

Create a tmp/hive directory on the drive that holds the project. Since my project lives on drive E:, the directory is created as E:\tmp\hive.
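Spark's Hive support uses this directory as a scratch area and needs it to be writable. If the test in section IV later fails with an error along the lines of "The root scratch dir: /tmp/hive on HDFS should be writable", the permissions can usually be relaxed with winutils; a sketch assuming the layout used in this article (Hadoop under E:\hadoop-2.6.5, scratch directory at E:\tmp\hive):

    E:\hadoop-2.6.5\bin\winutils.exe chmod 777 E:\tmp\hive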
III. Hive Configuration

1. Hive environment

The Hive instance used here is deployed on a remote Linux cluster; its metastore service listens at 10.32.19.50:9083.
Deploying Hive on Linux is outside the scope of this article; see the relevant documentation.
2. hive-site.xml configuration on Windows

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Warehouse directory for Hive data -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Use a remote (non-local) metastore -->
    <property>
        <name>hive.metastore.local</name>
        <value>false</value>
    </property>
    <!-- Metastore service address -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://10.32.19.50:9083</value>
    </property>
</configuration>
IV. Example Test

Goal: query Hive data and display the results correctly in Eclipse.

1. Example project structure
(Screenshot: example project structure.)
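The example project is laid out roughly as follows; placing hive-site.xml under src/main/resources is an assumption here, chosen so the file lands on the classpath where HiveContext can pick it up:

    SparkHive/
    ├── pom.xml
    └── src/main/
        ├── java/com/lm/hive/SparkHive/App.java
        ├── resources/hive-site.xml
        └── test/                      (test sources, as configured in the pom)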
2. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.lm.hive</groupId>
    <artifactId>SparkHive</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>SparkHive</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.0</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-client</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.mongodb.spark</groupId>
            <artifactId>mongo-spark-connector_2.10</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.derby</groupId>
            <artifactId>derby</artifactId>
            <version>10.10.2.0</version>
        </dependency>
        <!-- hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.4</version>
            <exclusions>
                <exclusion>
                    <groupId>javax.servlet</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/main/test</testSourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass></mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>exec</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <executable>java</executable>
                    <includeProjectDependencies>true</includeProjectDependencies>
                    <includePluginDependencies>false</includePluginDependencies>
                    <classpathScope>compile</classpathScope>
                    <mainClass></mainClass>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                    <showWarnings>true</showWarnings>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
3. Test case implementation

package com.lm.hive.SparkHive;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

/**
 * Reading Hive data with Spark SQL
 */
public class App {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("SparkHive").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // Use HiveContext rather than SQLContext; with SQLContext the databases and
        // tables registered in the remote metastore cannot be found.
        HiveContext hiveContext = new HiveContext(sc);
//        SQLContext sqlContext = new SQLContext(sc);

        // Query the first 10 rows of the table
        hiveContext.sql("select * from bi_ods.owms_m_locator limit 10").show();

        sc.stop();
    }
}
4. Test results

(Screenshot: test results.)
Code download: Eclipse + Hadoop + Spark + Hive integration example code