Building a local Hadoop-Spark runtime environment on Windows (hadoop-2.6, spark-2.0)


  1. Download Hadoop
    1. http://hadoop.apache.org/releases.html --> http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
    2. Unpack Hadoop, set HADOOP_HOME, and add ${HADOOP_HOME}/bin to PATH.
  2. Download Spark
    1. http://spark.apache.org/downloads.html --> https://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.6.tgz (make sure the build matches your Hadoop version)
    2. Unpack it, set SPARK_HOME, and add ${SPARK_HOME}/bin to PATH.
  3. When you run a Spark program, it will complain that winutils.exe cannot be found.
    1. Download https://github.com/srccodes/hadoop-common-2.2.0-bin.git and copy its binaries into ${HADOOP_HOME}/bin.
  4. When running, just set the master to local mode (a runnable sketch appears after the build.sbt listing below).
  5. Spark example:
 LocalSparkContext.scala 
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest._

trait LocalSparkContext extends BeforeAndAfterAll {
    self: Suite =>
    @transient var sc: SparkContext = _

    override def beforeAll() {
        val conf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("test")
        sc = new SparkContext(conf)
    }

    override def afterAll() {
        if (sc != null) {
            sc.stop()
        }
    }
}
 SparkWCSuit.scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.util.LongAccumulator
import org.scalatest.FunSuite
import tool.LocalSparkContext

class SparkWCSuit extends FunSuite with LocalSparkContext {
    // rdd wordCount
    test("test rdd wc") {
        sc.setLogLevel("ERROR")
        val rdd = sc.makeRDD(Seq("a", "b", "b"))
        val res = rdd.map((_, 1)).reduceByKey(_ + _).collect().sorted
        assert(res === Array(("a", 1), ("b", 2)))
    }
}
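The SQLContext and Row imports above are not exercised by the RDD test. If you also want to sanity-check Spark SQL against the same local setup, a second suite along the following lines could be added. This is a minimal sketch, not part of the original project (the suite name SparkSqlWCSuit is made up):

import org.apache.spark.sql.SQLContext
import org.scalatest.FunSuite
import tool.LocalSparkContext

class SparkSqlWCSuit extends FunSuite with LocalSparkContext {
    // same word count as above, but through a DataFrame
    test("test dataframe wc") {
        sc.setLogLevel("ERROR")
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val df = sc.makeRDD(Seq("a", "b", "b")).toDF("word")
        val res = df.groupBy("word").count().collect()
                .map(r => (r.getString(0), r.getLong(1)))
                .sorted
        assert(res === Array(("a", 1L), ("b", 2L)))
    }
}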
 build.sbt
name := "doc_rank"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.2"
libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "2.0.2"
libraryDependencies += "commons-cli" % "commons-cli" % "1.2"

// scalatest is required to compile the FunSuite-based tests above
// (the version here is an assumption; pick one built for Scala 2.10)
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.6" % "test"

libraryDependencies ++= Seq(
    "org.scalanlp" %% "breeze" % "0.11.2",
    "org.scalanlp" %% "breeze-natives" % "0.11.2",
    "org.scalanlp" %% "breeze-viz" % "0.11.2"
)

libraryDependencies ++= Seq(
    "org.apache.hadoop" % "hadoop-core" % "2.6.0-mr1-cdh5.4.4",
    "org.apache.hbase" % "hbase-client" % "1.0.0-cdh5.4.4",
    "org.apache.hbase" % "hbase-common" % "1.0.0-cdh5.4.4",
    "org.apache.hbase" % "hbase-server" % "1.0.0-cdh5.4.4",
    "org.apache.hbase" % "hbase-protocol" % "1.0.0-cdh5.4.4"
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
resolvers += "cloudera-repo-releases" at "https://repository.cloudera.com/artifactory/repo/"

resolvers ++= Seq(
    "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
    "Sonatype Releases" at "https://oss.sonatype.org/content/repositories/releases/"
)
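With this build.sbt in place, `sbt test` runs the suites locally. If you would rather not rely on the HADOOP_HOME/PATH settings from steps 1-3 (for example when launching directly from an IDE), the Hadoop home that contains winutils.exe can also be set from code before the SparkContext is created. The following is a minimal sketch; the unpack path C:\hadoop-2.6.5 and the object name LocalSparkApp are assumptions, not taken from the original project:

import org.apache.spark.{SparkConf, SparkContext}

object LocalSparkApp {
    def main(args: Array[String]): Unit = {
        // "hadoop.home.dir" is the system property Hadoop's Shell class checks
        // before the HADOOP_HOME environment variable when it looks for winutils.exe.
        // The path is an assumption -- point it at wherever Hadoop was unpacked.
        System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.5")

        val conf = new SparkConf()
                .setMaster("local[2]")   // step 4: run locally with two worker threads
                .setAppName("local-debug")
        val sc = new SparkContext(conf)
        try {
            val counts = sc.makeRDD(Seq("a", "b", "b"))
                    .map((_, 1))
                    .reduceByKey(_ + _)
                    .collect()
            counts.foreach(println)
        } finally {
            sc.stop()
        }
    }
}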
  
 
  6. Hadoop example:
        
Directory structure:

src/
├── main
│   ├── java
│   │   ├── io
│   │   │   └── longwind
│   │   │       └── mapreduce
│   │   │           ├── main
│   │   │           │   └── Main.java
│   │   │           ├── mapreduce
│   │   │           │   └── InfoidUniquer.java
│   │   │           └── utils
│   │   │               ├── Constant.java
│   │   │               └── HadoopUtils.java
│   │   └── org
│   │       └── apache
│   │           └── hadoop
│   │               ├── io
│   │               │   └── nativeio
│   │               │       └── NativeIO.java
│   │               └── mapred
│   │                   ├── ClientCache.java
│   │                   ├── ClientServiceDelegate.java
│   │                   ├── NotRunningJob.java
│   │                   ├── ResourceMgrDelegate.java
│   │                   ├── YarnClientProtocolProvider.java
│   │                   └── YARNRunner.java
│   └── resources
│       └── log4j.properties
└── test
    ├── java
    │   └── test
    └── resources
        └── log4j.properties

Key dependencies in pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0-cdh5.4.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0-cdh5.4.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.6.0-cdh5.4.4</version>
</dependency>

On the code side: the org.apache.hadoop.* classes shown in the directory tree are copied out of the Hadoop source package; note that they come from version 2.6.0-cdh5.4.4. If the program fails at runtime with an access0 error coming from NativeIO.java, it is a permission problem, and you need to manually modify the access method in NativeIO.java:

public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    return true;                                           // after the change
    // return access0(path, desiredAccess.accessRight());  // before the change
}
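For completeness, here is what a minimal MapReduce job configured to run entirely on the local machine can look like. It is written in Scala to match the Spark examples above and is only a sketch: the class names are made up (they are not the Main.java/InfoidUniquer.java from the tree above), and it assumes the winutils.exe and NativeIO workarounds described earlier are already in place.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Emits (word, 1) for every whitespace-separated token in an input line.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
    private val one  = new IntWritable(1)
    private val word = new Text()
    override def map(key: LongWritable, value: Text,
                     ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
            word.set(w)
            ctx.write(word, one)
        }
    }
}

// Sums the counts emitted for each word.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
    override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                        ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        val it = values.iterator()
        while (it.hasNext) sum += it.next().get()
        ctx.write(key, new IntWritable(sum))
    }
}

object LocalWordCount {
    def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.set("fs.defaultFS", "file:///")           // read/write the local file system
        conf.set("mapreduce.framework.name", "local")  // LocalJobRunner, no cluster needed
        val job = Job.getInstance(conf, "local word count")
        job.setJarByClass(LocalWordCount.getClass)
        job.setMapperClass(classOf[TokenMapper])
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))    // input file or directory
        FileOutputFormat.setOutputPath(job, new Path(args(1)))  // must not exist yet
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
    }
}

Running it with a small text file as the first argument and a not-yet-existing output directory as the second exercises the same local NativeIO code path that the access0 fix above unblocks.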
With this, you can develop and debug Hadoop and Spark locally on Windows with little friction. As an aside, MRUnit was not very helpful here; the problems you run into are generally version mismatches, conflicting packages, and permissions. References:
  1. 平野大荒, http://www.cnblogs.com/tq03/p/5101916.html -- a MapReduce runtime environment on Windows
  2. 在前進的路上, http://blog.csdn.net/congcong68/article/details/42043093 -- solving the access0 problem
  3. xuweimdm, http://blog.csdn.net/u011513853/article/details/52865076 -- Spark on Windows
 
