Building a Local Hadoop-Spark Runtime Environment on Windows (Hadoop 2.6, Spark 2.0)

Source: Internet
Author: User
Tags: sonatype

    1. Download Hadoop
      1. From http://hadoop.apache.org/releases.html, e.g. via the mirror http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.6.5/hadoop-2.6.5.tar.gz
      2. Install Hadoop, set HADOOP_HOME, and add ${HADOOP_HOME}/bin to PATH.
    2. Download Spark
      1. From http://spark.apache.org/downloads.html, e.g. https://d3kbcqa49mib13.cloudfront.net/spark-2.0.2-bin-hadoop2.6.tgz; note that the build must match the Hadoop version.
      2. Install it, set SPARK_HOME, and add ${SPARK_HOME}/bin to PATH.
    3. If winutils.exe is not found when running a Spark program:
      1. Download https://github.com/srccodes/hadoop-common-2.2.0-bin.git and put its contents under ${HADOOP_HOME}/bin.
    4. With the settings above, Spark programs can run locally (a sanity-check sketch follows this list).
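Before going further, it can help to fail fast if the winutils setup is wrong. The sketch below is mine, not part of the original post; the fallback path C:\hadoop-2.6.5 is an assumption, so substitute your actual install directory. It relies on the fact that Hadoop's shell utilities consult the hadoop.home.dir system property first and fall back to the HADOOP_HOME environment variable.

    object WinutilsCheck {
      def main(args: Array[String]): Unit = {
        // Hadoop looks at the hadoop.home.dir system property first,
        // then at the HADOOP_HOME environment variable.
        val hadoopHome = sys.env.getOrElse("HADOOP_HOME", "C:\\hadoop-2.6.5") // assumed path
        System.setProperty("hadoop.home.dir", hadoopHome)
        val winutils = new java.io.File(hadoopHome, "bin\\winutils.exe")
        require(winutils.exists(), s"winutils.exe not found at $winutils")
        println(s"OK: ${winutils.getAbsolutePath}")
      }
    }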
Spark example:
LocalSparkContext.scala

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest._

    // Shared test harness: starts a local[2] SparkContext before the
    // suite runs and stops it afterwards.
    trait LocalSparkContext extends BeforeAndAfterAll {
      self: Suite =>

      @transient var sc: SparkContext = _

      override def beforeAll() {
        val conf = new SparkConf()
          .setMaster("local[2]")
          .setAppName("test")
        sc = new SparkContext(conf)
      }

      override def afterAll() {
        if (sc != null) {
          sc.stop()
        }
      }
    }
SparkWcSuit.scala

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.util.LongAccumulator
    import org.scalatest.FunSuite
    import tool.LocalSparkContext
    import algos.{MergedPctr, PctrUtils}

    class SparkWcSuit extends FunSuite with LocalSparkContext {
      // RDD word count
      test("test rdd wc") {
        sc.setLogLevel("ERROR")
        val rdd = sc.makeRDD(Seq("a", "b", "b"))
        val res = rdd.map((_, 1)).reduceByKey(_ + _).collect().sorted
        assert(res === Array(("a", 1), ("b", 2)))
      }
    }
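The suite above imports LongAccumulator and SQLContext without using them, which hints at further tests in the original project. As a hedged illustration (the class name and test are mine, not the author's), one more test in the same local[2] harness could exercise an accumulator:

    import org.apache.spark.util.LongAccumulator
    import org.scalatest.FunSuite

    // Hypothetical companion suite, not from the original post.
    class SparkAccSuit extends FunSuite with LocalSparkContext {
      test("count b's with a LongAccumulator") {
        sc.setLogLevel("ERROR")
        val acc: LongAccumulator = sc.longAccumulator("b-counter")
        // Update the accumulator inside an action (foreach), where Spark
        // applies each successful task's updates exactly once.
        sc.makeRDD(Seq("a", "b", "b")).foreach { w => if (w == "b") acc.add(1L) }
        assert(acc.sum === 2L)
      }
    }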
build.sbt

    name := "doc_rank"

    version := "1.0"

    scalaVersion := "2.10.5"

    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.2"
    libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "2.0.2"
    libraryDependencies += "commons-cli" % "commons-cli" % "1.2"
    libraryDependencies ++= Seq(
      "org.scalanlp" %% "breeze" % "0.11.2",
      "org.scalanlp" %% "breeze-natives" % "0.11.2",
      "org.scalanlp" %% "breeze-viz" % "0.11.2"
    )
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-core" % "2.6.0-mr1-cdh5.4.4",
      "org.apache.hbase" % "hbase-client" % "1.0.0-cdh5.4.4",
      "org.apache.hbase" % "hbase-common" % "1.0.0-cdh5.4.4",
      "org.apache.hbase" % "hbase-server" % "1.0.0-cdh5.4.4",
      "org.apache.hbase" % "hbase-protocol" % "1.0.0-cdh5.4.4"
    )
    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
    resolvers += "cloudera-repo-releases" at "https://repository.cloudera.com/artifactory/repo/"
    resolvers ++= Seq(
      "Sonatype snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/",
      "Sonatype releases" at "https://oss.sonatype.org/content/repositories/releases/"
    )
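One detail worth noting in the build: %% appends the Scala binary version to the artifact name, while % uses it verbatim. With scalaVersion := "2.10.5", the two declarations below resolve to the same artifact (a small illustration, not part of the original build):

    // %% appends the Scala binary version (here 2.10), so both lines
    // resolve to org.scalanlp:breeze_2.10:0.11.2:
    libraryDependencies += "org.scalanlp" %% "breeze" % "0.11.2"
    libraryDependencies += "org.scalanlp" % "breeze_2.10" % "0.11.2"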
Hadoop sample

Directory structure:

    src/
    ├── main
    │   ├── java
    │   │   ├── io
    │   │   │   └── longwind
    │   │   │       └── mapreduce
    │   │   │           ├── main
    │   │   │           │   └── Main.java
    │   │   │           ├── mapreduce
    │   │   │           │   └── InfoIdUniquer.java
    │   │   │           └── utils
    │   │   │               ├── Constant.java
    │   │   │               └── HadoopUtils.java
    │   │   └── org
    │   │       └── apache
    │   │           └── hadoop
    │   │               ├── io
    │   │               │   └── nativeio
    │   │               │       └── NativeIO.java
    │   │               └── mapred
    │   │                   ├── ClientCache.java
    │   │                   ├── ClientServiceDelegate.java
    │   │                   ├── NotRunningJob.java
    │   │                   ├── ResourceMgrDelegate.java
    │   │                   ├── YarnClientProtocolProvider.java
    │   │                   └── YARNRunner.java
    │   └── resources
    │       └── log4j.properties
    └── test
        ├── java
        │   └── test
        └── resources
            └── log4j.properties

Key dependencies in pom.xml:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0-cdh5.4.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.6.0-cdh5.4.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>2.6.0-cdh5.4.4</version>
    </dependency>

Code notes: the org.apache.hadoop.* files in the tree above are copied from the Hadoop source package. With version 2.6.0-cdh5.4.4 the program may fail at runtime with an access0 error; when it originates in NativeIO.java it is a permissions issue, and the workaround is to manually modify NativeIO.java:

    public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
      return true;                                          // after the fix
      // return access0(path, desiredAccess.accessRight()); // before the fix
    }
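For readers who stay with the sbt build shown earlier instead of Maven, a sketch of the equivalent dependency declarations (same CDH coordinates; this assumes the cloudera-repo-releases resolver from build.sbt is in scope):

    // Hedged sbt equivalent of the pom.xml snippet above.
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.6.0-cdh5.4.4",
      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.0-cdh5.4.4",
      "org.apache.hadoop" % "hadoop-mapreduce-client-common" % "2.6.0-cdh5.4.4"
    )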
With this setup you can conveniently develop and debug Hadoop and Spark programs locally on Windows. As an aside, MRUnit is not particularly robust; the problems you run into are generally version mismatches, package conflicts, and permissions.

References:
    1. Hirano, "The MapReduce runtime environment on Windows": http://www.cnblogs.com/tq03/p/5101916.html
    2. "On the way forward", solving the access0 problem: http://blog.csdn.net/congcong68/article/details/42043093
    3. XUWEIMDM, "Spark on Windows": http://blog.csdn.net/u011513853/article/details/52865076
