Operating Spark 1.2.0 from Java

Source: Internet
Author: User

Although Scala is the recommended language for Spark, I gave Java a try.

package org.admln.java7OperateSpark;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class OperateSpark {
    // separator used to split lines into words
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) {
        // initialize the context against the standalone master
        SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("spark://hadoop:7077");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        // textFile also accepts an optional second parameter: the minimum number of partitions
        JavaRDD<String> lines = ctx.textFile("hdfs://hadoop:8020/in/spark/javaoperatespark/wordcount.txt");
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) {
                return Arrays.asList(SPACE.split(s));
            }
        });

        // map each word to a (word, 1) key-value pair
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String t) {
                return new Tuple2<String, Integer>(t, 1);
            }
        });

        // sum the counts for each word
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer v1, Integer v2) {
                return v1 + v2;
            }
        });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        counts.saveAsTextFile("hdfs://hadoop:8020/out/spark/javaoperatespark2/");
        ctx.stop();
    }
}

An error occurred when running it.

In Eclipse:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
    at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
    at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
    at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:33)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:540)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:184)
    at org.admln.java7OperateSpark.OperateSpark.main(OperateSpark.java:27)

In the shell:

Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    ...
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:30S)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

You can see that the protobuf version conflicts with Hadoop's.

The protobuf version that Spark 1.2.0 builds against by default does not match the protobuf 2.5.0 that Hadoop 2.2.0 uses.

So I modified the pom.xml in the Spark source and recompiled the deployment package (this takes about an hour).
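A minimal sketch of the kind of change this involves, assuming the Spark 1.x root pom.xml controls this through a protobuf.version property (the property name and its original value are not stated in this article):

    <!-- in the root pom.xml of the Spark source tree (assumed layout) -->
    <properties>
        <!-- align with the protobuf 2.5.0 that Hadoop 2.2.0 uses -->
        <protobuf.version>2.5.0</protobuf.version>
    </properties>

After changing it, the deployment package is rebuilt as usual, which is what takes the hour mentioned above.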

After that, the shell-side run succeeded, but Eclipse still reported an error.

This is because I referenced the Spark package through Maven, and there is a Guava version conflict: the version resolved by default is too old.

Add a Guava dependency explicitly:

    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>14.0.1</version>
    </dependency>
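This works because HashFunction.hashInt, which the first stack trace fails on, only exists in newer Guava releases, while the Guava resolved transitively through the Hadoop/Spark dependencies is older; pinning 14.0.1 makes Maven pick a version that has the method. In a multi-module project the same pin can also be expressed through dependencyManagement; a sketch, not from the original article:

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>14.0.1</version>
            </dependency>
        </dependencies>
    </dependencyManagement>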

After that, submitting from Eclipse no longer threw an error, but the task was never executed; it reported that resources were insufficient:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Then I raised the number of cores to 2 and the memory to 1500 MB, but it still reported:

INFO SparkDeploySchedulerBackend: Granted executor ID app-20150111003236-0000/3 on hostPort hadoop:34766 with 2 cores, 512.0 MB RAM

That is, the number of cores changed, but the executor memory could not be changed, and I do not know why. The same program runs normally when submitted from the shell, but when submitted externally from Eclipse it reports insufficient memory.

The driver memory could not be raised either.
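The article does not show where the 2 cores and 1500 MB were set; a minimal sketch of doing it on the SparkConf in the driver program (the values come from the text above, the keys are standard Spark 1.x configuration):

    SparkConf sparkConf = new SparkConf()
            .setAppName("JavaWordCount")
            .setMaster("spark://hadoop:7077")
            .set("spark.cores.max", "2")             // total cores the application may claim
            .set("spark.executor.memory", "1500m")   // memory requested per executor
            .set("spark.driver.memory", "1500m");    // driver memory; generally only honored if set before the driver JVM starts
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

Note that when the driver is launched directly from Eclipse, its JVM is already running by the time this code executes, so the driver-memory value set here would not change the driver's actual heap.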

I guess there are two possible causes.

1. A Spark bug: the SPARK_DRIVER_MEMORY variable defaults to 512M, but modifying it externally does not take effect;

2. The resources of the CentOS virtual machine and my local Windows machine are getting mixed up, because I saw this error:

ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2

My local machine has 4 cores, while the virtual machine has 2.

I do not know why there are no examples on the Internet of submitting from Eclipse; either it simply is not supported (and the client's resources get confused), or nobody knows how.
