Although Scala is the recommended language for Spark, it is still worth trying this in Java.
package org.admln.java7OperateSpark;

import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class OperateSpark {
    // Word segmentation separator
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) {
        // Initialize the SparkConf
        SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("spark://hadoop:7077");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);

        // The second (optional) parameter of textFile is the minimum number of partitions for the file
        JavaRDD<String> lines = ctx.textFile("hdfs://hadoop:8020/in/spark/javaoperatespark/wordcount.txt");
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String s) {
                return Arrays.asList(SPACE.split(s));
            }
        });

        // Turn each word into a (word, 1) key-value pair
        JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String t) {
                return new Tuple2<String, Integer>(t, 1);
            }
        });

        JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer v1, Integer v2) {
                return v1 + v2;
            }
        });

        List<Tuple2<String, Integer>> output = counts.collect();
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        counts.saveAsTextFile("hdfs://hadoop:8020/out/spark/javaoperatespark2/");
        ctx.stop();
    }
}
An error occurred when running it.
In Eclipse:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
    at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
    at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
    at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
    at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:33])
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:695)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:540)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:184)
    at org.admln.java7OperateSpark.OperateSpark.main(OperateSpark.java:27)
In the shell:
Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    ...
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:30S)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
You can see that this is a protobuf version conflict between Spark and Hadoop: the protobuf version that Spark 1.2.0 builds against by default does not match Hadoop 2.2.0, which uses protobuf 2.5.0.
So I modified Spark's pom.xml, recompiled, and rebuilt the deployment package (this takes about an hour).
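For reference, here is a minimal sketch of the kind of pom.xml change involved. I am writing it from memory, so treat the property name as an assumption and verify it against your own copy of the Spark 1.2.0 pom before rebuilding:

<!-- spark/pom.xml (sketch, assumed property name): align protobuf with Hadoop 2.2.0 -->
<properties>
    <protobuf.version>2.5.0</protobuf.version>
</properties>

If memory serves, building with the hadoop-2.2 Maven profile (-Phadoop-2.2 -Dhadoop.version=2.2.0) also pins a matching protobuf, which may save the manual edit.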
After that, running from the shell succeeded, but Eclipse still reported the error.
This is because I pull in the Spark package through Maven, and there is a Guava version conflict with the version it brings in by default.
Add a separate Guava dependency:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>14.0.1</version>
</dependency>
Then Eclipse submits the job without errors, but the task never actually executes; it reports that resources are insufficient:
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Then I raised the number of cores to 2 and the memory to 1500 MB, but it still reports:
INFO SparkDeploySchedulerBackend: Granted executor ID app-20150111003236-0000/3 on hostPort hadoop:34766 with 2 cores, 512.0 MB RAM
In other words, the number of cores changed, but the executor memory did not, and I do not know why. Also, the same program submitted from the shell executes normally, while submitting it externally from Eclipse reports insufficient memory, and that is executor memory, not driver memory.
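For completeness, here is a minimal sketch of how I pass those resource settings from the Java side. The configuration keys (spark.cores.max, spark.executor.memory) are standard Spark properties; the class name and the values are just what I was experimenting with, not recommendations:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ResourceConfigSketch {
    public static void main(String[] args) {
        // Ask the standalone master for 2 cores in total and 1500 MB per executor.
        // Whether the memory request is actually honored is exactly what the log above disputes.
        SparkConf sparkConf = new SparkConf()
                .setAppName("JavaWordCount")
                .setMaster("spark://hadoop:7077")
                .set("spark.cores.max", "2")
                .set("spark.executor.memory", "1500m");

        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        // ... run the word count exactly as in the program above ...
        ctx.stop();
    }
}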
I guess there are two possible causes.
1. A Spark bug: the SPARK_DRIVER_MEMORY variable defaults to 512 MB, but modifying it externally does not take effect (see the sketch after this list);
2. The resources of the CentOS virtual machine and my local Windows machine are getting mixed up, because I saw this error:
ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
and my local machine has 4 cores while the virtual machine has only 2.
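Regarding the first guess: my understanding (an assumption on my part, not something I verified in the Spark source) is that in client mode the driver JVM is already running by the time the SparkConf is built, so a driver-memory value set in code cannot enlarge the heap of the current process; it has to be set before launch, for example through SPARK_DRIVER_MEMORY or spark-submit's --driver-memory. A small sketch that makes the point:

import org.apache.spark.SparkConf;

public class DriverMemorySketch {
    public static void main(String[] args) {
        // This code already runs inside the driver JVM, whose maximum heap was fixed
        // when the JVM was launched, so setting the property here comes too late.
        SparkConf conf = new SparkConf()
                .setAppName("JavaWordCount")
                .setMaster("spark://hadoop:7077")
                .set("spark.driver.memory", "1g"); // stored in the conf, but the heap stays as launched

        // The heap actually available to this driver process:
        System.out.println("Driver max heap: "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}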
I do not know why there are no examples of submitting from Eclipse to be found online; either it simply is not supported (and the client's resources get confused), or nobody has worked it out.