The original (buggy) code:
JavaRDD<ArticleReply> javaRDD = rdd.flatMap(new FlatMapFunction<String, ArticleReply>() {
    private static final long serialVersionUID = 10000L;
    // BUG: the list is a member field of the function object,
    // so it is shared across every call()
    List<ArticleReply> newList = new ArrayList<ArticleReply>();

    public Iterable<ArticleReply> call(String line) throws Exception {
        String[] splits = line.split("\t");
        ArticleReply bean = new ArticleReply();
        bean.setAreaId(splits[0]);
        bean.setAgent(Integer.parseInt(splits[1]));
        bean.setSerial(splits[2]);
        newList.add(bean);
        return newList; // returns ALL beans accumulated so far
    }
});
The corrected code:
JavaRDD<ArticleReply> javaRDD = rdd.flatMap(new FlatMapFunction<String, ArticleReply>() {
    private static final long serialVersionUID = 10000L;

    public Iterable<ArticleReply> call(String line) throws Exception {
        // The list is local to each call, so it holds only this line's bean
        List<ArticleReply> newList = new ArrayList<ArticleReply>();
        String[] splits = line.split("\t");
        ArticleReply bean = new ArticleReply();
        bean.setAreaId(splits[0]);
        bean.setAgent(Integer.parseInt(splits[1]));
        bean.setSerial(splits[2]);
        newList.add(bean);
        return newList;
    }
});
In the buggy version, the list is declared and initialized as a member field of the FlatMapFunction rather than inside call(). As a result, every invocation of call() appends one more bean to the same shared list and then returns the entire list. Over n input lines, Spark therefore receives 1 + 2 + 3 + ... + n objects instead of n, so memory consumption grows quadratically and the job eventually fails with an OutOfMemoryError.
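The quadratic growth can be demonstrated outside Spark with a small simulation. This is a sketch, not the original job: the ArticleReply bean is stubbed as a plain String, and the two mapper classes are hypothetical stand-ins for the buggy and fixed FlatMapFunction.

```java
import java.util.ArrayList;
import java.util.List;

public class AccumulationDemo {
    // Simulates the buggy pattern: the list is a field shared across calls
    static class BuggyMapper {
        private final List<String> newList = new ArrayList<String>();
        List<String> call(String line) {
            newList.add(line);
            return newList; // returns every bean seen so far
        }
    }

    // Simulates the fixed pattern: the list is local to each call
    static class FixedMapper {
        List<String> call(String line) {
            List<String> newList = new ArrayList<String>();
            newList.add(line);
            return newList; // returns only this line's bean
        }
    }

    public static void main(String[] args) {
        int n = 100;
        BuggyMapper buggy = new BuggyMapper();
        FixedMapper fixed = new FixedMapper();
        long buggyTotal = 0, fixedTotal = 0;
        for (int i = 0; i < n; i++) {
            // Count how many objects each version hands back to the framework
            buggyTotal += buggy.call("line" + i).size();
            fixedTotal += fixed.call("line" + i).size();
        }
        // Buggy version emits 1 + 2 + ... + n = n*(n+1)/2 objects; fixed emits n
        System.out.println(buggyTotal); // 5050
        System.out.println(fixedTotal); // 100
    }
}
```

For n = 100 the buggy pattern already emits 5050 objects instead of 100; on a real dataset with millions of lines the difference is what exhausts the executor's heap.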
Spark error 1: insufficient memory (OutOfMemoryError)