Problem one: Maven packaging
The Spark project compiles fine in IDEA, and importing the jar packages causes no problem; the Maven-built package also follows the same project structure as the previous project. But executing spark-submit throws an error:
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:286)
    at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:239)
    at java.util.jar.JarVerifier.processEntry(JarVerifier.java:317)
    at java.util.jar.JarVerifier.update(JarVerifier.java:228)
    at java.util.jar.JarFile.initializeVerifier(JarFile.java:348)
You can execute

    zip -d xxx.jar 'META-INF/*.SF' 'META-INF/*.DSA' 'META-INF/*.RSA'

to delete the offending signature files inside the jar and get a working jar package. This problem occurs because some of the referenced third-party jar packages contain signed certificate files.
Of course, you can also package with Maven by running mvn clean package in a terminal, with the following plugin configuration in the pom:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <filters>
      <filter>
        <artifact>spark-streaming-twitter_2.10:spark-streaming-twitter_2.10</artifact>
        <includes>
          <include>**</include>
        </includes>
      </filter>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
          <exclude>META-INF/*.MF</exclude>
          <exclude>junit:junit</exclude>
          <exclude>org.apache.maven:lib:tests</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
Problem two: SBT packaging
Using sbt package will only package your own program; the dependent jar packages are not included in the same jar. The sbt-assembly plugin can be used to bundle the dependencies into a single jar file (a fat jar).
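A minimal sketch of enabling the plugin; the version number here is an assumption, so check the sbt-assembly README for the release that matches your sbt version:

    // project/plugins.sbt -- version 0.14.5 is an assumed example,
    // pick the release matching your sbt installation
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

After reloading, the assembly and assemblyPackageDependency tasks become available in the sbt shell.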
You can refer to:
http://todu.top/spark/sbt-idea-%E5%85%A5%E9%97%A8%E5%8F%8A%E9%85%8D%E7%BD%AE/
I don't find his formatting very readable; you can view the reposted version instead:
http://blog.csdn.net/x1066988452/article/details/51672660
But there is a problem: he is wrong about assemblyPackageDependency. I don't know whether he actually ran a successful package, but the sbt-assembly README on GitHub says:

    To make a JAR file containing only the external dependencies, type
    > assemblyPackageDependency

That is, the resulting jar contains only the dependency packages, not your own code. I tried it: you can put the contents of the classes folder (under the same path as the jar) directly into that jar package, and it runs. I think you can use assemblyPackageDependency to build the dependency jar once, then use sbt clean package to package only your own code and have spark-submit reference the dependency jar. That way the dependency jar does not need to be rebuilt, and each package is smaller.
Or it can be packaged like this:

    sbt clean assembly

which puts all the dependencies and your own code into one jar package.

Problem three: reading files with textFile
textFile("path1/*") reads all the files under path1 into an RDD, but if path1 is an empty path, executing a subsequent action operator will throw an error, so you can first check the size of what is under path1:
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val conf = new Configuration()
    val fileSystem = FileSystem.get(conf)
    fileSystem.getContentSummary(new Path(path)).getLength
    fileSystem.getFileStatus(new Path(path)).getLen

Either call gets the size of a file; for a directory, getFileStatus(new Path(path)).getLen returns 0 (while getContentSummary returns the total size of the directory's contents).
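For a quick local-filesystem variant, the same guard can be sketched with plain java.nio; the helper name hasInput is my own, and for HDFS paths you would use the Hadoop FileSystem calls above instead:

```scala
import java.nio.file.{Files, Paths}

object InputGuard {
  // Hypothetical helper: true when `dir` is a directory containing at
  // least one entry, i.e. textFile(dir + "/*") would have input to read.
  def hasInput(dir: String): Boolean = {
    val p = Paths.get(dir)
    if (!Files.isDirectory(p)) return false
    val entries = Files.list(p)
    try entries.findFirst().isPresent
    finally entries.close()
  }
}
```

Call hasInput(path1) before sc.textFile(path1 + "/*") and skip the job when it returns false, instead of letting the action operator fail later.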