Spark Project Issue Record

Source: Internet
Author: User
Problem one: Maven packaging

The Spark project compiled without problems in IDEA when the jar packages were imported directly. After switching to Maven to manage the dependencies, with the same project structure as before, the packaged jar failed under spark-submit with the error: Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
        at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:286)
        at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:239)
        at java.util.jar.JarVerifier.processEntry(JarVerifier.java:317)
        at java.util.jar.JarVerifier.update(JarVerifier.java:228)
        at java.util.jar.JarFile.initializeVerifier(JarFile.java:348)

On a jar that has already been built, you can run

zip -d xxx.jar 'META-INF/*.SF' 'META-INF/*.RSA' 'META-INF/*SF'

to delete the signature-related files inside the jar package.
This problem most likely occurs because some of the referenced third-party jar packages contain signature (certificate) files, which no longer match the contents of the merged jar.
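
To confirm that a jar actually contains such files before deleting anything, the check can be sketched in Scala using only the JDK's zip API. This is a minimal sketch, not part of the original write-up: the object name JarSignatureCheck and its methods are hypothetical, and it only looks for the .SF, .DSA and .RSA files under META-INF that are discussed here.

```scala
import java.util.zip.ZipFile
import scala.collection.mutable.ListBuffer

// Hypothetical helper: lists signature files inside a jar.
object JarSignatureCheck {
  private val Prefix = "META-INF/"

  // True for entries directly under META-INF/ ending in .SF, .DSA or .RSA
  // (compared case-insensitively).
  def isSignatureEntry(name: String): Boolean = {
    val upper = name.toUpperCase
    upper.startsWith(Prefix) &&
      !upper.drop(Prefix.length).contains("/") &&
      (upper.endsWith(".SF") || upper.endsWith(".DSA") || upper.endsWith(".RSA"))
  }

  // Reads the jar as a plain zip archive, so no signature verification is triggered.
  def signatureEntries(jarPath: String): List[String] = {
    val zip = new ZipFile(jarPath)
    try {
      val found = ListBuffer.empty[String]
      val entries = zip.entries()
      while (entries.hasMoreElements) {
        val name = entries.nextElement().getName
        if (isSignatureEntry(name)) found += name
      }
      found.toList
    } finally zip.close()
  }

  // Usage: pass one or more jar paths as arguments.
  def main(args: Array[String]): Unit =
    for (jar <- args; entry <- signatureEntries(jar)) println(jar + ": " + entry)
}
```

If this prints nothing for the jar you submit, the SecurityException probably has a different cause than leftover signature files.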
Alternatively, you can exclude these files at packaging time. Run mvn clean package in a terminal, with the following configuration in the POM:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.2</version>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <filters>
            <filter>
                <artifact>spark-streaming-twitter_2.10:spark-streaming-twitter_2.10</artifact>
                <includes>
                    <include>**</include>
                </includes>
            </filter>
            <filter>
                <artifact>*:*</artifact>
                <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>META-INF/*.MF</exclude>
                    <exclude>junit:junit</exclude>
                    <exclude>org.apache.maven:lib:tests</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
</plugin>

Problem two: sbt packaging

Running sbt package packages only your own program; the dependent jar packages are not included in the same jar. The sbt-assembly plugin can be used to bundle the dependencies into a single jar file (a fat jar).
For reference, see:
http://todu.top/spark/sbt-idea-%E5%85%A5%E9%97%A8%E5%8F%8A%E9%85%8D%E7%BD%AE/
I do not find the formatting of that page easy to read; you can view the reposted version instead:
http://blog.csdn.net/x1066988452/article/details/51672660
However, that article contains a mistake about assemblyPackageDependency. I don't know whether its author managed to run the resulting package successfully, but the sbt-assembly page on GitHub says:

To make a JAR file containing only the external dependencies, type
> assemblyPackageDependency

That is, the resulting jar contains only the dependencies and not your own code. I tried it: if you copy the contents of the classes folder, which sits in the same directory as the jar, directly into the jar, it runs. I think a cleaner approach is to use assemblyPackageDependency to build the dependency jar once, then use sbt clean package to package only your own code, and have spark-submit reference the dependency jar. That way the dependency jar does not need to be rebuilt or moved, and each application package is smaller.
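
The split workflow described above can be sketched as the following sbt configuration. This is only an illustration under stated assumptions: the plugin version, project name, Scala version, main class and jar paths are hypothetical placeholders, not values from the original article. The command lines in the comments use the sbt tasks mentioned above and spark-submit's --jars and --class options.

```scala
// project/plugins.sbt
// Assumption: pick the sbt-assembly version that matches your sbt release.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt
name := "spark-app"        // hypothetical project name
version := "0.1.0"         // hypothetical
scalaVersion := "2.12.18"  // hypothetical; match your Spark build

// Build once (rebuild only when the dependencies change):
//   sbt assemblyPackageDependency    -> a jar with the external dependencies only
// Build on every code change:
//   sbt clean package                -> a small jar with only your own classes
// Submit, referencing the dependency jar:
//   spark-submit --class <your.MainClass> --jars <path/to/dependency.jar> <path/to/app.jar>
```

The two files are shown in one block for brevity; in a real project the addSbtPlugin line lives in project/plugins.sbt and the settings in build.sbt.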
Alternatively, you can package everything at once:

sbt clean assembly

This puts all the dependencies and your own code into a single jar package.

Problem three: reading files with textFile

Calling textFile("path1/*") reads all the files under path1 into an RDD. However, if path1 is an empty directory, the job fails once execution reaches an action operator. To guard against this, you can first check the total size of the contents under path1:

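import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val fileSystem = FileSystem.get(conf)
fileSystem.getContentSummary(new Path(path)).getLength // total bytes under path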

By contrast, fileSystem.getFileStatus(new Path(path)).getLen returns the size of a single file; if the path is a directory, it returns 0.
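
The size check above can be wrapped in a small guard that is called before textFile. This is a minimal sketch, assuming the Hadoop client libraries are on the classpath (as they are in a Spark application); the object name InputGuard, the method hasData and the default path are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: decides whether a directory is worth passing to textFile.
object InputGuard {
  // True if dir exists and the files under it total more than zero bytes.
  def hasData(fs: FileSystem, dir: String): Boolean = {
    val path = new Path(dir)
    fs.exists(path) && fs.getContentSummary(path).getLength > 0
  }

  def main(args: Array[String]): Unit = {
    // Uses whatever file system the default Hadoop configuration points at.
    val fs = FileSystem.get(new Configuration())
    val dir = args.headOption.getOrElse("/tmp/path1") // hypothetical default path
    if (hasData(fs, dir)) println(dir + " has data; textFile can be called on it")
    else println(dir + " is empty or missing; skip the textFile call")
  }
}
```

In a Spark job you would call InputGuard.hasData with the same path before building the RDD and skip or log the batch when it returns false. Note that a directory containing only zero-byte files is also treated as empty by this check.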
