The first way
Action: Package the third-party jar files into the final Spark application jar file.
Scenario: The third-party jar files are relatively small and are used in only a few places.
The second way
Action: Pass the --jars parameter to the spark-submit command.
Requirements:
1. The jar file must exist on the machine where the spark-submit command is run.
2. When services on other machines in the cluster need the jar file, they obtain it through an HTTP interface provided by the driver (for example: http://192.168.187.146:50206/jars/mysql-connector-java-5.1.27-bin.jar, marked "Added By User").
## Configuration parameter: --jars JARS
## Example:
$ bin/spark-shell --jars /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar
Scenario: The required jar file is available on the local machine.
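As a rough illustration of passing several jars at once, the shell sketch below collects every .jar file in a folder into the comma-separated form that --jars accepts. The helper name jars_csv, and the application class and jar in the usage comment, are assumptions made for illustration and are not part of Spark.

```shell
# Build a comma-separated list of every .jar file in a directory,
# suitable as the value of spark-submit's --jars option.
# jars_csv is a hypothetical helper name, not part of Spark.
jars_csv() {
  dir=$1
  list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue          # skip the unexpanded glob when no jars exist
    list="${list:+$list,}$jar"         # add a comma only between entries
  done
  printf '%s\n' "$list"
}

# Usage (class name and application jar are placeholders):
#   bin/spark-submit --jars "$(jars_csv /opt/cdh-5.3.6/hive/lib)" \
#     --class com.example.Main app.jar
```

Passing a whole folder this way is convenient, but it ships every jar in that folder with the application, so a dedicated folder holding only the needed jars is probably the safer choice.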
The third way
Action: Pass the --packages parameter to the spark-submit command.
## Configuration parameter: --packages, the Maven coordinates of the jar package
## Example:
$ bin/spark-shell --packages mysql:mysql-connector-java:5.1.27 --repositories http://maven.aliyun.com/nexus/content/groups/public/
## --repositories is the Maven repository address for the mysql-connector-java package; if it is not given, the package is downloaded from the default Maven source configured on the machine
## If you depend on multiple packages, list their coordinates in the same form, separated by commas
## By default, downloaded packages are placed in the .ivy2/jars folder in the current user's home directory
Scenario: The jar file is not available locally; when services in the cluster need the package, it is downloaded directly from the given Maven address.
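When several packages are needed, the coordinates have to be joined with commas by hand, which is easy to get wrong. A minimal shell sketch of that step follows; the helper name packages_csv is hypothetical, and the check only verifies the groupId:artifactId:version shape, not that the package exists in any repository.

```shell
# Join Maven coordinates into the comma-separated form expected by --packages.
# Each argument must look like groupId:artifactId:version; anything else is rejected.
# packages_csv is a hypothetical helper name, not part of Spark.
packages_csv() {
  list=""
  for coord in "$@"; do
    case $coord in
      *:*:*:*|*,*|*' '*)               # too many parts, or stray commas/spaces
        echo "invalid coordinate: $coord" >&2
        return 1 ;;
      ?*:?*:?*)                        # three non-empty parts
        list="${list:+$list,}$coord" ;;
      *)
        echo "invalid coordinate: $coord" >&2
        return 1 ;;
    esac
  done
  printf '%s\n' "$list"
}

# Usage:
#   bin/spark-shell --packages "$(packages_csv mysql:mysql-connector-java:5.1.27)" \
#     --repositories http://maven.aliyun.com/nexus/content/groups/public/
```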
The fourth way
Action: Change Spark's configuration by adding the third-party jar files to the SPARK_CLASSPATH environment variable.
Note: The added third-party jar files must exist on every machine that runs the Spark application.
A. Create a folder to hold the third-party jar files. Command: $ mkdir external_jars
B. Modify the Spark configuration. Command: $ vim conf/spark-env.sh Modified content: SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cdh-5.3.6/spark/external_jars/*
C. Copy the dependent jar files into the new folder. Command: $ cp /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar ./external_jars/
Scenario: There are many dependent jar packages, listing them on the command line is cumbersome, and many applications rely on the same packages.
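Steps A to C can be sketched as one small shell helper, shown here as an assumption-laden illustration rather than a recommended tool: the name add_external_jar is hypothetical, and the Spark home directory is passed in as an argument. Note that Spark 1.0 and later report SPARK_CLASSPATH as deprecated in favor of --driver-class-path and spark.executor.extraClassPath, so this mainly suits older setups like the one described here.

```shell
# Copy a jar into <spark_home>/external_jars and make sure conf/spark-env.sh
# appends that folder to SPARK_CLASSPATH exactly once.
# add_external_jar is a hypothetical helper name, not part of Spark.
add_external_jar() {
  spark_home=$1
  jar=$2
  env_file="$spark_home/conf/spark-env.sh"
  # \$ keeps the variable reference literal in the written line
  line="SPARK_CLASSPATH=\$SPARK_CLASSPATH:$spark_home/external_jars/*"

  mkdir -p "$spark_home/external_jars" "$spark_home/conf" || return 1
  cp "$jar" "$spark_home/external_jars/" || return 1
  touch "$env_file" || return 1
  # -F matches the text literally, -x requires a whole-line match,
  # so running the helper again does not add a duplicate entry
  grep -qxF "$line" "$env_file" || printf '%s\n' "$line" >> "$env_file"
}

# Usage:
#   add_external_jar /opt/cdh-5.3.6/spark \
#     /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar
```

The same steps still have to be repeated on every machine that runs the Spark application, as the note above requires.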
Note (Spark on YARN cluster mode only): If the application depends on third-party jar files, the final solution is to copy them to the ${HADOOP_HOME}/share/hadoop/common/lib folder on every machine in the Hadoop cluster.
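Copying the jar to every machine by hand is error-prone, so one possible approach is to generate the copy commands first and review them. The sketch below only prints scp commands and executes nothing; the helper name print_copy_commands, the host names, and the Hadoop home path in the usage comment are all illustrative assumptions.

```shell
# Print one scp command per host for copying a jar into Hadoop's common lib folder.
# Nothing is executed; review the output before running it.
# print_copy_commands is a hypothetical helper name.
print_copy_commands() {
  jar=$1
  hadoop_home=$2
  shift 2                              # remaining arguments are host names
  for host in "$@"; do
    printf 'scp %s %s:%s/share/hadoop/common/lib/\n' "$jar" "$host" "$hadoop_home"
  done
}

# Usage (host names and Hadoop home are placeholders):
#   print_copy_commands mysql-connector-java-5.1.27-bin.jar /opt/hadoop node1 node2
```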
Spark application references other JAR packages