"Spark learning" Apache Spark security mechanism

Source: Internet
Author: User

Spark version: 1.1.1

This article is a translation of the official documentation. If you repost it, please respect the translator's work and cite the following link:

http://www.cnblogs.com/zhangningbo/p/4135808.html

Directory
    • Web UI
    • Event Log
    • Network Security (Configure Port)
        1. Ports for standalone mode only
        2. Common ports for all cluster managers

Spark currently supports authentication via a shared secret. Authentication is enabled with the parameter spark.authenticate, which controls whether Spark's communication protocols authenticate using the shared secret. The authentication is a handshake that verifies both sides hold the same shared secret; if the secrets differ, the two sides cannot communicate. The shared secret is created as follows:

    • For Spark on YARN deployments, setting spark.authenticate to true automatically generates and distributes the shared secret. Each application uses its own unique shared secret.
    • For other deployment modes, configure the parameter spark.authenticate.secret on each node. This secret is used by all masters, workers, and applications.
    • Note: The experimental Netty shuffle path (spark.shuffle.use.netty) is not secured, so do not use Netty for shuffle if you enable authentication.
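
As a minimal sketch of the non-YARN case above, the following Python snippet generates a random secret and renders the two properties as lines for a properties file such as conf/spark-defaults.conf. The helper name render_auth_conf and the 32-byte hex secret are assumptions made for illustration; only the property names spark.authenticate and spark.authenticate.secret come from the text.

```python
import secrets


def render_auth_conf(secret=None):
    """Return properties-file lines that enable shared-secret authentication.

    Hypothetical helper: the function name and the secret length are
    illustrative choices, not part of Spark.
    """
    if secret is None:
        # 32 random bytes, hex-encoded (64 characters).
        secret = secrets.token_hex(32)
    return [
        "spark.authenticate true",
        f"spark.authenticate.secret {secret}",
    ]


if __name__ == "__main__":
    for line in render_auth_conf():
        print(line)
```

The same secret has to reach every node, so in practice the lines would be generated once and then distributed, with the resulting file readable only by the account that runs Spark.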

Web UI

The Spark UI can be secured with javax servlet filters, set through the parameter spark.ui.filters. A user may want to secure the UI if it exposes data that other users should not see. The servlet filter specified by the user performs the authentication. Once a user is logged in, Spark checks that user against the view ACLs to decide whether they may view the UI. The parameters spark.acls.enable and spark.ui.view.acls control the behavior of the ACLs. Note: The user who started the application always has view access to the UI. On YARN, the Spark UI uses the standard YARN web proxy mechanism and authenticates through any installed Hadoop filters.

Spark also supports modify ACLs, which control who may modify a running Spark application, for example by killing the application or a task. This is controlled by the parameters spark.acls.enable and spark.modify.acls. Note: If you are authenticating the Web UI, users in the modify ACLs must also be added to the view ACLs before they can use the Kill button on the Web UI. On YARN, the modify ACLs are passed in and control who has modify access through the YARN interfaces.

Spark allows a set of administrators to be specified in the ACLs; these users always have view and modify permissions on all applications. This is controlled by the parameter spark.admin.acls. It is useful on a shared cluster, which often has several administrators or support staff who help users debug their applications.
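
The interaction between the view, modify, and admin ACLs described above can be modeled roughly as follows. This is an illustrative sketch, not Spark's implementation: the function names are hypothetical, the ACL values are assumed to be comma-separated user lists, and granting modify access to the application's owner is an assumption that goes beyond what the text states.

```python
def _parse_acl(value):
    """Split a comma-separated ACL string into a set of user names."""
    return {name.strip() for name in value.split(",") if name.strip()}


def _acls_enabled(conf):
    return conf.get("spark.acls.enable", "false") == "true"


def can_view(user, owner, conf):
    """Return True if `user` may view the UI of an application started by `owner`."""
    if not _acls_enabled(conf):
        return True  # ACL checks are off, so this model applies no restriction
    allowed = _parse_acl(conf.get("spark.ui.view.acls", ""))
    allowed |= _parse_acl(conf.get("spark.admin.acls", ""))
    allowed.add(owner)  # the user who started the application always has access
    return user in allowed


def can_modify(user, owner, conf):
    """Return True if `user` may modify (e.g. kill) the application."""
    if not _acls_enabled(conf):
        return True
    allowed = _parse_acl(conf.get("spark.modify.acls", ""))
    allowed |= _parse_acl(conf.get("spark.admin.acls", ""))
    allowed.add(owner)  # assumption: the owner may modify their own application
    return user in allowed


if __name__ == "__main__":
    conf = {
        "spark.acls.enable": "true",
        "spark.ui.view.acls": "alice, bob",
        "spark.modify.acls": "bob",
        "spark.admin.acls": "root",
    }
    for name in ("alice", "bob", "root", "dave"):
        print(name, can_view(name, "carol", conf), can_modify(name, "carol", conf))
```

In this model a user must pass both checks to use the Kill button on an authenticated UI, which is why users in the modify ACLs also need to appear in the view ACLs.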

Event Log

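If your application uses event logging, the directory that holds the event logs (spark.eventLog.dir) should be created manually and given appropriate permissions. To keep the log files secure, set the directory's permissions to drwxrwxrwxt (world-writable with the sticky bit set). The directory should be owned by the superuser who runs the history server process, and its group should be restricted to the superuser group. This lets every user write to the directory while preventing unprivileged users from deleting or renaming a file unless they own the file or the directory. Spark creates the event log files with permissions such that only the owner and the owner's group have read and write access.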
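
As a rough illustration of the directory permissions described above, the sketch below creates a directory on the local filesystem and sets it to mode 1777 (read, write, and execute for everyone, plus the sticky bit). The helper name create_event_log_dir and the use of a local path are assumptions; an event log directory on a distributed filesystem would be prepared with that filesystem's own tools, and changing the owner and group typically requires superuser rights, so that step is left out.

```python
import os
import stat
import tempfile


def create_event_log_dir(path):
    """Create `path`, set mode 1777 on it, and return its mode string."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: a mode passed to makedirs would be filtered by the umask.
    os.chmod(path, 0o1777)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # Demo in a throwaway temporary directory (hypothetical location).
    demo = os.path.join(tempfile.mkdtemp(), "spark-events")
    print(create_event_log_dir(demo))
```

On a POSIX system the returned string ends in `t`, the sticky bit, which is what stops users from deleting or renaming files they do not own.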

Network Security (Configure Port)

Spark makes heavy use of the network, and some environments require strict firewall settings. The tables below list the primary ports that Spark uses for communication and how to configure them.

Ports for standalone mode only

| From | To | Default port | Purpose | Configuration | Notes |
|---|---|---|---|---|---|
| Browser | Standalone Master | 8080 | Web UI | spark.master.ui.port / SPARK_MASTER_WEBUI_PORT | Jetty-based. Standalone mode only. |
| Browser | Standalone Worker | 8081 | Web UI | spark.worker.ui.port / SPARK_WORKER_WEBUI_PORT | Jetty-based. Standalone mode only. |
| Driver / Standalone Worker | Standalone Master | 7077 | Submit job to cluster / Join cluster | SPARK_MASTER_PORT | Akka-based. Set to 0 to choose a port randomly. Standalone mode only. |
| Standalone Master | Standalone Worker | (random) | Schedule executors | SPARK_WORKER_PORT | Akka-based. Set to 0 to choose a port randomly. Standalone mode only. |
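
When checking firewall rules against the ports listed above, a simple TCP reachability probe can help confirm that a port is actually open from a given machine. The sketch below is a generic diagnostic written for this article, not part of Spark; the function name is_port_open is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling is_port_open with the master's host name and 7077 from a worker machine indicates whether the worker can reach the master's default port. A False result only shows that the connection failed; it does not distinguish a firewall block from a service that is not running.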

Common ports for all cluster managers

| From | To | Default port | Purpose | Configuration | Notes |
|---|---|---|---|---|---|
| Browser | Application | 4040 | Web UI | spark.ui.port | Jetty-based. |
| Browser | History Server | 18080 | Web UI | spark.history.ui.port | Jetty-based. |
| Executor / Standalone Master | Driver | (random) | Connect to application / Notify executor state changes | spark.driver.port | Akka-based. Set to 0 to choose a port randomly. |
| Driver | Executor | (random) | Schedule tasks | spark.executor.port | Akka-based. Set to 0 to choose a port randomly. |
| Executor | Driver | (random) | File server for files and jars | spark.fileserver.port | Jetty-based. |
| Executor | Driver | (random) | HTTP broadcast | spark.broadcast.port | Jetty-based. Not used by TorrentBroadcast, which sends data through the block manager instead. |
| Executor | Driver | (random) | Class file server | spark.replClassServer.port | Jetty-based. Only used in the Spark shell. |
| Executor / Driver | Executor / Driver | (random) | Block manager port | spark.blockManager.port | Raw socket via ServerSocketChannel. |
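
Behind a strict firewall, the randomly chosen ports above are usually pinned to known values so that matching rules can be written. The following sketch assigns consecutive ports to those properties and renders them as properties-file lines. The helper name render_fixed_ports, the consecutive numbering scheme, and the example base port are assumptions chosen for illustration; only the property names come from the table.

```python
# Properties from the table whose default is a randomly chosen port.
RANDOM_PORT_PROPERTIES = [
    "spark.driver.port",
    "spark.executor.port",
    "spark.fileserver.port",
    "spark.broadcast.port",
    "spark.replClassServer.port",
    "spark.blockManager.port",
]


def render_fixed_ports(base_port):
    """Assign consecutive ports starting at `base_port`; return properties lines."""
    highest_base = 65535 - len(RANDOM_PORT_PROPERTIES) + 1
    if not 1024 <= base_port <= highest_base:
        raise ValueError(f"base_port must be between 1024 and {highest_base}")
    return [
        f"{name} {base_port + offset}"
        for offset, name in enumerate(RANDOM_PORT_PROPERTIES)
    ]


if __name__ == "__main__":
    # 40000 is an arbitrary example base port.
    for line in render_fixed_ports(40000):
        print(line)
```

Fixed ports can collide when several applications share a host, so a layout like this is best treated as a starting point to adapt per cluster.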

For the security configuration parameters, see the Configuration page. For the implementation of the security mechanisms, see org.apache.spark.SecurityManager.