Spark version: 1.1.1
This article is a translation of the official documentation. If you reproduce it, please respect the translator's work and cite the following link:
http://www.cnblogs.com/zhangningbo/p/4135808.html
Directory
- Web UI
- Event Log
- Network Security (Configuring Ports)
- Ports used only in standalone mode
- Ports common to all cluster managers
Spark currently supports authentication via a shared secret. Authentication is enabled with the spark.authenticate parameter, which controls whether Spark's communication protocols authenticate using a shared secret. The authentication is a handshake mechanism ensuring that both sides of a connection hold the same shared secret; if the secrets do not match, the two sides cannot communicate. The shared secret is created as follows:
- In Spark on YARN deployments, setting spark.authenticate to true automatically generates and distributes the shared secret. Each application uses a unique shared secret.
- In other deployment modes, the parameter spark.authenticate.secret should be configured on every node. That one secret is shared by all masters, workers, and applications.
- Note: the experimental Netty shuffle path (spark.shuffle.use.netty) is not secure, so do not use Netty for shuffle if you enable the authentication feature.
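As a concrete illustration, shared-secret authentication could be enabled through spark-defaults.conf as sketched below (the secret value is a placeholder, not a recommended key):

```
# spark-defaults.conf
# On YARN, this setting alone suffices; the secret is generated automatically.
spark.authenticate         true
# In other deployment modes, the same secret must be set on every node.
spark.authenticate.secret  my-shared-secret
```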
Web UI
The Spark UI can be secured by setting the spark.ui.filters parameter to use javax.servlet filters. A user may have data that should not be visible to other users, in which case that user will want the UI to be secured as well. The user specifies a Java servlet filter that performs authentication. Then, once a user has logged in, Spark can consult the ACLs to check whether that user is authorized to view the UI. The parameters spark.acls.enable and spark.ui.view.acls control the ACL behavior. Note: the user who started the application always has access to its UI. On YARN, the Spark UI uses the standard YARN web proxy mechanism and authenticates through any installed Hadoop filters.
Spark also supports modify ACLs to control which users have permission to modify a running Spark application, for example to kill the application or a task. Such operations are controlled by spark.acls.enable and spark.modify.acls. Note: if you are authenticating the web UI, a user must also be in the view ACLs in order to use the kill button on the web UI. On YARN, the modify ACLs are passed in and control which users can make changes through the YARN interface.
Spark also allows administrators to be specified in the ACLs; administrators can always view all applications and modify their permissions. This is controlled by the parameter spark.admin.acls. It is useful on a shared cluster, where there are typically multiple administrators or support staff who help users debug their applications.
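The UI-related ACL parameters above might be combined in spark-defaults.conf as in this sketch (the filter class and user names are hypothetical placeholders):

```
# spark-defaults.conf
spark.ui.filters    com.example.MyAuthFilter   # hypothetical servlet filter class
spark.acls.enable   true
spark.ui.view.acls  alice,bob
spark.modify.acls   alice
spark.admin.acls    admin1
```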
Event Log
If your application uses event logging, you should manually create the directory that will hold the event logs (spark.eventLog.dir) and give it appropriate permissions. To keep the log files secure, the directory permissions should be set to drwxrwxrwxt. The owner of the directory should be the superuser running the history server process, and the group should be restricted to that superuser's group. This lets every user write to the directory while preventing unauthorized users from deleting or renaming files unless they own the file or directory. Spark creates the event log files with permissions such that only the owner and the owner's group have read and write access.
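The permissions described above correspond to mode 1777 (world-writable plus the sticky bit). A minimal sketch, assuming /tmp/spark-events as an example path for spark.eventLog.dir:

```shell
# Example only: the path is an assumption; substitute your own spark.eventLog.dir.
mkdir -p /tmp/spark-events
# 1777 = drwxrwxrwxt: everyone may write, but the sticky bit prevents users
# from deleting or renaming files they do not own.
chmod 1777 /tmp/spark-events
```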
Network Security (Configuring Ports)
Spark makes heavy use of the network, and some environments have strict firewall requirements. Below are the primary ports Spark uses for communication, and how to configure them.
Ports used only in standalone mode
| From | To | Default port | Purpose | Configuration | Notes |
|------|----|--------------|---------|---------------|-------|
| Browser | Standalone Master | 8080 | Web UI | spark.master.ui.port / SPARK_MASTER_WEBUI_PORT | Jetty-based; standalone mode only |
| Browser | Standalone Worker | 8081 | Web UI | spark.worker.ui.port / SPARK_WORKER_WEBUI_PORT | Jetty-based; standalone mode only |
| Driver / Standalone Worker | Standalone Master | 7077 | Submit job to cluster / join cluster | SPARK_MASTER_PORT | Akka-based; set to 0 to pick a port at random; standalone mode only |
| Standalone Master | Standalone Worker | (random) | Schedule executors | SPARK_WORKER_PORT | Akka-based; set to 0 to pick a port at random; standalone mode only |
Ports common to all cluster managers
| From | To | Default port | Purpose | Configuration | Notes |
|------|----|--------------|---------|---------------|-------|
| Browser | Application | 4040 | Web UI | spark.ui.port | Jetty-based |
| Browser | History Server | 18080 | Web UI | spark.history.ui.port | Jetty-based |
| Executor / Standalone Master | Driver | (random) | Connect to application / notify executor state changes | spark.driver.port | Akka-based; set to 0 to pick a port at random |
| Driver | Executor | (random) | Schedule tasks | spark.executor.port | Akka-based; set to 0 to pick a port at random |
| Executor | Driver | (random) | File server for files and jars | spark.fileserver.port | Jetty-based |
| Executor | Driver | (random) | HTTP broadcast | spark.broadcast.port | Jetty-based; TorrentBroadcast does not use this port, it sends data through the block manager |
| Executor | Driver | (random) | Class file server | spark.replClassServer.port | Jetty-based; Spark shell only |
| Executor / Driver | Executor / Driver | (random) | Block manager port | spark.blockManager.port | Raw socket via ServerSocketChannel |
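In a strictly firewalled environment, the random ports above can be pinned to fixed values so that only a known range needs to be opened. A sketch in spark-defaults.conf (the port numbers are arbitrary examples, not recommendations):

```
# spark-defaults.conf
spark.driver.port           51000
spark.executor.port         51001
spark.fileserver.port       51002
spark.broadcast.port        51003
spark.replClassServer.port  51004
spark.blockManager.port     51005
```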
For the security-related configuration parameters, see the Configuration page; for the implementation of the security mechanism, see org.apache.spark.SecurityManager.