Use Hadoop ACL to control access permissions
I. HDFS Access Control
Enable permission checks and ACLs in hdfs-site.xml:
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
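After restarting the NameNode, you can verify that ACLs are in effect by reading one back; getfacl on the warehouse root (used throughout this article) should list the owner, group, and any extra entries:
hdfs dfs -getfacl /user/hive/warehouse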
Set the default permissions (umask) for newly created files and directories in core-site.xml:
<property>
  <name>fs.permissions.umask-mode</name>
  <value>002</value>
</property>
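With a umask of 002, HDFS creates directories as 775 and files as 664, so members of the same group retain write access. A quick way to confirm, using a throwaway path (/tmp/umask-demo is just an example):
hdfs dfs -mkdir /tmp/umask-demo
hdfs dfs -touchz /tmp/umask-demo/empty
hdfs dfs -ls -d /tmp/umask-demo    # expect drwxrwxr-x
hdfs dfs -ls /tmp/umask-demo       # expect -rw-rw-r--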
The requirements and solutions are as follows:
1. Apart from the data warehouse owner, normal users cannot create databases or tables in the default database.
Change the default permissions of /user/hive/warehouse to 755. With hadoop (or the data warehouse owner) as the owner, no one else can create a database, or a table in the default database (see the command sketch after this list).
2. After the data warehouse owner creates a database, it can be assigned to a project team, which can then create tables in it.
Change the owner of /user/hive/warehouse/database.db to the project team.
3. The data warehouse owner creates a database but does not grant the project team permission to create tables; instead, the owner creates tables for the team and only allows it to insert partitions.
The data warehouse owner keeps the permissions on /user/hive/warehouse/database.db, so the project team cannot create tables. After the data warehouse owner creates a table for the project team, the table directory is handed over to the team.
4. Some tables can only be read and written by the project team.
Change the /user/hive/warehouse/database.db/<table name> directory to 770.
5. Some tables can only be read and written by special users in the project team.
Change the owner of the /user/hive/warehouse/database.db/<table name> directory to this user and the permissions to 700.
6. A table owned by the project team must accept data inserted by specific users from other groups.
Use the following command to grant user mapengxu write permission on table testp1 in the cdntest database:
hdfs dfs -setfacl -R -m user:mapengxu:rwx /user/hive/warehouse/cdntest.db/testp1
7. A table owned by the project team must be readable by specific users from other groups.
hdfs dfs -setfacl -R -m user:mapengxu:r-x /user/hive/warehouse/cdntest.db/testp1
8. A table owned by the project team must be readable by all users in another group.
hdfs dfs -setfacl -R -m group:data_sum:r-x /user/hive/warehouse/cdntest.db/testp1
9. Create a shared database in which all users can create tables, but tables are kept for only 30 days.
Change the permissions of /user/hive/warehouse/database.db to 777, and set up a scheduled task that scans this directory and the Hive database; any table older than 30 days is dropped and its directory deleted (a sketch of such a cleanup appears after this list).
10. Combine the measures above with Hive's basic SQL-based access control.
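A minimal command sketch for requirements 1-5 and the cleanup in requirement 9. The database cdntest.db, the group cdn, the user mapengxu, and the shared database tmpdb.db are illustrative assumptions; substitute your own names:
# 1. Lock down the warehouse root so only hadoop can create databases:
hdfs dfs -chown hadoop:hadoop /user/hive/warehouse
hdfs dfs -chmod 755 /user/hive/warehouse
# 2. Hand a database directory to a project team (group cdn, illustrative):
hdfs dfs -chown -R hadoop:cdn /user/hive/warehouse/cdntest.db
hdfs dfs -chmod -R 775 /user/hive/warehouse/cdntest.db
# 3./4. Hand only a table directory to the team, read/write for the group alone:
hdfs dfs -chown -R hadoop:cdn /user/hive/warehouse/cdntest.db/testp1
hdfs dfs -chmod -R 770 /user/hive/warehouse/cdntest.db/testp1
# 5. Restrict a table to a single user:
hdfs dfs -chown -R mapengxu:cdn /user/hive/warehouse/cdntest.db/testp1
hdfs dfs -chmod -R 700 /user/hive/warehouse/cdntest.db/testp1
# 9. Daily cron job: remove table directories older than 30 days. A real job
# should also issue DROP TABLE in Hive; this only clears the HDFS side
# (GNU date and xargs assumed):
hdfs dfs -ls /user/hive/warehouse/tmpdb.db | \
  awk -v cutoff="$(date -d '30 days ago' +%Y-%m-%d)" 'NF >= 8 && $6 < cutoff {print $8}' | \
  xargs -r -n1 hdfs dfs -rm -r -skipTrash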
II. Task Scheduling
Manage queues by user group, with permissions unified across the portal and Jenkins, and allocate resources by group, so that the cluster resources occupied by each project team can be tallied per day and per week. The mapred-site.xml configuration is as follows:
<property>
  <name>mapred.acls.enabled</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>
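With group.name as the pool name property, a job is routed to the pool named after the submitting user's primary group. You can check which groups (and therefore which pool) a user maps to with:
hdfs groups mapengxu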
The fair-scheduler.xml configuration is as follows:
<?xml version="1.0"?>
<allocations>
  <pool name="cdn">
    <maxResources>1000 vcores</maxResources>
    <maxRunningJobs>10</maxRunningJobs>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </pool>
  <pool name="data_sum">
    <maxResources>1000 vcores</maxResources>
    <maxRunningJobs>10</maxRunningJobs>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </pool>
  <userMaxAppsDefault>2</userMaxAppsDefault>
  <queuePlacementPolicy>
    <rule name="primaryGroup" create="false"/>
    <rule name="secondaryGroupExistingQueue" create="false"/>
    <rule name="user" create="false"/>
    <rule name="reject"/>
  </queuePlacementPolicy>
</allocations>
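Assuming the cluster runs the YARN fair scheduler, you can confirm placement after submitting a job; the queue column in the listing should show the submitter's primary group (e.g. cdn):
yarn application -list -appStates RUNNING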