The cluster management and security mechanism of Hadoop


HDFS Data Management

1. Set the metadata and data storage paths through properties such as dfs.name.dir, dfs.data.dir, fs.checkpoint.dir (Hadoop 1.x) or hadoop.tmp.dir, dfs.namenode.name.dir, dfs.namenode.edits.dir, dfs.datanode.data.dir (Hadoop 2.x);
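For example, a minimal hdfs-site.xml sketch for the Hadoop 2.x properties above (the paths are illustrative and must point at directories that exist on your nodes):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hadoop/name</value>    <!-- hypothetical path: NameNode metadata (fsimage) -->
</property>
<property>
  <name>dfs.namenode.edits.dir</name>
  <value>/data/hadoop/edits</value>   <!-- hypothetical path: NameNode edit log -->
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hadoop/data</value>    <!-- hypothetical path: DataNode block storage -->
</property>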

2. Regularly run fsck, the HDFS filesystem checking tool, e.g. hdfs fsck /liguodong -files -blocks;

[root@slave1 mapreduce]# hdfs fsck /input
Connecting to namenode via http://slave1:50070
FSCK started by root (auth:SIMPLE) from /172.23.253.22 for path /input at Tue Jun ... CST
.Status: HEALTHY
 Total size:    ... B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size ... B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Jun ... CST in 1 milliseconds

The filesystem under path '/input' is HEALTHY

3. Once a data anomaly is found, the NameNode can be put into safe mode, in which the namespace is read-only;
Operation command: hdfs dfsadmin -safemode enter | leave | get | wait
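A minimal sketch of the safe-mode subcommands (run as the HDFS administrator):

hdfs dfsadmin -safemode enter   # enter safe mode: the namespace becomes read-only
hdfs dfsadmin -safemode get     # report whether safe mode is ON or OFF
hdfs dfsadmin -safemode wait    # block until safe mode is left
hdfs dfsadmin -safemode leave   # leave safe mode and resume normal operation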

[root@slave1 mapreduce]# hdfs dfsadmin -report
Configured Capacity: 52844687360 (49.22 GB)
Present Capacity: 45767090176 (42.62 GB)
DFS Remaining: 45766246400 (42.62 GB)
DFS Used: 843776 (824 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 172.23.253.22:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 52844687360 (49.22 GB)
DFS Used: 843776 (824 KB)
Non DFS Used: 7077597184 (6.59 GB)
DFS Remaining: 45766246400 (42.62 GB)
DFS Used%: 0.00%
DFS Remaining%: 86.61%
Last contact: Tue Jun ... CST

[root@slave1 mapreduce]# hdfs dfsadmin -safemode get
Safe mode is OFF

4. Each DataNode runs a block-scanner thread that detects corrupt or lost blocks so that they can be reported and repaired; the scan period is set through the property
dfs.datanode.scan.period.hours, which defaults to 504 hours (three weeks).
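As a sketch, the period can be changed in hdfs-site.xml (the value below is illustrative; 504 hours is the default):

<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>168</value>   <!-- illustrative: scan every week instead of every three weeks -->
</property>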

MapReduce Job Management

View job information: mapred job -list;

Kill a job: mapred job -kill <job-id>;

View a summary of the history logs under a given output path: mapred job -history <output-dir>;

Print the map and reduce completion percentages and all job counters: mapred job -status <job-id>;

[root@slave1 mapreduce]# mapred job
Usage: CLI <command> <args>
    [-submit <job-file>]
    [-status <job-id>]
    [-counter <job-id> <group-name> <counter-name>]
    [-kill <job-id>]
    [-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
    [-events <job-id> <from-event-#> <#-of-events>]
    [-history <jobHistoryFile>]
    [-list [all]]
    [-list-active-trackers]
    [-list-blacklisted-trackers]
    [-list-attempt-ids <job-id> <task-type> <task-state>]. Valid values for <task-type> are REDUCE MAP. Valid values for <task-state> are running, completed
    [-kill-task <task-attempt-id>]
    [-fail-task <task-attempt-id>]
    [-logs <job-id> <task-attempt-id>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines

[root@slave1 mapreduce]# mapred job -list
... INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total jobs:0
                  JobId      State      StartTime     UserName        Queue   Priority  UsedContainers  RsvdContainers  UsedMem  RsvdMem  NeededMem  AM info
Hadoop Cluster Security

Hadoop ships with two security mechanisms: the simple mechanism and the Kerberos mechanism.

1. Simple mechanism:
The simple mechanism combines the JAAS protocol with delegation tokens; JAAS (Java Authentication and Authorization Service) is Java's authentication and authorization service;

(1) When a user submits a job, the JobTracker verifies the user's identity: it first checks whether the person executing the current code matches the user recorded in user.name of the JobConf;

(2) It then checks the ACL (Access Control List) configuration file (maintained by the administrator) to see whether the user has permission to submit the job. Once validation passes, the user obtains delegation tokens granted by HDFS or MapReduce (different modules are accessed with different delegation tokens); every subsequent operation, such as accessing a file, checks that the token exists and that the user matches the one who originally registered the token.
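As a hedged sketch (Hadoop 2.x property names; the user and group names are hypothetical), such ACLs can be configured in mapred-site.xml:

<property>
  <name>mapreduce.cluster.acls.enabled</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.job.acl-view-job</name>
  <value>alice,bob ops</value>   <!-- hypothetical: users first, then a space, then groups -->
</property>
<property>
  <name>mapreduce.job.acl-modify-job</name>
  <value>alice</value>           <!-- hypothetical: only alice may modify or kill the job -->
</property>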

2. Kerberos mechanism:
The Kerberos mechanism is an authentication method based on a trusted authentication server;

Principal (security entity): an authenticated individual, which has a name and a password;

KDC (Key Distribution Center): a network service that issues tickets and temporary session keys;

Ticket: a record with which a client proves its identity to a server, containing the client's identity, a session key, and a timestamp;

AS (Authentication Server): the server that performs the initial authentication;

TGS (Ticket Granting Server): the server that issues tickets for services;


(1) The client sends the TGT it obtained earlier, together with information about the requested service (service name, etc.), to the KDC.
The Ticket Granting Service in the KDC generates a session key to be shared between the client and the service, which the service will use to authenticate the client.
The KDC then packages this session key together with the user name, user address (IP), service name, validity period, and timestamp into a ticket (ultimately used by the service to authenticate the client) destined for the service.
However, the Kerberos protocol does not send the ticket to the service directly; it delivers it through the client, hence the second step.
(2) The KDC now forwards the ticket to the client.
Since the ticket is meant for the service, the client must not be able to read it, so before sending it the KDC encrypts the ticket with the key it shares with the service (established before the protocol starts).
So that the client and the service can share a key (the session key the KDC created for them in the first step),
the KDC also encrypts the session key with the key it shares with the client and returns it to the client along with the encrypted ticket.
(3) To complete the delivery of the ticket, the client forwards the ticket it just received to the service.
Because the client does not know the key shared between the KDC and the service, it cannot read or alter the information inside the ticket.
At the same time, the client decrypts the session key it received, packages its user name and user address (IP) into an authenticator, encrypts the authenticator with the session key, and sends it to the service as well.
(4) The service receives the ticket and decrypts it with the key it shares with the KDC, obtaining the session key as well as the user name, user address (IP), service name, and validity period.
It then uses the session key to decrypt the authenticator, and compares the user name and user address (IP) found there with those decrypted from the ticket, thereby verifying the client's identity.
(5) If the service has a result to return, it returns it to the client.
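In practice this exchange is driven by the standard Kerberos client tools; a minimal sketch, assuming a Kerberized cluster and a hypothetical principal alice@EXAMPLE.COM:

kinit alice@EXAMPLE.COM      # AS exchange: obtain a TGT from the KDC
klist                        # inspect the cached TGT and service tickets
hdfs dfs -ls /user/alice     # the Hadoop client performs the TGS exchange transparently
kdestroy                     # discard the cached credentials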

Using Kerberos for authentication within a Hadoop cluster

Benefits:
Reliable: Hadoop itself has no authentication mechanism and no way to manage user accounts and groups, so it relies on an external, perimeter authentication system;
Efficient: Kerberos uses symmetric-key cryptography, which is faster than SSL's public-key operations;
Easy to operate: users do not need complicated commands; for example, revoking a user only requires deleting the principal from the Kerberos KDC database.
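For instance, the revocation just mentioned is a single command against the KDC; a sketch with hypothetical principal names:

kadmin -p admin/admin@EXAMPLE.COM -q "delete_principal alice@EXAMPLE.COM"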

HDFS Security

1. When the client passes the NameNode's initial access authentication (using Kerberos), it obtains a delegation token, which serves as the credential for subsequent HDFS access or job submission;
2. Likewise, in order to read a file, the client first interacts with the NameNode to obtain the block access token for each block of the file,
then reads each block from the corresponding DataNode.
DataNodes registered with the NameNode at startup and have obtained these tokens in advance;
when the client asks a DataNode to read a block, the token is validated first, and only then is the read allowed.
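A hedged sketch of working with delegation tokens from the shell (the principal and paths are hypothetical):

kinit alice@EXAMPLE.COM                         # initial Kerberos authentication
hdfs fetchdt --renewer alice /tmp/alice.token   # store an HDFS delegation token in a file
HADOOP_TOKEN_FILE_LOCATION=/tmp/alice.token \
hdfs dfs -ls /user/alice                        # this access authenticates with the token, not Kerberos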

MapReduce Security

1. All job submission and job-status tracking is carried out over Kerberos-authenticated RPC.
When an authorized user submits a job, the JobTracker generates a delegation token, which is stored on HDFS as part of the job and distributed to the TaskTrackers via RPC; the token becomes invalid once the job finishes.
2. Every task of a submitted job is started under the submitting user's own account, so one user's task cannot send operating-system signals to the TaskTracker or to other users' tasks and interfere with them. This requires an account for every user on all TaskTrackers;
3. When a map task finishes, it reports the location of its results to the TaskTracker that manages it, and each reduce task requests the piece of data it needs from that TaskTracker via HTTP. Hadoop must ensure that other users cannot fetch a map task's intermediate output:
the reduce task computes an HMAC-SHA1 value over the request URL and the current time and sends this value as part of the request; the TaskTracker validates it when the request is received.
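A hedged sketch of the idea (not Hadoop's actual code path), computing an HMAC-SHA1 over the request URL plus a timestamp with a shared secret, here via openssl:

URL="http://tasktracker:50060/mapOutput?job=job_001&map=attempt_001_m_000000_0&reduce=0"   # hypothetical shuffle URL
TS=$(date +%s)                                                             # current time
printf '%s%s' "$URL" "$TS" | openssl dgst -sha1 -hmac "$SHUFFLE_SECRET"    # SHUFFLE_SECRET is an assumed shared secret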

