Overview
All hadoop commands are invoked by the bin/hadoop script. Running the hadoop script without any arguments prints the description for all commands.
Usage: hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
Hadoop has an option parsing framework that parses generic options as well as running classes.
COMMAND_OPTION | Description
--config confdir | Overwrites the default configuration directory. The default is ${HADOOP_HOME}/conf.
GENERIC_OPTIONS, COMMAND_OPTIONS | The common set of options supported by multiple commands. The commands and their options are described in the following sections, grouped into user commands and administrator commands.
Generic Options
dfsadmin, fs, fsck, job and fetchdt all support the generic options below. Applications should implement the Tool interface to support generic options parsing.
GENERIC_OPTION | Description
-conf <configuration file> | Specify an application configuration file.
-D <property>=<value> | Use the given value for the given property.
-jt <local> or <jobtracker:port> | Specify a job tracker. Applies only to job.
-files <comma separated list of files> | Specify comma-separated files to be copied to the map reduce cluster. Applies only to job.
-libjars <comma separated list of jars> | Specify comma-separated jar files to include in the classpath. Applies only to job.
-archives <comma separated list of archives> | Specify comma-separated archives to be unarchived on the compute machines. Applies only to job.
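For example, generic options are placed before the command's own options. A sketch using the fs command; the configuration file name and path are hypothetical:
hadoop fs -conf alt-core-site.xml -D dfs.replication=2 -ls /user/hadoop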
User Commands
Commands useful for users of a hadoop cluster.
Archive
Creates a hadoop archive. More information can be found in the Hadoop Archives guide.
Usage: hadoop archive -archivename NAME <src>* <dest>
COMMAND_OPTION | Description
-archivename NAME | Name of the archive to be created.
src | Filesystem pathnames, which work as usual with regular expressions.
dest | Destination directory that will contain the archive.
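For example, following the usage above with hypothetical source and destination paths:
hadoop archive -archivename foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/hadoop/archives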
Distcp
Copies files or directories recursively. More information can be found in the Hadoop DistCp Guide.
Usage: hadoop distcp <srcurl> <desturl>
COMMAND_OPTION | Description
srcurl | Source URL.
desturl | Destination URL.
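For example, copying between two clusters with hypothetical namenode addresses:
hadoop distcp hdfs://nn1:8020/user/hadoop/input hdfs://nn2:8020/user/hadoop/input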
FS
Usage: hadoop fs [GENERIC_OPTIONS] [COMMAND_OPTIONS]
Deprecated; use hdfs dfs instead.
Runs a generic filesystem user client.
The various COMMAND_OPTIONS can be found in the File System Shell Guide.
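For example, listing a hypothetical home directory:
hadoop fs -ls /user/hadoop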
Fsck
Runs an HDFS filesystem checking utility. For more information, see fsck.
Usage: hadoop fsck [GENERIC_OPTIONS] <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
COMMAND_OPTION | Description
path | Start checking from this path.
-move | Move corrupted files to /lost+found.
-delete | Delete corrupted files.
-openforwrite | Print out files opened for write.
-files | Print out files being checked.
-blocks | Print out the block report.
-locations | Print out locations for every block.
-racks | Print out the network topology for data-node locations.
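For example, checking a hypothetical directory and printing its files, blocks and block locations:
hadoop fsck /user/hadoop -files -blocks -locations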
Fetchdt
Gets the delegation token from a namenode. For more information, see fetchdt.
Usage: hadoop fetchdt [GENERIC_OPTIONS] [--webservice <namenode_http_addr>] <path>
COMMAND_OPTION | Description
fileName | File name to store the token into.
--webservice <https_address> | Use the HTTP protocol instead of RPC.
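For example, fetching a token over HTTP from a hypothetical namenode address into a local file:
hadoop fetchdt --webservice http://nn1:50070 /tmp/my.token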
Jar
Runs a jar file. Users can bundle their map reduce code in a jar file and execute it using this command.
Usage: hadoop jar <jar> [mainClass] args...
Streaming jobs are run via this command. Examples can be found in the streaming examples.
The word count example is also run using the jar command. It can be found in the wordcount example.
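For example, running a word count job; the jar name and paths below are hypothetical:
hadoop jar hadoop-examples.jar wordcount /user/hadoop/input /user/hadoop/output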
Job
Command to interact with map reduce jobs.
Usage: hadoop job [GENERIC_OPTIONS] [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]
COMMAND_OPTION | Description
-submit <job-file> | Submits the job.
-status <job-id> | Prints the map and reduce completion percentage and all job counters.
-counter <job-id> <group-name> <counter-name> | Prints the counter value.
-kill <job-id> | Kills the job.
-events <job-id> <from-event-#> <#-of-events> | Prints the events' details received by the jobtracker for the given range.
-history [all] <jobOutputDir> | Prints job details, and failed and killed tip details. More details about the job, such as successful tasks and task attempts made for each task, can be viewed by specifying the [all] option.
-list [all] | Displays jobs which are yet to complete. -list all displays all jobs.
-kill-task <task-id> | Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task <task-id> | Fails the task. Failed tasks are counted against failed attempts.
-set-priority <job-id> <priority> | Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.
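For example, with a hypothetical job id:
hadoop job -status job_201401011200_0001
hadoop job -set-priority job_201401011200_0001 HIGH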
Pipes
Runs a pipes job.
Usage: hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]
COMMAND_OPTION | Description
-conf <path> | Configuration for the job.
-jobconf <key=value>, <key=value>, ... | Add/override configuration for the job.
-input <path> | Input directory.
-output <path> | Output directory.
-jar <jar file> | Jar filename.
-inputformat <class> | InputFormat class.
-map <class> | Java Map class.
-partitioner <class> | Java Partitioner.
-reduce <class> | Java Reduce class.
-writer <class> | Java RecordWriter.
-program <executable> | Executable URI.
-reduces <num> | Number of reduces.
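For example, a sketch with a hypothetical configuration file, hypothetical directories, and a C++ executable assumed to be already uploaded to the cluster:
hadoop pipes -conf word.xml -input in-dir -output out-dir -program bin/wordcount-executable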
Queue
Command to interact with and view job queue information.
Usage: hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]
COMMAND_OPTION | Description
-list | Gets the list of job queues configured in the system, along with the scheduling information associated with them.
-info <job-queue-name> [-showJobs] | Displays the queue information and associated scheduling information of the given job queue. If the -showJobs option is present, a list of jobs submitted to that queue is also displayed.
-showacls | Displays the queue name and the queue operations allowed for the current user. The list contains only those queues to which the user has access.
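For example, inspecting a queue assumed to be named default:
hadoop queue -info default -showJobs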
Version
Print the hadoop version.
Usage: hadoop version
CLASSNAME
The hadoop script can be used to invoke any class.
Usage: hadoop CLASSNAME
Runs the class named CLASSNAME.
Classpath
Prints the class path needed to get the hadoop jar and the required libraries.
Usage: hadoop classpath
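For example, the output can be used to compile code against the hadoop libraries; MyJob.java is hypothetical:
javac -classpath $(hadoop classpath) MyJob.java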
Administration Commands
Commands useful for administrators of a hadoop cluster.
Balancer
Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. For more information, see the Rebalancer.
Usage: hadoop balancer [-threshold <threshold>]
COMMAND_OPTION | Description
-threshold <threshold> | Percentage of disk capacity. This overwrites the default threshold.
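For example, balancing until each datanode's utilization is within 5% of the cluster average (the threshold value is illustrative):
hadoop balancer -threshold 5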
Daemonlog
Gets or sets the log level for each daemon.
Usage: hadoop daemonlog -getlevel <host:port> <name>
Usage: hadoop daemonlog -setlevel <host:port> <name> <level>
COMMAND_OPTION | Description
-getlevel <host:port> <name> | Prints the log level of the daemon running at <host:port>. This command internally connects to http://<host:port>/logLevel?log=<name>.
-setlevel <host:port> <name> <level> | Sets the log level of the daemon running at <host:port>. This command internally connects to http://<host:port>/logLevel?log=<name>.
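For example, raising the namenode's log level on a hypothetical host; the logger name is the daemon's class name:
hadoop daemonlog -setlevel nn1:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG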
Datanode
Start an HDFS datanode.
Usage: hadoop datanode [-rollback]
COMMAND_OPTION | Description
-rollback | Rolls back the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version.
Dfsadmin
Runs an HDFS dfsadmin client.
Usage: hadoop dfsadmin [GENERIC_OPTIONS] [-report] [-safemode enter | leave | get | wait] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-restoreFailedStorage true | false | check] [-help [cmd]]
COMMAND_OPTION | Description
-report | Reports basic filesystem information and statistics.
-safemode enter/leave/get/wait | Safe mode maintenance command. Safe mode is a namenode state in which it 1. does not accept changes to the name space (read-only) and 2. does not replicate or delete blocks. Safe mode is entered automatically at namenode startup, and is left automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be left manually as well.
-refreshNodes | Re-reads the hosts and exclude files to update the set of datanodes that are allowed to connect to the namenode and those that should be decommissioned or recommissioned.
-finalizeUpgrade | Finalizes an upgrade of HDFS. Datanodes delete their previous version working directories, followed by the namenode doing the same. This completes the upgrade process.
-upgradeProgress status/details/force | Requests the current distributed upgrade status, a detailed status, or forces the upgrade to proceed.
-metasave filename | Saves the namenode's primary data structures to <filename> in the directory specified by the hadoop.log.dir property. <filename> is overwritten if it exists. <filename> will contain one line for each of the following: 1. datanodes heartbeating with the namenode, 2. blocks waiting to be replicated, 3. blocks currently being replicated, 4. blocks waiting to be deleted.
-setQuota <quota> <dirname>...<dirname> | Sets the quota <quota> for each directory <dirname>. The directory quota is a long integer that puts a hard limit on the number of names in the directory tree. Best effort for the directory, with faults reported if 1. the user is not an administrator, 2. N is not a positive integer, 3. the directory does not exist or is a file, or 4. the directory would immediately exceed the new quota.
-clrQuota <dirname>...<dirname> | Clears the quota for each directory <dirname>. Best effort for the directory, with faults reported if 1. the directory does not exist or is a file, or 2. the user is not an administrator. It does not fault if the directory has no quota.
-restoreFailedStorage true/false/check | This option will turn on/off the automatic attempt to restore failed storage replicas. If a failed storage becomes available again, the system will attempt to restore edits and/or fsimage during a checkpoint. 'check' will return the current setting.
-help [cmd] | Displays help for the given command, or all commands if none is specified.
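For example, printing cluster status and setting a quota on a hypothetical directory:
hadoop dfsadmin -report
hadoop dfsadmin -setQuota 1000000 /user/hadoop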
Mradmin
Runs an MR admin client.
Usage: hadoop mradmin [GENERIC_OPTIONS] [-refreshQueueAcls]
COMMAND_OPTION | Description
-refreshQueueAcls | Refreshes the queue ACLs used by hadoop.
Jobtracker
Runs the MapReduce job tracker node.
Usage: hadoop jobtracker [-dumpConfiguration]
COMMAND_OPTION | Description
-dumpConfiguration | Dumps the configuration used by the jobtracker, along with the queue configuration, in JSON format to standard output, and then exits.
Namenode
Runs the namenode. For more information about upgrade, rollback, and finalize, see Upgrade and Rollback.
Usage: hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]
COMMAND_OPTION | Description
-format | Formats the namenode. It starts the namenode, formats it, and then shuts it down.
-upgrade | The namenode should be started with the upgrade option after a new hadoop version has been distributed.
-rollback | Rolls back the namenode to the previous version. This should be used after stopping the cluster and distributing the old hadoop version.
-finalize | Finalize removes the previous state of the filesystem. The most recent upgrade becomes permanent and the rollback option is no longer available. After finalizing, it shuts the namenode down.
-importCheckpoint | Loads the image from a checkpoint directory and saves it into the current one. The checkpoint directory is read from the fs.checkpoint.dir property.
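For example, formatting a newly installed namenode before its first start (destructive on an existing cluster):
hadoop namenode -format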
Secondarynamenode
Runs the HDFS secondary namenode. For more information, see Secondary Namenode.
Usage: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]
COMMAND_OPTION | Description
-checkpoint [force] | Checkpoints the secondary namenode if EditLog size >= fs.checkpoint.size. If -force is used, checkpoints regardless of the EditLog size.
-geteditsize | Prints the EditLog size.
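For example, forcing a checkpoint regardless of the EditLog size:
hadoop secondarynamenode -checkpoint force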
Tasktracker
Runs a MapReduce tasktracker node.
Usage: hadoop tasktracker
Apache Hadoop 2.4.1 Commands Reference