Apache Hadoop 2.4.1 Command Reference

Overview

All Hadoop commands are invoked by the bin/hadoop script. Running the hadoop script without any arguments prints the description for all commands.

Usage: hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop has an option parsing framework that handles generic options as well as the arguments passed to the running class.

COMMAND_OPTION    Description
--config confdir    Overwrites the default configuration directory. The default is $HADOOP_HOME/conf.
GENERIC_OPTIONS    The common set of options supported by multiple commands.
COMMAND COMMAND_OPTIONS    Various commands with their options are described in the following sections. The commands are grouped into user commands and administration commands.
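
For example, a hypothetical invocation that reads its configuration from an alternate directory (/etc/hadoop-test is a placeholder path):

hadoop --config /etc/hadoop-test version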

Generic Options

The dfsadmin, fs, fsck, job and fetchdt commands all support the options below. Applications must implement the Tool interface in order to support the parsing of generic options.

GENERIC_OPTION    Description
-conf <configuration file>    Specify an application configuration file.
-D <property>=<value>    Set a value for the given property.
-jt <local> or <jobtracker:port>    Specify a job tracker. Applies only to job.
-files <comma separated list of files>    Specify comma-separated files to be copied to the MapReduce cluster. Applies only to job.
-libjars <comma separated list of jars>    Specify comma-separated jar files to include in the classpath. Applies only to job.
-archives <comma separated list of archives>    Specify comma-separated archives to be unarchived on the compute machines. Applies only to job.
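
For example, a hypothetical job submission that ships a side file and an extra jar with the job (app.jar, MyJob, lookup.txt and extra.jar are placeholder names, and MyJob is assumed to parse its arguments through ToolRunner):

hadoop jar app.jar MyJob -files lookup.txt -libjars extra.jar /input /output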

User Commands

Commands useful for users of a Hadoop cluster.

Archive

Creates a Hadoop archive. More information can be found in the Hadoop Archives guide.

Usage: hadoop archive -archiveName NAME <src>* <dest>

COMMAND_OPTION    Description
-archiveName NAME    Name of the archive to be created.
src    Filesystem pathnames, which work as usual with regular expressions.
dest    Destination directory that will contain the archive.
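
For example, a hypothetical invocation following the usage above (the logs.har name and /user/hadoop paths are placeholders):

hadoop archive -archiveName logs.har /user/hadoop/logs /user/hadoop/archived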

Distcp

Copies files or directories recursively. More information can be found in the Hadoop DistCp guide.

Usage: hadoop distcp <srcurl> <desturl>

COMMAND_OPTION    Description
srcurl    Source URL.
desturl    Destination URL.
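
For example, a hypothetical copy between two clusters (both hdfs:// URLs are placeholders):

hadoop distcp hdfs://nn1:8020/user/hadoop/src hdfs://nn2:8020/user/hadoop/dest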

FS

Usage: hadoop fs [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Deprecated; use hdfs dfs instead.

Runs a generic filesystem user client.

The various COMMAND_OPTIONS can be found in the File System Shell Guide.
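
For example, listing the contents of a directory (the /user/hadoop path is a placeholder):

hadoop fs -ls /user/hadoop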

Fsck

Runs the HDFS filesystem checking utility. See fsck for more information.

Usage: hadoop fsck [GENERIC_OPTIONS] <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]

COMMAND_OPTION    Description
path    Start checking from this path.
-move    Move corrupted files to /lost+found.
-delete    Delete corrupted files.
-openforwrite    Print out files opened for write.
-files    Print out files being checked.
-blocks    Print out the block report.
-locations    Print out locations for every block.
-racks    Print out the network topology for data-node locations.
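
For example, checking the whole namespace and printing per-file block details:

hadoop fsck / -files -blocks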

Fetchdt

Gets the delegation token from the NameNode. See fetchdt for more information.

Usage: hadoop fetchdt [GENERIC_OPTIONS] [--webservice <namenode_http_addr>] <path>

COMMAND_OPTION    Description
fileName    File name to store the token into.
--webservice https_address    Use the HTTP protocol instead of RPC.
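
For example, a hypothetical invocation that fetches a token over HTTP (the NameNode address and output path are placeholders):

hadoop fetchdt --webservice http://namenode:50070 /tmp/my.token
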
Jar

Runs a jar file. Users can bundle their MapReduce code in a jar file and execute it with this command.

Usage: hadoop jar <jar> [mainClass] args...

Streaming jobs are run via this command. Examples can be found in the Streaming examples.

The word count example can also be run using the jar command; see the WordCount example.
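
For example, the word count example might be run as follows (the jar location varies by installation; in a 2.4.1 distribution the examples jar typically ships under share/hadoop/mapreduce, and the input/output paths are placeholders):

hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input /user/hadoop/output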

Job

Command to interact with MapReduce jobs.

Usage: hadoop job [GENERIC_OPTIONS] [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]

COMMAND_OPTION    Description
-submit job-file    Submits the job.
-status job-id    Prints the map and reduce completion percentage and all job counters.
-counter job-id group-name counter-name    Prints the counter value.
-kill job-id    Kills the job.
-events job-id from-event-# #-of-events    Prints the events' details received by the JobTracker for the given range.
-history [all] jobOutputDir    Prints job details, plus failed and killed tip details. More details about the job, such as successful tasks and the task attempts made for each task, can be viewed by specifying the [all] option.
-list [all]    Displays jobs which are yet to complete. -list all displays all jobs.
-kill-task task-id    Kills the task. Killed tasks are NOT counted against failed attempts.
-fail-task task-id    Fails the task. Failed tasks ARE counted against failed attempts.
-set-priority job-id priority    Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.
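
For example, checking on a hypothetical running job (the job id is a placeholder):

hadoop job -status job_201404011030_0001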

Pipes

Runs a pipes job.

Usage: hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]

COMMAND_OPTION    Description
-conf path    Configuration for the job.
-jobconf key=value, key=value, ...    Add/override configuration for the job.
-input path    Input directory.
-output path    Output directory.
-jar jar file    Jar filename.
-inputformat class    InputFormat class.
-map class    Java Map class.
-partitioner class    Java Partitioner class.
-reduce class    Java Reduce class.
-writer class    Java RecordWriter class.
-program executable    Executable URI.
-reduces num    Number of reduces.
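
For example, a hypothetical pipes job driven by a C++ executable already uploaded to HDFS (the program and data paths are placeholders; real jobs typically also pass -jobconf settings such as hadoop.pipes.java.recordreader=true):

hadoop pipes -program /user/hadoop/bin/wordcount-pipes -input /user/hadoop/input -output /user/hadoop/output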

Queue

Command to interact with and view Hadoop job queue information.

Usage: hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

COMMAND_OPTION    Description
-list    Gets the list of job queues configured in the system, along with the associated scheduling information.
-info job-queue-name [-showJobs]    Displays the queue information and associated scheduling information of the given job queue. If the -showJobs option is present, a list of jobs submitted to that queue is also displayed.
-showacls    Displays the queue names and the queue operations allowed for the current user. The list contains only those queues to which the user has access.
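
For example, inspecting a hypothetical queue named default together with its submitted jobs:

hadoop queue -info default -showJobs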

Version

Prints the Hadoop version.

Usage: hadoop version

Classname

The hadoop script can be used to invoke any class.

Usage: hadoop CLASSNAME

Runs the class named CLASSNAME.
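
For example, running one of Hadoop's own utility classes directly:

hadoop org.apache.hadoop.util.VersionInfo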

Classpath

Prints the class path needed to get the Hadoop jar and the required libraries.

Usage: hadoop classpath

Administration Commands

Commands useful for administrators of a Hadoop cluster.

Balancer

Runs the cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See Rebalancer for more details.

Usage: hadoop balancer [-threshold <threshold>]

COMMAND_OPTION    Description
-threshold threshold    Percentage of disk capacity. This overwrites the default threshold.
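
For example, rebalancing until every datanode's utilization is within 5% of the cluster average:

hadoop balancer -threshold 5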

Daemonlog

Gets or sets the log level for each daemon.

Usage: hadoop daemonlog -getlevel <host:port> <name>

Usage: hadoop daemonlog -setlevel <host:port> <name> <level>

COMMAND_OPTION    Description
-getlevel host:port name    Prints the log level of the daemon running at host:port. This command internally connects to http://host:port/logLevel?log=name.
-setlevel host:port name level    Sets the log level of the daemon running at host:port. This command internally connects to http://host:port/logLevel?log=name.
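
For example, a hypothetical call that turns on DEBUG logging for the datanode daemon on one host (the host name and port are placeholders; 50075 is the default datanode HTTP port):

hadoop daemonlog -setlevel datanode1:50075 org.apache.hadoop.hdfs.server.datanode.DataNode DEBUG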

Datanode

Runs an HDFS datanode.

Usage: hadoop datanode [-rollback]

COMMAND_OPTION    Description
-rollback    Rolls back the datanode to the previous version. This should be used after stopping the datanode and distributing the old Hadoop version.

Dfsadmin

Runs an HDFS dfsadmin client.

Usage: hadoop dfsadmin [GENERIC_OPTIONS] [-report] [-safemode enter | leave | get | wait] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-restoreFailedStorage true | false | check] [-help [cmd]]

COMMAND_OPTION    Description
-report    Reports basic filesystem information and statistics.
-safemode enter | leave | get | wait    Safe mode maintenance command. Safe mode is a NameNode state in which it
1. does not accept changes to the name space (read-only), and
2. does not replicate or delete blocks.
Safe mode is entered automatically at NameNode startup, and is left automatically when the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be left manually as well.

-refreshNodes    Re-reads the hosts and exclude files to update the set of datanodes that are allowed to connect to the NameNode and those that should be decommissioned or recommissioned.
-finalizeUpgrade    Finalizes an upgrade of HDFS. Datanodes delete their previous-version working directories, after which the NameNode does the same. This completes the upgrade process.
-upgradeProgress status | details | force    Requests the current distributed upgrade status, a detailed status, or forces the upgrade to proceed.
-metasave filename    Saves the NameNode's primary data structures to filename in the directory specified by the hadoop.log.dir property. filename is overwritten if it exists. filename will contain one line for each of the following:
1. datanodes heartbeating with the NameNode
2. blocks waiting to be replicated
3. blocks currently being replicated
4. blocks waiting to be deleted

-setQuota quota dirname...dirname    Sets the quota for each directory dirname. The directory quota is a long integer that puts a hard limit on the number of names in the directory tree. Best effort for each directory; a fault is reported if
1. the user is not an administrator,
2. N is not a positive integer,
3. the directory does not exist or is a file, or
4. the directory would immediately exceed the new quota.

-clrQuota dirname...dirname    Clears the quota for each directory dirname. Best effort for each directory; a fault is reported if
1. the directory does not exist or is a file, or
2. the user is not an administrator.
It does not fault if the directory has no quota.

-restoreFailedStorage true | false | check    Turns automatic attempts to restore failed storage replicas on or off. If a failed storage location becomes available again, the system attempts to restore edits and/or fsimage during a checkpoint. The check option returns the current setting.
-help [cmd]    Displays help for the given command, or for all commands if none is specified.
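
For example, two common read-only checks:

hadoop dfsadmin -report
hadoop dfsadmin -safemode get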

Mradmin

Runs an MR admin client.

Usage: hadoop mradmin [GENERIC_OPTIONS] [-refreshQueueAcls]

COMMAND_OPTION    Description
-refreshQueueAcls    Refreshes the queue ACLs used by Hadoop to check access during job submission and administration.

Jobtracker

Runs the MapReduce JobTracker node.

Usage: hadoop jobtracker [-dumpConfiguration]

COMMAND_OPTION    Description
-dumpConfiguration    Dumps the configuration used by the JobTracker, along with the queue configuration, in JSON format to standard output, and then exits.

Namenode

Runs the NameNode. For more information about upgrade, rollback and finalize, see Upgrade and Rollback.

Usage: hadoop namenode [-format] [-upgrade] [-rollback] [-finalize] [-importCheckpoint]

COMMAND_OPTION    Description
-format    Formats the NameNode. It starts the NameNode, formats it, and then shuts it down.
-upgrade    The NameNode should be started with the upgrade option after distributing a new Hadoop version.
-rollback    Rolls the NameNode back to the previous version. This should be used after stopping the cluster and distributing the old Hadoop version.
-finalize    Removes the previous state of the filesystem. The most recent upgrade becomes permanent, the rollback option is no longer available, and the NameNode is shut down when finished.
-importCheckpoint    Loads an image from a checkpoint directory and saves it into the current one. The checkpoint directory is read from the fs.checkpoint.dir property.
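
For example, formatting a brand-new filesystem before its first start (destructive on an existing cluster, so only use it on fresh installs):

hadoop namenode -format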

Secondarynamenode

Runs the HDFS secondary NameNode. See Secondary NameNode for more information.

Usage: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]

COMMAND_OPTION    Description
-checkpoint [force]    Checkpoints the secondary NameNode if the EditLog size >= fs.checkpoint.size. If -force is used, a checkpoint is taken irrespective of the EditLog size.
-geteditsize    Prints the EditLog size.

Tasktracker

Runs a MapReduce TaskTracker node.

Usage: hadoop tasktracker
