Hadoop 2.4.1 learning: viewers for the edits and fsimage files


In Hadoop, edits and fsimage are two crucial files. The edits file stores the namespace changes made since the latest checkpoint and plays the role of a log; fsimage stores the latest checkpoint itself. Neither file can be viewed directly with an ordinary text editor. Fortunately, Hadoop provides dedicated tools for viewing their contents: oev and oiv, both invoked through the hdfs command.

oev is the abbreviation of offline edits viewer. The tool only operates on files and does not require the Hadoop cluster to be running. It provides several output processors, specified with the -p option, that convert an input file to an output file in the corresponding format. The currently supported output formats are binary (the native binary format Hadoop uses), xml (the default when -p is omitted), and stats (which prints statistics about the edits file). The supported input formats are binary and xml, where an xml input is itself the output of a previous run with the xml processor. Because there is no input format corresponding to stats, the conversion to stats is one-way: a binary file can be converted to xml and the xml output converted back to binary by feeding each output in as the next input, but stats output cannot be converted back. The syntax of the tool is:

Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Parse a Hadoop edits log file INPUT_FILE and save results
in OUTPUT_FILE.
Required command line arguments:
-i,--inputFile <arg>   edits file to process, xml (case
                       insensitive) extension means XML format, any
                       other filename means binary format
-o,--outputFile <arg>  Name of output file. If the specified
                       file exists, it will be overwritten,
                       format of the file is determined by -p option
Optional command line arguments:
-p,--processor <arg>   Select which type of processor to apply
                       against image file, currently supported
                       processors are: binary (native binary format
                       that Hadoop uses), xml (default, XML format),
                       stats (prints statistics about edits file)
-h,--help              Display usage information and exit
-f,--fix-txids         Renumber the transaction IDs in the input,
                       so that there are no gaps or invalid
                       transaction IDs.
-r,--recover           When reading binary edit logs, use recovery
                       mode.  This will give you the chance to skip
                       corrupt parts of the edit log.
-v,--verbose           More verbose output, prints the input and
                       output filenames, for processors that write
                       to a file, also output to screen. On large
                       image files this will dramatically increase
                       processing time (default is false).

An example invocation of this tool and part of the resulting output file follow:

$ hdfs oev -i edits_0000000000000000081-0000000000000000089 -o edits.xml
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-56</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_DELETE</OPCODE>
    <DATA>
      <TXID>88</TXID>
      <LENGTH>0</LENGTH>
      <PATH>/user/hive/test</PATH>
      <TIMESTAMP>1413794973949</TIMESTAMP>
      <RPC_CLIENTID>a52277d8-a855-41ee-9ca2-a5d0bc7d298a</RPC_CLIENTID>
      <RPC_CALLID>3</RPC_CALLID>
    </DATA>
  </RECORD>
</EDITS>
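Because the xml processor's output is plain XML, it can be inspected with any XML library. The following is a small sketch (a hypothetical helper, not part of the oev tool) that parses such a dump with Python's standard library and lists the opcode, transaction ID, and path of each record:

```python
import xml.etree.ElementTree as ET

# A trimmed version of the edits.xml dump shown above.
EDITS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-56</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_DELETE</OPCODE>
    <DATA>
      <TXID>88</TXID>
      <PATH>/user/hive/test</PATH>
    </DATA>
  </RECORD>
</EDITS>"""

def list_transactions(xml_text):
    """Return (opcode, txid, path) for every RECORD element."""
    root = ET.fromstring(xml_text)
    out = []
    for record in root.findall("RECORD"):
        opcode = record.findtext("OPCODE")
        txid = record.findtext("DATA/TXID")
        path = record.findtext("DATA/PATH")
        out.append((opcode, txid, path))
    return out

print(list_transactions(EDITS_XML))
# → [('OP_DELETE', '88', '/user/hive/test')]
```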

In the output file, each RECORD element records one operation; in this example the operation is a delete (OP_DELETE). When the edits file is damaged and the Hadoop cluster fails as a result, the intact part of the edits file can still be salvaged: convert the original binary file to XML, manually edit the XML, and then convert it back to binary. The most common damage to an edits file is the loss of the close record (opcode -1), which looks as follows. If the XML file contains no close record, you can add one after the last correct record; all records after a close record are ignored.

<RECORD>
  <OPCODE>-1</OPCODE>
  <DATA>
  </DATA>
</RECORD>
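The manual repair step can be automated. Below is a minimal Python sketch (an illustrative helper, not part of the Hadoop tools) that checks whether the last record of an edits XML dump is a close record and appends one if it is missing:

```python
import xml.etree.ElementTree as ET

def ensure_close_record(xml_text):
    """Append a close record (opcode -1) if the dump lacks one."""
    root = ET.fromstring(xml_text)
    records = root.findall("RECORD")
    if not records or records[-1].findtext("OPCODE") != "-1":
        close = ET.SubElement(root, "RECORD")
        ET.SubElement(close, "OPCODE").text = "-1"
        ET.SubElement(close, "DATA")  # close records carry no data
    return ET.tostring(root, encoding="unicode")

# A truncated dump whose last record is a normal operation, not a close.
truncated = ("<EDITS><EDITS_VERSION>-56</EDITS_VERSION>"
             "<RECORD><OPCODE>OP_DELETE</OPCODE>"
             "<DATA><TXID>88</TXID></DATA></RECORD></EDITS>")
repaired = ensure_close_record(truncated)
print("<OPCODE>-1</OPCODE>" in repaired)  # → True
```

The repaired XML would then be converted back to binary with `hdfs oev -p binary`.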

oiv is the abbreviation of offline image viewer. It dumps the contents of an fsimage file to a specified file in a readable form, and it also provides a read-only WebHDFS API that allows offline analysis and inspection of a Hadoop cluster's namespace. oiv can process very large fsimage files quickly; if it cannot process an fsimage, it exits immediately. The tool is not backward compatible: for example, the Hadoop 2.4 version of oiv cannot handle a Hadoop 2.3 fsimage, which can only be read with the Hadoop 2.3 version of oiv. Like oev, oiv does not require the Hadoop cluster to be running, just as the word offline in its name suggests. You can enter hdfs oiv on the command line to view the tool's syntax.

oiv supports three output processors, Ls, XML, and FileDistribution, which are specified with the -p option. Ls is the default processor. Its output closely resembles that of the lsr command, printing the same fields in the same order: the directory/file flag, permissions, replication count, owner, group, file size, modification date, and full path. One difference from lsr is that the processor's output includes the root path /. Another important difference is that the output is not sorted by directory name and contents; entries appear in fsimage order. Unless the namespace contains little information, it is therefore unlikely that the processor's output can be compared directly with lsr's. Ls computes file sizes from the information in the inode blocks and ignores the -skipBlocks option. Example:

$ hdfs oiv -i fsimage_0000000000000000115 -o fsimage.ls
$ cat fsimage.ls
drwxr-xr-x  -   hadoop supergroup 1412832662162          0 /
drwxr-xr-x  -   hadoop supergroup 1413795010372          0 /user
drwxr-xr-x  -   hadoop supergroup 1414032848858          0 /user/hadoop
drwxr-xr-x  -   hadoop supergroup 1411626881217          0 /user/hadoop/input
drwxr-xr-x  -   hadoop supergroup 1413770138964          0 /user/hadoop/output
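Since the Ls processor emits entries in fsimage order, a small post-processing step is needed before comparing its dump with sorted lsr output. A possible sketch (sorting by the full path, which is the last field on each line; the sample data is assumed):

```python
# A hypothetical post-processing helper: sort an Ls-processor dump by path
# so it can be compared line-by-line with sorted `lsr`-style output.
LS_DUMP = """\
drwxr-xr-x  -   hadoop supergroup 1413795010372          0 /user
drwxr-xr-x  -   hadoop supergroup 1412832662162          0 /
drwxr-xr-x  -   hadoop supergroup 1414032848858          0 /user/hadoop
"""

def sort_by_path(dump):
    lines = [l for l in dump.splitlines() if l.strip()]
    # The full path is the last whitespace-separated field on each line.
    return sorted(lines, key=lambda l: l.split()[-1])

for line in sort_by_path(LS_DUMP):
    print(line)
```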

The XML processor outputs an XML document containing all the information in the fsimage, such as the inode IDs. The processor's output lends itself to automated processing and analysis with XML tools; because of XML's verbose syntax, it is also the largest of the three outputs. Example:

$ hdfs oiv -i fsimage_0000000000000000115 -p XML -o fsimage.xml
$ cat fsimage.xml
<?xml version="1.0"?>
<fsimage>
  <NameSection>
    <genstampV1>1000</genstampV1>
    <genstampV2>1004</genstampV2>
    <genstampV1Limit>0</genstampV1Limit>
    <lastAllocatedBlockId>1073741828</lastAllocatedBlockId>
    <txid>115</txid>
  </NameSection>
  <INodeSection>
    <lastInodeId>16418</lastInodeId>
    <inode>
      <id>16385</id>
      <type>DIRECTORY</type>
      <name></name>
      <mtime>1412832662162</mtime>
      <permission>hadoop:supergroup:rwxr-xr-x</permission>
      <nsquota>9223372036854775807</nsquota>
      <dsquota>-1</dsquota>
    </inode>
    <inode>
      <id>16386</id>
      <type>DIRECTORY</type>
      <name>user</name>
      <mtime>1413795010372</mtime>
      <permission>hadoop:supergroup:rwxr-xr-x</permission>
      <nsquota>-1</nsquota>
      <dsquota>-1</dsquota>
    </inode>
  </INodeSection>
</fsimage>
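As with the edits dump, this output can be processed with standard XML tools. A minimal sketch (an illustrative helper, not part of oiv) that extracts the inodes from the INodeSection; note that mapping the root directory's empty name to "/" is an assumption made for display purposes:

```python
import xml.etree.ElementTree as ET

# A trimmed version of the fsimage.xml dump shown above.
FSIMAGE_XML = """<fsimage><NameSection><txid>115</txid></NameSection>
<INodeSection><lastInodeId>16418</lastInodeId>
<inode><id>16385</id><type>DIRECTORY</type><name></name></inode>
<inode><id>16386</id><type>DIRECTORY</type><name>user</name></inode>
</INodeSection></fsimage>"""

def list_inodes(xml_text):
    """Return (id, type, name) for every inode; empty name is the root."""
    root = ET.fromstring(xml_text)
    return [(i.findtext("id"), i.findtext("type"), i.findtext("name") or "/")
            for i in root.findall("INodeSection/inode")]

print(list_inodes(FSIMAGE_XML))
# → [('16385', 'DIRECTORY', '/'), ('16386', 'DIRECTORY', 'user')]
```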

FileDistribution is a processor for analyzing file sizes in the namespace. To run it, you specify a maximum file size and a number of segments, which define an integer range [0, maxSize] divided into segments [0, s[1], ..., s[n-1], maxSize]; the processor counts how many files fall into each segment [s[i-1], s[i]). Files larger than maxSize always fall into the last segment, [s[n-1], maxSize]. The output file is formatted as a table with two tab-separated columns, Size and NumFiles, where Size is the start of a segment and NumFiles is the number of files falling into that segment. When using the FileDistribution processor you must also specify its maxSize and step parameters; if unspecified, both default to 0. Example:

$ hdfs oiv -i fsimage_0000000000000000115 -o fsimage.fd -p FileDistribution -maxSize 1000 -step 5
Processed 0 inodes.
$ cat fsimage.fd
Size	NumFiles
2097152	2
totalFiles = 2
totalDirectories = 11
totalBlocks = 2
totalSpace = 4112
maxFileSize = 1366
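The bucketing described above can be sketched as a few lines of Python. This is a re-implementation of the counting rule for illustration, not the oiv code itself, and the example sizes are assumptions:

```python
def file_distribution(sizes, max_size, step):
    """Count files per [s[i-1], s[i]) segment of [0, max_size] with the
    given step; files larger than max_size land in an extra final bucket."""
    n_buckets = max_size // step + 1
    counts = [0] * (n_buckets + 1)   # extra bucket for sizes > max_size
    for size in sizes:
        if size > max_size:
            counts[-1] += 1
        else:
            counts[size // step] += 1
    return counts

# Example: files of 3, 7, and 1366 bytes with maxSize=10, step=5.
# Buckets start at 0, 5, 10, plus the overflow bucket.
print(file_distribution([3, 7, 1366], 10, 5))  # → [1, 1, 0, 1]
```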
