The four compression formats most commonly used in Hadoop today are lzo, gzip, snappy, and bzip2. Drawing on practical experience, the author introduces the advantages, disadvantages, and application scenarios of these four formats, so that you can choose the right one for your actual situation.
1 gzip compression
Advantages: The compression ratio is high, and the comp
as the output of a mapreduce job and the input of another mapreduce job.
4 bzip2 Compression
Advantages: Supports splitting; has a high compression ratio, higher than that of gzip; is supported by Hadoop itself, although not natively; the bzip2 command ships with Linux systems, so it is easy to use.
Disadvantages: Compression and decompression are slow; native is not supported.
Application scenario: Suitable when speed requirements are low but a high compression ratio is needed. It can be used as the output format of a MapReduce job; or when the output data is large and the processed data needs to be compressed and archived to save disk space and will rarely be used afterward; or you wa
Hadoop provides MultiOutputFormat to write output data to different directories, and FileInputFormat to read multiple directories at once. By default, however, a job can only call job.setInputFormatClass to set a single InputFormat, which handles data in one format. If a job needs to read files in different formats from different directories at the same time, you will need to implement a MultiInputFormat to read the files in different
-ROM file system standards
ISP: X-Internet signature documents
IST: Digital tracking device files
ISU: InstallShield uninstall script
IT: Pulse Tracking System music module (MOD) file
ITI: Pulse Tracking System equipment
ITS: Pulse Tracking System sampling; Internet document location
IV: Open file formats used in Inventor
IVD: More than 20/20 microscopic data dimensions or
Comparison of the six most common prototype file formats
Anyone who works on Internet products will be familiar with the term "prototype". Like "user experience", it is mentioned often by all kinds of people. A prototype is a way for users to experience products, exchange design ideas, and display compl
Reading files in different formats in a map task has always been a problem. The earlier approach was to get the name of the file inside the map and read the data in a different way depending on that name, for example as follows.
Fetch the file name: InputSplit inputSplit = context.getInputSplit(); String fileName = ((FileSplit) inputSplit).getPath().toString(); if (fileName.con
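The name-based approach described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the directory markers "/csv/" and "/tsv/", the RecordFormat enum, and the parse helper are hypothetical and do not come from the original article. In a real mapper, the file name would be taken from the input split as described above.

```java
class FormatDispatchDemo {
    // Hypothetical set of formats a single job might have to handle.
    enum RecordFormat { CSV, TSV, UNKNOWN }

    // Decide how to read a record from the path of the file it came from.
    // The "/csv/" and "/tsv/" markers are assumptions made for this sketch.
    static RecordFormat formatFor(String fileName) {
        if (fileName.contains("/csv/")) {
            return RecordFormat.CSV;
        }
        if (fileName.contains("/tsv/")) {
            return RecordFormat.TSV;
        }
        return RecordFormat.UNKNOWN;
    }

    // Split one input line into fields according to the detected format.
    static String[] parse(String fileName, String line) {
        switch (formatFor(fileName)) {
            case CSV:
                return line.split(",", -1);
            case TSV:
                return line.split("\t", -1);
            default:
                // Unknown format: keep the line as a single field.
                return new String[] { line };
        }
    }

    public static void main(String[] args) {
        String[] fields = parse("hdfs://nn/data/csv/part-0", "a,b,c");
        System.out.println(fields.length + " fields from a CSV path");
    }
}
```

The drawback of this style is that every mapper has to carry the dispatch logic itself, which is why reading each directory with its own input format is usually preferred.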
calculate the checksum again when the data is transmitted through an unreliable channel; in this way, you can tell whether the data has been damaged. If the two calculated checksums do not match, the data is considered damaged. However, this technique cannot repair the data; it can only detect errors. A common error-detecting code is CRC-32 (cyclic redundancy check), which computes a 32-bit integer checksum for input of any size.
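The compare-two-checksums idea can be illustrated with the CRC-32 implementation in the JDK (java.util.zip.CRC32). This is a minimal sketch, not Hadoop's own checksum code; the class name ChecksumDemo and the helper isIntact are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChecksumDemo {
    // Compute the 32-bit CRC of a byte array (returned in the low 32 bits of a long).
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // The data is accepted only if the checksum recomputed on receipt
    // equals the checksum computed before sending.
    static boolean isIntact(byte[] received, long checksumBeforeSending) {
        return crc32(received) == checksumBeforeSending;
    }

    public static void main(String[] args) {
        byte[] sent = "123456789".getBytes(StandardCharsets.US_ASCII);
        long checksum = crc32(sent);

        // Simulate damage in transit by flipping a single bit.
        byte[] damaged = sent.clone();
        damaged[0] ^= 0x01;

        System.out.println(Long.toHexString(checksum)); // prints cbf43926
        System.out.println(isIntact(sent, checksum));    // prints true
        System.out.println(isIntact(damaged, checksum)); // prints false
    }
}
```

As the text says, a mismatch only reveals that something changed; nothing in the checksum indicates which byte was damaged or how to restore it.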
6. Compression and input splits
Compression format | Tool | Algorithm | File name extension | Multiple files | Splittable
DEFLATE | N/A | DEFLATE | .deflate | No | No
Gzip | gzip | DEFLATE | .gz | No | No
ZIP | zip | DEFLATE | .zip | Yes | Yes, at file boundaries
Bzip2 | bzip2 | bzip2 | .bz2 | No | Yes
LZO
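As a small illustration of the Gzip entry in the table (DEFLATE algorithm, .gz container), the following sketch compresses and decompresses a byte array with the JDK's java.util.zip classes. It does not use Hadoop's codec classes; the class name GzipDemo and its helpers are assumptions made for this example. Decompression here always starts at the first byte of the stream, which is consistent with gzip being listed as not splittable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compress a byte array into the gzip container format.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and therefore finished) before the bytes are read.
        return buffer.toByteArray();
    }

    // Decompress a complete gzip stream, reading it from its first byte.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive text compresses well, which makes the size difference easy to see.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("hadoop compression formats\n");
        }
        byte[] raw = text.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        byte[] restored = decompress(packed);
        System.out.println("raw bytes: " + raw.length);
        System.out.println("gzip bytes: " + packed.length);
        System.out.println("round trip ok: " + java.util.Arrays.equals(raw, restored));
    }
}
```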
This document describes how to operate a Hadoop file system through experiments.
Complete release directory of "cloud computing distributed Big Data Hadoop hands-on"
Cloud computing distributed Big Data practical technology Hadoop exchange group: 312494188
Cloud computing practices will be released in the group every
specified, the trash, if enabled, will be bypassed and the specified file(s) deleted immediately. This can be useful when it is necessary to delete files from an over-quota directory.
Example:
hadoop fs -rmr /user/hadoop/dir
hadoop fs -rmr hdfs://nn.example.com/user/hadoop
(1) First create a Java project.
Select File -> New -> Java Project on the Eclipse menu, and name the project UploadFile.
(2) Add the necessary Hadoop jar packages.
Right-click the JRE System Library and select Configure Build Path under Build Path. Then select Add External JARs. Add the jar package and all the jar packages under lib from your extracted Hadoop source directory. All jar
C#: how to convert a PDF file into multiple image file formats (PNG/BMP/EMF/TIFF)
PDF is one of the most common document formats in our daily work and study, but PDF documents are often difficult to edit. It is annoying to edit the content of a PDF document or convert the file
Hadoop-2.5.2 cluster installation and configuration details, with Hadoop configuration file details
When reprinting, please credit the source: http://blog.csdn.net/tang9140/article/details/42869531
I recently learned how to install Hadoop. The steps are described in detail below.
I. Environment
I installed it on Linux. For students w
Because HDFS is different from an ordinary file system, Hadoop provides a powerful FileSystem API for manipulating HDFS.
The core classes are FSDataInputStream and FSDataOutputStream.
Read operation:
We use FSDataInputStream to read the specified file in HDFS (the first experiment), and we also demonstrate the ability to locate the
When working in Linux with files created in Windows, garbled characters are often encountered. For example, a C/C++ program written in Visual Studio needs to be compiled on a Linux host, and the Chinese comments in the program come out garbled. Worse still, the compiler on Linux may report an error because of the encoding.
This is because the default file encoding in Windows is GBK (GB2312), while Linux generally uses UTF-8. In Linux, how does one view the
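The mismatch between GBK and UTF-8 described above can be reproduced with a few lines of Java. This is only a sketch of the underlying cause and of re-encoding, assuming a JDK that ships the GBK charset (standard full JDKs do); the class name EncodingDemo and the helper gbkToUtf8 are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // GBK is looked up by name; it is not one of the StandardCharsets constants.
    static final Charset GBK = Charset.forName("GBK");

    // Re-encode text that was saved as GBK (typical on Windows) into UTF-8.
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, GBK);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so that this
        // source file itself stays pure ASCII.
        String comment = "\u4e2d\u6587";

        byte[] gbkBytes = comment.getBytes(GBK);   // 2 bytes per character
        byte[] utf8Bytes = gbkToUtf8(gbkBytes);    // 3 bytes per character

        // Reading GBK bytes as if they were UTF-8 is what produces garbled text.
        String garbled = new String(gbkBytes, StandardCharsets.UTF_8);

        System.out.println(gbkBytes.length);          // prints 4
        System.out.println(utf8Bytes.length);         // prints 6
        System.out.println(garbled.equals(comment));  // prints false
    }
}
```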