Introduction to the Hadoop Native Library
Purpose
Because of performance considerations and the unavailability of certain Java class libraries, Hadoop provides its own native implementations of some components. These components are kept in a single dynamically linked Hadoop library, called libhadoop.so on *nix platforms. This article describes how to use the native library and how to build it.
Components
Hadoop currently provides native implementations of the following compression codecs:
1. zlib
2. gzip
3. lzo
Of these, the lzo and gzip codecs require the Hadoop native library in order to work.
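As an illustration of where these codecs are used in practice, here is a minimal sketch that configures a job to write gzip-compressed output, assuming the old org.apache.hadoop.mapred API (the class name GzipOutputExample is purely illustrative):

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class GzipOutputExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(GzipOutputExample.class);

        // Compress job output with gzip; GzipCodec picks up the native
        // zlib/gzip implementation from libhadoop when it is available.
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

        // ... set mapper/reducer classes and input/output paths, then submit the job
    }
}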
Using the Hadoop native library is straightforward:
1. Take a look at the supported platforms.
2. Download the pre-built 32-bit i386 Linux native Hadoop library (available in the lib/native directory of the Hadoop distribution), or build it yourself.
3. Make sure your platform has zlib 1.2 or later, lzo 2.0 or later, or both installed, depending on your needs.
4. The bin/hadoop script ensures the native library is on the library path via the system property -Djava.library.path=<path>.
Check the Hadoop log files to verify that the native library loaded correctly. On success, you should see something like:
DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
INFO util.NativeCodeLoader - Loaded the native-hadoop library
If something goes wrong, you will see:
INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
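You can also check programmatically rather than scanning the logs. A minimal sketch, assuming Hadoop's org.apache.hadoop.util.NativeCodeLoader class is on the classpath (the class name NativeCheck is purely illustrative):

import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheck {
    public static void main(String[] args) {
        // NativeCodeLoader tries to load libhadoop when the class is initialized;
        // isNativeCodeLoaded() reports whether that attempt succeeded.
        if (NativeCodeLoader.isNativeCodeLoaded()) {
            System.out.println("native-hadoop library loaded");
        } else {
            System.out.println("falling back to builtin-java implementations");
        }
    }
}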
Supported platforms
The Hadoop native library is supported only on *nix platforms. It is used extensively on GNU/Linux, but it does not work on Cygwin or Mac OS X.
Tested GNU/Linux distributions: RHEL4/Fedora, Ubuntu, Gentoo. On all of the above, the 32/64-bit native Hadoop library works with the corresponding 32/64-bit JVM.
Building the Hadoop native library
The Hadoop native library is written in ANSI C and built with the GNU autotools chain (autoconf, autoheader, automake, autoscan, libtool).
In other words, the platform on which you build the native library needs a standard C compiler and the GNU autotools chain. See the supported platforms section above.
Software packages that may be required on your target platform:
1. C compiler (e.g. the GNU C compiler)
2. GNU autotools chain: autoconf, automake, libtool
3. zlib development package (stable version >= 1.2.0)
4. lzo development package (stable version >= 2.0)
Once the above prerequisites are met, use the standard build.xml file and set compile.native to true to build the native Hadoop library:
$ ant -Dcompile.native=true
Because not all users need the native library, Hadoop does not build it by default. You can find the newly built native library at:
$ build/native/<platform>/lib
where <platform> is the combination of the system properties ${os.name}-${os.arch}-${sun.arch.data.model}; for example, Linux-i386-32.
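If you are not sure what this string looks like on your machine, here is a small sketch that prints the same three system properties (the class name PlatformString is purely illustrative; sun.arch.data.model is defined on Sun/Oracle JVMs):

public class PlatformString {
    public static void main(String[] args) {
        // Assemble the ${os.name}-${os.arch}-${sun.arch.data.model} combination
        // used for the build output directory, e.g. Linux-i386-32.
        String platform = System.getProperty("os.name") + "-"
                + System.getProperty("os.arch") + "-"
                + System.getProperty("sun.arch.data.model");
        System.out.println(platform);
    }
}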
Note that the zlib and lzo development packages must both be installed on the platform where you build the Hadoop native library;
however, if you only intend to use one of them, it is sufficient to install just that one at deployment time.
When building and deploying the native library on the target platform, choose the 32- or 64-bit zlib/lzo packages to match the 32- or 64-bit JVM.
Loading native libraries with DistributedCache
You can use DistributedCache to distribute native shared libraries and create symbolic links to the library files.
This example shows how to distribute a library file and load it from a map/reduce task.
First, copy the library file to HDFS:
bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1
The job-launching program should include the following code:
DistributedCache.createSymlink(conf);
DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
The map/reduce task should contain the following code:
System.loadLibrary("mylib.so");
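Putting the pieces together, here is a minimal sketch of the job-launching side, assuming the old org.apache.hadoop.mapred API. Note that DistributedCache.addCacheFile takes a java.net.URI rather than a plain string; host, port, and the class name NativeLibJobSetup are placeholders:

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class NativeLibJobSetup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(NativeLibJobSetup.class);

        // Ask the framework to create symlinks in each task's working
        // directory for cached files that carry a #fragment.
        DistributedCache.createSymlink(conf);

        // Ship the shared library from HDFS; the name after '#' becomes
        // the symlink (mylib.so) visible to the task.
        DistributedCache.addCacheFile(
                new URI("hdfs://host:port/libraries/mylib.so.1#mylib.so"), conf);

        // ... set mapper/reducer classes and input/output paths, then submit the job
    }
}

Inside the map or reduce task, System.loadLibrary("mylib.so") then loads the symlinked file from the task's working directory.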