Deploy Hadoop for production under CentOS 6.5 and connect to Hadoop using the C language API



#####

#### Install a hadoop-2.6.0 fully distributed cluster

#####


#### File and system versions:

####

hadoop-2.6.0

Java version 1.8.0_77

CentOS 6.5, 64-bit


### Preparation

####

Under /home/hadoop/, run: mkdir Cloud

Put the Java and Hadoop packages under /home/hadoop/Cloud



### Configure static IPs

####

master   192.168.116.100

slave1   192.168.116.110

slave2   192.168.116.120



#### Change the machine hostnames (all as root)

####

su root

vim /etc/hosts

Append the following below the existing entries (separate the fields with spaces or a tab):

192.168.116.100 master
192.168.116.110 slave1
192.168.116.120 slave2


vim /etc/hostname
master
shutdown -r now    (restart the machine)

vim /etc/hostname
slave1
shutdown -r now

vim /etc/hostname
slave2
shutdown -r now

(On CentOS 6, the persistent hostname is normally set via the HOSTNAME= line in /etc/sysconfig/network; edit that file as well if /etc/hostname is not honored after the reboot.)



### Install openssh

####

su root

yum install openssh

ssh-keygen -t rsa

Press Enter at every prompt to accept the defaults.


Send the public keys of slave1 and slave2 to master:

scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/slave1.pub    (run on slave1)

scp /home/hadoop/.ssh/id_rsa.pub hadoop@master:~/.ssh/slave2.pub    (run on slave2)


On master: cd .ssh/

cat id_rsa.pub >> authorized_keys

cat slave1.pub >> authorized_keys

cat slave2.pub >> authorized_keys


Send the combined authorized_keys back to slave1 and slave2:

scp authorized_keys hadoop@slave1:~/.ssh/

scp authorized_keys hadoop@slave2:~/.ssh/


ssh slave1

ssh slave2

ssh master

Answer yes when prompted.

Passwordless SSH login is now configured.


####

#### Set JAVA_HOME and HADOOP_HOME

####

su root

vim /etc/profile

Append:

export JAVA_HOME=/home/hadoop/Cloud/jdk1.8.0_77
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/home/hadoop/Cloud/hadoop-2.6.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then: source /etc/profile

(All three machines must be configured this way.)


####

### Configure the Hadoop files

####

Under /home/hadoop/Cloud/hadoop-2.6.0/sbin:

vim hadoop-daemon.sh
Modify the path of the PID file.

vim yarn-daemon.sh
Modify the path of the PID file.



Under /home/hadoop/Cloud/hadoop-2.6.0/etc/hadoop:

vim slaves and enter:

master
slave1
slave2

vim hadoop-env.sh and enter:

export JAVA_HOME=/home/hadoop/Cloud/jdk1.8.0_77
export HADOOP_HOME_WARN_SUPPRESS="TRUE"


vim core-site.xml and enter:

############################################## #core
<configuration>
  <property>
    <name>io.native.lib.available</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/Cloud/workspace/temp</value>
  </property>
</configuration>
################################################ #core


vim hdfs-site.xml

##################################################### #hdfs
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/Cloud/workspace/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/Cloud/workspace/hdfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
###################################################### #hdfs


vim mapred-site.xml

##################################### #mapred
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
##################################### #mapred


Send the configured Hadoop to slave1 and slave2:

scp -r hadoop-2.6.0 hadoop@slave1:~/Cloud/
scp -r hadoop-2.6.0 hadoop@slave2:~/Cloud/

Send the Java package to slave1 and slave2:

scp -r jdk1.8.0_77 hadoop@slave1:~/Cloud/
scp -r jdk1.8.0_77 hadoop@slave2:~/Cloud/

At this point, the Hadoop cluster configuration is complete.


########

######## Now you can start Hadoop

########


First, format the NameNode:

hadoop namenode -format    (can be run from any directory thanks to hadoop-env.sh and the environment variables set earlier)

Check the logs; if everything looks good, continue:

start-all.sh

Then verify that all daemons are running with jps:

[hadoop@master ~]$ jps
42306 ResourceManager
42407 NodeManager
42151 SecondaryNameNode
41880 NameNode
41979 DataNode

[hadoop@slave1 ~]$ jps
21033 NodeManager
20926 DataNode

[hadoop@slave2 ~]$ jps
20568 NodeManager
20462 DataNode


At this point, the hadoop-2.6.0 fully distributed configuration is complete.


Hadoop's web UIs are reachable in a browser at:

localhost:50070    (HDFS NameNode web UI)

localhost:8088     (YARN ResourceManager web UI)




########

######## Configure the C API to connect to HDFS

########

find / -name libhdfs.so.0.0.0

vi /etc/ld.so.conf

Add:

/home/hadoop/Cloud/hadoop-2.6.0/lib/native/
/home/hadoop/Cloud/jdk1.8.0_77/jre/lib/amd64/server/

Then refresh the shared-library cache so these paths are loaded:

/sbin/ldconfig -v


Then configure the environment variables.

Find the jars and print an export line for each one:

find /home/hadoop/Cloud/hadoop-2.6.0/share/ -name "*.jar" | awk '{printf("export CLASSPATH=%s:$CLASSPATH\n", $0);}'

You will see output like this:

export CLASSPATH=/home/hadoop/Cloud/hadoop-2.6.0/share/hadoop/common/lib/activation-1.1.jar:$CLASSPATH
export CLASSPATH=/home/hadoop/Cloud/hadoop-2.6.0/share/hadoop/common/lib/jsch-0.1.42.jar:$CLASSPATH
......

Add all of the printed lines to /etc/profile (vim /etc/profile), then run source /etc/profile again.
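Before writing the full example, a minimal connection check can confirm that the shared libraries and CLASSPATH are wired up correctly. This is an illustrative sketch, not part of the original article; it assumes the NameNode address and port configured above (192.168.116.100:9000):

#################################################################################
/* conn_check.c - minimal libhdfs connection check (illustrative sketch) */
#include "hdfs.h"
#include <stdio.h>

int main(void) {
    /* NameNode address and port come from fs.default.name in core-site.xml */
    hdfsFS fs = hdfsConnect("192.168.116.100", 9000);
    if (!fs) {
        fprintf(stderr, "hdfsConnect failed: check /etc/ld.so.conf and CLASSPATH\n");
        return 1;
    }
    printf("Connected to HDFS.\n");
    hdfsDisconnect(fs);
    return 0;
}
#################################################################################

It compiles with the same gcc command shown further below, using conn_check.c as the source file.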


Then write the C language code to verify that the configuration was successful:

vim above_sample.c

The code reads as follows:

#################################################################################

#include "hdfs.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>   /* for O_WRONLY / O_CREAT */

int main(int argc, char **argv) {

    /* Connect to the NameNode (the address was changed here to the master's IP). */
    hdfsFS fs = hdfsConnect("192.168.116.100", 9000);

    const char *writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY | O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }

    char *buffer = "hello,world!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void *)buffer, strlen(buffer) + 1);
    fprintf(stderr, "Wrote %d bytes\n", (int)num_written_bytes);

    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }

    hdfsCloseFile(fs, writeFile);
    return 0;
}

###############################################################################

Compile the C code:

gcc above_sample.c -I/home/hadoop/Cloud/hadoop-2.6.0/include/ -L/home/hadoop/Cloud/hadoop-2.6.0/lib/native/ -lhdfs /home/hadoop/Cloud/jdk1.8.0_77/jre/lib/amd64/server/libjvm.so -o above_sample

Run the compiled above_sample binary:

./above_sample

Check the logs and the HDFS file tree to confirm that testfile.txt was created.


At this point, the C-language API connection to HDFS is configured and working.
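As an additional check (not part of the original article), the file written above can be read back through the same API. This sketch assumes the /tmp/testfile.txt path used by above_sample.c:

#################################################################################
/* read_sample.c - read back /tmp/testfile.txt via libhdfs (illustrative sketch) */
#include "hdfs.h"
#include <stdio.h>
#include <fcntl.h>   /* for O_RDONLY */

int main(void) {
    hdfsFS fs = hdfsConnect("192.168.116.100", 9000);
    const char *readPath = "/tmp/testfile.txt";   /* written by above_sample */

    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
    if (!readFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", readPath);
        return 1;
    }

    char buffer[128];
    tSize num_read = hdfsRead(fs, readFile, (void *)buffer, sizeof(buffer) - 1);
    if (num_read >= 0) {
        buffer[num_read] = '\0';
        printf("Read %d bytes: %s\n", (int)num_read, buffer);
    }

    hdfsCloseFile(fs, readFile);
    hdfsDisconnect(fs);
    return 0;
}
#################################################################################

It is compiled with the same gcc command as above_sample.c, swapping in read_sample.c and a different output name.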




#########

####### File operations on the cluster

########


### Automatic distribution script: auto.sh


vim auto.sh

chmod +x auto.sh

./auto.sh jdk1.8.0_77 ~/Cloud/

The automatic distribution script:

############################

#!/bin/bash

nodes=(slave1 slave2)
num=${#nodes[@]}
file=$1
dst_path=$2

for ((i=0; i<${num}; i++)); do
    scp -r ${file} ${nodes[i]}:${dst_path}
done

####################

#########
######### Basic operations on the hadoop-2.6.0 fully distributed cluster
#########

hdfs dfs -mkdir /in
echo "Hello hadoop" > test1.txt

Import all files from the current directory into the /in directory of HDFS:

hadoop dfs -put ./* /in
hadoop dfs -ls /in/*
hadoop dfs -cp /in/test1.txt /in/test1.txt.bak
hadoop dfs -ls /in/*
hadoop dfs -rm /in/test1.txt.bak

mkdir dir_from_hdfs

Download all files from the HDFS directory /in into dir_from_hdfs:

hadoop dfs -get /in/* ./dir_from_hdfs

Count the words in all text files under /in, separated by spaces (note that the /output/wordcount directory must not already exist):

cd /home/hadoop/Cloud/hadoop-2.6.0
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /in /output/wordcount

View the statistics:

hadoop fs -cat /output/wordcount/part-r-00000

(A C-API equivalent of the basic directory operations is sketched after the management notes below.)

########
######## Management
########

1. Cluster-related management

Edit log: when a filesystem client performs a write, the operation is first recorded in the edit log. After the edit log has been updated, the NameNode modifies its in-memory data structures, and the edit log is synced to the filesystem before each write is reported as successful.

fsimage: the namespace image, i.e. a checkpoint of the in-memory metadata written to disk. When the NameNode fails, the latest checkpoint metadata is loaded from the fsimage into memory, and then the operations recorded in the edit log are replayed. The Secondary NameNode exists to help the NameNode checkpoint the in-memory metadata to disk.

2. Cluster properties

Advantages:
1) Able to handle very large files.
2) Streaming data access. HDFS handles "write once, read many" workloads very well: once a dataset is generated, it is copied to different storage nodes and then serves a variety of analysis requests. In most cases an analysis task touches most of the data in the dataset, so reading the entire dataset is more efficient in HDFS than reading individual records.

Disadvantages:
1) Not suited to low-latency data access: HDFS is designed for analyzing large datasets, so latency can be high.
2) Cannot efficiently store large numbers of small files: because the NameNode keeps the filesystem metadata in memory, the number of files the filesystem can hold is limited by the size of the NameNode's memory.
3) No support for multiple writers or arbitrary file modification: an HDFS file has a single writer at a time, and writes can only happen at the end of the file (append only). HDFS currently does not support multiple users writing to the same file, or modifying it at arbitrary positions.
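The directory operations shown in the shell commands above can also be performed through the C API configured earlier. The following is an illustrative sketch (not from the original article) that creates and lists the /in directory with libhdfs, assuming the same NameNode address:

#################################################################################
/* dir_ops.c - create and list an HDFS directory via libhdfs (illustrative sketch) */
#include "hdfs.h"
#include <stdio.h>

int main(void) {
    int i, numEntries = 0;

    hdfsFS fs = hdfsConnect("192.168.116.100", 9000);
    if (!fs) {
        fprintf(stderr, "hdfsConnect failed\n");
        return 1;
    }

    /* Equivalent of: hdfs dfs -mkdir /in  (succeeds even if it already exists) */
    if (hdfsCreateDirectory(fs, "/in") != 0) {
        fprintf(stderr, "Failed to create /in\n");
    }

    /* Equivalent of: hadoop dfs -ls /in */
    hdfsFileInfo *entries = hdfsListDirectory(fs, "/in", &numEntries);
    for (i = 0; i < numEntries; i++) {
        printf("%c %10ld  %s\n",
               entries[i].mKind == kObjectKindDirectory ? 'd' : '-',
               (long)entries[i].mSize,
               entries[i].mName);
    }
    if (entries) {
        hdfsFreeFileInfo(entries, numEntries);
    }

    hdfsDisconnect(fs);
    return 0;
}
#################################################################################

It is compiled with the same gcc command used for above_sample.c, swapping in dir_ops.c and a different output name.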



This article is from the "10700016" blog, please be sure to keep this source http://10710016.blog.51cto.com/10700016/1896278
