Once the Hadoop cluster is set up, it can be accessed locally through the Java API, for example to view all the node name information on the cluster:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
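If you only need the node information itself, the same report is available from the command line, which is a quick way to cross-check what the Java API returns (a sketch; it assumes a configured Hadoop client on the local machine):

# Print the cluster report: live/dead DataNodes, capacity, and per-node details.
# On newer releases the equivalent command is `hdfs dfsadmin -report`.
hadoop dfsadmin -report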
Python: accessing a secured Hadoop cluster through the Thrift API
Apache Thrift Python Kerberos support: the typical way to connect to a Kerberos-secured Thrift server (examples: Hive, HBase).
Both supports are available only on the Linux platform (native support).
Dependencies: kerberos (Python package) >> pure-sasl (Python package) >> thrift (Python package)
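A minimal way to pull in that dependency chain (package names as listed above; pip and the krb5 development headers are assumed to be available):

# Install the Python-side dependencies for a Kerberos-secured Thrift connection.
# The kerberos package compiles native code against the system krb5 libraries.
pip install kerberos pure-sasl thrift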
Specify a group here; otherwise you may not be able to establish trust between the nodes.
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
Description: In a large Hadoop cluster, the following hdfs-site.xml properties matter most:
1) dfs.name.dir is the path on the local file system where the NameNode stores the namespace and transaction logs. When this value is a comma-separated list of directories, the name table data is copied to all of them for redundant backup.
2) dfs.data.dir is the path on the local file system where the DataNode stores block data. It is also a comma-separated list; when multiple directories are given, data is stored across all of them, usually spread over different physical devices.
3) dfs.replication is the number of copies to keep of each block; the default value is 3.
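As a sketch, these three properties live in hdfs-site.xml; the paths below are hypothetical placeholders, so adjust them to your own disks:

# Write a minimal hdfs-site.xml (hypothetical paths; one data dir per device).
cat > $HADOOP_CONF_DIR/hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- NameNode namespace and transaction logs; two dirs = redundant copies -->
  <property><name>dfs.name.dir</name><value>/data1/hdfs/name,/data2/hdfs/name</value></property>
  <!-- DataNode block storage, spread across devices -->
  <property><name>dfs.data.dir</name><value>/data1/hdfs/data,/data2/hdfs/data</value></property>
  <!-- number of copies of each block -->
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
EOF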
sync_h_script  # in fact these two commands are aliases for my own salt commands; see /opt/hadoop_scripts/profile.d/hadoop.sh
III. Monitoring
A common solution is ganglia plus nagios: ganglia collects a large number of metrics and graphs them, while nagios triggers an alarm when a metric exceeds its threshold.
In fact, Hadoop has an interface for plugging in your own monitoring program; a sketch of the usual ganglia hookup follows.
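For the ganglia case, the hookup goes through Hadoop's metrics properties file (the gmond host below is hypothetical; the file name and context class are the ones shipped with Hadoop 1.x):

# Point Hadoop's dfs metrics at a ganglia gmond collector.
cat >> $HADOOP_CONF_DIR/hadoop-metrics.properties <<'EOF'
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=gmond.example.com:8649
EOF

Hadoop 2.x replaces this with hadoop-metrics2.properties and a sink-based syntax, but the idea is the same.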
From Wang Jialin's in-depth, case-driven practice of cloud computing and distributed big data with Hadoop (July 6-7, Shanghai), Lecture 4 of the Hadoop graphic and text training course: building a real Hadoop distributed cluster environment. The specific troubleshooting steps are as follows:
Step 1: Query the Hadoop logs to see the cause of the error;
Step 2: Stop the cluster;
Open the AvatarNode web UI on port 50070. It shows that all AvatarDataNodes are registered to both NameNodes: the primary is in the normal state, while the standby is in SafeMode and can only be read, not written. The current state of an AvatarNode (primary or standby) can also be checked with the Avatar shell command.
2. Store some data in the cluster and confirm that the cluster works properly.
3. Kill the AvatarNode process on the primary node.
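A sketch of that step (it assumes the AvatarNode JVM shows up in jps with "Avatar" in its name):

# On the primary node, find the AvatarNode process and kill it to simulate a failure.
jps | grep -i avatar
kill -9 <pid-printed-above>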
01_note_hadoop: introduction to the sources and the system; the Hadoop cluster; the CDH family.
Unpack the tar package, install the JDK, and configure the environment variables:
tar -xzvf jdkxxx.tar.gz -C /usr/app/    # custom directory for storing installed applications
java -version                           # view the current system Java version and environment
rpm -qa | grep java                     # view installed Java packages and dependencies
yum -y remove xxxx                      # remove each package found by grep
change the configuration; it is then safe and easy to synchronize it to all nodes when you start the cluster daemons.
export HADOOP_SLAVE_SLEEP=0.1
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# I don't know exactly how it works, but it seems that at startup the master node will take a
Hadoop consists of two parts:
A distributed file system (HDFS)
A distributed computing framework (MapReduce)
The distributed file system HDFS is mainly used for the distributed storage of large-scale data, while MapReduce is built on top of HDFS to perform distributed computation on the data stored in it.
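To see both halves working together, the stock wordcount example stores data in HDFS and runs a MapReduce job over it (paths are hypothetical and the examples jar name varies by release):

# Put a local file into HDFS, count its words, and print the result.
hadoop fs -put input.txt /tmp/wordcount-in
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar wordcount /tmp/wordcount-in /tmp/wordcount-out
hadoop fs -cat /tmp/wordcount-out/part-*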
The functions of each node are described in detail below.
NameNode:
1. There is only one NameNode in the cluster.
, so it did not start up, while the others were normal. Keep in mind that most of the above actions should be done as the hadoop user, otherwise you will hit many permission issues along the way. With that, the construction of the whole environment is complete.
Steps for setting up a Hadoop cluster environment under RedHat 5.
Preparation: two Linux virtual machines (RedHat 5, with IPs 192.168.1.210 and 192.168.1.2…
can be accessed; the private key is mainly used to encrypt the data so that others cannot steal it. In a word, this is an asymmetric algorithm, and it is very difficult to crack. Data access is required between the nodes of a Hadoop cluster, and the node being accessed must verify the reliability of the accessing node.
Deploy the Hadoop cluster service on CentOS. Guide: Hadoop is a distributed system infrastructure developed by the Apache Foundation. Hadoop implements a distributed file system (HDFS). HDFS features high fault tolerance and is designed to be deployed on low-cost hardware. It also provides high throughput access to application data
maintain consistency between servers.
2.6 Configure SSH password-less login between the cluster nodes
The cluster environment must be accessible through SSH without a password: the local machine must allow password-less login to itself, and the master and slave machines must allow password-less login to each other; there is no such requirement between one slave and another. As an example, the steps for setting up password-free login are sketched below.
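A minimal sketch of those steps, run as the hadoop user on the master (slave1 is a placeholder host name):

# Generate an RSA key pair with an empty passphrase.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key locally so that `ssh localhost` needs no password.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Copy the public key to each slave, then verify the login.
ssh-copy-id hadoop@slave1
ssh hadoop@slave1 hostname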
Although I have installed a Cloudera CDH cluster (see http://www.cnblogs.com/pojishou/p/6267616.html for a tutorial), it ate too much memory, and the component versions cannot be chosen freely. If you only want to study the technology on a single machine with little memory, I recommend installing a native Apache cluster to play with; for production, a Cloudera cluster is the natural choice.
Run scala -version; normal output indicates that the installation succeeded.
3. Installing the Hadoop server
Host Name | IP Address    | JDK      | User
Master    | 10.116.33.109 | 1.8.0_65 | root
Slave1    | 10.27.185.72  | 1.8.0_65 | root
Slave2    | 10.25.203.67  | 1.8.0_65 | root
Download address for Hadoop: http://hadoop.apache.org/
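A sketch of fetching and unpacking a release (the version below is only an example; pick one from the download page):

# Download and unpack Hadoop into /usr/app (the install directory used earlier).
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
tar -xzvf hadoop-2.7.3.tar.gz -C /usr/app/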
Configure the hosts file so that the three machines can resolve one another by name, as shown in the sketch below.
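A sketch using the addresses from the table above; append the same mapping on every node:

# Map host names to IPs so the machines can reach each other by name.
cat >> /etc/hosts <<'EOF'
10.116.33.109 master
10.27.185.72  slave1
10.25.203.67  slave2
EOF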
The second replica is placed on a randomly selected node in a different rack from the first node.
The third copy is placed on another node in the same rack as the node where the second replica resides.
If there are more copies, they are placed on randomly chosen nodes in the cluster.
Such a strategy ensures that access to a block's file is served from its own rack first, and if the entire rack fails, a copy can still be found in another rack.
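To see where the replicas of a given file actually landed, including their racks, HDFS's fsck tool can be used (the path is a placeholder):

# Show each block of the file with its replica locations and rack assignments.
hadoop fsck /path/to/file -files -blocks -locations -racks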