Detailed description of Hadoop operating principles
Introduction
HDFS (Hadoop Distributed File System) is Hadoop's distributed file system. It is based on a paper published by Google: GFS (the Google File System).
HDFS has many features:
① Multiple copies: data blocks are replicated (three copies by default), so the failure of a single node does not lose data.
Hadoop pseudo-distribution configuration and Eclipse-Based Development Environment
Contents
1. Development and configuration environment
2. Hadoop server configuration (Master node)
3. Eclipse-based Hadoop 2.x development environment configuration
4. Run the Hadoop program and view the running log
1. Development and configuration environment:
Development Environment:
…count how many maps run on a given day, how long the longest job takes, the number of MapReduce tasks run by each user, and the total number of MapReduce tasks run overall. This is a good way to monitor a Hadoop cluster: we can decide how to allocate resources to a user based on this information.
Careful readers may notice that up to 20,000 historical job records can be displayed on the web UI of the JobHistory server.
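That cap appears to come from the JobHistory server's job-list cache. A hedged mapred-site.xml sketch for adjusting it; the property name and its 20000 default are my recollection of the Hadoop 2.x defaults, not something stated in this article:
<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>20000</value>
</property>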
Basic software and hardware configuration:
x86 desktop, Windows 7 64-bit system
VirtualBox virtual machine (the desktop needs at least 4 GB of memory in order to run 3 virtual machines)
CentOS 6.4 operating system
hadoop-1.1.2.tar.gz
jdk-6u24-linux-i586.bin
1. Configuration under root
A) Modify the host name: vi /etc/sysconfig/network
master, slave1, slave2
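On CentOS 6, each node's /etc/sysconfig/network carries its own name; a minimal sketch for the master node (assumed content, not quoted from the original):
NETWORKING=yes
HOSTNAME=master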
B) Map IP addresses to host names: vi /etc/hosts
192.168.8.100 master
192.168.8.101 slave1
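For completeness, the full hosts file for this three-node cluster would look like the following; the slave2 line was cut from the excerpt and its address is my assumption following the pattern:
192.168.8.100 master
192.168.8.101 slave1
192.168.8.102 slave2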
Configuration requirements
Host memory: 4 GB.
Disk: more than GB.
The host machine runs a common Linux distribution.
Linux Container (LXD)
Take the host Ubuntu 16.04 as an example.
Install LXD, then initialize it:
$ sudo apt install lxd
$ sudo lxd init
To view the available image sources (if you use the default image, you can skip the next two steps and go directly to launch):
$ lxc remote list
Select the image you like from the previous step and note its name for the launch step.
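A minimal sketch of launching and entering a container, assuming the default ubuntu: remote and a container named master:
$ lxc launch ubuntu:16.04 master
$ lxc exec master -- /bin/bash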
Two cyan. Email: [email protected]. Weibo: http://weibo.com/xtfggef
I had already installed a single-node environment, but after finishing it I felt that was not enough fun, so today I continue the study and move on to a fully distributed cluster installation. The software used is the same as in the previous single-node Hadoop installation, as follows:
Ubuntu 14.10 64-bit Server Edition
Hadoop 2.6.0
Compile the Hadoop 2.x hadoop-eclipse-plugin on Windows and use Eclipse with Hadoop
I. Introduction
Hadoop 2.x no longer ships with an Eclipse plug-in tool, so we cannot debug the code in Eclipse directly: we have to package the written MapReduce Java code into a jar and then run it on Linux, which makes debugging inconvenient. Therefore, we compile an Eclipse plug-in so that we can debug locally.
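For context, running such a packaged job on Linux looks roughly like this (the jar, class, and paths here are hypothetical placeholders):
$ hadoop jar wordcount.jar com.example.WordCount /user/hadoop/input /user/hadoop/output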
Hadoop-2.4.1 fully distributed environment construction
First, the configuration steps are as follows:
Host environment: five virtual machines running Ubuntu 13 are used to build the Hadoop environment.
Create the Hadoop user group and the hadoop user, and grant the needed permissions; a sketch of these commands follows.
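A minimal sketch of those commands on Ubuntu (the user and group names are assumed to match the tutorial's conventions):
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop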
…define the DataNode directory, that is, dfs.datanode.data.dir, which defines the path to the directory where DataNode data is stored.
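A hedged hdfs-site.xml sketch of that property; the path is a placeholder of my choosing, not one from the article:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hdfs/datanode</value>
</property>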
Note: Make sure that the NameNode and DataNode directories are created and that the directories where the data is stored are owned by the user who will run Hadoop, so that this user has read and write permissions on them.
5.2 Format NameNode
Now the next step is to format the NameNode we just configured. The following command does this:
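A sketch, assuming Hadoop 2.x with its bin/ and sbin/ directories on the PATH:
$ hdfs namenode -format
$ start-dfs.sh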
Preparation: refer to steps one through six in the article "Building a Hadoop-0.20.2 Environment".
System: Ubuntu 12.04 (other versions also work)
Mode: pseudo-distributed
Build user: hadoop
Hadoop-2.2.0: http://mirrors.hust.edu.cn/apache/hadoop/common/
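For reference, a minimal pseudo-distributed core-site.xml sketch; the localhost address and port 9000 are conventional choices, not values taken from this article:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>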
Full-text indexing with Lucene, Solr, Nutch, and Hadoop: Lucene. Full-text indexing with Lucene, Solr, Nutch, and Hadoop: Solr. Last year I wanted to give a detailed introduction to Lucene, Solr, Nutch, and Hadoop, but because of time constraints I only wrote two articles, introducing Lucene and Solr respectively; I never wrote the rest, though I still mean to.
Two files appear in ~/.ssh/: id_rsa and id_rsa.pub; this pair works like a lock and its key.
Append id_rsa.pub to the authorized keys (there is no authorized_keys file at this moment):
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Verify that SSH is installed successfully: enter ssh localhost. If a local login succeeds, the installation is successful.
3. Close the firewall: $ sudo ufw disable
Note: this step is very important; if you do not close it, you will run into the problem of the DataNode not being found.
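For completeness, the key-generation step that produces those two files, as a sketch (an empty passphrase is assumed):
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost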
As you know, the NameNode is a single point of failure in the Hadoop system, which has long been a weakness for highly available Hadoop. This article discusses several solutions to this problem.
1. Secondary NameNode
Principle: the Secondary NameNode periodically reads the edit log from the NameNode and merges it with the image it stores to form a new metadata image.
Advantage: the earlier version of
Preface:
Although it seems there were not many implementation problems in the process of building a Hadoop platform for large-scale learning since mid-July, for a person who had never been familiar with Linux, Java, or cloud computing platforms before, it took a while. The biggest lesson is that the versions of the various tools matter a great deal: VMware, Ubuntu, JDK,
libsnappy.a
-rwxr-xr-x 1 root root 953 7 11:56 libsnappy.la
lrwxrwxrwx 1 root root 7 11:56 libsnappy.so -> libsnappy.so.1.2.1
lrwxrwxrwx 1 root root 7 11:56 libsnappy.so.1 -> libsnappy.so.1.2.1
-rwxr-xr-x 1 root root 147758 7 11:56 libsnappy.so.1.2.1
If no errors were encountered during the installation and the /usr/local/lib folder contains the files above, the installation succeeded.
4. Hadoop-snappy source code compilation
1)
Note: the following installation steps are performed on the CentOS 6.5 operating system, and they are also suitable for other operating systems; students using Ubuntu or another Linux distribution need only note that individual commands differ slightly. Also note which actions require different user permissions; for example, shutting down the firewall requires root privileges. A single-node Hadoop installation is described below.
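For example, on CentOS 6.x the firewall is shut down as root roughly like this (a sketch, not commands quoted from the original):
# service iptables stop
# chkconfig iptables off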
Hadoop Streaming provides a toolkit for MapReduce programming that lets Mappers and Reducers be built from executable commands, scripting languages, or other programming languages, so they can take advantage of the capabilities of the Hadoop parallel computing framework to handle big data. All right, I admit the above is copied; the following is the original material. The first deployment of the
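A minimal streaming sketch in the spirit of the canonical cat/wc example; the jar path varies by installation and the HDFS paths are placeholders:
$ hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper /bin/cat \
    -reducer /usr/bin/wc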
Change the mode, owner, or group of files or directories.
After the modification, it seems that the Hadoop process needs to be restarted for the change to take effect.
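A hedged sketch of those operations with the HDFS shell; the paths, user, and group are placeholders:
$ hadoop fs -chmod 755 /user/hadoop/data
$ hadoop fs -chown hadoop:hadoop /user/hadoop/data
$ hadoop fs -chgrp hadoop /user/hadoop/data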
Development environment: Windows XP SP3, Eclipse 3.3, hadoop-0.20.2
Hadoop server deployment environment: Ubuntu 10.10, hadoop
…of data transferred through the network.
"FileSystemCounters: HDFS_BYTES_WRITTEN", it is only the hdfs write size of a copy, and the hdfs block copy can be adjusted, so the network traffic also needs "FileSystemCounters: HDFS_BYTES_WRITTEN "* Number of copies.
Both map and reduce are user-defined, and user code may bypass the Hadoop framework and generate network communication on its own; this part of the traffic cannot be counted.