Compiling the Hadoop 2.5.0 Source Code and Performing a Distributed Installation on CentOS 7 (64-bit)
Contents
1. System Environment
2. Pre-installation Preparation
2.1 Disable the firewall
2.2 Check whether ssh is installed; if not, install it
2.3 Install vim
2.4 Set a static IP address
2.5 Modify the host name
2.6 Create a
Regarding the compression codecs used by Hadoop: gzip and bzip2 support is built in; lzo requires the separately installed hadoop-gpl/kevinweil (hadoop-lzo) package, and snappy must also be installed separately. Codec class names are listed separated by commas (,). The relevant properties are:

io.compression.codec.lzo.class
com.hadoop.compression.lzo.LzoCodec
The compression codec class used for lzo.

topology.script.file.name
The path of the rack-awareness topology script.
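The properties above live in core-site.xml. A minimal sketch of the relevant entries follows; the LzoCodec class name comes from the hadoop-lzo (kevinweil/hadoop-gpl) package, and the topology script path is a placeholder assumption, not a value from this article:

```xml
<!-- Sketch of core-site.xml compression/topology settings.
     The topology script path below is a placeholder. -->
<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>topology.script.file.name</name>
    <!-- placeholder path; point this at your own rack script -->
    <value>/path/to/topology.sh</value>
  </property>
</configuration>
```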
Purpose
This article describes how to install, configure, and manage a non-trivial Hadoop cluster that can scale from a small cluster of a few nodes to a large cluster of thousands of nodes. If you want to install Hadoop on a single machine, you can find the details in the single-node setup guide.

Prerequisites

Ensure that all required software is installed on every node of the cluster.
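On CentOS, the prerequisite software (a JDK, ssh, rsync) can be installed roughly as follows. This is a sketch; the exact package names are assumptions and vary by CentOS release:

```shell
# Install prerequisites (package names assume CentOS 7 repositories)
sudo yum install -y java-1.8.0-openjdk-devel openssh-server openssh-clients rsync

# Verify the JDK is on the PATH, then make sure sshd is running
java -version
sudo systemctl start sshd
```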
Architecture
MapReduce is a parallel programming model in which software developers can easily write distributed parallel programs. In the Hadoop architecture, MapReduce is a simple, easy-to-use software framework that distributes tasks across a cluster of thousands of machines and, with reliable fault tolerance, lets you process very large datasets in parallel.
Hadoop Daemons

After Hadoop is installed and started, several processes appear when jps is run.

On the master:
Namenode
SecondaryNameNode
JobTracker
On the slaves:
Tasktracker
Datanode
1. NameNode
The NameNode is the master server in Hadoop; it manages the file system namespace and regulates client access to the files stored in the cluster.
(key, values): emit(key, sum(values))

Hardware
These MapReduce components were executed on a random subset of the data, approximately 20 GB; the complete dataset contains 1,500 files, and we used a script to select the random subset. It is important to keep the file names intact, because the file name determines the value of n in the n-grams of the data block. The Hadoop cluster contains five virtual nodes.
Building a Hadoop Development Environment with Eclipse on Windows 7

Some websites describe using Eclipse on Linux to develop Hadoop applications. However, most Java programmers are not that familiar with Linux and therefore want to develop Hadoop programs on Windows. This section summarizes how to use Eclipse on Windows.
Compiling the Hadoop 2.x hadoop-eclipse-plugin on Windows and Using It with Eclipse

I. Introduction

Since Hadoop 2.x no longer ships the Eclipse plug-in tool, we cannot debug code in Eclipse; we have to package the written MapReduce Java code into a jar and run it on Linux, which makes debugging inconvenient. Therefore we compile an Eclipse plug-in ourselves so that we can debug locally.
We have previously introduced the installation and basic configuration of Hadoop on Linux, mainly in standalone mode. Standalone mode means that no daemon processes are required: all programs execute in a single JVM. Because it is easier to test and debug MapReduce programs in standalone mode, this mode is suitable for the development phase. Here we mainly record the process of configuring Hadoop.
Full-Text Indexing: Lucene, Solr, Nutch, and Hadoop

Last year I wanted to give a detailed introduction to Lucene, Solr, Nutch, and Hadoop, but because of time constraints I only wrote two articles, introducing Lucene and Solr respectively, and then stopped, though I still intend to continue.
Original from: https://examples.javacodegeeks.com/enterprise-java/apache-hadoop/apache-hadoop-distributed-file-system-explained/
========== This article was produced with Google Translate; please refer to the Chinese and English versions together when studying it. ==========
In this article, we discuss in detail the Apache Hadoop Distributed File System (HDFS), its components, and its architecture.
Earlier I introduced building a hadoop 2.7.2 cluster with CentOS 6.4 virtual machines under Ubuntu. To do MapReduce development you need Eclipse and the corresponding Hadoop plug-in, hadoop-eclipse-plugin-2.7.2.jar. Note that up through hadoop 1.x the official Hadoop installation package shipped with the Eclipse plug-in; now, with hadoop 2.x, you must build the plug-in yourself.
Hadoop implements the MapReduce computing model, and programmers can use Hadoop to write programs that run on computer clusters to handle massive amounts of data. In addition, Hadoop provides a distributed file system (HDFS) and a distributed database (HBase) for storing data on, or deploying it to, the individual compute nodes. So you can roughly think of it as: Hadoop = HDFS (file system, data storage technology) + MapReduce (data processing).
Hadoop Streaming provides a toolkit for MapReduce programming that lets Mappers and Reducers be built from executable commands, scripts, or other programming languages, so that they can take advantage of the Hadoop parallel computing framework to handle big data. All right, I admit the above is copied; the following is the original material.
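As a concrete illustration of the streaming model, here is a minimal word-count mapper and reducer in Python. This is a sketch, not code from the article: the function names are our own, and the only contract assumed is Hadoop Streaming's actual one, namely that the mapper and reducer read lines on stdin, write tab-separated key/value pairs on stdout, and that Hadoop sorts the mapper output by key before the reduce:

```python
#!/usr/bin/env python3
# Word count in the Hadoop Streaming style (a sketch; the helper
# function names here are our own, not part of any Hadoop API).
import sys
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per key; the pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        for key, count in mapper(sys.stdin):
            sys.stdout.write(f"{key}\t{count}\n")
    else:  # "reduce"
        split = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for key, total in reducer((k, int(v)) for k, v in split):
            sys.stdout.write(f"{key}\t{total}\n")
```

The same pipeline can be simulated locally without a cluster, which is handy for debugging: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` (the file name wordcount.py is an assumption).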
Modify /etc/hostname.
In addition, we need to make sure that the host name and IP address of each machine can be correctly resolved. A simple test is to ping the host name, for example, ping Frank-2 from Frank-1; if the ping succeeds, the names resolve correctly. If the host names cannot be correctly resolved, modify the /etc/hosts file: if a machine is used as the NameNode, add the IP addresses of all hosts in the cluster and their corresponding host names to its hosts file; if this machine is used as a
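Using the host names from the example above (Frank-1, Frank-2), the hosts file on the NameNode might look roughly like this; the IP addresses are placeholder assumptions, not values from this article:

```
# /etc/hosts on the NameNode (IP addresses are placeholders)
127.0.0.1       localhost
192.168.1.101   Frank-1
192.168.1.102   Frank-2
```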
hadoop namenode -format

2.6 Starting and testing Hadoop

After formatting the NameNode, we can execute the start-all.sh script to start Hadoop, as follows:

./start-all.sh

Open a browser to access the HDFS monitoring interface: http://localhost:50070

2.8 Hadoop access to remote NameNode nodes
2.8.1 Execute: ./hadoop
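The first start-up of section 2.6 can be sketched as the following command sequence, run from Hadoop's bin directory. This is an outline under that assumption; the daemon names shown by jps are indicative of Hadoop 1.x-style clusters (as listed earlier in this article) and vary by version:

```shell
# Format HDFS metadata (first start only -- this erases existing HDFS data)
./hadoop namenode -format

# Start all daemons (NameNode, DataNode, JobTracker, TaskTracker, ...)
./start-all.sh

# Verify the daemons are running; on the master, jps should list
# NameNode, SecondaryNameNode, and JobTracker among its output
jps

# NameNode web UI for monitoring HDFS:
#   http://localhost:50070
```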