Prerequisites: common Linux commands, Java programming basics
Big Data: scientific data, financial data, Internet of Things data, traffic data, social network data, retail data, and more.
Hadoop: An open source platform for distributed storage and distributed computing, maintained by the Apache Software Foundation.
Components of Hadoop:
HDFS: A distributed File system that stores massive amounts of data.
MapReduce: Parallel processing framework for task decomposition and scheduling.
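The idea behind MapReduce can be sketched with an ordinary shell pipeline. This is only a single-machine analogy, not how Hadoop runs a job: the function name word_count and the sample text are made up for illustration.

```shell
# Word count as a local pipeline, mirroring the map -> shuffle -> reduce phases.
# word_count is a hypothetical helper; the input text below is invented.
word_count() {
  tr -s '[:space:]' '\n' |    # "map": emit one word per line
    grep -v '^$' |            # drop empty lines
    sort |                    # "shuffle": bring identical words together
    uniq -c |                 # "reduce": count each group of identical words
    awk '{print $2 "\t" $1}'  # print as word<TAB>count
}

result=$(printf 'hadoop stores data\nhadoop processes data\n' | word_count)
printf '%s\n' "$result"
```

In Hadoop the same three phases are spread over many machines, with HDFS holding the input and output.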
Uses of Hadoop:
Building large-scale data warehouses and providing storage, processing, analysis, and statistics services for petabytes of data.
Examples include search engines, web page data processing, business intelligence, risk assessment, early warning, log analysis, and data mining.
Advantages of Hadoop: high scalability, low cost, mature ecosystem (Hadoop ecosystem map)
Hadoop Open Source Tools:
Hive: Translates SQL statements into Hadoop tasks for execution, lowering the barrier to using Hadoop.
HBase: A distributed database that stores structured data. HBase provides random reads and writes and real-time access to table data.
ZooKeeper: Like a zookeeper, it monitors the state of each node in a Hadoop cluster, manages the configuration of the entire cluster, and keeps data consistent between nodes.
Choosing a Hadoop version: pick one that is as stable as possible, which usually means an older release.
===============================================
Installation and configuration of Hadoop:
1) Install the JDK in Linux and set the environment variables
Installing the JDK: >> sudo apt-get install openjdk-7-jdk
Set environment variables:
>> vim /etc/profile
>> :wq
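The lines added to /etc/profile are not shown above. A minimal sketch of what they might look like follows. The JDK path is an assumption (the usual install location on 64-bit Debian/Ubuntu), so check the actual directory under /usr/lib/jvm before copying it.

```shell
# Lines that could be appended to /etc/profile.
# ASSUMPTION: this JDK path is the typical location on 64-bit Debian/Ubuntu;
# verify it on your own machine.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Put the JDK tools (java, javac, jps) on the search path.
export PATH=$JAVA_HOME/bin:$PATH
```

After saving, running `source /etc/profile` and then `java -version` should confirm that the settings took effect.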
2) Download Hadoop and set the Hadoop environment variable
Download Hadoop and extract it:
>> cd /opt/hadoop-1.2.1/
>> ls
>> vim /etc/profile
>> :wq
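The Hadoop lines for /etc/profile are also not shown. A sketch, assuming Hadoop was extracted to /opt/hadoop-1.2.1 as in the commands above:

```shell
# Lines that could be appended to /etc/profile for Hadoop.
# The directory matches the extraction path used in these notes.
export HADOOP_HOME=/opt/hadoop-1.2.1
# Hadoop 1.x keeps its commands (hadoop, start-all.sh) in bin/.
export PATH=$HADOOP_HOME/bin:$PATH
```

Run `source /etc/profile` afterwards so the current shell picks up the new variables.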
3) Modify 4 configuration files
(a) Modify hadoop-env.sh and set JAVA_HOME
(b) Modify core-site.xml and set hadoop.tmp.dir, dfs.name.dir, and fs.default.name
(c) Modify mapred-site.xml and set mapred.job.tracker
(d) Modify hdfs-site.xml and set dfs.data.dir
>> cd conf
>> ls
>> vim mapred-site.xml
>> :wq
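The content of mapred-site.xml is not shown. One possible version, written from the shell with a here-document, is sketched below. CONF_DIR is a hypothetical scratch directory standing in for the real conf directory, and localhost:9001 is only a commonly used single-node address, not a value taken from these notes.

```shell
# Write a minimal mapred-site.xml.
# ASSUMPTIONS: CONF_DIR is a stand-in for /opt/hadoop-1.2.1/conf, and
# localhost:9001 is just a typical JobTracker address on a single node.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```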
>> vim core-site.xml
>> :wq
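A sketch of core-site.xml with the three properties named in step (b). All values here are assumptions chosen for a single-node setup: the /hadoop directories and the hdfs://localhost:9000 address are illustrative, and CONF_DIR is a hypothetical scratch directory standing in for the real conf directory.

```shell
# Write a minimal core-site.xml.
# ASSUMPTIONS: the /hadoop paths and hdfs://localhost:9000 are illustrative;
# CONF_DIR is a stand-in for /opt/hadoop-1.2.1/conf.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Base directory for Hadoop's working files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop</value>
  </property>
  <!-- Where the NameNode keeps its metadata; these notes set it here,
       although it is more commonly placed in hdfs-site.xml -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name</value>
  </property>
  <!-- Address of the default file system (the NameNode) -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```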
>> vim hdfs-site.xml
>> :wq
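A matching sketch for hdfs-site.xml, setting only dfs.data.dir as in step (d). The /hadoop/data path is an assumption, and CONF_DIR is again a hypothetical scratch directory rather than the real conf directory.

```shell
# Write a minimal hdfs-site.xml.
# ASSUMPTIONS: /hadoop/data is an illustrative DataNode storage directory;
# CONF_DIR is a stand-in for /opt/hadoop-1.2.1/conf.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Where each DataNode stores its blocks -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data</value>
  </property>
</configuration>
EOF
```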
>> vim hadoop-env.sh
>> :wq
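In hadoop-env.sh the only change named in step (a) is JAVA_HOME. A sketch that appends the line from the shell instead of editing the file by hand follows. The JDK path is an assumption (typical on 64-bit Debian/Ubuntu), and CONF_DIR is a hypothetical scratch directory standing in for the real conf directory.

```shell
# Append a JAVA_HOME setting to hadoop-env.sh.
# ASSUMPTIONS: the JDK path is the usual Debian/Ubuntu location;
# CONF_DIR is a stand-in for /opt/hadoop-1.2.1/conf.
CONF_DIR="${CONF_DIR:-./conf-sketch}"
mkdir -p "$CONF_DIR"
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> "$CONF_DIR/hadoop-env.sh"
```

The value should match the JAVA_HOME set in /etc/profile.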
# Format the NameNode
>> hadoop namenode -format
# Start Hadoop
>> start-all.sh
# View the currently running Java processes with the jps command
>> jps
If the output lists the Hadoop 1.x daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker), the Hadoop installation was successful.
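The check can also be scripted. The sketch below assumes a single-node Hadoop 1.x setup started with start-all.sh, where the expected daemons are NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. The helper name check_daemons is hypothetical.

```shell
# Check that every expected Hadoop 1.x daemon appears in jps-style output.
# check_daemons is a hypothetical helper; it reads "PID Name" lines on stdin
# and returns 0 only if all five daemons are present.
check_daemons() {
  output=$(cat)
  missing=0
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare against the whole second field, so "NameNode" is not
    # satisfied by a "SecondaryNameNode" line.
    if ! printf '%s\n' "$output" |
        awk -v d="$daemon" '$2 == d {found=1} END {exit !found}'; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}

# On a machine with Hadoop running:
#   jps | check_daemons && echo "all daemons are up"
```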