Hadoop is a platform for storing massive amounts of data on distributed server clusters and running distributed analytics applications. Its core components are HDFS and MapReduce: HDFS is a distributed file system that provides reliable, distributed storage of data; MapReduce is a computational framework that splits a computing job into tasks and distributes them across the cluster via a task scheduler.
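As an illustration of the split-and-aggregate model described above, here is a minimal pure-Python sketch of the map, shuffle, and reduce phases applied to a word count. It is a single-process analogy, not Hadoop code; in a real job the framework distributes each phase across the cluster.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model: map emits (key, value)
# pairs, shuffle groups them by key, reduce aggregates each group.

def map_phase(line):
    # Emit (word, 1) for every word in one input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts emitted for one word.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data", "big hadoop"]))  # {'big': 2, 'data': 1, 'hadoop': 1}
```

The same three phases reappear in every MapReduce job; only the map and reduce functions change.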
Hadoop is an essential framework for big data development, so anyone who wants to learn big data has to master Hadoop. What, then, does learning Hadoop involve?
First, Hadoop Environment Construction
1. Introduction to the Hadoop ecosystem
2. Hadoop's position and relationships in cloud computing
3. Introduction to Hadoop application cases in China and abroad
4. Hadoop concepts, versions, and history
5. Introduction to Hadoop's core components and the HDFS and MapReduce architectures
6. Hadoop standalone-mode installation and testing
7. The cluster structure of Hadoop
8. Detailed installation steps for Hadoop pseudo-distributed mode
9. Viewing Hadoop from the command line and the browser
10. Hadoop startup script analysis
11. Hadoop fully distributed environment building
12. Introduction to Hadoop safe mode and the trash (recycle bin)
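For reference when working through the pseudo-distributed installation above, the key step is pointing Hadoop at a local single-node HDFS in core-site.xml. The hostname and port 9000 below are common choices, not requirements; in the matching hdfs-site.xml, a replication factor of 1 suits a single node.

```xml
<!-- core-site.xml: point Hadoop at a local single-node HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

With this in place, formatting the NameNode and starting HDFS should make the file system reachable from both the command line and the web UI.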
Second, HDFS Architecture and Shell and Java Operations
1. How each layer of HDFS works
2. HDFS DataNode and NameNode in detail
3. Single point of failure (SPOF) and high availability (HA)
4. Accessing HDFS via the API
5. Introduction to common compression algorithms and their installation and use
6. Introduction to Maven and its installation; using Maven in Eclipse to build a local Maven repository
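The NameNode/DataNode split listed above can be pictured with a small, purely conceptual Python sketch: the NameNode tracks only metadata, i.e. which blocks make up each file and where their replicas live, while DataNodes hold the actual bytes. The block size, replication factor, and round-robin placement here are toy values for illustration, not HDFS defaults.

```python
# Conceptual sketch (not real Hadoop code) of the NameNode/DataNode split.

BLOCK_SIZE = 4     # bytes per block (HDFS defaults to 128 MB)
REPLICATION = 2    # replicas per block (HDFS defaults to 3)

class DataNode:
    def __init__(self):
        self.blocks = {}               # block_id -> raw bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data

class NameNode:
    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.metadata = {}             # filename -> [(block_id, [datanode indexes])]

    def write(self, filename, data):
        blocks = []
        for offset in range(0, len(data), BLOCK_SIZE):
            n = offset // BLOCK_SIZE
            block_id = f"{filename}#blk{n}"
            # Toy placement: pick REPLICATION datanodes round-robin.
            targets = [(n + r) % len(self.datanodes) for r in range(REPLICATION)]
            for t in targets:
                self.datanodes[t].store(block_id, data[offset:offset + BLOCK_SIZE])
            blocks.append((block_id, targets))
        self.metadata[filename] = blocks

    def read(self, filename):
        # Read each block from its first replica and reassemble the file.
        return b"".join(self.datanodes[targets[0]].blocks[block_id]
                        for block_id, targets in self.metadata[filename])

datanodes = [DataNode(), DataNode(), DataNode()]
namenode = NameNode(datanodes)
namenode.write("demo.txt", b"hello hdfs")
assert namenode.read("demo.txt") == b"hello hdfs"
```

The point of the split is scalability: because the NameNode never touches file contents, a single metadata server can coordinate many DataNodes, which is also why it is a single point of failure until HA is configured.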
Third, MapReduce Learning
1. Introduction to the four stages of MapReduce
2. Job and Task explained
3. The default working mechanism
4. Creating an MR application to find the highest temperature of the year
5. Running an MR job on Windows
6. Mapper and Reducer
7. InputSplit and OutputSplit
8. Shuffle: sort, partitioner, group, combiner
9. Debugging programs with counters
10. Installing Hadoop on Windows
11. Installing the Hadoop plugin in Eclipse and accessing Hadoop resources
12. Writing an Ant script in Eclipse
13. The YARN scheduling framework and its event dispatch mechanism
14. Remote debugging of the ResourceManager
15. Protocol analysis of Google Protobuf, which underlies Hadoop
16. Hadoop's underlying IPC principles and RPC
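The shuffle stage listed above (sort, partitioner, group, combiner) can be sketched in a few lines of Python. The hash partitioner mirrors the role of Hadoop's default HashPartitioner (crc32 is used here so the example is deterministic), and the combiner pre-sums counts on the map side to shrink shuffle traffic; the two-reducer setup and the data are invented for the example.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2

def partition(key):
    # Analogue of Hadoop's default HashPartitioner: hash(key) mod #reducers.
    return zlib.crc32(key.encode()) % NUM_REDUCERS

def combine(pairs):
    # Combiner: pre-sum counts per key on the map side,
    # then emit the combined output sorted by key.
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return sorted(sums.items())

def shuffle(map_outputs):
    # Route each combined (key, count) pair to its reducer's partition
    # and group/sum values per key, as the framework does before reduce.
    partitions = [defaultdict(int) for _ in range(NUM_REDUCERS)]
    for pairs in map_outputs:
        for key, value in combine(pairs):
            partitions[partition(key)][key] += value
    return [dict(p) for p in partitions]

map_outputs = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1), ("c", 1)]]
partitions = shuffle(map_outputs)
# Every occurrence of a key lands in exactly one partition, fully summed.
assert sum(p.get("a", 0) for p in partitions) == 2
```

Because the partition function depends only on the key, all values for a key are guaranteed to reach the same reducer, which is what makes grouping and reducing correct.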
Fourth, Hadoop High Availability (HA)
1. Introduction to the Hadoop 2.x cluster architecture
2. Hadoop 2.x cluster construction
3. High availability (HA) for the NameNode
4. HDFS Federation
5. High availability (HA) for the ResourceManager
6. Hadoop cluster FAQs and workarounds
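As a reference point for NameNode HA, hdfs-site.xml typically defines one logical nameservice backed by two NameNodes. The nameservice id "mycluster", the NameNode ids nn1/nn2, and the hostnames below are placeholders; a production setup also needs JournalNodes and fencing configuration, which are omitted here.

```xml
<!-- hdfs-site.xml excerpt: two NameNodes behind one logical nameservice -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>master2:8020</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

Clients then address the file system as hdfs://mycluster and the failover proxy picks whichever NameNode is currently active.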