Detailed Explanation of Hadoop Application Development Technology

The "big data technology series: hadoop Application Development Technology details" consists of 12 chapters. 1st ~ Chapter 2 describes the hadoop ecosystem, key technologies, and installation and configuration in detail. Chapter 2 is an introduction to mapreduce, allowing readers to understand the entire development process ~ Chapter 5 describes in detail the HDFS and hadoop file I/O of the Distributed File System, Chapter 6th analyzes the working principle of mapreduce, and Chapter 7th explains how to use eclipse to compile hadoop source code, and how to test and debug hadoop applications; 8th ~ Chapter 9 describes the development methods and advanced applications of mapreduce in detail ~ Chapter 12 systematically explains hive, hbase, and mahout. Name of the book hadoop application development technical explanation and hadoop application development detailed author Liu Gang isbn9787111452447 category big data page number 405 fixed price 79 press publishing time loaded frame flat directory

1. Book Information

Book name: Detailed Explanation of Hadoop Application Development Technology
Author: Liu Gang
Publisher: Mechanical Industry Publishing House (China Machine Press)
Publication date: 2014-01-01
ISBN: 9787111452447
Price: CNY 79.00

2. Book Catalog

Chapter 1 Hadoop Overview
1.1 The origins of Hadoop
1.1.1 Google and the Hadoop modules
1.1.2 Why Hadoop?
1.1.3 Hadoop versions
1.2 The Hadoop ecosystem
1.3 Introduction to common Hadoop projects
1.4 Hadoop applications in China
1.5 Chapter summary
Chapter 2 Hadoop Installation
2.1 Hadoop environment installation and configuration
2.1.1 Install VMware
2.1.2 Install Ubuntu
2.1.3 Install VMware Tools
2.1.4 Install the JDK
2.2 Hadoop installation modes
2.2.1 Single-node installation
2.2.2 Pseudo-distributed installation
2.2.3 Distributed installation
2.3 How to use Hadoop
2.3.1 Starting and stopping Hadoop
2.3.2 Hadoop configuration files
2.4 Chapter summary
Chapter 3 MapReduce Quick Start
3.1 Prepare the development environment for the WordCount example
3.1.1 Create a Java project with Eclipse
3.1.2 Import the Hadoop JAR files
3.2 Implement the MapReduce code
3.2.1 Write the WordMapper class
3.2.2 Write the WordReducer class
3.2.3 Write the WordMain driver class
3.3 Package, deploy, and run
3.3.1 Package into a JAR file
3.3.2 Deploy and run
3.3.3 Test results
3.4 Chapter summary (a minimal WordCount sketch follows)
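
To give a feel for the walkthrough Chapter 3 outlines, here is a minimal, self-contained WordCount sketch with the three classes the chapter names (WordMapper, WordReducer, WordMain). It is written against the newer org.apache.hadoop.mapreduce API with Job.getInstance; the book, from the Hadoop 1.x era, may use new Job(conf, ...) instead, and its own listings will differ in detail.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordMain {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class WordMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: wire the job together; input and output paths come from the command line.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordMain.class);
        job.setMapperClass(WordMapper.class);
        job.setCombinerClass(WordReducer.class);   // reducer doubles as a combiner here
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR as in section 3.3.1, it would be run with something like hadoop jar wordcount.jar WordMain <input> <output>.
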
Chapter 4 Hadoop Distributed File System
4.1 Getting to know HDFS
4.1.1 Features of HDFS
4.1.2 Hadoop file system interfaces
4.1.3 HDFS web services
4.2 HDFS architecture
4.2.1 Racks
4.2.2 Data blocks
4.2.3 The metadata node (NameNode)
4.2.4 Data nodes (DataNodes)
4.2.5 The secondary metadata node (Secondary NameNode)
4.2.6 Namespaces
4.2.7 Data replication
4.2.8 Backup principles
4.2.9 Rack awareness
4.3 The Hadoop RPC mechanism
4.3.1 The RPC implementation flow
4.3.2 The RPC entity model
4.3.3 File reading
4.3.4 File writing
4.3.5 The file consistency model
4.4 The HDFS HA mechanism
4.4.1 HA clusters
4.4.2 HA architecture
4.4.3 Why an HA mechanism is needed
4.5 The HDFS Federation mechanism
4.5.1 Limitations of the single-NameNode HDFS architecture
4.5.2 Why the Federation mechanism was introduced
4.5.3 Federation architecture
4.5.4 Managing multiple namespaces
4.6 Hadoop file system access
4.6.1 Safe mode
4.6.2 Shell access to HDFS
4.6.3 HDFS commands for file processing
4.7 The Java API
4.7.1 Reading data from a Hadoop URL
4.7.2 The FileSystem class
4.7.3 The FileStatus class
4.7.4 The FSDataInputStream class
4.7.5 The FSDataOutputStream class
4.7.6 Listing all files under HDFS
4.7.7 File pattern matching
4.7.8 The PathFilter object
4.8 Maintaining HDFS
4.8.1 Appending data
4.8.2 Parallel copying
4.8.3 Upgrade and rollback
4.8.4 Adding a node
4.8.5 Removing a node
4.9 HDFS permission management
4.9.1 User identity
4.9.2 Principles of permission management
4.9.3 Shell commands for setting permissions
4.9.4 The superuser
4.9.5 HDFS permission configuration parameters
4.10 Chapter summary (an HDFS Java API sketch follows)
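
To illustrate the Java API that sections 4.7.1 through 4.7.8 cover, here is a minimal sketch that reads a file from HDFS through the FileSystem class and prints it to standard output. The hdfs:// URI is a placeholder for a real cluster address, and error handling is kept deliberately thin.

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsCat {
    public static void main(String[] args) throws Exception {
        // Placeholder URI: point this at a real NameNode and file.
        String uri = "hdfs://localhost:9000/user/hadoop/input.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        // FileStatus (section 4.7.3) carries metadata: length, replication, owner, ...
        FileStatus status = fs.getFileStatus(new Path(uri));
        System.out.println("length=" + status.getLen()
                + " replication=" + status.getReplication());

        // open() returns an FSDataInputStream (section 4.7.4), which also supports seek().
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false); // copy the file to stdout
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```
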
Chapter 5 Hadoop File I/O
5.1 Hadoop file data structures
5.1.1 SequenceFile storage
5.1.2 MapFile storage
5.1.3 Converting a SequenceFile to a MapFile
5.2 HDFS data integrity
5.2.1 Checksums
5.2.2 The data block scanner
5.3 File serialization
5.3.1 Serialization requirements for inter-process communication
5.3.2 Hadoop file serialization
5.3.3 The Writable interface
5.3.4 The WritableComparable interface
5.3.5 Custom Writable types
5.3.6 Serialization frameworks
5.3.7 The data serialization system Avro
5.4 Hadoop Writable types
5.4.1 The Writable class hierarchy
5.4.2 The Text type
5.4.3 The NullWritable type
5.4.4 The ObjectWritable type
5.4.5 The GenericWritable type
5.5 File compression
5.5.1 Compression formats supported by Hadoop
5.5.2 Encoders and decoders in Hadoop
5.5.3 Native libraries
5.5.4 Splittable LZO compression
5.5.5 Performance comparison of compressed files
5.5.6 Snappy compression
5.5.7 Comparing gzip, LZO, and Snappy
5.6 Chapter summary (a custom Writable sketch follows)
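
To make section 5.3.5 concrete, here is a minimal custom WritableComparable sketch: a pair of ints usable as a composite MapReduce key. The class name and field layout are invented for illustration, not taken from the book.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// A pair of ints usable as a MapReduce key: it must serialize itself
// (write/readFields) and define an ordering (compareTo).
public class IntPairWritable implements WritableComparable<IntPairWritable> {
    private int first;
    private int second;

    public IntPairWritable() {}            // no-arg constructor required by Hadoop

    public void set(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    @Override
    public int compareTo(IntPairWritable o) {  // sort by first, then by second
        int cmp = Integer.compare(first, o.first);
        return cmp != 0 ? cmp : Integer.compare(second, o.second);
    }

    @Override
    public int hashCode() { return first * 163 + second; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IntPairWritable)) return false;
        IntPairWritable p = (IntPairWritable) o;
        return first == p.first && second == p.second;
    }

    @Override
    public String toString() { return first + "\t" + second; }
}
```
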
Chapter 6 MapReduce Working Principles
6.1 Functional programming concepts behind MapReduce
6.1.1 List processing
6.1.2 Mapping a data list
6.1.3 Reducing a data list
6.1.4 How Mapper and Reducer work
6.1.5 Application example: word frequency statistics
6.2 The MapReduce framework structure
6.2.1 The MapReduce model
6.2.2 Components of the MapReduce framework
6.3 How MapReduce runs
6.3.1 Job submission
6.3.2 Job initialization
6.3.3 Task assignment
6.3.4 Task execution
6.3.5 Progress and status updates
6.3.6 Components of MapReduce progress
6.3.7 Task completion
6.4 MapReduce fault tolerance
6.4.1 Task failure
6.4.2 TaskTracker failure
6.4.3 JobTracker failure
6.4.4 Subtask failure
6.4.5 Handling repeatedly failing tasks
6.5 The shuffle and sort phases
6.5.1 Shuffle on the map side
6.5.2 Shuffle on the reduce side
6.5.3 Tuning shuffle parameters
6.6 Task execution
6.6.1 Speculative execution
6.6.2 Task JVM reuse
6.6.3 Skipping bad records
6.6.4 The task execution environment
6.7 Job schedulers
6.7.1 The first-in, first-out scheduler
6.7.2 The capacity scheduler
6.7.3 The fair scheduler
6.8 Custom Hadoop schedulers
6.8.1 The Hadoop scheduler framework
6.8.2 Writing a Hadoop scheduler
6.9 Introduction to YARN
6.9.1 The asynchronous programming model
6.9.2 Computing frameworks supported by YARN
6.9.3 YARN architecture
6.9.4 The YARN workflow
6.10 Chapter summary (a tuning sketch follows)
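
Sections 6.5.3 and 6.6 are about per-job knobs. As a hedged sketch only: the property names below are the Hadoop 1.x ones current when the book was written (YARN-era releases renamed most of them, e.g. io.sort.mb became mapreduce.task.io.sort.mb), and the values shown are arbitrary examples, not recommendations from the book.

```java
import org.apache.hadoop.conf.Configuration;

public class TuningExample {
    public static Configuration tunedConf() {
        Configuration conf = new Configuration();
        // Shuffle (section 6.5.3): enlarge the map-side sort buffer and spill later.
        conf.setInt("io.sort.mb", 200);
        conf.setFloat("io.sort.spill.percent", 0.90f);
        // Speculative execution (section 6.6.1): run backup copies of slow map tasks.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        // JVM reuse (section 6.6.2): -1 lets one JVM run any number of a job's tasks.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        // Bad-record skipping (section 6.6.3): enter skip mode after 2 failed attempts.
        conf.setInt("mapred.skip.attempts.to.start.skipping", 2);
        return conf;
    }

    public static void main(String[] args) {
        System.out.println("io.sort.mb = " + tunedConf().get("io.sort.mb"));
    }
}
```
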
Chapter 7 Eclipse Plug-in Applications
7.1 Compile the Hadoop source code
7.1.1 Download the Hadoop source code
7.1.2 Prepare the build environment
7.1.3 Compile the Common components
7.2 Install the MapReduce plug-in for Eclipse
7.2.1 Find the MapReduce plug-in
7.2.2 Create a new Hadoop location
7.2.3 Operate HDFS with the Hadoop plug-in
7.2.4 Run the MapReduce driver
7.3 Debugging MapReduce
7.3.1 Enter debug run mode
7.3.2 Specific debug operations
7.4 The unit-testing framework MRUnit
7.4.1 Understanding the MRUnit framework
7.4.2 Prepare the test cases
7.4.3 Mapper unit tests
7.4.4 Reducer unit tests
7.4.5 MapReduce unit tests
7.5 Chapter summary (an MRUnit sketch follows)
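
As a sketch of the MRUnit tests described in sections 7.4.3 and 7.4.4, the following JUnit tests drive the hypothetical WordMapper and WordReducer from the Chapter 3 sketch above through MRUnit's MapDriver and ReduceDriver (MRUnit 0.9-era API assumed):

```java
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountTest {

    @Test
    public void mapperEmitsOnePerWord() throws Exception {
        // Feed one line in; expect one (word, 1) pair per token, in order.
        MapDriver.newMapDriver(new WordMain.WordMapper())
                .withInput(new LongWritable(0), new Text("cat dog cat"))
                .withOutput(new Text("cat"), new IntWritable(1))
                .withOutput(new Text("dog"), new IntWritable(1))
                .withOutput(new Text("cat"), new IntWritable(1))
                .runTest();
    }

    @Test
    public void reducerSumsCounts() throws Exception {
        // Feed grouped values in; expect the summed count out.
        ReduceDriver.newReduceDriver(new WordMain.WordReducer())
                .withInput(new Text("cat"),
                        Arrays.asList(new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("cat"), new IntWritable(2))
                .runTest();
    }
}
```
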
Chapter 8 MapReduce Programming and Development
8.1 The WordCount case study
8.1.1 The MapReduce workflow
8.1.2 The map stage of WordCount
8.1.3 The reduce stage of WordCount
8.1.4 Results of each stage
8.1.5 The Mapper abstract class
8.1.6 The Reducer abstract class
8.1.7 The MapReduce driver
8.1.8 The minimal MapReduce driver
8.2 Input formats
8.2.1 The InputFormat interface
8.2.2 The InputSplit class
8.2.3 The RecordReader class
8.2.4 Application example: generate 100 random decimals and find the maximum value
8.3 Output formats
8.3.1 The OutputFormat interface
8.3.2 The RecordWriter class
8.3.3 Application example: put words with the same initial letter into one file
8.4 Compression formats
8.4.1 Using compression in MapReduce
8.4.2 Compressing map output
8.5 MapReduce optimization
8.5.1 The Combiner class
8.5.2 The Partitioner class
8.5.3 The distributed cache
8.6 Helper classes
8.6.1 Reading Hadoop configuration files
8.6.2 Setting Hadoop configuration file properties
8.6.3 GenericOptionsParser options
8.7 The Streaming interface
8.7.1 How Streaming works
8.7.2 Streaming programming interface parameters
8.7.3 Job configuration properties
8.7.4 Application example: extract web page titles
8.8 Chapter summary (a Partitioner sketch follows)
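
To illustrate the Partitioner class from section 8.5.2 (in the spirit of the example in 8.3.3), here is a hedged sketch of a partitioner that routes words to reducers by initial letter. The class name and the letter-to-bucket mapping are invented for illustration.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each word to a reduce partition by its first letter, so words that
// share an initial end up in the same reducer (and hence the same output file).
public class InitialLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString().toLowerCase();
        // Letters a-z map to buckets 0-25; anything else falls into bucket 0.
        int bucket = 0;
        if (!word.isEmpty() && word.charAt(0) >= 'a' && word.charAt(0) <= 'z') {
            bucket = word.charAt(0) - 'a';
        }
        return bucket % numPartitions;   // never exceed the configured reducer count
    }
}
```

A driver would enable it with job.setPartitionerClass(InitialLetterPartitioner.class) and, say, job.setNumReduceTasks(26); both the class and the 26-way split are illustrative choices, not the book's code.
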
Chapter 9 Advanced MapReduce Applications
9.1 Counters
9.1.1 Built-in counters
9.1.2 Custom counters
9.1.3 Retrieving counters
9.2 MapReduce secondary sort
9.2.1 How secondary sort works
9.2.2 The secondary sort algorithm flow
9.2.3 Code implementation
9.3 Join algorithms in MapReduce
9.3.1 Reduce-side join
9.3.2 Map-side join
9.3.3 Semi-join
9.4 Reading and writing MySQL data in MapReduce
9.4.1 Reading data
9.4.2 Writing data
9.5 Hadoop system optimization
9.5.1 Small-file optimization
9.5.2 Setting the number of map and reduce tasks
9.6 Chapter summary (a counter sketch follows)
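
To make section 9.1.2 concrete, here is a hedged sketch of a custom counter: a mapper that tallies well-formed and malformed tab-separated records through an enum. The class, enum, and record format are invented for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Each enum constant becomes a named counter grouped under RecordQuality.
    public enum RecordQuality { GOOD, MALFORMED }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length < 2) {
            // Counters are aggregated across all tasks and reported with the job.
            context.getCounter(RecordQuality.MALFORMED).increment(1);
            return;
        }
        context.getCounter(RecordQuality.GOOD).increment(1);
        context.write(new Text(fields[0]), new IntWritable(1));
    }
}
```

The retrieval step of section 9.1.3 then looks like job.getCounters().findCounter(RecordQuality.MALFORMED).getValue() in the driver, once the job has finished.
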
Chapter 10 The Data Warehouse Tool Hive
10.1 Getting to know Hive
10.1.1 How Hive works
10.1.2 Hive data types
10.1.3 Features of Hive
10.1.4 Downloading and installing Hive
10.2 Hive architecture
10.2.1 Hive user interfaces
10.2.2 The Hive metastore
10.2.3 Hive data storage
10.2.4 The Hive interpreter
10.3 Hive file formats
10.3.1 The TextFile format
10.3.2 The SequenceFile format
10.3.3 The RCFile format
10.3.4 Custom file formats
10.4 Hive operations
10.4.1 Table operations
10.4.2 View operations
10.4.3 Index operations
10.4.4 Partition operations
10.4.5 Bucket operations
10.5 Hive composite types
10.5.1 The struct type
10.5.2 The array type
10.5.3 The map type
10.6 Hive joins in detail
10.6.1 Join syntax
10.6.2 How joins work
10.6.3 Outer joins
10.6.4 Map joins
10.6.5 Semantic differences in how joins treat null values
10.7 Hive optimization strategies
10.7.1 Column pruning
10.7.2 The map join operation
10.7.3 The group-by operation
10.7.4 Merging small files
10.8 Hive built-in operators and functions
10.8.1 String functions
10.8.2 Collection statistics functions
10.8.3 Compound operations
10.9 Hive user-defined function interfaces
10.9.1 User-defined functions (UDF)
10.9.2 User-defined aggregate functions (UDAF)
10.10 Hive permission control
10.10.1 Creating and dropping roles
10.10.2 Granting and revoking role privileges
10.10.3 Superuser privileges
10.11 Application example: developing a Hive program over JDBC
10.11.1 Prepare the test data
10.11.2 Code implementation
10.12 Chapter summary (a Hive JDBC sketch follows)
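
In the spirit of section 10.11, here is a minimal Hive-over-JDBC sketch. It assumes a HiveServer2 endpoint on localhost:10000 and an existing table named users (both placeholders); the book, written in the Hadoop 1.x era, may instead target the older org.apache.hadoop.hive.jdbc.HiveDriver and jdbc:hive:// URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Connection URL format: jdbc:hive2://<host>:<port>/<database>
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();

        // Run a query; Hive compiles it into MapReduce jobs behind the scenes.
        ResultSet rs = stmt.executeQuery("SELECT name, age FROM users LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}
```
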
Chapter 11 The Open-Source Database HBase
11.1 Getting to know HBase
11.1.1 HBase features
11.1.2 HBase access interfaces
11.1.3 The HBase storage structure
11.1.4 The HBase storage format
11.2 HBase design
11.2.1 The logical view
11.2.2 Framework structure and flow
11.2.3 The relationship between tables and regions
11.2.4 The -ROOT- table and the .META. table
11.3 Key algorithms and flows
11.3.1 Locating a region
11.3.2 The read/write process
11.3.3 Region assignment
11.3.4 Bringing a RegionServer online and offline
11.3.5 Bringing the Master online and offline
11.4 HBase installation
11.4.1 HBase single-node installation
11.4.2 HBase distributed installation
11.5 HBase Shell operations
11.5.1 General operations
11.5.2 DDL operations
11.5.3 DML operations
11.5.4 HBase Shell scripts
11.6 HBase clients
11.6.1 Java API interaction
11.6.2 Operating HBase from MapReduce
11.6.3 Writing data to HBase
11.6.4 Reading data from HBase
11.6.5 The Avro, REST, and Thrift interfaces
11.7 Chapter summary (an HBase Java API sketch follows)
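
In the spirit of sections 11.6.1 through 11.6.4, here is a hedged sketch using the HTable-based client API of the HBase releases current when the book was written (newer releases replace HTable with Connection/Table). The table name user_table and the info:name column are invented for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRoundTrip {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "user_table");   // invented table name

        // Write one cell: row "row1", column family "info", qualifier "name".
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
        table.put(put);

        // Read it back.
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
        System.out.println(Bytes.toString(value));

        table.close();
    }
}
```
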
Chapter 12 Mahout Algorithms
12.1 Using Mahout
12.1.1 Install Mahout
12.1.2 Run a Mahout example
12.2 Mahout data representation
12.2.1 Preferences
12.2.2 The DataModel class
12.2.3 Connecting Mahout to a MySQL database
12.3 Understanding the Taste framework
12.4 Mahout recommenders
12.4.1 User-based recommenders
12.4.2 Item-based recommenders
12.4.3 The slope-one recommendation strategy
12.5 Recommender systems
12.5.1 Personalized recommendation
12.5.2 A product recommendation system case study
12.6 Chapter summary (a Taste sketch follows)
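
To make sections 12.2 through 12.4 concrete, here is a hedged sketch of a user-based Taste recommender. The file name prefs.csv is a placeholder (one userID,itemID,preference triple per line), and the Pearson similarity with a 10-user neighborhood is an arbitrary choice, not the book's.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class TasteExample {
    public static void main(String[] args) throws Exception {
        // Each line of prefs.csv: userID,itemID,preference
        DataModel model = new FileDataModel(new File("prefs.csv")); // placeholder file

        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender =
                new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 recommendations for user 1.
        List<RecommendedItem> items = recommender.recommend(1L, 3);
        for (RecommendedItem item : items) {
            System.out.println(item.getItemID() + " score=" + item.getValue());
        }
    }
}
```
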
Appendix A Hive built-in operators and functions
Appendix B HBase default configuration description
Appendix C Parameters of the three Hadoop configuration files
