Hadoop Offline Big Data Analytics Platform: Hands-On Project

Source: Internet
Author: User


Course Learning Portal: http://www.xuetuwuyou.com/course/184
This course comes from Xuetuwuyou: http://www.xuetuwuyou.com


Course Description:
The project is a data analysis platform for an e-commerce shopping website, divided into three parts: data collection, data analysis, and data display. The analysis is handled mainly with commonly used components of the Hadoop big data ecosystem, and the project shows how big data is actually applied in an enterprise.


Course Content
(1) File collection framework: Flume
① Flume design architecture and principles (three major components)
② Getting started with Flume, real-time
③ Real-world case: using Flume to monitor data and collect it into HDFS in real time

(2) Big Data analytics platform architecture
① The three modules of the data platform
② Business data of the analytics platform
③ Technology selection and configuration testing for the big data platform

(3) Seven business analyses on the data analysis platform
① The seven specific business analyses, each for different data
② Data is stored in HDFS/Hive/HBase and analyzed offline with MapReduce and Hive, covering geographic analysis, user-related information analysis, and external-link analysis
③ In-depth use of MapReduce driven by the business requirements
④ How to tune and optimize for different problems during data processing
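
As an illustration of the offline analysis described in (3), here is a minimal sketch in plain Python of the map and reduce steps behind an hourly new-visitor count. It is not the course's code: the record layout, the sample data, and the function names are assumptions made for this example, and a real job would run on Hadoop MapReduce or Hive against data in HDFS/HBase.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed log records: (visitor id, ISO timestamp).
RECORDS = [
    ("u1", "2017-01-01T08:15:00"),
    ("u2", "2017-01-01T08:40:00"),
    ("u1", "2017-01-01T09:05:00"),
    ("u3", "2017-01-01T09:30:00"),
]


def map_phase(records):
    """Emit (visitor_id, timestamp) pairs, mimicking a mapper keyed by visitor."""
    for visitor_id, ts in records:
        yield visitor_id, datetime.fromisoformat(ts)


def reduce_phase(pairs):
    """Keep each visitor's earliest visit, then count first visits per hour."""
    first_seen = {}
    for visitor_id, ts in pairs:
        if visitor_id not in first_seen or ts < first_seen[visitor_id]:
            first_seen[visitor_id] = ts
    per_hour = defaultdict(int)
    for ts in first_seen.values():
        per_hour[ts.strftime("%Y-%m-%d %H")] += 1
    return dict(per_hour)


new_visitors_per_hour = reduce_phase(map_phase(RECORDS))
print(new_visitors_per_hour)
```

In this sketch a visitor counts as "new" only in the hour of their first recorded visit, so a repeat visit by `u1` at 09:05 does not add to the 09:00 bucket.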



Course Catalogue:
Chapter 1: Big Data Offline Project: Enterprise big data project business and design
1. Development process for big data projects
2. Application fields of Big Data (i)
3. Application fields of Big Data (ii)
4. Big Data analytics Platform (i)
5. Big Data analytics Platform (ii)
6. Planning of data volume and cluster size (i)
7. Planning of data volume and cluster size (ii)
8. Common enterprise data analysis requirements (i)
9. Common enterprise data analysis requirements (ii)


10. Introduction to Flume and its architecture
11. Installing and deploying Flume
12. Test run of Flume
13. Configuring Flume with the file channel and HDFS sink
14. Configuring HDFS file size and time-based partitioning in Flume
15. Configuring the spooling directory source in Flume
16. Configuring file filtering for the spooling directory source in Flume
17. Introduction to configuring a fan-in architecture in Flume
18. Test implementation of a fan-in architecture in Flume
19. Implementing a fan-out architecture in Flume
20. Introduction to and compilation of Taildir in Flume
21. Taildir configuration and test use in Flume

Chapter 3: Big Data Offline Project: Nginx + Flume implementation
22. Introduction to the technical architecture of the project
23. Project technical architecture diagram
24. Technology selection for the frameworks used in the project
25. Tengine introduction and source code compilation
26. Starting and testing Tengine
27. Configuring the Nginx service for management with the service command
28. Testing the SDK together with Nginx
29. SDK design ideas and introduction to the important event types
30. Code implementation of the JS SDK and Java SDK
31. Configuring custom collection scenarios in Nginx

33. Flume load balancing and failover, and the Meituan use case

Chapter 4: Big Data Offline Project: ETL business analysis and implementation (i)
34. Implementing the Nginx log splitting script (i)
35. Implementing the Nginx log splitting script (ii)
36. Implementing the Nginx log upload script
37. ETL process analysis
38. Importing the data analysis project
39. Implementation of the log parsing class (i)
40. Implementation of the log parsing class (ii)
41. Implementation of the log parsing class (iii)
42. Implementation of the log parsing class (iv)
43. Analysis of the ETL code flow
44. Design of the HBase tables in the project

Chapter 5: Big Data Offline Project: ETL business analysis and implementation (ii)
45. Implementation of the ETL Map class (i)
46. Implementation of the ETL Map class (ii)
47. Implementation of the ETL Driver class (i)
48. Implementation of the ETL Driver class (ii)
49. Implementation of the ETL Driver class (iii)
50. Local run test of the ETL
51. Cluster run test of the ETL

Chapter 6: Big Data Offline Project: Data analysis approach and code implementation
52. Implementation approach for new-visitor statistics (i)
53. Implementation approach for new-visitor statistics (ii)
54. Analysis of the MapReduce implementation approach
55. Filtering values and fields in HBase (i)
56. Filtering values and fields in HBase (ii)
57. Filtering values and fields in HBase (iii)
58. New user statistics: Map code implementation (i)
59. New user statistics: Map code implementation (ii)
60. New user statistics: Map code implementation (iii)
61. New user statistics: Reduce and Driver code implementation (i)
62. New user statistics: Reduce and Driver code implementation (ii)

Chapter 7: Big Data Offline Project: Hourly analysis and visual display
63. Custom input and output in MapReduce
64. New user statistics code walkthrough (i)
65. New user statistics code walkthrough (ii)
66. Hourly analysis: HBase integration with Hive
67. Hourly analysis: active user analysis
68. Hourly analysis: session length analysis
69. Hourly analysis: average visit duration analysis and Sqoop export
70. Using Zeus to implement project task scheduling (i)
71. Using Zeus to implement project task scheduling (ii)
72. Using Zeus to implement project task scheduling (iii)
73. Using Zeus to implement project task scheduling (iv)
74. Data presentation layer and the use of Highcharts (i)
75. Data presentation layer and the use of Highcharts (ii)
76. Project summary (i)
77. Project summary (ii)



Recommended related Hadoop courses:

Learn the big data infrastructure Hadoop with Ms. Xuan Yu
Course View Address: http://www.xuetuwuyou.com/course/193

Hadoop architecture design and source code analysis
Course View Address: http://www.xuetuwuyou.com/course/88

Hadoop in practice + tuning of very large clusters
Course View Address: http://www.xuetuwuyou.com/course/97

Hadoop from zero: advanced hands-on training (CDH5, Hive, Sqoop)
Course View Address: http://www.xuetuwuyou.com/course/62

