Project technical structure diagram:
Flowchart analysis: the overall process is as follows:
The ETL step is implemented as Hive SQL queries.
However, because this case must handle massive amounts of data, the technology used at each step of the process differs completely from traditional BI:
1) Data acquisition: a custom-developed collection program, or the open-source framework Flume
2) Data preprocessing: custom-developed MapReduce programs running on a Hadoop cluster
3) Data warehousing: Hive, built on top of Hadoop
4) Data export: Sqoop, the Hadoop-based data import/export tool
5) Data visualization: custom-developed web applications, or products such as Kettle
6) Process scheduling across the whole pipeline: Oozie, or similar open-source tools in the Hadoop ecosystem
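To make step 2 concrete, here is a minimal Python sketch of the map/reduce logic a preprocessing job would apply to raw web server logs: the map phase parses and filters each line, and the reduce phase aggregates counts per URL. The log format, field choices, and filtering rules are illustrative assumptions; a production job would be a Java MapReduce program (or Hadoop Streaming script) reading from HDFS.

```python
import re
from collections import defaultdict

# Assumed Apache common-log-style format; adjust the pattern to the actual log layout.
LOG_PATTERN = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)')

def map_phase(line):
    """Map: parse one raw log line; emit (url, 1) for successful GET requests."""
    m = LOG_PATTERN.match(line)
    if not m:
        return []  # malformed records are filtered out during preprocessing
    ip, ts, method, url, status, size = m.groups()
    if method != "GET" or not status.startswith("2"):
        return []
    return [(url, 1)]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each key (url)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

if __name__ == "__main__":
    logs = [
        '192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '192.168.1.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '192.168.1.3 - - [10/Oct/2023:13:55:41 +0000] "POST /login HTTP/1.1" 302 512',
        'garbled line that fails parsing',
    ]
    intermediate = [pair for line in logs for pair in map_phase(line)]
    print(reduce_phase(intermediate))  # {'/index.html': 2}
```

The same parse-filter-aggregate shape carries over directly to a distributed job: Hadoop shuffles the intermediate (url, 1) pairs by key before the reduce phase runs on each key group.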
To be continued......
Big Data Platform: Website Log Analysis System