I: Project development process
1. Project Research
Understand the project's initial requirements, then assess whether they can be met with the technologies currently available on the market
2. Requirements Analysis
What exactly does the project need to do?
What should it deliver in the end?
Importance: A good requirements analysis clarifies the main direction of the project's subsequent development
3. Solution Design
High-level design:
Project structure, technology selection
Detailed design:
Design each module in detail
4. Coding and Implementation
Implement the design in code
5. Testing
Functional testing: Whether the function meets the requirements
Integration testing: Compatibility between modules
Stress testing: Whether the system keeps running under high concurrency and many simultaneous users
User testing: Revise according to users' feedback
6. Going Live
Commissioning phase: The new system runs online alongside the old system, with traffic splitting between the two
Official operation: Only the new system runs online
7. Post-launch Maintenance
Another iteration of development
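The traffic splitting used in the commissioning phase (step 6) can be illustrated with a small sketch. This is only one possible approach, assuming requests carry a user ID and that a fixed percentage of users should reach the new system. The function name route and the parameter new_system_percent are hypothetical, not taken from any particular gateway or load balancer.

```python
import hashlib


def route(user_id: str, new_system_percent: int) -> str:
    """Return "new" or "old" for a user, based on a stable hash bucket.

    Hypothetical helper: a real deployment would usually do this in a
    gateway or load balancer rather than in application code.
    """
    if not 0 <= new_system_percent <= 100:
        raise ValueError("new_system_percent must be between 0 and 100")
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return "new" if bucket < new_system_percent else "old"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    share = sum(route(u, 10) == "new" for u in users) / len(users)
    print(f"share routed to the new system: {share:.1%}")
```

Because the bucket depends only on the user ID, raising the percentage step by step moves more users to the new system without bouncing anyone back to the old one. Setting it to 100 corresponds to official operation.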
II: What is a data analysis platform
1. Offline data analysis platform
MapReduce, Hive, Spark Core (Spark on YARN)
2. Real-time data analysis platform
Spark Core (Spark standalone), Spark Streaming, Storm
III: Why build your own data analysis platform
1. Advantages
Data is not disclosed to outside parties
Customizable, free to develop as needed
The data stays within your own company and can be exploited further in the future
Helps build up the company's talent reserve
2. Disadvantages
Requires talent and time
High server and hardware costs
IV: Sources of data
1. Log server
Nginx logs, Apache logs, Linux logs
2. Business logs
Log4j logs
3. Business data
Stored in the business database; the data that supports business operations
4. User behavior data
Click, browse, select, bookmark, order, offline
5. Purchased third-party data
6. Data collected by a web crawler
7. Partners' data
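As an illustration of turning a server log source into structured records, here is a minimal sketch that parses one line in Nginx's default "combined" access log format. It assumes the logs use that default format; a custom log_format would need a different pattern. The sample line is made up for the example, and parse_line is a hypothetical helper name.

```python
import re
from typing import Optional

# Fields of the default Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one access log line; return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the numeric fields so they can be aggregated later.
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    return record


# Made-up sample line, for illustration only.
sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"')

if __name__ == "__main__":
    print(parse_line(sample))
```

Lines that do not match return None instead of raising, so malformed entries can be counted or dropped during the later cleaning step.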
V: Data processing flow
1. Data collection
Collect user data and save it to HDFS
2. Data processing
Clean, filter, and fill in missing values in the data
Develop according to business requirements
Save the results to a SQL or NoSQL database
3. Data visualization
Display the results using charts and similar visuals
4. Other results-based applications
User profiles
Recommendations
Data Analyst
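The cleaning, filtering, and filling mentioned under step 2 (data processing) can be sketched in plain Python as follows. This is a toy illustration, not how a real platform would do it at scale (that would typically run on MapReduce, Hive, or Spark). The record layout, the field names, and the default value are assumptions made up for the example; the action names are the user behaviors listed under the data sources.

```python
# Made-up raw records; the field names are assumptions for this sketch.
raw_events = [
    {"user_id": " u1 ", "action": "CLICK", "duration": "12"},
    {"user_id": "u2", "action": "browse", "duration": ""},
    {"user_id": "", "action": "order", "duration": "5"},
    {"user_id": "u3", "action": "unknown", "duration": "7"},
]

VALID_ACTIONS = {"click", "browse", "select", "bookmark", "order", "offline"}
DEFAULT_DURATION = 0  # assumed fill value for a missing duration


def clean(event: dict) -> dict:
    """Cleaning: trim whitespace and normalize the case of the action."""
    return {
        "user_id": event["user_id"].strip(),
        "action": event["action"].strip().lower(),
        "duration": event["duration"].strip(),
    }


def keep(event: dict) -> bool:
    """Filtering: drop records with no user or an unrecognized action."""
    return bool(event["user_id"]) and event["action"] in VALID_ACTIONS


def fill(event: dict) -> dict:
    """Filling: replace a missing duration with the default value."""
    duration = int(event["duration"]) if event["duration"] else DEFAULT_DURATION
    return {**event, "duration": duration}


def process(events: list) -> list:
    """Apply cleaning, then filtering, then filling to every record."""
    cleaned = (clean(e) for e in events)
    return [fill(e) for e in cleaned if keep(e)]


if __name__ == "__main__":
    for row in process(raw_events):
        print(row)
```

Of the four sample records, the one with an empty user ID and the one with an unrecognized action are dropped, and the missing duration of the second record is filled with the default.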
Summary: the project development process, what a data analysis platform is, why build one, data sources, and the data processing flow