From data source to data presentation, here is our current situation.
Data sources:
1. MySQL
2. Log files
Presentation:
1. CSV file export, or sent by email
2. An interface that provides data to the Web for chart rendering
The main technical points currently involved:
Reading data from log files is done mostly with Shell + awk. For MySQL, PHP reads the data and stores it to files; then PHP or shell scripts perform some arithmetic or statistical processing, and the results are either written back to the database or emailed to the requesting party.
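For context, the Shell + awk style described above might look roughly like this minimal sketch, which counts requests per HTTP status code (field 9 in common log format). The inline sample lines and the field number are assumptions; adjust them to your actual log layout.

```shell
# Hypothetical sample standing in for a real access log.
cat > /tmp/access.log <<'EOF'
1.2.3.4 - - [10/Oct/2015:13:55:36 +0800] "GET /a HTTP/1.1" 200 2326
1.2.3.4 - - [10/Oct/2015:13:55:37 +0800] "GET /b HTTP/1.1" 404 510
1.2.3.4 - - [10/Oct/2015:13:55:38 +0800] "GET /a HTTP/1.1" 200 2326
EOF

# Tally field 9 (the status code) and sort by count, descending.
awk '{ counts[$9]++ } END { for (code in counts) print code, counts[code] }' /tmp/access.log \
  | sort -k2,2nr
```

This kind of one-liner is fine in isolation; the question is what happens once dozens of them accumulate.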
The project has accumulated a large number of scripts, along with some temporary solutions (scattered one-off scripts). As the data grows, the MySQL part is becoming less and less efficient, and the complex scripts are increasingly hard to maintain. Some of the scripts have to be run manually, many run on timers, and if this continues it will become almost impossible to control.
I'm looking for a solution covering everything from data ingestion to data presentation; if you have experience with this, please share it.
Some of the log files are stored in Hadoop, but we do not currently write MapReduce jobs to process that part.
Thanks!
Reply content:
0. The right approach depends on your goals and your team's capacity. The complexity of a self-built solution is proportional to your expectations and to the data volume.
1. You can look into Splunk, or the Logstash + ES + Kibana stack; I believe you will be pleasantly surprised.
2. If you want to go deeper, you can learn about SIEM.
3. Quick-and-dirty is one option; flexible is another.
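To make option 1 concrete, a minimal Logstash pipeline might look like the sketch below. The file path, grok pattern, and index name are all assumptions to be replaced with your own values.

```
input {
  # Tail the application logs; path is an assumption.
  file {
    path => "/var/log/app/*.log"
    start_position => "beginning"
  }
}
filter {
  # Parse access-log-style lines into structured fields.
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  # Ship parsed events to Elasticsearch; Kibana then renders charts on top.
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

This replaces both the ad-hoc awk statistics and the CSV/email delivery: Kibana serves the charts directly from Elasticsearch.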
Clean the source data into a consistent format, define a schema, compute the statistics with Hive, schedule the jobs with Oozie, put the results somewhere convenient, and render them on the Web.
It's basically a standard routine.
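The Hive step of that routine might be sketched as below; the table schema, delimiter, and HDFS paths are assumptions for illustration, and Oozie (not shown) would run the query on a schedule.

```sql
-- Hypothetical schema over the cleaned log files already sitting in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS access_log (
  ip     STRING,
  ts     STRING,
  url    STRING,
  status INT,
  bytes  BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/access_log';

-- Daily aggregation; the result directory is what the Web layer reads.
INSERT OVERWRITE DIRECTORY '/data/report/status_counts'
SELECT status, COUNT(*) AS cnt
FROM access_log
GROUP BY status;
```

Compared with scattered shell scripts, the schema and the query live in one place, and the scheduler gives you visibility into what ran and when.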
Actually, this is more of a discussion topic; I hope more people will join in.