Technical approaches to designing and implementing a statistics system

Source: Internet
Author: User
Let me describe the current situation, from data source to data presentation.

Data sources:
1. MySQL
2. Log files

Presentation:
1. CSV file export or email
2. An interface that provides data for the web front end to render charts

The technical pieces currently involved are:
Data is read from the log files mainly with shell + awk. Data is read from MySQL mainly with PHP and saved to a file. PHP or shell scripts then do some arithmetic or statistical processing, and the resulting data is either stored in the database or emailed to whoever requested it.
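
As a rough illustration of the log-side step described above (count something per key, then export a CSV), here is a minimal Python sketch. The log layout (whitespace-separated fields with the key in the first column), the sample lines, and the names `count_by_field` and `to_csv` are assumptions made for the example, not details of the original setup.

```python
import csv
import io
from collections import Counter


def count_by_field(lines, field_index=0):
    """Count occurrences of one whitespace-separated field, like awk '{c[$1]++}'."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index:  # skip blank or short lines
            counts[fields[field_index]] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text, highest count first, ties ordered by key."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["key", "count"])
    for key, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([key, n])
    return buf.getvalue()


# Hypothetical sample lines in the form "<page> <user>"
sample = ["/home alice", "/cart bob", "/home carol", ""]
report = to_csv(count_by_field(sample))
```

Keeping the counting and the export in separate functions means the same counts could be emailed, written to a file, or loaded into a database without touching the parsing code.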

The project has accumulated a lot of scripts, along with some temporary solutions (scattered one-off scripts). As the data grows, the MySQL part is becoming less and less efficient, and the complex scripts are increasingly hard to maintain. Some of these scripts have to be run manually, many run on a schedule, and if this continues things will be close to out of control.

I am looking for a solution that covers everything from data ingestion to data presentation, or for experienced people to share how they handle it.

Some of the log files are stored in Hadoop, but we have not yet written MapReduce jobs directly to process that part.

Thanks.

Reply content:

0. The right approach depends on your goals and your team's strength. The complexity of a self-built solution grows with your expectations and with the amount of data.
1. Take a look at either Splunk or Logstash + ES + Kibana. I think you will be pleasantly surprised.
2. If you want to go deeper, read up on SIEM.
3. Quick and dirty is one option; flexible is another.
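
Tools such as Logstash work by turning each raw log line into a structured event before it is indexed and searched. The sketch below shows only that idea, in plain Python; it is not Logstash configuration, and the line format, the field names, and the function `parse_line` are assumptions made for the example.

```python
import json
import re

# Hypothetical line format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


# One raw line becomes one structured document that could be indexed.
event = parse_line("2024-01-01 12:00:00 ERROR payment failed")
doc = json.dumps(event, sort_keys=True)
```

Once lines are structured like this, filtering by level or grouping by date becomes a query rather than another awk script.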

Clean the source data into a consistent format, define the schema, compute the statistics with Hive, schedule the jobs with Oozie, store the results somewhere suitable, and render them on the web.
That is basically the standard routine.
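
The routine above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: `sqlite3` stands in for Hive so the example runs anywhere, the scheduling that Oozie would do is reduced to a plain function call, and the table `page_views`, its columns, and the sample rows are all hypothetical.

```python
import json
import sqlite3

# sqlite3 is a stand-in for Hive here, used only so the sketch is runnable.
conn = sqlite3.connect(":memory:")

# 1. Fix the schema for the cleaned source data (hypothetical table).
conn.execute("CREATE TABLE page_views (day TEXT, page TEXT, user_id TEXT)")

# 2. Load rows that have already been cleaned into that format.
rows = [
    ("2024-01-01", "/home", "u1"),
    ("2024-01-01", "/home", "u2"),
    ("2024-01-01", "/cart", "u1"),
    ("2024-01-02", "/home", "u1"),
]
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)


def daily_stats(connection):
    """Run the statistics query; a scheduler would trigger this step."""
    cursor = connection.execute(
        "SELECT day, COUNT(*) AS views, COUNT(DISTINCT user_id) AS users "
        "FROM page_views GROUP BY day ORDER BY day"
    )
    return [{"day": d, "views": v, "users": u} for d, v, u in cursor]


# 3. Put the result where the web layer can read it (here, a JSON string).
stats = daily_stats(conn)
payload = json.dumps(stats)
```

The point of the structure is that each statistic becomes one declarative query against a fixed schema, instead of a separate script with its own parsing logic.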

Only one answer can be accepted.
But this is really a discussion topic, and I hope more people will join in.
