Technical approaches and implementations for designing a statistics system

Source: Internet
Author: User
From data sources to data presentation, here is our current situation.

Data sources:
1. MySQL
2. Log files

Presentation:
1. Export a CSV file or send it by email
2. Provide API data for the web front end to render charts

Techniques currently in use:
Log files are read mainly with shell + awk. For MySQL-to-MySQL flows, PHP reads the data and stores it in files. PHP or shell scripts then do some computation or statistical processing, and the results are stored in the database or emailed to whoever requested them.

The project has accumulated a large number of scripts and temporary workarounds, scattered all over. As the data grows, MySQL keeps getting slower, and the more complex scripts are increasingly hard to maintain. Some of these scripts are run by hand when needed, and many run on a schedule. If this continues, it will become unmanageable.

I am looking for an end-to-end solution, from data ingestion to data presentation, and would appreciate it if people with experience could share theirs.

Part of the log files is stored in Hadoop. So far we have not written any MapReduce jobs to process that part directly.

Thanks.

Reply content:

0. The solution depends on your goals and your team's capabilities. The complexity of a self-built solution grows with your expectations and your data volume.
1. Take a look at Splunk or Logstash + ES (Elasticsearch) + Kibana. I think you will be pleasantly surprised.
2. If you want to go deeper, look into SIEM.
3. Quick and dirty is one option; flexible is another.
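
The Logstash + ES + Kibana route in point 1 rests on turning each raw log line into a structured event before it is indexed. Here is a minimal Python sketch of that parsing step only, not of Logstash itself. The space-separated log format and the field names are hypothetical, since the question does not say what the logs look like:

```python
import json

# Hypothetical log format: "<ISO timestamp> <level> <action> <user_id>"
FIELDS = ("timestamp", "level", "action", "user_id")

def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))

def to_json_events(lines):
    """Yield one JSON document per well-formed line (newline-delimited JSON)."""
    for line in lines:
        event = parse_line(line)
        if event is not None:
            yield json.dumps(event, sort_keys=True)

if __name__ == "__main__":
    sample = [
        "2015-03-01T10:00:00 INFO login 42",
        "garbage line",  # skipped: wrong number of fields
        "2015-03-01T10:00:05 INFO purchase 42",
    ]
    for doc in to_json_events(sample):
        print(doc)
```

Once every line is a document with named fields, counting and filtering become queries rather than one-off awk scripts.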

Format the source data, define a schema, compute the statistics with Hive, and use Oozie to run the jobs on a schedule. Put the results in an agreed location and present them on the web.
That is basically the standard routine.
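
The routine (fix a schema, aggregate, write the result to an agreed place) can be sketched in a few lines. This is a plain-Python stand-in for what Hive and Oozie would do at scale, not their APIs. The tab-separated schema `(date, action, user_id)` and the report layout are assumptions for illustration:

```python
import csv
import io
from collections import Counter

# Assumed schema for the formatted source data (tab-separated columns).
SCHEMA = ("date", "action", "user_id")

def read_records(stream):
    """Parse tab-separated rows into dicts keyed by SCHEMA, skipping malformed rows."""
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) == len(SCHEMA):
            yield dict(zip(SCHEMA, row))

def count_by_day_and_action(records):
    """Similar in spirit to: SELECT date, action, COUNT(*) ... GROUP BY date, action."""
    return Counter((r["date"], r["action"]) for r in records)

def write_report(counts, out):
    """Write the aggregated result as CSV, sorted so the output is stable."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "action", "count"])
    for (date, action), n in sorted(counts.items()):
        writer.writerow([date, action, n])

if __name__ == "__main__":
    source = io.StringIO(
        "2015-03-01\tlogin\t42\n"
        "2015-03-01\tlogin\t43\n"
        "2015-03-01\tpurchase\t42\n"
    )
    report = io.StringIO()
    write_report(count_by_day_and_action(read_records(source)), report)
    print(report.getvalue(), end="")
```

In a real deployment the scheduler would trigger the job and the report would go to the agreed location (a table or file the web layer reads) instead of standard output.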

Only one answer can be accepted.
This is really a discussion topic, though, and I hope more people will join in.
