Project background
The enterprise is a technology company focused on connected-vehicle and video CDN applications, with business systems such as intelligent car and VSDN. These systems generate a large volume of logs every day, and the previous log management system could not meet the need for real-time analytics. Because platform queries were slow, the logs produced when a system failed could not be viewed in time and the problem could not be located. Business staff had the data but were stuck, unable to use it.
How to integrate this log data and extract its full value was an urgent problem for the enterprise to solve.
Project objectives
Datahunter ultimately delivered a complete solution with functional modules for parsing and warehousing bandwidth, scheduling, and traffic logs. It lets business staff view log data in real time on a dashboard, and build charts and run dimension queries freely on top of that data.
Business requirements
1. Analyze bandwidth logs in real time and produce minute-level aggregate statistics (more than 80 nodes nationwide; about 3.5 billion records per day, roughly 1.6 TB).
2. Support real-time statistical analysis by time granularity, product, customer, and node.
3. Improve the platform's query response time and enrich the visual interface.
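To make the first two requirements concrete: at about 3.5 billion records per day, the average rate is roughly 40,500 records per second, or about 2.4 million per minute. The following is a minimal Go sketch of bucketed aggregation over several dimensions, not the project's actual code. The `Record` fields, the `Key` type, and the `Aggregate` function are hypothetical names chosen for illustration, and the log schema is an assumption.

```go
package main

import "fmt"

// Record is a hypothetical parsed bandwidth log entry. The field names are
// assumptions for illustration, not the enterprise's real log schema.
type Record struct {
	Unix     int64 // event time in Unix seconds
	Node     string
	Customer string
	Product  string
	Bytes    int64
}

// Key identifies one aggregation bucket: a time bucket plus three dimensions.
type Key struct {
	BucketStart             int64 // Unix seconds at the start of the bucket
	Node, Customer, Product string
}

// Aggregate sums Bytes per (time bucket, node, customer, product).
// granularitySec is the bucket width in seconds: 60 for minute-level
// statistics, 3600 for hourly, and so on.
func Aggregate(records []Record, granularitySec int64) map[Key]int64 {
	totals := make(map[Key]int64)
	for _, r := range records {
		// Round the timestamp down to the start of its bucket.
		start := r.Unix - r.Unix%granularitySec
		totals[Key{start, r.Node, r.Customer, r.Product}] += r.Bytes
	}
	return totals
}

func main() {
	records := []Record{
		{Unix: 1514800805, Node: "node-01", Customer: "cust-a", Product: "video", Bytes: 1000},
		{Unix: 1514800830, Node: "node-01", Customer: "cust-a", Product: "video", Bytes: 500},
		{Unix: 1514800865, Node: "node-01", Customer: "cust-a", Product: "video", Bytes: 200},
	}
	perMinute := Aggregate(records, 60)
	first := Key{1514800800, "node-01", "cust-a", "video"}
	second := Key{1514800860, "node-01", "cust-a", "video"}
	fmt.Println("first minute bytes:", perMinute[first])
	fmt.Println("second minute bytes:", perMinute[second])
}
```

Changing `granularitySec` is all it takes to switch time granularity, and dropping a field from `Key` rolls the statistics up over that dimension.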
Problems
1. Difficulty in data acquisition
Previously, the enterprise collected its logs with hand-written scripts. With data siloed across machines, large log volumes, and many kinds of logs, the difficulty is easy to imagine.
2. Complex log types
Different logs serve different business requirements, and a single kind of log analysis can hardly meet them all.
3. Slow log analysis
Analysis was very slow, constrained by the size and format of the logs.
4. Data cannot be displayed
Log data was hard to present in reports or charts, so it did not help business staff get valuable information in time.
Architecture implementation
1. Dhbeat, developed in-house by Datahunter in Golang, provides low-overhead, high-performance data collection, parsing, and reporting, and supports collection rates of up to 1.5 million records per second.
2. NATS is an open-source, lightweight, high-performance distributed messaging system that provides a highly scalable publish/subscribe model.
3. K2DB, developed in-house by Datahunter in Golang, provides low-overhead, high-performance data subscription, parsing, and storage.
4. PipelineDB is a streaming relational database. It processes streaming data automatically and stores only the processed results, not the raw data, which makes it well suited to real-time stream processing.
5. CitusDB is a distributed database that scales out PostgreSQL (PG) for big data processing. It shards data and replicates the shards automatically across the cluster, and it distributes queries across the cluster to use the computing power of every node.
6. The DH visual analysis platform, built on the visualization and configuration tools of the DH core product, shows real-time bandwidth, traffic, and scheduling by region and product.
▲ Bandwidth statistics by customer
▲ Bandwidth statistics by node
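The first three components form a collect, publish, subscribe chain. The article does not describe the internals of Dhbeat or K2DB, so the Go sketch below only illustrates the general pattern under stated assumptions: `Bus` is a hypothetical in-process stand-in for a broker such as NATS (it is not the NATS client API), the subject name `logs.bandwidth` is invented, and the `node customer bytes` line format is an assumed example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Bus is a minimal in-process publish/subscribe hub. It only stands in for a
// real message broker so that the sketch runs without external dependencies.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]chan string
}

func NewBus() *Bus { return &Bus{subs: make(map[string][]chan string)} }

// Subscribe returns a buffered channel that receives every message
// published on subject after the call.
func (b *Bus) Subscribe(subject string, buf int) <-chan string {
	ch := make(chan string, buf)
	b.mu.Lock()
	b.subs[subject] = append(b.subs[subject], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers msg to all current subscribers of subject.
func (b *Bus) Publish(subject, msg string) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[subject] {
		ch <- msg
	}
}

// Close closes every subscriber channel so that consumers can finish.
func (b *Bus) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, chans := range b.subs {
		for _, ch := range chans {
			close(ch)
		}
	}
	b.subs = make(map[string][]chan string)
}

// ParseLine parses a hypothetical "node customer bytes" log line.
func ParseLine(line string) (string, string, int64, error) {
	f := strings.Fields(line)
	if len(f) != 3 {
		return "", "", 0, fmt.Errorf("want 3 fields, got %d", len(f))
	}
	n, err := strconv.ParseInt(f[2], 10, 64)
	if err != nil {
		return "", "", 0, err
	}
	return f[0], f[1], n, nil
}

// RunPipeline plays both roles: the collector publishes raw lines, and a
// subscriber goroutine parses them and sums bytes per node.
func RunPipeline(lines []string) map[string]int64 {
	bus := NewBus()
	in := bus.Subscribe("logs.bandwidth", 16)

	totals := make(map[string]int64)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for line := range in {
			node, _, n, err := ParseLine(line)
			if err != nil {
				continue // skip malformed lines
			}
			totals[node] += n
		}
	}()

	for _, l := range lines {
		bus.Publish("logs.bandwidth", l)
	}
	bus.Close()
	wg.Wait() // totals is safe to read once the subscriber has exited
	return totals
}

func main() {
	totals := RunPipeline([]string{
		"node-01 cust-a 100",
		"node-02 cust-b 50",
		"node-01 cust-b 25",
	})
	fmt.Println("node-01:", totals["node-01"])
	fmt.Println("node-02:", totals["node-02"])
}
```

Decoupling the collector from the consumer through a subject is what lets each side scale independently. In the architecture described above, the continuous aggregation and distributed storage steps would then be handled by the streaming database and the distributed database rather than by an in-memory map.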
Platform core value
1. Multi-source data fusion
Business data, log information, and public data are brought together easily and analyzed in aggregate, making it easier to keep the business under control.
▲ Scheduling logs combined with business data
2. Real-time data display
Working together, these systems process data and display key indicators in real time, staying in sync with the front line at every moment, so business staff can monitor log information easily.
▲ Real-time bandwidth statistics
3. Interactive analysis
Business staff can configure charts from real-time data, then use those charts for linked filtering, drill-down on arbitrary dimensions, and exploratory analysis to find the root cause of a problem quickly.
▲ Aggregated summary by any dimension
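Drill-down and aggregation on arbitrary dimensions can be thought of as a filter step followed by a group-by over whichever dimensions the user picks. The Go sketch below shows that idea only. `Row`, `Filter`, `GroupBy`, and the sample data are hypothetical and say nothing about how the DH platform implements it.

```go
package main

import (
	"fmt"
	"strings"
)

// Row is a hypothetical aggregated record: dimension values plus one measure.
type Row struct {
	Dims  map[string]string
	Bytes int64
}

// Filter keeps the rows whose dimension dim equals value (one drill-down step).
func Filter(rows []Row, dim, value string) []Row {
	var out []Row
	for _, r := range rows {
		if r.Dims[dim] == value {
			out = append(out, r)
		}
	}
	return out
}

// GroupBy sums Bytes over any combination of dimensions. The group key is
// the selected dimension values joined with "|".
func GroupBy(rows []Row, dims ...string) map[string]int64 {
	totals := make(map[string]int64)
	for _, r := range rows {
		parts := make([]string, len(dims))
		for i, d := range dims {
			parts[i] = r.Dims[d]
		}
		totals[strings.Join(parts, "|")] += r.Bytes
	}
	return totals
}

// sampleRows returns invented data for the example.
func sampleRows() []Row {
	return []Row{
		{Dims: map[string]string{"node": "bj", "customer": "a", "product": "video"}, Bytes: 100},
		{Dims: map[string]string{"node": "bj", "customer": "b", "product": "video"}, Bytes: 50},
		{Dims: map[string]string{"node": "sh", "customer": "a", "product": "live"}, Bytes: 30},
		{Dims: map[string]string{"node": "sh", "customer": "a", "product": "video"}, Bytes: 20},
	}
}

func main() {
	rows := sampleRows()

	// Top-level view: total bytes per customer.
	byCustomer := GroupBy(rows, "customer")
	fmt.Println("customer a:", byCustomer["a"])

	// Drill into one node, then break the result down by product.
	byProduct := GroupBy(Filter(rows, "node", "sh"), "product")
	fmt.Println("node sh, live:", byProduct["live"])
}
```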