After Hive parses the SQL and optimizes the logical execution plan, it produces an optimized Operator Tree, which can be inspected with the EXPLAIN command. The SQL image data is built by collecting the various operator types that appear in this result.
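As a rough illustration of how such a SQL image might be assembled, the sketch below counts operator occurrences in Hive's plain-text EXPLAIN output. The operator names are standard Hive ones, but the resulting feature dictionary is only an illustrative schema, not Eleme's actual one.

```python
import re
from collections import Counter

# Operator names as they appear in Hive's textual EXPLAIN output
# (e.g. "TableScan", "Join Operator", "Group By Operator").
OPERATOR_PATTERN = re.compile(
    r"\b(TableScan|Filter Operator|Select Operator|Group By Operator|"
    r"Join Operator|Map Join Operator|Reduce Output Operator|"
    r"File Output Operator|Limit)\b"
)

def build_sql_image(explain_text: str) -> dict:
    """Count operator occurrences in an EXPLAIN plan to form a simple SQL image."""
    counts = Counter(OPERATOR_PATTERN.findall(explain_text))
    return {
        "num_table_scans": counts["TableScan"],
        "num_joins": counts["Join Operator"] + counts["Map Join Operator"],
        "num_group_bys": counts["Group By Operator"],
        "num_filters": counts["Filter Operator"],
        "total_operators": sum(counts.values()),
    }

# Example usage with a fragment of EXPLAIN output:
plan = """
  TableScan
    alias: orders
    Filter Operator
    Select Operator
      Group By Operator
"""
print(build_sql_image(plan))
```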
Intuitively, the amount of data a query reads is an important factor in choosing an engine. For example, Presto performs best when reading a small amount of data, Hive is the most stable when reading a large amount of data, and medium-sized reads can be handled by Spark.
In the initial stage, routing can be rule-based: queries are dispatched to the various compute engines according to the amount of data read, the complexity of the joins, the complexity of the aggregations, and so on. During execution, the SQL image data of each query, the engine it ran on, and any downgrade path are recorded. Based on this image data, classification algorithms such as decision trees, logistic regression, and SVM can later be applied to achieve intelligent engine routing. The big data team has already begun this work.
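A minimal sketch of the intelligent-routing idea, assuming scikit-learn is available and that historical queries have been recorded as (SQL image features, chosen engine) pairs. The features, sample values, and labels below are purely illustrative.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row: [input_bytes, num_joins, num_group_bys] taken from the SQL image;
# the label is the engine that handled the query successfully without downgrading.
X = [
    [5e8,  1, 0],   # small scan, simple query  -> Presto
    [2e11, 3, 2],   # very large scan, complex  -> Hive
    [3e10, 2, 1],   # medium scan               -> Spark
    # ... in practice, thousands of recorded queries
]
y = ["presto", "hive", "spark"]

router = DecisionTreeClassifier(max_depth=5)
router.fit(X, y)

# Route a new query based on its SQL image features.
new_query_features = [[1e9, 1, 1]]
print(router.predict(new_query_features))  # e.g. ['presto']
```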
In Eleme's current deployment, thanks to the added pre-check step and the failure downgrade path, Ad Hoc queries dispatched by Dispatcher achieve an overall success rate of 99.95% or higher, with an overall PT90 (90th percentile completion time) of about 300 seconds. Presto currently carries 50% of the Ad Hoc query traffic, and SparkServer mode carries 40%.
Eleme's big data cluster runs more than 250K Spark and MR tasks every day, and each one records the running status of its Mappers/Reducers or Spark Tasks. Fully exploiting this data brings great value: it enables data-driven cluster management using the cluster's own data. The data not only helps cluster administrators monitor computing resources, storage consumption, task performance, and host health, it also lets users analyze on their own why a task failed or how it performed.
The Grace project developed by the Eleme big data team is an example of this.
Without a clear view of the detailed health data of cluster tasks, it is easy to get stuck when something goes wrong: monitoring reports a cluster anomaly, but you cannot quickly drill down to locate the cause.
When users keep asking "Why did my task fail?", "Why is my task running so slowly?", "Can the priority of my task be raised?", and "Don't tell me to read the logs, I don't understand them", everyone ends up at a breaking point.
When monitoring reports NameNode jitter, high network load, a spike in block creation, or increased block creation latency, how do you quickly locate the abnormal task running in the cluster?
When monitoring shows too many Pending tasks in the cluster and users report that their tasks are delayed, how do you quickly find the root cause?
When a user requests computing resources, how much should be allocated? When a user asks to raise a task's priority, how do you let the data speak, and by how much should the priority be adjusted? When users only bring tasks online and never retire them, how do we identify which tasks are no longer needed?
In addition, how do you calculate the resource consumption of each BU (business unit), and the resource consumption of each user within a BU as a proportion of that BU's resources? And how do you analyze, from historical data, the number of tasks per BU, the proportion of resources used, and the resource consumption of each user and each task inside the BU?
1) Queue monitoring
2) Task monitoring
For the running tasks in a specified queue, you can view the task type, start time, running time, the queue's current resource consumption ratio, and the BU's current resource ratio. This makes it easy to locate tasks with high resource consumption and long running times, and to quickly find the cause of queue blocking.
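A minimal sketch of how the running tasks in a queue could be pulled for this kind of view, assuming the cluster runs on YARN and the standard ResourceManager REST API (`/ws/v1/cluster/apps`) is reachable. The ResourceManager address and queue name are placeholders.

```python
import requests

RM_URL = "http://resourcemanager:8088"  # hypothetical ResourceManager address

def running_apps_in_queue(queue: str):
    """List running applications in a queue, sorted by memory-seconds consumed."""
    resp = requests.get(
        f"{RM_URL}/ws/v1/cluster/apps",
        params={"queue": queue, "states": "RUNNING"},
        timeout=10,
    )
    resp.raise_for_status()
    apps = (resp.json().get("apps") or {}).get("app", [])
    return sorted(apps, key=lambda a: a.get("memorySeconds", 0), reverse=True)

# Show the ten heaviest running tasks in an example queue.
for app in running_apps_in_queue("ad_hoc")[:10]:
    print(app["id"], app["applicationType"], app["elapsedTime"], app["memorySeconds"])
```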
3) Host failure rate monitoring
You can monitor the task failure rate on every host in the cluster. The existing monitoring system watches the hardware status of each host's CPU, disk, memory, and network, but the most direct symptom of hardware trouble is that tasks assigned to the problematic host run slowly or fail; running tasks are the most sensitive indicator. Once a host's failure rate is detected to be too high, the host can be quickly and automatically taken offline to keep operations normal, and the hardware monitoring can then be used to pinpoint the host's problem.
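A minimal sketch of the failure-rate check, assuming per-host task attempt records with a host and a status field. The threshold and minimum sample size are illustrative, not Eleme's actual settings.

```python
from collections import defaultdict

FAILURE_RATE_THRESHOLD = 0.3   # illustrative threshold for flagging a host
MIN_ATTEMPTS = 20              # ignore hosts with too few attempts to judge

def hosts_to_offline(attempts):
    """attempts: iterable of dicts like {"host": "dn-101", "status": "FAILED"}."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for a in attempts:
        total[a["host"]] += 1
        if a["status"] == "FAILED":
            failed[a["host"]] += 1
    # Return hosts whose failure rate is high enough to trigger auto-offlining.
    return [
        host for host, n in total.items()
        if n >= MIN_ATTEMPTS and failed[host] / n >= FAILURE_RATE_THRESHOLD
    ]
```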
4) Task performance analysis
Users can analyze the performance of their own tasks, and tune them themselves according to the suggestions given for each flagged anomaly.
5) Task failure cause analysis
For failed tasks, users can also quickly see the cause of the failure and the corresponding solution directly from the scheduling system. The Eleme data team continuously collects typical error messages and updates the self-service analysis knowledge base.
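A minimal sketch of how such a knowledge base could be matched against a failed task's diagnostics. The error patterns and suggestions below are illustrative entries, not Eleme's actual knowledge base.

```python
import re

# Illustrative knowledge base: regex over the task's diagnostic log -> suggested fix.
KNOWLEDGE_BASE = [
    (re.compile(r"OutOfMemoryError|Container killed .* memory", re.I),
     "Increase executor/container memory or reduce per-task data volume."),
    (re.compile(r"FileNotFoundException|Path does not exist", re.I),
     "Check that the upstream task produced its output and the path is correct."),
    (re.compile(r"Permission denied", re.I),
     "Request access to the table or path from its owner."),
]

def diagnose(error_log: str) -> str:
    """Return the first matching suggestion for a failed task's error log."""
    for pattern, suggestion in KNOWLEDGE_BASE:
        if pattern.search(error_log):
            return suggestion
    return "No known cause matched; escalate to the data team."

print(diagnose("java.lang.OutOfMemoryError: Java heap space"))
```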
In addition, we can monitor each task's computing resource consumption in real time: GB-hours, total read and write volume, shuffle volume, and so on, as well as a running task's HDFS read/write volume and HDFS operation counts.
When cluster computing resources are tight, you can quickly locate the tasks that consume the most. When monitoring reports HDFS jitter, read/write timeouts, or other anomalies, the same data can be used to quickly locate the abnormal tasks.
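A minimal sketch of ranking tasks by computing resource consumption, assuming per-task records with allocated memory and runtime, and defining GB-hours as allocated memory in GB times running hours. The record fields are illustrative.

```python
def gb_hours(task):
    """GB-hours = allocated memory in GB multiplied by running time in hours."""
    return task["allocated_mem_gb"] * task["runtime_seconds"] / 3600.0

def top_consumers(tasks, n=10):
    """Return the n tasks with the highest computing resource consumption."""
    return sorted(tasks, key=gb_hours, reverse=True)[:n]

# Example records; real data would also carry HDFS read/write and shuffle volumes.
tasks = [
    {"id": "job_001", "allocated_mem_gb": 64,  "runtime_seconds": 7200,
     "hdfs_read_gb": 512, "shuffle_gb": 120},
    {"id": "job_002", "allocated_mem_gb": 256, "runtime_seconds": 1800,
     "hdfs_read_gb": 40,  "shuffle_gb": 5},
]
for t in top_consumers(tasks):
    print(t["id"], round(gb_hours(t), 1), "GB-hours")
```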
Based on this data, the time-of-day distribution of resource consumption can be broken down by each queue's task volume, and the resource allocation ratio of each queue can be optimized accordingly.
From the task health data, a task portrait is built, resource consumption trends are monitored, and abnormal tasks are located. Combined with how frequently a task's output data is accessed, this can also be fed back to the scheduling system to dynamically adjust task priorities.
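A toy sketch of how output access heat and resource trend might feed back into a priority adjustment. The scoring rules and thresholds are invented for illustration and are not Eleme's actual policy.

```python
def adjusted_priority(base_priority, output_access_count_7d, resource_trend):
    """
    Toy priority adjustment:
    - output_access_count_7d: how often the task's output was read in the last week
    - resource_trend: ratio of recent GB-hours to the task's historical average
    Hot, stable tasks move up; cold or increasingly expensive tasks move down.
    """
    score = base_priority
    if output_access_count_7d == 0:
        score -= 2          # nobody reads the output: candidate for deprioritizing or retiring
    elif output_access_count_7d > 100:
        score += 1          # heavily consumed output: raise priority
    if resource_trend > 1.5:
        score -= 1          # resource consumption growing abnormally
    return max(score, 0)

print(adjusted_priority(base_priority=5, output_access_count_7d=0, resource_trend=1.6))
```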