Eleme big data computing engine: Grace is implemented by Spark Streaming

Source: Internet
Author: User
Tags: big data, big data cluster, data architecture, eleme, spark

Grace architecture

The data used in the above examples was collected by Grace. Grace is an application developed by the Eleme big data team. It monitors and analyzes the run data of online MR/Spark tasks, monitors running queues and task details, and summarizes the results.

Grace is implemented with Spark Streaming. It reads from Kafka the jhist file of each completed MR task or the event log path of each Spark task, fetches the task's run history from the corresponding location on HDFS, and parses it into detailed MR/Spark task data. These details are then aggregated into summary information at the task level, the Job level, and the Stage level. Finally, a customized Dr-Elephant system applies heuristic algorithms to the task details and gives the user some intuitive optimization tips.
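The parse-and-aggregate step described above can be illustrated with a minimal Python sketch. It is not Grace's code: the sample log is invented, the function name `summarize_stages` is hypothetical, and the field names ("Stage ID", "Task Metrics", "Executor Run Time", and so on) follow Spark's JSON event log format as we understand it, so they should be checked against the Spark version in use.

```python
import json
from collections import defaultdict

# Invented sample: a Spark event log holds one JSON object per line.
SAMPLE_EVENT_LOG = """\
{"Event": "SparkListenerJobStart", "Job ID": 0, "Stage IDs": [0, 1]}
{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Metrics": {"Executor Run Time": 1200, "Disk Bytes Spilled": 0, "Shuffle Write Metrics": {"Shuffle Bytes Written": 2048}}}
{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Metrics": {"Executor Run Time": 800, "Disk Bytes Spilled": 512, "Shuffle Write Metrics": {"Shuffle Bytes Written": 1024}}}
{"Event": "SparkListenerTaskEnd", "Stage ID": 1, "Task Metrics": {"Executor Run Time": 500, "Disk Bytes Spilled": 0}}
"""


def summarize_stages(lines):
    """Aggregate task-end events into one summary record per stage."""
    stages = defaultdict(lambda: {
        "tasks": 0,
        "run_time_ms": 0,
        "disk_spill_bytes": 0,
        "shuffle_write_bytes": 0,
    })
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-end events carry per-task metrics.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        shuffle_write = metrics.get("Shuffle Write Metrics") or {}
        stage = stages[event["Stage ID"]]
        stage["tasks"] += 1
        stage["run_time_ms"] += metrics.get("Executor Run Time", 0)
        stage["disk_spill_bytes"] += metrics.get("Disk Bytes Spilled", 0)
        stage["shuffle_write_bytes"] += shuffle_write.get("Shuffle Bytes Written", 0)
    return dict(stages)


summary = summarize_stages(SAMPLE_EVENT_LOG.splitlines())
for stage_id, stats in sorted(summary.items()):
    print(stage_id, stats)
```

In a streaming setup, the same function would run on each log fetched from HDFS after its path arrives from Kafka. Job-level and task-level summaries can be built the same way by grouping on a different key.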

We also customized Dr-Elephant. We packaged it as a component of the Grace system, changed it from a single-machine service into a distributed, real-time parsing mode, and switched its data source to the task detail data that Grace parses. We also added an ActionId to each task to trace its link information, optimized the Spark task parsing logic, and added new heuristic algorithms and new monitoring metrics.
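To give a feel for what a heuristic over task detail data looks like, here is a minimal Python sketch of a task-skew check. It is an illustration only, not Dr-Elephant's or Grace's actual heuristic: the function name `skew_heuristic`, the threshold of 4.0, and the severity labels are all assumptions made for this example.

```python
from statistics import median


def skew_heuristic(task_runtimes_ms, ratio_threshold=4.0, min_tasks=2):
    """Flag a stage whose slowest task runs far longer than the median task.

    The threshold and severity labels are assumptions for illustration.
    """
    if len(task_runtimes_ms) < min_tasks:
        return {"severity": "NONE", "ratio": None, "tip": ""}
    med = median(task_runtimes_ms)
    if med <= 0:
        return {"severity": "NONE", "ratio": None, "tip": ""}
    ratio = max(task_runtimes_ms) / med
    if ratio >= ratio_threshold:
        tip = ("The slowest task takes %.1fx the median runtime; "
               "check for skewed keys or uneven partitions." % ratio)
        return {"severity": "SEVERE", "ratio": ratio, "tip": tip}
    return {"severity": "NONE", "ratio": ratio, "tip": ""}


# Runtimes (ms) of the tasks in one stage; the last task is an outlier.
print(skew_heuristic([100, 110, 90, 1000]))
```

A real heuristic would typically grade severity in several steps and look at more than one metric, but the shape is the same: task details in, a severity and a human-readable tip out.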


Conclusion

As the big data ecosystem matures, more and more users from different backgrounds will join it. Lowering the entry barrier so that these users can use big data resources quickly and conveniently is a problem that needs attention.

Most tasks running on a big data cluster are business-related. But as clusters and tasks keep growing, the data generated by the cluster itself can no longer be ignored. This data reflects exactly how the cluster is being used, so we need to think about how to collect and use it to measure and observe our clusters and tasks from a data perspective.

Focusing only on the cluster's overall deployment, performance, and stability is not enough. How to improve the user experience, make full use of the cluster's own data, and let data drive the development of big data clusters is the theme of this talk.


Q & A

Q: Can you briefly introduce the scheduling system? Managing tens of thousands of tasks is not easy.

A: The scheduling system is a complicated topic, so I will only mention a few key points: dependencies between tasks, lineage, the distinction between tasks and instances, cluster back pressure, distributed scheduling, and the underlying environment.

Lineage is a must, because in a large cluster users cannot be expected to declare every dependency when they configure a task.

The lineage system parses each task. When a user configures a new task, its upstream dependencies are recommended automatically, which ensures that tasks run in the right order.
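The idea of recommending upstream dependencies from lineage can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the scheduler's real implementation: it assumes tasks are SQL statements, extracts table names with regular expressions rather than a full SQL parser, and all task and table names are invented.

```python
import re

# Simplified patterns; a production lineage system would use a real SQL parser.
TABLE_READ = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)
TABLE_WRITE = re.compile(
    r"\binsert\s+(?:overwrite|into)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE)


def tables_read(sql):
    """Return the lower-cased names of tables the statement reads."""
    return {name.lower() for name in TABLE_READ.findall(sql)}


def table_written(sql):
    """Return the lower-cased name of the table the statement writes, if any."""
    match = TABLE_WRITE.search(sql)
    return match.group(1).lower() if match else None


def recommend_dependencies(new_sql, existing_tasks):
    """Suggest existing tasks that produce the tables the new task reads."""
    producers = {}
    for task_name, sql in existing_tasks.items():
        output = table_written(sql)
        if output:
            producers[output] = task_name
    return sorted({producers[t] for t in tables_read(new_sql) if t in producers})


# Invented example tasks.
existing_tasks = {
    "load_orders": "INSERT OVERWRITE TABLE dw.orders SELECT * FROM ods.orders_raw",
    "load_users": "INSERT INTO dw.users SELECT * FROM ods.users_raw",
}
new_sql = ("INSERT OVERWRITE TABLE dm.order_stats "
           "SELECT u.city, count(*) FROM dw.orders o "
           "JOIN dw.users u ON o.uid = u.id GROUP BY u.city")
print(recommend_dependencies(new_sql, existing_tasks))
```

The recommendation is only a suggestion shown at configuration time; the user still confirms it, which keeps the dependency graph accurate without relying on users to remember every upstream task.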


Q: How do you get the cluster's daily read and write volume? Does Hadoop provide an interface for it?

A: The cluster's read and write volume is collected by Grace, introduced earlier. We analyze the HDFS read and write volume of every MR or Spark task, along with its spill-to-disk data, shuffle write and shuffle read data, and GBHour figures.

In fact, you can see this data on the YARN or Spark web UI. All you need to do is parse and collect it in real time. That is the point made in the introduction to this talk: operate and maintain the cluster from a data perspective.

In addition to business data, the data generated by the cluster itself is also valuable.
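As a rough illustration of collecting such data, the Python sketch below computes GB-hours per queue from an application list shaped like the response of the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps`. Several things here are assumptions: the talk does not define GBHour, so the sketch takes it to mean memory in GB multiplied by hours; the `memorySeconds` field is read as MB-seconds; the sample response and application ids are invented; and this is not how Grace itself gathers the figure.

```python
import json
from collections import defaultdict

MB_PER_GB = 1024
SECONDS_PER_HOUR = 3600


def gb_hours(memory_seconds_mb):
    """Convert MB-seconds of allocated memory into GB-hours (assumed definition)."""
    return memory_seconds_mb / MB_PER_GB / SECONDS_PER_HOUR


def gb_hours_by_queue(apps_response):
    """Sum GB-hours per queue from a parsed application-list response."""
    apps = (apps_response.get("apps") or {}).get("app") or []
    totals = defaultdict(float)
    for app in apps:
        totals[app["queue"]] += gb_hours(app.get("memorySeconds", 0))
    return dict(totals)


# Invented sample in the shape of the ResourceManager's application list.
SAMPLE_RESPONSE = json.loads("""
{"apps": {"app": [
  {"id": "application_0001", "queue": "etl",   "memorySeconds": 3686400},
  {"id": "application_0002", "queue": "etl",   "memorySeconds": 7372800},
  {"id": "application_0003", "queue": "adhoc", "memorySeconds": 1843200}
]}}
""")

usage = gb_hours_by_queue(SAMPLE_RESPONSE)
for queue, hours in sorted(usage.items()):
    print(queue, hours)
```

Polling an endpoint like this gives application-level totals only. Per-task figures such as HDFS read and write volume or shuffle data still come from parsing the job history and event logs.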


Q: So you use the data generated by the big data platform itself to fine-tune the operation and maintenance of the cluster?

A: Yes. If you also work in data architecture, think back on your daily work. We are simply automating the analysis that used to be done by hand, and adding a degree of real-time processing.
