Qunar's big data stream processing system: how Alluxio (formerly Tachyon) delivered up to a 300x performance improvement

Source: Internet
Author: User

Overview

As competition among Internet companies' homogeneous application services intensifies, business units increasingly rely on real-time feedback data to support decisions and improve service quality. As a memory-centric virtual distributed storage system, Alluxio (formerly Tachyon) plays an important role in improving the performance of big data systems and in integrating ecosystem components. This article introduces an Alluxio-based real-time log stream processing system at Qunar. The system targets the problem of remote data storage and access; with it, the overall performance of the stream processing pipeline in production improved by nearly 10x on average, and by about 300x at peak.

At present, Qunar's stream processing pipeline handles about 6 billion business log entries per day, totaling roughly 4.5TB of data. Many of these jobs must run stably with low latency, iterating quickly on results and feeding them back to online business systems. For example, logs generated by a mobile app user's clicks, searches, and other behavior are captured in real time and written into the pipeline, which computes the corresponding recommendations and feeds them back to the business system for display in the app. Ensuring both data reliability and low latency was the biggest challenge in developing and operating the whole system.

The Alluxio big data storage system originated at UC Berkeley's AMPLab and is currently developed in the open source community under the leadership of Alluxio, Inc. It is the world's first memory-centric virtual distributed storage system: it bridges diverse upper-layer computing frameworks and underlying storage systems, unifying the way data is accessed. Alluxio's memory-centric design makes data access for upper-layer applications several orders of magnitude faster than conventional solutions. In addition, Alluxio provides features such as tiered storage, a unified namespace, lineage, flexible file APIs, a Web UI, and command-line tools, which make it convenient in many practical scenarios. Later in this article we elaborate on these with a concrete case.

In our case, the entire stream computing system is deployed on a physical cluster: Mesos manages and allocates resources; Spark Streaming and Flink are the main stream computing engines; the HDFS storage system sits in a remote data center and backs up the entire company's log data; and Alluxio, the core storage layer, is co-deployed with the computing system. The business pipeline writes about 4.5TB of data to the storage layer each day, while consuming about 6 billion log entries from Kafka and joining them against data in the storage layer for analysis. The value Alluxio brings to the stream processing system includes:

    1. Alluxio's tiered storage combines memory, SSD, and disk resources. Caching policies such as LRU and LFU keep hot data in memory while cold data is persisted to the second or even third storage tier, with HDFS serving as the long-term file backup.
    2. Because Alluxio supports multiple computing frameworks, data is shared between frameworks such as Spark and Zeppelin through Alluxio at memory-level transfer rates; we plan to migrate Flink and Presto jobs onto Alluxio as well.
    3. Alluxio's unified namespace makes it easy to manage the remote HDFS understore and expose a single namespace to upper layers, so computing frameworks and applications can access data from different sources uniformly through Alluxio.
    4. Alluxio offers a variety of easy-to-use APIs, which lowers the learning cost, eases migration of the whole system onto Alluxio, and simplifies tuning and verification.
    5. Alluxio solved the "Spark task cannot complete" problem in the original system: when a Spark executor failed and exited, Mesos could reschedule it on any node in the cluster; even with a retained context, this executor "drift" prevented the task from completing. In the new system, Alluxio decouples computation from data storage, so intermediate data is not lost when an executor drifts, which resolves the problem.
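The hot/cold tiering in point 1 can be illustrated with a minimal LRU sketch in Python. This is a toy model, not Alluxio's actual eviction code (which runs inside Alluxio workers); the class and tier names here are ours:

```python
from collections import OrderedDict

class TieredCache:
    """Toy model of hot/cold tiering: an LRU memory tier that
    evicts least-recently-used blocks to a slower 'disk' tier."""

    def __init__(self, mem_capacity):
        self.mem_capacity = mem_capacity
        self.mem = OrderedDict()   # hot tier, maintained in LRU order
        self.disk = {}             # cold tier

    def write(self, block_id, value):
        self.mem[block_id] = value
        self.mem.move_to_end(block_id)           # mark as most recently used
        if len(self.mem) > self.mem_capacity:    # evict the LRU block downward
            old_id, old_val = self.mem.popitem(last=False)
            self.disk[old_id] = old_val

    def read(self, block_id):
        if block_id in self.mem:                 # hot hit: refresh recency
            self.mem.move_to_end(block_id)
            return self.mem[block_id]
        if block_id in self.disk:                # cold hit: promote to memory
            return_value = self.disk.pop(block_id)
            self.write(block_id, return_value)
            return return_value
        raise KeyError(block_id)
```

With `mem_capacity=2`, writing blocks `a`, `b`, `c` leaves `a` in the cold tier and `b`, `c` hot; reading `a` promotes it back and pushes the now-least-recent `b` down, which is the access pattern that keeps frequently read stream data in memory.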

In the remainder of this article, we compare and analyze Qunar's original stream processing system, introduce the Alluxio-improved system, and finally briefly describe our next steps and our expectations for Alluxio's future development.

Original system architecture and problem analysis

Our production stream processing system uses Mesos as the infrastructure layer. In the original system, all other components ran on Mesos, including Spark, Flink, Logstash, and Kibana. The main component for stream computing is Spark Streaming: at runtime, Spark Streaming requests resources from Mesos as a Mesos framework, and its tasks are dispatched through Mesos.

As shown in the architecture diagram, the log data to be processed comes from multiple sources and is aggregated in Kafka; after cleaning by the Logstash cluster, the data stream is written back to Kafka for staging, then consumed by the Spark Streaming and Flink computing frameworks, with results written to HDFS. This original processing flow had the following performance bottlenecks:

    1. The HDFS cluster holding the input and output data was a remote storage cluster, physically located in a different data center. Network latency between the local compute cluster and the remote storage cluster was high, and frequent remote data exchange became the bottleneck of the whole stream processing flow.
    2. HDFS is designed around disks, and its I/O performance, especially write performance, struggles to meet the latency stream computing requires. Each Spark executor read data from HDFS during computation, and repeated cross-datacenter reads further slowed overall stream computing.
    3. Because Spark Streaming was deployed on Mesos, when an executor failed, Mesos might restart it on another node, but the checkpoint information of the failed node could not be reused and the computation could not complete. Even when the executor restarted on the same node and the task could complete, the completion speed could not meet stream computing requirements.
    4. In Spark Streaming, managing data blocks with MEMORY_ONLY keeps a large amount of, often duplicated, data in the Spark executor JVM, which not only increases GC overhead but can also cause out-of-memory errors; with MEMORY_AND_DISK or DISK_ONLY, overall stream processing speed is limited by slow disk I/O.

Improved system architecture and solution

After introducing Alluxio, we solved the above problems well. In the new architecture, the logic of the streaming computation is essentially unchanged. The only change is that Alluxio replaces HDFS as the core storage system, and the original HDFS becomes Alluxio's underlying storage for backup. Alluxio also runs on Mesos; every computing framework and application exchanges data through Alluxio, which provides high-speed data access while maintaining data reliability, and only the final output is backed up to the remote HDFS storage cluster.

In the new architecture, the initial input data is still staged in Kafka and consumed by Spark Streaming; the difference is that the large volume of intermediate results generated during Spark Streaming computation, as well as the final output, are stored in Alluxio. This avoids interacting with the slow remote HDFS cluster, while data stored in Alluxio can easily be shared with upper-layer components such as Flink and Zeppelin. Throughout this process, several key Alluxio features contributed to the pipeline's performance improvement:

  1. Tiered storage support: we deploy an Alluxio worker on each compute node to manage the local storage media, including memory, SSD, and disk, forming a tiered storage layer. Each node's computation data is kept local as much as possible, avoiding network consumption. Alluxio's efficient replacement policies, such as LRU and LFU, keep hot data in the fast memory tier and improve data access rates, while even cold data stays on local disk rather than being written straight out to the remote HDFS storage cluster.
  2. Data sharing across computing frameworks: besides Spark Streaming itself, other components in the new architecture, such as Zeppelin, also need the data stored in Alluxio. In addition, Spark Streaming and Spark batch jobs can be connected through Alluxio, reading and writing data at memory-level transfer speed. We are also migrating Flink business logic onto Alluxio to achieve efficient data sharing across computing frameworks.
  3. Unified namespace: the HDD tier of Alluxio's tiered storage manages persistent storage local to the compute cluster, while Alluxio's mount feature manages the remote HDFS storage cluster. Alluxio thus naturally manages both HDFS and its own storage; these resources are transparent to upper applications and computing frameworks, which see only a unified namespace, avoiding complicated input/output logic.
  4. Simple, easy-to-use APIs: Alluxio offers several. Its native API is a set of java.io-like file input/output interfaces, so developing against it involves no steep learning curve. Alluxio also provides an HDFS-compatible interface: applications that originally used HDFS as their target store can be migrated directly by simply replacing hdfs:// with alluxio://, so the migration cost is close to zero. In addition, Alluxio's command-line tools and Web UI facilitate validation and debugging during development, shortening the development cycle of the whole system. For example, we use Chronos (a Mesos framework for scheduled jobs) to run the Alluxio loadufs command in the early hours of each day to preload data produced by MapReduce jobs into Alluxio, so that subsequent jobs can read those files directly.
  5. Tight integration with Spark: we store Spark Streaming's primary data in Alluxio instead of in the Spark executor JVM. Because the storage location is still local memory, this does not slow data processing, and it actually reduces Java GC cost. It also avoids memory overflow caused by redundant data blocks on the same node. We likewise store the checkpoints of Spark Streaming RDDs in Alluxio.
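The near-zero migration cost in point 4 boils down to a URI scheme change. A hypothetical helper (the function name, master hostname, and root-mount layout below are our illustrative assumptions, not part of Alluxio) shows the kind of one-line rewrite involved; 19998 is Alluxio's default master port:

```python
def to_alluxio_uri(uri: str) -> str:
    """Rewrite an hdfs:// URI into the equivalent alluxio:// URI.

    Assumes the Alluxio master listens on the default port 19998 and
    that the HDFS namespace is mounted at the Alluxio root; both are
    deployment-specific assumptions for this sketch.
    """
    prefix = "hdfs://"
    if not uri.startswith(prefix):
        return uri  # already alluxio:// (or another scheme): leave untouched
    # Drop the HDFS authority (namenode host:port) and re-root under Alluxio.
    rest = uri[len(prefix):]
    _, _, path = rest.partition("/")
    return "alluxio://alluxio-master:19998/" + path
```

In practice the same effect is usually achieved by changing a single path string in the job configuration rather than rewriting URIs in code.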

By taking advantage of these Alluxio features and moving data from the remote HDFS storage cluster into local Alluxio storage, data exchange in the stream processing pipeline now happens in the local cluster's memory, which greatly improves overall throughput and reduces response latency, meeting the needs of stream processing. According to our online real-time monitoring, average throughput per micro-batch (10-minute interval) rose from an unstable 20-300 events per second to a steady roughly 7,800 events per second, and average processing time dropped from about 8 minutes to 30-40 seconds; the entire stream process accelerated by 16-300x. The acceleration is especially pronounced when the network is busy and congested.

The Kafka consumption metrics tell a similar story: the consumption rate rose from about 200K messages previously to a stable level near 1,200K.

In addition, we use Alluxio's metrics component to send monitoring data to Graphite, making it easy to monitor the Alluxio JVM and Alluxio's overall status. The monitoring shows that the Alluxio master keeps its heap memory usage low.
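The Graphite plaintext protocol that such a metrics sink speaks over TCP is simple enough to sketch. The metric name below is invented for illustration and does not reflect Alluxio's actual metric naming:

```python
import socket
import time

def graphite_line(metric: str, value: float, timestamp: int = None) -> bytes:
    """Encode one sample in Graphite's plaintext protocol:
    '<metric.path> <value> <unix-timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {timestamp}\n".encode("ascii")

def send_to_graphite(host: str, port: int, metric: str, value: float) -> None:
    """Push a single metric sample to a Graphite carbon listener
    (typically port 2003) over a short-lived TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(metric, value))
```

A real metrics sink batches samples and reuses the connection; this sketch only shows the wire format.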

File counts and operation statistics over the same period are shown in the monitoring charts.

Future prospects

The optimizations described in this article mainly use Alluxio to solve the problem of slow access to remote storage. Performance improvement work is never finished; to close, we summarize some of our future work:

    • The Alluxio version currently in our online environment is 0.8.2, and Spark Streaming results are at present only written synchronously to the underlying storage system (HDFS in our case). We have tested Alluxio 1.0.1 and are preparing to roll out the new version; thanks to the Alluxio community's active development, the new version improves performance in many respects.
    • We plan to migrate Flink's computing tasks onto Alluxio, and we are also planning to modify Presto, so that both can enjoy Alluxio's high-speed data sharing across compute engines.
    • Because Alluxio integrates easily with existing storage systems and improves the performance of upper-layer services, we will also promote Alluxio to more business lines, such as batch jobs that analyze log data.

Original link: http://geek.csdn.net/news/detail/77491
