Beatles Notes: Distributed Data Stream Analysis Framework (2)


 

Author: Fang Weng

Email: fangweng@taobao.com

Microblog: weibo.com/fangweng

Blog: http://blog.csdn.net/cenwenchu79/

 

Local Design

First of all, note that this part differs from the first article: you should read the code alongside it, because the design discussion alone makes the details hard to follow. Code: https://github.com/cenwenchu/beatles

 

IComponent:

Building a system is like assembling building blocks: these components are wired together at the end. Because their internal behavior is customizable, the blocks usually need configuration, so the blocks assembled together share a Config that is passed in (which can be thought of as a static context).

 

 

INode:

 

 

The node design is relatively simple. A node is a runnable single-threaded loop with a built-in single-threaded event listener and dispatcher. The node's main thread handles the events the node itself generates (regular, expected event processing): the Master maintains the state of the task list and takes actions based on task execution status; the Slave repeatedly acquires tasks, executes them, and returns results. The event listener in the node handles externally driven events (occasional event processing), for example the Master receiving load-balancing requests, or external requests to export and load intermediate results. Notice that a single thread doing a blocking check is used to obtain events:

1. Having multiple threads concurrently check for events requires concurrency control on the event container (the queue), which means event retrieval ends up serialized anyway. Most event-processing frameworks therefore use a single thread for the event check: simple and efficient.

2. A single thread checks for events, but the events themselves may be executed by worker threads or by the check thread directly, depending on execution speed and reliability (external dependencies). If an event executes quickly (no external dependencies, simple logic), the check thread handles it directly (as NIO handles connection setup and teardown in the main thread; here the Master directly handles the first half of task events). If an event takes a long time or depends on the stability of an external system, it must be processed asynchronously by multiple threads (here, many internal components guarantee that methods return quickly; for example, all JobExporter methods are processed asynchronously by internal threads and the interface returns immediately). If a result receipt is needed, use a callback, or submit a new event (carrying a context so it can follow on from the earlier processing) back to the event-processing engine.

3. Another approach is to separate different kinds of events to improve processing efficiency, for example timeout events versus ordinary external events. Avoid detecting events by polling object state; apart from timeouts, try to have state-change operations generate real events through a built-in event generator. Then the event handler only needs to block waiting for events, and event-processing complexity stays O(1) as the amount of state in the system grows. SlaveConnector treats timeout events in more detail, so they are skipped here. Whether different kinds of processing need separate queues, each with its own single processing thread, depends on how fast the system generates events, just as NIO can use multiple selectors. Since each queue is guarded by its own single thread, under-utilized threads are a burden on the system, so it is best made configurable (Beatles does not expose configuration and uses just one thread, because it is not a high-concurrency web front end; even with thousands of slave connections the message volume is not very dense, since the task analysis itself takes time).
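As a minimal sketch of the dispatch pattern in the three points above (illustrative names, not the actual Beatles classes): one thread blocks on the event queue; cheap events are handled inline by the check thread, expensive ones are handed to a worker pool.

```java
import java.util.concurrent.*;

// One blocking check thread; cheap events run inline, slow ones go to a pool.
public class EventLoopSketch {
    public interface Event { boolean isCheap(); void handle(); }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private volatile boolean running = true;

    public void submit(Event e) { queue.offer(e); }

    public Thread start() {
        Thread t = new Thread(() -> {
            while (running) {
                try {
                    // single-threaded blocking check: no contention on the way out
                    Event e = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (e == null) continue;
                    if (e.isCheap()) e.handle();        // handled by the check thread itself
                    else workers.submit(e::handle);     // long or external-dependent: async
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        t.start();
        return t;
    }

    public void stop() { running = false; workers.shutdown(); }
}
```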

MasterNode has two components: JobManager and MasterConnector. One is responsible for upper-layer business processing, the other for message communication. At runtime the two must cooperate; for example, when MasterConnector receives a message it must hand it to JobManager for processing and return the result. To keep the internal components from depending on each other directly (which would turn MasterNode into a mesh structure), MasterNode acts as the intermediary message carrier: the components drive each other through events or callbacks, and a context (for example, putting the channel on the event as a field for the subsequent message reply) carries environment information. Note that this decoupling inevitably costs some performance, so, as with the multi-thread versus single-thread event processing above, do not rely blindly on the message mechanism; use it as needed. For example, when the Connector submits an event to MasterNode, which receives it and calls JobManager, the reverse call back to the Connector can also be event-driven, but JobManager could just as well be given a direct reference and call back directly. The key question is whether you need to release the current thread so the work proceeds asynchronously and the thread can be reused for more processing, at the cost of thread switching and event-dispatch overhead. In general, though, letting the component host mediate the interaction reduces the coupling and complexity caused by inter-module dependencies.

 

 

FileJobExporter:

This class mainly handles file output, but the output code includes a lazy-merge step. Lazy merge means that some <key, value> entries are derived from other results after processing; for example, success rate = number of successes / total. In an analysis system, the <key, value> for the success count and the <key, value> for the total must both be kept for a long time anyway, so does the success-rate <key, value> really need to be computed and kept in memory before the final report is generated? Doing so wastes not only CPU but also a large amount of memory, and shipping it from slave to master adds network I/O. In Beatles, export is the last step, so these derived values are computed at export time. In many systems it is worth asking whether intermediate results really need to be produced or retained before the last step (not necessarily literally the last step; it depends on cost: if the last step involves heavy computation, trading memory for computation earlier can reduce the pressure of the final export; if the export-time computation is small but overall memory is tight, defer the processing). In short, weigh the cost of recomputation against the memory saved. When computing nodes are numerous and dispersed, you can also exploit external computing capacity to reduce the cost of centralized processing (for example, much front-end processing can be deferred to the client instead of being done centrally on the server; on the open platform, data serialization is pushed back to the business clusters rather than done uniformly on the platform).
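A sketch of lazy merge under assumed names (not the Beatles API): only the raw counter entries are kept and shipped around; the derived success-rate entries come into existence only at export time.

```java
import java.util.HashMap;
import java.util.Map;

// Raw <key,count> entries live in memory; derived rates are computed on export.
public class LazyMergeSketch {
    // entries use illustrative prefixes: "succ:<key>" and "total:<key>"
    private final Map<String, Long> entries = new HashMap<>();

    public void add(String key, long value) {
        entries.merge(key, value, Long::sum);   // merge from tasks/slaves
    }

    // The success-rate <key,value> is never stored; it is produced here, once.
    public Map<String, Double> exportSuccessRates() {
        Map<String, Double> rates = new HashMap<>();
        for (Map.Entry<String, Long> e : entries.entrySet()) {
            if (!e.getKey().startsWith("succ:")) continue;
            String key = e.getKey().substring(5);
            Long total = entries.get("total:" + key);
            if (total != null && total > 0) {
                rates.put(key, (double) e.getValue() / total);
            }
        }
        return rates;
    }
}
```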

 

JobManager:

Because MasterNode calls it from a single thread, changing job state is easy (no concurrency control or atomic operations needed). However, MasterNode may be extended to multi-threaded processing in the future, so the atomic-operation style of processing is retained for now.

1. For object state management, flatten deeply nested objects as much as possible. Keeping TaskStatus directly in a flat pool makes checks and atomic operations easy; the problem is that another copy of the state (the status inside the Task) must stay in sync. In short, modifying the two data structures must be transactional, and the technique is fairly simple: use a fine-grained atomic operation to simulate lock contention. For example, to change a task's status, first modify the status-pool entry concurrently (if (statusPool.replace(taskId, JobTaskStatus.DOING, JobTaskStatus.DONE))); only if that succeeds do you modify the data in the original object. In fact, a single thread would not need concurrency control at all (the concurrent mode carries some overhead).

2. An important part of the event-driven model is that the event's state must be changed only after all the necessary work (that is, the creation of the event) is done. For example, in an earlier version, when the Master received a Slave's returned result, it set the result on the Task's result attribute and changed the task's status to done. The two actions must happen in that order: set the content first, then change the status. If the status changed first, an external event-processing thread could observe the change and, with no lock preventing it, process the event before the result was in place; the event would appear handled while its content had been missed, a thread-concurrency bug. This is flagged as a problem in the source-code comments of this version and will be fixed later.
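The two points above, the CAS-style transition on a flat status pool and the content-before-status ordering, can be sketched together (illustrative names; the real Beatles classes differ):

```java
import java.util.concurrent.ConcurrentHashMap;

// Flat status pool: the three-argument ConcurrentHashMap.replace acts as a
// compare-and-swap, so exactly one caller wins the DOING -> DONE transition.
public class StatusPoolSketch {
    public enum JobTaskStatus { UNDO, DOING, DONE }

    public static class Task { public volatile Object result; }

    private final ConcurrentHashMap<String, JobTaskStatus> statusPool = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Task> tasks = new ConcurrentHashMap<>();

    public void register(String taskId) {
        tasks.put(taskId, new Task());
        statusPool.put(taskId, JobTaskStatus.DOING);
    }

    // Returns true only for the single caller that wins the status transition.
    public boolean complete(String taskId, Object result) {
        Task t = tasks.get(taskId);
        if (t == null) return false;
        t.result = result;  // set the content FIRST...
        // ...THEN flip the status, so any observer of DONE also sees the result
        return statusPool.replace(taskId, JobTaskStatus.DOING, JobTaskStatus.DONE);
    }
}
```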

3. The main flow contains a method, mergeAndExportJobs, that checks the completion status of each job's tasks and decides whether to merge or export results. JobManager's main flow is limited to single-threaded processing, while internal task state changes at any time, so every operation and check in the main flow must be non-blocking to stay timely. But if every operation in this method were handed off to other threads, the event check described above would again degenerate into serialization under concurrency control, with no gain in efficiency. So the rule is: one and only one daemon thread per unit of business data. Concretely, the Master manages multiple jobs, and multiple jobs are like multiple event queues, so they should be processed in parallel, otherwise they risk affecting one another; but a single job may be processed by only one daemon thread at a time. Therefore an event lock is applied per job: the same event runs in parallel across different jobs, while within one job events remain ordered (since events are ordered, even with parallelism elsewhere, a job must wait for its previous event to complete and update internal state before continuing).
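The one-daemon-per-job rule can be sketched with a per-job lock and tryLock (hypothetical names, not the Beatles code):

```java
import java.util.concurrent.locks.ReentrantLock;

// Each job owns a lock: the same kind of event runs in parallel across jobs,
// but only one thread at a time may run a merge/export pass for a given job.
public class JobLockSketch {
    public static class Job {
        public final ReentrantLock eventLock = new ReentrantLock();
        public long mergedCount = 0;   // guarded by eventLock
    }

    // Non-blocking: returns false immediately if another thread owns this job.
    public static boolean tryProcess(Job job, Runnable work) {
        if (!job.eventLock.tryLock()) return false;
        try {
            work.run();
            return true;
        } finally {
            job.eventLock.unlock();
        }
    }
}
```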

4. As mentioned in the first article, this framework handles task-execution failures very simply: it sets in advance the maximum acceptable execution time for a single task, and if no feedback arrives within that time, the task is reset and can be handed to the next computing node that asks for work (if the earlier result does come back first, it is the one used). Two points here: the task timeout can be estimated from the granularity of task splitting; in practice, splitting tasks more finely often reduces the complexity of the problem each task solves and lowers the cost of re-running a task on a computing node. On the other hand, set a transparent cap on the number of resets, so that a task that is itself broken (for example, a bad data source) does not drag every computing node into an endless loop of retrying it.

5. Code optimizations for result merging:

A. When the Master merges, each job has only one trunk; all task results of the job must ultimately be merged into it. As with an SVN trunk, you can imagine that multiple people (multiple threads) cannot merge into it in parallel. If at moment A the main thread finds four results waiting and starts merging them into the trunk, three more results may arrive during that merge; they must wait for the next round, and the memory they occupy burdens the system; the more slaves, the worse it gets. Hence the following scheme: multi-threaded merging, with the trunk and a virtual branch merged concurrently. When a merge is needed, threads first compete for the trunk lock. The thread that obtains the trunk lock merges the results pending this round, together with the virtual branch accumulated earlier, into the trunk; threads that do not obtain the trunk lock merge concurrently into the virtual branch. This makes full use of multi-core computing power while compressing memory demand (once results are merged, their storage footprint drops substantially).
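A minimal sketch of the trunk/virtual-branch idea (illustrative names and map-based results; the real merge works on result matrices): a thread that fails to take the trunk lock still makes progress by merging into a shared branch, which the next trunk owner folds in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class BranchMergeSketch {
    private final Map<String, Long> trunk = new HashMap<>();
    private final Map<String, Long> branch = new HashMap<>();   // virtual branch
    private final ReentrantLock trunkLock = new ReentrantLock();
    private final ReentrantLock branchLock = new ReentrantLock();

    public void merge(Map<String, Long> taskResult) {
        if (trunkLock.tryLock()) {
            try {
                // winner: fold the accumulated branch into the trunk, then this result
                branchLock.lock();
                try { mergeInto(trunk, branch); branch.clear(); }
                finally { branchLock.unlock(); }
                mergeInto(trunk, taskResult);
            } finally { trunkLock.unlock(); }
        } else {
            // loser: don't wait for the trunk, merge into the branch instead
            branchLock.lock();
            try { mergeInto(branch, taskResult); }
            finally { branchLock.unlock(); }
        }
    }

    public Map<String, Long> snapshot() {
        trunkLock.lock(); branchLock.lock();   // same lock order as merge(): no deadlock
        try { mergeInto(trunk, branch); branch.clear(); return new HashMap<>(trunk); }
        finally { branchLock.unlock(); trunkLock.unlock(); }
    }

    private static void mergeInto(Map<String, Long> target, Map<String, Long> src) {
        for (Map.Entry<String, Long> e : src.entrySet())
            target.merge(e.getKey(), e.getValue(), Long::sum);
    }
}
```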

 

 

B. As the description in A shows, the Master keeps job results in memory throughout execution and merging, so the larger the result set, the more memory the Master consumes. Could each merge round instead load the trunk produced by the previous round and merge it with the current round's incremental results? That would greatly reduce memory consumption, but the serialization cost and I/O of exporting and reloading the trunk would lengthen each round, and the time saved by reduced GC might be offset or even turn negative. Therefore loading and exporting are done asynchronously, saving memory and reducing full-GC pauses without stalling processing; in effect it trades the high CPU idle time of those two phases for memory. (For this part of the code, see JobExporter and JobManager.)

 

 

 

SlaveNode:

To make full use of a slave machine's CPUs: one machine can run multiple slaves, or a single slave can fetch several tasks at a time, so that multiple CPUs process tasks in parallel.

To reduce merge pressure on the Master, a slave can be allowed to output results directly; alternatively, a slave can request multiple tasks, execute them, merge them locally (only tasks of the same job can be merged), and then send the merged result to the Master.

Multiple tasks can be created for the same data source to adjust processing speed. For example, if machine A's logs are produced faster than machine B's, you can configure two tasks for machine A's data source and one task for machine B's, treating the different processing speeds differently.

If the processed data needs a second pass, you can set a job's data source to the output location of the first pass; once the first pass's data is output, the second pass starts.

Simply put, many complicated problems, shard design, reduce considerations, iterative task processing, can be solved through flattening; a plain, normalized design is sometimes better than a painstakingly fancy one. (Large scales are built up from small ones.)

 

Connector:

This part of the design is mainly about avoiding a common misunderstanding of "distributed". Many distributed designs start by focusing not on the business interaction between master and slave nodes but on the underlying transport, and end up hard to debug and hard to extend. Following the same normalization philosophy as above: a "distributed" architecture can mean interaction and collaboration inside one process (within a VM), between multiple processes on one machine, or between processes on multiple machines. The design should adapt to all three scenarios, staying simple and easy to extend, and achieving isolation behind interfaces.

 

Event:

Events need some context designed in, such as sequence numbers to keep loosely coupled interaction sessions maintainable, and carriers for follow-up actions such as the channel. Keep business details out of the event design as much as possible. For example, a channel is required, but implementations differ: a MemChannel and a SocketChannel are not the same, and future extensions will differ again; either abstract an interface (some implementations may need a shell wrapper), or simply weaken the object type.

 

InputAdaptor & OutputAdaptor:

Besides self-describing business rules, a task must also self-describe its input and output. All computation is simply input, processing, and output; if the three are clearly defined and the protocols are extensible, the computing nodes become generic. You then do not have to build separate clusters for different businesses, data sources, or output destinations, only to find that multiple clusters cannot exploit each other's resource peaks and troughs (clusters that clearly need protection can still be built directly; non-critical computing tasks can all be thrown into one cluster to reduce cost).

 

Job:

A job is a set of tasks. It carries several status bits, expressed as multiple statuses (which could be merged into a single atomic status field), built-in locks to control concurrent access to the trunk, and the allocation of daemon threads. (In another project, PipeComet, the on-demand daemon thread for long-lived connection pipelines can likewise be put to full use.)

 

Operation:

This package encapsulates time-consuming operations as Runnables that external threads can execute independently. Across the code you can see both asynchronous execution by external threads and direct blocking execution on the calling thread, depending on whether the result is needed synchronously. Where synchronous semantics are required, you can simulate synchronization with async + lock, or simply call synchronously; the former is costly. So this kind of operation is abstracted, with a context passed in, to build logical blocks that can run either asynchronously or synchronously, improving the flexibility of execution.

The output code in CreateReportOperation is space-conscious; look at how it emits row-style report records from the <key, value> column matrix while keeping memory usage small.

 

ReportUtil:

A hodgepodge of utility methods.

1. mergeEntryResult, the function that merges multiple result matrices, contains several memory-saving practices: it picks the first matrix as the base, saving allocation during the merge, and it deletes data from the other matrices as they are merged in, cutting merge cost and releasing resources as it goes.

2. compressString. Use irreversible compression where possible to reduce the memory occupied by intermediate keys during processing. For example, each entry's key is a combination of several columns, and the key only needs to be unique; if it can be compressed without losing uniqueness, the final output is unaffected. The short-link technique is used here (MD5 plus a base-16-or-higher encoding).
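A sketch of this kind of irreversible key compression (the method name is illustrative, not the actual compressString): digest the combined-column key with MD5 and render it in a compact base, here base 36, which is shorter than the 32-character hex form.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class KeyCompressSketch {
    // The key only needs to stay unique, so an irreversible digest is enough.
    public static String compressKey(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // positive BigInteger -> base-36 string (at most 25 chars for 128 bits)
            return new BigInteger(1, digest).toString(36);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }
}
```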

 

TimeoutQueue:

As mentioned above, basically every external object state change can be captured and turned into an event, but timeouts must be detected by active checking, so as the number of objects grows, the cost of timeout checks becomes O(n). Partitioning (a timing wheel with time slots) is the usual way to blunt the growth of n. Another approach suits the case where the timeout never changes: once an object is added, its timeout stays the same from insertion to destruction. In that case you can use the implementation in this class.

It contains an ordered singly linked list (or queue) sorted by expiry time, with the earliest-expiring object at the head. The internal thread checks from the head each time: if it finds an expired object, it processes it and continues; when it finds none, it computes the interval until the head object expires and sleeps for that interval. If, in the meantime, an object is added whose expiry is earlier than the current head's, it is enqueued and the check thread is woken (mind the order: enqueue first, then wake). Finally, producer/consumer signaling prevents the thread from spinning on an empty queue.
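Under the fixed-timeout assumption, arrival order equals expiry order, so the description above reduces to a FIFO plus one checker thread that sleeps exactly until the head expires. A minimal sketch with hypothetical names (the real TimeoutQueue differs in detail):

```java
import java.util.LinkedList;
import java.util.Queue;
import java.util.function.Consumer;

public class TimeoutQueueSketch<T> {
    private static class Entry<T> {
        final T item; final long deadline;
        Entry(T item, long deadline) { this.item = item; this.deadline = deadline; }
    }

    private final Queue<Entry<T>> queue = new LinkedList<>();
    private final long timeoutMillis;
    private final Consumer<T> onTimeout;
    private volatile boolean running = true;

    public TimeoutQueueSketch(long timeoutMillis, Consumer<T> onTimeout) {
        this.timeoutMillis = timeoutMillis;
        this.onTimeout = onTimeout;
        Thread checker = new Thread(this::checkLoop);
        checker.setDaemon(true);
        checker.start();
    }

    public synchronized void add(T item) {
        // fixed timeout => the queue stays sorted by deadline automatically
        queue.offer(new Entry<>(item, System.currentTimeMillis() + timeoutMillis));
        notifyAll();   // enqueue first, THEN wake the checker
    }

    public void shutdown() { running = false; synchronized (this) { notifyAll(); } }

    private void checkLoop() {
        while (running) {
            T expired = null;
            synchronized (this) {
                Entry<T> head = queue.peek();
                long now = System.currentTimeMillis();
                try {
                    if (head == null) wait(200);                        // idle, no busy loop
                    else if (head.deadline > now) wait(head.deadline - now); // sleep until head expires
                    else { queue.poll(); expired = head.item; }
                } catch (InterruptedException e) { return; }
            }
            if (expired != null) onTimeout.accept(expired);             // handle outside the lock
        }
    }
}
```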

 

 
