Benchmark for Transaction Engine Design

Article Directory
    • Introduction
    • Processing Model
    • Operating the Benchmark
    • Benchmark Results
    • Evaluation and Conclusion
Source: www.microsoft.com Author: Richard Drayton

Summary: Performance is perhaps one of the least understood and most incorrectly cited criteria in computing today. A common practice among technical staff and application vendors is to treat performance as a tuning problem that can safely be left to the end of a project or system implementation. In contrast, most users regard performance as one of the most important and critical factors in evaluating whether a system is suitable for a specific purpose. This article describes a benchmark exercise conducted at Microsoft ISV Laboratories (ISV Labs) in Redmond, WA, in October 2002 as part of a joint project between the Capital Markets Company (Capco) and Microsoft.

Introduction

Performance is perhaps one of the least understood and most incorrectly cited criteria in computing today. A common practice among technicians and application providers is to treat performance as a tuning problem that can safely be left to the end of a project or system implementation. In contrast, most users regard performance as one of the most important and critical factors in evaluating whether a system is suitable for a specific purpose. This article describes a benchmark exercise conducted at Microsoft ISV Laboratories (ISV Labs) in Redmond, WA, in October 2002 as part of a joint project between the Capital Markets Company (Capco) and Microsoft.

The project was undertaken by Capco under a commission from the Singapore Exchange (SGX) to provide a business assessment and develop a technical architecture for a centralized transaction-processing utility for the exchange. The utility is to provide matching services for the post-trade, pre-settlement interactions of participants in the securities and fixed-income trading areas of the Singapore market. The main processing engine was designed following a process known as Software Performance Engineering (SPE) [Smith90, SmWil02], in which the entire design and validation exercise is modeled from a performance perspective rather than from the traditional object-oriented design perspective.

Capco created two subsystems: the STP Bridge (a communications infrastructure and exchange gateway) and the STE (a Scalable Transaction Engine). Both subsystems were used in the benchmark exercise.

Processing Model

The STE architecture is based on loosely coupled, stateless messaging components arranged in a queueing network, for high scalability, high performance, and extremely high transaction throughput. For SGX, an analysis of transaction history over the past several years put the worst-case transaction-processing load at approximately 600 messages/second; this figure corresponds to the peak processing level recorded during the 1998 Asian financial crisis, when exchange transaction volumes were considered unusually high. This value was used as the reference/target processing load. In addition, a stretch target of 2,000 messages/second was set to ensure that, if successful, the architecture would have sufficient headroom to cope with expected future growth and changes in trade volume.

Figure 1. Processing Engine Model

The decision to use loosely coupled components that communicate through message queues, rather than through more traditional component interfaces (APIs), requires that information be passed between components either as the original message (persistent message flow) or as messages enriched by the various processing steps (transient message flow). The processing components are largely independent and stateless. Compared with traditional monolithic application development, this brings many benefits: for an individual component, it reduces software development risk and implementation cost; for the processing engine as a whole, it provides greater scalability and flexibility. Most conventional designs support only one of the two possible scaling dimensions: scaling up (adding processing capacity by adding processor resources such as memory and CPUs) or scaling out (adding processing capacity by increasing the number of processing nodes). This architecture supports both kinds of scaling.

The overall architecture of the STE processing engine is shown in Figure 1.

The architecture consists of a number of STE components, each responsible for a sub-area of the basic trade-processing business activity. The engine supports the business process by breaking the entire trade lifecycle into a set of related atomic messages. Each market participant generates and receives various subsets of these atomic messages during the execution of a transaction.

This architecture implies that the business process itself must be expressed as a fully asynchronous and commutative set of operations, that is, one able to process messages in any order. This removes the need to synchronize message processing with business operations across the engine (which would make the application considerably slower and more complex). Note that synchronization is distinct from the normal correlation of related messages during processing. Several other "autonomic computing" requirements are also met in this processing model, including untrusted systems, idempotent operations, state management, message correlation, context management, tentative operations, message cancellation, and transaction compensation.
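
By way of illustration, the following minimal C sketch shows one common way to make message handling idempotent, so that a redelivered or out-of-order message is safe to process. It is illustrative only: the identifiers are invented for the example, and the STE's actual mechanisms are not published here.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_SEEN 1024

    static char seen[MAX_SEEN][32];   /* IDs of messages already applied */
    static int  nseen;

    static bool already_processed(const char *msg_id)
    {
        for (int i = 0; i < nseen; i++)
            if (strcmp(seen[i], msg_id) == 0)
                return true;
        return false;
    }

    static void process_message(const char *msg_id, const char *body)
    {
        if (already_processed(msg_id)) {
            printf("duplicate %s ignored\n", msg_id);  /* redelivery is safe */
            return;
        }
        /* Apply the business operation here. To keep processing
           order-independent, it must not assume that the transaction's
           other atomic messages have already arrived. */
        printf("applied %s: %s\n", msg_id, body);
        if (nseen < MAX_SEEN) {
            strncpy(seen[nseen], msg_id, sizeof seen[0] - 1);
            seen[nseen][sizeof seen[0] - 1] = '\0';
            nseen++;
        }
    }

    int main(void)
    {
        process_message("NOE-0001", "notice of execution");
        process_message("NOE-0001", "notice of execution");  /* duplicate */
        return 0;
    }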

The business-level message flow of the processing engine is based in part on the execution model recommended by the Global Straight Through Processing Association (GSTPA), which recommends a similar centralized utility model for cross-border trade settlement. The SGX processing model may eventually need to interoperate with other centralized utilities (such as the GSTPA's), so the message set used is based on the GSTPA message sets, to improve future interoperability.

The business process in the Singapore market reduces to four main message types: Notice of Execution (NOE), Trade Allocation, Net Proceeds Allocation, and Settlement Instruction (SET). The interactions among these message types generate approximately 35 message variants, which constitute the entire business operating model. As recent events in the financial services industry indicate, the utility may have to handle a considerable range of processing volumes, so this exercise emphasized the architecture's ability to meet high scalability requirements.

Performance Modeling

An effective benchmarking exercise allows other organizations to repeat the benchmark themselves and achieve similar results. In addition, sound mathematical techniques are needed to support the parameters of the architectural model, so that the performance impact of different implementation technologies can be evaluated and decisions made about which technology to adopt. Figure 2 shows the queueing network model used for the performance analysis in this benchmark. Note that this model interacts closely with the participant access module (PAM) on the client side. By conducting the benchmark in this way, the results of the exercise can be regarded as both reliable and valid, and they can serve as the basis for a case study in applying queueing network models to post-trade, pre-settlement processing as part of a comprehensive high-performance straight-through processing (STP) program for the financial services industry.

Measuring Performance

One of the most frustrating aspects of performance engineering arises when no unit of performance can be identified that allows valid comparison between similar systems. Performance is a subjective quality; its absolute value usually depends only on the users of the system, not on any of the system's algorithms or processes. This fuzzy aspect of performance means that the observed performance characteristics of a given system can change even when no physical change has been made to the underlying technology. The results of any benchmark must therefore be interpreted broadly, which can undermine any comparative analysis of software systems. "Performance to burn", headline figures largely irrelevant to the user's view of performance, is a common feature of benchmarks. In a large number of published benchmark examples, the quoted performance is unattainable in practice; the results often imply a level of performance that is simply impossible in the real world. For example, an ADSL connection offers a theoretical download speed of 512 Kbps but is in practice limited by contention with other users on the same switch. A more practical way to compare performance between systems is therefore to establish a measurement that is easily reproduced outside the test environment and is operationally meaningful from the perspective of the system's users. To avoid these kinds of frustrations in this benchmark, a practical technique was needed that would provide supportable and reproducible figures for the achieved performance of the system. The technique selected is based on the work of Buzen and Denning [BDen78], which allows credible performance figures to be derived on a sound mathematical basis.

Operating the Benchmark

The purpose of the benchmark was to validate the soundness of the architectural design on Microsoft platform technology, and to establish a basis of credibility, in an environment highly relevant to the operational requirements, that could be presented to potential users of the system. To achieve this, the benchmark first had to be run at capacity and processing-load levels consistent with the final implementation of the utility, and then conducted in a manner reproducible both in the laboratory and in the field. For the laboratory exercise, a set of test management components was created for end-to-end analysis, including a scalable message driver and a corresponding scalable message pool, with performance metrics taken at these two endpoints. Performance figures were calculated from the time required to fully process a known set of messages constituting a known number of financial transactions.

Figure 2. Queueing network model

The Buzen and Denning Method

Buzen and Denning [BDen78] describe a practical technique for evaluating the performance of software systems modeled as queueing networks. Essentially, they showed that simple, easily measured observations of queue length over a period of time suffice to establish all of the performance metrics they define for a queueing network. The application of the Buzen and Denning technique is illustrated by the execution graph of the network model shown in Figure 3.

A simple queueing model can be thought of as a queue together with its associated server, as in Figure 3: a message arrives in the inbound queue on the left, is processed by the server, and departs (completes) on the right. The Buzen and Denning technique involves sampling the length of the server's queue at a fixed interval, recording the number of messages still in the queue, and continuing these observations over a period of time.

The graph obtained from these observations is called an execution graph; it clearly shows message arrivals (increases in the graph's height) and completions (decreases in the graph's height). Using these figures, Buzen and Denning derived, from an original result by J. D. C. Little (known as Little's law [Littl61]), a set of formulas that determine all of the required performance metrics. These formulas are summarized in Table 1.

Table 1. Buzen and Denning formulas
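
The table itself is not reproduced in this copy of the article, but the operational-analysis relations from [BDen78] that it summarizes are standard. For an observation period T during which A messages arrive, C messages complete, and the server is busy for total time B:

    arrival rate        λ = A / T
    throughput          X = C / T
    utilization         U = B / T
    mean service time   S = B / C
    Little's law        N = X · R

where N is the mean number of messages in the system and R is the mean response time. These are the quantities used throughout the results discussion below.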

Measuring Performance

To perform the performance measurements, a set of standard business messages was required to represent the average business-process lifecycle of the utility. A standard "transaction" was created from one NOE (Notice of Execution) message, two Trade Allocation messages, two Net Proceeds Allocation messages, and two SET (Settlement Instruction) messages. A standard transaction therefore comprises seven basic messages, which, together with the necessary acknowledgement and validation messages between the participants, constitute the complete message set for the test. Once a transaction has been fully processed, the system generates a further message, the CLS (Clearing and Settlement) message.

Under the GSTPA specification, as many as 17 different elements must be matched across messages before a transaction can be considered valid. The requirements vary with the specific message type being matched, but in some cases all 17 matching conditions must be applied before the transaction can be processed. The complete set of matching conditions is shown in Table 2 for reference.

Table 2. Message matching conditions

Figure 3. Simple execution graph

The matching process must scan the matching table in the message database as each message is received, to locate all the elements of the specified transaction in each message. When seven matching atomic messages have been received, a valid CLS message is sent to the outbound queue of the test system. In addition to the matching process, the Singapore market's business process also requires validation of certain message items, such as currency and country/region codes, securities reference codes, payment and settlement dates, and participant identification information. A static data set was created to form the basic reference data of the utility, as shown in Table 3.

Table 3. Static Data Parameters

To establish the benchmark results, a test message set was generated from the static data. A message generator component randomly selected appropriate static data and combined it into standard transactions of seven individual messages each. Two core message sets were created: the first containing 250,000 transactions (1,750,000 messages) and the second containing 1,000,000 transactions (7,000,000 messages). A normal business cycle would require all matched or unmatched transactions to be retained in the system for three days and then purged from the system database; however, we decided to retain all processed transaction information in order to study the degradation of database performance as volume grows.
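
A minimal C sketch of this kind of generator is shown below. It is illustrative only: the reference values and the message-type labels other than NOE and SET are hypothetical stand-ins, since the actual tag names are not published in the article.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical static reference data. */
    static const char *participants[] = { "PAM001", "PAM002", "PAM003" };
    static const char *securities[]   = { "SG1A55000", "SG1B72000" };

    static const char *pick(const char **tbl, int n)
    {
        return tbl[rand() % n];
    }

    static void emit(const char *type, int txn, const char *pt, const char *sec)
    {
        printf("<msg type=\"%s\" txn=\"%d\"><participant>%s</participant>"
               "<isin>%s</isin></msg>\n", type, txn, pt, sec);
    }

    /* One standard transaction: 1 NOE + 2 trade allocations
       + 2 net proceeds allocations + 2 SET = 7 messages. */
    static void emit_transaction(int txn)
    {
        const char *pt  = pick(participants, 3);
        const char *sec = pick(securities, 2);
        emit("NOE", txn, pt, sec);
        for (int i = 0; i < 2; i++) emit("ALLOC", txn, pt, sec);
        for (int i = 0; i < 2; i++) emit("NPA", txn, pt, sec);
        for (int i = 0; i < 2; i++) emit("SET", txn, pt, sec);
    }

    int main(void)
    {
        for (int txn = 1; txn <= 3; txn++)
            emit_transaction(txn);          /* 3 x 7 = 21 messages */
        return 0;
    }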

The message sets were stored as standard text files in XML format, ready for transmission to the processing engine. A management harness was built to monitor the benchmark and capture the required performance metrics. It comprised a message driver component, a message pool component, and an additional component that monitored the lengths of the inbound, outbound, and internal message queues at a fixed interval. The message driver component processed the test message set files sequentially, applying a digital signature to each message before sending it to the processing engine's inbound queue.

Each message driver, running independently and free of contention with other processes in the system, could process up to 8,000 messages/second. This figure is very close to Microsoft's own published benchmark results for the MSMQ product. The message pool component read the messages delivered to the CLS queue and monitored the time taken by the system to process a specified number of messages. The monitoring arrangement is shown in Figure 4.

Figure 4. Benchmark monitoring setup

Hardware Environment

The nature of the architecture recommended for the processing utility lends itself to achieving high performance with multiple computers. To eliminate resource contention within the architecture, it is preferable to use several single-CPU computers rather than one large multi-CPU computer. The hardware used consisted of four instances of the computer set shown in Table 4.

Table 4. Basic hardware environment (X4)

To scale processing capacity horizontally during the benchmark, multiple copies of the basic hardware environment were deployed. In addition to the hardware listed in Table 4, eight single-CPU computers were used to host the message drivers, plus the monitoring and message pool components used to record the benchmark results. The database was placed on a Compaq MSA 1000 RAID (Redundant Array of Inexpensive Disks) storage device configured for level 0 support (maximum throughput, minimum recoverability). Because the business solution for the exchange utility calls for local (client-side) database support for each participant connecting to the utility, a disaster recovery site, and dual-system redundancy, the loss of RAID recoverability was a small price to pay for the performance gain of RAID level 0. Initially we believed that a single database storage device could handle the entire transaction volume; during the benchmark, however, we quickly realized that internal disk queuing had reached extreme levels. How this problem was overcome is described later.

Software Environment

The software operating environments supporting the software components on the various hardware platforms are listed in Table 5. Although Windows Server 2003 had not reached RTM at the time the benchmark was run, the release candidate (RC1) was available and was considered stable and complete enough to be used as a valid part of the benchmark. All applicable service packs were applied to the operating environment, including any third-party drivers used for peripheral devices.

Table 5. Software operating environments

Scaling the Architecture

For the benchmark, the scale-up model involved executing the software components and code both on 64-bit Itanium processors and on 4-CPU computers (two more CPUs than the basic processing nodes). Although the Itanium computer contained only two processors, its available bus and I/O bandwidth are far higher than those of the standard 32-bit platform, and the results were genuinely encouraging. We did not have time to study the scale-up model thoroughly. The scale-out model involved increasing the number of processing nodes: the number of processing engines was raised from one to eight, and the number of database processors from one to four. We spent considerable time studying the scale-out model.

Software Processing

The STE engine components are written in C/C++. They use an exception-based approach to processing XML messages: message content and format are assumed to be correct until an error (missing or incorrect data) is detected. When an exception occurs, the offending message is sent to a designated exception queue for further processing; this subsequent exception handling was not included in the benchmark. Validation of each message's content is performed against static data preloaded into memory-based arrays. The validation process involves organizing the required static data into sorted arrays (using the C/C++ qsort function) and looking up entries in those arrays (using the C/C++ bsearch function) to verify the existence of XML elements. Data elements within an XML message are accessed using the standard C/C++ strstr function.
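
As an illustration of this validation style, the C sketch below (illustrative only; the actual STE field names and reference tables are not published here) sorts a static currency table with qsort, extracts an XML element with strstr, and validates the value with bsearch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical static reference data: ISO currency codes. */
    static const char *currencies[] = { "USD", "SGD", "JPY", "GBP", "EUR" };
    #define NCURR (sizeof currencies / sizeof currencies[0])

    /* Comparator for qsort/bsearch over an array of C strings. */
    static int cmpstr(const void *a, const void *b)
    {
        return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    /* Copy the text of <tag>...</tag> from msg into buf using strstr,
       mirroring the string-based access described above. */
    static int get_element(const char *msg, const char *tag,
                           char *buf, size_t len)
    {
        char open[64];
        snprintf(open, sizeof open, "<%s>", tag);
        const char *start = strstr(msg, open);
        if (!start) return 0;
        start += strlen(open);
        const char *end = strstr(start, "</");
        if (!end || (size_t)(end - start) >= len) return 0;
        memcpy(buf, start, (size_t)(end - start));
        buf[end - start] = '\0';
        return 1;
    }

    int main(void)
    {
        /* Sort the reference data once at startup... */
        qsort(currencies, NCURR, sizeof currencies[0], cmpstr);

        const char *msg = "<trade><ccy>SGD</ccy></trade>";
        char val[16];
        if (get_element(msg, "ccy", val, sizeof val)) {
            const char *key = val;
            /* ...then validate each field with a binary search. */
            if (bsearch(&key, currencies, NCURR, sizeof currencies[0], cmpstr))
                printf("%s: valid\n", val);
            else
                printf("%s: route to exception queue\n", val);
        }
        return 0;
    }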

Benchmark Results

The benchmark produced some interesting results. Some validated the application design, while others led to architectural changes to address identified performance problems. The basic lesson learned is that tuning typically yields percentage-level performance improvements, whereas order-of-magnitude improvements require architectural redesign.

Message Queue Processing

The benchmark was conducted at two base levels: the first with 250,000 transactions (1.75 million messages) and the second with 1,000,000 transactions (7 million messages). All processed information was stored in the database, adding further processing load to the system. The setup shown in Figure 4 was used as the basis for the benchmark evaluation. When the message driver components were first started, we noticed a long lag time, caused mainly by contention in the single queue manager handling the message stream.

The injection rate was so high that the single queue manager process did not have enough time at the start of each run to transmit messages over the network to the remote processing computers. The consequence was a delay of several seconds after the initial injection before the performance monitors registered any activity on the system's processing nodes. Figure 5 illustrates this queue manager contention. In practice, it is highly unlikely that millions of messages would ever arrive at the utility as a single burst, so the lag effect can safely be ignored for benchmarking purposes.

Figure 5. Single queue manager contention

This effect is common to all asynchronous message transport systems and is not specific to MSMQ. A similar effect can be observed in the processing components when a large number of messages arrives in their inbound queues (the remote hosts' message buffers) as a single burst: during initial startup the processing components are pushed to roughly 100% load, then settle down once the message flow through the computers stabilizes.

Figure 6. Processor time of a typical component

Typical performance monitor output for a single processing node is shown in Figure 6. The effect of the initial message burst can be seen clearly in the combined response curve of the dual-processor computer.

In the second half of the host computer's response graph, the effect of inbound message processing in the inbound queue manager's message buffer can also be seen. Here, adding resources (more precisely, reducing contention) increases the utilization of processor capacity in the later stages of the processing cycle. To offset the consequences of hitting the message queue buffer limit too quickly, and of uneven processor utilization during message injection, we added more inbound queues and inbound queue processes, so that the incoming message load could be spread across more resources and the latency involved reduced.

Perhaps the biggest difference came from using MSMQ 3.0 rather than MSMQ 2.0: the former has a 4 GB memory buffer size limit before new buffer allocation, roughly three orders of magnitude larger than the 4 MB buffer in MSMQ 2.0.

Multi-node Processing

The original STE design used a single database. As the number of processing nodes (dual-processor computers running the component software) increased, we noticed a significant decline in the overall processing rate. This decline in processing throughput, shown in Figure 7, was caused by internal contention for the database components. The contention was due not to the operating system or the component software, but to excessive disk queuing in the RAID array. In other words, the available bandwidth for transferring information to the RAID array disks could not meet the demands of the STE's software elements.

The effect is most readily seen by analyzing inserts into the database. The performance of a single database server is shown in Figure 8. The database server performance graph corresponding to Figure 7 shows a sharp decline in database insert operations as disk queuing builds up in the RAID array with increasing numbers of processing nodes. The contention for available resources caused by this queuing meant that, in the original design, the system could not reasonably cope with more than two processing nodes. The next section discusses how this problem was overcome.

Figure 7. Processing rate per node

Figure 8. Processing rate per node (single database engine)

Server Hash Algorithm

After discussions with the Microsoft SQL Server team about techniques for increasing the available bandwidth for disk operations, they suggested incorporating multiple database servers into the overall solution by means of a hashing algorithm. The idea is to take a unique transaction key derived from the message data and apply a hash function to produce a single value that resolves to a specific database server instance.

We selected a hash algorithm that reflects the business requirement that all messages for a given transaction must always be sent to the same database server. For the benchmark, the key was derived from a subset of the matching conditions defined in Table 2: the selected matching-condition values are concatenated and converted into a single very long integer, which is then reduced modulo the number of configured database servers (or instances) to select the target server. (The formula graphic from the original article is not reproduced here.)
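
A minimal C sketch of this partitioning scheme follows. The field names are hypothetical, and the fold from the concatenated key to an integer is a stand-in (a simple FNV-1a hash) for the article's unpublished conversion:

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_DB_SERVERS 4   /* a power of two, as in the benchmark */

    /* Reduce the concatenated matching-condition values to one integer. */
    static uint64_t fold_key(const char *concat_key)
    {
        uint64_t h = 1469598103934665603ULL;       /* FNV-1a offset basis */
        for (const char *p = concat_key; *p; p++) {
            h ^= (unsigned char)*p;
            h *= 1099511628211ULL;                 /* FNV-1a prime */
        }
        return h;
    }

    /* Messages of the same transaction share the same key, so they always
       resolve to the same database server instance. */
    static unsigned db_server_for(const char *trade_ref,
                                  const char *participant, const char *isin)
    {
        char concat[256];
        snprintf(concat, sizeof concat, "%s|%s|%s",
                 trade_ref, participant, isin);
        return (unsigned)(fold_key(concat) % NUM_DB_SERVERS);
    }

    int main(void)
    {
        printf("route to DB %u\n",
               db_server_for("T123456", "PAM042", "SG1A55000"));
        return 0;
    }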

Using this hashing scheme to spread the load across multiple database servers (or instances) proved highly effective. We modified the proposed utility's infrastructure architecture to reflect the hash-based multi-database solution, as shown in Figure 9. For the remainder of the benchmark, up to 16 database servers were supported within the software components, although a maximum of four such servers was actually tested. Repeating the earlier tests with the hash algorithm distributing the database load across four database servers produced impressive results.

Figure 9. Final hardware infrastructure solution

Little's Law Curve

The performance metrics defined by Buzen and Denning are based on a fundamental result due to J. D. C. Little [Littl61]. The generalized performance characteristic discovered by Little is shown in Figure 10.

Figure 10. Generalized Little's law curve

For any queueing model processing a given number of tasks, response time rises with the arrival rate (or, for a balanced queueing model, the completion rate). The characteristic shape of this curve is near-linear at low load, bending progressively toward an asymptote as the input (and completion) rate increases.

Figure 11. Linear scaling through multiple database instances

The first observable result (Figure 11) indicates that, with available database bandwidth increased by 400%, the system operates in the linear portion of the performance graph. Scaling the processing components from one node to four produced an almost linear response; indeed, the measured results show extremely linear scaling between one and four processing nodes, with only very small deviations from the fitted linear model. When the input rate is increased further, however (in this case by raising the number of processing nodes to eight), deviation from linear scaling can be observed. Figure 12 identifies this effect.

Figure 12. Scaling of the performance curve to eight processing nodes

From the measured results, the Little's law curve for the queueing network model under test can be plotted, as shown in Figure 13. The results show that beyond four processing nodes the operational performance of the queueing model degrades as further nodes are added, with a marked decline by the time eight processing components are in use.

It is worth noting that the scale on the left of Figure 12 shows a throughput of about 7,734 messages/second for the whole STE queueing model. The obvious next scaling option is to increase the number of database servers to eight (the next binary multiple). With a corresponding increase in the number of processing nodes, there is good reason to expect message throughput to exceed 15,000 messages/second (roughly double the measured 7,734 messages/second, assuming the near-linear scaling holds).

Figure 13. Measured Little's law curve

Performance Degradation with Increasing Stored Volume

At the measured processing rate, the queueing network sustains more than 27,842,400 messages/hour (that is, 222,739,200 messages per eight-hour business day). It is reasonable to ask whether such a high processing rate, measured over only a few minutes, would hold up. To characterize the model as stored volume grows, we used a processing load of about 2,000,000 messages as the baseline and then ran a 7,000,000-message load to determine the impact of growing database volume on overall performance. The measured message throughput falls as volume increases, as shown in Figure 14.

Figure 14. Database performance degradation with increasing stored volume

Here, as the processed volume approaches the 7,000,000-message target, the processing rate drops to 5,500 messages/second. Even at this extreme, the queueing model still achieves a throughput of about 19,800,000 messages/hour (that is, 158,400,000 messages per business day). The granularity of the results did not permit a more accurate measurement of the degradation effect than the linear approximation shown here. Finer-grained observation suggests that the rate of degradation follows a curve rather than a straight line, implying that as volume grows further the degradation effect diminishes (possibly a characteristic of the paged B-tree structures used by modern RDBMSs).

The 7,000,000 messages processed during the test represent 1,000,000 transactions processed in a very short period of time. It is worth noting that many existing transaction engines in the financial services industry cannot reach this level, even though they run on technology with nominally higher performance ratings than the Windows- and Intel-based computers used in this test.

Buzen and Denning Results

To determine the performance of individual components, the Buzen and Denning metrics must be computed. Monitoring processes measured the length of each processing queue in the queueing network model, and the performance figures were calculated from these measurements. An example of the calculated results is shown in Table 6.

Table 6. Sample calculation of Buzen and Denning results

This example covers about 398,573 messages processed through the queueing model (taken from one of the two processing nodes). The host computer supporting seven software components (one NOE component and two each for the trade allocation, net proceeds, and settlement modules) was monitored at fixed intervals; its utilization (U), as determined from the Windows performance monitoring tool, averaged 93%. For the throughput calculation, it is important to remember how many of each message type a standard transaction contains (one NOE and two each of the other types).
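
The calculation itself is mechanical. A minimal C sketch follows, using illustrative counter values chosen to be consistent with the utilization, throughput, and latency quoted here (they are not the actual Table 6 figures):

    #include <stdio.h>

    /* Counters a monitoring process accumulates over observation period T:
       arrivals, completions, server busy time, and the time-integral of
       queue length (for the mean population N). */
    struct observation {
        double T;      /* observation period, seconds        */
        double A;      /* messages arrived                   */
        double C;      /* messages completed                 */
        double B;      /* server busy time, seconds          */
        double area;   /* integral of queue length over time */
    };

    int main(void)
    {
        struct observation o = { 343.0, 398573.0, 398573.0,
                                 319.0, 5967000.0 };

        double lambda = o.A / o.T;   /* arrival rate                  */
        double X = o.C / o.T;        /* throughput (completion rate)  */
        double U = o.B / o.T;        /* utilization (~0.93)           */
        double S = o.B / o.C;        /* mean service time per message */
        double N = o.area / o.T;     /* mean number in system         */
        double R = N / X;            /* Little's law: N = X * R       */

        printf("lambda=%.0f/s X=%.0f/s U=%.2f S=%.6fs N=%.0f R=%.2fs\n",
               lambda, X, U, S, N, R);
        return 0;
    }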

The processing network therefore achieved an average throughput of about 1,162 messages per time unit during the test, with a latency of about 14.97 seconds. Latency here means the time difference between a message entering processing and the corresponding output: at the measured processing level, a message entering the network appears at the output roughly 14.97 seconds later.
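
By Little's law, these two figures together imply an average in-flight population of roughly N = X · R ≈ 1,162 messages/s × 14.97 s ≈ 17,400 messages resident in the queueing network at any instant (an inference from the quoted numbers rather than a figure reported in the original results).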

Figure 15. Measured performance curves

The sampling rate was set to a 7-second interval (because an uncalibrated exit interval was used), and the sample data set was 500,000 messages (two message drivers, each injecting 250,000 messages from the standard test data set). In this example there were two processing engines (host computers), and the results shown are taken from one of the two (note that the results for the allocation, net proceeds, and settlement components are aggregated in pairs, because the raw data is determined from the message queues feeding each pair of components). The slightly higher service time (S) for NOE messages indicates that this component performs more processing, owing to the persistence of audit trails and the validation processing.

Figure 15 displays the performance curves of each processing component, with the overall performance points marked for clarity. This is essentially the generalized Little's law curve for the STE processing engine. This general view does not, however, provide all the detail needed to predict the engine's operational performance accurately. Individual components clearly have different completion rates, and therefore different performance values relative to the standard view of expected performance. The result is a performance operating region (POR) for the network, shown as the shaded area in Figure 16. In this particular instance the results are closely grouped, so the corresponding performance operating region is very narrow.

This is not always the case, however; in some systems the POR spans more than four times the average performance level. Predicting the POR requires mathematics beyond the scope of this benchmark exercise, but its influence is noted here because it explains the variation in results measured across repeated tests.

Figure 16. The measured Little's law curve

Evaluation and Conclusion

The impressive results of the benchmark speak for themselves: overall performance, scalability, and flexibility were all well established. The throughput of the engine as a whole undoubtedly makes it one of the largest transaction engines in the financial services industry's post-trade infrastructure. The target and stretch performance levels were comfortably exceeded, and there are strong indications that the overall architectural approach can support even greater message throughput. It is certainly worth noting that, on the evidence of the performance levels obtained, current Microsoft technology can operate at the enterprise tier of any financial institution. Some aspects of enterprise-tier operation (such as resilience and reliability) were not tested in this benchmark and remain to be proven for this design; however, network- or grid-based architectures of this kind have inherent characteristics that can support extremely high levels of resilience and reliability. The combination of efficient grid-based processing computers and low-cost software technology therefore appears very attractive.

Low-Cost, Efficient Implementation

One of the more important results of the benchmarking process is that Microsoft technology has now been shown to be usable and scalable at the enterprise level. The processing rates obtained with the queueing architecture clearly establish Microsoft operating systems as highly capable and scalable in the network computing field. The second most important result is that the cost of implementing the system in a Microsoft environment is comparatively low.

Potential Improvement Areas

In refining the benchmark results, we used, in addition to the monitoring tools already described, process monitoring tools from Microsoft (planned for inclusion in Visual Studio 2005) that give a detailed view of the execution of software elements. Microsoft's analysis tools showed that the software components spend on average 30% to 35% of their time executing functions related to extracting data from XML messages. This is not an especially surprising result, since the main function of the software components is to validate and process string data. To access the required information, we used the C/C++ strstr function, treating the entire message as one long string. (Note: for our problem domain this proved faster than using an XML parser and DOM objects with XSLT.)

Although strstr provides an adequate level of performance in ordinary settings, there are more efficient techniques for extracting information from string-based message structures. R. S. Boyer and J. S. Moore [ByMr77] describe a very efficient mechanism for searching well-structured strings. The algorithm works well when the string structure is known in advance and the application must search large bodies of text (as in a digital library application, an editor, or a word processor).

At first sight the algorithm seems unsuited to the queueing network, because the structure of the next message to arrive within the network cannot be known. For the processing components, however, the message structure is known, because messages are routed by type for further processing, so the Boyer-Moore algorithm could be used to improve the existing implementation. That said, the XML messages may simply be too small (about 1,500 bytes per message on average) for Boyer-Moore to yield gains that justify the work of implementing it.
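
For completeness, here is a minimal C sketch of the Boyer-Moore family of searches, using the simpler Horspool variant rather than the exact algorithm of [ByMr77]. The skip table lets longer patterns jump further per mismatch, which is why the technique pays off on large texts but less so on ~1,500-byte XML messages:

    #include <stdio.h>
    #include <string.h>

    /* Boyer-Moore-Horspool search: returns a pointer to the first match
       of pat (length m) in text (length n), or NULL if none. */
    static const char *bmh_search(const char *text, size_t n,
                                  const char *pat, size_t m)
    {
        size_t skip[256];
        if (m == 0 || m > n) return NULL;

        for (size_t i = 0; i < 256; i++)
            skip[i] = m;                 /* default shift: whole pattern */
        for (size_t i = 0; i + 1 < m; i++)
            skip[(unsigned char)pat[i]] = m - 1 - i;

        size_t pos = 0;
        while (pos + m <= n) {
            size_t j = m;
            while (j > 0 && text[pos + j - 1] == pat[j - 1])
                j--;                     /* compare right to left */
            if (j == 0) return text + pos;
            pos += skip[(unsigned char)text[pos + m - 1]];
        }
        return NULL;
    }

    int main(void)
    {
        const char *msg =
            "<trade><ccy>SGD</ccy><isin>SG1A55000</isin></trade>";
        const char *hit = bmh_search(msg, strlen(msg), "<isin>", 6);
        printf("%s\n", hit ? hit : "not found");
        return 0;
    }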

Itanium and 64-Bit Processing

The queueing network model was also run on the (at the time) new Itanium 64-bit hardware, with test versions of Windows and SQL Server. Although this environment cannot be accepted as a production benchmark environment (because test software was used), the results do indicate the performance gains that can be expected from Intel/Microsoft technology in the future. On this hardware, the measured throughput averaged 872 messages/second, which we consider very high given the test conditions. First, the result was obtained using standard SCSI disk units rather than the RAID array used in the primary benchmark exercise, and standard SCSI performs far below RAID. Second, the Itanium database server had only two processors installed, whereas the database engines in the main benchmark used eight processors in total. We look forward to the opportunity to perform a complete benchmark in a 64-bit environment.

C# and Managed Code

We also produced software components for the managed code environment in Microsoft C#, allowing a direct comparison between the C/C++ and C# versions. As a simple test, we compared the operation of the message driver across the two models. The process involved is simple enough that the effects of inefficient coding can be ignored (very few lines of code do the actual work). The process reads the prepared message file and streams the data through a processing loop; processing continues until a message separator is encountered.

Each message read is then wrapped in the standard GSTPA header, a digital signature is applied to the message block, and the message is written to the message queue for processing by the queueing network model. This continues until the entire file has been read. The parameter of interest is the throughput of reading messages from the data files and enqueueing them. The test results are shown in Figure 17; they clearly indicate the performance advantage of C/C++ over the managed code environment (.NET Framework 1.1).

It should likewise be borne in mind that the results also reflect the efficiency of the interoperability layer between C# and C/C++, which must be crossed when accessing MSMQ. At first glance, some might argue that, from a performance perspective, a managed code implementation could never be justified in place of a C/C++ one. That view can be misleading, however, because every system solution is a compromise between cost, performance, and reliability.

The overall performance results for the managed code environment come from tests of the (simple) message driver component, which achieved approximately 2,000 messages/second. Although this throughput is about 25% of the baseline C/C++ level, there are compensating factors to consider: the development productivity of C# is much higher than that of C/C++, and the speed with which working C# code could be developed and put into operation was genuinely impressive.

Note that even the lower throughput of 2,000 messages/second (equivalent to 7,200,000 messages/hour, or 57,600,000 messages per eight-hour day) is still within the range of a high-end transaction engine benchmark, and it will only improve as managed code gets faster.

Figure 17. Comparison of managed and unmanaged code

Care is also needed when comparing managed and unmanaged code environments. A virtual machine environment (such as the Common Language Runtime (CLR) used in the .NET Framework, or even a Java/J2EE-based VM) can produce benchmark figures equivalent to C/C++ for purely memory-based operations. Unfortunately, such benchmarks can give a misleading impression of the overall performance to be expected, since most applications involve local and remote I/O operations as well as dynamic object creation and deletion.

By contrast, the ease of implementation and the improvements in reliability and manageability offered by a managed code environment make it superior to C/C++ for applications that can trade performance for lower cost and faster delivery. Given that the STE system's unmanaged code implementation leaves us with significant "performance to burn" (see the earlier discussion of this topic), and given how well the architecture scales up, we are in fact in the very favorable position of being able to trade some of this surplus performance for the benefits of using managed code in future implementations of this general architecture.
