The tremendous scalability of intelligent service contracts


It was a sunny day in June 2005. The new order system we had spent two years building was finally live in production, and we were thrilled. Our partners started sending orders, and the monitoring system told us everything was working properly. An hour later, our COO emailed our strategic partners to tell them they could start sending orders to the new system. Five minutes later a server went down, and within a minute two more were paralyzed. Customers started calling, and we understood that we would not be seeing the sun for a while.

The system, which was meant to increase the profitability of orders from our strategic partners, had collapsed. The COO had to email those strategic partners again, this time to tell them to go back to the old system. The odd thing was that even though we had a backup server, a small number of orders from our strategic partners had overwhelmed a single server. A system that handled a large number of general partners just fine could not cope with a handful of strategic ones.

This is the story of the mistakes we made, how we corrected them, and what we finally got right.

"Best Practices" is far from enough

We had designed the system following best-practice documentation from many vendors: stateless request-processing logic, layered architecture, tiered deployment, separate OLTP and OLAP servers. Yet nobody ever told us that the system would face different kinds of scalability problems. In 2003 we designed the key performance-critical components; in 2004 the system passed load and stress testing. So we were confident we had covered every angle.

By sifting through server logs and the events captured by the monitoring system, we found that orders from strategic partners are significantly different from those of general partners. A general partner orders hundreds of items at a time; a strategic partner sends orders with thousands of lines each, and a single request could reach hundreds of megabytes. Our messaging infrastructure and object/relational mapping code had never been tested under that kind of load. Deserializing all of that XML pushed the server cores to unprecedented limits: processing a single request could consume half a gigabyte of memory. Database locks were held for minutes instead of milliseconds. While threads sat blocked on those locks, the garbage collector started reclaiming memory furiously, which hurt the availability of the system even more.
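The memory blow-up is easy to reproduce with any parser that builds the whole document tree before handing it over. The snippet below is a minimal illustration in Python, not the stack we actually ran, contrasting whole-document parsing with a streaming parse that keeps memory roughly flat regardless of message size:

```python
import xml.etree.ElementTree as ET

# Naive approach: materialize the entire document before handling any line.
# For an order of hundreds of megabytes, the parsed tree plus the mapped
# objects easily multiplies the raw payload size in memory.
def load_order_all_at_once(path):
    tree = ET.parse(path)                      # whole document held in memory
    return [line.attrib for line in tree.iter("OrderLine")]

# Streaming alternative: handle one <OrderLine> at a time and discard it,
# so memory stays roughly constant no matter how large the message is.
def stream_order_lines(path):
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "OrderLine":
            yield dict(elem.attrib)
            elem.clear()                       # free the processed element
```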

The first thing we did was recreate the real-world scenario in the performance test lab. Test after test, the system collapsed again and again, and we could not believe it. I kept telling myself: "We did everything the book said. How could this happen?"

As a matter of fact, this was the first company in my career that really gave us the time and the budget to do everything by the book. We had no excuses. But what do you do when the book is not enough to solve the problem?

Different types of scalability

In the end, it turned out that the number of requests per second is only one aspect of scalability. We learned that the painful way. Other aspects include:

The size of the message

CPU utilization per request

Memory utilization per request

IO (and network) utilization per request

Total processing time for each request

Message size appears to have a big impact on every other aspect. As messages grow, they take more CPU time to deserialize, consume more memory to hold the resulting data, and demand more network bandwidth and IO for database reads and writes; all of this adds up in the total processing time. And even small requests, such as applying a discount to all of a partner's pending orders, are affected by the amount of data they touch.

We looked at every option, and none of them solved the problem. Unless we made the big messages smaller, the problem would always be there. Here is a fragment of our conversation:

Dan: "Binary serialization may be useful for a smaller number of strategic partners." ”

Barry: "No, there are five incompatible platforms between them." ”

Sasha: "And that doesn't help with memory and IO much." ”

Me: "Try the compression how?" That will reduce the load on the bottom of the message. ”

Dan: "That would make the CPU more burdensome." ”

Sasha: "Would you like me to repeat the memory and io?" ”

Barry: The request/response doesn't seem to work here. ”

Me: "You know how much I like to publish/subscribe, but I don't see how it can be used here." ”

But as we dug into the heart of the messaging patterns, we stumbled upon the solution.

The real world is message-oriented.

What surprised us most was that the solution worked for both general partners and strategic partners, and significantly improved performance for both. Not only that, it also shortened order turnaround time and thereby improved inventory management. And it was not even our own idea.

In fact, the solution is fairly straightforward. Instead of the previous single "create order" message, a partner can send us multiple "order" messages over time, keyed by (partner ID, purchase order number). Once all the entries for that purchase order have been sent, they send one more "order" message with its "complete" flag set to true. This is a stateful interaction.
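To make that contract concrete, here is a minimal sketch of what such a message might look like; the field names are illustrative, not the schema we actually shipped:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OrderLine:
    """One line of a purchase order: a SKU with its options and quantity."""
    sku: str
    quantity: int
    options: Dict[str, str] = field(default_factory=dict)

@dataclass
class OrderMessage:
    """An incremental 'order' message, keyed by (partner ID, purchase order number)."""
    partner_id: str
    purchase_order_number: str
    lines: List[OrderLine] = field(default_factory=list)
    complete: bool = False   # True only on the final message for this purchase order
```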

You see, partners almost always have a purchasing department that issues orders. Those orders grow incrementally over time until they are finally "done" and sent to us. Our solution let a partner's procurement system send us partial, not-yet-complete order information while the order was still being assembled. They could modify lines they had already sent, or cancel part of the order, without ever needing to know the order number in our system (which is managed by an existing ERP). In fact, we do not call the ERP to process an order until we receive the message indicating that the order is complete.
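A rough sketch of the receiving side, reusing the OrderMessage above: pending_orders stands in for durable storage, and submit_to_erp is a hypothetical callback rather than our real ERP integration.

```python
from typing import Callable, Dict, List, Tuple

# Accumulated partial orders, keyed by (partner ID, purchase order number).
# A real system would keep this in durable storage.
pending_orders: Dict[Tuple[str, str], Dict[str, List[OrderLine]]] = {}

def handle_order_message(msg: OrderMessage, submit_to_erp: Callable) -> None:
    key = (msg.partner_id, msg.purchase_order_number)
    order = pending_orders.setdefault(key, {})

    # A message that touches a SKU carries all current lines for that SKU,
    # so the incoming lines simply replace whatever was stored for it.
    by_sku: Dict[str, List[OrderLine]] = {}
    for line in msg.lines:
        by_sku.setdefault(line.sku, []).append(line)
    order.update(by_sku)

    # The ERP, which owns the real order number, is only involved once the
    # partner marks the purchase order as complete.
    if msg.complete:
        all_lines = [line for lines in order.values() for line in lines]
        submit_to_erp(msg.partner_id, msg.purchase_order_number, all_lines)
        pending_orders.pop(key, None)
```

Cancelling part of an order could be modeled the same way, for example by resending a SKU's lines with zero quantity; that detail is an assumption on my part, not something the original contract spells out.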

When we receive an "order" message, we return an "order status changed" message. If the partner's system does not receive a response within whatever period it considers reasonable, it can simply send the message again. In other words, we needed to make the messages idempotent. This means that if a partner wants to change anything about a product SKU (stock keeping unit), it must resend all the lines for that SKU (with its various options and configurations), which is not much data in practice.

An idempotent message is one that, no matter how many times the system processes it, has the same effect as if it had been processed exactly once.
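As a rough illustration of why this matters, here is what the partner side's retry loop might look like; send_order_message and wait_for_status_change are hypothetical stand-ins for whatever transport the partner actually uses.

```python
import time
from typing import Callable

# Partner-side sketch of "resend until acknowledged".
def send_with_retry(msg: OrderMessage,
                    send_order_message: Callable[[OrderMessage], None],
                    wait_for_status_change: Callable[[str, str, float], bool],
                    timeout_seconds: float = 30.0,
                    max_attempts: int = 5) -> bool:
    for attempt in range(max_attempts):
        send_order_message(msg)
        # A duplicate caused by a lost "order status changed" reply is harmless,
        # because processing the same message twice leaves the order unchanged.
        if wait_for_status_change(msg.partner_id, msg.purchase_order_number,
                                  timeout_seconds):
            return True
        time.sleep(2 ** attempt)   # simple backoff before resending
    return False
```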

This had a tremendous impact on performance: we no longer needed to persist messages just to avoid losing them. Instead of constantly writing large messages to disk, our application protocol lets our partners' systems manage the interaction state, at the cost of only a slight increase in complexity on their side.
