Distributed multi-Computer scheduling platform

Last Update: 2015-04-15 Source: Internet

Author: User

Background:

1) At present, our program analyzes 100 GB of XML data per day on a single computer. The selected data must be stored in a database (SQL Server 2008 R2, 64-bit) that holds nearly 100 million records, and a machine with 128 GB of memory and 32 cores can barely finish the job;

2) As the market expands, we now receive about 1 TB of XML data per day, and processing time on a single computer has become the bottleneck: one run can take 10 days or more.

Solution:

To make our product more competitive and attract more users, the project team discussed the following options:

Scenario 1) Use Hadoop for this big data processing. However, the company's understanding of Hadoop is still limited, and putting it into production would take time, so for now Hadoop remains only an exploratory option. It is not that we will never do it; the problem is time.

Scenario 2) Further expand our current platform. How should we scale it?

2.1) Run our tool on multiple computers and split the task across them. Assuming 1 TB of data and 10 computers, each computer handles an average of 100 GB, which reduces the load on both the database and each machine. Scaling out is essential, although we cannot bring it online immediately;

2.2) Once the tool is scaled out, the storage database must scale out as well. Ideally, each computer has its own storage database, which helps both the business implementation and the distribution of database load.

2.3) Once the database is scaled out, the application side must decide how to merge the data. Our plan: while the server on each compute node inserts data into its own database, it also inserts the fields the business needs into a summary database. Detailed records are saved only in the database attached to that compute node. The application queries the summary database directly; to view the details of a particular record, it reads from the summary record which node database holds those details, and then retrieves them from that database.
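The plan in 2.1 to 2.3 can be sketched in a few functions. This is a minimal illustration, not the team's implementation: it assumes Python's built-in sqlite3 in-memory databases as stand-ins for the per-node and summary SQL Server databases, and the table layouts and function names (partition_files, ingest, get_details) are hypothetical. For splitting the input, it swaps in a common greedy rule (largest file first, always to the least-loaded node), since real XML files rarely divide into exactly 100 GB per machine.

```python
import heapq
import sqlite3


def partition_files(file_sizes, num_nodes):
    """Greedy split (hypothetical): largest files first, each to the least-loaded node.

    file_sizes: dict of file name -> size; returns node_id -> [file names].
    """
    heap = [(0, node) for node in range(num_nodes)]  # (size assigned so far, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + size, node))
    return assignment


def make_node_db():
    # Hypothetical detail table kept in each compute node's own database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE detail (record_id TEXT PRIMARY KEY, payload TEXT)")
    return db


def make_summary_db():
    # Hypothetical summary table; node_id records where the full details live.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE summary (record_id TEXT PRIMARY KEY, title TEXT, node_id INTEGER)"
    )
    return db


def ingest(record_id, title, payload, node_id, node_dbs, summary_db):
    # Full details go only to the database of the node that parsed the record...
    node_dbs[node_id].execute("INSERT INTO detail VALUES (?, ?)", (record_id, payload))
    # ...while the business fields plus the node id go to the shared summary database.
    summary_db.execute(
        "INSERT INTO summary VALUES (?, ?, ?)", (record_id, title, node_id)
    )


def get_details(record_id, node_dbs, summary_db):
    # Step 1: look up in the summary database which node holds the record.
    row = summary_db.execute(
        "SELECT node_id FROM summary WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is None:
        return None
    # Step 2: fetch the details from that node's own database.
    detail = node_dbs[row[0]].execute(
        "SELECT payload FROM detail WHERE record_id = ?", (record_id,)
    ).fetchone()
    return detail[0] if detail else None


if __name__ == "__main__":
    print(partition_files({"a.xml": 40, "b.xml": 30, "c.xml": 20, "d.xml": 10}, 2))
    # {0: ['a.xml', 'd.xml'], 1: ['b.xml', 'c.xml']}
    node_dbs = [make_node_db() for _ in range(2)]
    summary_db = make_summary_db()
    ingest("r1", "Order r1", "<order id='r1'>...</order>", 1, node_dbs, summary_db)
    print(get_details("r1", node_dbs, summary_db))  # <order id='r1'>...</order>
```

Storing the node id in the summary row is what lets the application find details with one extra query instead of searching every node database.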

Decision:

Scenario 2 has passed evaluation. How do we implement it? How difficult is it? What are the technical problems?

Setting the other problems aside, let us start with the technical difficulties. Because execution is distributed across multiple computers, there must be a scheduler, and everyone knows what makes a scheduler hard: heartbeat monitoring of task execution status, communication stability, efficiency, accuracy, and how to design the message queue.
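The heartbeat problem can be made concrete with a small, single-process sketch. It assumes a pull model in which each worker asks the scheduler for work and reports heartbeats; a worker that stays silent longer than a timeout is treated as dead and its running tasks go back to the queue. The class and method names and the timeout policy are hypothetical, time is passed in explicitly so the logic is easy to test, and a real deployment would still need a network transport and a durable message queue, which this omits.

```python
import collections


class HeartbeatScheduler:
    """Hypothetical in-process scheduler: tracks heartbeats, requeues tasks of silent workers."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = collections.deque()  # tasks waiting for a worker
        self.running = {}                   # task_id -> worker_id
        self.last_seen = {}                 # worker_id -> time of last heartbeat
        self.done = set()

    def submit(self, task_id):
        self.pending.append(task_id)

    def heartbeat(self, worker_id, now):
        self.last_seen[worker_id] = now

    def assign(self, worker_id, now):
        """Give the next pending task to a polling worker (polling counts as a heartbeat)."""
        self.heartbeat(worker_id, now)
        if not self.pending:
            return None
        task_id = self.pending.popleft()
        self.running[task_id] = worker_id
        return task_id

    def complete(self, worker_id, task_id, now):
        """Accept a result only from the worker that currently owns the task."""
        self.heartbeat(worker_id, now)
        if self.running.get(task_id) == worker_id:
            del self.running[task_id]
            self.done.add(task_id)

    def reap(self, now):
        """Declare workers silent for longer than the timeout dead; requeue their tasks."""
        dead = {w for w, t in self.last_seen.items() if now - t > self.timeout}
        for task_id, worker_id in list(self.running.items()):
            if worker_id in dead:
                del self.running[task_id]
                self.pending.append(task_id)
        for w in dead:
            del self.last_seen[w]
        return dead
```

Because a merely slow worker can be declared dead while it is still running, a task may be executed twice; writes to the node and summary databases should therefore be idempotent, for example keyed on a record ID.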

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.