C++ Distributed Computing Class Library

Source: Internet
Author: User

Distributed computing has a reputation for being high-end work, and I suspect most programmers would like to try their hand at it.

A while ago, one of my projects ran into a large-data computation problem: a typical run took 2 to 3 hours, and sometimes a whole day. Could several machines share the work and cut the computation time? The mainstream distributed computing frameworks today include Hadoop, Google's MapReduce, and a few others. But these are heavyweight, and adopting them would mean modifying our existing program code.

That made me think of ZeroMQ. To quote the official description: "ZMQ (ZeroMQ, hereinafter ZMQ) is a simple, easy-to-use transport layer: a socket library that works like a framework and makes socket programming simpler, more concise, and faster. It is a message queue library that scales elastically across multiple threads, cores, and hosts. ZMQ's stated goal is to 'become part of the standard network protocol stack and then enter the Linux kernel'. That has not happened yet. Still, it is undoubtedly a very promising layer of encapsulation over 'traditional' BSD sockets, and one that people badly need. ZMQ makes writing high-performance network applications simple and fun."

In practice, ZeroMQ turned out to work well.

My approach is as follows:

1. Use ZeroMQ as the underlying communication layer. C++ objects are the basic unit of network transmission, and a C++ object reflection mechanism is used to reconstruct objects on the receiving side.

2. Each task is an object. Task decomposition (map) and task reduction (reduce) are left to the caller, because only the caller knows the specific algorithm and data of the task. A task contains its data, its algorithm, and its result (pending calculation).

3. After decomposing the work into tasks, the caller only needs to call Domultitask(TaskList, waitTime). The rest is waiting for the calculations to complete and for the call to return.

4. When the class library receives the tasks, it sends them to the primary server, which uses a load-balancing (least recently used) algorithm to hand each task to a registered worker. When a worker finishes, it returns the result to the primary server, which returns it to the client, and the result is written into the task's result variable.

5. When all calculations are complete, the call returns and the client performs the reduction.
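
The steps above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the library's actual code: the `Task` interface, `SumTask`, `mapSum`, and `reduceSum` are hypothetical names, and `doMultiTask` here is a local stand-in that runs the tasks in-process, one after another, instead of sending them over ZeroMQ to remote workers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical task interface: a task carries its data, its algorithm
// (run) and a slot for the result.
struct Task {
    virtual ~Task() = default;
    virtual void run() = 0;
};

// Example task: sum one slice of a larger array.
struct SumTask : Task {
    std::vector<int> data;  // input slice
    long long result = 0;   // filled in by run()
    explicit SumTask(std::vector<int> d) : data(std::move(d)) {}
    void run() override {
        result = std::accumulate(data.begin(), data.end(), 0LL);
    }
};

using TaskList = std::vector<std::shared_ptr<Task>>;

// Map step (done by the caller): split the input into at most `parts` tasks.
std::vector<std::shared_ptr<SumTask>> mapSum(const std::vector<int>& input,
                                             std::size_t parts) {
    std::vector<std::shared_ptr<SumTask>> out;
    if (parts == 0) return out;
    const std::size_t chunk = (input.size() + parts - 1) / parts;
    for (std::size_t i = 0; i < input.size(); i += chunk) {
        const std::size_t end = std::min(input.size(), i + chunk);
        out.push_back(std::make_shared<SumTask>(std::vector<int>(
            input.begin() + static_cast<std::ptrdiff_t>(i),
            input.begin() + static_cast<std::ptrdiff_t>(end))));
    }
    return out;
}

// Stand-in for the library call (assumption: the real one dispatches to
// remote workers). Runs tasks locally in order and returns false if the
// deadline passes before every task has been started.
bool doMultiTask(const TaskList& tasks, std::chrono::milliseconds waitTime) {
    const auto deadline = std::chrono::steady_clock::now() + waitTime;
    for (const auto& task : tasks) {
        assert(task != nullptr);
        if (std::chrono::steady_clock::now() > deadline) return false;
        task->run();
    }
    return true;
}

// Reduce step (done by the caller): combine the partial results.
long long reduceSum(const std::vector<std::shared_ptr<SumTask>>& parts) {
    long long total = 0;
    for (const auto& p : parts) total += p->result;
    return total;
}
```

A caller would build the slices with `mapSum`, copy the pointers into a `TaskList`, call `doMultiTask`, and then read the partial results back with `reduceSum`. Because the tasks are shared pointers, the results written by `run` are visible to the caller without any extra copying.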

Within this design, the distributed framework itself handles worker registration and management, server-side routing, sending client tasks and receiving their results, type reflection, and so on.
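
As a rough illustration of the worker management and least-recently-used dispatch described above, the server can keep idle workers in a queue and always hand the next task to the worker at the front, which is the one that has been idle longest. The class below is a minimal sketch, assuming workers are identified by a string ID; `WorkerPool` and its method names are hypothetical and do not come from the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical registry of idle workers, ordered from least recently
// used (front) to most recently used (back).
class WorkerPool {
public:
    // Called when a worker registers, or reports that it finished a task.
    void markReady(const std::string& id) {
        assert(!id.empty());
        if (std::find(ready_.begin(), ready_.end(), id) == ready_.end())
            ready_.push_back(id);
    }

    // Called when a worker disconnects.
    void remove(const std::string& id) {
        ready_.erase(std::remove(ready_.begin(), ready_.end(), id),
                     ready_.end());
    }

    // Picks the least recently used idle worker. Returns false when no
    // worker is idle, in which case the task has to wait.
    bool acquire(std::string& out) {
        if (ready_.empty()) return false;
        out = ready_.front();
        ready_.pop_front();
        return true;
    }

    std::size_t idleCount() const { return ready_.size(); }

private:
    std::deque<std::string> ready_;
};
```

A worker that finishes a task goes to the back of the queue, so work spreads evenly across the registered machines without the server tracking load explicitly.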

The advantage is that existing program code does not need to change; you only add task classes.

When several people run the same software, anyone who is willing can opt in as a distributed computing node: their machine registers with the server and becomes available for use, and the program runs a thread that waits for tasks.

PS: One problem remains to be solved. The design assumes that the client and the worker have the same class library, so that objects can be reflected successfully. If a task (object) is sent to a worker that has no reflection information for that type, the worker can at most reconstruct the data; it cannot dynamically produce the task's algorithm. We can hardly send a piece of C++ source code and have the receiving machine interpret it. The solution I came up with: if reflection fails, the worker asks the client to send the class library (DLL), loads it, and then reflects the type.
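
To make the reflection problem concrete, here is a minimal sketch of name-based type reflection, assuming each task type is registered under a string name and serializes itself as text. All names (`SerializableTask`, `TypeRegistry`, `encode`, `decode`, `SquareTask`) are hypothetical, and the wire format is invented for illustration. When `decode` returns a null pointer, the receiver does not know the type; that is the point at which a worker would have to request the class library from the client.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class for tasks that can travel over the network.
struct SerializableTask {
    virtual ~SerializableTask() = default;
    virtual std::string typeName() const = 0;
    virtual void save(std::ostream& os) const = 0;
    virtual void load(std::istream& is) = 0;
    virtual void run() = 0;
};

// Maps a type name to a factory, which is the "reflection information".
class TypeRegistry {
public:
    template <class T>
    void add(const std::string& name) {
        assert(!name.empty());
        factories_[name] = [] {
            return std::unique_ptr<SerializableTask>(new T());
        };
    }
    // Returns nullptr when the type is unknown on this machine.
    std::unique_ptr<SerializableTask> create(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string,
             std::function<std::unique_ptr<SerializableTask>()>> factories_;
};

// Invented wire format: type name on the first line, then the payload.
std::string encode(const SerializableTask& task) {
    std::ostringstream os;
    os << task.typeName() << '\n';
    task.save(os);
    return os.str();
}

std::unique_ptr<SerializableTask> decode(const TypeRegistry& registry,
                                         const std::string& wire) {
    std::istringstream is(wire);
    std::string name;
    std::getline(is, name);
    auto task = registry.create(name);
    if (task) task->load(is);
    return task;  // nullptr: reflection failed, the type is not registered
}

// Example task: squares a number.
struct SquareTask : SerializableTask {
    int x = 0;
    long long result = 0;
    std::string typeName() const override { return "SquareTask"; }
    void save(std::ostream& os) const override { os << x << ' ' << result; }
    void load(std::istream& is) override { is >> x >> result; }
    void run() override { result = static_cast<long long>(x) * x; }
};
```

The data always survives the trip as bytes, but the algorithm lives in compiled code, so only a machine that has `SquareTask` registered can call `run` on the decoded object.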

Source code: https://github.com/dario-DI/DistributedCompute
