Disco is designed to integrate easily into larger applications, such as web services, so that computation-heavy tasks can be delegated to a cluster that is independent of the core application. Disco provides a very compact Python API (a typical job needs only two functions), as well as a RESTful web API for job control and an easy-to-use web interface for monitoring status. In addition, Disco exposes a simple worker protocol, so jobs can be written in any language that implements the protocol.
Disco works well on clusters of commodity Linux servers. New nodes can be added to the system dynamically with a click in the web interface. If a server crashes, the failed tasks are automatically reassigned to other servers, so work is not interrupted. Combined with an automatic provisioning mechanism, such as automated installation, even a large cluster can be maintained with only a small amount of manual work. As a proof of concept, the Nokia Research Center in Palo Alto maintains an 800-core cluster running Disco with this kind of setup.
All content on this blog is original; if you reproduce it, please cite the source: http://blog.csdn.net/myhaspl/
• Proven to scale to hundreds of CPUs and thousands of simultaneous tasks
• Used to process data sets of tens of thousands of terabytes
• Easy to use: a typical job consists of two functions written in Python and two calls to the Disco API
• Jobs can be written in any other language by implementing the Disco worker protocol
• Input data can be in any format, even binary. Data can be read from any source reachable over HTTP, or distributed to local disks
• Fault tolerant: server crashes do not interrupt jobs, and failed tasks are reassigned automatically
• Flexible: in addition to the core map and reduce functions, the user can supply a combiner function, a partition function and an input reader
• Integrates easily with larger applications through the standard Disco module and the web APIs
• Comes with a built-in distributed storage system, the Disco Distributed Filesystem (DDFS)
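To make the "two functions" point concrete, here is a minimal word-count sketch in plain Python. The map and reduce functions follow the general shape of a Disco job (each takes its input plus a params argument and yields key-value pairs), but the names fun_map and fun_reduce and the run_locally driver are hypothetical: the driver runs everything in a single process as a stand-in for the cluster and is not part of the Disco API.

```python
from itertools import groupby
from operator import itemgetter


def fun_map(line, params):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word, 1


def fun_reduce(pairs, params):
    """Reduce: sum the counts of each word; pairs is an iterable of (word, count)."""
    # Sorting brings equal words together so groupby can collect them.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_locally(lines, map_fn, reduce_fn, params=None):
    """Hypothetical single-process driver standing in for the cluster."""
    mapped = (pair for line in lines for pair in map_fn(line, params))
    return dict(reduce_fn(mapped, params))


if __name__ == "__main__":
    print(run_locally(["a b a", "b c"], fun_map, fun_reduce))
```

On a real cluster the map calls would run in parallel on different nodes, and the reduce step would receive the merged intermediate pairs; only the two functions would need to be written by the user.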
Data input:
Disco performs distributed computation, so the input data must be divisible into pieces. In general, the data is loaded into DDFS, a distributed file system similar to HDFS that handles the distribution and replication of data.
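The idea of dividing and replicating data can be illustrated with a toy sketch. This is not how DDFS is implemented: the function names, the fixed chunk size and the round-robin placement policy are all assumptions chosen only to show what "splitting input into chunks and keeping several copies on different nodes" means.

```python
def split_into_chunks(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def place_replicas(num_chunks, nodes, replicas=2):
    """Assign each chunk to `replicas` distinct nodes using a toy round-robin policy."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        chunk_id: [nodes[(chunk_id + r) % len(nodes)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


if __name__ == "__main__":
    chunks = split_into_chunks(list("abcde"), 2)
    print(chunks)
    print(place_replicas(len(chunks), ["node1", "node2", "node3"]))
```

Because every chunk lives on more than one node, a map task for that chunk can still be scheduled if one of the nodes fails, which is what makes automatic task reassignment possible.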
The Road to Mathematics - Distributed Computing - Disco (2)