Algorithm is King: How a Mac Mini Outperforms a 1,636-node Hadoop Cluster
Source: Internet
Author: User
Keywords: algorithm, shard, disk
That a small Mac mini can out-compute a 1,636-node Hadoop cluster sounds, for some use cases, more like a tale from the Arabian Nights, but GraphChi recently claimed to have done exactly that. Before looking at this feat, we need to understand GraphChi, which comes out of the GraphLab project.
GraphChi: a graph computing framework aimed at small machines
Designed by computer scientists at Carnegie Mellon University, GraphChi can efficiently run large-scale graph computations on a personal computer for social-media or web-search analytics tasks such as recommendation engines. Recommendation engines are built around graph computation, analyzing the relationships between social-media users, and such calculations normally require a huge amount of memory, usually supplied by a cluster of computers.
Instead of keeping the graph in memory, GraphChi stores it on the large hard disk of a personal computer. According to lab director Carlos Guestrin, to make up for the speed gap between the hard disk and memory, the team designed a layout that reduces random reads and writes to the disk. GraphChi can also handle streaming graphs, building accurate large-scale network models that capture how relationships change over time.
The showdown: a Mac mini versus a 1,636-node Hadoop cluster
On the same 1.5-billion-edge Twitter graph (from 2010), GraphChi finished a triangle-counting job in 1 hour that took 1,636 Hadoop nodes 7 hours. Rangespan data scientist Christian Prokopp recently explained the principle behind this feat: extreme optimization of the algorithm, plus the advantages a single machine holds over a cluster setup.
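To make the workload concrete, here is a minimal triangle-counting sketch in Python. It illustrates the task itself, not GraphChi's or Hadoop's implementation, and the toy edge list is made up.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) edge pairs."""
    # Build adjacency sets for fast neighbor intersection.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once per each of its three edges, so divide by 3.
    total = 0
    for u, v in edges:
        total += len(adj[u] & adj[v])
    return total // 3

# Tiny example: a square with one diagonal contains two triangles.
print(count_triangles([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]))  # -> 2
```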
Operating Environment
GraphChi's first advantage is that, with no need for distributed processing, many assumptions and the algorithms built on them become much simpler. Combined with a clear assessment of a single machine's strengths and weaknesses, the whole design becomes straightforward. A single machine typically has two characteristics: 1) a large graph problem will not fit into RAM; 2) it has a large disk that can hold all of the data.
Traditional disks are not optimized for random reads; they are built for sequential reads. Newer machines may have SSDs with faster random reads and writes, but these are still much slower than RAM. Therefore, any algorithm running against a single commodity machine's disk still needs to avoid random access to the data as much as possible.
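A rough way to see the gap the design works around is to time sequential versus random reads of the same file. The path, block size, and file size below are arbitrary choices for illustration, and on a machine whose page cache holds the whole file the difference will largely disappear.

```python
import os
import random
import time

PATH = "scratch.bin"   # hypothetical scratch file
BLOCK = 4096           # read granularity in bytes
N_BLOCKS = 25_000      # ~100 MB of data; use a file larger than RAM for a fair test

# Create a scratch file once.
with open(PATH, "wb") as f:
    f.write(os.urandom(BLOCK * N_BLOCKS))

def read_blocks(offsets):
    """Read one block at each offset, in the given order."""
    with open(PATH, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)

sequential = [i * BLOCK for i in range(N_BLOCKS)]
shuffled = sequential[:]
random.shuffle(shuffled)

for name, offsets in [("sequential", sequential), ("random", shuffled)]:
    t0 = time.perf_counter()
    read_blocks(offsets)
    print(f"{name:>10}: {time.perf_counter() - t0:.2f}s")
```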
Divide and conquer
Aapo Kyrola, a PhD candidate at Carnegie Mellon University, used this principle to build on GraphLab, the distributed graph computing framework. His idea is to divide the graph into shards, each of which fits in the machine's memory. A shard can then be processed in parallel in memory, while the other shards are updated through subsequent sequential writes. This minimizes random operations on the disk and makes reasonable use of the machine's memory for parallel work.
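A simplified sketch of that partitioning step, following GraphChi's published layout (edges grouped by destination-vertex interval and sorted by source vertex); the toy graph, shard count, and in-memory lists standing in for shard files are made up for illustration.

```python
def build_shards(edges, num_vertices, num_shards):
    """Partition edges into shards by destination-vertex interval,
    each shard sorted by source vertex."""
    interval_size = (num_vertices + num_shards - 1) // num_shards
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // interval_size].append((src, dst))
    for shard in shards:
        shard.sort()  # sort by source vertex
    return interval_size, shards

# Toy graph with 6 vertices split into 2 shards.
interval, shards = build_shards(
    [(0, 3), (1, 0), (4, 2), (5, 3), (2, 5)], num_vertices=6, num_shards=2)
for i, s in enumerate(shards):
    print(f"shard {i} (dst in [{i*interval}, {(i+1)*interval - 1}]):", s)
```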
Aapo invented the PSW (Parallel Sliding Windows) algorithm to solve the key performance problem: reading from and writing to disk sequentially. PSW sorts the edges within each shard by source vertex, which means each shard is effectively divided into blocks, each block associated with one of the other shards' vertex intervals.
For example, in interval 1 (illustrated in the original paper), shard 1 is processed in memory; it holds all edges whose destination vertices fall in that interval. The edges whose sources fall in interval 1 form contiguous, source-sorted blocks in the remaining shards, so they can be read sequentially. All updates for shard 1 are computed and held in memory, then written sequentially back to the other shards, so that later intervals read the updated values. Finally, the updated in-memory shard is written back to disk sequentially. In interval 2, shard 2 is loaded, and the same method is applied to the remaining shards in turn.
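The gist of one execution interval, simulated entirely in memory and reusing the toy shards from the previous sketch; `run_interval` and `update_vertex` are hypothetical stand-ins for GraphChi's actual shard files and user-defined update function, not its API.

```python
from bisect import bisect_left, bisect_right

def run_interval(p, shards, interval_size, update_vertex):
    """Simulate one PSW execution interval.

    shards[q] holds (src, dst) edges with dst in interval q, sorted by src,
    standing in for one on-disk shard file."""
    lo, hi = p * interval_size, (p + 1) * interval_size

    # "Memory shard": all in-edges of interval p's vertices, loaded fully.
    memory_shard = shards[p]

    # Sliding windows: from every other shard, the contiguous block of edges
    # whose source lies in interval p (one sequential read per shard on disk).
    windows = []
    for q, shard in enumerate(shards):
        if q == p:
            continue
        sources = [src for src, _ in shard]
        start = bisect_left(sources, lo)
        end = bisect_right(sources, hi - 1)
        windows.append(shard[start:end])

    # Apply the user-defined update to every vertex in the interval.
    # (In GraphChi the windows and the memory shard would then be written
    # back to disk sequentially.)
    for v in range(lo, hi):
        in_edges = [e for e in memory_shard if e[1] == v]
        out_edges = [e for w in [memory_shard] + windows for e in w if e[0] == v]
        update_vertex(v, in_edges, out_edges)

# Example: print each vertex's in- and out-edges for interval 0,
# reusing `shards` and `interval` from the previous sketch.
run_interval(0, shards, interval,
             lambda v, ins, outs: print(v, "in:", ins, "out:", outs))
```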
This method takes full advantage of the architecture of a modern commodity computer, as illustrated in the original paper. For example, the data can be split across several disks, and replacing a traditional disk with an SSD no longer doubles performance, because the algorithm already makes very good use of persistent-storage bandwidth. Even increasing the number of shards has little effect on GraphChi's throughput, which keeps it reliable for larger graphs. Notably, the algorithm is efficient enough that moving the whole computation into memory improves the runtime by a factor of only 1.1 to 2.5 over running it from SSD.
Performance comparison of GraphChi (source: the original paper)
GraphChi published performance results against the prevailing paradigms, including general-purpose frameworks such as Hadoop and Spark as well as the highly optimized graph computing frameworks GraphLab and PowerGraph. The latter is a heavily optimized distributed, parallel solution and needs only 1.5 minutes for the Twitter triangle count. However, it uses 64 nodes with 8 cores each, 512 cores in total. A rough calculation: about 40 times the performance, at the cost of about 256 times the compute resources (cores).
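The back-of-the-envelope arithmetic behind those two ratios, assuming the Mac mini used two cores (an assumption, not stated in the benchmark):

```python
# GraphChi: ~60 minutes on one dual-core machine (assumed).
# PowerGraph: ~1.5 minutes on 64 nodes x 8 cores = 512 cores.
graphchi_minutes, powergraph_minutes = 60, 1.5
graphchi_cores, powergraph_cores = 2, 64 * 8

print("speedup:   %.0fx" % (graphchi_minutes / powergraph_minutes))   # ~40x
print("resources: %.0fx" % (powergraph_cores / graphchi_cores))       # 256x
```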