Microsoft is about to open source REEF, a big data framework
Microsoft has developed a big data framework named REEF (short for Retainable Evaluator Execution Framework) and intends to open source it within a month. REEF is designed to run on top of YARN, the next-generation Hadoop resource manager, and is aimed especially at machine learning tasks. Raghu Ramakrishnan, a Microsoft Technical Fellow and CTO of Information Services, presented the REEF project and Microsoft's open source plans in a keynote on Monday morning at the International Conference on Knowledge Discovery and Data Mining, held in Chicago.
YARN is a resource manager originally developed as part of the Apache Hadoop project. It is designed to let users run and manage multiple types of jobs on the same cluster of physical machines (for example batch MapReduce, stream processing with Storm, and/or a graph-processing package). This not only reduces the number of systems an organization has to manage, it also allows different kinds of analysis to run against the same data in the same place. In some cases an entire data workflow can be handled within a single cluster.
However, according to Ramakrishnan, some types of work, such as machine learning, are not well suited to YARN-style frameworks because of their particular requirements around moving data, monitoring tasks, and iterating over result sets without restarting multiple times.
According to Ramakrishnan, REEF is a set of libraries that runs on YARN. He did not go into detail about how it works, but claimed that REEF goes some way toward solving these problems. He did explain that REEF has two main parts: an evaluator, which is a YARN container that hosts REEF services, and an activity, which is the user code that runs inside an evaluator.
He also showed a workflow demo that starts an evaluator in YARN and has the container run the activity code to completion. Notably, the same evaluator can then be kept around with its initial state preserved, so that further activities can run against the same initial data. Presumably, Microsoft is using some kind of SQL query or a machine learning algorithm to achieve this effect.
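The evaluator and activity split described above can be illustrated with a small toy model. This is only a sketch of the idea, not REEF itself: the `Evaluator` class, its `run` method, and the `load_data` and `compute_mean` functions are all hypothetical names invented for this example, and nothing here talks to YARN. It simply shows one long-lived container object running two pieces of user code in turn while keeping its in-memory state between them.

```python
from typing import Any, Callable, Dict, List

# Hypothetical toy model; none of these names come from the REEF libraries.
Activity = Callable[[Dict[str, Any]], Any]


class Evaluator:
    """Stands in for a container that outlives any single activity."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Dict[str, Any] = {}  # data kept in memory between activities
        self.history: List[str] = []     # names of the activities run so far

    def run(self, activity: Activity) -> Any:
        """Run user code against the retained state and return its result."""
        self.history.append(activity.__name__)
        return activity(self.state)


def load_data(state: Dict[str, Any]) -> int:
    """First activity: load the data set once (here, a small in-memory list)."""
    state["points"] = [1.0, 2.0, 3.0, 4.0]
    return len(state["points"])


def compute_mean(state: Dict[str, Any]) -> float:
    """Second activity: reuse the already-loaded data without reloading it."""
    points = state["points"]
    return sum(points) / len(points)


evaluator = Evaluator("evaluator-0")
loaded = evaluator.run(load_data)   # the first activity runs to completion
mean = evaluator.run(compute_mean)  # the same evaluator is reused, state intact
print(loaded, mean)                 # prints: 4 2.5
```

The point of the sketch is that the second activity never reloads the data. In an iterative machine learning job, that is the step that would otherwise force a fresh container and a fresh data load on every pass.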
In theory, REEF is a very interesting technology. It aims to address long-standing problems that enterprises face when they try to take their data analysis further. We look forward to Microsoft's official release of REEF and to learning more from real-world use. Although it is not yet proven, REEF still deserves attention, because it shows how much weight Microsoft now places on Hadoop (YARN is an important part of Hadoop) and on the open source community. Just a few years ago, Microsoft was looking for proprietary alternatives to Hadoop. Today the software giant has begun to commit itself to the Hadoop community, hoping that open source will help lift it to a higher level.