Hadoop: The Core of Microsoft's Big Data Strategy

Introduction: The big data wave is gradually sweeping the world, and Hadoop is the driving force behind it. Microsoft has formed an unprecedented partnership with the Apache Hadoop community. Its aim is to build a Microsoft-branded Hadoop ecosystem that draws on the company's strengths in the software world.

Today, Microsoft has put Hadoop at the heart of its big data strategy. Microsoft made this move because it sees the potential of Hadoop, which has become the de facto standard for distributed data processing in the big data field. By integrating Hadoop technology, Microsoft gives its customers access to the fast-growing Hadoop ecosystem. And as more and more developers skilled in the Hadoop platform emerge, Hadoop's own development benefits as well.

Microsoft's goal is not just to integrate Hadoop into Windows. Microsoft also intends to contribute code to the Apache Hadoop community and wants that code to be accepted by the community. Ultimately, anyone should be able to run a purely open-source Hadoop on Windows.

A Microsoft-branded Hadoop

Microsoft's Hadoop distribution is currently in the "Community Technology Preview" (CTP) phase. This means that Microsoft is evaluating it with customers, and the official release is expected in mid-2012. Microsoft's Hadoop runs on the Windows Server platform or on Windows Azure, Microsoft's cloud platform. The upcoming 1.0 release will include MapReduce and HDFS at its core, along with the Hadoop components Pig and Hive.

Microsoft's goal is to be compatible with all Hadoop components. Other components of the Hadoop ecosystem, such as ZooKeeper, HBase, HCatalog, and Mahout, will also be included in Microsoft's Hadoop distribution.

Microsoft's Hadoop also integrates with the company's own business intelligence and analytics offerings:

The Hadoop connectors make it easier for Hadoop to exchange data with SQL Server and SQL Server Parallel Data Warehouse.

The Hive ODBC driver allows any Windows application to access and query the Hive data warehouse.

Excel can access Hive, moving data directly from Hive into Excel and PowerPivot.
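
The access pattern behind the ODBC driver and the Excel integration is ordinary SQL-style querying through a database connection. The following is a minimal sketch of that pattern, not Microsoft's implementation: it uses Python's built-in sqlite3 module purely as a stand-in for a Hive connection, and the weblogs table, its columns, and its rows are hypothetical. With a real Hive ODBC data source the connection object would come from an ODBC binding instead, and the parameter syntax may differ.

```python
import sqlite3


def top_pages(conn, limit):
    """Return (page, hits) rows, most-visited first, via a standard DB-API cursor."""
    cur = conn.cursor()
    cur.execute(
        "SELECT page, COUNT(*) AS hits FROM weblogs "
        "GROUP BY page ORDER BY hits DESC, page LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory stand-in for the warehouse (hypothetical table and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (page TEXT, visitor TEXT)")
    conn.executemany(
        "INSERT INTO weblogs VALUES (?, ?)",
        [
            ("/home", "a"), ("/home", "b"), ("/docs", "a"),
            ("/home", "c"), ("/docs", "d"), ("/faq", "a"),
        ],
    )
    return conn


if __name__ == "__main__":
    # Print the two most-visited pages from the sample data.
    for page, hits in top_pages(demo_connection(), 2):
        print(page, hits)
```

The query function only depends on the connection it is handed, which is the point of a driver layer such as ODBC: the application code stays the same whatever sits behind the connection.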

On the back end, Microsoft has made other improvements to Hadoop. It will integrate Active Directory to simplify access control, and System Center is also integrated for use by administrators.

Microsoft officially plans to publish more details about WAAD (Windows Azure Active Directory) at the upcoming TechEd conference in June. WAAD follows the same concept as Active Directory on Windows Server. In the future, using ACS (Access Control Service) together with existing Active Directory deployments should ensure good interoperability.

Using JavaScript APIs and C# for Hadoop development

One of the most distinctive features of Microsoft's Hadoop distribution is an additional JavaScript API. Programming directly against Hadoop is tedious, which is why higher-level languages such as Pig have emerged.
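
To give a sense of the programming model involved, here is a minimal in-memory sketch of a MapReduce word count. It is plain Python, not Hadoop's API or Microsoft's JavaScript API, and the function names are hypothetical. On a real cluster the framework performs the grouping step and distributes the work across machines; the developer writes only the map and reduce functions.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in memory over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop on Windows"]
    print(word_count(sample))
```

Even this toy version shows why higher-level layers are attractive: a one-line aggregation in a query language becomes several functions once it is expressed as explicit map and reduce steps.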

Microsoft has chosen to add a JavaScript layer to the Hadoop environment. Developers can use it to create MapReduce jobs and even to interact with Pig and Hive from a browser. The real advantage of the JavaScript layer is that it fits into the enterprise Hadoop environment, making it easy for developers to build an intranet analytics environment that business users can access.

Microsoft has also brought Node.js, a server-side JavaScript runtime, to the Windows Server and Windows Azure platforms. At the same time, Microsoft plans to contribute its JavaScript API to the Apache open source community, which is good news for the Hadoop community.

More importantly, Microsoft makes it possible to develop Hadoop applications on the .NET platform. Microsoft plans to expose the existing Hadoop APIs directly so that MapReduce jobs can be created from .NET. A higher-level interface may appear in future releases. Over time, support for developing Hadoop projects in Visual Studio will continue to improve. In the future, Hadoop projects running on Azure will allow programming in Common Language Runtime (CLR) languages, such as C#, on top of the .NET Framework.

Stream data processing and NoSQL

Hadoop is undoubtedly the topic that big data practitioners talk about most, but stream data processing and NoSQL are just as important for big data. Microsoft has naturally prepared for both. It has launched a streaming data solution called StreamInsight, and on the NoSQL side it offers a NoSQL database called Azure Tables on the Windows Azure platform.

Looking ahead, Microsoft's commitment to Hadoop compatibility means that StreamInsight and Azure Tables will be rolled out as part of the Hadoop environment, with Microsoft's distribution of HBase as a core product. Existing streaming data solutions, such as Yahoo's S4, will also be compatible with Microsoft's offering.

Integration with existing tools

Microsoft's tendency to integrate its major existing components with big data tools raises a question: does Microsoft intend to provide a comprehensive data science platform for enterprises? Madhu Reddy, Microsoft's senior product planning director for big data, answered in the affirmative. The primary purpose of Microsoft's Hadoop development effort is to let people keep using familiar tools, so Microsoft is focused on interoperability with existing tools. The effort covers users at all levels, including developers, analysts, and business users. Excel is ubiquitous, and the connection between Excel and Hive is a good example. Other tools matter too, such as MATLAB, SAS, and R.

Summary

Microsoft's big data strategy ensures that the Windows platform continues to play a role in the big data age and makes the company's cloud services more competitive in the data center business. Microsoft's other approach is to integrate big data seamlessly with its own large and diverse software portfolio. Microsoft's focus is clearly on strong integration. Its collaboration with the Apache Hadoop community helps ensure that new tools and talented developers migrate to the platform. (Compiled by Li)

(Editor in charge: Lu Guang)
