Hadoop Tutorial: SQL Server + Hadoop Big Data Solution

Source: Internet
Author: User

Microsoft SQL Server is one of the most closely watched products in the database market and consistently ranks near the top of the DB-Engines ranking, which the database knowledge site DB-Engines publishes every month. However, the month-to-month changes in that ranking also show a large number of NoSQL databases climbing steadily, and they have begun to threaten the position of traditional databases.

Holding on to the status quo is no longer a viable strategy in the big data era. Established database vendors are defending their traditional market-leading positions while continually expanding into new markets, and Microsoft is one of them. Microsoft's big data effort originally grew out of the need to give Bing high-quality search results comparable to Google's, since internet companies are always the first to face big data challenges.

Microsoft's end-to-end big data solution can be summed up as SQL Server, Windows Azure, and Hadoop, or, in Microsoft's own words, as data management, data expansion, and insight. Below, I walk through the specific components of Microsoft's big data solution from the bottom up:

I. Data management

In Microsoft's big data solution, data management is the lowest and most fundamental layer. It is a flexible layer that supports all data types: structured, semi-structured, and unstructured data, whether at rest or in motion. The data management layer comprises three main products: SQL Server, SQL Server Parallel Data Warehouse, and Hadoop on Windows.

Microsoft offers a different solution for each type of data. Structured data can be processed with SQL Server and SQL Server Parallel Data Warehouse; unstructured data can be handled with the versions of Hadoop that run on Windows Azure and Windows Server; and streaming data can be managed with SQL Server StreamInsight, which provides near real-time analysis.
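StreamInsight supports windowed aggregation over event streams, which is what makes near real-time answers possible. The sketch below illustrates one common kind, a tumbling (fixed-size, non-overlapping) window count, in plain Python. It is not StreamInsight's API (StreamInsight queries are written in LINQ on .NET); the `(timestamp, key)` event shape and the name `tumbling_counts` are hypothetical, and the code assumes events arrive in non-decreasing timestamp order. Each window's counts are emitted as soon as a later event shows that the window has closed.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in whole seconds, key such as a sensor id)
Event = Tuple[int, str]


def tumbling_counts(events: Iterable[Event], window: int) -> Iterator[Tuple[int, dict]]:
    """Yield (window_start, {key: count}) for each fixed window as soon as it closes.

    Assumes events arrive in non-decreasing timestamp order.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window  # start of the window this event belongs to
        if current_start is not None and start != current_start:
            # A later window has begun, so the previous one is complete: emit it now.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        # Flush the last window when the stream ends.
        yield current_start, dict(counts)
```

Windows that receive no events are simply skipped. A real streaming engine must also cope with late or out-of-order events, which this sketch deliberately ignores.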

1. SQL Server. SQL Server 2012, released last year, brought a number of big data improvements, the most important being full support for Hadoop, which is one of the key differences between SQL Server 2012 and SQL Server 2008. SQL Server 2014, due for release at the end of this year, adds in-memory database capabilities that speed up data processing by exploiting modern hardware, and is seen as a further step forward for big data.

2. SQL Server Parallel Data Warehouse. SQL Server Parallel Data Warehouse (the Parallel Data Warehouse Appliance, or PDW) was introduced with SQL Server 2008 R2 and is now Microsoft's main data warehouse product; a new PDW appliance based on SQL Server 2012 is scheduled for release this year. PDW uses a massively parallel processing (MPP) architecture, which makes it fundamentally different from the traditional single-server edition of SQL Server. It combines a range of advanced data storage and processing technologies into one integrated system and is an important component of Microsoft's big data strategy.

3. Hadoop on Windows. Microsoft also offers Hadoop on the Windows Azure platform and on Windows Server, combining Hadoop's high performance and scalability with the ease of use and ease of deployment traditionally associated with Microsoft products to form a complete big data solution. Through simple deployment and integration with components such as Active Directory and System Center, it also brings the familiarity and manageability of Windows to Hadoop. With its Hadoop-based services on Windows Azure, Microsoft adds cloud flexibility to its big data solution.
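To make the MPP distinction in item 2 concrete: in an MPP system such as PDW, a large table is spread across compute nodes by hashing a distribution column, each node aggregates its own slice, and a control node combines the partial results. The plain-Python sketch below simulates that scatter/gather pattern for a GROUP BY sum. The row schema, node count, and function names are hypothetical, CRC32 merely stands in for whatever hash function PDW actually uses, and the per-node steps run one after another here rather than on separate machines.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical table schema: (customer_id, amount); customer_id is the distribution column
Row = Tuple[str, int]


def node_for(key: str, num_nodes: int) -> int:
    """Pick a compute node by hashing the distribution column (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """Scatter rows so that all rows sharing a key land on the same node."""
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[0], num_nodes)].append(row)
    return nodes


def local_sum(partition: List[Row]) -> Dict[str, int]:
    """The work a single compute node would do on its own slice of the table."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_group_sum(rows: List[Row], num_nodes: int) -> Dict[str, int]:
    """Control-node view: scatter rows, run the local aggregates, gather the results."""
    result: Dict[str, int] = {}
    for partition in distribute(rows, num_nodes):
        # A key never spans two nodes, so partial results can simply be merged.
        result.update(local_sum(partition))
    return result
```

Because rows that share a key always hash to the same node, the control node can merge per-node results without re-aggregating them; a query grouped on a different column would need an extra step, such as redistributing rows or re-aggregating partial results.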

II. Data expansion

The rise of social media has created a unique opportunity for businesses to extract more business value and, ultimately, gain a competitive advantage. Microsoft's big data solution combines a company's own data and models with public data and services, including social media sites such as Twitter, Facebook, and LinkedIn, to enable breakthrough discoveries. At the data expansion layer, the most important platform Microsoft provides is Windows Azure Marketplace.

Windows Azure Marketplace is an online marketplace for buying and selling finished software-as-a-service (SaaS) applications and premium datasets. It connects companies looking for innovative cloud-based solutions with partners who have already built them, and lets customers use Marketplace applications and data mining algorithms to uncover hidden patterns in their data.

Sharing and collaboration through Windows Azure Marketplace: Microsoft's big data solution lets customers share data and discover new insights through Windows Azure Marketplace, which exposes hundreds of applications and data mining algorithms from Microsoft and third parties through the Open Data Protocol (OData).
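Because OData exposes datasets as ordinary HTTP resources, a client can query a Marketplace dataset by appending standard OData system query options such as `$filter`, `$select`, `$top`, and `$format` to the resource URL. The sketch below only builds such a URL. The service root `https://example.com/odata/`, the `Sales` entity set, and its `Year`, `Region`, and `Total` fields are hypothetical, and real Marketplace feeds also require account authentication, which is not shown.

```python
from typing import List, Optional
from urllib.parse import quote, urlencode


def odata_query_url(
    service_root: str,
    entity_set: str,
    filter_expr: Optional[str] = None,
    select: Optional[List[str]] = None,
    top: Optional[int] = None,
) -> str:
    """Build an OData query URL from standard system query options."""
    params: dict = {}
    if filter_expr is not None:
        params["$filter"] = filter_expr       # e.g. "Year eq 2012"
    if select:
        params["$select"] = ",".join(select)  # properties to return
    if top is not None:
        params["$top"] = str(top)             # maximum number of entries
    params["$format"] = "json"                # request a JSON response
    # Keep '$', ',' and quotes literal; encode spaces as %20 rather than '+'.
    query = urlencode(params, safe="$,'", quote_via=quote)
    return f"{service_root.rstrip('/')}/{entity_set}?{query}"
```

For example, `odata_query_url("https://example.com/odata/", "Sales", filter_expr="Year eq 2012", select=["Region", "Total"], top=10)` returns `https://example.com/odata/Sales?$filter=Year%20eq%202012&$select=Region,Total&$top=10&$format=json`; an HTTP GET on that URL would return the matching entries.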

Integration with social media: Microsoft's big data solution lets customers extend their analysis with public data from social media sites such as Twitter and Facebook. A cloud-based Microsoft project code-named "Social Analytics" allows businesses to integrate social media information into their business applications.

Perform advanced analysis with Hadoop: Microsoft's big data solution supports traditional BI as well as advanced analytics such as data mining and graph mining, allowing customers to discover new value in all of their data. The Hive ODBC driver enables customers to perform predictive analysis on Hadoop data using SQL Server data mining tools. Microsoft will also support other advanced analytics tools such as Mahout, as well as mining algorithms written in C++, C#, Python, Ruby, and Perl.

III. Insight

Businesses collect, store, and process data with the ultimate goal of gaining insight. Enterprises need to be able to process and analyze petabytes of new data easily, without worrying about building complex distributed storage and computing clusters, and to scale as their requirements grow. Microsoft's big data solution lets customers use familiar BI tools to gain actionable insight from both structured and unstructured data.

For insight, Microsoft offers two major products: Office PowerPivot and SharePoint Power View. These tools help businesses quickly extract information from data to solve business problems: PowerPivot is used to design data models, Power View is used to design visual reports, and the reports can be published on the SharePoint platform. End users can design and present data models according to their own business perspective and requirements, making full use of the data and the front-end interface to meet business needs.

Use familiar tools to analyze Hadoop data: Microsoft lets users interact with and analyze unstructured data in Hadoop from the familiar Excel environment through the Hive add-in for Excel.

Gain insight from any data: Through the Hive Open Database Connectivity (ODBC) driver, organizations can use familiar BI tools such as Microsoft SQL Server Analysis Services (SSAS), PowerPivot, and Power View to analyze unstructured data in Hadoop. They can also use PowerPivot and Power View with SQL Server 2012 for self-service BI on relational data.

Drive insight through simplified programming: Microsoft simplifies Hadoop programming through integration with .NET and a new JavaScript library. Developers can use the new library to write MapReduce programs in JavaScript and then deploy their code simply from a web browser.
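The MapReduce model that these libraries target can be illustrated with the classic word count. The following is a single-process Python simulation of the three phases (map, shuffle, reduce); it is not the JavaScript library's or Hadoop's actual API, and the function names are hypothetical. On a real cluster, Hadoop runs many mappers and reducers in parallel and performs the shuffle itself, so the developer only writes the map and reduce functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values emitted for a single key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "Big SQL data hadoop"])` returns `{"big": 2, "data": 2, "sql": 1, "hadoop": 1}`.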

Summary

Microsoft's big data solution is essentially an upgrade of its existing SQL Server and Office products. Its biggest highlight is the integration of Hadoop into SQL Server, Windows Server, and Windows Azure, which makes Hadoop the bridge connecting all three. The solution is rich and full-featured but comparatively short on innovation. From the user's perspective, Microsoft's greatest strengths are the ease of use of its products and their friendly interfaces, and these are the main reasons users choose Microsoft.
