HDInsight-1, Introduction

Source: Internet
Author: User
Tags: sqoop, hdinsight

I recently needed to look into HDInsight for work, so I am taking notes here. The official documentation is naturally the most authoritative source, so the content below is taken from it: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/

Hadoop on HDInsight

Anyone who works with big data knows Hadoop, so how are HDInsight and Hadoop related? HDInsight is Microsoft's Azure-based service, mainly for data analysis and management, and it uses the HDP (Hortonworks Data Platform) Hadoop distribution. One point worth noting: when we talk about Hadoop, we generally mean the Hadoop ecosystem, including Storm and HBase, not just the little elephant.

HDInsight can be understood as an Apache Hadoop implementation on Microsoft Azure that includes Storm, HBase, Pig, Hive, Sqoop, Oozie, and Ambari and, of course, integrates with Microsoft's own Excel, SSAS, and SSRS.

HDInsight supports two operating systems, Linux and Microsoft's own Windows. The main differences are:

Category      | Hadoop on Linux                                      | Hadoop on Windows
--------------+------------------------------------------------------+-------------------------------------------------------------------------
Cluster OS    | Ubuntu 12.04 Long Term Support (LTS)                 | Windows Server 2012 R2
Cluster Type  | Hadoop                                               | Hadoop, HBase, Storm
Deployment    | Azure Management Portal, Azure CLI, Azure PowerShell | Azure Management Portal, Azure CLI, Azure PowerShell, HDInsight .NET SDK
Cluster UI    | Ambari                                               | Cluster Dashboard
Remote Access | Secure Shell (SSH)                                   | Remote Desktop Protocol (RDP)

Some basic concepts and definitions

    • Hadoop (the "Query" workload): Provides reliable data storage with HDFS, and a simple MapReduce programming model to process and analyze data in parallel.

    • HBase (the "NoSQL" workload): A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data, potentially billions of rows times millions of columns. See "Overview of HBase on HDInsight".

    • Apache Storm (the "Stream" workload): A distributed, real-time computation system for processing large streams of data fast. Storm is offered as a managed cluster in HDInsight. See "Analyze real-time sensor data using Storm and Hadoop".

    • Ambari: Cluster provisioning, management, and monitoring.

    • Avro (Microsoft .NET Library for Avro): Data serialization for the Microsoft .NET environment.

    • Hive & HCatalog: Structured Query Language (SQL)-like querying, and a table and storage management layer.

    • Mahout: Machine learning.

    • MapReduce and YARN: Distributed processing and resource management.

    • Oozie: Workflow management.

    • Phoenix: Relational database layer over HBase.

    • Pig: Simpler scripting for MapReduce transformations.

    • Sqoop: Data import and export.

    • Tez: Allows data-intensive processes to run efficiently at scale.

    • ZooKeeper: Coordination of processes in distributed systems.
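
To make the "simple MapReduce programming model" mentioned in the list above a little more concrete, here is a minimal word-count sketch in plain Python. It is not HDInsight or Hadoop code and does not run on a cluster; it only imitates the three conceptual steps (map, shuffle, reduce) in a single process. The function names map_phase, shuffle_phase, reduce_phase, and word_count are made up for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (a real framework does this for you)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in-process (hypothetical helper, no cluster involved)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "hadoop stores data in hdfs",
        "hadoop processes data with mapreduce",
    ]
    print(word_count(sample))
```

On a real cluster the map and reduce functions would run in parallel on many nodes over data stored in HDFS, and the framework would handle the shuffle; the sketch only shows how the work is split between the two user-written functions.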

