Big Data Savior: Apache Hadoop and Hive


Apache Hadoop and MapReduce attract many big data analysts and business intelligence (BI) experts. However, working with the Hadoop Distributed File System (HDFS), and writing and executing MapReduce jobs in Java, demands serious software development skills. Apache Hive offers a way around that barrier.

The Apache Software Foundation's Hive project, a data warehouse component of the Hadoop ecosystem, provides a SQL-like query language called Hive Query Language (HiveQL). Hive automatically translates these SQL-style statements into MapReduce jobs.
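As a minimal sketch of what that looks like (the table and column names here are hypothetical, chosen only for illustration), a familiar SQL-style statement such as the following is compiled by Hive into one or more MapReduce jobs behind the scenes:

```sql
-- Hypothetical example: Hive translates this SQL-like query
-- into MapReduce map and reduce stages automatically.
SELECT carrier, COUNT(*) AS flights
FROM flightdata
GROUP BY carrier;
```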

Relational databases such as IBM DB2, Oracle and SQL Server, and the applications built on them, remain the mainstay of business intelligence. Most data analysts have well-developed SQL query skills. By the same token, business analysts generally know how to summarize data with Excel worksheets, PivotTables and charts.

Let's look at how an end-to-end business intelligence project works on Windows Azure. First a large dataset is provisioned; then an Excel chart displays on-time arrival data for certificated U.S. air carriers, and the entire process requires no program code to be written.

Apache Hadoop on Windows Azure CTP

In November 2011, Microsoft's SQL Server team announced a Community Technology Preview (CTP) of Apache Hadoop on Windows Azure (Hadoop on Azure). Microsoft emphasized that the CTP simplifies Hadoop setup and use, lets users run Hive queries and analyze unstructured Hadoop data from Excel, and provides the elastic scale of Windows Azure.

The Hadoop on Azure CTP is invitation-only. Users fill out a short questionnaire on a Microsoft site to request an invitation. When you receive the invitation, browse to the Hadoop on Azure site and sign in with a Windows Live ID. Enter a globally unique DNS cluster name, select an initial Hadoop cluster size, enter a cluster login username and password, and click the Request Cluster button. (See Figure 1.)

Figure 1: After receiving a Hadoop on Azure CTP invitation, a user can provision a cluster with only a few simple operations.

Provisioning the cluster takes roughly 15 to 30 minutes. Using Hadoop on Azure CTP resources is free, but clusters expire: users must renew a cluster within the last six hours of the initial 24-hour period, and renew it daily for as long as it remains in use.

Users need a Windows Azure subscription and a storage account to use Windows Azure blob storage for persistent data; otherwise, data stored in the Hadoop distributed file system is lost when the cluster is released. Users without a subscription can sign up for a free three-month Windows Azure trial, which includes 20GB of storage, millions of storage transactions and 20GB of outbound bandwidth.

Extending SQL skills from Windows Azure to big data

This Hive example draws on data from the U.S. Federal Aviation Administration (FAA): on-time flight arrival and delay records covering May 2011 through January 2012. The subset used here consists of six text files containing the relevant FAA columns, about 500,000 rows and 25MB in total.

Users upload the data to a folder inside a blob storage container so that Hive can query it. My blog describes in detail how to create the Azure source data, including the schema, how to download the data with a Windows Live SkyDrive account, and finally how to upload it to a Windows Azure storage account in Microsoft's Chicago data center.

When the cluster has been provisioned, the Hadoop on Azure portal's landing page appears, presenting a Metro-style dashboard with tiles for cluster and account management. (See Figure 2.)

Figure 2: The Hadoop on Azure MapReduce dashboard page and its features.

Copy the primary access key for your storage account from the Windows Azure Management Portal to the Clipboard, click the Manage Cluster tile, then click Set up ASV (Azure Storage Vault) to use the Windows Azure storage account as the data store for Hive tables. Alternatively, users can draw data from Amazon S3 (Simple Storage Service) or from the Windows Azure Marketplace DataMarket.
Enter your storage account name, paste the access key into the Passkey box, and click Save Settings; Hive can then connect to the storage account. If the credentials validate, the user receives a notification that the Azure account settings were saved successfully.

Unlike raw HDFS files, a Hive table requires a schema: even the simplest key-value data must be described with named, typed columns.
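For instance (a hypothetical sketch, with names chosen only for illustration), even a plain tab-delimited key-value file must be mapped to named, typed columns before Hive can query it:

```sql
-- Hypothetical: the minimal schema Hive requires
-- for simple tab-delimited key-value text data.
CREATE TABLE kv_example (
  k STRING,
  v STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```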

To expose a delimited HDFS or external file as a Hive table, name its columns and define their data types, the user runs a CREATE EXTERNAL TABLE statement such as the following. This Hive statement creates the table over the flightdata folder, which holds the airline on-time data.

CREATE EXTERNAL TABLE flightdata_asv (
  year INT,
  month INT,
  day INT,
  carrier STRING,
  origin STRING,
  dest STRING,
  depdelay INT,
  arrdelay INT
)
COMMENT 'FAA on-time data'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 'asv://aircarrier/flightdata';


Apache Hive has relatively few data types and does not support date or time fields, so storing the source data's year, month and day values as integers is the practical choice. Departure and arrival delay values are expressed in minutes.
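With the table in place, a summary query can be issued. The following is a sketch, not a statement from the original walkthrough; it assumes the flightdata_asv table defined above, and the alias names are chosen for illustration:

```sql
-- Sketch: average arrival delay in minutes per carrier,
-- worst first (assumes the flightdata_asv table above).
SELECT carrier,
       AVG(arrdelay) AS avg_arrival_delay_min,
       COUNT(*)      AS flights
FROM flightdata_asv
GROUP BY carrier
ORDER BY avg_arrival_delay_min DESC;
```

Because Hive compiles this into MapReduce jobs, the same statement works unchanged whether the table holds thousands or billions of rows.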

To execute a dynamic hive statement, click on the MapReduce dynamic dashboard and click on the Hive button to open the Dynamic hive page with a read-only text box at the top of the page, and click the text box below to indicate the statement. (See Figure 3)

Figure 3: The Hive Tables list includes the new table's name, and the Columns list displays the field names of the selected table. Clicking the >> button inserts the selected entry into the statement text box.

(Responsible editor: Lu Guang)
