Hadoop (VI) -- Sub-project Pig


Earlier we covered the two pillars of Hadoop, HDFS and MapReduce: we put big-data files on HDFS and write MapReduce programs in Java to perform all kinds of data analysis and prediction, realizing the business value of big data and, with it, the value of Hadoop itself.

But in traditional systems we analyze data through a database, whether a relational database such as Oracle, SQL Server, or MySQL, or a NoSQL database such as MongoDB. When moving to Hadoop for big-data analysis, how can we make the transition quickly and smoothly without having to hand-write Java MapReduce programs? Hadoop's sub-projects Pig and Hive exist for exactly this purpose; this post looks at how Pig is used to operate Hadoop.

One, What is Pig:


Pig is one of the projects that Yahoo donated to Apache. It is essentially a client of Hadoop, an upper-layer derivative architecture that wraps MapReduce. Through the Pig client, users write Pig Latin, a SQL-like data flow language, to process data stored on HDFS, which is simply a bit easier. Pig is the translator between the Pig Latin data flow language and MapReduce, much like an interface between two different languages, so some people say Pig consists of two parts: the Pig interface and Pig Latin.



Two, Pig's two modes of operation:


1, Local mode: all files and execution happen locally; this is typically used for testing programs. Start local mode with: pig -x local


2, MapReduce mode: the actual working mode. Pig translates queries into MapReduce jobs and then executes them on the Hadoop cluster.
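As a quick illustration, both modes are started from the shell with the -x flag; this is a minimal sketch and assumes the pig launcher is already on your PATH (installation is covered in the next section):

pig -x local        # start Grunt in local mode (uses the local file system)
pig -x mapreduce    # start Grunt in MapReduce mode (also the default when running pig with no -x flag)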

Three, Installing Pig:


1, Download and unzip: download Pig from the Apache website. I downloaded pig-0.15.0.tar.gz and placed it under the same path as Hadoop (choose your own location). Unzip it with: tar -xvf pig-0.15.0.tar.gz
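For example, a minimal sketch of this step; the download URL and the /home/ljh install path are assumptions based on the paths used later in this article:

# download Pig 0.15.0 from the Apache archive (URL assumed) and unpack it next to Hadoop
wget https://archive.apache.org/dist/pig/pig-0.15.0/pig-0.15.0.tar.gz
tar -xvf pig-0.15.0.tar.gz -C /home/ljh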


2, Set environment variables: edit /etc/profile with vi and set Pig's bin path and the JDK path, then reload the profile (or log in again) to complete the local-mode installation.

export PATH=$PATH:/home/ljh/pig-0.15.0/bin

export JAVA_HOME=/usr/jdk1.8.0_51

3, MapReduce mode configuration: also set the Hadoop-related environment variables and reload the profile; you can then run Pig directly in MapReduce mode.

export PATH=$PATH:/home/ljh/pig-0.15.0/bin:/home/ljh/hadoop-1.2.1/bin

export JAVA_HOME=/usr/jdk1.8.0_51

export PIG_CLASSPATH=/home/ljh/hadoop-1.2.1/conf/
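A quick way to check the setup, sketched under the assumption that the exports above have been added to /etc/profile:

source /etc/profile   # reload the profile so the new variables take effect
pig -version          # should print the Pig 0.15.0 banner
pig                   # with Hadoop configured, this starts Grunt in MapReduce mode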

Four, Grunt shell commands:


After entering Grunt you can use Linux-like commands to perform various operations; they behave very much like their Linux counterparts, so try them out:

For example: ls, cd, and cat work exactly as in Linux, only inside the Hadoop environment;

copyToLocal copies a file from the Hadoop environment to the local file system, and copyFromLocal copies a local file into the Hadoop environment;

sh executes operating-system commands, for example: sh /usr/jdk/bin/jps. Anything unfamiliar can be looked up as needed.
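A short Grunt session sketch tying these commands together; the /user/ljh path and file names are illustrative only:

grunt> ls /user/ljh
grunt> cd /user/ljh
grunt> cat part-r-00000
grunt> copyToLocal part-r-00000 /tmp/part-r-00000
grunt> copyFromLocal /tmp/words.txt words.txt
grunt> sh /usr/jdk/bin/jps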



Five, Pig's data model, briefly compared with the traditional data model:

Pig      Database
Bag      Table
Tuple    Row
Field    Column


Note: in Pig, the tuples inside a single bag can have different numbers of fields and fields of different types, which is unlike a traditional database, where every row of a table has the same columns.
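For instance, in Pig's own bag/tuple notation a single bag could look like the following, with tuples of different lengths and field types side by side (the values are made up):

{ (Tom, 19), (Anna, 22, Beijing), (3.5) }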

Six, Pig Latin (a SQL-like data flow language):


1, Common statements (a small sketch follows this list):


2, For how Pig Latin corresponds to various SQL statements, this blog post is very good: http://guoyunsky.iteye.com/blog/1317084


3, For more syntax, this blog post is very detailed: http://blackproof.iteye.com/blog/1791980
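The sketch referenced under item 1 above: a minimal Pig Latin script showing the most common statements; the input path, field names, and schema are illustrative assumptions:

-- load a comma-separated file from HDFS with an assumed schema
records = LOAD '/user/ljh/students.txt' USING PigStorage(',') AS (name:chararray, age:int, score:double);
adults  = FILTER records BY age >= 18;                          -- WHERE-style filtering
grouped = GROUP adults BY name;                                 -- GROUP BY
stats   = FOREACH grouped GENERATE group, AVG(adults.score);    -- projection / aggregation
DUMP stats;                                                     -- print the result to the console
STORE stats INTO '/user/ljh/stats_out' USING PigStorage(',');   -- write the result back to HDFS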

Pig Latin is a language, and a fairly simple one; in day-to-day use you can look things up as you go, so the various queries, filters, and other operations on your data are nothing to worry about.

Summary: Pig involves relatively little to learn; it is easy to install and easy to use. What takes more effort is becoming proficient in the Pig Latin scripting language, and its many syntax details can be looked up as you use them. Pig, as an upper-layer derivative framework of Hadoop, gives us one more way to operate Hadoop: we now have Java MapReduce and Pig's Pig Latin, and of course there is also Hive's HiveQL, which I will cover later. In a word: understand more, read more material, and practice more...



Copyright notice: This is an original article by the blogger; it may not be reproduced without the blogger's permission.
