Pig System Analysis (1): Overview


This series of articles analyzes Pig's main execution flow, taking Pig Latin on Hadoop as the reference, in order to explore the feasibility of running Pig Latin (or a Pig Latin-like language) on Spark.

Pig Overview

Apache Pig was developed at Yahoo! so that researchers and engineers could process, analyze, and mine big data more easily. From the point of view of data access, if YARN is regarded as the operating system for big data, then Pig is an indispensable part of all kinds of data applications.

Although Pig's learning cost is higher than Hive's, Pig's advantages are its expressiveness and flexibility. With declarative Hive HQL, the user states only what data they want; with procedural Pig Latin, the user combines a series of statements to control the entire data analysis process.
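
To make the procedural, step-by-step style concrete, here is a minimal Python sketch. It is not Pig itself: the records, field names, and threshold are hypothetical, and the Pig Latin statements in the comments are only rough analogues of each step.

```python
from collections import defaultdict

# Hypothetical input records: (user, url, time_ms).
# Rough analogue: A = LOAD 'visits' AS (user:chararray, url:chararray, time_ms:int);
visits = [
    ("alice", "/home", 120),
    ("bob", "/home", 80),
    ("alice", "/cart", 300),
    ("carol", "/home", 150),
]

# Step 1, rough analogue: B = FILTER A BY time_ms > 100;
slow = [v for v in visits if v[2] > 100]

# Step 2, rough analogue: C = GROUP B BY user;
grouped = defaultdict(list)
for v in slow:
    grouped[v[0]].append(v)

# Step 3, rough analogue: D = FOREACH C GENERATE group, COUNT(B);
counts = {user: len(rows) for user, rows in grouped.items()}

print(counts)
```

Each intermediate result (slow, grouped, counts) is named and can be inspected or reused, which is the kind of control over the process that the paragraph above attributes to Pig Latin. A declarative query would express the same result as a single statement and leave the order of steps to the engine.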

Pig Overall Process

Terminology

Term | Explanation | Note
Pig Latin | Pig's data flow processing language |
Loader/Store | Used by Pig to load and store data |
Schema | Data format specified when data is loaded | Pig data types are divided into scalar and complex types. Scalar types are essentially the same as Java's basic data types; complex types include tuple, map, and bag (an unordered collection of tuples).
Relation | The data collection that Pig operates on | A set of tuples, that is, a bag (more precisely an outer bag, because there are also nested inner bags).
Logical Plan | Logical execution plan |
Physical Plan | Physical execution plan |
Optimizer | Optimizer | Rule-based logical optimizer
DAG | Directed acyclic graph |
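
The data model terms in the table (scalar, tuple, map, bag, relation) can be sketched with plain Python values. This is only an illustration under stated assumptions: the Python stand-ins, the sample values, and the helper pig_type are hypothetical and are not Pig's own classes or API.

```python
# Scalars: simple values such as strings and integers.
name = "alice"
age = 30

# Tuple: an ordered sequence of fields.
user_tuple = (name, age)

# Map: key/value pairs.
props = {"city": "Beijing"}

# Bag: a collection of tuples. A Python list is used as a stand-in,
# although a real bag has no guaranteed order.
visited = [("/home",), ("/cart",)]

# Relation: an outer bag of tuples. A field of a tuple may itself be
# a map or a nested (inner) bag.
relation = [
    ("alice", 30, props, visited),
    ("bob", 25, {}, []),
]


def pig_type(value):
    """Hypothetical helper: name the Pig-style category of a Python stand-in."""
    if isinstance(value, list):
        return "bag"
    if isinstance(value, tuple):
        return "tuple"
    if isinstance(value, dict):
        return "map"
    return "scalar"


for row in relation:
    print([pig_type(field) for field in row])
```

The relation itself classifies as a bag and each row as a tuple, while the last field of each row is an inner bag nested inside the outer one, which matches the outer/inner distinction in the Relation note.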
