Recently, the scattered fairy spent a few weeks using Pig to analyze our website's search log data and found it very good, so today I am writing a note about the origin of Pig. Outside the big data field, probably very few people know what Pig does, including some who program but do not work with big data, and some who do neither programming nor big data,
Before the article begins, let's briefly review Pig's background: 1. What is Pig?
Before the article begins, let's briefly review Pig's background: 1. What is Pig? Pig was originally a Hadoop-based parallel processing architecture at Yahoo. Yahoo later donated Pig to Apache (an open source software foundation) as a project, which was maint
bbbbb1961bbbbbb0060a
ccccc1992cccccc0080c
ddddd1953dddddd0033d
eeeee1964eeeeee0051e
aaaaa1960aaaaaa0024a
bbbbb1951bbbbbb0035a
ccccc1952cccccc0048c
ddddd1953dddddd0053d
eeeee1954eeeeee0048e
To extract the year and temperature, you need to define your own load function. Column indexes start at 0. A custom load function must extend LoadFunc. The code is as follows.
package whut;
import java.io.IOException;
import java.util.ArrayList;
import ja
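The loader source above is cut off, so the following is only a rough stand-in for the parsing step it describes, written in plain Java without the Pig API. It assumes each record is 20 characters wide, with the year at character offsets 5 to 8 and the temperature at offsets 15 to 18 (counting from 0). These offsets are inferred from the sample data and are not stated in the original; the class name `FixedWidthRecords` is hypothetical.

```java
// Plain-Java sketch of the field-extraction step only.
// It does NOT extend Pig's LoadFunc; it just shows the substring logic.
final class FixedWidthRecords {
    // ASSUMPTION: offsets inferred from the sample records, not given in the source.
    static final int RECORD_LENGTH = 20;
    static final int YEAR_START = 5;
    static final int YEAR_END = 9;    // end offsets are exclusive
    static final int TEMP_START = 15;
    static final int TEMP_END = 19;

    /** Returns {year, temperature} parsed from one fixed-width record. */
    static int[] parse(String record) {
        if (record.length() != RECORD_LENGTH) {
            throw new IllegalArgumentException(
                "expected " + RECORD_LENGTH + " characters, got " + record.length());
        }
        int year = Integer.parseInt(record.substring(YEAR_START, YEAR_END));
        int temperature = Integer.parseInt(record.substring(TEMP_START, TEMP_END));
        return new int[] {year, temperature};
    }

    public static void main(String[] args) {
        String[] records = {"aaaaa1960aaaaaa0024a", "bbbbb1951bbbbbb0035a"};
        for (String record : records) {
            int[] fields = parse(record);
            System.out.println(fields[0] + "\t" + fields[1]);
        }
    }
}
```

In a real loader, the same substring logic would presumably sit inside the method that turns each input line into a tuple.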
1. What is Pig? Pig was originally a Hadoop-based parallel processing architecture at Yahoo. Yahoo later donated Pig to Apache (an open source software foundation), which now maintains it as a project. Pig is a Hadoop-based platform for large-scale data analysis, which pro
Win or win? Pig vs Hive!!!
From: http://www.aptibook.com/Articles/Pig-and-hive-advantages-disadvantages-features
This article discusses the features of Pig and Hive.
Developers usually choose a technical system that meets their business needs. In the Hadoop system, Pig
within billions of rows of data in the host. HBase is a database, a NoSQL database that provides the ability to read and write like other databases. Hadoop does not meet real-time needs, but HBase does. If you need real-time access to some data, put it into HBase. You can use Hadoop as a static data warehouse, while HBase acts as a data store for data that may be changed by some operations. 1, HBase for the query, it through the organization of all
Pig: a lightweight scripting language that runs on Hadoop, originally launched by Yahoo, but now in decline. After open-sourcing it, Yahoo itself slowly withdrew from maintaining Pig and left it to enthusiasts in the open source community. Some companies are still using it, but I don't think it's better to use hive than using pi
Installation environment: a single machine
Operating system: Ubuntu 11.04, 64-bit
Hadoop: version 1.0.2, installed in /usr/local/hadoop
Sun JDK: version 1.6.0_31, 64-bit, installed in /usr/local/JDK
Pig: version 0.9.2, installed in /usr/local/pig
Installation steps:
you want to deduplicate a single field, you need to extract that field separately and then apply distinct to it.
13, filter: filters rows, similar to the WHERE condition of a database; the condition returns a Boolean value.
14, foreach: iterates over rows and extracts one or more columns of data.
15, group: grouping, similar to GROUP BY in a database.
16, partition by: same as the partitioner component in Hadoop.
17, join: inner and outer joins, similar to a relational database, in Hadoop and dif
An introduction to the main products of the Hadoop family. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others. Since 2011, China has entered the era of surging big data, and the family software, represented by
Http://blog.sina.com.cn/s/blog_537b7f1a0100m0xc.html
Google's Sawzall, Pig and Microsoft Dryad
Greg recently wrote a blog post about the distributed architectures of Google, Yahoo, and Microsoft: Google's Sawzall, Yahoo's Pig, and Microsoft's Dryad.
This really is an era of information explosion. In this context, the computing that consumes the most CPU will gr
1. Pig Data Model
Bag: Table
Tuple: Row, record
Field: attribute
Pig does not require that each tuple in the same bag has the same number or type of fields.
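As a rough illustration of this data model, the sketch below represents a bag as a Java list of tuples, where each tuple is a list of untyped fields. This is only an analogy in plain Java, not Pig's actual classes; the name `PigDataModelSketch` and the sample values are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Analogy only: a bag is modelled as a list of tuples, a tuple as a list of fields.
// These are NOT Pig's own data classes.
final class PigDataModelSketch {
    /** Builds one tuple (row) from any number of fields of any type. */
    static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    /** A bag whose tuples differ in field count and field types. */
    static List<List<Object>> sampleBag() {
        return Arrays.asList(
            tuple("aaaaa", 1960, 24),   // three fields: String, Integer, Integer
            tuple("bbbbb", "unknown"),  // two fields, second one is a String
            tuple(3.5));                // a single Double field
    }

    public static void main(String[] args) {
        for (List<Object> t : sampleBag()) {
            System.out.println(t.size() + " field(s): " + t);
        }
    }
}
```

Nothing in the bag enforces a shared schema, which mirrors the point that tuples in the same bag need not have the same number or type of fields.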
2. Common Pig Latin statements
1) load: specifies how data is loaded.
2) foreach: scans row by row and applies some processing to each row.
3) filter: filters rows.
4) dump: displays the result.
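Pig Latin is its own language, so the following is not Pig code. It is a loose plain-Java analogy of how the four statements chain together, using Java streams. The class name `PigStatementAnalogy`, the tab-separated "year, temperature" input, and the threshold are all assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Loose analogy in plain Java; real Pig scripts use load/foreach/filter/dump directly.
final class PigStatementAnalogy {
    /** Analogy of "load": split each tab-separated line into a row of fields. */
    static Stream<String[]> load(List<String> lines) {
        return lines.stream().map(line -> line.split("\t"));
    }

    /** Analogy of "filter" followed by "foreach": keep warm rows, project the year. */
    static List<String> yearsAtOrAbove(List<String> lines, int minTemperature) {
        return load(lines)
            .filter(row -> Integer.parseInt(row[1]) >= minTemperature) // like filter
            .map(row -> row[0])                                        // like foreach
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1960\t24", "1951\t35", "1952\t48");
        // Analogy of "dump": print the resulting rows.
        yearsAtOrAbove(lines, 30).forEach(System.out::println);
    }
}
```

The analogy covers the order of operations only. In Pig, each statement defines a relation, and the work is carried out on Hadoop rather than in a single local process.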
function is built on Hadoop's InputFormat, and its base class is LoadFunc. LoadFunc's default implementation targets HDFS, and Pig provides the prepareToRead method, which gives a load function a way to initialize itself. Once the user's load function implements the getSchema method, the LOAD statement no longer needs to define the schema. Similarly, storage functions are built on
(ST) ;}}
For the delimiter types that the load function supports when loading, you can refer to the documentation on the official website. Here is the code in the Pig script:
--hadoop Technology Exchange Group:415886155
/* Pig supports the following separators:
1, any string,
2, any escape character,
3,dec characters \\u