Impala, Hive Row/Column Transposition

Source: Internet
Author: User
Tags: explode
Hive

For Hive, I use collect_set() + concat_ws() from https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF. But if you want to keep duplicated elements, writing your own UDF seems to be the only choice so far, since collect_set() removes duplicates. (Newer Hive versions also ship collect_list(), which keeps them.)

hive> select uid, concat_ws(',', collect_set(tag)) from test group by uid;
FAILED: SemanticException [Error 10016]: Line 1:27 Argument type mismatch 'tag': Argument 2 of function CONCAT_WS must be "string or array<string>", but "array<int>" was found.

hive> select uid, concat_ws(',', collect_set(cast(tag as string))) from test group by uid;
...
Job 0: Map: 3   Reduce: 1   Cumulative CPU: 8.43 sec   HDFS Read: 890 HDFS Write: 18 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 430 msec
OK
1   2,1,3
2   1,4
3   5

Impala

Impala also has a group_concat(), but it differs from MySQL's:
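For comparison, MySQL's built-in version takes an optional ORDER BY and a SEPARATOR keyword rather than a second argument (a sketch, assuming the same test table as the Hive example):

```sql
-- MySQL: the separator is given with the SEPARATOR keyword,
-- not as a second argument as in Impala
SELECT uid,
       GROUP_CONCAT(tag ORDER BY tag SEPARATOR ',') AS tag_list
FROM test
GROUP BY uid;
```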

group_concat(string s [, string sep])
Purpose: Returns a single string representing the argument value concatenated together for each row of the result set. If the optional separator string is specified, the separator is added between each pair of concatenated values.
Return type: STRING

Usage notes: concat() and concat_ws() are appropriate for concatenating the values of multiple columns within the same row, while group_concat() joins together values from different rows.

By default, returns a single string covering the whole result set. To include other columns or values in the result set, or to produce multiple concatenated strings for subsets of rows, include a GROUP BY clause in the query.

group_concat(string s [, string sep]) is used in conjunction with GROUP BY, as in group_concat(field, separator). For example:
[hadoop4.xxx.com:21000] > select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid;
Query: select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid
+-----+----------+
| uid | tag_list |
+-----+----------+
| 3   | 5        |
| 2   | 1,4      |
| 1   | 1,2,3    |
+-----+----------+
Returned 3 row(s) in 0.68s

Rows to Columns

From:

+-----+-----+-----+
| uid | tag | val |
+-----+-----+-----+
| 1   | 1   | 1   |
| 1   | 2   | 0   |
| 1   | 3   | 1   |
| 2   | 1   | 1   |
| 2   | 4   | 0   |
| 3   | 5   | 1   |
+-----+-----+-----+

To:

+-----+----------+----------+----------+----------+----------+
| uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+-----+----------+----------+----------+----------+----------+
| 1   | 1        | 0        | 1        | 0        | 0        |
| 2   | 1        | 0        | 0        | 0        | 0        |
| 3   | 0        | 0        | 0        | 0        | 1        |
+-----+----------+----------+----------+----------+----------+

[hadoop4.xxx.com:21000] > select
                        >     uid,
                        >     max(case when tag = 1 then val else 0 end) as tag1_val,
                        >     max(case when tag = 2 then val else 0 end) as tag2_val,
                        >     max(case when tag = 3 then val else 0 end) as tag3_val,
                        >     max(case when tag = 4 then val else 0 end) as tag4_val,
                        >     max(case when tag = 5 then val else 0 end) as tag5_val
                        > from test2
                        > group by uid;
+-----+----------+----------+----------+----------+----------+
| uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+-----+----------+----------+----------+----------+----------+
| 3   | 0        | 0        | 0        | 0        | 1        |
| 2   | 1        | 0        | 0        | 0        | 0        |
| 1   | 1        | 0        | 1        | 0        | 0        |
+-----+----------+----------+----------+----------+----------+
Returned 3 row(s) in 0.99s

Columns to Rows

Comma separated string to rows

From:

+-----+----------+
| uid | tag_list |
+-----+----------+
| 1   | 1,2,3    |
| 2   | 1,4      |
| 3   | 5        |
+-----+----------+

To:

+-----+-----+
| uid | tag |
+-----+-----+
| 1   | 1   |
| 1   | 2   |
| 1   | 3   |
| 2   | 1   |
| 2   | 4   |
| 3   | 5   |
+-----+-----+

MySQL

UNION [ALL] SELECT seems to be a solution.
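The UNION ALL trick can be sketched like this: build an inline numbers table with UNION ALL, then pick out the n-th element of the comma list with SUBSTRING_INDEX(). A hedged sketch only — table and column names are assumed from the examples above, and the numbers table must cover the longest list:

```sql
-- Hypothetical MySQL sketch: split test4.tag_list into one row per tag.
-- The derived table n supplies indexes 1..3; extend it for longer lists.
SELECT t.uid,
       SUBSTRING_INDEX(SUBSTRING_INDEX(t.tag_list, ',', n.n), ',', -1) AS tag
FROM test4 t
JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3) n
  ON n.n <= LENGTH(t.tag_list) - LENGTH(REPLACE(t.tag_list, ',', '')) + 1;
```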

And ... a stored procedure or a UDF?

Hive

Lateral View is awesome! I tried explode(), which splits an array into rows, and before that split(), which splits a string into an array.

hive> select uid, tag from test4 lateral view explode(split(tag_list, ',')) tag_table as tag;
...
Job 0: Map: 1   Cumulative CPU: 1.69 sec   HDFS Read: 293 HDFS Write: 24 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
1   1
1   2
1   3
2   1
2   4
3   5
Time taken: 12.894 seconds

Presto
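Possibly worth trying here: Presto's split() returns an array, and CROSS JOIN UNNEST can expand it into rows (an untested sketch, reusing the table names from the Hive example):

```sql
-- Presto sketch: split(tag_list, ',') yields array(varchar);
-- UNNEST expands it to one row per element
SELECT t.uid, x.tag
FROM test4 t
CROSS JOIN UNNEST(split(t.tag_list, ',')) AS x (tag);
```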

Not figured out.

Impala

Not figured out.

Columns to Rows

From:

+-----+----------+----------+----------+----------+----------+
| uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+-----+----------+----------+----------+----------+----------+
| 1   | 1        | 0        | 1        | 0        | 0        |
| 2   | 1        | 0        | 0        | 0        | 0        |
| 3   | 0        | 0        | 0        | 0        | 1        |
+-----+----------+----------+----------+----------+----------+

To:

+-----+-----+-----+
| uid | tag | val |
+-----+-----+-----+
| 1   | 1   | 1   |
| 1   | 2   | 0   |
| 1   | 3   | 1   |
| 2   | 1   | 1   |
| 2   | 4   | 0   |
| 3   | 5   | 1   |
+-----+-----+-----+
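Going from the wide table back to (uid, tag, val) works in any of these engines with one SELECT per column glued together by UNION ALL (a portable sketch; the wide table's name is assumed). Note it emits a row for every (uid, tag) combination, not just the pairs present in the original narrow table, since a 0 value and a missing pair are indistinguishable after the pivot:

```sql
-- Portable un-pivot sketch: one branch per tagN_val column
SELECT uid, 1 AS tag, tag1_val AS val FROM test2_wide
UNION ALL SELECT uid, 2, tag2_val FROM test2_wide
UNION ALL SELECT uid, 3, tag3_val FROM test2_wide
UNION ALL SELECT uid, 4, tag4_val FROM test2_wide
UNION ALL SELECT uid, 5, tag5_val FROM test2_wide;
```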

