Hive
For Hive, I use collect_set() + concat_ws(), documented at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF.
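For reference, a minimal sketch of the whole aggregation, assuming a table test(uid INT, tag INT) like the one used in the transcripts below; note that collect_set() always de-duplicates, while collect_list() (available since Hive 0.13) keeps duplicates:

```sql
-- One comma separated tag_list per uid; collect_set() drops duplicate tags.
SELECT uid, concat_ws(',', collect_set(CAST(tag AS STRING))) AS tag_list
FROM test
GROUP BY uid;

-- On Hive 0.13+, collect_list() keeps duplicates (assumes a new enough Hive).
SELECT uid, concat_ws(',', collect_list(CAST(tag AS STRING))) AS tag_list
FROM test
GROUP BY uid;
```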
But if you want to keep duplicated elements, writing your own UDF seems to be the only choice for now, since collect_set() always de-duplicates.

hive> SELECT uid, concat_ws(',', collect_set(tag)) FROM test GROUP BY uid;
FAILED: SemanticException [Error 10016]: Line 1:27 Argument type mismatch 'tag': Argument 2 of function CONCAT_WS must be "string or array<string>", but "array<int>" was found.

hive> SELECT uid, concat_ws(',', collect_set(CAST(tag AS STRING))) FROM test GROUP BY uid;
...
Job 0: Map: 3  Reduce: 1  Cumulative CPU: 8.43 sec  HDFS Read: 890  HDFS Write: 18  SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 430 msec
OK
1    2,1,3
2    1,4
3    5

Impala
Impala also has a group_concat(), but it differs from MySQL's. From the Impala documentation:
group_concat(string s [, string sep])
Purpose: Returns a single string representing the argument value concatenated together for each row of the result set. If the optional separator string is specified, the separator is added between each pair of concatenated values.
Return type: STRING
Usage notes: concat() and concat_ws() are appropriate for concatenating the values of multiple columns within the same row, while group_concat() joins together values from different rows.
By default, it returns a single string covering the whole result set. To include other columns or values in the result set, or to produce multiple concatenated strings for subsets of rows, include a GROUP BY clause in the query.
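The within-row versus across-rows distinction is easy to see side by side; a minimal sketch, with the table and column names (people, first_name, last_name) made up for illustration:

```sql
-- concat_ws() joins values from multiple columns of the SAME row:
SELECT concat_ws(' ', first_name, last_name) AS full_name FROM people;

-- group_concat() joins values from DIFFERENT rows, one string per group
-- (or one string for the whole result set without a GROUP BY):
SELECT group_concat(first_name, ',') AS all_names FROM people;
```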
group_concat(string s [, string sep]) is used in conjunction with GROUP BY, as group_concat(field, separator), as in the following example:
[hadoop4.xxx.com:21000] > select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid;
Query: select uid, group_concat(cast(tag as string), ',') as tag_list from test3 group by uid
+-----+----------+
| uid | tag_list |
+-----+----------+
| 3   | 5        |
| 2   | 1,4      |
| 1   | 1,2,3    |
+-----+----------+
Returned 3 row(s) in 0.68s

Rows to Columns

from:

+------+------+------+
| uid  | tag  | val  |
+------+------+------+
| 1    | 1    | 1    |
| 1    | 2    | 0    |
| 1    | 3    | 1    |
| 2    | 1    | 1    |
| 2    | 4    | 0    |
| 3    | 5    | 1    |
+------+------+------+

to:

+------+----------+----------+----------+----------+----------+
| uid  | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+------+----------+----------+----------+----------+----------+
| 1    | 1        | 0        | 1        | 0        | 0        |
| 2    | 1        | 0        | 0        | 0        | 0        |
| 3    | 0        | 0        | 0        | 0        | 1        |
+------+----------+----------+----------+----------+----------+

One max(case when ... then ... else ... end) expression per tag does the pivot:

[hadoop4.xxx.com:21000] > select
                        >     uid,
                        >     max(case when tag = 1 then val else 0 end) as tag1_val,
                        >     max(case when tag = 2 then val else 0 end) as tag2_val,
                        >     max(case when tag = 3 then val else 0 end) as tag3_val,
                        >     max(case when tag = 4 then val else 0 end) as tag4_val,
                        >     max(case when tag = 5 then val else 0 end) as tag5_val
                        > from test2
                        > group by uid;
Query: select uid, max(case when tag = 1 then val else 0 end) as tag1_val, max(case when tag = 2 then val else 0 end) as tag2_val, max(case when tag = 3 then val else 0 end) as tag3_val, max(case when tag = 4 then val else 0 end) as tag4_val, max(case when tag = 5 then val else 0 end) as tag5_val from test2 group by uid
+-----+----------+----------+----------+----------+----------+
| uid | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+-----+----------+----------+----------+----------+----------+
| 3   | 0        | 0        | 0        | 0        | 1        |
| 2   | 1        | 0        | 0        | 0        | 0        |
| 1   | 1        | 0        | 1        | 0        | 0        |
+-----+----------+----------+----------+----------+----------+
Returned 3 row(s) in 0.99s

Columns to Rows

Comma separated string to rows

from:

+-----+----------+
| uid | tag_list |
+-----+----------+
| 1   | 1,2,3    |
| 2   | 1,4      |
| 3   | 5        |
+-----+----------+

to:

+-----+-----+
| uid | tag |
+-----+-----+
| 1   | 1   |
| 1   | 2   |
| 1   | 3   |
| 2   | 1   |
| 2   | 4   |
| 3   | 5   |
+-----+-----+
UNION [ALL] SELECT seems to be a solution.

Mysql
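One UDF-free workaround in MySQL joins the comma separated string against a small numbers table and picks each element out with SUBSTRING_INDEX(); a sketch, reusing the test4(uid, tag_list) layout from the Hive example and assuming a helper table nums(n) holding 1, 2, 3, ... up to the maximum list length:

```sql
-- nums.n enumerates element positions; the join condition counts the
-- separators in tag_list to decide how many elements each row has.
SELECT t.uid,
       SUBSTRING_INDEX(SUBSTRING_INDEX(t.tag_list, ',', nums.n), ',', -1) AS tag
FROM test4 t
JOIN nums
  ON nums.n <= 1 + LENGTH(t.tag_list) - LENGTH(REPLACE(t.tag_list, ',', ''))
ORDER BY t.uid, nums.n;
```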
And ... a stored procedure or a UDF?

Hive
Lateral View is awesome!
I tried explode(), which splits an array into rows, together with split(), which splits a string into an array.

hive> SELECT uid, tag FROM test4 LATERAL VIEW explode(split(tag_list, ',')) tag_table AS tag;
...
Job 0: Map: 1  Cumulative CPU: 1.69 sec  HDFS Read: 293  HDFS Write: 24  SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 690 msec
OK
1    1
1    2
1    3
2    1
2    4
3    5
Time taken: 12.894 seconds
hive>

Presto
Not figured out.

Impala
Not figured out.

Columns to Rows

from:

+------+----------+----------+----------+----------+----------+
| uid  | tag1_val | tag2_val | tag3_val | tag4_val | tag5_val |
+------+----------+----------+----------+----------+----------+
| 1    | 1        | 0        | 1        | 0        | 0        |
| 2    | 1        | 0        | 0        | 0        | 0        |
| 3    | 0        | 0        | 0        | 0        | 1        |
+------+----------+----------+----------+----------+----------+

to:

+------+------+------+
| uid  | tag  | val  |
+------+------+------+
| 1    | 1    | 1    |
| 1    | 2    | 0    |
| 1    | 3    | 1    |
| 2    | 1    | 1    |
| 2    | 4    | 0    |
| 3    | 5    | 1    |
+------+------+------+
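The UNION [ALL] SELECT idea mentioned earlier, sketched for this wide table (assuming it is named test2_wide — a made-up name; the same pattern works in MySQL, Hive, and Impala):

```sql
-- One SELECT per tagN_val column, stacked with UNION ALL to unpivot
-- the wide table back into (uid, tag, val) rows.
SELECT uid, 1 AS tag, tag1_val AS val FROM test2_wide
UNION ALL
SELECT uid, 2 AS tag, tag2_val AS val FROM test2_wide
UNION ALL
SELECT uid, 3 AS tag, tag3_val AS val FROM test2_wide
UNION ALL
SELECT uid, 4 AS tag, tag4_val AS val FROM test2_wide
UNION ALL
SELECT uid, 5 AS tag, tag5_val AS val FROM test2_wide;
```

Plain UNION would also drop duplicate (uid, tag, val) rows; UNION ALL keeps every row and is cheaper.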