Hive lets you plug custom mappers and reducers into a query through external scripts, using the TRANSFORM clause.
By default, the columns passed to TRANSFORM are converted to strings, joined with tabs ('\t'), and fed to the user script on its standard input. A NULL value in the input is converted to the literal string '\N'. The script's standard output is likewise split into columns on '\t', and the literal '\N' is converted back to NULL. Note that if a column value itself contains '\t', the user must handle those tabs manually (for example by replacing or escaping them), otherwise the script will see misaligned columns. Here is a sample:
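FROM (
  FROM pv_users
  SELECT TRANSFORM(pv_users.userid, pv_users.date)
  USING 'map_script'
  AS dt, uid
  CLUSTER BY dt
) map_output
SELECT TRANSFORM(map_output.dt, map_output.uid)
USING 'reduce_script';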
The output of 'map_script' is split on '\t' into two fields, dt and uid. If no type is specified for an output column, it is treated as a string.
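To make the mapping from script output to the dt and uid fields concrete, here is a minimal sketch of what a script like 'map_script' might look like in Python. The original does not show the script, so everything here is an assumption: the function names map_lines and run are hypothetical, and the logic simply swaps the two incoming columns (userid, date) into the order (dt, uid).

```python
import sys


def map_lines(lines):
    """Yield 'dt<TAB>uid' for each 'userid<TAB>date' input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            # Assumption: silently skip rows that do not have exactly two columns.
            continue
        userid, date = fields
        # Values stay plain strings; a NULL arrives and leaves as the text \N.
        yield date + "\t" + userid


def run(stdin, stdout):
    """Stream rows from stdin to stdout, one output line per valid input line."""
    for out in map_lines(stdin):
        stdout.write(out + "\n")


# When saving this as a standalone script, call:
# run(sys.stdin, sys.stdout)
```

Because the script only writes tab-separated text, Hive would read both output fields as strings unless the query declares other types.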
In this way, functionality similar to a UDF can be implemented in a script (shell, Python, and so on).
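As an illustration of UDF-style logic in a script, the sketch below applies the default conventions described earlier: tab-separated columns and the literal \N for NULL. All names (decode_row, encode_row, apply_udf, upper_first) are hypothetical, and replacing embedded tabs with spaces is just one possible way to handle them, not something the original prescribes.

```python
NULL_MARKER = r"\N"  # literal backslash-N, the text form of NULL in the stream


def decode_row(line):
    """Split one input line on tabs, mapping the NULL marker to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if f == NULL_MARKER else f for f in fields]


def encode_row(values):
    """Join values with tabs, writing None as the NULL marker."""
    cells = []
    for v in values:
        if v is None:
            cells.append(NULL_MARKER)
        else:
            # Assumption: replace embedded tabs/newlines with spaces so they
            # cannot create extra columns or rows in the output.
            cells.append(str(v).replace("\t", " ").replace("\n", " "))
    return "\t".join(cells)


def apply_udf(lines, row_fn):
    """Apply row_fn to every decoded row and yield the encoded result."""
    for line in lines:
        yield encode_row(row_fn(decode_row(line)))


def upper_first(row):
    """Hypothetical row function: upper-case the first column, keep NULLs."""
    head = row[0]
    return [head if head is None else head.upper()] + row[1:]
```

In a real script one would feed sys.stdin to apply_udf and print each yielded line; keeping the row function separate makes it easy to test without Hive.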
Introduction to Hive Transform functions