Goal: for each GID in the table test_gid2, find the rows whose installed-application list contains any of the package names listed in the file pkgs.txt. The pkgs.txt file must be placed in the same path as the Python script. With TRANSFORM, the incoming rows reach the script as tab-separated text regardless of the original file delimiter, so the Python code splits each line on "\t".

The test_gid2 table contains the fields: gid, phone_model, usertags, installed_applist

Contents of the test4.py script:

    import codecs
    import re
    import sys

    # Load the package names to look for, one per line.
    lt1 = []
    f1 = codecs.open('pkgs.txt', 'r', 'utf-8')
    for i in f1.readlines():
        line = i.strip()
        lt1.append(line)
    f1.close()

    for lines in sys.stdin:
        # Strip only the newline so that empty trailing fields are kept.
        arr = lines.rstrip('\n').split('\t')
        # installed_applist is the last field; entries are separated by ";" or ",".
        pkgs = re.split(";|,", arr[-1])
        for j in lt1:
            if j in pkgs:
                print('\t'.join(arr))
                # Avoid emitting the same row twice: once one package matches,
                # leave this loop and move on to the next row.
                break

Note: two files have to be added to Hive (for example with ADD FILE) before running the query. One is the Python script, test4.py; the other is the text file, pkgs.txt.

SQL script:

    SELECT TRANSFORM(gid, phone_model, usertags, installed_applist)
    USING 'python test4.py'
    AS (gid, phone_model, usertags, installed_applist)
    FROM test_gid2;

=================
Format of the pkgs.txt file, one package name per line:
com.tencent.mm
com.tencent.mobileqq
cn.testin.allintest
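
To check the matching logic locally before submitting the Hive job, something like the following sketch may help. It is not the original script: the function names load_pkgs and filter_rows and the sample rows are hypothetical, and it swaps the nested loop for a set intersection, which should give the same rows under the assumption that each row is tab-separated with installed_applist as the last field.

```python
import re


def load_pkgs(lines):
    """Build a set of package names from an iterable of lines, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def filter_rows(rows, wanted):
    """Yield each tab-separated row whose last field lists a wanted package."""
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        # The app list uses ";" or "," as separators, as in test4.py.
        installed = set(re.split(r"[;,]", fields[-1]))
        if installed & wanted:
            yield "\t".join(fields)


# Hypothetical stand-ins for pkgs.txt and for rows arriving on stdin.
wanted = load_pkgs(["com.tencent.mm\n", "com.tencent.mobileqq\n", "cn.testin.allintest\n"])
sample = [
    "g1\tmodelA\ttag1\tcom.tencent.mm;com.example.app\n",
    "g2\tmodelB\ttag2\tcom.example.other,com.example.app\n",
    "g3\tmodelC\ttag3\tcn.testin.allintest,com.tencent.mm\n",
]
matched = list(filter_rows(sample, wanted))
for row in matched:
    print(row)
```

Because each row is tested once against the whole set, a row that contains several of the listed packages is still emitted only once, which is the effect the break statement has in test4.py.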