Hadoop
Unzip a GZ file on HDFS to a text file:
$ hadoop fs -text /hdfs_path/compressed_file.gz | hadoop fs -put - /tmp/uncompressed-file.txt
Unzip a local GZ file and upload it to HDFS:
$ gunzip -c filename.txt.gz | hadoop fs -put - /tmp/filename.txt
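The streaming step can be tried locally without a cluster; `hadoop fs -put -` simply reads the decompressed stream from stdin, so the interesting part is the pipe itself. A minimal local sketch (file names are illustrative):

```shell
# Create a sample gzipped file (made-up contents).
printf 'line1\nline2\n' > filename.txt
gzip -c filename.txt > filename.txt.gz

# Stream-decompress to a file; in the HDFS version this stream
# would instead be piped into `hadoop fs -put - /tmp/filename.txt`.
gunzip -c filename.txt.gz > uncompressed.txt

diff filename.txt uncompressed.txt && echo "round trip OK"
```

Nothing is ever written to an intermediate decompressed file on the way to HDFS; the data flows through the pipe.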
Use awk to process CSV files (see "Using Awk and Friends with Hadoop"):
$ hadoop fs -cat people.txt | awk -F"," '{print $1","$2","$3$4$5}' | hadoop fs -put - people-coalesced.txt
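The awk stage can be checked on its own, since the `hadoop fs -cat` and `hadoop fs -put -` ends of the pipe only move bytes. A sketch with made-up five-column data, keeping columns 1 and 2 and concatenating columns 3 through 5:

```shell
# Illustrative five-column CSV standing in for people.txt on HDFS.
printf 'a1,a2,a3,a4,a5\nb1,b2,b3,b4,b5\n' > people.txt

# -F"," splits on commas; $3$4$5 with no separator glues those
# three fields together into a single output column.
awk -F"," '{print $1","$2","$3$4$5}' people.txt > people-coalesced.txt

cat people-coalesced.txt
# a1,a2,a3a4a5
# b1,b2,b3b4b5
```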
Create an LZO file, upload it to HDFS, and add an index:
$ lzop -Uf data.txt
$ hadoop fs -moveFromLocal data.txt.lzo /tmp/
# 1. Standalone
$ hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer /tmp/data.txt.lzo
# 2. As a MapReduce job
$ hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer /tmp/data.txt.lzo
If people.txt is LZO-compressed, you can decompress, transform, and recompress it in a single pipeline:
$ hadoop fs -cat people.txt.lzo | lzop -dc | awk -F"," '{print $1","$2","$3$4$5}' | lzop -c | hadoop fs -put - people-coalesced.txt.lzo
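The decompress-transform-recompress pattern is the same for any streaming codec. A local sketch using gzip as a stand-in for lzop (`gzip -dc` plays the role of `lzop -dc`; file names and data are illustrative):

```shell
# Illustrative compressed input.
printf 'a,b,c,d,e\n' > people.txt
gzip -c people.txt > people.txt.gz

# Decompress, transform, recompress. No uncompressed copy ever
# touches the disk between the stages, just as in the HDFS pipeline.
gzip -dc people.txt.gz | awk -F"," '{print $1","$2","$3$4$5}' | gzip -c > people-coalesced.txt.gz

gzip -dc people-coalesced.txt.gz
# a,b,cde
```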
Hive
Running in the background:
$ nohup hive -f sample.hql > output.out 2>&1 &
$ nohup hive --database "default" -e "SELECT * FROM table_name;" > output.out 2>&1 &
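The nohup pattern can be rehearsed with any command in place of hive; the points are that stderr is merged into stdout (`2>&1`), both land in output.out, and the shell prompt returns immediately. A sketch with a short sleep standing in for a long Hive job:

```shell
# Stand-in for a long-running hive invocation.
nohup sh -c 'echo started; sleep 1; echo done' > output.out 2>&1 &

# In real use you would log out here; we wait only so we can inspect the log.
wait $!
cat output.out
```

Because the process ignores SIGHUP, it survives the terminal session ending.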
Replace the delimiter:
$ hive --database "default" -f query.hql 2> err.txt | sed 's/[\t]/,/g' 1> output.txt
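The sed stage converts Hive's tab-separated output to CSV and can be tested without a Hive installation. Note that `\t` inside a bracket expression is a GNU sed extension; data below is made up:

```shell
# Two tab-separated rows standing in for the query output.
printf '1\talice\t30\n2\tbob\t25\n' > query_out.txt

# Replace every tab with a comma (GNU sed understands \t here).
sed 's/[\t]/,/g' query_out.txt 1> output.txt

cat output.txt
# 1,alice,30
# 2,bob,25
```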
To print a table header:
$ hive --database "default" -e "SHOW COLUMNS FROM table_name;" | tr '[:lower:]' '[:upper:]' | tr '\n' ',' 1> headers.txt
$ hive --database "default" -e "SET hive.cli.print.header=true; SELECT * FROM table_name LIMIT 0;" | tr '[:lower:]' '[:upper:]' | sed 's/[\t]/,/g' 1> headers.txt
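The tr stages of the first variant can be exercised on their own. `SHOW COLUMNS` emits one column name per line, so upper-casing and then folding newlines into commas yields a one-line header (column names below are made up):

```shell
# Stand-in for `hive -e "SHOW COLUMNS FROM table_name;"` output.
printf 'id\nname\nage\n' > cols.txt

# Upper-case, then join lines with commas.
tr '[:lower:]' '[:upper:]' < cols.txt | tr '\n' ',' 1> headers.txt

cat headers.txt
# ID,NAME,AGE,
```

Note the trailing comma: `tr '\n' ','` also converts the final newline, so you may want to trim the last character before prepending the header to a CSV.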
To view query execution times:
$ hive -e "SELECT * FROM table_name;" 2> err.txt 1> out.txt
$ cat err.txt | grep "Time taken:" | awk '{print $3,$6}'
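The grep/awk step can be checked against a fabricated stderr line. Assuming Hive's usual "Time taken: N seconds, Fetched: M row(s)" format, field 3 is the elapsed seconds and field 6 is the row count:

```shell
# Fabricated stderr line in Hive's reported format.
printf 'Time taken: 4.21 seconds, Fetched: 100 row(s)\n' > err.txt

# $3 = elapsed seconds, $6 = rows fetched (awk splits on whitespace).
grep "Time taken:" err.txt | awk '{print $3,$6}'
# 4.21 100
```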