Basic Linux Tutorial: Using awk to Delete HDFS Data Older Than a Specified Date
Business background
By agreement, HDFS data more than five days old is treated as an expired version. An awk script is used to delete the expired versions automatically.
$ hadoop fs -ls /user/pms/workspace/ouyangyewei/data
Found 9 items
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-01
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-02
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-03
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-04
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-05
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-06
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-07
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/
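The expiry rule relies on the fact that directory names in YYYY-MM-DD form sort the same way as the dates they represent, so a plain string comparison is enough. The following is a minimal sketch of that comparison only. The helper name is_expired is hypothetical, and the cutoff 2015-08-06 is an assumed value chosen to be consistent with the listings in this article.

```shell
#!/usr/bin/env bash

# is_expired NAME CUTOFF
# Hypothetical helper: succeeds (exit status 0) when NAME sorts before CUTOFF.
# Both arguments are assumed to be YYYY-MM-DD strings, which awk compares
# as strings because they do not look like numbers.
is_expired() {
    awk -v name="$1" -v cutoff="$2" 'BEGIN { exit !(name < cutoff) }'
}

# Assumed cutoff for illustration; a real run would compute it from today.
cutoff="2015-08-06"

for name in 2015-08-01 2015-08-05 2015-08-06 2015-08-07; do
    if is_expired "$name" "$cutoff"; then
        echo "$name expired"
    else
        echo "$name kept"
    fi
done
```

With this cutoff, a directory named exactly 2015-08-06 is kept, because the comparison is strictly "less than".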
Script implementation
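#---------------------------------------------------------
#
# Delete expired versions (data more than five days old)
#
#---------------------------------------------------------
# strftime() and systime() are gawk extensions. $8 is the path column of the
# hadoop fs -ls output; NF >= 8 skips the "Found N items" header line.
old_version=$(hadoop fs -ls /user/pms/workspace/ouyangyewei/data | awk 'BEGIN {five_days_ago = strftime("%F", systime() - 5*24*3600)} NF >= 8 {split($8, arr, "/"); if (arr[7] < five_days_ago) {printf "%s\n", $8}}')
# Word splitting turns the newline-separated paths into array elements.
arr=($old_version)
for version in "${arr[@]}"
do
    # -rmr is deprecated in newer Hadoop releases in favor of: hadoop fs -rm -r
    hadoop fs -rmr "$version"
done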
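Because the loop above deletes data irreversibly, it can help to check the awk filter on its own before running it against HDFS. The sketch below is one possible dry run, not the author's method. The function name select_expired, the /demo/data paths, and the sample listing are made up for illustration, and the eight-column layout is an assumption based on the script's use of $8. It takes the cutoff as an argument and uses only POSIX awk features, so it does not need gawk.

```shell
#!/usr/bin/env bash

# select_expired CUTOFF
# Hypothetical helper: reads "hadoop fs -ls"-style lines on stdin and prints
# the paths whose last component is a YYYY-MM-DD name earlier than CUTOFF.
select_expired() {
    awk -v cutoff="$1" '
        NF >= 8 {                        # skip the "Found N items" header
            n = split($NF, parts, "/")   # the last field is assumed to be the path
            name = parts[n]
            # Only consider names that look like dates, then compare as strings.
            if (name ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && name < cutoff)
                print $NF
        }'
}

# Made-up sample listing in the assumed eight-column layout.
sample='Found 3 items
drwxr-xr-x   - pms pms          0 2015-08-10 01:00 /demo/data/2015-08-05
drwxr-xr-x   - pms pms          0 2015-08-10 01:00 /demo/data/2015-08-06
drwxr-xr-x   - pms pms          0 2015-08-10 01:00 /demo/data/tmp'

# Dry run: report what would be removed instead of calling hadoop.
expired=$(printf '%s\n' "$sample" | select_expired "2015-08-06")
echo "would delete: $expired"
```

Taking the last path component instead of a fixed index such as arr[7] keeps the filter independent of how deep the data directory sits, and the date pattern check keeps non-date entries out of the delete list.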
After execution
$ hadoop fs -ls /user/pms/workspace/ouyangyewei/data
Found 4 items
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-06
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/2015-08-07
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/
drwxr-xr-x   - pms          0 /user/pms/workspace/ouyangyewei/data/