Symptom: every few days, one hour of data is missing from the hourly Hive table. Investigation showed that the cat step that aggregates the 5-minute data into the hourly file was failing.
Solution: modify the script as follows, so that the merged file is checked after the cat:
    ## merge 5min data into hour data
    cat $datapath/news_5min_$xhour* > $localpath/data/channelnews_$hour.txt

    #### check
    tmppath="${localpath}/data/channelnews_${hour}.txt"
    i=0
    while (( i < ... ))
    do
        # size of the merged file in bytes
        m=`du -b $tmppath | awk '{print int($1)}'`
        if [ $m -lt ... ]
        then
            echo "${tmppath} is small, is $m"
            sleep 5
        else
            break
        fi
        let "i++"
    done
    echo "i is: $i"
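The check in the script waits and measures the merged file again, but it does not rerun cat or look at its exit status. As a rough sketch of a stricter variant (not the original author's script): the function below reruns the merge until the output size equals the combined size of the inputs, and only then moves the file into place. The names merge_with_check, max_tries and WAIT_SECS, and the values 3 and 5, are assumptions made for illustration.

```shell
# Sketch only: merge input files into one output file and verify the result.
# merge_with_check, max_tries and WAIT_SECS are hypothetical names; the
# retry limit (3) and pause (5 seconds) are assumed values.
merge_with_check() {
    local out="$1"
    shift
    local max_tries=3
    local wait_secs="${WAIT_SECS:-5}"
    local i=1 expected=0 size actual f

    # At least one input file is required.
    [ "$#" -gt 0 ] || return 1

    # Expected size is the sum of the input sizes in bytes.
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1
        expected=$((expected + size))
    done

    while [ "$i" -le "$max_tries" ]; do
        # Write to a temporary name so readers never see a partial file.
        if cat -- "$@" > "$out.tmp"; then
            actual=$(wc -c < "$out.tmp")
            if [ "$actual" -eq "$expected" ]; then
                mv -- "$out.tmp" "$out"
                return 0
            fi
        fi
        echo "attempt $i: merge into $out incomplete" >&2
        sleep "$wait_secs"
        i=$((i + 1))
    done

    rm -f -- "$out.tmp"
    return 1
}
```

In the script above it could be called as merge_with_check "$tmppath" $datapath/news_5min_$xhour*, loading the file into Hive only when the function returns 0. If the glob matches nothing, the unexpanded pattern is passed as a file name and the function returns 1 instead of producing an empty hourly file.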
An unreliable shell cat command causes the hourly Hive table to be missing one hour of data every few days