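Linux: sorting a file by column and using awk to get the top N per group

Goal: the data is in Hive and holds the total traffic of users in each category. We need to retrieve the top 10 users by traffic for each category. In Hive we can use ORDER BY categoryId, traffic DESC to sort the data, but that does not give us the top 10 for each categoryId. LIMIT applies to the entire final result, not to each category, so it cannot be used here. In the end, we decided to export the result as text and use awk to take the top 10.

Script:

    hive -e "select category, traffic from log_table where pt = $yesterday order by category, traffic desc" | awk '{ if (cate[$1] < 10) { cate[$1]++; print $0 } }' > result.txt

This takes the top 10 rows of each category. Within each category the rows arrive sorted by traffic in descending order, so the first 10 rows awk sees for a category are its top 10. The cate array counts how many rows have already been printed for each category. The drawback is that the whole dataset has to be sorted first, which may not work for very large data volumes, although the awk pass itself is linear.

Originally the data was exported with a plain SELECT, without ORDER BY. In that case you must first run a large sort, then awk, and finally sort again (here result.txt is the unsorted export):

    sort -k2,2nr result.txt | awk '{ if (cate[$1] < 10) { cate[$1]++; print $0 } }' | sort -k1,1 -k2,2nr

The first sort orders all rows by traffic (column 2), numerically and in descending order. awk then keeps the first 10 rows of each category, and the last sort groups the result by category with traffic still in descending order. Because of the sorting, this approach is sensitive to the data volume. In the end, we decided to do the sorting in Hive, which is the first script.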
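The sort + awk + sort pipeline can be tried out locally with a small, self-contained sketch. It assumes whitespace-separated "category traffic" lines like the Hive output; the function name top_n_per_category, its parameter n, and the sample data are hypothetical and only for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: top-N rows per category using sort + awk.
# Assumption: input lines are whitespace-separated "category traffic",
# like the Hive query output. The function name and sample data are
# hypothetical, made up for illustration.

# top_n_per_category N: read "category traffic" lines on stdin and print
# the N rows with the largest traffic for each category.
top_n_per_category() {
  local n="$1"
  # 1) Sort by traffic (field 2), numeric, descending, so the first rows
  #    awk sees for any category are that category's largest ones.
  # 2) awk keeps a per-category counter and prints a row only while the
  #    counter is below n; this pass is linear in the number of rows.
  # 3) Re-sort by category, then by traffic descending, to group output.
  sort -k2,2nr \
    | awk -v n="$n" '{ if (cate[$1] < n) { cate[$1]++; print $0 } }' \
    | sort -k1,1 -k2,2nr
}

# Usage: top 2 per category from a few sample rows.
# prints: a 9 / a 5 / b 8 / b 7 (one row per line)
printf '%s\n' 'a 5' 'b 7' 'a 9' 'a 1' 'b 3' 'a 4' 'b 8' | top_n_per_category 2
```

Using -k2,2nr and -k1,1 restricts each sort key to a single column, so the numeric, descending order applies only to the traffic field and categories stay in ascending order.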