Suppose you have a CSV file gigabytes in size, with more than 12 million records, each with 50 columns. All you need to do is add up all the values in one column. What do you do?
This is how a recent article begins. The article describes how to use Unix commands to analyze very large files. A programmer like me, who develops almost entirely on Windows, first thinks of memory overflows, 100% CPU usage, and an all-night run. For a Linux/Unix expert, though, this is a piece of cake. As the article shows, a single command line does the job.
(Assume the file is named data.csv, the fields in each row are separated by a vertical bar, and we need the sum of the fourth column.)
cat data.csv | awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }'
It is concise and clean, and the { sum += $4 } block in it has something of the flavor of a closure. It deepened my longing for Linux.
I am writing this post not to express my feelings for Linux, but to talk about what happened around this article. It attracted comments from many Linux enthusiasts, all of them Linux masters. And it was the very first comment that left me open-mouthed with surprise.
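To see what the one-liner does, it can be tried on a small file first. The sketch below is only an illustration: the three sample rows and the temporary directory are made up, and the real file would have 50 columns rather than five.

```shell
# Hypothetical sample data: pipe-separated rows, fourth column numeric.
tmp=$(mktemp -d)
f="$tmp/data.csv"
printf '%s\n' 'a|b|c|10.5|e' 'a|b|c|20.25|e' 'a|b|c|30|e' > "$f"

# -F "|" sets the field separator, so $4 is the fourth column.
# The first block runs once per input line and accumulates into sum;
# the END block runs once, after the last line has been read.
total=$(cat "$f" | awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }')
echo "$total"   # prints 60.75
```

Because awk reads one line at a time and keeps only the running total, memory use does not grow with the size of the file.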
The comment, signed California Lotto, reads:
If you think you are a Linux command-line expert, congratulations! You have won today's "Most Useless Use of cat" award. You should write the command like this:
awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }' < data.csv
Indeed, cat seems unnecessary here. My first reaction was admiration, but right after that I found this person rather annoying. I did not like the way he brushed aside the effort the author had put into the article, although to a layman like me, he and the author seemed equally far above my level.
However, when I read the second comment, things changed dramatically. The second comment is obviously aimed at the first:
If you think you are a Linux command-line expert, congratulations! You have won today's "Most Useless Use of Redirection" award. You should write the command like this:
awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }' data.csv
As the saying goes, there is always a higher mountain. I suddenly realized that there is no point in trying to prove yourself better than anyone here. On the vast Internet, there will always be someone who knows a particular technique better than you. What matters is discussing and taking part. Such discussion not only enriches your knowledge but also gives you a fuller understanding of how a problem can be solved. As later comments went on to point out, a redirection can be placed anywhere on the command line, so writing it like this works too:
< data.csv awk -F "|" '{ sum += $4 } END { printf "%.2f\n", sum }'
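The four spellings discussed so far can be checked against each other on a small file. This is a minimal sketch, assuming a made-up two-row sample; the variable names (prog, r1 to r4) are only for illustration.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
f="$tmp/data.csv"
printf '%s\n' 'a|b|c|1.5|e' 'a|b|c|2.5|e' > "$f"

# The same awk program is reused for every variant.
prog='{ sum += $4 } END { printf "%.2f\n", sum }'

r1=$(cat "$f" | awk -F "|" "$prog")   # extra cat process feeding a pipe
r2=$(awk -F "|" "$prog" < "$f")       # the shell opens the file as awk's stdin
r3=$(awk -F "|" "$prog" "$f")         # awk opens the file itself
r4=$(< "$f" awk -F "|" "$prog")       # same redirection, written first
echo "$r1 $r2 $r3 $r4"                # prints 4.00 4.00 4.00 4.00
```

All four produce the same total. The difference is only in who opens the file: a separate cat process, the shell, or awk.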
Amazing! Others went on to point out that the author had written this while experimenting:
head -1 data.psv | awk -F'|' '{ print NF }'
Once the experiment succeeded, it was only natural to simply swap head for cat.
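That workflow might have looked roughly like the sketch below. The sample rows are made up, and `head -n 1` is used as the portable spelling of `head -1`.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
f="$tmp/data.psv"
printf '%s\n' '1|x|y|3.00|z' '2|x|y|4.50|z' > "$f"

# Step 1: look at the first line only and count its fields (NF).
nf=$(head -n 1 "$f" | awk -F'|' '{ print NF }')
echo "$nf"      # prints 5

# Step 2: keep the pipeline shape, swap head for cat, and sum column 4.
total=$(cat "$f" | awk -F'|' '{ sum += $4 } END { printf "%.2f\n", sum }')
echo "$total"   # prints 7.50
```

Seen this way, the cat is simply what remains of a pipeline that started with head.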
In any case, this is a good article. These people are masters, and they are my teachers. They teach me not only programming, but also how to conduct myself.