Is text parsing with shell scripts more efficient than with PHP or Python?

Source: Internet
Author: User
Shell has many powerful commands, such as awk, sort, and grep. Is it more efficient at text parsing than PHP or Python?

Reply: Some years ago, when I was analyzing dozens of GB of logs every day, I ran a test. On Linux (Redhat ES 3), the test processed a mail log file of several hundred megabytes for summary analysis, doing the same processing in C, Perl, Python, and shell. The speed ranking was C > Perl > Python > shell. C was the fastest, at least an order of magnitude faster than the others. Perl came next; after all, it was designed for text processing and has the most powerful built-in regular expressions. Python was slower than Perl, at about 60% of Perl's speed as I recall. Shell was the slowest: although sed, grep, and awk are not slow on their own (they are all written in C), combining them in a shell script was much less efficient.

For ad hoc analysis of log data on a server, it is basically awk, sort, grep, uniq, sed, and so on, and the performance is very good. Tasks that need to run over the long term are generally written as PHP or Python scripts, and most of them are run from crontab.
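As a rough illustration of what such a long-running summary script might look like, here is a minimal Python sketch that does in one pass what a pipeline like grep, awk, sort, and `uniq -c` would do: filter lines, pick one field, and count its values. The function name `summarize`, its parameters `field` and `needle`, and the sample lines are all hypothetical and not taken from the original post.

```python
from collections import Counter


def summarize(lines, field=0, needle=""):
    """Count values of one whitespace-separated field in matching lines.

    `field` and `needle` are illustrative parameters, not from the original post.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:      # plays the role of a fixed-string grep
            continue
        parts = line.split()        # awk-style whitespace field splitting
        if len(parts) > field:
            counts[parts[field]] += 1
    # most_common() orders by descending count, like sorting `uniq -c` output
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample lines; a real job would iterate over an open log file.
    sample = [
        "host-a status=sent",
        "host-b status=deferred",
        "host-a status=sent",
        "host-a status=bounced",
    ]
    for value, n in summarize(sample, field=0, needle="status=sent"):
        print(n, value)
```

Because `summarize` accepts any iterable of lines, passing an open file object lets it stream through a large log without loading it into memory, which is one reason a script of this shape is convenient to schedule from crontab.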

When the data volume is small, development efficiency and maintainability often matter more than performance. When performance matters more, none of shell, PHP, or Python is a good choice; parallel computing solutions such as Hadoop are more reliable. After all, the computing capacity of a single node is easily reached.

Shell is generally used when the amount of data to process is small: it is full-featured, easy to use, and fast. For large data volumes, a high-level language is recommended. If the work is repetitive, remember to write a script so that it can be reused.
With large data volumes, shell can run into errors as well as performance problems. I have hit this before: using grep with one file as the search patterns, I could not find matching entries in another file... Another surprise was that a text file of millions of lines still contained duplicates after being de-duplicated with uniq...

For common tasks, shell is highly efficient, because most of these classic commands are implemented in C/C++. However, for unusual or complex tasks, the limits of these commands force a roundabout approach that passes over the data repeatedly to reach the final result. In those cases, using PHP or Python directly is faster.
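The uniq surprise reported above is consistent with how uniq works: it only collapses duplicate lines that are adjacent, so unsorted input can still contain repeats afterwards. The poster does not state the actual cause, so treating this as the explanation is an assumption. The sketch below contrasts adjacent-only de-duplication with a set-based pass; the function names `dedupe_adjacent` and `dedupe_all` are hypothetical.

```python
from itertools import groupby


def dedupe_adjacent(lines):
    """Collapse runs of identical neighbouring lines (adjacent-only, like uniq)."""
    return [key for key, _ in groupby(lines)]


def dedupe_all(lines):
    """Drop every repeated line, keeping first occurrences in original order."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out


if __name__ == "__main__":
    data = ["a", "b", "a", "a", "c", "b"]
    print(dedupe_adjacent(data))          # ['a', 'b', 'a', 'c', 'b']
    print(dedupe_adjacent(sorted(data)))  # ['a', 'b', 'c']
    print(dedupe_all(data))               # ['a', 'b', 'c']
```

Sorting first makes the adjacent-only approach complete, which is why uniq is usually preceded by sort. The set-based version makes a single pass and preserves the original order, at the cost of holding every distinct line in memory.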
