Is shell scripting more efficient than PHP and Python for text parsing?

Source: Internet
Author: User
Keywords: Python, shell, grep, sort, awk
The shell has many powerful commands, such as awk, sort, and grep. Are they more efficient at text processing than equivalent programs written in PHP or Python?

Reply content:

Some years ago I tested exactly this, because I had to analyze tens of GB of logs every day. The test ran on Linux (Redhat ES 3) and produced a summary analysis of a mail log several hundred megabytes in size, with the same job implemented in C, Perl, Python, and shell. The speed ranking was C >> Perl > Python > shell. C was fastest, at least an order of magnitude ahead of the others. Perl came next, which fits, since its built-in regular expressions make it very strong at text processing. Python was somewhat slower than Perl, at about 60% of Perl's speed as far as I remember. The shell was slowest: sed, grep, and awk are not slow in themselves (they are written in C), but a shell script that combines them was still much less efficient.
For ad hoc analysis of log data on a server, I mostly use awk, sort, grep, uniq, and sed, and the performance is quite good.
For processing that is needed long term, I generally write a PHP or Python script, and most of those end up running from crontab.
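As a rough illustration of the kind of script that might replace an ad hoc awk/sort/uniq pipeline, here is a minimal Python sketch that counts the values of one whitespace-separated field. The function name `count_field`, the sample lines, and the log layout are assumptions made up for this example; they are not taken from the original test.

```python
from collections import Counter
from typing import Iterable


def count_field(lines: Iterable[str], field_index: int) -> Counter:
    """Count how often each value appears in one whitespace-separated field."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > field_index:  # skip blank or too-short lines
            counts[fields[field_index]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; a real script would iterate over an open file.
    sample = [
        "Jan01 mx1 delivered alice@example.com",
        "Jan01 mx1 bounced bob@example.com",
        "Jan01 mx2 delivered alice@example.com",
    ]
    # Print counts from most to least frequent.
    for value, n in count_field(sample, 2).most_common():
        print(n, value)
```

Because the function accepts any iterable of lines, a file object can be passed in directly and the log is read one line at a time rather than loaded whole.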

When the data volume is small, development efficiency and maintainability often matter more than performance. When performance matters more, none of shell, PHP, or Python is a good choice. Parallel computing frameworks such as Hadoop are more dependable, since the computing power of a single node hits its ceiling very easily. The shell is generally used when the amount of data is small: it is full-featured, handy, convenient, and fast.
High-level languages are recommended for large amounts of data. If the task is repetitive, remember to write a script so it can be reused.
Large amounts of data can cause errors and performance problems. I have run into this before: I used one file as the set of search patterns for grep against another file, and the matches were simply not found... An even stranger case: after deduplicating a text file of several million lines with uniq, there were still all sorts of duplicates... For ordinary tasks the shell is highly efficient, because most of these classic commands are implemented in C/C++.
However, for unusual or complex tasks you are limited by what the commands themselves can do, so you have to take a roundabout route and shuffle the data back and forth repeatedly to reach the final result.
In those cases, using PHP or Python directly is faster.
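The two problems described above can be sketched in Python. One likely explanation for the leftover duplicates, assuming the input was not sorted first, is that plain uniq only collapses identical lines that sit next to each other. The reply does not say what went wrong with the grep lookup, so the filter below is only one possible alternative. All function names here are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def dedupe_adjacent(lines: Iterable[str]) -> Iterator[str]:
    """Mimic plain uniq: collapse only runs of identical neighbouring lines."""
    for line, _run in groupby(lines):
        yield line


def dedupe_all(lines: Iterable[str]) -> Iterator[str]:
    """Drop every repeat, keeping first occurrences in their original order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def filter_by_keys(
    lines: Iterable[str], keys: Iterable[str], field_index: int = 0
) -> Iterator[str]:
    """Yield lines whose chosen field exactly equals one of the keys."""
    # Strip newlines and drop empty keys so a blank line cannot match anything.
    wanted = {k.strip() for k in keys if k.strip()}
    for line in lines:
        fields = line.split()
        if len(fields) > field_index and fields[field_index] in wanted:
            yield line
```

Both `dedupe_all` and `filter_by_keys` hold a set in memory, so they trade RAM for a single pass over the data. For input with very many distinct lines, sorting first and then collapsing adjacent repeats may still be the better fit.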
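To make the point concrete, here is a hedged sketch of one task that tends to need several passes and intermediate files in a command pipeline: joining the lines of a log that share an ID into one record. The log layout (an ID followed by `key=value` tokens) and the name `join_by_id` are assumptions invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def join_by_id(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Merge key=value pairs from all lines that share the same leading ID."""
    records: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        record_id, tokens = parts[0], parts[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep:  # ignore tokens that contain no "="
                records[record_id][key] = value
    return dict(records)


if __name__ == "__main__":
    # Hypothetical interleaved log lines for two messages.
    sample = [
        "A1 from=alice@example.com",
        "B2 from=bob@example.com",
        "A1 to=carol@example.com",
        "A1 status=sent",
    ]
    for record_id, fields in join_by_id(sample).items():
        print(record_id, fields)
```

The whole join happens in one pass with a dictionary keyed by ID, at the cost of keeping every record in memory until the input ends.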