How Python handles the time-out of regular expressions

Source: Internet
Author: User

I recently ran into a problem in a project that uses regular expressions to match HTML code suspected of containing hidden links ("dark chains") and injected malicious code ("hanging horses"). Some of the regular expressions written by the company's boss were not rigorous enough, so certain matches hung and the logic after them never ran. Using regular expressions to detect hidden links and injected code may not be very accurate, and few people in the industry do it this way, but this article does not discuss the right way to detect them; it focuses only on handling regular expression timeouts.

A badly written regular expression can take an astonishing amount of time: it may backtrack almost endlessly and appear to hang. Large companies therefore often have dedicated people who optimize regular expressions to keep programs efficient. As a general rule, avoid letting users supply their own regular expressions if possible. We have no one dedicated to optimizing regular expressions, and I do not know them well enough myself, so I approached the problem from another angle: handling the timeout.
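To make the backtracking problem concrete, here is a minimal sketch. The pattern `(a+)+$` and the helper name `time_failed_match` are illustrative assumptions, not taken from the project described above. The input sizes are kept small on purpose, because the time for a failed match grows very quickly with each extra character.

```python
import re
import time

# Nested quantifiers are a classic backtracking trap (illustrative pattern).
PATTERN = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Return the seconds spent failing to match n 'a' characters followed by 'b'."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing 'b' makes the match fail
    return elapsed

# Small inputs only: each extra 'a' makes the failed match noticeably slower.
for n in (10, 14, 18):
    print(n, round(time_failed_match(n), 4))
```

The same pattern matches a string made only of `a` characters immediately; it is the failing case that forces the engine to try many ways of splitting the input.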
My first idea was to run the match in a separate thread, wait in the main thread, and kill the child thread if it had not returned within the specified time. That is one option, but here I introduce another one, which comes from a blog post I read online.

Blog Address

The blog describes another approach based on signals: before the regular expression match, register a handler function for a signal and set the timeout. If the match has not finished within the specified time, the handler raises a TimeoutError exception. The main thread's execution is interrupted by this exception, and by handling it we can break out of the regular expression match and get the result we want. This approach has two disadvantages:
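The signal mechanism on its own can be sketched as follows. This is a minimal illustration that assumes a Unix system and code running in the main thread; the names `_on_alarm` and `call_with_alarm` are hypothetical and do not come from the blog. A `time.sleep` call stands in for a slow operation so that the example finishes quickly.

```python
import re
import signal
import time

def _on_alarm(signum, frame):
    # Called when SIGALRM arrives; the exception unwinds the running call.
    raise TimeoutError

def call_with_alarm(func, seconds=1):
    """Run func(); return (True, result), or (False, None) if the alarm fired first."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # whole seconds only
    try:
        return True, func()
    except TimeoutError:
        return False, None
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler

# A fast match finishes before the alarm fires.
finished, match = call_with_alarm(lambda: re.search(r"b+", "aaabbb"))
print(finished, match.group())

# A slow call (simulated with sleep) is cut off after 1 second.
slow_finished, _ = call_with_alarm(lambda: time.sleep(3))
print(slow_finished)
```

Cancelling the alarm and restoring the previous handler in `finally` keeps a leftover alarm from firing later in unrelated code.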

    1. The alarm signal used here (SIGALRM) is available on Linux and other Unix systems but not on Windows
    2. A signal handler can only be installed in the main thread, so this method does not apply if the regular expression match runs in a child thread
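The second limitation is easy to observe. The sketch below assumes a Unix system where `signal.SIGALRM` exists, and the helper name `try_install_handler` is hypothetical. It tries to register a handler from a child thread and records what happens.

```python
import signal
import threading

def try_install_handler(results):
    """Try to register a SIGALRM handler and record the outcome."""
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        results.append("installed")
    except ValueError:
        # CPython refuses to set signal handlers outside the main thread.
        results.append("ValueError")

results = []
worker = threading.Thread(target=try_install_handler, args=(results,))
worker.start()
worker.join()
print(results)  # ['ValueError']
```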

My project runs on Linux, so the first limitation is acceptable. But my regular expression match runs in a child thread, so at first glance this was not much of a solution. Fortunately, I later found in the comments that the blogger had given a solution to the second problem: start a child process and put the regular expression match in that process. This takes full advantage of multiple cores (after all, multithreading in Python is pseudo-multithreading), and it also lets the signal be used without trouble, since the match runs in the child process's main thread. The actual code follows:

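    import re
    import multiprocessing
    import signal

    def time_out(signum, frame):
        # Signal handler: abort the running match by raising an exception
        raise TimeoutError

    def search_with_timeout(pipe, word, value):
        # Register the handler and schedule SIGALRM to fire after 1 second
        signal.signal(signal.SIGALRM, time_out)
        signal.alarm(1)
        # Flags belong in re.compile(); the second argument of a compiled
        # pattern's search() is a start position, not a flags value
        r = re.compile(word, re.I)
        try:
            ret = r.search(value)
            b_ret = ret is not None
            pipe.send(b_ret)
        except TimeoutError:
            pipe.send(False)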

The code above first registers the signal handler time_out and schedules the signal to fire after 1 s, then runs the regular expression. If the match cannot finish within that 1 s, the handler is called and raises an exception; the main thread stops the current task and enters the exception-handling flow, so the regular expression match is terminated and the function returns normally. Because this part runs in a new process, it involves communication between processes, and in this example I used a pipe. Since Python lets you pass arguments when creating a child process, I only need one pipe: the child process writes the result to it and the main process reads it.
The following is the code that invokes the child process:

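    pipe = multiprocessing.Pipe()
    p = multiprocessing.Process(target=search_with_timeout, args=(pipe[0], word, value))
    p.start()
    p.join()  # wait for the process to end
    ret = pipe[1].recv()  # get the data from the pipe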
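A related variation, which is not the blog's method, enforces the deadline from the parent side: the parent waits on the pipe for a limited time and kills the child if nothing arrives. The sketch below is a minimal illustration under stated assumptions: the names `_search_worker` and `search_with_deadline` are hypothetical, and the `fork` start method is chosen explicitly, which limits it to Unix systems such as the Linux host used in this article.

```python
import multiprocessing
import re

def _search_worker(conn, pattern, text):
    # Runs in the child process: send True/False for "pattern found".
    conn.send(re.search(pattern, text, re.I) is not None)
    conn.close()

def search_with_deadline(pattern, text, seconds=1.0):
    """Return True/False for a finished search, or None if it timed out or failed."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix only
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)
    proc = ctx.Process(target=_search_worker, args=(child_conn, pattern, text))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        if parent_conn.poll(seconds):  # wait up to `seconds` for a result
            return parent_conn.recv()
        return None
    except EOFError:
        # The child exited without sending anything (e.g. invalid pattern).
        return None
    finally:
        if proc.is_alive():
            proc.kill()  # stop a match that is still backtracking
        proc.join()
        parent_conn.close()

print(search_with_deadline(r"b+", "aaaBBB"))                    # fast match
print(search_with_deadline(r"(a+)+$", "a" * 40 + "b", 0.3))     # runaway match
```

Compared with the signal version, the child here needs no handler at all, so it does not depend on the signal being delivered while the match is running; the trade-off is that the parent has to clean up the killed process.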
