A Brief Discussion of Regular Expressions for the JS Number Format
I previously discussed the JS number format in "Talking about JS number format type", and in "JS regular practice syntax highlighting" I also covered optimizing the regex that matches numbers.
Recently, though, Dead Leaves showed me a regex that was a real eye-opener, much sharper than the one I wrote, so today I'll briefly go over it (covering only decimal, i.e. base-10, literals).
First, here is the regex I wrote before: /\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?/
And here is the one Dead Leaves found in jQuery: /(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/ (PS: I removed the leading [-+]? because there is no need to match the sign here.)
It's obviously much sharper.
My approach was actually very simple: I followed the official description of the syntax and ended up writing a bloated regex.
The syntax is given in the "Floating-point literals" section of the MDN JavaScript Guide page "Values, variables, and literals".
It describes the JS number format as [(+|-)][digits][.digits][(E|e)[(+|-)]digits] (PS: this is not a regex).
So I wrote a rough expression: /(?:\d+)?(?:\.\d+)?(?:[eE][+-]?\d+)?/
It looked fine at the time, but testing revealed a serious problem: because every part is optional, the pattern can match the empty string. Simply put, it "succeeds" on any input, even one containing no number at all.
That is a serious bug, so I split the pattern into two alternatives (one for numbers that start with a digit, one for numbers that start with "."). That fixed the bug and gave me the bloated regex above; with my skill level at the time, I saw no better way.
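The sketch below demonstrates both the bug and the fix. It is a minimal example built on two assumptions. First, the names rough, roughAnchored and split are hypothetical. Second, the ^...$ anchors and the grouping around the alternation are additions made here so that whole strings can be validated. The original regexes are unanchored.

```javascript
// First attempt: every piece is optional, so the pattern can match nothing at all.
const rough = /(?:\d+)?(?:\.\d+)?(?:[eE][+-]?\d+)?/;

// Split version (anchored here by assumption): an integer-led number
// OR a number that starts with ".".
const split = /^(?:\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?)$/;

// Unanchored, the rough pattern "succeeds" on text with no number in it,
// because it matches the empty string at position 0.
const m = rough.exec("abc");
console.log(m !== null, JSON.stringify(m[0])); // true ""

// Anchored, the same flaw shows up as accepting the empty string.
const roughAnchored = /^(?:\d+)?(?:\.\d+)?(?:[eE][+-]?\d+)?$/;
console.log(roughAnchored.test(""));                     // true
console.log(split.test(""));                             // false
console.log(split.test("1.2e3"), split.test(".12e-3"));  // true true
```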
In fact, I was thinking too simply. I just followed the conventional way of writing such a regex: match the integer part first, then the fractional part, and finally the exponent...
Now look at jQuery's regex, /(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/. It is written far more boldly.
Its idea is to first try to match the floating-point prefix (any integer digits plus the decimal point), then the digits after the decimal point, and then the exponent. It is still three parts, but the parts are split differently.
Of course, if the input is not a floating-point number, the first group gives up and matches nothing, and the regex goes straight on to match the integer digits and the exponent.
This makes it unnecessary to split the pattern into two alternatives.
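To check that the single jQuery-style pattern covers everything the two-alternative version does, the minimal sketch below runs both on the same inputs. It makes three assumptions: the names jq and twoAlt are hypothetical, both patterns are anchored with ^...$ so each token is checked whole, and the rejected inputs "", "." and "1e" were chosen here for illustration.

```javascript
// jQuery-style pattern (leading sign removed, as in the text), anchored here.
const jq = /^(?:\d*\.|)\d+(?:[eE][+-]?\d+|)$/;
// The two-alternative version, anchored the same way.
const twoAlt = /^(?:\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?)$/;

const samples = ["123", "1.23", "1.2e3", "1.2e+3", "1.2e-3",
                 ".123", ".12e3", ".12e+3", ".12e-3", "", ".", "1e"];

for (const s of samples) {
  // (\d*\.|)\d+ covers \d+, \d+\.\d+ and \.\d+, the same set the two
  // alternatives cover, so both columns should always agree.
  console.log(JSON.stringify(s), jq.test(s), twoAlt.test(s));
}
```

One group that is allowed to match nothing replaces the whole second alternative.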
Let's run a test with the | removed from the first group, using the following test data:
123 1.23 1.2e3 1.2e+3 1.2e-3 .123 .12e3 .12e+3 .12e-3
Regex: /(?:\d*\.)\d+(?:[eE][+-]?\d+|)/
The result: 123 is not matched, while all the floating-point values after it are matched correctly.
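The test can be reproduced by scanning the same data with and without the empty alternative, as in the minimal sketch below. It makes two assumptions: the g flag is added so that String.prototype.matchAll can collect every hit, and the names data, withEmpty, withoutEmpty and hits are hypothetical.

```javascript
const data = "123 1.23 1.2e3 1.2e+3 1.2e-3 .123 .12e3 .12e+3 .12e-3";

const withEmpty    = /(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/g; // jQuery-style
const withoutEmpty = /(?:\d*\.)\d+(?:[eE][+-]?\d+|)/g;  // "|" removed

// Collect the matched text of every hit (matchAll requires the g flag).
const hits = (re) => Array.from(data.matchAll(re), (m) => m[0]).join(" ");

console.log(hits(withEmpty));
// 123 1.23 1.2e3 1.2e+3 1.2e-3 .123 .12e3 .12e+3 .12e-3
console.log(hits(withoutEmpty));
// 1.23 1.2e3 1.2e+3 1.2e-3 .123 .12e3 .12e+3 .12e-3
```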
Why? Because (?:\d*\.) requires a "." to be matched, so a plain integer cannot match.
With (?:\d*\.|), when the first alternative fails, the engine backtracks and tries the second alternative, which is empty and so matches zero characters.
The match position therefore stays where it was, and the \d+ that follows matches the integer successfully.
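The minimal sketch below makes the backtracking result visible. It turns the non-capturing groups into capturing groups, which is done here purely for inspection, and anchors the pattern with ^...$. The names parts, head, digits and exp are hypothetical. For "123", the first group comes out as the empty string because only its empty alternative succeeded.

```javascript
const parts = /^(\d*\.|)(\d+)([eE][+-]?\d+|)$/;

for (const s of ["123", "1.2e3", ".12e-3"]) {
  // head: digits plus "." (or ""), digits: the required \d+, exp: exponent (or "")
  const [, head, digits, exp] = parts.exec(s);
  console.log(JSON.stringify({ s, head, digits, exp }));
}
// {"s":"123","head":"","digits":"123","exp":""}
// {"s":"1.2e3","head":"1.","digits":"2","exp":"e3"}
// {"s":".12e-3","head":".","digits":"12","exp":"e-3"}
```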
What makes this regex bold is that it gets the best result with the least code. The integer case does need backtracking, of course, but the performance cost is not a big problem.
Let's look at a performance test.
Over 1 million matches each, the two regexes differ by only 0.1 seconds on floating-point input and 0.2 seconds on integer input.
That is a very small difference; with some badly written regexes, 1 million tests can take more than 10 or even dozens of seconds.
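A timing comparison like this can be sketched as shown below. Several things are assumptions: the helper timeIt, the variable names, the choice of "1.2e-3" and "123" as inputs, and the use of Date.now() for timing. The original benchmark code is not shown. Results depend heavily on the JS engine and the machine, so no particular numbers are claimed.

```javascript
// Both regexes from the text, unanchored and without the g flag,
// so test() does not carry lastIndex state between calls.
const mine = /\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\.\d+(?:[eE][+-]?\d+)?/;
const jqStyle = /(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/;

// Hypothetical helper: time n calls of re.test(input).
function timeIt(re, input, n = 1000000) {
  let hits = 0;
  const start = Date.now();
  for (let i = 0; i < n; i++) {
    if (re.test(input)) hits++; // count hits so the loop has an observable effect
  }
  return { ms: Date.now() - start, hits };
}

for (const [label, input] of [["float", "1.2e-3"], ["integer", "123"]]) {
  const a = timeIt(mine, input);
  const b = timeIt(jqStyle, input);
  console.log(`${label}: mine ${a.ms} ms, jQuery-style ${b.ms} ms`);
}
```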
A small point like this opened my eyes. I had simply written too little and read too little, so I stayed stuck where I was. With technology, you have to read more, think more, and write more.
All right, that's all for today. See you tomorrow.