Summary of the date regular expression

Source: Internet
Author: User
Tags: ISO 8601
As programmers, we regularly have to deal with regular expressions for dates, but how well do you actually understand them? This article takes an in-depth look at date regular expressions.

1 Overview

Date regular expressions are generally used when a specific format is required and the data is not entered directly by the user. Different application scenarios call for different expressions, with correspondingly different complexity. A regex has to be written for the situation at hand, and the basic principle is: write only what is appropriate, and do not make it more complex than necessary.

For date extraction, as long as the date can be told apart from the surrounding non-date text, the simplest regex is enough, for example

\d{4}-\d{2}-\d{2}

If a date in YYYY-MM-DD format can be uniquely located in the source string this way, the pattern is sufficient for extraction.
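A minimal sketch of extraction with this simple pattern, in JavaScript. The sample log line and the variable names are made up for illustration.

```javascript
// Hypothetical input: a log line that contains exactly one YYYY-MM-DD date.
const logLine = "backup finished on 2016-03-09 after 42 minutes";

// Simple extraction pattern: four digits, two digits, two digits.
const simpleDate = /\d{4}-\d{2}-\d{2}/;

const found = logLine.match(simpleDate);
const extracted = found ? found[0] : null;

console.log(extracted); // 2016-03-09
```

The pattern only locates something shaped like a date. A string such as 9999-99-99 matches as well, which is why validation needs the stricter rules discussed next.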

For validation, checking only the character composition and the format means little; the rules for what makes a date valid have to be checked as well. Leap years make date validation considerably more complex.

So first consider the valid range of dates and what a leap year is.

2 Date rules

2.1 Valid range of dates

The valid range of dates differs from one scenario to another.

The valid range of DateTime objects defined in MSDN is: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.

Unix timestamp 0 corresponds, in ISO 8601 notation, to 1970-01-01T00:00:00Z.

In practice, dates do not fall outside the range defined for DateTime, so the validation regex only needs to cover the commonly used range.

2.2 What is a leap year

(The following is excerpted from Baidu Encyclopedia.)

Leap years exist to compensate for the difference between the length of the calendar year and the Earth's actual orbital period. A year in which that difference is made up is a leap year.

The Earth takes 365 days, 5 hours, 48 minutes and 46 seconds (365.24219 days) to orbit the Sun, which is one tropical year. A common year in the Gregorian calendar has only 365 days, about 0.2422 days shorter than the tropical year, so roughly one day accumulates every four years. That day is added at the end of February (February 29), making the year 366 days long; such a year is a leap year.

Note that the current Gregorian calendar is derived from the Roman Julian calendar. The Julian calendar did not account for an excess of 0.0078 days per year, so between 46 BC and the 16th century the error accumulated to more than 10 days. For this reason Pope Gregory XIII decreed that October 5, 1582 would become October 15, and introduced a new leap year rule: a century year is a leap year only if it is a multiple of 400; otherwise it is a common year. For example, 1700, 1800 and 1900 are common years, while 2000 is a leap year. Under this rule the average calendar year is 365.2425 days, which exceeds the tropical year (365.24219 days) by about 0.0003 days, so a one-day deviation takes roughly 3,000 years to accumulate. With a leap year every four years, the calendar year averages 0.0078 days too long, which amounts to a little more than 3 days after 400 years, so three leap years are dropped every 400 years. The rule is usually summarized as: a leap year every four years, no leap year at a century, a leap year again every 400 years.
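The leap year rule can be written as a short arithmetic check. The following JavaScript sketch (the function name is chosen here for illustration) also counts the leap years in one 400-year cycle to confirm the average year length quoted above.

```javascript
// Gregorian leap year rule: divisible by 4, except century years,
// which must be divisible by 400.
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Count the leap years in one full 400-year cycle (1601-2000).
let leapCount = 0;
for (let y = 1601; y <= 2000; y++) {
  if (isLeapYear(y)) leapCount++;
}

console.log(isLeapYear(1900), isLeapYear(2000)); // false true
console.log(leapCount); // 97
// Average year length over the cycle: (400 * 365 + 97) / 400
console.log((400 * 365 + leapCount) / 400); // 365.2425
```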

2.3 Format of the date

Depending on language and culture, the separator used in dates varies. The usual formats are:

yyyyMMdd

yyyy-MM-dd

yyyy/MM/dd

yyyy.MM.dd

3 Building the date regular expression

3.1 Rule Analysis

A common way to write a complex regex is to separate the unrelated requirements, write a regex for each, then combine them and check how they relate to and affect one another. This usually yields the regex you need.

Based on the definition of a leap year, dates can be classified in several ways.

3.1.1 Two categories, depending on whether the number of days is related to the year

The category unrelated to the year can be subdivided into two, according to the number of days per month

January, March, May, July, August, October and December have days 1-31

April, June, September and November have days 1-30

The category related to the year

February of a common year has days 1-28

February of a leap year has days 1-29

3.1.2 Four categories, depending on the days included

All months of all years include days 1-28

All months of all years except February include days 29 and 30

January, March, May, July, August, October and December of all years include day 31

February of a leap year includes day 29

3.1.3 Choosing a classification

Once classified, the dates are implemented as an alternation of the form (exp1|exp2|exp3). An alternation tries its branches from left to right: as soon as one branch matches, the branches to its right are not tried; otherwise every branch is tried before a failure is reported.

Both the number of branches and the complexity of each branch affect matching efficiency. Given the probability distribution of the dates being validated, most fall on days 1-28, so using the second classification (section 3.1.2), with the 1-28 branch first, noticeably improves matching efficiency.
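Before turning to regex syntax, the four categories of section 3.1.2 can be sketched as an ordered series of plain checks, with the most frequent case first, mirroring how an alternation tries its branches from left to right. This is only an illustration in JavaScript; the function names are assumptions, not part of the original article.

```javascript
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Checks are ordered like the regex branches: the first one that
// succeeds ends the evaluation.
function isValidDayOfMonth(year, month, day) {
  if (month < 1 || month > 12 || day < 1) return false;
  // Category 1: days 1-28 exist in every month of every year.
  if (day <= 28) return true;
  // Category 2: days 29 and 30 exist in every month except February.
  if (day <= 30 && month !== 2) return true;
  // Category 3: day 31 exists only in the seven long months.
  if (day === 31 && [1, 3, 5, 7, 8, 10, 12].includes(month)) return true;
  // Category 4: February 29 exists only in leap years.
  if (day === 29 && month === 2 && isLeapYear(year)) return true;
  return false;
}

console.log(isValidDayOfMonth(2023, 2, 28)); // true
console.log(isValidDayOfMonth(2023, 2, 29)); // false
console.log(isValidDayOfMonth(2024, 2, 29)); // true
console.log(isValidDayOfMonth(2023, 4, 31)); // false
```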

3.2 Implementing the regex

Using the classification of section 3.1.2, a regex can be written for each rule. For now the month and day are written in MM-dd format.

The first three rules do not depend on the year, so the year can be written uniformly as

(?!0000)[0-9]{4}

The following covers only the month and day

All months of all years, common years included, contain days 1-28

(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])

All months of all years except February, common years included, contain days 29 and 30

(0[13-9]|1[0-2])-(29|30)

January, March, May, July, August, October and December of all years, common years included, contain day 31

(0[13578]|1[02])-31

Combined, these cover every date except February 29 of a leap year

(?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)
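As a quick sanity check, the combined expression for the three year-independent rules can be tried on a few strings. In this JavaScript sketch the anchors ^ and $ are added so that whole strings are tested, and the pattern is assembled from the three branches purely for readability; the sample strings are arbitrary.

```javascript
// Year 0001-9999 followed by the three year-independent month/day branches.
const commonDate = new RegExp(
  "^(?!0000)[0-9]{4}-(" +
    "(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])" + "|" + // days 01-28, any month
    "(0[13-9]|1[0-2])-(29|30)" + "|" +               // days 29-30, not February
    "(0[13578]|1[02])-31" +                          // day 31, long months only
  ")$"
);

const samples = [
  "2023-02-28", // true
  "2023-04-30", // true
  "2023-12-31", // true
  "2023-04-31", // false: April has 30 days
  "2024-02-29", // false: the leap-day rule is not part of this pattern yet
  "0000-01-01", // false: year 0000 is excluded by the lookahead
];

for (const s of samples) {
  console.log(s, commonDate.test(s));
}
```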


Next, consider the leap year case

February of a leap year includes day 29

Here the month and day are fixed at 02-29; only the year varies.

All leap years can be printed with the following code, so that the pattern can be examined

for (int i = 1; i < 10000; i++)
{
    // Divisible by 4 but not by 100, or divisible by 400
    if ((i % 4 == 0 && i % 100 != 0) || i % 400 == 0)
    {
        richTextBox2.Text += string.Format("{0:0000}", i) + "\n";
    }
}

From the leap year rules the patterns are easy to work out. A leap year every four years (century years excluded):

[0-9]{2}(0[48]|[2468][048]|[13579][26])

No leap year at a century, a leap year again every 400 years:

(0[48]|[2468][048]|[13579][26])00

Combined, this gives February 29 of every leap year

([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29
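The year part of this expression can be compared against the arithmetic leap year rule for every year from 0001 to 9999. The sketch below is one way to do that in JavaScript; the anchors are added so that exactly four digits are tested, and the names are illustrative.

```javascript
function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

// Year part of the leap-day expression, anchored to a 4-digit string.
const leapYearPattern =
  /^([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)$/;

// Collect every year on which the pattern and the arithmetic rule disagree.
const disagreements = [];
for (let year = 1; year <= 9999; year++) {
  const text = String(year).padStart(4, "0");
  if (leapYearPattern.test(text) !== isLeapYear(year)) {
    disagreements.push(text);
  }
}

console.log(disagreements.length); // 0
```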

The four rules are now implemented and do not interfere with one another. Combined, they give the regex for all dates within the DateTime range

^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$


Since the regex is used only for validation, the capturing groups serve no purpose; they only consume resources and reduce matching efficiency, so they can be replaced with non-capturing groups.

^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$

The regex above covers years 0001-9999 in yyyy-MM-dd format. Its validity and performance can be checked with the following code

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

DateTime dt = new DateTime(1, 1, 1);
DateTime endDay = new DateTime(9999, 12, 31);
Stopwatch sw = new Stopwatch();
sw.Start();

// Version with non-capturing groups
Regex dateRegex = new Regex(@"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$");

// Version with capturing groups, for comparing the timing
// Regex dateRegex = new Regex(@"^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$");

Console.WriteLine("Start date: " + dt.ToString("yyyy-MM-dd"));
while (dt < endDay)
{
    // Report every real date that the regex fails to match
    if (!dateRegex.IsMatch(dt.ToString("yyyy-MM-dd")))
    {
        Console.WriteLine(dt.ToString("yyyy-MM-dd") + " false");
    }
    dt = dt.AddDays(1);
}
// The loop stops before the last day, so check it separately
if (!dateRegex.IsMatch(dt.ToString("yyyy-MM-dd")))
{
    Console.WriteLine(dt.ToString("yyyy-MM-dd") + " false");
}
Console.WriteLine("End date: " + dt.ToString("yyyy-MM-dd"));
sw.Stop();
Console.WriteLine("Test time: " + sw.ElapsedMilliseconds + " ms");
Console.WriteLine("Test done!");
Console.ReadLine();
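The check above walks through real dates only, so it shows that no valid date is rejected. A complementary check, sketched here in JavaScript under the assumption that a plain arithmetic test is an acceptable reference, also feeds the regex impossible combinations (month 00 or 13, day 00 or 32, February 30 and so on) and counts every disagreement. The helper names are illustrative.

```javascript
const dateRegex = /^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$/;

function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

const DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

// Reference check written with plain arithmetic instead of a regex.
function isRealDate(year, month, day) {
  if (year < 1 || month < 1 || month > 12 || day < 1) return false;
  const limit = month === 2 && isLeapYear(year) ? 29 : DAYS_IN_MONTH[month - 1];
  return day <= limit;
}

const two = (n) => String(n).padStart(2, "0");

// Try every year/month/day combination in the range, valid or not,
// and count the cases where the regex and the reference disagree.
function countMismatches(firstYear, lastYear) {
  let mismatches = 0;
  for (let y = firstYear; y <= lastYear; y++) {
    const yearText = String(y).padStart(4, "0");
    for (let m = 0; m <= 13; m++) {
      for (let d = 0; d <= 32; d++) {
        const text = `${yearText}-${two(m)}-${two(d)}`;
        if (dateRegex.test(text) !== isRealDate(y, m, d)) mismatches++;
      }
    }
  }
  return mismatches;
}

console.log(countMismatches(0, 9999)); // 0
```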


4 Extending the date regular expression

4.1 Extending the "year-month-day" form

The implementation above validates dates in yyyy-MM-dd format. Allowing for different separators, and for a month and day written as M and d (the yyyy-M-d format), the regex can be extended to

^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])([-/.]?)(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])([-/.]?)(?:29|30)|(?:0?[13578]|1[02])([-/.]?)31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2([-/.]?)29)$


Simplified with backreferences: years 0001-9999, format yyyy-MM-dd or yyyy-M-d, where the separator may be absent or one of "-", "/" and ".".

^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2\2(?:29))$


This is the most complete regex for the "year-month-day" form. It can be trimmed to suit your own needs.
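The effect of the backreferences \1 and \2 is that the second separator must repeat whatever the first one matched, including "nothing". A small JavaScript sketch with arbitrary sample strings; the slash inside the character class is escaped only because the pattern is written as a regex literal.

```javascript
const flexibleDate = /^(?:(?!0000)[0-9]{4}([-\/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-\/.]?)0?2\2(?:29))$/;

const cases = [
  "2024-02-29", // true: leap day, "-" used twice
  "2024/2/29",  // true: single-digit month, "/" used twice
  "20240229",   // true: no separator at all
  "2023.4.30",  // true
  "2023-02-29", // false: 2023 is not a leap year
  "2023-4-31",  // false: April has 30 days
  "2024-02/29", // false: the separators differ
];

for (const text of cases) {
  console.log(text, flexibleDate.test(text));
}
```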

4.2 Extending to other forms

Once the meaning of each part of the regex above and the relationships between the parts are understood, it is easy to extend it to other date formats, such as the "day-month-year" format dd/MM/yyyy.

^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-/.]?)(?:0?[13-9]|1[0-2])|31([-/.]?)(?:0?[13578]|1[02]))([-/.]?)(?!0000)[0-9]{4}|29([-/.]?)0?2([-/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$


Note that with this format the backreference optimization cannot be applied in the same way, because the first separator is captured by a different group in each branch. The set of separators can be trimmed to suit your own needs.
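A short JavaScript check of the day-month-year expression on a few arbitrary strings. Since no backreference links the two separators in this version, a string that mixes separators is accepted, which the last case illustrates. The slash in the character class is escaped only because a regex literal is used.

```javascript
const dayFirstDate = /^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-\/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-\/.]?)(?:0?[13-9]|1[0-2])|31([-\/.]?)(?:0?[13578]|1[02]))([-\/.]?)(?!0000)[0-9]{4}|29([-\/.]?)0?2([-\/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$/;

const cases = [
  "29/02/2024", // true: leap day
  "31/12/1999", // true
  "1.1.2000",   // true: single-digit day and month
  "29/02/2023", // false: 2023 is not a leap year
  "31/04/2023", // false: April has 30 days
  "01-02/2023", // true: mixed separators are not rejected
];

for (const text of cases) {
  console.log(text, dayFirstDate.test(text));
}
```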

4.3 Adding the time

The rules for times are clear and simple. There are essentially two forms, HH:mm:ss and H:m:s.

([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]

Combined with the date regex, for the format yyyy-MM-dd HH:mm:ss

^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
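A brief JavaScript sketch of the combined date and time expression on arbitrary sample strings. Note that the hour is the only capturing group left in the pattern, so it is available as group 1.

```javascript
const dateTime = /^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$/;

const cases = [
  "2024-02-29 23:59:59",  // true
  "2023-06-15  08:05:09", // true: \s+ allows more than one space
  "2024-02-29 24:00:00",  // false: hour 24 is out of range
  "2023-02-29 12:00:00",  // false: invalid date
  "2023-06-15 8:05:09",   // false: this pattern requires two-digit hours
];

for (const text of cases) {
  console.log(text, dateTime.test(text));
}

// The hour is captured as group 1.
const hour = "2024-02-29 23:59:59".match(dateTime)[1];
console.log(hour); // 23
```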

4.4 Customizing the year range

All of the expressions above use 0001-9999 for the year in the branches that do not depend on leap years. The year range can of course be customized, as long as the leap year rules are respected.

For example, years 1600-9999, format yyyy-MM-dd or yyyy-M-d, where the separator may be absent or one of "-", "/" and ".".

^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-/.]?)0?2\2(?:29))$
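The boundaries of the customized range can be probed with a few strings. This is a JavaScript sketch with arbitrary samples; the slash in the character class is escaped only because a regex literal is used.

```javascript
const from1600 = /^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-\/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-\/.]?)0?2\2(?:29))$/;

const cases = [
  "1600-02-29", // true: 1600 is a leap year and inside the range
  "1600/1/1",   // true
  "9999.12.31", // true
  "1599-12-31", // false: before the range
  "0001-01-01", // false: before the range
  "1700-02-29", // false: 1700 is not a leap year
];

for (const text of cases) {
  console.log(text, from1600.test(text));
}
```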

5 Special notes

The expressions above use only the most basic regex syntax, which is supported by most languages that use a traditional NFA engine, including JavaScript, Java and .NET.

One more point: although the date rules are fairly clear and a regex that meets your requirements can be obtained by trimming the ones above, using them as ready-made templates is not recommended. The power of regular expressions lies in their flexibility: the most suitable expression can be tailored to the need at hand. If all you do is apply templates, you are not really using regular expressions.

The syntax rules of regular expressions are few and easy to pick up. Mastering them and tailoring each expression to its purpose is the real "Tao".

6 Application

First, the requirements

Date input:

Dates are entered manually, in either of two formats: YYYYMMDD or YYYY-MM-DD

Second, the approach

Because the user types the date by hand, the format of the input has to be validated

The possible inputs fall into the following cases:

(1) The input is empty or consists of spaces

(2) The input is not in a date format

Dates are stored in the database as YYYY-MM-DD, so input given as YYYYMMDD has to be converted to YYYY-MM-DD.

Ideas:

For validating the date format, the first thing that comes to mind is the validation controls in VS. However, dozens of fields need validating, each validation control has to be dragged onto the page individually, and later changes would be cumbersome. The check is therefore implemented in JS, using a regular expression to verify the date.

Third, the JS implementation

// Validate the date in the input element with the given id
function date(id) {
    var input = document.getElementById(id); // look up the element
    // Remove surrounding whitespace with the trim() helper below
    var value = trim(input.value);
    // Checks the year range (1600-9999), the date format, and common and leap years
    var pattern = /^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$/;
    // Skip the check when the input is empty
    if (value.length == 0) {
        return false;
    }
    // Automatically convert YYYYMMDD to YYYY-MM-DD
    if (/^\d{8}$/.test(value)) {
        value = value.substring(0, 4) + "-" + value.substring(4, 6) + "-" + value.substring(6, 8);
        input.value = value;
    }
    // Validate the date, including input that was just converted
    if (!pattern.test(value)) {
        alert("Date format error. Hint: 19990101 or 1999-01-01");
        input.select();
        return false;
    }
    return true;
}

// Use a regular expression to strip whitespace at both ends of a string
// (for older browsers without a built-in trim())
function trim(str) {
    return str.replace(/(^\s*)|(\s*$)/g, "");
}

Front-end call (triggered when the field loses focus)

<input class="txtenterschooldate" type="text" id="txtenterschooldate" name="txtenterschooldate" onblur="date('txtenterschooldate')"/>
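The conversion and validation steps can also be kept apart from the page, which makes them easy to test without a browser. The following JavaScript sketch is one possible arrangement and not the article's implementation: the function name normalizeDate is hypothetical, and it assumes the strict two-digit yyyy-MM-dd pattern for years 0001-9999 from section 3.2 rather than the 1600-9999 pattern used in the page script.

```javascript
// Strict yyyy-MM-dd pattern for years 0001-9999 (section 3.2).
const isoDate = /^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$/;

// Returns the date as YYYY-MM-DD, or null when the input is empty
// or is not a valid date in either accepted format.
function normalizeDate(raw) {
  const value = raw.trim();
  // Insert hyphens into the compact YYYYMMDD form.
  const dashed = /^\d{8}$/.test(value)
    ? `${value.slice(0, 4)}-${value.slice(4, 6)}-${value.slice(6, 8)}`
    : value;
  return isoDate.test(dashed) ? dashed : null;
}

console.log(normalizeDate(" 19990101 ")); // 1999-01-01
console.log(normalizeDate("1999-01-01")); // 1999-01-01
console.log(normalizeDate("19990230"));   // null
console.log(normalizeDate(""));           // null
```

A page handler could then call such a function, write the returned string back into the input, and show the error message when the result is null.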

That concludes this look at approaches to date regular expressions. If you found it useful, bookmark it for later.
