1. Overview
First, note that both WinForms and WebForms ship with mature calendar controls. In terms of both ease of use and extensibility, date selection and validation are better handled by a calendar control.
A few days ago I saw posts in several CSDN sections asking for date regular expressions, so I put this article together for discussion. If anything is missing or wrong, please correct me.
Date regexes are generally used when there is a format requirement and the data is not entered directly by the user. Because application scenarios differ, the regex differs too, and so does its complexity. A regex should be written for the specific situation at hand. One basic principle: write what fits, not what is complex.
For date extraction, as long as the date can be told apart from non-date text, write the simplest regex that works, such as
\d{4}-\d{2}-\d{2}
If a date in yyyy-MM-dd format can be uniquely located in the source string, this is enough for extraction.
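As a minimal sketch of extraction, assuming the dates sit in free text and nothing else in the text has the same shape, the simple pattern can be applied with `Regex.Matches`. The class name, method name and sample sentence are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DateExtraction
{
    // Returns every yyyy-MM-dd-shaped substring of the input.
    // This only locates candidates; it does not check calendar validity.
    public static string[] ExtractDates(string text)
    {
        return Regex.Matches(text, @"\d{4}-\d{2}-\d{2}")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Hypothetical input, not taken from the article.
        string log = "Order 1001 created 2011-03-15, shipped 2011-03-18.";
        foreach (string date in ExtractDates(log))
        {
            Console.WriteLine(date);
        }
    }
}
```

Note that a string such as 2011-99-99 would also be extracted, which is acceptable only when the source is known to contain real dates.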
For validation, checking only the character composition and format is of little value; the calendar rules have to be checked as well. Because leap years exist, validating dates becomes more complex.
Start by considering the valid range of dates and what a leap year is.
2. Date rules
2.1 Valid range of dates
The valid range of dates differs from one scenario to another.
The valid range of the DateTime type as documented on MSDN is 0001-01-01 00:00:00 to 9999-12-31 23:59:59.
UNIX timestamp 0 corresponds, in ISO 8601 notation, to 1970-01-01T00:00:00Z.
In practice, dates will not fall outside the range defined by DateTime, so a validation regex only needs to cover the range of dates in common use.
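These boundaries can be printed directly from .NET. The sketch below is illustrative only; the class and method names are assumptions, and the invariant culture is used so that the output does not depend on the machine's locale.

```csharp
using System;
using System.Globalization;

public static class DateRangeDemo
{
    const string Format = "yyyy-MM-dd HH:mm:ss";

    // Formats a value independently of the current culture.
    public static string Show(DateTime value)
    {
        return value.ToString(Format, CultureInfo.InvariantCulture);
    }

    // UNIX timestamp 0 expressed as a UTC DateTime.
    public static DateTime UnixEpoch()
    {
        return DateTimeOffset.FromUnixTimeSeconds(0).UtcDateTime;
    }

    public static void Main()
    {
        Console.WriteLine(Show(DateTime.MinValue)); // 0001-01-01 00:00:00
        Console.WriteLine(Show(DateTime.MaxValue)); // 9999-12-31 23:59:59
        Console.WriteLine(Show(UnixEpoch()));       // 1970-01-01 00:00:00
    }
}
```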
2.2 What is a leap year
(The following is excerpted from Baidu Encyclopedia)
Leap years exist to make up the difference between the length of the calendar year, which is set by convention, and the Earth's actual orbital period. A year in which the difference is made up is a leap year.
The Earth orbits the Sun in 365 days 5 hours 48 minutes 46 seconds (365.24219 days), which is one tropical year. A common year in the Gregorian calendar has only 365 days, about 0.2422 days shorter than the tropical year. The shortfall adds up to about one day every four years, and this day is added at the end of February (February 29), making the year 366 days long. Such a year is a leap year.
Note that the current Gregorian calendar was adapted from the Roman Julian calendar. Because the Julian calendar did not account for a surplus of 0.0078 days per year, more than 10 extra days had accumulated between 46 BC and the 16th century. Pope Gregory XIII therefore decreed that October 5, 1582 would become October 15 and introduced a new leap year rule: a century year is a leap year only if it is a multiple of 400, and is otherwise a common year. For example, 1700, 1800 and 1900 are common years, while 2000 is a leap year. Since then the average year length has been 365.2425 days, which drifts from the tropical year by one day only about every 3,200 years. With a leap year every four years, each year runs 0.0078 days long on average, which adds up to more than 3 days after 400 years, so three leap years are dropped every 400 years. The rule is commonly summarized as: leap every four years, no leap at the century, leap again every 400 years.
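The arithmetic in the excerpt can be checked with a few lines. This is a sketch of the Gregorian rule as stated above; the helper names are made up for illustration.

```csharp
using System;

public static class LeapYearRule
{
    // Leap every four years, no leap at the century, leap again every 400 years.
    public static bool IsLeap(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    // Counts the leap years in the 400-year cycle that starts at startYear.
    public static int LeapYearsInCycle(int startYear)
    {
        int count = 0;
        for (int y = startYear; y < startYear + 400; y++)
        {
            if (IsLeap(y)) count++;
        }
        return count;
    }

    public static void Main()
    {
        foreach (int y in new[] { 1700, 1800, 1900, 2000 })
        {
            Console.WriteLine(y + ": " + IsLeap(y));
        }

        // 100 candidate years minus 3 dropped century years = 97 leap years.
        int leaps = LeapYearsInCycle(1601);
        // (365 * 400 + 97) / 400 is the average Gregorian year length, 365.2425 days.
        double average = (365 * 400 + leaps) / 400.0;
        Console.WriteLine(leaps + " leap years per 400 years, average " + average + " days");
    }
}
```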
2.3 Format of the date
Depending on the language and culture, the date separator varies. The usual formats are:
yyyyMMdd
yyyy-MM-dd
yyyy/MM/dd
yyyy.MM.dd
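For comparison with the regex approach developed below, .NET can parse these four formats directly with `DateTime.TryParseExact`. This is only a sketch; the class name and sample inputs are assumptions.

```csharp
using System;
using System.Globalization;

public static class DateFormats
{
    static readonly string[] Formats =
    {
        "yyyyMMdd", "yyyy-MM-dd", "yyyy/MM/dd", "yyyy.MM.dd"
    };

    // True when the input is a real calendar date in one of the four formats.
    public static bool TryParse(string input, out DateTime result)
    {
        return DateTime.TryParseExact(
            input, Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out result);
    }

    public static void Main()
    {
        // Hypothetical inputs chosen to cover each format and one invalid date.
        foreach (string s in new[] { "20110315", "2011-03-15", "2012/02/29", "2011.02.29" })
        {
            DateTime parsed;
            Console.WriteLine(s + " -> " + TryParse(s, out parsed));
        }
    }
}
```

The last input is rejected because 2011 is not a leap year.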
3. Constructing the date regular expression
3.1 Rule analysis
A common way to write a complex regex is to separate the unrelated requirements, write a regex for each, then combine them and check how they relate to and affect each other. This usually yields the regex you need.
Given the definition of a leap year, dates can be classified in several ways.
3.1.1 Classification by whether the number of days depends on the year
Categories that do not depend on the year, subdivided by the number of days in the month:
- Months 1, 3, 5, 7, 8, 10 and 12 have days 1-31
- Months 4, 6, 9 and 11 have days 1-30
Categories that depend on the year:
- February of a common year has days 1-28
- February of a leap year has days 1-29
3.1.2 Classification into four categories by the days included
- Every month of every year includes days 1-28
- Every month except February, in every year, includes days 29 and 30
- Months 1, 3, 5, 7, 8, 10 and 12 of every year include day 31
- February of a leap year includes day 29
3.1.3 Choosing a classification
Whichever classification is used, it is implemented as a branch structure (exp1|exp2|exp3). The branches are tried from left to right: once a branch matches, the branches to its right are not tried; otherwise every branch is tried before failure is reported.
The number of branches and the complexity of each branch both affect matching efficiency. Considering the probability distribution of the dates being validated, most of them fall on days 1-28, so the second classification improves matching efficiency noticeably.
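How much of the input the first branch absorbs can be estimated by counting days over one full 400-year Gregorian cycle. The sketch below assumes the validated dates are spread uniformly over the calendar, which real data may not be; the class and method names are made up.

```csharp
using System;

public static class BranchShare
{
    // Counts the days of a span of years by the four categories of section 3.1.2:
    // [0] days 1-28, [1] days 29-30 outside February, [2] day 31, [3] February 29.
    public static int[] CountByBranch(int startYear, int years)
    {
        int[] counts = new int[4];
        DateTime end = new DateTime(startYear + years, 1, 1);
        for (DateTime d = new DateTime(startYear, 1, 1); d < end; d = d.AddDays(1))
        {
            if (d.Day <= 28) counts[0]++;
            else if (d.Month == 2) counts[3]++; // only February 29 reaches this branch
            else if (d.Day <= 30) counts[1]++;
            else counts[2]++;
        }
        return counts;
    }

    public static void Main()
    {
        int[] c = CountByBranch(2000, 400);
        int total = c[0] + c[1] + c[2] + c[3]; // 146097 days in 400 years
        Console.WriteLine("days 1-28:   " + c[0]); // 134400
        Console.WriteLine("days 29-30:  " + c[1]); // 8800
        Console.WriteLine("day 31:      " + c[2]); // 2800
        Console.WriteLine("February 29: " + c[3]); // 97
        Console.WriteLine("first branch share: " + (100.0 * c[0] / total) + "%");
    }
}
```

Under that assumption about 92% of dates are settled by the first branch.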
3.2 Implementing the regex
Using the classification from section 3.1.2, a regex can be written for each rule. The implementations below use the yyyy-MM-dd format.
The first three rules do not depend on the year, so the year can be written uniformly as
(?!0000)[0-9]{4}
The following consider only the month and day.
- Every month of every year, common years included, has days 1-28
(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])
- Every month except February, in every year, common years included, has days 29 and 30
(0[13-9]|1[0-2])-(29|30)
- Months 1, 3, 5, 7, 8, 10 and 12 of every year, common years included, have day 31
(0[13578]|1[02])-31
Combined, these cover every date except February 29 of a leap year
(?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)
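A quick sanity check of the combined pattern, as a sketch. The `^` and `$` anchors are added here so that the pattern validates whole strings, and the sample dates are chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommonDates
{
    // The three year-independent rules combined, anchored for validation.
    static readonly Regex Pattern = new Regex(
        @"^(?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)$");

    public static bool IsValid(string date)
    {
        return Pattern.IsMatch(date);
    }

    public static void Main()
    {
        string[] samples =
        {
            "2011-02-28", // valid
            "2011-04-30", // valid
            "2011-12-31", // valid
            "2011-04-31", // April has no day 31
            "2011-02-30", // February has no day 30
            "0000-01-01", // year 0000 is excluded by the lookahead
            "2012-02-29"  // leap day, not covered by this partial pattern
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```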
Next, consider leap years.
- February of a leap year includes day 29
The month and day are fixed here as 02-29; only the year varies.
The following code prints every leap year so that the pattern can be examined
    // richTextBox2 is a RichTextBox control on the author's form
    for (int i = 1; i < 10000; i++)
    {
        if ((i % 4 == 0 && i % 100 != 0) || i % 400 == 0)
        {
            richTextBox2.Text += string.Format("{0:0000}", i) + "\n";
        }
    }
From the leap year rule the patterns are easy to derive. Leap every four years:
[0-9]{2}(0[48]|[2468][048]|[13579][26])
No leap at the century, leap again every 400 years:
(0[48]|[2468][048]|[13579][26])00
Together, these give February 29 of every leap year
([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29
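The year part of this pattern can be cross-checked against `DateTime.IsLeapYear` for every year in the DateTime range. In this sketch the year alternation is anchored and written with non-capturing groups; the class and method names are assumptions.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LeapYearRegexCheck
{
    // Year part only: non-century leap years, then century leap years.
    static readonly Regex LeapYear = new Regex(
        @"^(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)$");

    public static bool IsLeapByRegex(int year)
    {
        // "D4" pads the year to four digits, e.g. 4 becomes "0004".
        return LeapYear.IsMatch(year.ToString("D4", CultureInfo.InvariantCulture));
    }

    // Number of years in 1..9999 where the regex disagrees with the framework.
    public static int CountMismatches()
    {
        int mismatches = 0;
        for (int year = 1; year <= 9999; year++)
        {
            if (IsLeapByRegex(year) != DateTime.IsLeapYear(year))
            {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine("Mismatches: " + CountMismatches()); // 0
    }
}
```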
All four rules are now implemented and do not interfere with each other. Combined, they give a regex for every date in the DateTime range
^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$
Since this regex is used only for validation, the capturing groups serve no purpose; they consume resources and slow matching down. Non-capturing groups can be used instead.
^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$
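To see what the rewrite changes, the number of groups the engine tracks can be inspected with `Regex.GetGroupNumbers`. The sketch also shows `RegexOptions.ExplicitCapture`, a .NET-specific option that is not part of the article and that treats unnamed parentheses as non-capturing without editing the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    public const string Capturing =
        @"^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$";

    public const string NonCapturing =
        @"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$";

    // Number of groups the engine tracks, including group 0 (the whole match).
    public static int GroupCount(string pattern, RegexOptions options)
    {
        return new Regex(pattern, options).GetGroupNumbers().Length;
    }

    public static void Main()
    {
        Console.WriteLine("capturing:        " + GroupCount(Capturing, RegexOptions.None));
        Console.WriteLine("non-capturing:    " + GroupCount(NonCapturing, RegexOptions.None));
        Console.WriteLine("explicit capture: " + GroupCount(Capturing, RegexOptions.ExplicitCapture));
    }
}
```

Both the hand-written non-capturing version and the `ExplicitCapture` variant leave only group 0.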
The non-capturing regex covers years 0001-9999 in yyyy-MM-dd format. Its validity and performance can be verified with the following code
    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    DateTime dt = new DateTime(1, 1, 1);
    DateTime endDay = new DateTime(9999, 12, 31);
    Stopwatch sw = new Stopwatch();
    sw.Start();

    // Non-capturing version
    Regex dateRegex = new Regex(@"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$");
    // Capturing version, swap in to compare timings:
    // Regex dateRegex = new Regex(@"^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$");

    Console.WriteLine("Start date: " + dt.ToString("yyyy-MM-dd"));
    while (dt <= endDay)
    {
        // Every real calendar date must match; print any that does not
        if (!dateRegex.IsMatch(dt.ToString("yyyy-MM-dd")))
        {
            Console.WriteLine(dt.ToString("yyyy-MM-dd") + " false");
        }
        if (dt == endDay)
        {
            break; // AddDays would overflow past 9999-12-31
        }
        dt = dt.AddDays(1);
    }
    Console.WriteLine("End date: " + dt.ToString("yyyy-MM-dd"));
    sw.Stop();
    Console.WriteLine("Test time: " + sw.ElapsedMilliseconds + "ms");
    Console.WriteLine("Test done!");
    Console.ReadLine();
4. Extending the date regular expression
4.1 Extending the "year-month-day" form
The implementation above validates dates in yyyy-MM-dd format. To allow for different separators, and for month and day written as a single digit (the yyyy-M-d format), the regex can be extended to
^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])([-/.]?)(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])([-/.]?)(?:29|30)|(?:0?[13578]|1[02])([-/.]?)31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2([-/.]?)29)$
Simplified with backreferences: years 0001-9999, format yyyy-MM-dd or yyyy-M-d, where the separator may be absent or one of "-", "/" and "."
^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2\2(?:29))$
This is the most complete regex for the "year-month-day" form. Each part has its own meaning, so it can be trimmed down to fit your own needs.
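The backreferences also force both separators in a date to be the same. A short sketch makes that visible; it uses the backreference pattern with all spaces removed and assumes, as the prose states, that "." is allowed in the leap-day branch too. The sample inputs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // \1 and \2 repeat whatever separator (possibly empty) followed the year.
    static readonly Regex Pattern = new Regex(
        @"^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2\2(?:29))$");

    public static bool IsValid(string date)
    {
        return Pattern.IsMatch(date);
    }

    public static void Main()
    {
        string[] samples =
        {
            "2011-03-15", // same separator twice: accepted
            "2011/3/5",   // single-digit month and day: accepted
            "20110315",   // no separator at all: accepted
            "2012.2.29",  // leap day with dots: accepted
            "2011-03/15", // mixed separators: rejected
            "2011.2.29"   // not a leap year: rejected
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```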
4.2 Extending to other forms
Once the meaning of each part of the regex above and the relationships between the parts are understood, it is easy to extend it to other date formats, such as the "day-month-year" format dd/MM/yyyy.
^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-/.]?)(?:0?[13-9]|1[0-2])|31([-/.]?)(?:0?[13578]|1[02]))([-/.]?)(?!0000)[0-9]{4}|29([-/.]?)0?2([-/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$
Note that this format cannot be optimized with backreferences, because the first separator is captured by a different group in each branch. The set of separators can be trimmed to fit your own needs.
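One consequence of having no backreference is that the two separators are checked independently, so a mixed pair is accepted. The sketch below exercises the day-month-year pattern with all spaces removed and every separator optional, which is an assumption about the intended form; the sample inputs are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DayMonthYearDemo
{
    static readonly Regex Pattern = new Regex(
        @"^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-/.]?)(?:0?[13-9]|1[0-2])|31([-/.]?)(?:0?[13578]|1[02]))([-/.]?)(?!0000)[0-9]{4}|29([-/.]?)0?2([-/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$");

    public static bool IsValid(string date)
    {
        return Pattern.IsMatch(date);
    }

    public static void Main()
    {
        string[] samples =
        {
            "15/03/2011", // accepted
            "29-2-2012",  // leap day: accepted
            "15/03-2011", // mixed separators: accepted, nothing ties them together
            "29/02/2011", // not a leap year: rejected
            "31/04/2011"  // April has no day 31: rejected
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```

If consistent separators matter for this format, that check has to be done separately or the pattern restructured.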
4.3 Adding a time
The rules for times are clear and simple. There are essentially two forms, HH:mm:ss and H:m:s. The HH:mm:ss form is
([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]
Appended to the date regex, this gives yyyy-MM-dd HH:mm:ss
^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
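A brief check of the combined date and time pattern, as a sketch with made-up sample values. The pattern is the yyyy-MM-dd HH:mm:ss regex with all spaces removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateTimeRegexDemo
{
    static readonly Regex Pattern = new Regex(
        @"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$");

    public static bool IsValid(string value)
    {
        return Pattern.IsMatch(value);
    }

    public static void Main()
    {
        string[] samples =
        {
            "2011-03-15 09:05:07", // accepted
            "2012-02-29 23:59:59", // leap day, last second of the day: accepted
            "2011-03-15 24:00:00", // hour 24 is out of range: rejected
            "2011-03-15 12:60:00", // minute 60 is out of range: rejected
            "2011-02-29 12:00:00"  // invalid date part: rejected
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```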
4.4 Customizing the year range
All the regexes above use years 0001-9999, for common years and leap years alike. The year range can of course be customized, as long as the leap year rule is respected.
For example, years 1600-9999, format yyyy-MM-dd or yyyy-M-d, where the separator may be absent or one of "-", "/" and "."
^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-/.]?)0?2\2(?:29))$
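The restricted range can be exercised the same way. This sketch uses the 1600-9999 pattern with all spaces removed and assumes "." is allowed in the leap-day branch, as the prose states; the sample dates are chosen to sit on the boundaries.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearRangeDemo
{
    static readonly Regex Pattern = new Regex(
        @"^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-/.]?)0?2\2(?:29))$");

    public static bool IsValid(string date)
    {
        return Pattern.IsMatch(date);
    }

    public static void Main()
    {
        string[] samples =
        {
            "1600-02-29", // 1600 is a leap year: accepted
            "1700-02-28", // accepted
            "2000/2/29",  // accepted
            "9999.12.31", // upper bound: accepted
            "1599-12-31", // below the range: rejected
            "1700-02-29"  // 1700 is not a leap year: rejected
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsValid(s));
        }
    }
}
```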
5. Special notes
Everything above uses only the most basic regex syntax, which is supported by most languages with a traditional NFA engine, including JavaScript, Java and .NET.
One more point: although the date rules are fairly clear and the regexes above can be trimmed to produce one that meets your requirements, using them this way is not recommended. The power of regex lies in its flexibility, in tailoring the most suitable expression to the need at hand. If all you do is apply a template, it is no longer really regex.
Regex syntax has few rules and is easy to pick up. Mastering the syntax and tailoring the expression to the need is the real "Tao".
Regular expression validation date (multiple date formats)--Reprint