1 Overview
First, note that both WinForms and WebForms offer very mature calendar controls. For ease of use and extensibility alike, date selection and validation are better handled with a calendar control.
A few days ago I saw a post on CSDN asking for a date regular expression, so I put together this article for discussion. If there are omissions or errors, please point them out.
A date regex usually enforces a specific format, and the data is often not typed in directly by users. Because application scenarios differ, the regex is written differently, and its complexity naturally differs too. A regex must be written for the specific situation. The basic principle is to write only what is appropriate and not to over-complicate it.
For date extraction, the simplest regex that separates dates from the surrounding non-date text is enough, for example
\d{4}-\d{2}-\d{2}
If dates in yyyy-MM-dd format can be uniquely located in the source string, this pattern is sufficient for extraction.
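The following minimal C# sketch shows this extraction idea. The class and method names (DateExtractor, Extract) are illustrative and not from the original. It uses Regex.Matches to collect every yyyy-MM-dd-shaped substring. The pattern checks only the shape, so an impossible date such as 2023-99-99 is still extracted. Also, in .NET, \d matches non-ASCII Unicode digits too unless RegexOptions.ECMAScript is used.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative helper: collects substrings shaped like yyyy-MM-dd.
// The pattern checks only the 4-2-2 digit shape, not whether the date exists.
public static class DateExtractor
{
    private static readonly Regex DateShape = new Regex(@"\d{4}-\d{2}-\d{2}");

    public static List<string> Extract(string text)
    {
        var dates = new List<string>();
        foreach (Match m in DateShape.Matches(text))
        {
            dates.Add(m.Value);
        }
        return dates;
    }
}
```

For example, Extract("from 2023-01-05 to 2023-12-31") returns the two dates in order of appearance.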
For validation, checking only which characters appear and in what layout is of little value. The calendar rules must be checked as well. Because of leap years, validating a date becomes considerably more complicated.
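The sketch below compares a format-only regex with DateTime.TryParseExact, which applies the calendar rules, to show why a shape check alone is not enough. The helper names are hypothetical. InvariantCulture is assumed so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative comparison of a shape-only check with a real calendar check.
public static class FormatVsCalendar
{
    // Accepts any 4-2-2 digit string, including impossible dates.
    public static bool HasDateShape(string s) =>
        Regex.IsMatch(s, @"^[0-9]{4}-[0-9]{2}-[0-9]{2}$");

    // Lets DateTime decide whether the date exists (month lengths, leap years).
    public static bool IsRealDate(string s) =>
        DateTime.TryParseExact(s, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                               DateTimeStyles.None, out _);
}
```

Both 2023-02-30 and 2023-13-45 pass the shape check, and the calendar check rejects both.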
First, let us look at the valid range of dates and at what a leap year is.
2 Date Rules
2.1 Valid date range
The valid range of dates differs from one scenario to another.
The valid range of the DateTime object, as defined on MSDN, is 0001-01-01 00:00:00 to 9999-12-31 23:59:59.
Unix timestamp 0, written according to ISO 8601, is 1970-01-01T00:00:00Z.
In practice, dates rarely fall outside the DateTime range, so the regex only needs to validate a commonly used range within it.
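The two ranges mentioned above can be confirmed in .NET with a short sketch. The DateRanges class is illustrative. DateTime.MinValue and DateTime.MaxValue give the DateTime limits, and DateTimeOffset.FromUnixTimeSeconds(0) gives Unix time 0. The output is formatted to whole seconds, so the fractional part of DateTime.MaxValue is not shown.

```csharp
using System;
using System.Globalization;

// Illustrative helper that formats the range limits mentioned in 2.1.
public static class DateRanges
{
    // Smallest DateTime value, to whole seconds.
    public static string Min() =>
        DateTime.MinValue.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);

    // Largest DateTime value, to whole seconds (fractional ticks are truncated).
    public static string Max() =>
        DateTime.MaxValue.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);

    // Unix timestamp 0 rendered in ISO 8601 form, in UTC.
    public static string UnixEpoch() =>
        DateTimeOffset.FromUnixTimeSeconds(0).UtcDateTime
            .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
}
```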
2.2 What is a leap year
(The following is excerpted from Baidu Baike.)
Leap years make up for the difference between the number of days in a calendar year, as defined by people, and the Earth's actual orbital period. A year in which this difference is made up is a leap year.
The Earth takes 365 days 5 hours 48 minutes 46 seconds (365.24219 days) to orbit the Sun once; this is the tropical year. A common year in the Gregorian calendar has only 365 days, about 0.2422 days shorter than the tropical year, so the shortfall accumulates to about one day every four years. That day is added at the end of February (February 29), making the year 366 days long. Such a year is a leap year.
Note that today's Gregorian calendar derives from the Julian calendar of the Romans. At the time, nobody accounted for the Julian year being about 0.0078 days too long, so from 46 BC to the 16th century the error accumulated to more than 10 days. Pope Gregory XIII therefore declared that October 5, 1582 would be counted as October 15 and introduced a new leap-year rule. Under this rule, a century year (a year divisible by 100) is a leap year only if it is a multiple of 400; otherwise it is a common year. For example, 1700, 1800 and 1900 are common years, while 2000 is a leap year. The reasoning is as follows. Adding a leap year every four years over-counts about 0.0078 days a year, which comes to about 3 days every 400 years, so three leap years are dropped in every 400 years. Since then, the average year length has been 365.2425 days. The rule is usually summarized as follows: every fourth year is a leap year, century years are not, and every 400th year is a leap year again.
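The summarized rule translates directly into code. This is a minimal sketch that assumes the proleptic Gregorian calendar for all years 1-9999, as .NET's DateTime does. The built-in DateTime.IsLeapYear serves as a cross-check, and the LeapYear class name is illustrative.

```csharp
using System;

// Illustrative implementation of "every fourth year is a leap year,
// century years are not, every 400th year is again".
public static class LeapYear
{
    public static bool IsLeap(int year) =>
        (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

    // Counts the years in [from, to] where the rule and DateTime.IsLeapYear disagree.
    // DateTime.IsLeapYear only accepts years 1-9999.
    public static int Disagreements(int from, int to)
    {
        int count = 0;
        for (int y = from; y <= to; y++)
        {
            if (IsLeap(y) != DateTime.IsLeapYear(y)) count++;
        }
        return count;
    }
}
```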
2.3 Date formats
Date separators vary with language and culture. The usual formats are:
yyyyMMdd
yyyy-MM-dd
yyyy/MM/dd
yyyy.MM.dd
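These four layouts correspond to .NET custom format strings. In those strings, MM is the month specifier and mm means minutes. The sketch below accepts any of the four layouts through the string[] overload of DateTime.TryParseExact and normalizes the result to yyyy-MM-dd. The DateFormats.Normalize helper is hypothetical, and InvariantCulture is assumed so that "/" stays a literal slash.

```csharp
using System;
using System.Globalization;

// Illustrative helper that accepts the four layouts listed above.
public static class DateFormats
{
    private static readonly string[] Layouts =
        { "yyyyMMdd", "yyyy-MM-dd", "yyyy/MM/dd", "yyyy.MM.dd" };

    // Returns the date as yyyy-MM-dd, or null if no layout fits or the date does not exist.
    public static string Normalize(string input)
    {
        if (DateTime.TryParseExact(input, Layouts, CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out DateTime d))
        {
            return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        }
        return null;
    }
}
```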
3 Constructing the Regular Expression
3.1 Rule Analysis
A common way to write a complex regex is to split the requirement into independent parts and write a regex for each part. You then combine the parts, checking how they relate to and affect each other. This usually yields the desired regex.
Based on the definition of a leap year, dates can be classified in several ways.
3.1.1 Two classes, by whether the valid days depend on the year
The class that does not depend on the year can be subdivided into two types by the number of days in the month:
- January, March, May, July, August, October and December have days 1-31
- April, June, September and November have days 1-30
The class that depends on the year:
- February in a common year has days 1-28
- February in a leap year has days 1-29
3.1.2 Four classes, by which days are contained
- Every month of every year contains days 1-28
- Every month except February, in every year, contains days 29 and 30
- January, March, May, July, August, October and December of every year contain day 31
- February of a leap year contains day 29
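As a sanity check of these four classes, the sketch below derives the last valid day of a month from them. The MaxDay helper is illustrative and not from the original. Its result should agree with DateTime.DaysInMonth for every month of every year from 1 to 9999.

```csharp
using System;

// Illustrative helper: last valid day of a month, built from the four classes.
public static class DayClasses
{
    public static int MaxDay(int year, int month)
    {
        // Class 3: months 1, 3, 5, 7, 8, 10, 12 contain day 31.
        if (month == 1 || month == 3 || month == 5 || month == 7 ||
            month == 8 || month == 10 || month == 12) return 31;
        // Class 2: every other month except February contains days 29 and 30.
        if (month != 2) return 30;
        // Classes 1 and 4: February has 1-28, plus day 29 in a leap year.
        bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        return leap ? 29 : 28;
    }
}
```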
3.1.3 Choosing a classification
Once dates are classified, the regex is implemented with an alternation (exp1|exp2|exp3). The engine tries the branches from left to right. As soon as one branch matches, it does not try the branches to its right unless a later part of the pattern fails and forces backtracking. If every branch fails, the match fails.
Both the number of branches and the complexity of each branch affect matching efficiency. Most dates being validated fall on days 1-28, so the second classification, with the 1-28 branch first, improves matching efficiency.
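You can observe the left-to-right behavior of alternation in .NET's backtracking engine with two branches that can both match at the same position: the branch listed first wins. This is a minimal sketch, and FirstMatch is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;

// Illustrative helper: returns the text of the first match, or "" if there is none.
public static class AlternationOrder
{
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;
}
```

With the input "abc", the pattern a|ab yields "a" because the first branch already succeeds. The pattern ab|a yields "ab". This is why branch order matters for both results and speed.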
3.2 Regex Implementation
Using the classification from section 3.1.2, you can write a regex for each rule. The implementation below uses the yyyy-MM-dd format.
First consider the three rules that do not depend on the year. For these, the year can be written uniformly as any four digits except 0000:
(?!0000)[0-9]{4}
Only the month and day are considered below.
- Every month of every year, common years included, contains days 1-28
(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])
- Every month except February, in every year including common years, contains days 29 and 30
(0[13-9]|1[0-2])-(29|30)
- January, March, May, July, August, October and December of every year, common years included, contain day 31
(0[13578]|1[02])-31
Together with the year, these give every date except February 29 of a leap year:
(?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)
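This sketch follows the "write each rule separately, then combine" approach from 3.1 by building the combined pattern from its pieces. The constant names are illustrative. The pieces are joined by string concatenation and wrapped in ^(?:...)$ for whole-string checks. A leap day such as 2024-02-29 is still rejected at this stage, because leap days are handled next.

```csharp
using System.Text.RegularExpressions;

// Illustrative assembly of the year-independent part from its three rules.
public static class CommonDatePattern
{
    const string Year = "(?!0000)[0-9]{4}";                          // 0001-9999
    const string Days1To28 = "(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])"; // all months
    const string Days29And30 = "(0[13-9]|1[0-2])-(29|30)";            // all but February
    const string Day31 = "(0[13578]|1[02])-31";                       // 31-day months

    // Every date except February 29 of a leap year.
    public const string Pattern =
        Year + "-(" + Days1To28 + "|" + Days29And30 + "|" + Day31 + ")";

    public static bool IsMatch(string s) => Regex.IsMatch(s, "^(?:" + Pattern + ")$");
}
```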
Next, consider leap years.
- February of a leap year contains day 29
Here the month and day are fixed at 02-29, and only the year varies.
The following code lists all leap years, which helps when reviewing the pattern:
using System;

// Print every leap year from 0001 to 9999, zero-padded to four digits
for (int i = 1; i < 10000; i++)
{
    if ((i % 4 == 0 && i % 100 != 0) || i % 400 == 0)
    {
        Console.WriteLine(string.Format("{0:0000}", i));
    }
}
From the leap-year rule, the pattern is easy to derive. First, every fourth year is a leap year. In other words, the last two digits form a non-zero multiple of 4:
([0-9]{2}(0[48]|[2468][048]|[13579][26]))
Second, century years are not leap years, but every 400th year is. In other words, a year ending in 00 is a leap year only when its first two digits form a multiple of 4:
(0[48]|[2468][048]|[13579][26])00
Combined, this gives February 29 of every leap year:
(([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)
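The year pattern is meant to encode exactly the leap-year rule, so you can check it exhaustively against DateTime.IsLeapYear for years 1-9999. The sketch below tests only the year part of the pattern above, anchored as a whole string. The LeapYearPattern class is illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative check of the leap-year part of the pattern.
public static class LeapYearPattern
{
    // Non-century multiples of 4, or centuries whose first two digits are a multiple of 4.
    private static readonly Regex Leap = new Regex(
        "^(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)$");

    public static bool IsLeap(int year) =>
        Leap.IsMatch(year.ToString("0000", CultureInfo.InvariantCulture));

    // Number of years in 1..9999 where the pattern disagrees with DateTime.IsLeapYear.
    public static int Mismatches()
    {
        int n = 0;
        for (int y = 1; y <= 9999; y++)
        {
            if (IsLeap(y) != DateTime.IsLeapYear(y)) n++;
        }
        return n;
    }
}
```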
All four rules are now implemented, and they do not interfere with one another. Combined, they give a regex for every date in the DateTime range:
^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$
This regex is used only for validation, so the capturing groups serve no purpose: they consume resources and reduce matching efficiency. You can replace them with non-capturing groups:
^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$
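The cost difference shows up in the number of groups the engine has to record. The sketch below compares the two versions with Regex.GetGroupNumbers, which always includes group 0, the whole match. It also shows a .NET-only alternative, RegexOptions.ExplicitCapture, which treats unnamed groups as non-capturing without editing the pattern. The class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

// Illustrative comparison of the capturing and non-capturing versions.
public static class CaptureCost
{
    public const string Capturing =
        @"^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$";

    public const string NonCapturing =
        @"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$";

    // Number of groups the engine tracks; group 0 (the whole match) is always included.
    public static int GroupCount(string pattern, RegexOptions options = RegexOptions.None) =>
        new Regex(pattern, options).GetGroupNumbers().Length;
}
```

The capturing version records ten groups plus group 0. The non-capturing version, and the capturing one under ExplicitCapture, record only group 0, and all of them accept the same dates.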
The non-capturing regex covers years 0001-9999 in yyyy-MM-dd format. You can check its correctness and performance with the following code:
using System;
using System.Diagnostics;
using System.Globalization;
using System.Text.RegularExpressions;

DateTime dt = new DateTime(1, 1, 1);
DateTime endDay = new DateTime(9999, 12, 31);
Stopwatch sw = new Stopwatch();
sw.Start();
// Non-capturing version under test
Regex dateRegex = new Regex(@"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)$");
// Capturing version, for comparison (swap with the line above):
// Regex dateRegex = new Regex(@"^((?!0000)[0-9]{4}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([0-9]{2}(0[48]|[2468][048]|[13579][26])|(0[48]|[2468][048]|[13579][26])00)-02-29)$");
// Note: MM is month; mm would be minutes. InvariantCulture keeps the Gregorian year.
Console.WriteLine("Start date: " + dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
while (dt <= endDay)
{
    string text = dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    if (!dateRegex.IsMatch(text))
    {
        Console.WriteLine(text + " false");
    }
    // Stop on the last day: AddDays(1) past 9999-12-31 would throw
    if (dt == endDay)
    {
        break;
    }
    dt = dt.AddDays(1);
}
Console.WriteLine("End date: " + dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
sw.Stop();
Console.WriteLine("Test: " + sw.ElapsedMilliseconds + " ms");
Console.WriteLine("Test complete!");
Console.ReadLine();
4 Extending the Regular Expression
4.1 Extending the year-month-day form
The implementation above validates dates in yyyy-MM-dd format. It can be extended to allow other separators, and to let the month and day be written with a single digit (M and d, that is, yyyy-M-d):
^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])([-/.]?)(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])([-/.]?)(?:29|30)|(?:0?[13578]|1[02])([-/.]?)31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2([-/.]?)29)$
The version below is simplified with backreferences, which also require both separators to be the same. It covers years 0001-9999 in yyyy-MM-dd or yyyy-M-d format, with either no separator or one of "-", "/", ".":
^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2\2(?:29))$
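The following short sketch exercises the backreference version; the SeparatorBackreference class is illustrative. Here \1 and \2 require the second separator to repeat the first. As a result, 2023-01/05 is rejected, while 2023/1/5 and 20230105 are accepted.

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper around the 4.1 backreference pattern.
public static class SeparatorBackreference
{
    // \1 and \2 force the separator between month and day to equal the one after the year.
    private static readonly Regex Ymd = new Regex(
        @"^(?:(?!0000)[0-9]{4}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)([-/.]?)0?2\2(?:29))$");

    public static bool IsValid(string s) => Ymd.IsMatch(s);
}
```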
The backreference version is the most complete regex for the year-month-day form. Each part has its own meaning, so you can trim it to suit your needs.
4.2 Extending to other forms
Once you understand what each part of the regex means and how the parts relate to one another, you can easily extend it to other formats, such as the day-month-year format dd/MM/yyyy:
^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-/.]?)(?:0?[13-9]|1[0-2])|31([-/.]?)(?:0?[13578]|1[02]))([-/.]?)(?!0000)[0-9]{4}|29([-/.]?)0?2([-/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$
Note that this format cannot be simplified with backreferences, because the first separator is captured by a different group in each branch. The separators can likewise be trimmed to suit your needs.
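The sketch below puts the dd/MM/yyyy pattern to use; the DayMonthYear class is illustrative. Because no backreference ties the separators together, a mixed form such as 05.01/2023 is still accepted.

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper around the 4.2 day-month-year pattern.
public static class DayMonthYear
{
    private static readonly Regex Dmy = new Regex(
        @"^(?:(?:(?:0?[1-9]|1[0-9]|2[0-8])([-/.]?)(?:0?[1-9]|1[0-2])|(?:29|30)([-/.]?)(?:0?[13-9]|1[0-2])|31([-/.]?)(?:0?[13578]|1[02]))([-/.]?)(?!0000)[0-9]{4}|29([-/.]?)0?2([-/.]?)(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00))$");

    public static bool IsValid(string s) => Dmy.IsMatch(s);
}
```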
4.3 Adding the time
The rules for time are clear and simple. There are basically two forms, HH:mm:ss and H:m:s. The HH:mm:ss form is:
([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]
Combined with the date regex, the yyyy-MM-dd HH:mm:ss form is:
^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
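The following sketch checks the combined date-time pattern; the DateTimePattern class is illustrative. The \s+ allows one or more whitespace characters between date and time. A T separator as in ISO 8601 is not accepted, and the hour must have two digits.

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper around the 4.3 yyyy-MM-dd HH:mm:ss pattern.
public static class DateTimePattern
{
    private static readonly Regex Full = new Regex(
        @"^(?:(?!0000)[0-9]{4}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1[0-9]|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[0-9]{2}(?:0[48]|[2468][048]|[13579][26])|(?:0[48]|[2468][048]|[13579][26])00)-02-29)\s+([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$");

    public static bool IsValid(string s) => Full.IsMatch(s);
}
```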
4.4 Custom year range
All of the regexes above that involve common years use the range 0001-9999. You can of course customize the year range according to the leap-year rules.
For example, the following covers years 1600-9999 in yyyy-MM-dd or yyyy-M-d format, with either no separator or one of "-", "/", ".":
^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-/.]?)0?2\2(?:29))$
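The following sketch exercises the 1600-9999 variant; the CustomYearRange class is illustrative. The first accepted year is 1600. Century leap years are restricted to 16, 20, 24, ... 96 followed by 00, so 1700 is rejected and 1600 and 2000 are accepted.

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper around the 4.4 pattern for years 1600-9999.
public static class CustomYearRange
{
    private static readonly Regex Ymd = new Regex(
        @"^(?:(?:1[6-9]|[2-9][0-9])[0-9]{2}([-/.]?)(?:(?:0?[1-9]|1[0-2])\1(?:0?[1-9]|1[0-9]|2[0-8])|(?:0?[13-9]|1[0-2])\1(?:29|30)|(?:0?[13578]|1[02])\1(?:31))|(?:(?:1[6-9]|[2-9][0-9])(?:0[48]|[2468][048]|[13579][26])|(?:16|[2468][048]|[3579][26])00)([-/.]?)0?2\2(?:29))$");

    public static bool IsValid(string s) => Ymd.IsMatch(s);
}
```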
5 Special Notes
The regexes above use only the most basic regex syntax, which most languages with a traditional NFA engine support, including JavaScript, Java and .NET.
One more point: date rules are clear enough that you can trim the patterns above into a suitable date regex, but using regexes this way is not recommended. The strength of regular expressions lies in their flexibility, which lets you tailor the most suitable regex to each requirement. If you only apply templates, you miss the point of regular expressions.
Regex syntax has few rules and is easy to learn. Mastering the syntax and tailoring each pattern to the requirement is the real "way".