Using regular expressions to handle XML hexadecimal character errors

Source: Internet
Author: User

Since the data layer of our project changed from reading the database directly to reading from an interface service, XML parsing errors caused by invalid characters have occurred often. Our data looks like this:

    <type><![CDATA[gp]]></type>
    <detail><![CDATA[Liu? ->, C# provides strong support for regular expressions. This course focuses on using regular expressions in the C# environment and learning to analyze and create your own regular expressions, to learn regular expressions, visit http://edu.51cto.com/course/course_id-4664.html#]]></detail>

User-entered data often contains special characters, such as the little-figure and arrow symbols above (they arrive as raw bytes). I tried to match these special characters directly, but could not find a Unicode block that covers them, so I took the opposite approach: match the normal characters and keep only those, which yields data that is valid for XML parsing.

The characters in our data are letters and digits, various punctuation marks, and whitespace. Based on this, the first regular expression is "(\w|\p{P}|\s)*". Testing in RegxTest showed that the symbols "<", ">" and "=" were not matched (they are not in the punctuation category \p{P}), so after adding them the regular expression becomes "(\w|\p{P}|[<>=]|\s)*". After testing, all normal characters are retained. The C# code is as follows:

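    // Requires: using System.Text; using System.Text.RegularExpressions;
    string content = sb.ToString();
    // Each match is a run of allowed characters; anything else is skipped.
    MatchCollection matches = Regex.Matches(content, @"(\w|\p{P}|[<>=]|\s)*");
    sb = new StringBuilder();
    foreach (Match m in matches)
    {
        sb.Append(m.Value);
    }
    content = sb.ToString();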
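The filtering step above can be wrapped in a small helper to make its effect easier to see. This is only a sketch: the class name `XmlTextFilter` and the method name `KeepAllowed` are made up for illustration, and the pattern uses `+` instead of `*` so that no empty matches are produced (the resulting string is the same).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class XmlTextFilter
{
    // Whitelist from the article: word characters, punctuation,
    // the symbols < > =, and whitespace.
    private static readonly Regex Allowed = new Regex(@"(\w|\p{P}|[<>=]|\s)+");

    // Concatenates the runs of allowed characters; everything else is dropped.
    public static string KeepAllowed(string content)
    {
        var sb = new StringBuilder(content.Length);
        foreach (Match m in Allowed.Matches(content))
        {
            sb.Append(m.Value);
        }
        return sb.ToString();
    }
}
```

For example, `XmlTextFilter.KeepAllowed("a\u0001b<c>=d")` returns `"ab<c>=d"`. One side effect worth knowing: characters in the Unicode symbol categories that are not listed explicitly are also removed, so `"1+1"` becomes `"11"`. If the data may legitimately contain such symbols, the character class would need to be extended.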

This eliminated many of the exceptions, but some remained. On inspection, the remaining exceptions were caused by hexadecimal sequences in the data, for example:

    <shinimgs><![CDATA[http://img6n.soufunimg.com/viewimage/agents/2015_08/24/M09/01/12/wKgEUFXaYrSILIxEAAClop_zcLMAABrYAEF2hoAAKW6316/120x120.jpg;http://img7.soufunimg.com/viewimage/agents/2015_08/24/M01/0C/FA/wKgEKlXaYrWIOQbmAACrV5PpfxIAAURwACqFtkAAKtv885/120x120.jpg;http://img7.soufunimg.com/viewimage/agents/2015_08/24/M04/0C/FA/wKgELFXaYrSIVo9xAAB3vv5fpe8AAURuwMY6CsAAHfW851/120x120.jpg;http://img6n.soufunimg.com/viewimage/agents/2015_08/24/M00/01/12/latest/120x120.jpg]]></shinimgs>
    <xqimgs><![CDATA[metadata]]></xqimgs>

Simply stripping the 0x sequences would leave these image URLs pointing at the wrong addresses, which is too crude. Instead, I decided to extract the affected values first, load the XML with those elements emptied, and then assign the extracted values back to the corresponding fields of the DataSet once it has been read. My code is as follows:

    // Requires: using System.Data; using System.Text.RegularExpressions;
    MatchCollection imatches = null;
    if (Regex.IsMatch(content, "0x[0-9a-fA-F]+", RegexOptions.IgnoreCase))
    {
        // Capture every <tag><![CDATA[...0x...]]></tag> element, then empty it.
        Regex regex = new Regex(@"<(?'tag'\w+?)><!\[CDATA\[(?'text'.*?0[Xx].*?)\]\]></\k'tag'>");
        imatches = regex.Matches(content);
        content = regex.Replace(content, "<${tag}></${tag}>");
    }

    System.Xml.XmlDocument xd = new System.Xml.XmlDocument();
    xd.LoadXml(content);
    System.Xml.XmlNodeReader xnr = new System.Xml.XmlNodeReader(xd);
    ds.ReadXml(xnr);
    xnr.Close();

    // Write the captured values back into the column named after each tag.
    if (imatches != null && imatches.Count > 0 && ds != null && ds.Tables.Count > 0)
    {
        foreach (Match m in imatches)
        {
            foreach (DataTable table in ds.Tables)
            {
                if (table.Columns.Contains(m.Groups["tag"].Value))
                {
                    // Note: only the first row of the table is updated.
                    table.Rows[0][m.Groups["tag"].Value] = m.Groups["text"].Value;
                    break;
                }
            }
        }
    }

The code above uses regular expression replacement and grouping (named groups and a backreference). If these are unfamiliar, you can go to http://edu.51cto.com/course/course_id-4664.html to learn the basics of regular expressions.
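To make the grouping and replacement easier to follow, here is a small isolated sketch using the same pattern. The class name `CDataFields` and its methods are hypothetical, and the sample strings in the note below are invented. `(?'tag'...)` and `(?'text'...)` are named groups, `\k'tag'` requires the closing tag to repeat the captured tag name, and `${tag}` inserts the captured name into the replacement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CDataFields
{
    // Same pattern as in the article's code.
    private static readonly Regex HexField = new Regex(
        @"<(?'tag'\w+?)><!\[CDATA\[(?'text'.*?0[Xx].*?)\]\]></\k'tag'>");

    // Returns { tag name, CDATA text } of the first matching element, or null.
    public static string[] FirstMatch(string xml)
    {
        Match m = HexField.Match(xml);
        return m.Success
            ? new[] { m.Groups["tag"].Value, m.Groups["text"].Value }
            : null;
    }

    // Replaces each matching element with an empty element of the same name.
    public static string Blank(string xml)
    {
        return HexField.Replace(xml, "<${tag}></${tag}>");
    }
}
```

With the input `<img><![CDATA[http://example.com/120x120.jpg]]></img>`, `FirstMatch` yields the tag `img` and the text `http://example.com/120x120.jpg`, and `Blank` returns `<img></img>`. An element such as `<type><![CDATA[gp]]></type>` contains no 0x sequence, so it is left unchanged.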

Testing shows that the code above works; the remaining question is where to put it. Throwing and catching an exception in try-catch costs several hundred clock cycles. However, since only a small part of the data is problematic and the regular expressions above are not very efficient, I put the code in the catch block, so that it runs only when parsing actually fails.
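The placement described above might look roughly like the following sketch. It is not the article's exact code: the names `XmlLoader` and `LoadWithFallback` are invented, and for brevity the catch block applies only the simple whitelist filter rather than the full extract-and-restore logic.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlLoader
{
    // Whitelist of characters considered safe (same pattern as the article).
    private static readonly Regex Allowed = new Regex(@"(\w|\p{P}|[<>=]|\s)+");

    // Parses the content as-is first; cleans it up only if parsing fails.
    public static XmlDocument LoadWithFallback(string content, out bool usedFallback)
    {
        var doc = new XmlDocument();
        try
        {
            // Fast path: most data is valid, so no regex work is done.
            doc.LoadXml(content);
            usedFallback = false;
        }
        catch (XmlException)
        {
            // Slow path: drop disallowed characters and try once more.
            var sb = new StringBuilder(content.Length);
            foreach (Match m in Allowed.Matches(content))
            {
                sb.Append(m.Value);
            }
            doc = new XmlDocument();
            doc.LoadXml(sb.ToString()); // may still throw if the XML is malformed
            usedFallback = true;
        }
        return doc;
    }
}
```

The trade-off is that valid input pays nothing for the cleanup, while invalid input pays for both the exception and the regular expression pass. This is reasonable only while bad data remains rare.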
