This article introduces a C# method for extracting the link (href) and innerHTML of <a> tags with regular expressions. Through an example, it shows how to use regular expressions in C# to match and capture page elements. It may be a useful reference.
The example below reads an HTML page from a file, matches every <a> tag, and writes each link address and its innerHTML to another file. It is shared here for reference:
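```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // gb2312 is not available by default on .NET Core / .NET 5+; register the code-page provider first
        // (this line can be removed on .NET Framework, where gb2312 is built in)
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Read the page HTML
        string text = File.ReadAllText(Path.Combine(Environment.CurrentDirectory, "test.txt"), Encoding.GetEncoding("gb2312"));

        // Named group "url" captures the href value, "text" captures the innerHTML of the <a> tag
        string pattern = "<a(\\s+(href=\"(?<url>([^\"])*)\"|'([^'])*'|\\w+=\"(([^\"])*)\"|'([^'])*'))+>(?<text>(.*?))</a>";
        MatchCollection matches = Regex.Matches(text, pattern);

        // Create the output file and write one line per match
        using (FileStream w = new FileStream(Path.Combine(Environment.CurrentDirectory, "wirter.txt"), FileMode.Create))
        {
            for (int i = 0; i < matches.Count; i++)
            {
                // Group names are case-sensitive: "url" and "text" must match the pattern exactly
                string line = string.Format("link address: {0}, innerhtml: {1}", matches[i].Groups["url"].Value, matches[i].Groups["text"].Value);
                byte[] bs = Encoding.UTF8.GetBytes(line + "\r\n");
                w.Write(bs, 0, bs.Length);
                Console.WriteLine(line);
            }
        }
        Console.ReadKey();
    }
}
```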
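To make the matching step easier to test on its own, here is a minimal sketch that separates it from the file I/O. The class name `AnchorExtractor` and the method `Extract` are hypothetical names introduced for illustration; the regular expression is the one from the example above, restored to valid C# escaping, and the results are returned as (Url, Text) tuples instead of being written to a file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the <a>-matching step of the example, without any file I/O
public static class AnchorExtractor
{
    // Same pattern as the example: group "url" = href value, group "text" = innerHTML
    private static readonly Regex AnchorRegex = new Regex(
        "<a(\\s+(href=\"(?<url>([^\"])*)\"|'([^'])*'|\\w+=\"(([^\"])*)\"|'([^'])*'))+>(?<text>(.*?))</a>");

    // Returns one (Url, Text) pair per matched <a> tag; Url is "" when the tag has no href
    public static List<(string Url, string Text)> Extract(string html)
    {
        var results = new List<(string Url, string Text)>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            // An unsuccessful named group has Value == string.Empty, so no null check is needed
            results.Add((m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return results;
    }
}
```

For example, `AnchorExtractor.Extract("<a class=\"y\" href=\"/b\">Second <b>bold</b></a>")` yields one pair whose Url is `/b` and whose Text is `Second <b>bold</b>`, because the lazy `.*?` stops at the first `</a>`. Two limitations of this pattern are worth knowing: `.` does not match newlines unless `RegexOptions.Singleline` is passed, so innerHTML spanning several lines is skipped; and attributes must be double-quoted, since an `<a href='x'>` tag does not satisfy any of the alternatives and is not matched at all.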
[Figure: diagram of the regular expression above]
If you need to extract the src and data-url attributes of img tags, the approach is the same as above. The code is attached for reference:
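```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // gb2312 requires the code-page provider on .NET Core / .NET 5+ (remove on .NET Framework)
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Read the page HTML
        string text = File.ReadAllText(Path.Combine(Environment.CurrentDirectory, "test.txt"), Encoding.GetEncoding("gb2312"));

        // "src" captures the src attribute, "dataurl" the data-url attribute; any other attribute is skipped.
        // The opening of this pattern (up to "(?<src>") was lost from the source text and is reconstructed here.
        string pattern = "<img(?:\\s+(?:src=\"(?<src>[^\"]*?)\"|data-url=\"(?<dataurl>[^\"]*?)\"|[-\\w]+=\"[^\"]*?\"))*\\s*/>";
        MatchCollection matches = Regex.Matches(text, pattern);

        // Create the output file and write one line per match
        using (FileStream w = new FileStream(Path.Combine(Environment.CurrentDirectory, "wirter.txt"), FileMode.Create))
        {
            for (int i = 0; i < matches.Count; i++)
            {
                // Group names are case-sensitive: "src" and "dataurl" must match the pattern exactly
                string line = string.Format("Image src: {0}, image data-url: {1}", matches[i].Groups["src"].Value, matches[i].Groups["dataurl"].Value);
                byte[] bs = Encoding.UTF8.GetBytes(line + "\r\n");
                w.Write(bs, 0, bs.Length);
                Console.WriteLine(line);
            }
        }
    }
}
```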
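As with the <a> example, the img matching can be pulled out into a small, testable helper. This is a sketch under stated assumptions: `ImgExtractor` and `Extract` are hypothetical names, the opening `<img(?:\s+(?:src="(?<src>` of the pattern is a reconstruction (that part is missing from the original text), and the ending is changed from `/>` to `/?>` so that img tags written without a self-closing slash are matched too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts src and data-url from <img> tags
public static class ImgExtractor
{
    // Each repetition of the outer group consumes one attribute:
    // src="..." fills group "src", data-url="..." fills group "dataurl", anything else is skipped.
    // Assumption: "/?>" accepts both <img ... /> and <img ...>.
    private static readonly Regex ImgRegex = new Regex(
        "<img(?:\\s+(?:src=\"(?<src>[^\"]*?)\"|data-url=\"(?<dataurl>[^\"]*?)\"|[-\\w]+=\"[^\"]*?\"))*\\s*/?>");

    // Returns one (Src, DataUrl) pair per <img> tag; a missing attribute yields ""
    public static List<(string Src, string DataUrl)> Extract(string html)
    {
        var results = new List<(string Src, string DataUrl)>();
        foreach (Match m in ImgRegex.Matches(html))
        {
            results.Add((m.Groups["src"].Value, m.Groups["dataurl"].Value));
        }
        return results;
    }
}
```

The alternation order matters: `src=` and `data-url=` are tried before the generic `[-\w]+=` branch, so those two attributes land in their named groups instead of being skipped. Attributes such as `srcset` or `data-src` fail the specific branches (they do not start with exactly `src=` or `data-url=`) and fall through to the generic one.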