Disclaimer: The regular expressions here apply only to .NET. The process is to send an HTTP request that returns the entire HTML page, and then extract the desired data from that HTML.
Part I: Sending an HttpWebRequest
C# code
using System.IO;
using System.Net;
using System.Text;

// URL address ("url" is a placeholder for the page to fetch)
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("url");
// Set the browser type (User-Agent); this must happen before GetResponse() is called
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; .NET CLR 3.5.21022; .NET CLR 1.0.3705; .NET CLR 1.1.4322)";
string htmlstr;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("UTF-8")))
{
    // Read the returned HTML page into a string
    htmlstr = reader.ReadToEnd();
}
Part II: Getting useful data from the returned HTML. This approach works for any HTML tag you want to locate by id, class, and so on. Take the following method as an example.
C# code
/// <summary>
/// Get color
/// </summary>
/// <param name="htmlstr"></param>
/// <returns></returns>
public string GetColor(string htmlstr)
{
    // Get the HTML whose class is Detailsc_sku; this can also be changed to match by id
    // string REGSTR6 = @"<(?
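To make the idea of locating a tag by class concrete, here is a minimal sketch. It is not the author's original pattern: the helper name GetInnerHtmlByClass, the sample HTML, and the value "Red" are all hypothetical. It also simplifies the problem in two ways: it only looks at div elements, and it does not handle a div nested inside the matched div.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlExtractDemo
{
    // Hypothetical helper: returns the inner HTML of the first <div> whose class
    // attribute contains className, or an empty string when nothing matches.
    // Simplification: nested <div> elements inside the match are not handled.
    public static string GetInnerHtmlByClass(string html, string className)
    {
        string pattern =
            @"<div\b[^>]*\bclass\s*=\s*[""'][^""']*\b" + Regex.Escape(className) +
            @"\b[^""']*[""'][^>]*>(?<inner>.*?)</div>";
        // Singleline lets '.' match newlines, so content spanning several lines is captured.
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups["inner"].Value : string.Empty;
    }

    public static void Main()
    {
        // Made-up sample page standing in for the HTML returned in Part I.
        string html = "<html><body><div class=\"Detailsc_sku\"><span>Red</span></div></body></html>";
        string inner = GetInnerHtmlByClass(html, "Detailsc_sku");
        // Strip the remaining tags to leave only the text.
        string text = Regex.Replace(inner, @"<[^>]+>", string.Empty).Trim();
        Console.WriteLine(text); // prints "Red"
    }
}
```

To match by id instead, the same sketch should work with "class" replaced by "id" in the pattern. For pages with deeply nested markup, a real HTML parser is usually more robust than a regular expression.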
C# code
using System.Text.RegularExpressions;

/// <summary>
/// Removes the HTML tags from a string and returns the text content that remains
/// </summary>
/// <param name="src"></param>
/// <returns></returns>
public string RemoveHtml(string src)
{
    // Any HTML tag
    Regex htmlReg = new Regex(@"<[^>]+>", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    // The HTML non-breaking space entity (&nbsp;)
    Regex htmlSpaceReg = new Regex("\\&nbsp\\;", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    // Runs of two or more whitespace characters, or a leftover &nbsp;
    Regex spaceReg = new Regex("\\s{2,}|\\&nbsp\\;", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    // Whole <style> and <script> blocks; Singleline lets '.' match across line breaks
    Regex styleReg = new Regex(@"<style(.*?)</style>", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Regex scriptReg = new Regex(@"<script(.*?)</script>", RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Remove style and script blocks first so their contents do not survive as text
    src = styleReg.Replace(src, string.Empty);
    src = scriptReg.Replace(src, string.Empty);
    src = htmlReg.Replace(src, string.Empty);
    // Turn entities and whitespace runs into single spaces
    src = htmlSpaceReg.Replace(src, " ");
    src = spaceReg.Replace(src, " ");
    return src.Trim();
}