C#: Regular expressions for extracting matching content from a web page (getting the value between a start string and an end string)

Last updated: 2018-12-07  Source: Internet

Uploaded by: User


For example:
<div>1div</div>
<a>1a</a>
<p>1p</p>
<p>2p</p>
<div>2div</div>
<a>2a</a>
<p>3p</p>
<p>4p</p>
<a>3a</a>
<p>5p</p>
<div>3div</div>
<a>4a</a>
<p>6p</p>
<span>1span</span>

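The problem: the page contains many div, p, and a tags and at most one span. The goal is to extract the content of every p that comes directly after an a tag (a p with no a tag immediately before it is skipped), plus the content of the span. The span may or may not be present: extract it if it is there, otherwise skip it. What is needed is a C# regular expression for this.

using System.Text.RegularExpressions;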

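Code

string result = "";
// (?ins): i = ignore case, n = explicit capture only, s = single-line mode.
// The lookbehind requires either "</a>" followed by an opening <p ...> tag, or an opening <span ...> tag.
// "mark" holds only the tag name (p or span), so the closing tag still matches when the opening tag has attributes.
foreach (Match m in Regex.Matches(str, @"(?ins)(?<=</a>\s*<(?<mark>p)\b[^>]*>|<(?<mark>span)\b[^>]*>)[\s\S]+?(?=</\k<mark>\s*>)"))
{
    result += m.Value; // the extracted content

    MessageBox.Show(m.Value);
}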

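Or use:

// Same idea, but the tags are consumed and the content is read from the "data" group.
foreach (Match m in Regex.Matches(yourHtml, @"(?is)(?:</a>\s*<(?<mark>p)\b[^>]*>|<(?<mark>span)\b[^>]*>)(?<data>[\s\S]+?)</\k<mark>\s*>"))
{
    string data = m.Groups["data"].Value; // the extracted content
}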
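As a quick check of the pattern above, the following sketch runs the capturing-group variant against the sample markup from the start of the article. The class name `AnchorParagraphExtractor` and the method `Extract` are hypothetical names chosen for illustration, and storing only the tag name in `mark` is an assumption made so that opening tags with attributes also work.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorParagraphExtractor
{
    // Matches either "</a>" followed by an opening <p> tag, or an opening <span> tag,
    // then lazily captures the content up to the matching closing tag.
    private static readonly Regex Pattern = new Regex(
        @"(?:</a>\s*<(?<mark>p)\b[^>]*>|<(?<mark>span)\b[^>]*>)(?<data>[\s\S]+?)</\k<mark>\s*>",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the captured contents in document order.
    public static List<string> Extract(string html)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(html))
        {
            values.Add(m.Groups["data"].Value);
        }
        return values;
    }
}
```

For the sample markup this yields 1p, 3p, 5p, 6p, and 1span; 2p and 4p are skipped because each follows a p rather than an a.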

Or: get the value between a start string and an end string

Code

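        #region Get the values between a start string and an end string
        /// <summary>
        /// Gets every value that lies between a start string and an end string.
        /// </summary>
        /// <param name="begin">Start marker (inserted into the pattern as a regular expression)</param>
        /// <param name="end">End marker (inserted into the pattern as a regular expression)</param>
        /// <param name="html">HTML string to search</param>
        /// <returns>The matches, one per value found between the markers</returns>
        public static MatchCollection GetMidValue(string begin, string end, string html)
        {
            // [\s\S]*? lazily matches any characters, including line breaks, up to the first end marker.
            Regex reg = new Regex("(?<=(" + begin + "))[\\s\\S]*?(?=(" + end + "))", RegexOptions.Multiline | RegexOptions.Singleline);
            return reg.Matches(html);
        }
        #endregion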

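Code

        /// <summary>
        /// Gets the first value that lies between a start string and an end string.
        /// </summary>
        /// <param name="str">String to search</param>
        /// <param name="start">Start marker (inserted into the pattern as a regular expression)</param>
        /// <param name="end">End marker (inserted into the pattern as a regular expression)</param>
        /// <returns>The first value found between the markers, or an empty string if there is none</returns>
        private string getvalue(string str, string start, string end)
        {
            Regex rg = new Regex("(?<=(" + start + "))[\\s\\S]*?(?=(" + end + "))", RegexOptions.Multiline | RegexOptions.Singleline);

            return rg.Match(str).Value;
        }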
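Both helpers above splice the start and end markers into the pattern unescaped, so a delimiter such as `(` is read as regex syntax. A minimal sketch of a literal-delimiter variant follows; `MidValue.Between` is a hypothetical name, and treating the delimiters as plain text is an assumption about the intended use, not something the original code does.

```csharp
using System.Text.RegularExpressions;

public static class MidValue
{
    // Hypothetical variant of getvalue: returns the first value between two
    // literal delimiters, or an empty string when nothing is found.
    public static string Between(string text, string start, string end)
    {
        // Regex.Escape neutralizes characters that have a special meaning in a pattern.
        string pattern = "(?<=" + Regex.Escape(start) + ")[\\s\\S]*?(?=" + Regex.Escape(end) + ")";
        return Regex.Match(text, pattern).Value;
    }
}
```

With the unescaped helpers, passing `"("` as the start marker produces an invalid pattern and the `Regex` constructor throws; the escaped variant treats it as an ordinary character.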

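// Regex to extract a single table, selected by a marker string inside the table

If the reference marker is something like "會員資料" ("member data"), a slight adaptation of what I wrote above is enough. The difficulty arises when the marker is something like "00" or "444": nested <table> tags then have to be taken into account. The pattern must return the innermost <table> that contains the marker, and it must also keep every <table> paired with its </table>.

Code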

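// "會員資料" ("member data") is the marker text. (?<o>) pushes and (?<-o>) pops a balancing group,
// and (?(o)(?!)) fails the match while any nested <table> is still open.
Match mm = Regex.Match(html, @"<table[^>]*>(((<table[^>]*>(?<o>)|</table>(?<-o>)|(?!</?table)[\s\S])*)(?(o)(?!)))\b" + "會員資料" + @"\b(?:(?!<table[^>]*>)[\s\S])*?(((<table[^>]*>(?<o>)|</table>(?<-o>)|(?!</?table)[\s\S])*)(?(o)(?!)))</table>", RegexOptions.IgnoreCase);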

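If the marker contains characters that have a special meaning in a regular expression, it must be escaped first (for example with Regex.Escape), and the program should also handle exceptions; that part is left to you.
If the marker occurs in several places in the source string, this takes the <table> that contains its first occurrence.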
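The escaping and error handling mentioned above could look like the sketch below. `TableFinder.FindTable` is a hypothetical name; the pattern is the one from the article with the marker passed through `Regex.Escape`. The two-second timeout and the empty-string return on failure are assumptions made for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableFinder
{
    // Balanced run of markup: every nested <table> opened here must also be closed here.
    private const string Balanced =
        @"(((<table[^>]*>(?<o>)|</table>(?<-o>)|(?!</?table)[\s\S])*)(?(o)(?!)))";

    // Hypothetical helper: returns the innermost table that directly contains the marker,
    // or an empty string when there is no match or the match times out.
    public static string FindTable(string html, string marker)
    {
        string pattern = @"<table[^>]*>" + Balanced
            + @"\b" + Regex.Escape(marker) + @"\b"
            + @"(?:(?!<table[^>]*>)[\s\S])*?" + Balanced
            + @"</table>";
        try
        {
            // The timeout guards against heavy backtracking on large inputs.
            Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase, TimeSpan.FromSeconds(2));
            return m.Success ? m.Value : string.Empty;
        }
        catch (RegexMatchTimeoutException)
        {
            return string.Empty;
        }
    }
}
```

Note that the `\b` anchors from the original pattern are kept, so a marker that starts or ends with a non-word character may need them removed.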

// Extract a single table with a regex, then parse the contents of that table

Code

string result = "";
// "會員輸贏資料" ("member win/loss data") is the marker text that identifies the table.
Match mm = Regex.Match(html, @"<table[^>]*>(((<table[^>]*>(?<o>)|</table>(?<-o>)|(?!</?table)[\s\S])*)(?(o)(?!)))\b" + "會員輸贏資料" + @"\b(?:(?!<table[^>]*>)[\s\S])*?(((<table[^>]*>(?<o>)|</table>(?<-o>)|(?!</?table)[\s\S])*)(?(o)(?!)))</table>", RegexOptions.IgnoreCase);
if (mm.Success)
{
    //MessageBox.Show(mm.Value);

    // Earlier attempt using GetMidValue, kept for reference:
    //MatchCollection mdd = GetMidValue("<td", "</td>", mm.Value);
    //foreach (Match m in mdd)
    //{
    //    for (int i = 1; i < m.Groups.Count; i++)
    //    {
    //        result += m.Groups[i].Value; // the extracted content
    //    }
    //}

    // Collect the trimmed content of every <td> cell in the matched table, one per line.
    MatchCollection mc = Regex.Matches(mm.Value, @"<td[^>]*>\s*(?<content>[\s\S]*?)\s*</td>", RegexOptions.IgnoreCase);
    foreach (Match m in mc)
    {
        for (int i = 1; i < m.Groups.Count; i++)
        {
            result += m.Groups[i].Value + "\n";
        }
    }
    MessageBox.Show(result);
}
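The cell-parsing step can also be tried without a UI. The sketch below wraps the same `<td>` pattern in a method that returns the cell contents as a list; `CellReader.ReadCells` is a hypothetical name, and returning a list instead of showing a message box is an assumption made so the result is easy to inspect.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CellReader
{
    // Hypothetical helper: returns the whitespace-trimmed content of each <td> cell.
    public static List<string> ReadCells(string tableHtml)
    {
        var cells = new List<string>();
        foreach (Match m in Regex.Matches(tableHtml, @"<td[^>]*>\s*(?<content>[\s\S]*?)\s*</td>", RegexOptions.IgnoreCase))
        {
            cells.Add(m.Groups["content"].Value);
        }
        return cells;
    }
}
```

Because the content group is lazy and surrounded by `\s*`, leading and trailing whitespace inside a cell is dropped, while any markup nested inside the cell is returned as-is.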

This article was originally written in Chinese and published on aliyun.com.
