Taking a break from work to share some C# code that scrapes Alexa rankings

Source: Internet
Author: User
Keywords: crawl, C#


I searched Baidu and Google, and Yahoo in English as well, but found nothing ready-made.
So I had to write one myself, and it scrapes the rank quite accurately. ^_^ Anyone who needs it is welcome to take it.
(I don't know how many technically minded friends there are here.)

I am writing a small SEO tool that simulates a spider crawling links and also checks which links a spider actually recognizes as valid.

(It started when a few friends in my Google groups ran into malicious fake-link scams. On the surface nobody can spot them, but as soon as a real spider crawls the page the trick is exposed. A test with my software reveals the truth as well.)

Enough talk. The code is below, with a screenshot attached. I ran it against a friend's environmental protection site to give a rough idea of the result.

// Requires: using System.Text.RegularExpressions;
private string GetAlexa(int idx, string url)
{
    //sourcecode by http://77521.cn
    string css = "";
    string result = "no rank";

    // Fetch the Alexa search page for the domain (scheme stripped).
    string html = Func.GetHttpPage("http://www.alexa.com/search?q=" + url.Replace("http://", ""), "utf-8");

    // Locate the page's stylesheet link; group 1 is the href.
    string patt = "<link href=\"([^\"'\\s]*?)\" type=\"text/css\" rel=\"stylesheet\">";
    Regex reg = new Regex(patt, RegexOptions.IgnoreCase);
    MatchCollection mc = reg.Matches(html);
    if (mc.Count <= 0)
    {   // no stylesheet found
        dgEdit(idx, 7, "no rankings");
    }
    else
    {   // stylesheet found: download it
        css = Func.GetHttpPage(mc[0].Result("$1"), "utf-8");

        // Grab the markup between the rank label and the closing </a>.
        // (The exact label literal is assumed; it was damaged in the source.)
        reg = new Regex("Rank:([\\s\\S]*?)</a>", RegexOptions.IgnoreCase);
        mc = reg.Matches(html);
        if (mc.Count <= 0)
        {
            dgEdit(idx, 7, "no rankings");
        }
        else
        {
            string mao = mc[0].Result("$1");
            // Drop quotes and spaces so class attributes are easy to match.
            // (The third Replace argument is assumed to be a space.)
            string mao2 = mao.Replace("\"", "").Replace("'", "").Replace(" ", "");
            mao2 = Regex.Replace(mao2, "<\\!--.+?-->", "");   // strip HTML comments

            // Group 1 is the span's class name, group 2 its content.
            reg = new Regex("<span[\\s\\S]*?class=[\"']?(.+?)[\"']?>(.+?)</span>", RegexOptions.IgnoreCase);
            mc = reg.Matches(mao2);
            if (mc.Count <= 0)
            {
                dgEdit(idx, 7, "no rankings");
            }
            else
            {
                //textBox1.Text = mao2;
                for (int n = 0; n < mc.Count; n++)
                {
                    // A span whose class appears in the stylesheet is treated as a decoy and removed.
                    if (css.IndexOf(mc[n].Result("$1")) != -1)
                    {
                        mao2 = Regex.Replace(mao2, "<span[^>]*?class=" + Regex.Escape(mc[n].Result("$1")) + ">.+?</span>", "");
                        //textBox1.Text = textBox1.Text + "\r\n\r\n------------------------------------------
                    }
                }
                mao2 = Regex.Replace(mao2, "<[\\s\\S]+?>", "");   // strip remaining tags
                mao2 = mao2.Replace("&nbsp;", "");   // assumed: the original literal was lost
                mao2 = mao2.Replace(" ", "");
                mao2 = mao2.Replace("|", "");         // literal pipe; an unescaped "|" regex matches nothing useful
                mao2 = mao2.Replace("\r", "").Replace("\n", "");
                dgEdit(idx, 7, mao2);
                return mao2;
            }
        }
    }
    return result;
}
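
The key step in the method above is dropping every span whose class name shows up in the stylesheet, presumably because those spans are hidden filler mixed in with the real digits. Below is a minimal, self-contained sketch of that one step on made-up input. The class HiddenSpanDemo, the method StripDecoys, the sample markup and the sample CSS are all hypothetical and are not taken from Alexa or from the original tool. Unlike the original, it looks for the class name with a leading dot in the stylesheet.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HiddenSpanDemo
{
    // Removes each <span class=X>...</span> whose class X appears in the
    // stylesheet text as ".X", then strips the remaining tags and whitespace.
    public static string StripDecoys(string fragment, string css)
    {
        // Assumed markup shape: <span class=name>text</span>, quotes optional.
        var spanPattern = new Regex(
            "<span[^>]*?class=[\"']?([\\w-]+)[\"']?[^>]*>.*?</span>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Drop spans whose class is listed in the stylesheet; keep the others.
        string cleaned = spanPattern.Replace(fragment, m =>
            css.Contains("." + m.Groups[1].Value) ? "" : m.Value);

        cleaned = Regex.Replace(cleaned, "<[^>]+>", "");   // strip leftover tags
        return Regex.Replace(cleaned, "\\s+", "");          // strip whitespace
    }

    public static void Main()
    {
        string css = ".x9a{display:none}";   // made-up stylesheet
        string fragment =
            "<a href=\"#\">1<span class=\"x9a\">7</span>2,<span class=\"real\">3</span>45</a>";
        Console.WriteLine(StripDecoys(fragment, css));   // prints 12,345
    }
}
```

The decoy digit 7 disappears because its class is listed in the stylesheet, while the span with class "real" only loses its tags.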

One more remark: code like this is probably of no use to the average webmaster. I am posting it in the lounge as a small original piece, since Baidu, Google, and Yahoo turned up no comparable code.

Friends who run tutorial sites, don't miss it. ^_^

Although it is a bit unlikely, I hope you will keep a link back when you repost.

------------------------------------------------------------------------------------------------
In the code, GetHttpPage is a function that fetches a web page's source. Something that simple you should be able to write yourself, so I won't cover it.
dgEdit is a delegate.
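
For readers who want a starting point, here is one possible shape for those two helpers. It is a sketch under assumptions, not the author's code: the DgEdit signature (row, column, text) is guessed from the call dgEdit(idx, 7, ...), the Decode helper is hypothetical, and GetHttpPage is written with HttpClient, which may differ from what the original tool used.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical signature, inferred from calls such as dgEdit(idx, 7, "no rankings").
public delegate void DgEdit(int rowIndex, int columnIndex, string text);

public static class Func
{
    private static readonly HttpClient Client = new HttpClient();

    // Decodes raw response bytes using the named charset, e.g. "utf-8".
    public static string Decode(byte[] body, string charset)
    {
        return Encoding.GetEncoding(charset).GetString(body);
    }

    // Downloads a page and returns its source text.
    // Returns an empty string when the request fails with HttpRequestException.
    public static string GetHttpPage(string url, string charset)
    {
        try
        {
            byte[] body = Client.GetByteArrayAsync(url).GetAwaiter().GetResult();
            return Decode(body, charset);
        }
        catch (HttpRequestException)
        {
            return "";
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the grid update: just print the cell that would change.
        DgEdit edit = (row, col, text) =>
            Console.WriteLine($"row {row}, col {col}: {text}");

        // Exercise the offline part only; no network access is needed here.
        edit(0, 7, Func.Decode(Encoding.UTF8.GetBytes("no rankings"), "utf-8"));
    }
}
```

In a WinForms tool the delegate would typically be invoked on the UI thread to write into the grid cell; the console output here only stands in for that.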
