Use TaskManager to crawl 20,000 proxy IP addresses for automatic voting


It all started on a whim. People in my circle of friends often share voting links asking everyone to "help vote for XX", and in the past I would dutifully open each link and cast my vote. But once you find yourself doing something repeatedly, it is worth asking whether a tool could do it instead. As a programmer, I decided to solve the problem properly, and started with a few questions:

1. Can one person vote more than once? If not, what limit prevents it?

A: Voting sites typically restrict each IP address (or each user) to a single vote, to prevent malicious ballot stuffing.

2. If one IP address gets one vote, can multiple IP addresses cast multiple votes?

A: Yes.

3. How can the code change the IP address a request appears to come from?

A: By setting a proxy for the HTTP request.

4. Where can I obtain many proxy IP addresses, and how can I automate the voting once I have them?

A: That is what the rest of this post covers.

This article describes the implementation details of TaskManager's built-in proxy-IP crawler task. Some familiarity with HtmlAgilityPack (for parsing HTML) and Quartz.NET is assumed.

Reading directory

  • Proxy IP
  • Use HtmlAgilityPack
  • Proxy IP crawler implementation
  • Simple implementation of automatic voting
  • Summary
Proxy IP

Baidu Encyclopedia introduction: a proxy (also known as a network proxy) is a special network service that allows one network terminal (usually a client) to establish an indirect connection with another network terminal (usually a server). Some gateways, routers, and other network devices have built-in proxy functionality. Proxy services are generally considered helpful for protecting the privacy and security of network terminals and preventing attacks.

Many vendors now offer proxy IP addresses online, but most provide only a few dozen as a free trial; to get more you have to pay. I found a website that lists a large number of free proxy IPs. You can search Baidu for "proxy IP" yourself (so nobody thinks I am advertising), or look at the open-source TaskManager referenced in this article.

With so many proxy IPs available online, question 4 from the beginning of the article becomes solvable. But there is another problem: all the data sits inside web pages. How do we get it into code? That is where the HtmlAgilityPack toolkit for parsing HTML comes in.

Use HtmlAgilityPack

HtmlAgilityPack is an open-source class library for parsing HTML. Its biggest feature is that you can query HTML with XPath; if you have used C# to work with XML, HtmlAgilityPack will feel familiar.

Parse simple HTML

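A minimal sketch of loading an HTML string and querying it with XPath via HtmlAgilityPack. The sample HTML, the `ip_list`/`item` names, and the `ExtractIps` helper are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package: HtmlAgilityPack

static class HtmlDemo
{
    // Extract the text of every <li class='item'> under <ul id='ip_list'>
    public static List<string> ExtractIps(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // XPath works just like it does against an XML document
        var nodes = doc.DocumentNode.SelectNodes("//ul[@id='ip_list']/li[@class='item']");
        if (nodes == null) return new List<string>();
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    static void Main()
    {
        string html = "<html><body><ul id='ip_list'>" +
                      "<li class='item'>192.168.1.1</li>" +
                      "<li class='item'>192.168.1.2</li>" +
                      "</ul></body></html>";
        foreach (var ip in ExtractIps(html))
            Console.WriteLine(ip);
    }
}
```

`SelectNodes` returns `null` (not an empty collection) when nothing matches, which is why the guard is needed.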

Proxy IP crawler implementation

With the basics of HtmlAgilityPack covered, the actual crawling can begin. Because the target site blocks an IP address when it makes requests too frequently, the crawler is designed to switch to a new proxy IP after every five pages it fetches, which gets around the site's restriction (with apologies to the site).
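The switch-every-five-requests idea can be sketched as a small helper. The `RotatingProxyPool` class and its members are hypothetical, not taken from TaskManager's source:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out a proxy address and rotates to the
// next one after every five requests, to dodge per-IP rate limits.
class RotatingProxyPool
{
    private readonly List<string> _proxies;
    private int _index;
    private int _requestCount;
    private const int RequestsPerProxy = 5;

    public RotatingProxyPool(List<string> proxies)
    {
        if (proxies == null || proxies.Count == 0)
            throw new ArgumentException("need at least one proxy");
        _proxies = proxies;
    }

    // Call once per outgoing request; returns the proxy to use.
    public string Next()
    {
        if (_requestCount == RequestsPerProxy)
        {
            _requestCount = 0;
            _index = (_index + 1) % _proxies.Count; // wrap around the list
        }
        _requestCount++;
        return _proxies[_index];
    }
}
```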

 

Overall implementation logic

In .NET, WebRequest can simulate HTTP GET and POST requests. For our purposes the important part is setting the proxy IP that the request goes through; the key lines below are the ones that build the WebProxy and assign it to request.Proxy.

/// <summary>
/// Proxy example
/// </summary>
/// <param name="Url"></param>
/// <param name="type"></param>
/// <returns></returns>
public static string GetUrltoHtml(string Url, string type)
{
    try
    {
        var request = (HttpWebRequest)WebRequest.Create(Url);
        request.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";
        WebProxy myProxy = new WebProxy("192.168.15.11", 8015);
        // Credentials are only needed when the proxy requires authentication
        myProxy.Credentials = new NetworkCredential("admin", "123456");
        // Make the request go through the proxy
        request.Proxy = myProxy;
        // Get the response
        System.Net.WebResponse wResp = request.GetResponse();
        System.IO.Stream respStream = wResp.GetResponseStream();
        using (System.IO.StreamReader reader = new System.IO.StreamReader(respStream, Encoding.GetEncoding(type)))
        {
            return reader.ReadToEnd();
        }
    }
    catch (System.Exception ex)
    {
        // errorMsg = ex.Message;
    }
    return "";
}

Knowing how to make a request through a proxy IP takes us a step closer to the goal. The next step is obtaining the proxy IPs themselves. Since there is a lot of code, only the important parts are shown here; the full IpProxyGet.cs source can be downloaded at the end of the article.

/// <summary>
/// Get the total number of pages
/// </summary>
/// <returns>total number of pages</returns>
private static int GetTotalPage(string IPURL, string ProxyIp)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(GetHTML(IPURL, ProxyIp));
    var res = doc.DocumentNode.SelectNodes(@"//div[@class='pagination']/a");
    if (res != null && res.Count > 2)
    {
        int page;
        if (int.TryParse(res[res.Count - 2].InnerText, out page))
        {
            return page;
        }
    }
    return 1;
}

Parse HTML data for each page

/// <summary>
/// Parse each page of data
/// </summary>
/// <param name="param"></param>
private static void DoWork(object param)
{
    // Unpack the parameters
    Hashtable table = param as Hashtable;
    int start = Convert.ToInt32(table["start"]);
    int end = Convert.ToInt32(table["end"]);
    List<IPProxy> list = table["list"] as List<IPProxy>;
    ProxyParam Param = table["param"] as ProxyParam;

    // Page address
    string url = string.Empty;
    string ip = string.Empty;
    IPProxy item = null;
    HtmlNodeCollection nodes = null;
    HtmlNode node = null;
    HtmlAttribute attr = null;
    for (int i = start; i <= end; i++)
    {
        LogHelper.WriteLog(string.Format("Start parsing, page range {0}~{1}, current page {2}", start, end, i));
        url = string.Format("{0}/{1}", Param.IPUrl, i);
        var doc = new HtmlDocument();
        doc.LoadHtml(GetHTML(url, Param.ProxyIp));
        // Get all data rows (tr) of the table
        var trs = doc.DocumentNode.SelectNodes(@"//table[@id='ip_list']/tr");
        if (trs != null && trs.Count > 1)
        {
            LogHelper.WriteLog(string.Format("Current page {0}, request address {1}, {2} rows in total", i, url, trs.Count));
            for (int j = 1; j < trs.Count; j++)
            {
                nodes = trs[j].SelectNodes("td");
                if (nodes != null && nodes.Count > 9)
                {
                    ip = nodes[2].InnerText.Trim();
                    if (Param.IsPingIp && !Ping(ip))
                    {
                        continue;
                    }
                    // Add the record
                    item = new IPProxy();
                    node = nodes[1].FirstChild;
                    if (node != null)
                    {
                        attr = node.Attributes["alt"];
                        if (attr != null)
                        {
                            item.Country = attr.Value.Trim();
                        }
                    }
                    item.IP = ip;
                    item.Port = nodes[3].InnerText.Trim();
                    item.ProxyIp = GetIP(item.IP, item.Port);
                    item.Position = nodes[4].InnerText.Trim();
                    item.Anonymity = nodes[5].InnerText.Trim();
                    item.Type = nodes[6].InnerText.Trim();
                    node = nodes[7].SelectSingleNode("div[@class='bar']");
                    if (node != null)
                    {
                        attr = node.Attributes["title"];
                        if (attr != null)
                        {
                            item.Speed = attr.Value.Trim();
                        }
                    }
                    node = nodes[8].SelectSingleNode("div[@class='bar']");
                    if (node != null)
                    {
                        attr = node.Attributes["title"];
                        if (attr != null)
                        {
                            item.ConnectTime = attr.Value.Trim();
                        }
                    }
                    item.VerifyTime = nodes[9].InnerText.Trim();
                    list.Add(item);
                }
            }
            LogHelper.WriteLog(string.Format("Current page {0}, {1} rows in total", i, trs.Count));
        }
        LogHelper.WriteLog(string.Format("End parsing, page range {0}~{1}, current page {2}", start, end, i));
    }
}

In the end, more than 20,000 proxy records are obtained.
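DoWork receives a start/end page range through its parameter table, which suggests the full page list is split into slices, one per worker thread. A sketch of that partitioning under those assumptions (the `PagePartitioner` helper name is made up):

```csharp
using System;
using System.Collections.Generic;

static class PagePartitioner
{
    // Split pages 1..totalPages into contiguous (start, end) ranges,
    // one per worker, so each worker can run DoWork on its slice.
    public static List<Tuple<int, int>> Split(int totalPages, int workers)
    {
        var ranges = new List<Tuple<int, int>>();
        if (totalPages <= 0 || workers <= 0) return ranges;

        int perWorker = (int)Math.Ceiling(totalPages / (double)workers);
        for (int start = 1; start <= totalPages; start += perWorker)
        {
            int end = Math.Min(start + perWorker - 1, totalPages);
            ranges.Add(Tuple.Create(start, end));
        }
        return ranges;
    }
}
```

Each range would then be packed into the Hashtable (`start`, `end`, `list`, `param`) that DoWork expects and handed to a thread.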

Simple implementation of automatic voting

The .NET WebBrowser control is used to load the voting pages. The proxy is switched on and off with a few button handlers:

#region Set proxy IP
private void button2_Click(object sender, EventArgs e)
{
    string proxy = this.textBox1.Text;
    RefreshIESettings(proxy);
    IEProxy ie = new IEProxy(proxy);
    ie.RefreshIESettings();
    // MessageBox.Show(ie.RefreshIESettings().ToString());
}
#endregion

#region Cancel proxy IP
private void button3_Click(object sender, EventArgs e)
{
    IEProxy ie = new IEProxy(null);
    ie.DisableIEProxy();
}
#endregion

#region Open the webpage
private void button1_Click(object sender, EventArgs e)
{
    string url = txt_url.Text.Trim();
    if (string.IsNullOrEmpty(url))
    {
        MessageBox.Show("Enter the URL to open");
        return;
    }
    this.webBrowser1.Navigate(url, null);
}
#endregion
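Putting the pieces together, the voting automation is conceptually a loop over the crawled proxies: activate a proxy, load the voting page, cast one vote, move on. A hypothetical skeleton, with the IEProxy and WebBrowser calls replaced by delegates so the control flow stands alone:

```csharp
using System;
using System.Collections.Generic;

static class AutoVoter
{
    // For each proxy: activate it, then cast one vote through it.
    // setProxy / voteOnce stand in for IEProxy.RefreshIESettings and
    // the WebBrowser navigation in the real tool.
    public static int VoteWithProxies(IEnumerable<string> proxies,
                                      Action<string> setProxy,
                                      Func<bool> voteOnce)
    {
        int successes = 0;
        foreach (var proxy in proxies)
        {
            try
            {
                setProxy(proxy);      // e.g. rewrite the IE proxy settings
                if (voteOnce())       // e.g. navigate to the page and vote
                    successes++;
            }
            catch (Exception)
            {
                // A dead or blocked proxy just gets skipped
            }
        }
        return successes;
    }
}
```

With 20,000 proxies in hand, the loop amounts to 20,000 one-vote sessions, each from a different apparent IP.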

 

Summary

That wraps up what this article set out to cover. I hope that readers who like the idea will help improve TaskManager (fully open source) and turn it into a tool that makes everyday life more convenient, with many more kinds of tasks. For example: if it will rain or snow tomorrow, send an email reminding you to bring an umbrella. And now, the source code:

Simple voting Source: http://files.cnblogs.com/files/yanweidie/SimpleIP.rar

TaskManager SVN address: http://code.taobao.org/svn/TaskManagerPub/Branch (download with the svn checkout command).

GitHub address: https://github.com/CrazyJson/TaskManager

To try the tool: after unpacking TaskManager, run the combined SQL script included in the files, update the database connection in Config.xml, and install it with WSWinForm.

If you feel this blog post was worth reading, click the [Recommend] button.
If you want to find my new posts more easily, click [Follow me].
My enthusiasm for writing is inseparable from your support.

Thank you for reading. If you are interested in my blog, please keep following it.
