Crawl Sina Weibo with WebCollector 2.x (no need to manually obtain cookies)

Source: Internet
Author: User

WebCollector 2.x, combined with a companion project called WeiboHelper, can crawl data directly from Sina Weibo without obtaining cookies by hand.

1. Import all the JAR files for WebCollector 2.x and WeiboHelper

Addresses of the two projects:

http://git.oschina.net/webcollector/WebCollector

http://git.oschina.net/webcollector/WeiboHelper


2. Sample code:

    package cn.edu.hfut.dmic.webcollector.weiboapi;

    import cn.edu.hfut.dmic.webcollector.crawler.DeepCrawler;
    import cn.edu.hfut.dmic.webcollector.model.Links;
    import cn.edu.hfut.dmic.webcollector.model.Page;
    import cn.edu.hfut.dmic.webcollector.net.HttpRequesterImpl;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    /**
     *
     * @author hu
     */
    public class WeiboCrawler extends DeepCrawler {

        public WeiboCrawler(String crawlPath) throws Exception {
            super(crawlPath);
            /* Obtain the Sina Weibo cookie (WeiboCN is provided by WeiboHelper).
               The account and password are transmitted in clear text,
               so use a throwaway account. */
            String cookie = WeiboCN.getSinaCookie("Weibo username", "Weibo password");
            HttpRequesterImpl myRequester = (HttpRequesterImpl) this.getHttpRequester();
            myRequester.setCookie(cookie);
        }

        @Override
        public Links visitAndGetNextLinks(Page page) {
            /* Extract the Weibo posts */
            Elements weibos = page.getDoc().select("div.c");
            for (Element weibo : weibos) {
                System.out.println(weibo.text());
            }
            /* To crawl comments, extract the URLs of the comment pages here and return them */
            return null;
        }

        public static void main(String[] args) throws Exception {
            WeiboCrawler crawler = new WeiboCrawler("/home/hu/data/weibo");
            crawler.setThreads(3);
            /* Crawl the first 5 pages of one user's Weibo */
            for (int i = 0; i < 5; i++) {
                crawler.addSeed("http://weibo.cn/zhouhongyi?vt=4&page=" + i);
            }
            crawler.start(1);
        }

    }
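The sample code leaves the comment-crawling step as a note in visitAndGetNextLinks. Here is a minimal sketch of how comment-page URLs might be pulled out of a page's HTML. It deliberately uses only java.util.regex so that it runs on its own, rather than the WebCollector or jsoup APIs. The class name CommentLinkExtractor, the assumed link shape (http://weibo.cn/comment/...), and the sample HTML are all illustrative assumptions, not taken from the projects above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of WebCollector or WeiboHelper.
class CommentLinkExtractor {

    // Assumed shape of a comment-page link; check it against real pages.
    // The capture stops before a closing quote or a '#' fragment.
    private static final Pattern COMMENT_HREF =
            Pattern.compile("href=\"(https?://weibo\\.cn/comment/[^\"#]+)");

    /** Returns the distinct comment-page URLs found in the HTML, in order of appearance. */
    static List<String> extractCommentLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = COMMENT_HREF.matcher(html);
        while (m.find()) {
            // HTML attributes escape '&' as '&amp;'; undo that to get a usable URL.
            links.add(m.group(1).replace("&amp;", "&"));
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        // Made-up sample markup for illustration only.
        String html = "<div class=\"c\">post text "
                + "<a href=\"http://weibo.cn/comment/ABC123?uid=1&amp;rl=0#cmtfrm\">comments[446]</a>"
                + "</div>";
        for (String link : extractCommentLinks(html)) {
            System.out.println(link);
        }
    }
}
```

In the crawler, URLs collected this way would be returned from visitAndGetNextLinks instead of null. The crawl depth passed to crawler.start would presumably also need to be greater than 1 for the returned links to be visited.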

Output:

For 2015, I wish my Weibo friends and 360 users a happy New Year! By the way, here is a report on 360's progress in science and technology in 2014: as of December 31, 2014, 360 filed 1,999 patent applications over the past year, including 1,570 domestic invention patent applications, 212 design and utility model patents, and 217 overseas patent applications. The total number of patent applications has passed 4,000.  Likes [1422]  Forwards [221]  Comments [446]  Favorite  January 1, 00:09  from a mobile phone

Forwarded Xu Xian's Weibo: "Shandong Province Civil Affairs Department commitment: wounded former Kuomintang anti-Japanese war veterans can enjoy the same treatment as the army." It makes clear that former Kuomintang anti-Japanese war veterans who live in rural areas, or in towns without a work unit, and who are in hardship will be given hardship relief. The relief standard can follow the one applied to demobilized soldiers of the anti-Japanese armed forces in townships, and the required funds are to be covered through self- social donations and other channels. http://t.cn/RZYVNR3  Original  Likes [373]  Original forwards [2246]  Original comments [399]  Forwarding reason: Well done, Shandong Civil Affairs Department, praise! @ Sun Chunlong #寻找你身边的抗战老兵#  Likes [582]  Forwards [274]  Comments [303]  Favorite  2014-12-31 11:13:02  from a mobile phone

Forwarded 360 Antivirus's Weibo: On the last day of 2014, our follower count is still just short of 100,000. So worrying. What to do? The editor happens to have just received 5 sets of 360 children's watches, so a lottery is the best idea. With 100,000 followers, the chance of winning is still pretty high. So, as long as you ① follow this account, ② forward this post, and ③ @ a friend, you have a chance to win! Forward it, Amitabha: forward for a big fortune, and a small fortune even if you don't!  [2 photos in total]  Original  Likes [94]  Original forwards [2531]  Original comments [1076]  Forwarding reason: It is not only me, the low-winning-rate specialist; wishing you all a prosperous New Year @360 security defender  Likes [313]  Forwards [464]  Comments [656]  Favorite  2014-12-31 14:51:13  from 360 Security browser

Forwarded Augo's Weibo: Baidu Antivirus deleted. @ Zhou Hongyi @360 Security defender @360 Customer service, you three listen to me: yesterday I had your 167 engineer help me remotely to delete Baidu Antivirus. After half a day he said it was fixed; today I got home and booted up, and the damn thing had come back from the dead. Don't use the "safest" slogan if you can't fix this for me. Baidu Antivirus has been haunting my computer for a long time; deleting it n times was useless. Also: @ Baidu anti-virus, go die  [4 photos]  Original  Likes [102]  Original forwards [542]  Original comments [173]  Forwarding reason: The poster deleted Baidu Antivirus on April 24 this year and then turned to 360 for help. @ 360 Security defender, has the user's problem finally been solved? Oh, Baidu's wolf nature is really strong //@ guancheng small bright: Anyway i uninstall Baidu Antivirus, or use 360 software housekeeper, otherwise go not gener [black line]  Likes [530]  Forwards [428]  Comments [966]  Favorite  2014-12-30 19:28:37  from a OnePlus phone
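Each printed post ends with bracketed counters, so the raw text can be post-processed into numbers. The sketch below assumes, based only on the output shown here, that the last three bracketed integers in a post are its own likes, forwards, and comments; forwarded posts carry the original post's counters earlier in the text. The class name WeiboCounts and its methods are hypothetical helpers for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of WebCollector or WeiboHelper.
class WeiboCounts {

    // A run of digits enclosed in square brackets, e.g. "[1422]".
    // Non-numeric brackets such as "[4 photos]" are ignored.
    private static final Pattern BRACKETED_NUMBER = Pattern.compile("\\[(\\d+)\\]");

    /** Returns every bracketed integer in the text, in order of appearance. */
    static List<Integer> bracketedNumbers(String text) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = BRACKETED_NUMBER.matcher(text);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    /**
     * Returns {likes, forwards, comments} for the post itself, assuming they are
     * the last three bracketed numbers; returns null if fewer than three exist.
     */
    static int[] ownCounts(String text) {
        List<Integer> n = bracketedNumbers(text);
        if (n.size() < 3) {
            return null;
        }
        int s = n.size();
        return new int[] { n.get(s - 3), n.get(s - 2), n.get(s - 1) };
    }

    public static void main(String[] args) {
        // Tail of the first post in the output above.
        String text = "... Likes [1422]  Forwards [221]  Comments [446]  Favorite";
        int[] c = ownCounts(text);
        System.out.println("likes=" + c[0] + " forwards=" + c[1] + " comments=" + c[2]);
        // prints: likes=1422 forwards=221 comments=446
    }
}
```

Matching on position rather than on label text keeps the sketch independent of the label language, at the cost of relying on the counter order staying fixed.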


WebCollector crawler official website: https://github.com/CrawlScript/WebCollector

Technical discussion group: 250108697


