Web crawlers: Java, Python, or C++?

Source: Internet
Author: User

I recently looked this question up online; the findings are summarized below.

The main languages used to develop crawlers are Java, Python, and C++.
For general information-collection needs, the differences between them are small.
C, C++
Search engines almost without exception develop their crawlers in C/C++. My guess is that this is because a search engine crawler has to collect a very large number of sites while its page-parsing requirements are modest; some of these crawlers support JavaScript.
Python
Strengths: powerful networking, simulated login, and JavaScript handling. Weakness: web page parsing.
Programs are really convenient to write in Python, and well-known Python crawlers include Scrapy.
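To give a feel for how little code a basic crawler step takes in Python, here is a minimal sketch that extracts links from a page using only the standard library. The class name LinkExtractor and the function extract_links are hypothetical names chosen for this illustration; a real project would more likely build on a framework such as Scrapy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links a crawler would follow next.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in an HTML string, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Fetching the page itself (for example with urllib.request.urlopen) is left out here so that the sketch runs without network access.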
Java
Java has many parsers and very good support for web page parsing; its weakness is the networking side.
There are many open-source Java crawlers. Well-known ones include Nutch and, from China, WebMagic.
Excellent Java parsers include HTMLParser and jsoup.
For general requirements, both Java and Python are up to the job.
If you need to simulate login or get around anti-crawling measures, Python is the more convenient choice. If you need to handle complex web pages, parse page content into structured data, or analyze page content in detail, you can choose Java.
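As a rough illustration of what simulated login involves, the sketch below prepares a cookie-aware session and a form-based login request with Python's standard library. The function names build_session and build_login_request, the form field names username and password, and the User-Agent string are all assumptions made for this example; a real site defines its own form fields and may require extra tokens.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_session():
    """Create an opener that stores cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Prepare a form POST; the field names are hypothetical."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes urllib send this request as a POST.
    return urllib.request.Request(
        login_url,
        data=form.encode("utf-8"),
        headers={"User-Agent": "example-crawler/0.1"},
    )
```

Calling opener.open(build_login_request(...)) would send the form, and any session cookie the server returns would be kept in the jar and sent with later requests made through the same opener.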

Evidently, to become a real crawler engineer you need both Python and Java. There are currently many Python learning resources online, so learn Python first.

