Python notes-crawler 1

Source: Internet
Author: User

Web crawler: also known as a web spider. A web crawler's basic operation is to fetch web pages.

Browsing the web: when you open Baidu (www.baidu.com) in Firefox, the browser acts as a 'client'. It sends a request to the server, 'crawls' the server's files to the local machine, and then interprets and displays them.
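The request step described above can be sketched with Python's standard library. This is a minimal sketch, not a full crawler: the helper names build_request and fetch, the User-Agent string, and the use of the http scheme for www.baidu.com are assumptions made for illustration.

```python
from urllib.request import Request, urlopen


def build_request(url):
    """Build the HTTP GET request that a client would send to the server."""
    # The User-Agent value is a made-up identifier for this sketch.
    return Request(url, headers={"User-Agent": "notes-crawler-sketch/0.1"})


def fetch(url, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch("http://www.baidu.com") would play the client role: it sends the request and brings the server's HTML to the local machine as a string. Unlike a browser, it does not render anything.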

HTML: a markup language. It marks up content with tags so that the content can be parsed and its parts told apart.

What the browser does: it parses the HTML it receives and turns the raw code into the web page we see.
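A crawler parses HTML too, though usually to pull out data rather than to render a page. The sketch below uses the standard html.parser module to collect link targets from a small hand-written HTML string. The class name LinkCollector and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hand-written sample page, not taken from any real site.
sample_html = (
    "<html><body><h1>Title</h1>"
    '<a href="/w/index.php">Wiki</a>'
    '<a href="http://www.baidu.com">Baidu</a>'
    "</body></html>"
)

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)  # ['/w/index.php', 'http://www.baidu.com']
```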

URL (Uniform/Universal Resource Locator): the address of a resource on the web, commonly called a web address.

URL format:

    • protocol
    • server (domain name or IP address), sometimes followed by a port number (a number; may be omitted)
    • path (the specific address of the resource on the host)
    • query (begins with '?')

The protocol and the server are separated by '://'; the server and the path are separated by '/'.

Example: http://zh.wikipedia.org:80/w/index.php

http is the protocol

zh.wikipedia.org is the server

80 is the network port number on the server

/w/index.php is the path
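The same breakdown can be checked with urllib.parse.urlsplit from the standard library. The second URL, with the query title=Python, is a made-up variation added only to show where the query part lands.

```python
from urllib.parse import urlsplit

# The example URL from the notes.
parts = urlsplit("http://zh.wikipedia.org:80/w/index.php")
print(parts.scheme)    # http
print(parts.hostname)  # zh.wikipedia.org
print(parts.port)      # 80
print(parts.path)      # /w/index.php

# Hypothetical variation with a query string (everything after '?').
with_query = urlsplit("http://zh.wikipedia.org:80/w/index.php?title=Python")
print(with_query.query)  # title=Python
```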

The main object a crawler works on is the URL.
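One way to picture this: a crawler keeps a queue of URLs still to visit and a set of URLs already seen. The sketch below is an assumption about how such bookkeeping might look, not a method taken from these notes. The function name enqueue_links and the sample links are hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def enqueue_links(base_url, hrefs, frontier, seen):
    """Resolve links against base_url and queue the ones not seen before."""
    for href in hrefs:
        # Make the link absolute and drop any '#fragment' part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Keep only web URLs; skip schemes such as mailto:.
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            frontier.append(absolute)


frontier = deque()  # URLs waiting to be fetched
seen = set()        # URLs already queued, to avoid duplicates

enqueue_links(
    "http://zh.wikipedia.org:80/w/index.php",
    ["/w/index.php", "index.php?title=Python", "#top", "mailto:someone@example.com"],
    frontier,
    seen,
)
print(list(frontier))
```

Here the '#top' link resolves to the same page as the first link and is dropped as a duplicate, and the mailto: link is skipped, so two URLs end up in the queue.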

Reference: http://blog.csdn.net/pleasecallmewhy/article/details/8922826
