Getting Started with Python Crawlers, Part 2: Crawler Basics

Source: Internet
Author: User


2. The process of browsing the web

When we browse the web, we see many attractive pages full of images, for example http://image.baidu.com/, where a few pictures appear alongside the Baidu search box. What actually happens is this: the user enters a URL; a DNS server resolves it and locates the host serving the site; the browser sends a request to that server; the server processes the request and returns HTML, JS, CSS and other files; and the browser parses and renders them, so the user sees the page with all its images.

Therefore, the web page the user sees is essentially HTML code, and that is exactly what a crawler fetches. By analyzing and filtering this HTML, the crawler extracts the images, text and other resources it is after.
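As a first taste of this, here is a minimal sketch using Python 3's standard urllib.request: it fetches a page's HTML and filters it with a simple regular expression for image links. The URL is only a placeholder, and the regex is a deliberately crude illustration rather than a robust HTML parser.

```python
import re
import urllib.request

# Hypothetical target page, used only for illustration.
url = "https://example.com/"

# Fetch the raw HTML that a browser would otherwise render.
with urllib.request.urlopen(url) as resp:
    html = resp.read().decode("utf-8")

# Filter the HTML for image resources: grab the src attribute of <img> tags.
image_urls = re.findall(r'<img[^>]+src="([^"]+)"', html)
print(image_urls)
```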

3. The meaning of a URL

A URL (Uniform Resource Locator) is what we commonly call a web address. It is a concise representation of the location of a resource available on the Internet and of the method for accessing it, and it is the standard form of address for Internet resources. Every file on the Internet has a unique URL, which encodes where the file is located and how the browser should handle it.

The format of a URL consists of three parts:
① the first part is the protocol (or service mode);
② the second part is the IP address of the host where the resource is stored (sometimes with a port number);
③ the third part is the specific address of the resource on the host, such as its directory and file name.

A crawler must have a target URL before it can fetch any data, so the URL is the basic starting point for data acquisition. Understanding exactly what it means is a great help when learning to write crawlers.
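For illustration, the standard library's urllib.parse can split a URL into exactly the parts described above. The example URL here is chosen purely for demonstration.

```python
from urllib.parse import urlparse

parts = urlparse("http://image.baidu.com/search/index?tn=baiduimage")

print(parts.scheme)   # protocol (service mode): 'http'
print(parts.netloc)   # host (and port, if any): 'image.baidu.com'
print(parts.path)     # resource path on the host: '/search/index'
print(parts.query)    # query string: 'tn=baiduimage'
```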

4. Configuration of the environment

Learning Python naturally starts with setting up an environment. At first I used Notepad++, but found its code hints too weak, so on Windows I switched to PyCharm, and on Linux I used Eclipse for Python. There are several other excellent IDEs as well; you can refer to an article on recommended Python IDEs to compare them. Good development tools push you forward, and I hope you find the IDE that suits you.

In the next section, we will formally step into the world of Python crawlers. Are you ready?
