Getting the source code of a web page in C (with C source code)

Source: Internet
Author: User


One day this seemed like an interesting thing to study.

At the beginning I didn't understand anything. Now that I have written some code, I find it quite interesting.

Below I share my learning process and what I understood.

The overall process was roughly as follows:

First, I searched for someone else's code that does this.

Then I studied the parts of that code I found difficult, and slowly wrote my own version.

 

Simple analysis:

Create a socket and connect to the host

Send a GET request header string

Receive the response

Store it

 

Preparations:

connect() establishes a connection with the server

send() and recv() send and receive data over the socket

 

The user provides a URL, and we parse it.

Some of the information in the URL goes into the sockaddr address structure, and some goes into the text of the GET request header.

The information we need is the host name, the resource path, and the port.

For example, for www.baidu.com/1.html the host name is www.baidu.com and the resource path is /1.html. No port is given, so the default value, 80, is used.
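The URL split described above can be sketched as follows. This is a minimal sketch, not the code the original author used: the struct `url_parts` and the function `parse_url` are hypothetical names, and it only handles the simple `host[:port][/path]` form with an optional leading `http://`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the three pieces we need from the URL. */
struct url_parts {
    char host[256];
    char path[1024];
    int  port;
};

/* Splits "host[:port][/path]" (optionally prefixed with "http://").
 * Returns 0 on success, -1 if the URL does not fit this simple form. */
int parse_url(const char *url, struct url_parts *out)
{
    const char *p = url;
    if (strncmp(p, "http://", 7) == 0)
        p += 7;

    /* The host name ends at the first ':' or '/', or at the end. */
    size_t host_len = strcspn(p, ":/");
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, p, host_len);
    out->host[host_len] = '\0';
    p += host_len;

    /* No explicit port means the HTTP default, 80. */
    out->port = 80;
    if (*p == ':') {
        char *end;
        long port = strtol(p + 1, &end, 10);
        if (end == p + 1 || port < 1 || port > 65535)
            return -1;
        out->port = (int)port;
        p = end;
    }

    /* No path means the root resource "/". */
    if (*p == '\0') {
        strcpy(out->path, "/");
    } else {
        if (*p != '/' || strlen(p) >= sizeof out->path)
            return -1;
        strcpy(out->path, p);
    }
    return 0;
}
```

Calling `parse_url("www.baidu.com/1.html", &u)` fills `u.host` with `www.baidu.com`, `u.path` with `/1.html`, and `u.port` with 80.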

Connecting is done with the connect() function.

Its parameters include a socket and a sockaddr. The former is the socket itself, and the latter is a socket address structure.

A pointer to a sockaddr_in, which is another socket address structure, can be cast to a sockaddr pointer.

sockaddr_in needs a port and an IP address.

The port comes from parsing the URL.

The IP address is obtained with the gethostbyname() function.

It looks up the host information from the host name, for example www.baidu.com.

Then we extract the IP address from that information.
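The lookup and connect steps might look like the sketch below. It assumes a POSIX system (on Windows the headers differ and Winsock must be initialized first), and the helper names `fill_address` and `connect_to` are hypothetical. gethostbyname() is what the text uses; newer code usually prefers getaddrinfo().

```c
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Looks up `host` and fills an IPv4 socket address with its first
 * address and the given port. Returns 0 on success, -1 on failure. */
int fill_address(const char *host, int port, struct sockaddr_in *addr)
{
    struct hostent *he = gethostbyname(host);
    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;

    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons((unsigned short)port);   /* network byte order */
    memcpy(&addr->sin_addr, he->h_addr_list[0], sizeof addr->sin_addr);
    return 0;
}

/* Creates a TCP socket and connects it to `addr`.
 * Returns the socket descriptor, or -1 on failure. */
int connect_to(const struct sockaddr_in *addr)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;
    /* connect() takes a generic sockaddr pointer, hence the cast. */
    if (connect(sock, (const struct sockaddr *)addr, sizeof *addr) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```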

The GET request header needs the host name and the resource path.

Sending is done with the send() function.

Receiving is done with recv().

 

Detailed logic:

This is mainly the process of learning these functions and structures, plus some simple logic to tie them together.

Parsing the URL uses the functions in string.h.

Obtain the IP address from the host structure and copy it into the socket address structure. For more information, look up gethostbyname(), sockaddr, and sockaddr_in.

connect()

Build the GET request header with the string functions and store it in an array or some other buffer.
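One way to assemble the header into an array is snprintf(). The sketch below is an assumption about what a minimal request looks like, not the author's exact header; `build_get_request` is a hypothetical name. It asks for HTTP/1.0 with `Connection: close` so that the server closes the connection when the response is complete.

```c
#include <stdio.h>

/* Writes a minimal GET request for `path` on `host` into `buf`.
 * Returns the request length, or -1 if `buf` is too small. */
int build_get_request(char *buf, size_t size,
                      const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",            /* blank line ends the header */
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;                      /* error or truncated output */
    return n;
}
```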

send()
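A single send() call is allowed to transmit fewer bytes than requested, so it is common to loop until the whole header has gone out. A minimal sketch, assuming POSIX sockets; `send_all` is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Keeps calling send() until all `len` bytes of `data` are sent.
 * Returns 0 on success, -1 if send() fails. */
int send_all(int sock, const char *data, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, data + sent, len - sent, 0);
        if (n <= 0)
            return -1;
        sent += (size_t)n;
    }
    return 0;
}
```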

If the server accepts the GET request you sent, it sends back the content you asked for.

Receive it with recv()

There may be a lot of data, and it may not all arrive in a single recv() call, so call recv() repeatedly. Use memcpy(), malloc(), and realloc() to accumulate it.

memcpy() copies a given number of bytes to a destination; malloc() allocates a memory region of a given size; realloc() changes the size of an allocated memory region.

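When does receiving stop? As long as data keeps arriving on the socket, recv() keeps returning it. recv() returns 0 once the server closes the connection. If nothing arrives before the receive timeout expires, the call fails with an error instead of returning 0.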
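The receive-and-grow loop can be sketched like this, using the malloc(), realloc(), and memcpy() calls mentioned above. It assumes POSIX sockets, and `recv_all` is a hypothetical name. The loop treats both "peer closed" and "error or timeout" as the end of the data, which is a simplification.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Reads from `sock` until recv() returns 0 (peer closed) or fails.
 * Returns a malloc'd, NUL-terminated buffer and stores its length in
 * *out_len; the caller must free() it. Returns NULL if memory runs out. */
char *recv_all(int sock, size_t *out_len)
{
    char chunk[4096];
    size_t len = 0;
    char *data = malloc(1);
    if (data == NULL)
        return NULL;
    data[0] = '\0';

    for (;;) {
        ssize_t n = recv(sock, chunk, sizeof chunk, 0);
        if (n <= 0)
            break;                         /* closed, error, or timeout */
        char *grown = realloc(data, len + (size_t)n + 1);
        if (grown == NULL) {
            free(data);
            return NULL;
        }
        data = grown;
        memcpy(data + len, chunk, (size_t)n);   /* append the new bytes */
        len += (size_t)n;
        data[len] = '\0';
    }
    *out_len = len;
    return data;
}
```

The trailing NUL is only a convenience for printing text. The length in `*out_len` is the reliable size, because a response body can itself contain zero bytes.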

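This timeout period can be set with setsockopt(), for both receiving and sending. With Winsock the value is an int in milliseconds, as below; POSIX systems expect a struct timeval instead.

int nNetTimeout = 5000; /* 5 seconds, in milliseconds */

setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (char *)&nNetTimeout, sizeof(int));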

 

Open questions: how to process the received data, and encoding problems. I think the data transmitted over the socket is raw bytes, not a string.

The body seems easy to understand. Someone wrote an HTML web page file in UTF-8 and saved it on the server, so the body we receive should be that data in UTF-8 encoding (I guess).

But what encoding is the received response header in? And what simple method can we use to separate the header from the body? I haven't researched this further. I'll look at it again when I get the chance.
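On the separation question: in HTTP the header is ASCII-compatible text and ends at the first empty line, that is, the byte sequence `\r\n\r\n`. Everything after it is the body, whose encoding is normally declared in the `Content-Type` header. A minimal sketch of the split, with the hypothetical name `find_body`:

```c
#include <stddef.h>
#include <string.h>

/* Finds the start of the body in a raw HTTP response of `len` bytes.
 * Returns a pointer to the first body byte and stores the body length
 * in *body_len, or returns NULL if no header terminator is present. */
const char *find_body(const char *resp, size_t len, size_t *body_len)
{
    /* Scan by length rather than with strstr(), because the body
     * may contain zero bytes. */
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(resp + i, "\r\n\r\n", 4) == 0) {
            *body_len = len - (i + 4);
            return resp + i + 4;
        }
    }
    return NULL;
}
```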
