Regex matching of short jokes from Pengfu (pengfu.com): how to skip the tabs in the page source

Source: Internet
Author: User

Taking the short jokes on pengfu.com as an example, we can fetch the page source with the requests library and then extract the jokes with regular-expression matching.

The first step is to fetch the page source with the requests get method. Note that the page is encoded in UTF-8, so we explicitly set the response encoding to utf-8 in order to display the text of the page correctly.

# -*- coding: utf-8 -*-
import requests
import re

# Fetch the first page of short jokes
respon = requests.get('https://www.pengfu.com/xiaohua_1.html?qq-pf-to=pcqq.c2c')
# Decode the response body as UTF-8
respon.encoding = 'utf-8'
a = respon.text

print(a)
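To see why the encoding setting matters, here is a minimal sketch that needs no network access. The sample string is made up and stands in for the bytes a server would send. It assumes the bytes are UTF-8 and compares decoding them as ISO-8859-1 (which, as far as I know, is what requests falls back to for text responses whose headers name no charset) with decoding them as UTF-8.

```python
# Made-up sample text standing in for the raw bytes of a response body.
raw_bytes = "笑话 (joke)".encode("utf-8")

# Decoding with the wrong encoding does not fail; it silently yields mojibake.
wrong = raw_bytes.decode("iso-8859-1")
# Decoding with the encoding the page really uses restores the text.
right = raw_bytes.decode("utf-8")

print(repr(wrong))
print(right)
```

Setting respon.encoding before reading respon.text has the same effect as choosing the second decode call.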

Printing a shows the page source. Part of it is excerpted below; it contains all the information we need.

<h1 class="dp-b"><a href="https://www.pengfu.com/content_1825126_1.html" target="_blank">the expectations of your teacher</a>
<div class="content-img clearfix pt10 relative">
In the hall, the master is speaking to the disciples: "The name of the teacher for you is not a casual, but for the teacher to your expectations, you understand?"<br/>
All the disciples answered understand, only one disciple silent, the master looked at, asked the disciple said: "Died, why don't you talk?"</div>
</dd>

In the source of the whole page, we look for a distinctive marker close to the target text. Here we choose <div class="content-img clearfix pt10 relative"> as the marker: the title we need comes before each occurrence of it and the body comes after it.

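Inspecting the source shows that the pieces we want are separated by many spaces, line breaks, and indentation (tabs). These can be matched with [\s\S], whose two parts are explained below:

\s
Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].

Together, [\s\S] matches any character at all, including line breaks, which the dot does not match by default.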
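The difference between the dot and [\s\S] can be sketched as follows. The fragment is made up for illustration; only the re module from the standard library is used.

```python
import re

# Made-up fragment: two tags separated by a newline and tabs.
sample = "<a>title</a>\n\t\t\t<div>body</div>"

# '.' does not match '\n' by default, so this finds nothing.
dot_match = re.search(r"</a>(.*?)<div>", sample)

# [\s\S] matches any character, including '\n' and '\t'.
any_match = re.search(r"</a>([\s\S]*?)<div>", sample)

print(dot_match)                 # None
print(repr(any_match.group(1)))  # '\n\t\t\t'
```

Passing the re.S flag to the first pattern would also let the dot cross line breaks; [\s\S] simply makes that choice local to one part of the pattern.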

Now we can write the regular expression.

pattern = re.compile(r'" target="_blank">(.+?)</a>([\s\S]*?)    # the rest of the pattern is cut off in the source
cont = re.findall(pattern, a)

The first pair of parentheses, (.+?), captures the link text: it matches any characters in non-greedy mode, so it stops at the first </a> that follows. ([\s\S]*?) matches any run of characters, including the many tabs and line breaks, also in non-greedy mode. The later groups match arbitrary characters in the same way.
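A small sketch of what non-greedy mode changes, using a made-up fragment with two links on one line:

```python
import re

# Made-up fragment with two links on one line.
sample = '<a target="_blank">first</a><a target="_blank">second</a>'

# Greedy: (.+) runs as far as it can and stops at the LAST </a>.
greedy = re.findall(r'target="_blank">(.+)</a>', sample)
# Non-greedy: (.+?) stops at the FIRST </a> after each link.
lazy = re.findall(r'target="_blank">(.+?)</a>', sample)

print(greedy)  # ['first</a><a target="_blank">second']
print(lazy)    # ['first', 'second']
```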

This gives a list of matches.

We can then print cont to inspect its structure:

print(cont)

The results were as follows:

[('Water can carry boats', '</p>\n\t\t\t\t\t\t\t\t<div class="addbtnwraps">\n\t\t\t\t\t\t\t\t\t\t<div class="redbag" onclick="postredbag(1825137)"></div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<a class="plus" href="javascript:void(0);" onclick="add($(this))"  userid="9762557" style="color: #ffffff;">+ attention</a>\n\t\t\t\t\t\t\t\t\t</div>\n\t\t\t\t\t\t\t\t
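The shape of this result follows from how re.findall treats capture groups: with two or more groups, every match becomes a tuple with one string per group. A minimal sketch on a made-up fragment:

```python
import re

# Made-up fragment: an author link, some whitespace, then a title link.
sample = '<a>author</a>\n\t\t<a>title</a>'

# Two capture groups -> each match is a 2-tuple.
pairs = re.findall(r'<a>(.+?)</a>([\s\S]*?)<a>', sample)
print(pairs)  # [('author', '\n\t\t')]

# One capture group -> each match is a plain string.
names = re.findall(r'<a>(.+?)</a>', sample)
print(names)  # ['author', 'title']
```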

As the output shows, each element of the list is a tuple, and each tuple contains several strings. The first string is the author. The second string contains the title we want, which we extract with another regular expression:

pattern = re.compile('target="_blank">(.+?)</a>', re.S)
title = re.search(pattern, item[1]).group(1)
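The same extraction can be tried on its own. In this sketch the fragment is a made-up stand-in for item[1], and the guard against a failed match is an addition that the original code does not have:

```python
import re

# Made-up stand-in for item[1]: leftover markup that contains the title link.
fragment = '</p>\n\t\t<h1 class="dp-b"><a href="#" target="_blank">Sample title</a>'

title_pattern = re.compile(r'target="_blank">(.+?)</a>', re.S)
match = title_pattern.search(fragment)

# re.search returns None when nothing matches, so check before calling .group().
title = match.group(1) if match else None
print(title)  # Sample title
```

Without the check, a tuple whose second string has no link would raise AttributeError and stop the loop.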

The body is in the fifth string of the tuple. It contains many spaces and line breaks, which the strip() function removes from both ends. After printing, some <br/> tags still remain; <br/> is the HTML line-break tag, and it can be removed with the replace function.

content = item[4].replace('<br/>', '')
print('Body: %s' % content.strip())
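The cleanup step can be sketched on a made-up string standing in for item[4]. The per-line strip at the end is an optional extra, not part of the original code:

```python
# Made-up stand-in for item[4]: body text padded with tabs and newlines.
raw_body = '\n\t\t\tFirst line.<br/>\n\t\t\tSecond line.<br/>\n\t\t'

# Drop the <br/> tags, then trim whitespace at both ends.
no_tags = raw_body.replace('<br/>', '')
body = no_tags.strip()
print(repr(body))

# strip() only trims the two ends; to also drop the indentation inside
# the text, strip each line individually.
tidy = '\n'.join(line.strip() for line in body.splitlines())
print(tidy)
```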

The complete code is as follows (mind the indentation):
# -*- coding: utf-8 -*-
import requests
import re

# Fetch the page and decode it as UTF-8
respon = requests.get('https://www.pengfu.com/xiaohua_1.html?qq-pf-to=pcqq.c2c')
respon.encoding = 'utf-8'
a = respon.text
pattern = re.compile(r'" target="_blank">(.+?)</a>([\s\S]*?)    # the rest of the pattern is cut off in the source
cont = re.findall(pattern, a)
for item in cont:
    print('*' * 100)
    # item[0] is the author
    print('%s' % item[0])
    # item[1] contains the title link
    pattern = re.compile('target="_blank">(.+?)</a>', re.S)
    title = re.search(pattern, item[1]).group(1)
    print('Title: %s' % title)
    # item[4] is the body; drop <br/> tags and surrounding whitespace
    content = item[4].replace('<br/>', '')
    print('Body: %s' % content.strip())

Running the code produces the following output:

****************************************************************************************************
Water can carry a boat or boat.
Title: Master Kang?
Body: A teacher who opens a bus, surnamed Kang, a sister often take his bus to work. One day the sister shy said, Master Kang. Can I make you a bubble? Master Kang A Leng, then said, see if you have so much water.
****************************************************************************************************
Water can carry a boat or boat.
Title: There seems to be something wrong.
Body: To colleague's house to visit, they young couple in quarrel, I advised a few words, finally eased, I intend to leave, colleague daughter-in-law took a good big pack of bananas for me to take away, I polite way: do not give me ah, to the elder brother to leave a point. Her daughter-in-law said: "Hum, is to feed the animals and not to eat him." I said thank you and left, now think about how so wrong!
****************************************************************************************************
Water can carry a boat or boat.
Title: Chen Duxiu, you come to lecture.
Body: One day constructed find Zen master, asked: "Master, many people shout my husband, good distress, how to do?"

Jackson said nothing and handed him a box of condoms. Constructed took the condom thoughtfully said: "I understand, the master means that since can not be avoided, do good security measures?"

The master shook his head and said, "Hubby, screw me!"
****************************************************************************************************
Water can carry a boat or boat.
Title: Onion:???
Body: Girlfriend told me to break up, I asked her why?

She did not speak, took out an onion, a layer of peel off, eyes filled with tears.

I am also very difficult to see here: "You mean that I have not been together for so long to understand your layer of the inner world under the package?"

She choked and said: "No, I mean, I love to rip onions!!"
****************************************************************************************************
Water can carry a boat or boat.
Title: Little Prince Edward Milk
Body: Excitement of the discovery, my husband changed the name of my notes to "small Prince of Milk."

Evening spoiled the said: "Husband, is not imagine the TV pet Concubine Love Me."

The goods actually said: "Wife, you read backwards."

I:...
****************************************************************************************************
Water can carry a boat or boat.
Title: Don't Kill Fish tonight.
Body: Wife: let you kill a fish so laborious! What's your relationship with a moron?

Husband: Marital relationship ...
****************************************************************************************************
Water can carry a boat or boat.
Title: Breathtaking operation
Body: Husband: I stretch out her thin like organdy of clothes, stretched out her tongue lick her tender water's cheek, homeopathic kiss on her lips, sucking hard ...

Daughter-in-law hand is a palm: MD, eat a grape you say so ecstasy?
****************************************************************************************************
Water can carry a boat or boat.
Title: Last three days let you know what I'm good for
Body: Work charming lying in my bed, charming lips angle light kai: "My dear, this can be one months, you how not to touch others."

This job is also worthy of beauty, the voice of the last vocal chatter rose three degrees, like the feathers of the people itch, I slap it on the face. "The last three days let you know what I am."
****************************************************************************************************
Water can carry a boat or boat.
Title: I want to pee
Body: A day continuous on three boring Malay class, the teacher has refused to class,

The Deskmate finally can't help shouting: "I want to pee."

The teacher was furious: "On my class how dare not ashamed to miss."
****************************************************************************************************
Water can carry a boat or boat.
Title: The expectations of the teacher for you
Body: In the hall, the master is speaking to the disciples: "The name of the teacher for you is not a casual, but for the teacher to your expectations, you understand?"

All the disciples answered understand, only one disciple silent, the master looked at, asked the disciple said: "Died, why don't you talk?"
[Finished in 0.9s]
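For experimenting without network access, the whole flow can be sketched on an inline fragment. Everything here is an assumption made for illustration: the fragment is made up to imitate the structure described above, and the three-group pattern is hypothetical, not the pattern used in the code above. The live page may need different markers.

```python
import re

# Made-up page fragment imitating the structure described above:
# an author link, a title link, then the body inside the content <div>.
html = '''
<a href="/u/1" target="_blank">Author A</a>
\t\t<h1 class="dp-b"><a href="/c/1" target="_blank">Title one</a></h1>
\t\t<div class="content-img clearfix pt10 relative">
\t\t\tFirst body.<br/>
\t\t\tStill first body.</div>
<a href="/u/2" target="_blank">Author B</a>
\t\t<h1 class="dp-b"><a href="/c/2" target="_blank">Title two</a></h1>
\t\t<div class="content-img clearfix pt10 relative">
\t\t\tSecond body.</div>
'''

# Hypothetical three-group pattern (NOT the article's original pattern).
joke_pattern = re.compile(
    r'target="_blank">(.+?)</a>'          # group 1: author
    r'[\s\S]*?target="_blank">(.+?)</a>'  # group 2: title
    r'[\s\S]*?<div class="content-img clearfix pt10 relative">'
    r'([\s\S]*?)</div>'                   # group 3: body
)

jokes = []
for author, title, body in joke_pattern.findall(html):
    # Remove <br/> tags, then trim each line of the body.
    cleaned = body.replace('<br/>', '').strip()
    cleaned = '\n'.join(line.strip() for line in cleaned.splitlines())
    jokes.append((author, title, cleaned))

for author, title, body in jokes:
    print('*' * 100)
    print(author)
    print('Title: %s' % title)
    print('Body: %s' % body)
```

Capturing the title and body as their own groups avoids the second re.search inside the loop, at the cost of a longer pattern that is more sensitive to changes in the page layout.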
