Do you know how to use Python regular expressions to identify the original poster of a forum thread (the "louzhu", sometimes mistranslated as "landlord")? This article walks through a practical example on the Tianya forum: finding the original poster's name in the page source, then locating where each post begins.
Identify the original poster:
Post code snippet:
- <!-- Tianya treasure chest -->
- <script>
- var chrType = "public";
- var intAuthorId = "";
- var chrAuthorName = "GreyHouse";
- var chrTitle = "[light and shade records] A trip to flea Europe";
- var chrItem = 'travel';
- var intItem = '0';
- var intArticleId = "191157";
- var tAuthor = 'greyhouse';
- </script>
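Use the following Python regular expression to find the original poster:
- import re
- reg_louzhu = re.compile(r'.*chrAuthorName = "(.*?)";')
If mat = reg_louzhu.match(line), where line is one line of the page's HTML source, then mat.groups()[0] is the original poster's name, GreyHouse in this example. For lines that do not contain chrAuthorName, match() returns None, so check mat before calling groups().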
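The lookup described above can be sketched as a small function that scans the page source line by line. This is only a minimal sketch: the helper name find_louzhu and the sample_html string are made up for illustration, and the sample assumes the script block looks like the snippet shown earlier.

```python
import re

# Pattern from the article: capture the value assigned to chrAuthorName.
reg_louzhu = re.compile(r'.*chrAuthorName = "(.*?)";')

def find_louzhu(html):
    """Return the original poster's name, or None if no line matches.

    find_louzhu is a hypothetical helper, not part of any library.
    """
    for line in html.splitlines():
        mat = reg_louzhu.match(line)
        if mat:
            return mat.groups()[0]
    return None

# Hypothetical sample modeled on the Tianya script block.
sample_html = '''<script>
var chrType = "public";
var chrAuthorName = "GreyHouse";
var chrTitle = "[light and shade records] A trip to flea Europe";
</script>'''

louzhu = find_louzhu(sample_html)
print(louzhu)
```

Because match() anchors at the start of the line, the leading .* in the pattern is what lets it skip over the "var " prefix before chrAuthorName.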
Identify the starting position of a post
Further analysis of the HTML shows that each Tianya reply begins with an author information bar. That bar can therefore serve as the start of a post, and a post ends just before the next author information bar. A typical author information bar looks like this:
<TABLE cellspacing=0 border=0 bgcolor=f5f9fa
width=100%><TR><TD width=100 ALIGN=RIGHT
VALIGN=bottom></TD><font size=-1
color=green><br><center>Author: <a
href="/browse/Listwriter.asp?vid=11288815&vwriter=
go shopping with tanks&idwriter=0&key=0" target=_blank>go shopping with tanks</a>
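One way to turn this observation into code is to search for the author link inside each information bar and cut the page at those positions. The sketch below rests on assumptions that the article does not confirm: that every author bar contains a link to Listwriter.asp with a vwriter parameter, and that this link is a good enough marker for the start of a post. The names reg_author and split_posts, and the sample string with its vid values, are hypothetical.

```python
import re

# Assumption: each author information bar links to Listwriter.asp and
# carries the author's name in the vwriter query parameter.
reg_author = re.compile(
    r'<a\s+href="/browse/Listwriter\.asp\?vid=\d+&vwriter=([^&"]*)&',
    re.IGNORECASE,
)

def split_posts(html):
    """Return (author, chunk) pairs; each chunk runs to the next author link."""
    marks = list(reg_author.finditer(html))
    posts = []
    for i, mat in enumerate(marks):
        # A post ends where the next author link begins, or at end of page.
        end = marks[i + 1].start() if i + 1 < len(marks) else len(html)
        posts.append((mat.group(1).strip(), html[mat.start():end]))
    return posts

# Hypothetical page fragment with two posts.
sample = (
    '<center>Author: <a href="/browse/Listwriter.asp?vid=1&vwriter=GreyHouse'
    '&idwriter=0&key=0" target=_blank>GreyHouse</a> first post '
    '<center>Author: <a href="/browse/Listwriter.asp?vid=2&vwriter=go shopping'
    ' with tanks&idwriter=0&key=0" target=_blank>go shopping with tanks</a> a reply'
)

posts = split_posts(sample)
for author, chunk in posts:
    print(author, len(chunk))
```

Note that each chunk here starts at the author link rather than at the enclosing TABLE tag, so a real scraper would probably need to trim the leftover bar markup from the end of the previous chunk.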
That covers how to use Python regular expressions to identify the original poster and the starting position of each post.