pattern = re.compile('<div.*?author">.*?<a.*?(.*?)</a>.*?<div.*?' +
                     'content">(.*?)<!--(.*?)-->.*?</div>(.*?)<div class="stats.*?class="number">(.*?)</i>', re.S)
Here is a brief explanation of this regular expression.
1) .*? is a fixed combination. The . matches any character (except a newline by default) and the * repeats it any number of times, so together they match an arbitrary run of characters. The trailing ? makes the match non-greedy, that is, the match is kept as short as possible. We will use .*? a lot later on.
2) (.*?) denotes a group. This regular expression contains five groups, and when we iterate over the items later, item[0] holds the content captured by the first (.*?), item[1] holds the content captured by the second (.*?), and so on.
3) The re.S flag makes the dot match any character at all, including line breaks.
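The three points above can be checked on small strings. The following is only a minimal sketch: the tags and text are made up for illustration and are not taken from the page being scraped.

```python
import re

# 1) Greedy vs. non-greedy: .* runs to the last </b>, .*? stops at the first one.
greedy = re.findall(r"<b>.*</b>", "<b>one</b><b>two</b>")
lazy = re.findall(r"<b>.*?</b>", "<b>one</b><b>two</b>")

# 2) Groups: with more than one (.*?) group, findall returns a list of tuples,
#    one tuple per match, with one element per group.
pairs = re.findall(r"<i>(.*?)</i><u>(.*?)</u>", "<i>a</i><u>1</u><i>b</i><u>2</u>")

# 3) re.S: without the flag the dot cannot cross a line break.
multiline = "<p>line one\nline two</p>"
without_s = re.findall(r"<p>(.*?)</p>", multiline)
with_s = re.findall(r"<p>(.*?)</p>", multiline, re.S)

print(greedy)     # ['<b>one</b><b>two</b>']
print(lazy)       # ['<b>one</b>', '<b>two</b>']
print(pairs)      # [('a', '1'), ('b', '2')]
print(without_s)  # []
print(with_s)     # ['line one\nline two']
```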
This gives us the publisher, the post time, the post content, any attached image, and the number of likes.
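To see how the five groups come back from a match, here is a small sketch run against a hand-written HTML fragment. Both sample_html and sample_pattern are assumptions made up for this illustration: the real page markup will differ, the comment is assumed to hold the post time, and the anchor part is written as <a.*?> so that it fits the sample fragment.

```python
import re

# Hypothetical markup, invented for this sketch; the real page will differ.
sample_html = '''
<div class="author">
<a href="/users/1">Alice</a>
</div>
<div class="content">
A short joke.
<!--1400000000-->
</div>
<div class="thumb"><img src="pic.jpg"></div>
<div class="stats"><i class="number">42</i></div>
'''

# Same overall shape as the pattern above, adapted to the sample fragment.
sample_pattern = re.compile(
    '<div.*?author">.*?<a.*?>(.*?)</a>.*?<div.*?'
    'content">(.*?)<!--(.*?)-->.*?</div>(.*?)<div class="stats.*?class="number">(.*?)</i>',
    re.S)

# findall returns one 5-tuple per matched post, in group order:
# publisher, content, comment text (assumed time), trailing markup, likes.
items = re.findall(sample_pattern, sample_html)
for item in items:
    print(item[0], item[1].strip(), item[2], item[4])
```

Note that in this sketch the tuple follows the order of the groups in the pattern, so the content is item[1] and the assumed time is item[2]. item[3] holds whatever markup sits between the content and the stats block, which is where an image tag would show up.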
Note that if a joke comes with a picture, printing it directly is cumbersome, so here we only keep the jokes without pictures.
This means we need to filter out the jokes that contain a picture.
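One possible way to do this filtering is sketched below. It assumes each item is a 5-tuple in group order and that a picture shows up as an img tag inside item[3]. The helper name without_pictures and the sample tuples are hypothetical, chosen only for this illustration.

```python
import re

# Hypothetical tuples: (publisher, content, time, markup after content, likes).
items = [
    ("Alice", "A joke with a picture.", "1400000000",
     '<div class="thumb"><img src="pic.jpg"></div>', "42"),
    ("Bob", "A text-only joke.", "1400000100", "\n", "7"),
]

def without_pictures(items):
    """Keep only the entries whose fourth group contains no img tag."""
    return [item for item in items if not re.search("img", item[3])]

text_only = without_pictures(items)
for item in text_only:
    print(item[0], item[2], item[1], item[4])
```

Searching for the plain string "img" is a deliberately loose check. A stricter test on the tag itself could be swapped in once the real markup is known.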