When you run into a programming problem, first try to simplify it: reduce it to the simplest possible case, write the simplest code that solves it, and keep the cost of testing as low as possible.
Take this simple HTML source:
1<!--The loneliest number--> <a>2<!--Can be as bad as one--><b>3
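To extract the comments from the HTML above with BeautifulSoup 4 (Python 3), note that bs4 represents each comment as a Comment object, a subclass of NavigableString. find_all() can therefore select the comments by passing a function that checks the type of each string:

    from bs4 import BeautifulSoup, Comment

    html = """1<!--The loneliest number--> <a>2<!--Can be as bad as one--><b>3"""
    soup = BeautifulSoup(html, "html.parser")

    # Keep only the strings whose type is Comment
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        print(comment)

Output:

    The loneliest number
    Can be as bad as one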
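If bs4 is not installed, the same extraction can be sketched with the standard library instead. This swaps BeautifulSoup for `html.parser.HTMLParser`, whose `handle_comment()` callback receives the text between `<!--` and `-->`. The names `CommentCollector` and `extract_comments` are hypothetical, chosen for illustration; unlike BeautifulSoup, this builds no parse tree and only collects the comment text.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <!-- ... --> comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data)


def extract_comments(html):
    """Return the comment texts found in html, in document order."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments


if __name__ == "__main__":
    source = "1<!--The loneliest number--> <a>2<!--Can be as bad as one--><b>3"
    for comment in extract_comments(source):
        print(comment)
    # prints:
    # The loneliest number
    # Can be as bad as one
```

This trades BeautifulSoup's tree navigation for zero dependencies, which is enough when all you need is the list of comment strings.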
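To remove the comments from the same HTML, call extract() on each Comment, which detaches it from the parse tree:

    from bs4 import BeautifulSoup, Comment

    html = """1<!--The loneliest number--> <a>2<!--Can be as bad as one--><b>3"""
    soup = BeautifulSoup(html, "html.parser")

    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    for comment in comments:
        comment.extract()  # remove the comment node from the tree
    print(soup)

Output (BeautifulSoup closes the unclosed <a> and <b> tags when it renders the tree):

    1 <a>2<b>3</b></a>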
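A standard-library counterpart for removal is sketched below, again swapping BeautifulSoup for `html.parser.HTMLParser`. The hypothetical `CommentStripper` class re-emits every token it sees and ignores comments; `strip_comments` is likewise a hypothetical name. Because it builds no tree, it does not close unclosed tags the way BeautifulSoup does. It also lowercases end-tag names, re-emits entity references with a trailing semicolon, and does not attempt edge cases such as CDATA sections.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Hypothetical helper: re-emits the markup it parses, minus comments."""

    def __init__(self):
        # convert_charrefs=False keeps entity references such as &amp; raw
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the original start tag verbatim
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/> are also kept verbatim
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.parts.append(f"<?{data}>")

    def handle_comment(self, data):
        pass  # comments are simply not re-emitted


def strip_comments(html):
    """Return html with all <!-- ... --> comments removed."""
    stripper = CommentStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)


if __name__ == "__main__":
    source = "1<!--The loneliest number--> <a>2<!--Can be as bad as one--><b>3"
    print(strip_comments(source))  # prints: 1 <a>2<b>3
```

Unlike the BeautifulSoup version, the output keeps the input's unclosed tags as they were, which can be preferable when you want to change nothing but the comments.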
References:
1. How to find the comment tags <!--...--> with BeautifulSoup?
2. BeautifulSoup documentation, section "Removing elements"