A long time ago, I saw a question that went roughly like this:
Someone had scraped a piece of HTML and extracted the part he wanted (the img tag), but did not want to keep some of the img tag's attributes.
For example:
"How do I remove the alt and width attributes from inside an img tag?"
I am lazy: when a tool already does the job, I would rather not write my own. The asker intended to handle it with the re module,
but I still wanted to use BeautifulSoup. That led to the following code.
The main idea is to use del to remove the alt and width attributes from the img tag.
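    from bs4 import BeautifulSoup

    # The original alt and width values were lost; "..." stands in for them.
    html = '<img alt="..." width="..." src="http://127.0.0.1:80/admin/../upload/pimg1054_1.png"/>'
    soup = BeautifulSoup(html, "html.parser")
    # A tag's attributes behave like dict keys, so del removes them.
    del soup.img["alt"]
    del soup.img["width"]
    print(soup)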
Results:
    <img src="http://127.0.0.1:80/admin/../upload/pimg1054_1.png"/>
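One caveat with del: it raises a KeyError if the attribute is missing, and soup.img only reaches the first img tag. As a rough sketch of how the same idea could cover every img tag on a page, the helper below pops each unwanted attribute with a default. The name strip_attributes and the sample HTML are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup


def strip_attributes(html, tag_name, unwanted):
    """Return html with the given attributes removed from every tag_name tag.

    Hypothetical helper for illustration; not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(tag_name):
        for name in unwanted:
            # tag.attrs is a plain dict; pop with a default avoids the
            # KeyError that del raises when the attribute is absent.
            tag.attrs.pop(name, None)
    return str(soup)


# Made-up sample: the second img has no alt or width at all.
sample = '<p><img alt="a" width="10" src="a.png"/><img src="b.png"/></p>'
cleaned = strip_attributes(sample, "img", ["alt", "width"])
print(cleaned)
```

Because the attribute names are passed in as a list, the same function can be reused for other tags or other attributes without changes.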
One more thing: someone in the thread has already given a solution based on re, and you can read it there if you are interested. The main idea is to match the unwanted attribute strings and replace each match with an empty string.
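For comparison, here is a minimal sketch of that match-and-replace idea. It is not the answer posted in the thread; the patterns, the function name strip_with_re and the sample HTML are my own assumptions. It only handles quoted attribute values and img tags whose attribute values contain no ">" character.

```python
import re

# Matches a whole <img ...> tag (assumes no ">" inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Matches alt="..." or width="..." (single or double quoted),
# together with the whitespace in front of it.
UNWANTED_ATTR = re.compile(
    r"""\s+(?:alt|width)\s*=\s*(?:"[^"]*"|'[^']*')""",
    re.IGNORECASE,
)


def strip_with_re(html):
    """Replace the unwanted attributes with an empty string, inside img tags only."""
    return IMG_TAG.sub(lambda m: UNWANTED_ATTR.sub("", m.group(0)), html)


# Made-up sample: the alt on <p> must survive, the ones on <img> must not.
sample = '<p alt="keep"><img alt="logo" width="100" src="a.png"/></p>'
result = strip_with_re(sample)
print(result)  # <p alt="keep"><img src="a.png"/></p>
```

The two-step substitution keeps the attribute pattern from touching other tags, but a regular expression still does not understand HTML the way a parser does, which is one reason to prefer the BeautifulSoup version.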
Attached: address of the original question:
https://q.cnblogs.com/q/105540/
Python: using BeautifulSoup to remove unwanted attributes