All filters implement the NodeFilter interface, which has a single method, boolean accept(Node node), that decides whether a node passes the filter. HTMLParser defines 16 filter types in the org.htmlparser.filters package, and they can be divided into several categories.
Judging filters:
TagNameFilter
HasAttributeFilter
HasChildFilter
HasParentFilter
HasSiblingFilter
IsEqualFilter
Logical operation filters:
AndFilter
NotFilter
OrFilter
XorFilter
Other filters:
NodeClassFilter
StringFilter
LinkStringFilter
LinkRegexFilter
RegexFilter
CssSelectorNodeFilter
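To make the split between judging filters and logical operation filters concrete, here is a minimal sketch of the idea. It does not use the HTMLParser library: MiniNode, MiniFilter and FilterSketch are hypothetical stand-ins invented for this illustration so that the code runs with the JDK alone. The library's own classes take real Node objects and may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed node: a tag name plus at most one attribute name.
final class MiniNode {
    final String tagName;
    final String attribute; // null when the node has no attribute

    MiniNode(String tagName, String attribute) {
        this.tagName = tagName;
        this.attribute = attribute;
    }
}

// Hypothetical stand-in for the one-method filter contract.
interface MiniFilter {
    boolean accept(MiniNode node);
}

final class FilterSketch {
    // Judging filters: each one tests a single property of the node.
    static MiniFilter tagName(String name) {
        return node -> node.tagName.equalsIgnoreCase(name);
    }

    static MiniFilter hasAttribute(String attribute) {
        // equalsIgnoreCase(null) is false, so nodes without an attribute are rejected.
        return node -> attribute.equalsIgnoreCase(node.attribute);
    }

    // Logical operation filters: each one only delegates to other filters.
    static MiniFilter and(MiniFilter a, MiniFilter b) {
        return node -> a.accept(node) && b.accept(node);
    }

    static MiniFilter or(MiniFilter a, MiniFilter b) {
        return node -> a.accept(node) || b.accept(node);
    }

    static MiniFilter xor(MiniFilter a, MiniFilter b) {
        return node -> a.accept(node) ^ b.accept(node);
    }

    static MiniFilter not(MiniFilter a) {
        return node -> !a.accept(node);
    }

    // Keep only the nodes that the filter accepts.
    static List<MiniNode> extract(List<MiniNode> nodes, MiniFilter filter) {
        List<MiniNode> kept = new ArrayList<>();
        for (MiniNode node : nodes) {
            if (filter.accept(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<MiniNode> nodes = Arrays.asList(
                new MiniNode("A", "href"),
                new MiniNode("A", null),
                new MiniNode("IMG", "src"));
        MiniFilter linksWithHref = and(tagName("a"), hasAttribute("href"));
        System.out.println(extract(nodes, linksWithHref).size());     // prints 1
        System.out.println(extract(nodes, not(tagName("a"))).size()); // prints 1
    }
}
```

Because a logical filter is itself a filter, the combinations nest freely, for example not(or(x, y)).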
In addition, you can write your own filter to meet specific filtering requirements.
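A custom filter is just another class that supplies the accept method. The sketch below shows the shape of such a class. It uses hypothetical stand-in types (DemoNode, DemoFilter) instead of the library's Node and NodeFilter so that it compiles with the JDK alone, and MinTextLengthFilter is an invented example, not one of the 16 built-in filters.

```java
// Hypothetical stand-in for a parsed node.
final class DemoNode {
    final String tagName;
    final String text; // may be null for nodes without text

    DemoNode(String tagName, String text) {
        this.tagName = tagName;
        this.text = text;
    }
}

// Hypothetical stand-in for the one-method filter contract.
interface DemoFilter {
    boolean accept(DemoNode node);
}

// Custom filter: accepts nodes whose trimmed text has at least minLength characters.
final class MinTextLengthFilter implements DemoFilter {
    private final int minLength;

    MinTextLengthFilter(int minLength) {
        this.minLength = minLength;
    }

    @Override
    public boolean accept(DemoNode node) {
        return node.text != null && node.text.trim().length() >= minLength;
    }
}

final class CustomFilterDemo {
    // Count how many nodes the filter accepts.
    static int countAccepted(DemoNode[] nodes, DemoFilter filter) {
        int count = 0;
        for (DemoNode node : nodes) {
            if (filter.accept(node)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        DemoNode[] nodes = {
            new DemoNode("P", "Hello world"),
            new DemoNode("P", "  hi "),
            new DemoNode("DIV", null)
        };
        System.out.println(countAccepted(nodes, new MinTextLengthFilter(5))); // prints 1
    }
}
```

With the real library the class would presumably implement NodeFilter and read the text from the Node it receives; the filtering logic itself stays inside accept in the same way.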
Tag classes
Mainly used in conjunction with NodeClassFilter.
Remark: comments
AppletTag:
BaseHrefTag:
BodyTag: "BODY"; getBody() internally calls toPlainTextString()
Bullet: "LI"
BulletList: "UL", "OL"
CompositeTag:
DefinitionList: "DL"
DefinitionListBullet: "DD", "DT"
Div: "DIV"
DoctypeTag: "!DOCTYPE"
FormTag:
FrameSetTag:
FrameTag:
HeadingTag: "H1", "H2", "H3", "H4", "H5", "H6"
HeadTag: "HEAD"
Html: "HTML"
ImageTag:
InputTag: "INPUT"
JspTag: "%", "%=", "%@"
LabelTag: "LABEL"
LinkTag:
MetaTag:
ObjectTag:
OptionTag:
ParagraphTag: "P"
ProcessingInstructionTag: "?"
ScriptTag:
SelectTag: "SELECT"
Span: "SPAN"
StyleTag: "STYLE"
TableColumn: "TD"
TableHeader: "TH"
TableRow: "TR"
TableTag: "TABLE"
TagNode:
TextareaTag: "TEXTAREA"
TitleTag: "TITLE"
TextNode:
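Filtering by node class comes down to an "is this node an instance of that class" test, which is why the tag classes listed above pair naturally with NodeClassFilter. The sketch below illustrates that test with hypothetical stand-in classes (ToyNode, ToyTag, ToyLinkTag, ToyText, ToyClassFilter); none of them belong to HTMLParser, and the library's exact matching rule is assumed here rather than quoted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in node hierarchy.
class ToyNode {
}

class ToyTag extends ToyNode {
    final String name;

    ToyTag(String name) {
        this.name = name;
    }
}

class ToyLinkTag extends ToyTag {
    final String href;

    ToyLinkTag(String href) {
        super("A");
        this.href = href;
    }
}

class ToyText extends ToyNode {
    final String text;

    ToyText(String text) {
        this.text = text;
    }
}

// Accepts a node when it is an instance of the wanted class or of a subclass.
final class ToyClassFilter {
    private final Class<? extends ToyNode> wanted;

    ToyClassFilter(Class<? extends ToyNode> wanted) {
        this.wanted = wanted;
    }

    boolean accept(ToyNode node) {
        return wanted.isInstance(node);
    }
}

final class ClassFilterDemo {
    // Keep only the nodes that the filter accepts.
    static List<ToyNode> extract(List<ToyNode> nodes, ToyClassFilter filter) {
        List<ToyNode> kept = new ArrayList<>();
        for (ToyNode node : nodes) {
            if (filter.accept(node)) {
                kept.add(node);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ToyNode> nodes = Arrays.asList(
                new ToyLinkTag("http://example.com/"),
                new ToyTag("P"),
                new ToyText("plain text"));
        System.out.println(extract(nodes, new ToyClassFilter(ToyLinkTag.class)).size()); // prints 1
        System.out.println(extract(nodes, new ToyClassFilter(ToyTag.class)).size());     // prints 2
    }
}
```

Asking for a base class also returns instances of its subclasses, so in this sketch a filter on ToyTag matches the link tag as well.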
Various filters in HTMLParser (1)