The html_strip character filter strips HTML elements from the text and replaces HTML entities with their decoded value (e.g. replacing &amp; with &).
Example output
POST _analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The keyword tokenizer returns a single term.
The above example returns the term:
[ \nI'm so happy!\n ]
The same example with the standard tokenizer would return the following terms:
[ I'm, so, happy ]
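The behaviour described above can be approximated outside Elasticsearch with a short Python sketch. This is not the filter's actual implementation: the function name html_strip_sketch and the BLOCK_TAGS set are assumptions made for illustration, and the choice to turn block-level tags into newlines simply mirrors the \n characters visible in the example output.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative set of block-level tags that become newlines.
BLOCK_TAGS = {"p", "div", "li"}


class _StripParser(HTMLParser):
    """Collects text content, dropping tags and decoding entities."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags leave a newline behind; inline tags vanish.
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_strip_sketch(text):
    """Return text with HTML elements removed and entities decoded."""
    parser = _StripParser()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


print(repr(html_strip_sketch("<p>I'm so <b>happy</b>!</p>")))  # '\nI\'m so happy!\n'
print(html_strip_sketch("Tom &amp; Jerry"))  # Tom & Jerry
```

Because the sketch returns one string, it corresponds to the single term produced with the keyword tokenizer; splitting that string into words would be the tokenizer's job, not the character filter's.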
Configuration
The html_strip character filter accepts the following parameter:

escaped_tags
    An array of HTML tags which should not be stripped from the original text.
Example configuration
In this example, we configure the html_strip character filter to leave <b> tags in place:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": ["my_char_filter"]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "html_strip",
          "escaped_tags": ["b"]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<p>I'm so <b>happy</b>!</p>"
}
The above example produces the following term:
[ \nI'm so <b>happy</b>!\n ]
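If the same settings need to be generated for different sets of tags, the request body can be built programmatically. The following is a minimal sketch; html_strip_settings is a hypothetical helper, not part of any Elasticsearch client, and its default names are simply those used in the example configuration. It only builds and prints the JSON body; sending it to a cluster is left out.

```python
import json


def html_strip_settings(escaped_tags, filter_name="my_char_filter",
                        analyzer_name="my_analyzer"):
    """Build index settings for a custom analyzer that uses html_strip.

    Hypothetical helper: the structure follows the example configuration,
    with the tag list and names made configurable.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "keyword",
                        "char_filter": [filter_name],
                    }
                },
                "char_filter": {
                    filter_name: {
                        "type": "html_strip",
                        # Tags listed here are left in the text.
                        "escaped_tags": list(escaped_tags),
                    }
                },
            }
        }
    }


body = html_strip_settings(["b"])
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the PUT my_index request shown in the example configuration.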
Source text: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-htmlstrip-charfilter.html#analysis-htmlstrip-charfilter
HTML Strip Char Filter