Use Lucene CharFilter to remove scripts and HTML tags from text

Source: Internet
Author: User

1. Prepare the data. Here I read one record from a database; its content contains HTML tags and scripts.

Code:

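import org.junit.Before;

// lists and content are fields of the test class; SQLService and BaseDao are the author's own classes.
@Before
public void init() {
    SQLService sqlService = new SQLService();
    sqlService.regist(null);
    BaseDao bd = new BaseDao();
    String sql = "SELECT * from T where title like '%read once a day, tongue more invincible%'";
    lists = bd.getList(sql);
    System.out.println(lists.size());
    // Take the "content" column of the first row as the text to clean.
    content = lists.get(0).get("content").toString();
    // System.out.println(content);
}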

2. Use the character filters HTMLStripCharFilter and MappingCharFilter. Both extend Reader (through Lucene's CharFilter class), so they can be read from like any other Reader.

Code:

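import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.CharFilter;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.junit.Test;

@Test
public void test2() throws IOException {
    StringBuilder sb = new StringBuilder();
    // HTML filtering: strips tags and script content from the input
    HTMLStripCharFilter htmlScript = new HTMLStripCharFilter(new StringReader(content));
    // Add a mapping filter, mainly to remove line-break characters
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add("\r", ""); // carriage return
    builder.add("\t", ""); // horizontal tab
    builder.add("\n", ""); // line feed
    // Chain the filters: the mapping filter reads from the HTML filter
    CharFilter cs = new MappingCharFilter(builder.build(), htmlScript);
    char[] buffer = new char[10240];
    int count;
    while ((count = cs.read(buffer)) != -1) {
        sb.append(new String(buffer, 0, count));
    }
    System.out.println(sb.toString());
    cs.close();
    // String keywords = HanLP.extractKeyword(sb.toString(), ...).toString();
    // System.out.println(keywords);
}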

Processing results:

Dear little friends, tired, just relax! 1. Can can you can a can as a canner can? Can you put canned goods like canned workers?
2. I wish to wish the wish are wish to wish, but if you wish the wish the Witch wishes, I won't wish the wish for you wish
Wish. I want to dream about your dream, but if you dream of a witch dream, I don't want to dream about your dream. 3. I scream, you scream, we all scream
For ice-cream! I yell, you shout, we all shout for ice cream! 4. How many cookies could a good cook cook if a good cook could cook cookies?
A good cook could cook as much cookies as a good cook who could cook cookies. If a good cook could make cookies, how many cookies could he do?
A good cook can make as many cookies as other good cooks. 5. The driver was drunk and drove the doctor's car directly into the deep ditch.
The driver was drunk, and he drove the doctor's car into a big deep ditch. 6. Whether the weather is fine or whether the weather be not. Whether the weather
be cold or whether the weather is hot. We'll weather the weather whether we like it or not. Whether it's sunny or cloudy. Whether it's cold or warm,
Whether you like it or not, we must endure the frost and rain. 7. Peter Piper picked a peck of pickled peppers. A peck of pickled peppers Peter Piper
Picked. If Peter Piper picked a peck of pickled peppers, where's the peck of pickled peppers Peter Piper picked?
Peter Piper picked pinch up a pinch of pickles. Peter Piper picked a pinch of pickles. So where's the pickle Peter Piper picked? 8. I thought a thought. But the thought I thought
wasn't the thought I thought I thought. If the thought I thought I thought had been the thought I thought, I wouldn't
have thought so much. I have an idea, but it's not the kind of thought I thought I had. If this idea was the thought I had thought of, I wouldn't have thought so much.
9. Amid the mists and coldest frosts, with barest wrists and stoutest boasts, He thrusts his fists against the posts,
And still insists he sees the ghosts. Misty, icy frost, the wrist is empty, the words of the son, saw his fist to the pillar smashed, speak to himself to touch the ghost.
10. Badmin was able to beat Bill at billiards and Bill always beat badmin badly at badminton.
Badmin can beat bill at billiards, but playing badminton bill often defeats Badmin. 11. Betty beat a bit of butter to make a better butter.
Betty beats a small piece of butter to make a better creamy face. 12. Rita repeated what Reardon recited when Reardon read the remarks.
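
The reason step 2 works is that each character filter is itself a Reader wrapped around another Reader, so filters can be chained and then drained with an ordinary read loop. The sketch below illustrates only that chaining idea using the Java standard library. It is not Lucene's implementation and it does not strip HTML; the names ReaderChainSketch, DropCharsReader, and readAll are hypothetical and were made up for this illustration.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderChainSketch {

    /** Hypothetical helper: a Reader that drops every character listed in `dropped`. */
    static class DropCharsReader extends FilterReader {
        private final String dropped;

        DropCharsReader(Reader in, String dropped) {
            super(in);
            this.dropped = dropped;
        }

        @Override
        public int read() throws IOException {
            int c;
            // Skip dropped characters until a kept one or end of stream.
            while ((c = super.read()) != -1 && dropped.indexOf(c) >= 0) {
                // intentionally empty
            }
            return c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) {
                    break;
                }
                buf[off + n++] = (char) c;
            }
            // Report end of stream only when nothing at all was read.
            return n == 0 ? -1 : n;
        }
    }

    /** Drains any Reader with the same buffer loop used in step 2. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, count);
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader source = new StringReader("line one\r\n\tline two\n");
        // Wrap the source, the same way MappingCharFilter wraps HTMLStripCharFilter.
        Reader filtered = new DropCharsReader(source, "\r\n\t");
        System.out.println(readAll(filtered)); // prints "line oneline two"
    }
}
```

Because the wrapper only depends on the Reader interface, the calling code does not need to know how many filters sit between it and the original text.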
