A part of Natural Language Processing (NLP) is processing text by "tokenizing" language strings. This means we can break up a string of text into parts by word, sentence, etc. In this lesson, we'll use the natural library to tokenize a string. First, we'll break the string into words using WordTokenizer, WordPunctTokenizer, and TreebankWordTokenizer. Then we'll break the string into sentences using RegexpTokenizer.
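// Load the library and start with the basic word tokenizer,
// which splits on non-word characters and drops punctuation.
var natural = require('natural'),
  tokenizer = new natural.WordTokenizer();
console.log(tokenizer.tokenize("your dog has fleas."));

// Treebank-style tokenizer: splits contractions and keeps punctuation.
tokenizer = new natural.TreebankWordTokenizer();
console.log(tokenizer.tokenize("my dog hasn't any fleas."));
// [ 'my', 'dog', 'has', 'n\'t', 'any', 'fleas', '.' ]

// Regexp tokenizer: splits wherever the given pattern matches.
tokenizer = new natural.RegexpTokenizer({pattern: /\-/});
console.log(tokenizer.tokenize("flea-dog"));
// [ 'flea', 'dog' ]

// Word/punctuation tokenizer: every punctuation mark becomes its own token.
tokenizer = new natural.WordPunctTokenizer();
console.log(tokenizer.tokenize("my dog hasn't any fleas."));
// [ 'my', 'dog', 'hasn', '\'', 't', 'any', 'fleas', '.' ]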
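
The RegexpTokenizer call above only splits on a hyphen, so here is a rough sketch of what a sentence-splitting pattern could look like. It uses plain JavaScript rather than natural, so it runs without installing anything. The splitSentences helper and its pattern are assumptions made for illustration, not part of the library, and real text (abbreviations, decimals, quotes) would need a more careful pattern.

```javascript
// Hypothetical helper, not part of natural: split text into sentences by
// treating the whitespace that follows ".", "!" or "?" as the gap between tokens.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/) // the lookbehind keeps the punctuation with its sentence
    .filter(function (s) { return s.length > 0; }); // drop empty pieces, if any
}

console.log(splitSentences("My dog has fleas. Your dog does not! Is that fair?"));
// [ 'My dog has fleas.', 'Your dog does not!', 'Is that fair?' ]
```

A pattern like this could presumably be passed as the pattern option of natural.RegexpTokenizer in the same way as the hyphen pattern, but check that against the version of natural you have installed.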
[JavaScript Natural] Break up language strings into parts using Natural