The class org.apdplat.word.WordFrequencyStatistics in the word segmentation library provides word frequency statistics.
The command-line script is invoked as follows.
Write the text whose word frequencies you want to count to the file text.txt, then run:
chmod +x wfs.sh && ./wfs.sh -textFile=text.txt -statisticsResultFile=statistics-result.txt
When the program finishes, open statistics-result.txt to view the results.
To call it from Java code:
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import org.apdplat.word.WordFrequencyStatistics;
import org.apdplat.word.segmentation.SegmentationAlgorithm;

// Configure word frequency statistics
WordFrequencyStatistics wordFrequencyStatistics = new WordFrequencyStatistics();
wordFrequencyStatistics.setRemoveStopWord(false);
wordFrequencyStatistics.setResultPath("word-frequency-statistics.txt");
wordFrequencyStatistics.setSegmentationAlgorithm(SegmentationAlgorithm.MaxNgramScore);
// Segment the first example sentence
wordFrequencyStatistics.seg("…");
// Output the word frequency statistics
wordFrequencyStatistics.dump();
// Prepare the input file (Files.write throws IOException)
Files.write(Paths.get("text-to-seg.txt"), Arrays.asList("word is a distributed Chinese word segmentation component implemented in Java. It provides a variety of dictionary-based segmentation algorithms and uses an n-gram model to eliminate ambiguity."));
// Clear the previous statistics
wordFrequencyStatistics.reset();
// Segment the file
wordFrequencyStatistics.seg(new File("text-to-seg.txt"), new File("text-seg-result.txt"));
// Output the word frequency statistics
wordFrequencyStatistics.dump("file-seg-statistics-result.txt");
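The example passes false to setRemoveStopWord, so function words such as "and" and "a" stay in the counts. The sketch below illustrates what that switch changes. It is not the library's implementation: the tokens are assumed to be already segmented, and the class StopWordSketch, its count method and its STOP_WORDS list are hypothetical, made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StopWordSketch {
    // Hypothetical stop-word list for illustration only; it is not the list
    // that ships with the word library.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    // Counts tokens in first-seen order; when removeStopWords is true, tokens
    // found in STOP_WORDS are skipped before counting.
    static Map<String, Integer> count(List<String> tokens, boolean removeStopWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            if (removeStopWords && STOP_WORDS.contains(token)) {
                continue; // drop the stop word
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical, already-segmented tokens.
        List<String> tokens = Arrays.asList("word", "a", "distributed", "and", "model");
        System.out.println(count(tokens, false)); // like setRemoveStopWord(false): all five tokens counted
        System.out.println(count(tokens, true));  // like setRemoveStopWord(true): "a" and "and" dropped
    }
}
```

Filtering before counting means removed stop words never enter the frequency table, so they cannot appear in the ranked output at all.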
Word frequency statistics for the first sentence:
1. rain 2
2. tomorrow 2
3. molecule 2
4. course 1
5. lecture 1
6. combined with 1
7. atom 1
8. go to 1
9. … 1
10. about 1
11. and 1
12. but also 1
13. … 1
14. … 1
15. … 1
Word frequency statistics for the second sentence:
1. … 2
2. … 2
3. based on 1
4. word 1
5. component 1
6. dictionary 1
7. Ngram 1
8. a variety of 1
9. … 1
10. and 1
11. … 1
12. … 1
13. Chinese word segmentation 1
14. algorithm 1
15. … 1
16. distributed 1
17. … 1
18. … 1
19. model 1
20. lai 1
21. a 1
22. Java 1
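Both result lists are ranked by descending count. A minimal sketch of that counting-and-ranking step follows, assuming the text has already been segmented into tokens. The class FrequencySketch and its countAndRank method are hypothetical and are not part of the word API; in this sketch ties keep first-seen order, and the library's own tie-breaking may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FrequencySketch {
    // Counts each token and returns (token, count) entries sorted by
    // descending count. LinkedHashMap keeps first-seen order and List.sort is
    // stable, so tokens with equal counts stay in first-seen order.
    static List<Map.Entry<String, Integer>> countAndRank(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical, already-segmented tokens.
        List<String> tokens = Arrays.asList("tomorrow", "rain", "molecule", "tomorrow", "course", "rain", "atom");
        List<Map.Entry<String, Integer>> ranked = countAndRank(tokens);
        // Print each entry in a "rank. word count" layout.
        for (int i = 0; i < ranked.size(); i++) {
            Map.Entry<String, Integer> entry = ranked.get(i);
            System.out.println((i + 1) + ". " + entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Running main prints "1. tomorrow 2", "2. rain 2", "3. molecule 1", "4. course 1" and "5. atom 1". Using a LinkedHashMap rather than a HashMap keeps the order of tied entries deterministic instead of depending on hash order.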
Using the word segmentation library for word frequency statistics