Yesterday's post covered installing the jieba word segmentation tool and its basic usage. Today's topic is closer to a real application: reading Chinese text from a file, then using jieba to perform word segmentation and part-of-speech tagging.
The example code is as follows:
# coding=utf-8
import time

import jieba.posseg as pseg

t1 = time.time()

# Read the text (written for Python 3, so the file is decoded on open)
with open("t_with_splitter.txt", "r", encoding="utf-8") as f:
    string = f.read()

words = pseg.cut(string)  # perform word segmentation and POS tagging

result = ""  # the variable for recording the final result
for w in words:
    result += str(w.word) + "/" + str(w.flag)  # add part-of-speech annotation

# Save the result to another document
with open("t_with_pos_tag.txt", "w", encoding="utf-8") as f:
    f.write(result)

t2 = time.time()
# Feedback results
print("Word segmentation and part-of-speech tagging completed, time: " + str(t2 - t1) + " seconds.")
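The loop that builds the result string can also be written with `str.join`, which avoids repeated string concatenation on large files. The following is only a minimal sketch: `format_tagged` and its `sep` parameter are hypothetical names introduced here, and the `Token` namedtuple is a hand-written stand-in for the objects that `pseg.cut` yields (assumed only to expose `.word` and `.flag`, as in the code above), so the snippet runs without jieba installed.

```python
from collections import namedtuple

# Stand-in for the items yielded by pseg.cut; an assumption for this demo,
# relying only on the .word and .flag attributes used in the post's code.
Token = namedtuple("Token", ["word", "flag"])


def format_tagged(tokens, sep=" "):
    """Return the tokens as 'word/flag' strings joined by sep."""
    return sep.join("{}/{}".format(t.word, t.flag) for t in tokens)


# Hand-written sample data, not actual jieba output.
sample = [Token("我", "r"), Token("爱", "v"), Token("北京", "ns")]
tagged = format_tagged(sample)
print(tagged)
```

Passing `sep=""` gives the same unseparated output as the original loop; a space separator simply makes the tagged text easier to read and to split again later.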