This article shows how to use smallseg, a word segmentation library for Python, and walks through a complete example script. It is shared here for your reference. The example is as follows:
# encoding=utf-8
# import psyco
# psyco.full()
from smallseg import SEG

# Load the dictionary: one word per line, UTF-8 encoded.
with open("main.dic", mode="r", encoding="utf-8") as f:
    words = [x.rstrip() for x in f]

seg = SEG()
print("Load dict...")
seg.set(words)
print("Dict is OK.")


def cuttest(text):
    # cut() returns the segments in reverse order, so flip the list back.
    wlist = seg.cut(text)
    wlist.reverse()
    # Join with spaces so the segment boundaries are visible.
    tmp = " ".join(wlist)
    print(tmp)
    print("================================")


if __name__ == "__main__":
    cuttest("This is a dark night without a finger. My name is Sun Wukong. I love Beijing. I love Python and C++.")
    cuttest("I don't like Japanese kimono.")
    cuttest("The Thunder monkey returns to the human world.")
    cuttest("Yonghe Clothing & Accessories Co., Ltd.")
    cuttest("I love Tiananmen Square, Beijing")
    cuttest("abc")
    cuttest("Hidden Markov")
    cuttest("Ray monkey is a good website")
    cuttest('The word "Microsoft" consists of "MICROcomputer" and "SOFTware"')
    cuttest("grass mud horse and deception horse is a popular term this year")
    cuttest("ito Yang Huatang head office store")
    cuttest("Institute of Computing Technology, Chinese Academy of Sciences")
    cuttest("Romeo and Juliette")
    cuttest("I bought items and costumes")
smallseg runs into two small problems on Python 3.3. First, the library code uses xrange, which was renamed to range in Python 3. Second, str objects no longer have a decode method in Python 3 (only bytes objects do), so the library's decode calls on strings fail.
After fixing these two points, the code runs on Python 3, and the segmentation results are acceptable.
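The two changes can be illustrated with a minimal sketch. This is not smallseg's actual source; the helper names `to_text` and `char_ngrams` are hypothetical and exist only to show the pattern of replacing xrange with range and decoding only when the input is bytes.

```python
# Python 2 idioms that fail on Python 3 (shown as comments only):
#   for i in xrange(n): ...          -> NameError: name 'xrange' is not defined
#   text = text.decode('utf-8')      -> AttributeError: 'str' object has no attribute 'decode'


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding only if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def char_ngrams(text, n):
    """Hypothetical helper: all substrings of length n, using range instead of xrange."""
    text = to_text(text)
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    # Works for both bytes and str input on Python 3.
    print(char_ngrams("我爱北京".encode("utf-8"), 2))
    print(char_ngrams("我爱北京", 2))
```

Guarding the decode call with an isinstance check, rather than deleting it outright, keeps the code working when a caller still passes raw bytes read from a file opened in binary mode.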
I hope this article will help you with Python programming.